Apricot Bits

8089 Exploration

At the beginning of April, I started Aprilcot and continued my exploration of the 8089 IOP. I had written a lot of code for asm89, my 8089 assembler, but I hadn't done a lot of testing. It was time to change that.

Getting Oriented

The Intel 8089 IO Processor is configured through a series of "control blocks" that exist somewhere in memory. Where? Uh... I figured this out last year and I did not write it down. 😅 How do we figure it out? Let's look at the 8086 Family User's Manual.

So the first place to look is the top of ROM space, where the reset vector lives. Right next to that is the "system configuration pointer", which defines the bit width of the system and points to a second thing that points to the channel control block, which points to the individual task blocks and ultimately the programs being executed. So let's pop open DEBUG and see what's there.

Oh, huh. The offset and segment are all zeroes. Well, fat chance of that still being there after initialization is finished. That's interrupt vector space. At this point things were beginning to feel familiar.

What else can we do? Well, we also know from this diagram that this is only read after the first "CA" ("Channel Attention"). CA is an input line to the 8089 that tells it to check its control block. This is controlled directly by the 8086 on I/O port 70h. I can't find it on the schematic right now, but I'm pretty sure the data is irrelevant and the port acts merely as a strobe. So what we can do is search the ROM for an OUT 70h, AL/AX instruction and we should find it.

First I assemble both versions of the instruction to get the bytes for it. E6 70 and E7 70. Then I search for those bytes in ROM space from FC00:0 to FC00:4000 (I actually did more because I didn't want to do the math but whatever). I found four locations.
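The same search is easy to reproduce outside of DEBUG on a dumped ROM image. Here is a minimal sketch in C, assuming the ROM has already been read into a byte buffer; the names `linear_address` and `find_out_70h` are made up for illustration and are not from any tool used in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a real-mode segment:offset pair to a 20-bit physical address.
 * The mask models the 8086 wrapping past the top of its 1 MB space. */
unsigned long linear_address(unsigned segment, unsigned offset)
{
    return ((((unsigned long)segment & 0xFFFFUL) << 4)
            + ((unsigned long)offset & 0xFFFFUL)) & 0xFFFFFUL;
}

/*
 * Scan a ROM image for "OUT 70h, AL" (E6 70) or "OUT 70h, AX" (E7 70).
 * Stores up to max_hits buffer offsets in hits[] and returns the total
 * number of matches found (which may exceed max_hits).
 */
size_t find_out_70h(const unsigned char *rom, size_t len,
                    size_t *hits, size_t max_hits)
{
    size_t count = 0;
    size_t i;

    assert(rom != NULL);
    for (i = 0; i + 1 < len; i++) {
        if ((rom[i] == 0xE6 || rom[i] == 0xE7) && rom[i + 1] == 0x70) {
            if (count < max_hits)
                hits[count] = i;
            count++;
        }
    }
    return count;
}
```

A raw byte match like this can also hit operand bytes of unrelated instructions, so each hit still needs to be disassembled to confirm it. `linear_address(0xFC00, 0x4000)` wraps around to 0, which is why FC00:4000 only works as an exclusive end of the search range.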

Disassembling the last one, we can see our OUT 70,AL preceded directly by three words written starting at 0000:0500. One of them is just copying the code segment register, the second is probably an offset, and the third looks like a control word. Bingo! That's our channel control block.

I'll save you the rest of the searching. I didn't actually find an initialization routine, in the ROM or the BIOS. That's another weird thing about the Apricot: the onboard ROM is mostly just for initializing devices. It loads the BIOS from floppy disk into RAM on boot. The Apricot even has a weird disk format that reserves space for the BIOS. I'm guessing the pointer to the CCB is set up somewhere a bit further away from the actual CA strobe.

The Apricot had 256K originally and mine is a 512K model. The memory map actually reserves quite a lot of space for system data - about 110K. But that does include all of the character graphics data since that's also not in ROM. Unlike the PC, RAM space extends all the way up to E0000h, so instead of a 640K limit you have 896K.

And then I looked back at the Apricot Technical Reference and found they had documented it the whole time. 😆

Testing, Testing...

I got some 8089 code to compile. :D

This is a glitch I did the previous year, but at that time I had hand assembled it and typed the bytes manually into a DEBUG session. This year I'd completed enough of an 8089 assembler that I assembled the code into an OBJ file and linked it with a C file compiled with Turbo C. What does that code look like?

_screen_fuzz:
        lpd ga, [pp].4  ; pp.4 is the input buffer address
        movi ix, 0
loop:   inc [ga+ix+]
        andi ix, 07ffh
        jmp loop

It just loads an address from the parameter block (assuming it's 2KB-aligned), and increments the bytes in that block one at a time, looping infinitely. And the block of memory just happens to be the character pointers/attribute memory at F000:0000. This only does half of it so you can see the difference. But as you can see, the 8089 frobs the memory while the rest of the system still runs.
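For anyone who doesn't read 8089 assembly, here is a rough host-side model of that loop in C. It is only a sketch: it runs for a fixed number of steps instead of forever, and it steps one byte at a time to match the description above, which is an assumption about the operand width. `screen_fuzz_model` and `FUZZ_WINDOW` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define FUZZ_WINDOW 0x0800u  /* 2 KB: the range left by "andi ix, 07ffh" */

/*
 * Model of the _screen_fuzz loop. ga plays the role of the GA register
 * (base address loaded from the parameter block) and ix the index register.
 */
void screen_fuzz_model(unsigned char *ga, unsigned long steps)
{
    unsigned ix = 0;                         /* movi ix, 0 */

    assert(ga != NULL);
    while (steps-- > 0) {
        ga[ix]++;                            /* inc [ga+ix+]: bump the cell... */
        ix = (ix + 1) & (FUZZ_WINDOW - 1);   /* ...advance, then andi ix, 07ffh */
    }
}
```

The mask keeps the index inside a 2 KB window, so the loop walks the same block over and over no matter how long it runs.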

There's a trick here, though. This is running with "bus load limit" enabled. This waits 128 cycles between executing instructions. The Apricot was designed so that the 8089 can hog the bus, so without that, the 8086 would never be able to execute any more code and we'd hang.

One interesting thing about the 8089 is that it treats memory space and I/O space as two more or less equivalent address spaces. It has no problem executing code from I/O space (what it calls "local" space). And if it's doing that, it's not using the main memory bus and it actually can run simultaneously with the 8086. Unfortunately, the Apricot doesn't have a separate I/O bus so that's not possible.

Need For Speed

I've been working on a memcpy benchmark to see how fast the 8089 is at moving bytes, and I have some interesting results! This is copying 32K between two buffers.

The first three here are CPU copies. "memcpy" is the built-in provided by Borland Turbo C 2.0. It's a pretty standard REP MOVSW implementation, which is about as fast as you can go with the 8086 (I think?). The next two are 8- and 16-bit versions of the thing you'd see in a textbook. They're very slow! Don't do that!
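The "textbook" copies are the kind of loop sketched below. This is not the benchmark's actual source, just an illustration of the shape; `copy_bytes` and `copy_words` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook byte-at-a-time copy. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *dst++ = *src++;
}

/* Textbook word-at-a-time copy; n_words counts 16-bit words, not bytes. */
void copy_words(unsigned short *dst, const unsigned short *src, size_t n_words)
{
    assert(dst != NULL && src != NULL);
    while (n_words-- > 0)
        *dst++ = *src++;
}
```

A modern optimizing compiler may well turn these loops into a block move, so they are only slow on a compiler that emits a load, a store, two pointer increments, and a branch per element.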

The last four are executing 8089 programs. First with mem-mem MOV instructions with bytes and words. And second with the XFER instruction.

Time measurement is "ticks", which is DOS hundredths of a second. Ignore the decimal point to pretend it's milliseconds (though I think DOS is using the 50Hz timer which would actually only have a resolution of 20ms). Each test is run ten times and the result is an average of those ten runs.
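The run-ten-times-and-average part can be sketched like this. The timing source here is the standard C `clock()` function, which is an assumption made for the example; the actual harness and how it reads DOS ticks are not shown in this post. `copy_fn`, `copy_memcpy`, and `average_ticks` are illustrative names.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define COPY_SIZE 32768u  /* 32K, as in the benchmark */
#define RUNS      10      /* each test is averaged over ten runs */

typedef void (*copy_fn)(void *dst, const void *src, unsigned n);

/* Wrapper so the library memcpy matches the copy_fn signature. */
void copy_memcpy(void *dst, const void *src, unsigned n)
{
    memcpy(dst, src, n);
}

/* Run fn RUNS times and return the mean elapsed time in clock() ticks. */
double average_ticks(copy_fn fn, void *dst, const void *src, unsigned n)
{
    clock_t total = 0;
    int i;

    assert(fn != NULL);
    for (i = 0; i < RUNS; i++) {
        clock_t start = clock();
        fn(dst, src, n);
        total += clock() - start;
    }
    return (double)total / RUNS;
}
```

Averaging helps when the timer is coarse: with a 20ms resolution, a single 78ms copy could read as 60, 80, or 100, but the mean of ten runs lands closer to the real figure.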

But these results don't seem quite right since XFER should be significantly faster than MOV. And by my understanding, an 8089 MOV should also be slower than REP MOVSW. I kind of wonder if the 8089 is fouling the timer interrupt somehow. But it does seem to be working how you'd expect in that moving a word at a time is twice as fast as moving bytes. Or maybe the manual is full of lies and the 8089 mem-mem MOV is actually blazing fast.

I needed to get some more reliable timing data. And I had just the shenanigans for it - toggle a bit on the parallel port and time it with an oscilloscope. :D

Now that's some SCIENCE.

I just added a loop at the end of the program that runs each copy routine in sequence, toggling a parallel port bit at the beginning and the end. And to keep them separate, I used a different bit (and thus parallel port pin) for each test.
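The marker logic amounts to something like the sketch below. The port write is replaced with a stand-in variable so the example runs anywhere; on the real machine it would be an I/O write (Turbo C's `outportb`, for example) to the parallel port's data register, whose address isn't given here. All the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the parallel port data latch. */
unsigned char port_shadow = 0;

/* Hypothetical port write: replace the body with the real I/O write. */
void write_port(unsigned char value)
{
    port_shadow = value;
}

/* Raise one data bit to mark the start of a test. */
void mark_begin(unsigned bit)
{
    assert(bit < 8);
    write_port((unsigned char)(port_shadow | (1u << bit)));
}

/* Lower the same bit to mark the end of the test. */
void mark_end(unsigned bit)
{
    assert(bit < 8);
    write_port((unsigned char)(port_shadow & ~(1u << bit)));
}

/* Bracket one copy routine with its own marker bit. */
void timed_run(unsigned bit, void (*test)(void))
{
    assert(test != NULL);
    mark_begin(bit);
    test();
    mark_end(bit);
}
```

With one bit per test, each scope channel shows a single pulse whose width is the run time of that copy routine.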

So in order, first we have the REP MOVSW memcpy at 78ms. Dead on!

I skipped the naive ones because they're not really that interesting.

Next is 8-bit mem-mem MOV on the 8089. It's actually pretty slow at 560ms. Then the 16-bit version is about twice as fast at 295ms.

And finally, the 8-bit XFER at 109ms and the 16-bit at 55ms.

Which means the 8089 is about 40% faster at copying words than the 8086 (55ms versus 78ms). Almost 600KB/sec! That also means the timing for the XFER tests was actually accurate. So if you were copying the Apricot's entire screen buffer (40KB) with the 8089, you could manage almost 15 frames per second. 😆
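The arithmetic behind those figures, as a small C sketch using only the measurements above (32K copied in 55ms and 78ms, and a 40KB screen buffer). The function names are made up for the example.

```c
#include <assert.h>

/* Throughput in KB/s (1 KB = 1024 bytes) for `bytes` copied in `ms` milliseconds. */
double kb_per_second(double bytes, double ms)
{
    assert(ms > 0.0);
    return (bytes / 1024.0) / (ms / 1000.0);
}

/* Full-buffer copies per second at a given throughput. */
double frames_per_second(double kb_per_sec, double frame_kb)
{
    assert(frame_kb > 0.0);
    return kb_per_sec / frame_kb;
}
```

`kb_per_second(32768, 55)` comes out at roughly 582 (or about 596 if you count a KB as 1000 bytes), and dividing that by 40 gives roughly 14.5 frames per second. The same calculation for the 78ms CPU copy gives roughly 410, a ratio of about 1.4.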

There are some other interesting conclusions here. First is that the 8086 does get to run concurrently with the 8089, just at a much reduced rate due to bus contention. The measured ticks versus actual suggests it was running maybe 10% of the time? That suggests that using the 8089 a lot will screw up your DOS clock. 😄 Second is that XFER seems to have a low enough priority that timer interrupts still work.

Scanline Shenanigans?

I did some calculations on memory bandwidth versus screen refresh rate and even with the 8089’s help it doesn’t look like it’s fast enough to do scanline effects.

So I experimented with drawing fewer pixels. The horizontal resolution is basically fixed since that’s based on the non-configurable dot clock, but the 6845 CRTC has some things you can do to adjust the vertical resolution. The Apricot runs in interlaced mode (72Hz, incidentally), but you can switch it into a mode where it just draws every line twice. And if you cut the number of lines per character cell in half, you get a chunky half height mode.

Unfortunately, the other thing you can’t change is the character data stride. So this wastes half your character RAM on lines it’s not displaying.

But anyway, here's an attempt.

The screen memory (which points to characters) is being overwritten several times within the 16 scanlines of the character cell. It alternately writes white and black characters in a tight 8089 XFER loop.

It’s not fast enough to do that in one scanline (about 3x too slow in my estimation), so it winds up with this kind of barber pole effect.

asm89

I did publish asm89, my 8089 assembler used here, even though it is not complete. See the page for more details.