8089 Exploration
At the beginning of April, I started Aprilcot and continued my
exploration of the 8089 IOP. I had written a lot of code for
asm89, my 8089 assembler, but I hadn't done a lot of
testing. It was time to change that.
Getting Oriented
The Intel 8089 IO Processor is configured through a series of "control
blocks" that exist somewhere in memory. Where? Uh... I figured this out
last year and I did not write it down. 😅 How do we figure it out? Let's
look at the 8086 Family User's
Manual.
So the first place to look is the top of ROM space, where the reset
vector lives. Right next to that is the "system configuration pointer",
which defines the bit width of the system and points to a second thing
that points to the channel control block, which points to the individual
task blocks and ultimately the programs being executed. So let's pop
open DEBUG and see what's there.
Oh, huh. The offset and segment are all zeroes. Well, fat chance of that
still being there after initialization is finished. That's interrupt
vector space. At this point things were beginning to feel familiar.
What else can we do? Well, we also know from this diagram that this is
only read after the first "CA" ("Channel Attention"). CA is an input
line to the 8089 that tells it to check its control block. This is
controlled directly by the 8086 on I/O port 70h. I can't find it on
the schematic right now, but I'm pretty sure the data is irrelevant and
the port acts merely as a strobe. So what we can do is search the ROM
for an OUT 70h, AL/AX instruction and we should find it.
First I assembled both versions of the instruction to get the bytes for
it: E6 70 and E7 70. Then I searched for those bytes in ROM space from
FC00:0 to FC00:4000 (I actually did more because I didn't want to do
the math but whatever). I found four locations.
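If you have a ROM dump on a modern machine, the same search is a few lines of Python. This is only a sketch: the `sample` bytes are made up for illustration and `find_out_70h` is a hypothetical helper name, not something from DEBUG or the Apricot tools.

```python
def find_out_70h(rom, base_offset=0):
    """Return (offset, mnemonic) for every OUT 70h,AL / OUT 70h,AX byte pair."""
    patterns = {b"\xE6\x70": "OUT 70h,AL", b"\xE7\x70": "OUT 70h,AX"}
    hits = []
    for needle, name in patterns.items():
        pos = rom.find(needle)
        while pos != -1:
            hits.append((base_offset + pos, name))
            pos = rom.find(needle, pos + 1)
    return sorted(hits)

# Made-up bytes standing in for a real ROM dump.
sample = bytes([0x90, 0xE6, 0x70, 0x90, 0x90, 0xE7, 0x70, 0xE6, 0x71])
for offset, name in find_out_70h(sample):
    print("FC00:%04X  %s" % (offset, name))
```

A raw byte search also matches data and operand bytes that happen to look like the opcode, so every hit still has to be disassembled in context.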
Disassembling the last one, we can see our OUT 70,AL preceded directly
by three words written starting at 0000:0500. One of them is just
copying the code segment register, the second is probably an offset, and
the third looks like a control word. Bingo! That's our channel control
block.
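For anyone rusty on segment:offset addresses, here is a quick sketch of where these land in the 1MB physical space. `physical` is just a name I picked for the example, and it skips the 20-bit wraparound the real 8086 does.

```python
def physical(segment, offset):
    """8086 real-mode address: segment * 16 + offset (no 20-bit wrap applied)."""
    return (segment << 4) + offset

print(hex(physical(0x0000, 0x0500)))   # where the three words get written
print(hex(physical(0xFC00, 0x0000)))   # first byte of the ROM search range
print(hex(physical(0xFC00, 0x3FFF)))   # last byte before FC00:4000
```

So FC00:0 through FC00:3FFF is the top 16K of the address space, ending at FFFFFh, and 0000:0500 sits low in RAM at physical 500h.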
I'll save you the rest of the search: I didn't actually find an
initialization routine in either the ROM or the BIOS. That's another weird
thing about the Apricot - the on-board ROM is mostly just for
initializing devices. It loads the BIOS from floppy disk into RAM on
boot. The Apricot even has a weird disk format that reserves space for
the BIOS. I'm guessing the pointer to the CCB is set up somewhere a bit
further away from the actual CA strobe.
The Apricot had 256K originally and mine is a 512K model. The memory map
actually reserves quite a lot of space for system data - about 110K. But
that does include all of the character graphics data since that's also
not in ROM. Unlike the PC, RAM space extends all the way up to E0000h,
so instead of a 640K limit you have 896K.
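The limits are just the top-of-RAM address divided by 1024. A tiny check, using the well-known A0000h boundary for the PC and the E0000h figure above for the Apricot (`kb_below` is a made-up helper name):

```python
def kb_below(address):
    """Kilobytes of address space below the given physical address."""
    return address // 1024

print(kb_below(0xA0000))   # IBM PC conventional memory limit
print(kb_below(0xE0000))   # Apricot RAM limit
```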
And then I looked back at the Apricot Technical Reference and found
they had documented it the whole time. 😆
Testing, Testing...
I got some 8089 code to compile. :D
This is a glitch I did the previous year, but at that time I had hand
assembled it and typed the bytes manually into a DEBUG session. This
year I'd completed enough of an 8089 assembler that I assembled the code
into an OBJ file and linked it with a C file compiled with Turbo C. What
does that code look like?
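_screen_fuzz:
        lpd     ga, [pp].4      ; pp.4 is the input buffer address
        movi    ix, 0           ; start at the beginning of the block
loop:   inc     [ga+ix+]        ; bump memory at ga+ix, then auto-increment ix
        andi    ix, 07ffh       ; wrap ix so it stays within 2KB
        jmp     loop            ; and go around forever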
It just loads an address from the parameter block (assuming it's
2KB-aligned), and increments the bytes in that block one at a time,
looping infinitely. And the block of memory just happens to be the
character pointers/attribute memory at F000:0000. This only does half
of it so you can see the difference. But as you can see, the 8089 frobs
the memory while the rest of the system still runs.
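If the addressing mode is hard to picture, here is a rough Python model of what the loop does to the block. It follows the description above (one byte per step) and is only a sketch: `screen_fuzz_steps` is a hypothetical name, and it stops after a fixed number of steps where the real program never stops.

```python
BLOCK_SIZE = 0x800   # 2KB, matching the 07ffh mask

def screen_fuzz_steps(block, steps):
    """Run `steps` iterations of the increment loop on a 2KB block, in place."""
    ix = 0
    for _ in range(steps):
        block[ix] = (block[ix] + 1) & 0xFF   # inc [ga+ix+]: bump the byte...
        ix += 1                              # ...then post-increment ix
        ix &= BLOCK_SIZE - 1                 # andi ix, 07ffh: wrap at 2KB

block = bytearray(BLOCK_SIZE)
screen_fuzz_steps(block, BLOCK_SIZE + 3)     # one full pass plus three more bytes
print(block[0], block[3], block[-1])
```

After one full pass plus three steps, the first three bytes have been bumped twice and everything else once, which is why the screen keeps churning instead of settling.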
There's a trick here, though. This is running with "bus load limit"
enabled, which makes the 8089 wait 128 cycles between instructions. The
Apricot was designed so that the 8089 can hog the bus, so without that,
the 8086 would never be able to execute any more code and we'd hang.
One interesting thing about the 8089 is that it treats memory space and
I/O space as two more or less equivalent address spaces. It has no
problem executing code from I/O space (what it calls "local" space). And
if it's doing that, it's not using the main memory bus and it actually
can run simultaneously with the 8086. Unfortunately, the Apricot doesn't
have a separate I/O bus so that's not possible.
Need For Speed
I've been working on a memcpy benchmark to see how fast the 8089 is at
moving bytes, and I have some interesting results! This is copying 32K
between two buffers.
The first three here are CPU copies. "memcpy" is the built-in provided
by Borland Turbo C 2.0. It's a pretty standard REP MOVSW
implementation, which is about as fast as you can go with the 8086 (I
think?). The next two are 8- and 16-bit versions of the thing you'd see
in a textbook. They're very slow! Don't do that!
The last four are executing 8089 programs. First with mem-mem MOV
instructions with bytes and words. And second with the XFER instruction.
Time measurement is "ticks", which is DOS hundredths of a second. Ignore
the decimal point to pretend it's milliseconds (though I think DOS is
using the 50Hz timer which would actually only have a resolution of
20ms). Each test is run ten times and the result is an average of those
ten runs.
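The conversion and averaging amount to something like the sketch below. The sample values are made up purely to show the arithmetic (they are not my measurements), and the 50Hz timer rate is the guess from above, not a confirmed figure.

```python
TIMER_HZ = 50   # assumed timer rate, per the guess in the text

def average_ms(tick_samples):
    """Average per-run durations given in hundredths of a second; result in ms."""
    return sum(tick_samples) * 10.0 / len(tick_samples)

def resolution_ms(timer_hz):
    """Smallest time step a timer running at timer_hz can resolve, in ms."""
    return 1000.0 / timer_hz

runs = [4, 6, 4, 4, 6, 4, 4, 4, 6, 4]   # made-up tick counts for ten runs
print(average_ms(runs), resolution_ms(TIMER_HZ))
```

Averaging ten runs is what gets you a number finer than the 20ms step, since individual runs can only land on multiples of it.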
But these results don't seem quite right since XFER should be
significantly faster than MOV. And by my understanding, an 8089 MOV
should also be slower than REP MOVSW. I kind of wonder if the 8089
is fouling the timer interrupt somehow. But it does seem to be working
how you'd expect in that moving a word at a time is twice as fast as
moving bytes. Or maybe the manual is full of lies and the 8089 mem-mem
MOV is actually blazing fast.
I needed to get some more reliable timing data. And I had just the
shenanigans for it - toggle a bit on the parallel port and time it with
an oscilloscope. :D
Now that's some SCIENCE.
I just added a loop at the end of the program that runs each copy
routine in sequence, toggling a parallel port bit at the beginning and
the end. And to keep them separate, I used a different bit (and thus
parallel port pin) for each test.
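The marker scheme looks roughly like this. It is a Python model, not the actual DOS code: `write_port` stands in for whatever writes the parallel port data register, and here it just appends to a list so the sequence can be inspected.

```python
def run_with_markers(tests, write_port):
    """Run each test with its own port bit held high for the duration."""
    for bit, test in enumerate(tests):
        write_port(1 << bit)   # raise this test's pin
        test()                 # the copy routine being timed
        write_port(0)          # drop it again

trace = []
placeholder_tests = [lambda: None, lambda: None, lambda: None]
run_with_markers(placeholder_tests, trace.append)
print(trace)
```

Each test shows up as a pulse on its own pin, so the pulse width on the scope is the run time and the traces never overlap.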
So in order, first we have the REP MOVSW memcpy at 78ms. Dead on!

I skipped the naive ones because they're not really that interesting.
Next is 8-bit mem-mem MOV on the 8089. It's actually pretty slow at
560ms. Then the 16-bit version is about twice as fast at 295ms.
And finally, the 8-bit XFER at 109ms and the 16-bit at 55ms.
Which means the 8089 is about 40% faster at copying words than the 8086
(55ms versus 78ms). Almost 600KB/sec! That also means the timing for the
XFER tests was actually accurate. So if you were copying the Apricot's entire screen
buffer (40KB) with the 8089, you could manage almost 15 frames per
second. 😆
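Here's a quick check of that arithmetic, using only the scope timings and buffer sizes quoted above. The function names are made up for the example.

```python
BUFFER_KB = 32      # size of each benchmark copy, from the text
SCREEN_KB = 40      # Apricot screen buffer size, from the text

def kb_per_second(kilobytes, milliseconds):
    """Throughput in KB/sec for a copy of the given size and duration."""
    return kilobytes / (milliseconds / 1000.0)

def frames_per_second(frame_kb, rate_kb_per_s):
    """How many frame_kb-sized copies fit into one second at the given rate."""
    return rate_kb_per_s / frame_kb

cpu_ms = 78         # scope timing for the CPU memcpy
xfer16_ms = 55      # scope timing for the 16-bit XFER

rate = kb_per_second(BUFFER_KB, xfer16_ms)
print(round(rate))                                     # KB/sec for 16-bit XFER
print(round(frames_per_second(SCREEN_KB, rate), 1))    # full-screen copies per second
print(round(cpu_ms / xfer16_ms, 2))                    # 8089 XFER speed relative to the CPU
```

That works out to roughly 580KB/sec counting a KB as 1024 bytes (about 596,000 bytes per second), around 14.5 full-screen copies per second, and a speed ratio of about 1.42 over the CPU copy.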
There are some other interesting conclusions here. First is that the
8086 does get to run concurrently with the 8089, just at a much reduced
rate due to bus contention. The measured ticks versus actual suggests it
was running maybe 10% of the time? That suggests that using the 8089 a
lot will screw up your DOS clock. 😄 Second is that XFER seems to have a
low enough priority that timer interrupts still work.
Scanline Shenanigans?
I did some calculations on memory bandwidth versus screen refresh rate
and even with the 8089’s help it doesn’t look like it’s fast enough to
do scanline effects.
So I experimented with drawing fewer pixels. The horizontal resolution
is basically fixed since that’s based on the non-configurable dot clock,
but the 6845 CRTC has some things you can do to adjust the vertical
resolution. The Apricot runs in interlaced mode (72Hz, incidentally),
but you can switch it into a mode where it just draws every line twice.
And if you cut the number of lines per character cell in half, you get a
chunky half height mode.
Unfortunately, the other thing you can’t change is the character data
stride. So this wastes half your character RAM on lines it’s not
displaying.
But anyway, here's an attempt.
The screen memory (which points to characters) is being overwritten
several times within the 16 scanlines of the character cell. It
alternately writes white and black characters in a tight 8089 XFER
loop.
It’s not fast enough to do that in one scanline (about 3x too slow in my
estimation), so it winds up with this kind of barber pole effect.
asm89
I did publish asm89, my 8089 assembler used here, even
though it is not complete. See the page for more details.