The Z80 has a nice (for programmers) instruction to copy a block of memory (along with similar types of operations), and it’s easy to remember how to use the magic that is LDIR, which stands for load, increment, repeat. Think of BC as standing for byte count and DE as the start of the word destination, which leaves HL, typically used as an address pointer, as the source.
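For anyone who hasn't met LDIR, here is a rough Python model of what it does with those three register pairs. It is only a sketch of the behaviour, not of the timing, and the function name `ldir` and the addresses 0x8000 and 0x9000 are made up for the example.

```python
def ldir(memory, hl, de, bc):
    """Rough model of Z80 LDIR: copy BC bytes from (HL) to (DE)."""
    while True:
        memory[de] = memory[hl]   # load: copy one byte from source to destination
        hl = (hl + 1) & 0xFFFF    # increment the source pointer
        de = (de + 1) & 0xFFFF    # increment the destination pointer
        bc = (bc - 1) & 0xFFFF    # count down the bytes remaining
        if bc == 0:               # repeat until the byte count reaches zero
            break
    return hl, de, bc


# Illustrative use: the addresses are arbitrary, chosen only for this example.
memory = bytearray(0x10000)       # 64K address space
memory[0x8000:0x8004] = b"ABCD"
hl, de, bc = ldir(memory, hl=0x8000, de=0x9000, bc=4)
print(bytes(memory[0x9000:0x9004]))  # b'ABCD'
```

After the copy HL and DE both point just past their blocks and BC is zero, which is handy when the next copy follows straight on.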
As mentioned once before, I wrote a third-height 50 FPS (UK television refresh speed, also UK mains frequency) smooth scrolling text routine on a 3.5 MHz Sinclair ZX Spectrum, moving a message across the lower third of the screen. LDIR was an obvious first choice. It was the wrong one.
LDIR takes 21 “ticks” per byte copied (16 on the last byte) and the memory block was 2K (2,048 bytes), so the copy costs around 43,000 ticks. At 3.5 MHz and 50 frames a second there are 70,000 ticks per frame, but by coincidence only around 43,000 of them pass before the screen starts drawing the lower third of the ZX Spectrum display. We need more time in order to put new graphics on screen.
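The sums are small enough to check by hand, but here they are as a short Python sketch. The constants are the figures quoted above, and the variable names are mine.

```python
CLOCK_HZ = 3_500_000        # 3.5 MHz Z80
FRAMES_PER_SECOND = 50      # UK television refresh
BLOCK_BYTES = 2048          # the 2K block being copied
LDIR_TICKS_REPEAT = 21      # per byte while LDIR is still repeating
LDIR_TICKS_LAST = 16        # the final byte, with no repeat

ticks_per_frame = CLOCK_HZ // FRAMES_PER_SECOND
ldir_ticks = (BLOCK_BYTES - 1) * LDIR_TICKS_REPEAT + LDIR_TICKS_LAST

print(ticks_per_frame)                       # 70000
print(ldir_ticks)                            # 43003
print(f"{ldir_ticks / ticks_per_frame:.0%}")  # 61%
```

So the copy alone eats roughly three fifths of a frame, before a single new pixel has been worked out.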
We gain time by reducing the number of checks the Z80 does to see if it has reached the end. The Z80 also has an instruction LDI (load and increment) which is what we use. It takes 16 ticks for one byte (sound familiar?) but doesn’t repeat. We could just string 2,048 of these together, but with each LDI instruction taking 2 bytes of memory that is 4K of code to move 2K of data (the classic speed versus size argument, again), so we don’t do that either.
If we use 16 LDI instructions in a row we gain 80 ticks over LDIR, though to loop around these (a conditional jump instruction) costs us 12 ticks. The 5 ticks LDIR spends repeating each byte are already counted in that 80, so the net gain is 68 ticks per 16 bytes. Do that for 2,048 bytes (128 sets of 16) and we save around 8,700 ticks (about 20%), and we have time to work out what to draw after we’ve shifted the screen memory sideways.
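To see how the trade-off moves with the number of LDIs in a row, here is a small Python sketch of the cost model. It uses the tick counts quoted in the text and makes two simplifying assumptions: every pass pays the full 12-tick jump, and LDIR's cheaper final byte is ignored. The function name `unrolled` is mine.

```python
BLOCK_BYTES = 2048   # size of the block being shifted
LDIR_TICKS = 21      # per byte while LDIR repeats
LDI_TICKS = 16       # per LDI instruction
LDI_SIZE = 2         # bytes of code per LDI
JUMP_TICKS = 12      # loop-back jump, figure taken from the text


def unrolled(ldis_per_pass):
    """Return (total ticks, bytes of LDI code) for a given unroll factor."""
    passes = BLOCK_BYTES // ldis_per_pass
    # Assumption: every pass pays the full jump, including the last one.
    ticks = passes * (ldis_per_pass * LDI_TICKS + JUMP_TICKS)
    return ticks, ldis_per_pass * LDI_SIZE


# Assumption: LDIR's cheaper final byte (16 ticks) is ignored.
ldir_ticks = BLOCK_BYTES * LDIR_TICKS

for n in (1, 4, 16, 64):
    ticks, code_bytes = unrolled(n)
    # Columns: LDIs per pass, bytes of LDI code, total ticks, ticks saved
    print(n, code_bytes, ticks, ldir_ticks - ticks)
# The row for 16 LDIs prints: 16 32 34304 8704
```

With these figures a single LDI per loop is actually slower than LDIR, and going from 16 to 64 LDIs quadruples the code for only about another 1,150 ticks, so 16 sits in a comfortable spot.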
Good old 80’s high performance computer programming – you wrote all the actual instructions for the processor yourself, you took into account the speed of every instruction, the size of every instruction, and you also had to know the hardware of the machine incredibly well so you could decide the optimum trade-offs. You also learned to save all your work before trying ANYTHING out.