Introduction
After you've worked the bugs out, you may want to make your program smaller and faster. This section is dedicated to just that purpose. There are a lot of things you can do; here are some general techniques that help:
Code replacements
xor a vs. ld a,0
A simple way to set a to zero, saving 1 byte and 3 T-states. Don't use this if you want to preserve flags.
or a vs. cp 0
If you only need to test a for zero, sign or parity, or a saves 1 byte and 3 T-states over cp 0. Like cp 0, it always resets the c flag.
dec a vs. cp 1
If you don't need to keep the value, dec a is a smaller and faster way to check whether a (or any other 8-bit register) is 1. 8-bit increments and decrements affect both the z flag and the sign flag, among others, but leave the c flag alone.
inc a vs. cp 255
Again, inc a is a byte smaller and 3 t-states faster. It does not preserve a, but that is often acceptable, and the same trick works on all of the main 8-bit registers and (hl).
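The dec a trick chains nicely when a holds a small selector. The following is only a sketch: the labels Choice1 to Choice3 are hypothetical and are assumed to be within jr range. Each dec a replaces a cp n, saving 1 byte and 3 t-states per test, at the cost of destroying a.

```z80
;a holds 1, 2 or 3 on entry (assumed)
dec a ;was a = 1?
jr z,Choice1
dec a ;was a = 2?
jr z,Choice2
dec a ;was a = 3?
jr z,Choice3
;any other value falls through to here
```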
adc a,0 vs. jr nc,$+3 \ inc a
adc a,0 is 7 t-states and 2 bytes, whereas the latter is 3 bytes and 11 t-states if the c flag is set, 12 if it is reset. That saves a byte and 4 to 5 t-states.
ccf \ adc a,0 vs. jr c,$+3 \ inc a
They are the same size, but the former is always 11 t-states whereas the latter is either 11 or 12 depending on the c flag.
sbc a,0 vs. jr nc,$+3 \ dec a
See adc a,0 vs. jr nc,$+3 \ inc a.
ccf \ sbc a,0 vs. jr c,$+3 \ dec a
See ccf \ adc a,0 vs. jr c,$+3 \ inc a.
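A common place for adc a,0 is carrying an 8-bit addition into a high byte. As a minimal sketch, assuming you want to add the unsigned value in a to the 16-bit value in de (the choice of registers is only for illustration):

```z80
add a,e ;low byte, c flag set if the sum passed 255
ld e,a
ld a,d
adc a,0 ;fold the carry into the high byte without a branch
ld d,a ;de = de + original a, a is destroyed
```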
scf \ ccf
This is used to reset the c flag, but there are many other ways to do that. This is 8 t-states, 2 bytes, but the following are 1 byte, 4 t-states:
or a ;z flag is set if a = 0
and a ;z flag is set if a=0
xor a ;always sets the z flag, sets A=0
cp a ;always sets the z flag.
sub a ;always sets the z flag, sets A=0
The following also reset the c flag, but they are 2 bytes and 7 t-states each, so you should not use them:
sub 0
add a,0
cp 0
In each of these cases, other flags are also modified.
Cursor/pen
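ld hl,$0100 ;h = $01, l = $00
ld (curRow),hl ;l goes to curRow (row 0), h goes to curCol (column 1)
ld (penCol),hl ;l goes to penCol (column 0), h goes to penRow (row 1)
This is much more efficient if you're going to change both coordinates of the cursor or pen. Because curCol is right after curRow (and penRow is right after penCol), you can use a 16-bit register to load both at once. The low byte is stored at the first address and the high byte at the second, so the row and column swap places between the cursor and the pen.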
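To make the byte order concrete, here is a small sketch that packs a row and a column into one constant. The positions (text row 3, column 5; pen row 10, column 20) are arbitrary examples, and it assumes your assembler accepts constant expressions such as 5*256+3.

```z80
ld hl,5*256+3 ;h = 5 (curCol), l = 3 (curRow)
ld (curRow),hl ;text cursor at row 3, column 5
ld hl,10*256+20 ;h = 10 (penRow), l = 20 (penCol)
ld (penCol),hl ;graph pen at row 10, column 20
```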
PutS
Something you may or may not know is that PutS and its variations leave HL pointing to the byte after the null terminator. This is very useful when displaying several strings at different locations on the screen, because you don't have to load each string's address into hl.
ld hl,txtTest
bcall(_PutS)
ld de,$0100
ld (curRow),de
ld hl,txtTest2
bcall(_PutS)
;...
txtTest:
.db "Test",0
txtTest2:
.db "Test2",0
can be shortened to:
ld hl,txtTest
bcall(_PutS)
ld de,$0100
ld (curRow),de
;we don't need "ld hl,txtTest2", because hl already points to txtTest2
bcall(_PutS)
;...
txtTest:
.db "Test",0
;txtTest2 ;Optional, doesn't affect speed or size here
.db "Test2",0
It also lets you display strings in a loop, say, for a high score board. In this example each entry is a score byte followed by a null-terminated string.
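high:
ld b,8 ;eight entries
ld de,0
ld (curRow),de ;start at row 0, column 0
ld hl,txtHigh
highloop:
push hl
push de
ld a,(hl) ;a = score byte stored in front of the string
ld h,0
ld l,a ;hl = score
bcall(_DispHL)
pop de
pop hl
inc hl ;skip the score byte, hl now points to the string
bcall(_PutS) ;hl ends up at the next entry's score byte
inc e ;next row
ld d,0 ;column 0
ld (curRow),de
djnz highloop
bcall(_GetKey)
ret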
txtHigh:
.db 20,"HIGH SCORE!",0
txt2nd:
.db 19,"HIGH SCORE!",0
txt3rd:
.db 18,"HIGH SCORE!",0
txt4th:
.db 17,"HIGH SCORE!",0
txt5th:
.db 16,"HIGH SCORE!",0
txt6th:
.db 15,"HIGH SCORE!",0
txt7th:
.db 14,"HIGH SCORE!",0
txt8th:
.db 13,"HIGH SCORE!",0
Optimised Code Snippets
Test For 0 (8-bits)
For any 8-bit register, you can use the following:
inc [reg8]
dec [reg8]
This sets the z flag if the register is 0 and resets it otherwise. It takes 8 t-states and 2 bytes, and it leaves the register and the c flag unchanged.
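One place this is handy is guarding a djnz loop, since djnz with b = 0 would run the body 256 times. A minimal sketch, with a hypothetical label name and a placeholder loop body:

```z80
inc b
dec b ;z set if b = 0, a and the c flag untouched
ret z ;nothing to do for a count of zero
countloop:
;...loop body...
djnz countloop
```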
Set A=0
ld a,0 is 2 bytes, 7 t-states, the following are 1-byte and 4 t-states:
xor a
sub a
Note that these will change flags, but usually that is okay.
16-bit CP
To compare HL to another 16-bit register, you can do the following:
or a
sbc hl,[reg16]
add hl,[reg16]
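The or a is simply there to reset the c flag, so if the c flag is already reset at this point, leave it out and save a byte and 4 t-states. Afterwards HL is unchanged, the z flag is set if HL equals the other register, and the c flag is set if HL is smaller (unsigned). The speed here is 4+15+11 = 30 t-states and it is 4 bytes total.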
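As a sketch of how the result might be used, the following branches three ways on an unsigned comparison of hl and de. The labels IsEqual and IsLess are hypothetical. It works because add hl,de leaves the z flag from the sbc alone and produces a carry exactly when the subtraction borrowed.

```z80
or a ;reset the c flag
sbc hl,de ;z if hl = de, c if hl < de
add hl,de ;restore hl, z kept, c set again only if sbc borrowed
jr z,IsEqual
jr c,IsLess
;falls through when hl > de
```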
Conditionally Set or Reset A
In some cases, you need to set all of the bits in A or reset all of them based on a flag. If you are using the c flag:
sbc a,a
It takes just 1 byte and 4 t-states. If the c flag is set, A becomes 255, otherwise A becomes 0, and the c flag itself is preserved.
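A typical use is sign extension. The following sketch, which assumes a holds a signed 8-bit value, widens it to 16 bits in hl:

```z80
ld l,a ;low byte is the value itself
add a,a ;shift the sign bit into the c flag
sbc a,a ;a = $FF if negative, $00 otherwise
ld h,a ;hl = sign-extended value
```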
16-bit NEG
To get the negative (additive inverse) of a 16-bit register, the following 6-byte, 24 t-state routine can be used. It destroys A:
xor a
sub [LSBreg16]
ld [LSBreg16],a
sbc a,a
sub [MSBreg16]
ld [MSBreg16],a
For example, to negate HL:
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
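Building on that, here is a rough sketch of a 16-bit absolute value, assuming hl holds a signed value. The label AbsDone is hypothetical. Note that $8000 has no positive counterpart in 16 bits and comes out unchanged.

```z80
bit 7,h ;negative?
jr z,AbsDone ;no, already the absolute value
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a ;hl = -hl
AbsDone:
```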
ld hl,(hl)
Often we want to use indirection when using a lookup table of addresses. For example, say you have a look-up table for strings:
LUT:
.dw String1
.dw String2
.dw String3
.dw String4
String1: .db "String1",0
String2: .db "String2",0
String3: .db "String3",0
String4: .db "String4",0
And say you wanted to load the address of one of the strings into HL. Assuming HL already points to the corresponding entry in the LUT:
ld e,(hl)
inc hl
ld d,(hl)
ex de,hl
That is 4 bytes, 24 t-states, but it destroys DE. The following is the same size and speed, destroying A:
ld a,(hl)
inc hl
ld h,(hl)
ld l,a
In the case that you need extreme speed or size optimisations, the following also does the trick, but has a few drawbacks:
ld sp,hl
pop hl
At just 2 bytes and 16 t-states that is pretty optimised, but it destroys the stack pointer, which is a crucial element of most routines. In general, you would need to save the stack pointer somewhere and later restore it, at a total cost of 40 t-states and 8 bytes, and your routine wouldn't be able to use the stack in between. Interrupts also have to be disabled during that time, because an interrupt pushes its return address through SP. Each use saves 8 t-states and 2 bytes over the routines above, so you would need to use this version of indirection at least 6 times to get a speed saving and 5 times for a size saving, at the cost of 2 bytes of RAM.
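The save and restore could look roughly like the sketch below. SavedSP is a hypothetical label for two free bytes of RAM, and the final ei assumes interrupts were enabled on entry; adjust both to your program.

```z80
di ;an interrupt would push its return address through sp
ld (SavedSP),sp ;20 t-states, 4 bytes
ld hl,LUT
ld sp,hl
pop hl ;hl = String1, sp now points to the String2 entry
;...use hl here, without push, pop, call or bcall...
pop hl ;hl = String2
;...
ld sp,(SavedSP) ;20 t-states, 4 bytes, the real stack is back
ei
```

Because pop advances sp by two, consecutive entries can be read with one pop each, which is where the savings add up.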
Optimized 'ld a,Y \ jr nc,$+4 \ ld a,X'
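The code above loads X into a if the c flag is set and Y otherwise (6 bytes, 19cc or 21cc). Unlike it, the replacements below modify the flags, but they are smaller and faster: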
In the case that Y==0: (3 bytes, 11cc)
sbc a,a \ and X
In the case that X==0: (4 bytes, 15cc)
ccf \ sbc a,a \ and Y
When neither X==0, nor Y==0: (5 bytes, 18cc)
sbc a,a \ and X^Y \ xor Y
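To see why the general form works, here is a worked sketch with two example constants. The names SignNeg and SignPos and their values are made up for illustration, and ^ is assumed to be your assembler's XOR operator.

```z80
SignNeg .equ $2D ;X, wanted when the c flag is set
SignPos .equ $2B ;Y, wanted when the c flag is reset
sbc a,a ;a = $FF if c, $00 if nc
and SignNeg^SignPos ;a = $06 if c, $00 if nc
xor SignPos ;a = $06 xor $2B = $2D if c, $2B if nc
```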
Conclusion
From this point on, you may be perfectly happy with your program. It works, runs at a decent speed and is also smaller than it used to be. What more could there be to do? Read on to find out what else you need to do before you decide to release your program to the general public.