This is an "internal" call, i.e. a little routine that resides in a bigger one. My question is whether a short back jump to R0 is better than a long forward jump to R1. I have sometimes read here that backwards jumps are predicted as "taken" in loops - does that mean R0 is better if the condition is more frequently true than not?
R0: retn 0
abc: bsf ecx, edx ; bit scan forward
je R0
; je R1 ; better?
inc eax
btr edx, ecx ; clear that bit
lea ecx, [edi+ecx-31]
mov [ebp+8*eax], ecx
cmp byte ptr [ecx-2], 13
mov byte ptr [ecx-2], 0 ; mov does not change flags
je abc
dec eax
jmp abc
R1: retn 0
Short of testing, it stands to reason that a short jump would be faster than a long jump because it's less distance your code has to jump in memory. The only case I can think of where this would be untrue is if you're using some kind of code emulator or similar, where the memory for the code isn't all mapped up front and a block is only mapped when you first try to use it. In that case a forward jump forces the block to be mapped first, so it takes more time than a backward jump, short or long, into memory that is already mapped.
hiyas Jochen
i would think the distance is of little concern - they will both be short, i think
as for direction, it depends on what is most likely to occur in the BSF instruction (BSF sets ZF when the source is 0, so the JE is taken when EDX is 0)
if the EDX register is more likely to be 0 than non-zero, jump backward
probably splitting hairs, either way :bg
Thanks, folks. The R1 jump would be long - I omitted a couple of instructions in the example.
oops - i guess i underestimated the code size - lol (btw - you mean NEAR, right ?)
the jmp at the end bugs me a little
i am guessing that [ecx-2] is usually not 13
if that's the case, you might do this...
R0: retn 0
abb: dec eax
abc: bsf ecx, edx ; bit scan forward
je R0
inc eax
btr edx, ecx ; clear that bit
lea ecx, [edi+ecx-31]
mov [ebp+8*eax], ecx
cmp byte ptr [ecx-2], 13
mov byte ptr [ecx-2], 0 ; mov does not change flags
jne abb
jmp abc
the CALL still references "abc"
JJ,
Over time I have played with the distinction between short and near jumps, the latter being a larger instruction, and have found almost no difference; if there is any difference, the longer instruction is faster by some tiny amount. Such a test is highly dependent on the sequence of instructions it occurs within. I would tend to use the longer version, as the minor code size change simply does not matter.
Code byte size assumptions come from the DOS days and very rarely affect code speed, whereas instruction choice, when scheduled correctly, does show some improvement in some circumstances.
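For reference, a rough sketch of the two encodings being compared, assuming 32-bit code and MASM syntax. The labels near_by and far_off are made up for illustration, and the assembler normally picks the size by itself; the explicit SHORT / NEAR PTR overrides are only there to make the choice visible.

```asm
; Hypothetical labels; byte counts are for 32-bit code.
        je      short near_by       ; 74 rel8      = 2 bytes, reaches -128..+127 from the next instruction
        je      near ptr far_off    ; 0F 84 rel32  = 6 bytes
        jmp     short near_by       ; EB rel8      = 2 bytes
        jmp     near ptr far_off    ; E9 rel32     = 5 bytes
```

So if R1 really is out of short range, switching the JE from R0 to R1 costs four extra bytes of code. Whether that shows up in timings is a separate question.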
one thing i have noticed - at least on the P4
labels that are the target of NEAR branches want to be 4-aligned
SHORT branches don't seem to care about target alignment
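A minimal sketch of what that alignment could look like in the routine from the first post, assuming MASM's ALIGN directive and a code segment that is itself aligned to at least 4 bytes. Whether it actually helps is something to time on the target CPU.

```asm
R0:     retn 0
        align 4                 ; pad so the next label starts on a 4-byte boundary
abc:    bsf ecx, edx            ; entry point and branch target
        je R0
        ; ... rest of the loop as posted ...
```

The filler bytes land right after the RETN, so they are never executed.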
Why not faster...
bsf ecx, edx
je R0
@@:
mov [eax], ecx ; eax is assumed to already hold the slot address (eax=ebp+eax); the stored dword is the bit index, so add edi-31 when reading it back
btr edx, ecx
cmp byte ptr [edi+ecx-31-2], 13
mov byte ptr [edi+ecx-31-2], 0
lea ecx, [eax+8]
cmovz eax, ecx
bsf ecx, edx
jne @b
R0:
ret
As long as your loop fits in the CPU loop buffer, it shouldn't matter either way. The instructions stay 'predecoded' inside the CPU, so as long as you stay below the max (~64 aligned bytes, i think?), the size of the instruction in memory only affects the first iteration through.
-r