Question on loop design

jj2007 · July 09, 2010, 10:12:55 PM

This is an "internal" call, i.e. a little routine that resides in a bigger one. My question is whether a short back jump to R0 is better than a long forward jump to R1. I have sometimes read here that backwards jumps are predicted as "taken" in loops - does that mean R0 is better if the condition is more frequently true than not?

Quote
R0:   retn 0
abc:   bsf ecx, edx   ; bit scan forward
   je R0
   ; je R1 ; better?
   inc eax
   btr edx, ecx   ; clear that bit
   lea ecx, [edi+ecx-31]
   mov [ebp+8*eax], ecx
   cmp byte ptr [ecx-2], 13
   mov byte ptr [ecx-2], 0   ; mov does not change flags
   je abc
   dec eax
   jmp abc
R1:   retn 0
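
For readers who want to check what the loop computes independently of the branch layout, here is a minimal C model of it. It is only a sketch under stated assumptions: `scan_block`, `lowest_bit`, and the parameter names are hypothetical, `mask` stands in for edx, `base` for edi, `count` for eax, and `slots` for the table at [ebp+8*eax] (only the first dword of each 8-byte slot is modelled). The instructions omitted from the example are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the lowest set bit; mask must be non-zero (BSF with ZF clear). */
unsigned lowest_bit(uint32_t mask)
{
    unsigned i = 0;
    assert(mask != 0);
    while (((mask >> i) & 1u) == 0)
        i++;
    return i;
}

/* Hypothetical model of the abc loop; returns the updated count (eax). */
size_t scan_block(uint32_t mask, char *base, char **slots, size_t count)
{
    while (mask != 0) {                            /* bsf ecx, edx / je R0 */
        unsigned bit = lowest_bit(mask);
        mask &= ~((uint32_t)1 << bit);             /* btr edx, ecx */
        char *p = base + ((ptrdiff_t)bit - 31);    /* lea ecx, [edi+ecx-31] */

        count++;                                   /* inc eax */
        slots[count] = p;                          /* mov [ebp+8*eax], ecx */

        int was_cr = (p[-2] == 13);                /* cmp byte ptr [ecx-2], 13 */
        p[-2] = 0;                                 /* mov byte ptr [ecx-2], 0 */
        if (!was_cr)
            count--;                               /* dec eax: slot is reused */
    }
    return count;
}
```

In this model a slot is kept only when the byte two positions before the stored address was 13; otherwise the count is rolled back and the next hit overwrites the same slot. Whether the exit is written as `je R0` or `je R1` does not change the result, only the code layout.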

ecube · July 09, 2010, 10:28:07 PM

Short of testing, it stands to reason that a short jump would be faster than a long jump because there is less distance for your code to jump in memory. The only case I can think of where this would be untrue is if you're using some kind of code emulator or similar, where the memory for the code isn't all mapped up front and a block of code is only mapped when you attempt to use it. In that case a forward jump forces the target to be mapped first, so it takes more time than a backward jump, short or long, to already mapped memory.

dedndave · July 09, 2010, 10:30:53 PM

hiyas Jochen
i would think the distance is of little concern - they will both be short, i think
as for direction, it depends on what is most often likely to occur in the BSF instruction
if the EDX register is more likely to be 0 than any other value, jump backward
probably splitting hairs, either way :bg

jj2007 · July 09, 2010, 10:44:43 PM

Thanks, folks. The R1 jump would be long - I omitted a couple of instructions in the example.

dedndave · July 09, 2010, 11:16:05 PM

oops - i guess i underestimated the code size - lol (btw - you mean NEAR, right ?)

the jmp at the end bugs me a little
i am guessing that [ecx-2] is usually not 13
if that's the case, you might do this...

R0:   retn 0
abb:   dec eax
abc:   bsf ecx, edx   ; bit scan forward
   je R0
   inc eax
   btr edx, ecx   ; clear that bit
   lea ecx, [edi+ecx-31]
   mov [ebp+8*eax], ecx
   cmp byte ptr [ecx-2], 13
   mov byte ptr [ecx-2], 0   ; mov does not change flags
   jne abb
   jmp abc

the CALL still references "abc"

hutch-- · July 10, 2010, 04:55:28 AM

JJ,

Over time I have played with the distinction between short and near jumps, the latter being a larger instruction, and have found almost no difference. If there is any difference, the longer instruction is faster by some tiny amount. Such a test is highly dependent on the sequence of instructions it occurs within. I would tend to use the longer version, as the minor code size change simply does not matter.

Assumptions about code byte size come from the DOS days and very rarely affect code speed, whereas instruction choice, when scheduled correctly, does show some improvement in some circumstances.

dedndave · July 10, 2010, 04:58:16 AM

one thing i have noticed - at least on the P4
labels that are the target of NEAR branches want to be 4-aligned
SHORT branches don't seem to care about target alignment

lingo · July 10, 2010, 05:06:32 AM

Why not faster...
bsf ecx, edx
je R0
@@:
mov [eax], ecx          ; eax = ebp+eax (precomputed); when reading back, the address is dword [eax]+edi-31
btr    edx, ecx
cmp byte ptr [edi+ecx-31-2], 13
mov byte ptr [edi+ecx-31-2], 0
lea ecx, [eax+8]
cmovz eax, ecx
bsf ecx, edx
jne @b
R0:
ret
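
The idea in this variant, replacing the `dec eax` branch with a conditional move of the output pointer, can be modelled in C as below. This is a sketch only: `slot_t`, `scan_block_branchless`, and the parameter names are hypothetical, `mask` stands in for edx, `base` for edi, and `out` for the precomputed pointer in eax. As in the assembly, the stored value is the bit index rather than the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 8-byte table slot; only the first dword is written here. */
typedef struct {
    uint32_t bit;
    uint32_t unused;
} slot_t;

/* Hypothetical model of the cmovz loop; returns the updated slot pointer (eax). */
slot_t *scan_block_branchless(uint32_t mask, char *base, slot_t *out)
{
    while (mask != 0) {                           /* bsf ecx, edx / jne @b */
        unsigned bit = 0;
        while (((mask >> bit) & 1u) == 0)
            bit++;

        out->bit = bit;                           /* mov [eax], ecx */
        mask &= ~((uint32_t)1 << bit);            /* btr edx, ecx */

        char *p = base + ((ptrdiff_t)bit - 33);   /* [edi+ecx-31-2] */
        int was_cr = (*p == 13);                  /* cmp byte ptr [...], 13 */
        *p = 0;                                   /* mov byte ptr [...], 0 */

        out += was_cr;                            /* lea ecx, [eax+8] / cmovz eax, ecx */
    }
    return out;
}
```

The slot pointer advances by one 8-byte slot only when the tested byte was 13, so the only conditional branch left in the loop is the loop exit itself. A compiler is free to turn `out += was_cr` into a branch or a conditional move; the C only models the data flow, not the generated code.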

redskull · July 10, 2010, 05:43:00 PM

As long as your loop fits in the CPU loop buffer, it shouldn't matter either way; the instructions stay 'predecoded' inside the CPU, so as long as you stay below the max (~64 aligned bytes, i think?), then the size of the instruction in memory only affects the first iteration through.

-r
