Question on loop design

Started by jj2007, July 09, 2010, 10:12:55 PM

jj2007

This is an "internal" call, i.e. a little routine that resides in a bigger one. My question is whether a short back jump to R0 is better than a long forward jump to R1. I have sometimes read here that backwards jumps are predicted as "taken" in loops - does that mean R0 is better if the condition is more frequently true than not?

R0:   retn 0
abc:   bsf ecx, edx   ; bit scan forward
   je R0
   ; je R1  ; better?
   inc eax
   btr edx, ecx   ; clear that bit
   lea ecx, [edi+ecx-31]
   mov [ebp+8*eax], ecx
   cmp byte ptr [ecx-2], 13
   mov byte ptr [ecx-2], 0   ; mov does not change flags
   je abc
   dec eax
   jmp abc
R1:   retn 0

ecube

Short of testing, it stands to reason that a short jump would be faster than a long jump because your code has less distance to jump in memory. The only case where I can think of this being untrue is if you're using some kind of code emulator or similar, where not all of the memory for the code is already mapped and a block of code is mapped only when you attempt to use it. In that case a forward jump forces the block to be mapped first, so it takes more time than a backward jump, short or long, to already mapped memory.

dedndave

hiyas Jochen
i would think the distance is of little concern - they will both be short, i think
as for direction, it depends on what is most often likely to occur in the BSF instruction
if the EDX register is more likely to be 0 than any other value, jump backward
probably splitting hairs, either way   :bg

jj2007

Thanks, folks. The R1 jump would be long - I omitted a couple of instructions in the example.

dedndave

oops - i guess i underestimated the code size - lol (btw - you mean NEAR, right ?)

the jmp at the end bugs me a little
i am guessing that [ecx-2] is usually not 13
if that's the case, you might do this...
R0:   retn 0
abb:   dec eax
abc:   bsf ecx, edx   ; bit scan forward
   je R0
   inc eax
   btr edx, ecx   ; clear that bit
   lea ecx, [edi+ecx-31]
   mov [ebp+8*eax], ecx
   cmp byte ptr [ecx-2], 13
   mov byte ptr [ecx-2], 0   ; mov does not change flags
   jne abb
   jmp abc

the CALL still references "abc"

hutch--

JJ,

Over time I have played with the distinction between short and near jumps, the latter being a larger instruction, and have found almost no difference; if there is any difference, the longer instruction is faster by some tiny amount. Such a test is highly dependent on the sequence of instructions it occurs within. I would tend to use the longer version, as the minor code size change simply does not matter.

Code byte size assumptions come from the DOS days and very rarely affect code speed, whereas instruction choice, when scheduled correctly, does show some improvements in some circumstances.
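
For reference, a rough sketch of the two encodings being compared, forced with the SHORT and NEAR PTR operators. The labels R0, R1 and abc come from the first post; everything else is cut down for illustration and is not the original routine. It only shows the instruction sizes and does not measure anything.

```masm
; Illustrative fragment only, assuming MASM 6+ (ml /c /coff).
    .686
    .model flat, stdcall
    .code
R0: retn
abc:
    bsf ecx, edx        ; ZF=1 when edx is zero
    je short R0         ; 74 rel8: 2 bytes, reach -128..+127 bytes
    je near ptr R1      ; 0F 84 rel32: 6 bytes, any distance in the segment
                        ; (same condition as above, shown only for its encoding)
    btr edx, ecx        ; clear that bit
    jmp abc
R1: retn
    end
```

Left to itself, the assembler normally picks the short form whenever the target is in range, so the operators are only needed when you want to force one encoding for a test.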

dedndave

one thing i have noticed - at least on the P4
labels that are the target of NEAR branches want to be 4-aligned
SHORT branches don't seem to care about target alignment
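
A minimal sketch of what aligning a NEAR branch target could look like in MASM. The P4 observation is the one made above; this fragment only shows where the ALIGN directive would go, and the surrounding instructions are a cut-down stand-in for the routine in the first post.

```masm
; Illustrative fragment only, assuming MASM 6+ (ml /c /coff).
    .686
    .model flat, stdcall
    .code
abc:
    bsf ecx, edx        ; ZF=1 when edx is zero
    je near ptr R1      ; NEAR (rel32) branch to the aligned label below
    btr edx, ecx        ; clear that bit
    jmp abc             ; control never falls through into the padding
    align 4             ; assembler pads so the next label is 4-aligned
R1: retn
    end
```

Placing the ALIGN after an unconditional jmp means the padding bytes are never executed, so the alignment costs code size only.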

lingo

Why not faster...
   bsf      ecx, edx
   je        R0     
@@:
   mov     [eax], ecx          ; eax=ebp+eax  ; when retrieve from [eax] will be dword [eax]+edi-31
   btr       edx, ecx                                   
   cmp     byte ptr [edi+ecx-31-2], 13
   mov     byte ptr [edi+ecx-31-2], 0
   lea       ecx, [eax+8]
   cmovz eax, ecx       
   bsf      ecx, edx
   jne      @b
R0:
   ret     

redskull

As long as your loop fits in the CPU loop buffer, it shouldn't matter either way; the instructions stay 'predecoded' inside the CPU, so as long as you stay below the max (~64 aligned bytes, i think?), then the size of the instruction in memory only affects the first iteration through.

-r