
Instruction Timing

Started by Tight_Coder_Ex, December 14, 2010, 07:06:16 PM


Tight_Coder_Ex

I probably shouldn't be bothered by something that is so irrelevant, but this business of instruction timing has got me baffled.

I will not believe in a million years that the instruction @ C7, once memory fetches etc. are considered, takes only 1 cycle versus the 3 taken by the one @ BB.

Of course, it wouldn't surprise me either if MS had left in something from earlier versions that has no meaning at all now.

000000BB   3   FE 03                inc   byte ptr [ebx]             ; Set parameter table active status
000000BD   1   8B 65 FC             mov   esp, StackPntr             ; Unroll stack in case pointer misaligned
000000C0   6p  66| 9D               popf                             ; Recover flags from previous operation
000000C2   1   77 10                ja    @F                         ; Description string had to be truncated
000000C4   1   83 EC 10             sub   esp, 16                    ; Point to parameters required by MessageBox
000000C7   1   C7 44 24 04          mov   [esp + 4], offset ErrString
               0000004C R


So if anybody has any idea why this is, please share.

redskull

While I certainly wouldn't trust the timings, the fact that the inc would take longer than the mov is correct.  After all, 'inc [ebx]' requires moving the value from memory to the CPU, incrementing the value, and then moving the value back, which is three times as many steps as just moving to memory.
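As a rough illustration of the three steps described above (a sketch only; AL is an arbitrary scratch register picked for the example), the single read-modify-write instruction does the same work as the explicit sequence below. Use one form or the other, not both.

```asm
        ; Form 1: one instruction, read-modify-write on memory
        inc     byte ptr [ebx]

        ; Form 2: the same work spelled out (clobbers AL, which form 1 does not)
        mov     al, [ebx]           ; 1. load the byte from memory
        inc     al                  ; 2. increment it in a register
        mov     [ebx], al           ; 3. store the result back
```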

But again, all these timings are generally meaningless with CPUs the way they are today.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

drizz

inc also modifies EFLAGS and preserves the CARRY flag.
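A small sketch of the carry behaviour (the register and values are chosen only for illustration): INC updates OF, SF, ZF, AF and PF but leaves CF alone, whereas ADD with 1 also writes CF.

```asm
        mov     eax, 0FFFFFFFFh
        stc                         ; CF = 1
        inc     eax                 ; eax wraps to 0, ZF = 1, CF untouched (still 1)

        mov     eax, 0FFFFFFFFh
        clc                         ; CF = 0
        inc     eax                 ; eax wraps to 0, ZF = 1, CF untouched (still 0)

        mov     eax, 0FFFFFFFFh
        add     eax, 1              ; eax wraps to 0, ZF = 1, CF = 1 (carry out)
```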
The truth cannot be learned ... it can only be recognized.

dedndave

what is killing you is PUSHFD / POPFD (moreso the latter)
we cannot see the first part of the code that sets the flag for the JA instruction
so, can't make suggestions on how to avoid POPFD
show us a more complete snippet

right off hand, i would say you could duplicate the INC byte ptr [ebx] and MOV esp,StackPntr instructions so they are executed whether JA branches or not

        ja      @F

        inc     byte ptr [ebx]
        mov     esp,StackPntr
;
;
;
@@:     inc     byte ptr [ebx]
        mov     esp,StackPntr


another thing you might do would be to inc dword ptr [ebx] - make them dwords if it is practical to do so

dedndave

i also try to avoid CLD, STD, SAHF   :P
instructions that directly alter the flags are sluggish - i haven't tested CLC, CMC, STC

for example, if i need to STD, i may pushfd/pop eax and test to see if it is cleared, first - only setting it if needed
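a sketch of that check, assuming EAX is free to clobber at this point (DF is bit 10 of EFLAGS, hence the 400h mask):

```asm
        pushfd                      ; copy EFLAGS onto the stack
        pop     eax                 ; eax = EFLAGS
        test    eax, 400h           ; is the direction flag (bit 10) already set?
        jnz     @F                  ; yes: skip the STD
        std                         ; no: set it now
@@:
```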

dedndave

oh - and.....

you are using PUSHF/POPF - that misaligns the stack   :bg
use PUSHFD/POPFD if you must
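a sketch of the difference in a 32-bit code segment (the 66 operand-size prefix is the same one visible on the POPF line of the listing in the first post):

```asm
        pushf                       ; 66 9C: pushes 16-bit FLAGS, esp = esp - 2
        popf                        ; 66 9D: pops 16 bits,        esp = esp + 2

        pushfd                      ; 9C: pushes 32-bit EFLAGS,   esp = esp - 4
        popfd                       ; 9D: pops 32 bits,           esp = esp + 4
```

between the PUSHF and the POPF, ESP is no longer a multiple of 4, which is the misalignment being described.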

jj2007

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
6348    cycles for inc byte ptr [reg32]
6362    cycles for inc dword ptr [reg32]
6346    cycles for inc byte ptr Counter
6346    cycles for inc dword ptr Counter
6342    cycles for mov/inc/mov byte ptr Counter
5247    cycles for movzx/inc/mov byte ptr Counter

Tight_Coder_Ex

Thanks guys.  As this procedure is only called once every time I initialize a new linked list, I'm not too concerned about cycles; this is more a matter of interest.  As the application progresses, though, I will need to pay closer attention, as some loops may have 2 to 3 million iterations.  Therefore I had hoped this listing would be a little more meaningful.

Incidentally Dave, you caused me to catch a mistake.  As long as there wasn't any API call between PUSHF & POPF, I wasn't too concerned about alignment, but looking into it did make me catch this.

pushf ; Save result of test

push eax
push ListD
lea eax, [ebx].Description
push eax ; Place in buffer
call lstrcpyn ; Move

inc byte ptr [ebx] ; Set parameter table active status
mov esp, StackPntr ; Unroll stack in case pointer misaligned

popf ; Recover flags from previous operation


Probably not a good idea changing the stack pointer just before one pops something off the stack.
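One possible reordering, as a sketch only: it assumes StackPntr, ListD and the Description field are defined as in the fragment above, that lstrcpyn removes its three arguments on return (the usual stdcall behaviour), and it uses the 32-bit PUSHFD/POPFD forms suggested earlier in the thread. The flags are popped while ESP still points at them, and only then is ESP reloaded. MOV does not alter the flags, so the restored flags survive it.

```asm
        pushfd                      ; Save result of test

        push    eax
        push    ListD
        lea     eax, [ebx].Description
        push    eax                 ; Place in buffer
        call    lstrcpyn            ; Move

        inc     byte ptr [ebx]      ; Set parameter table active status
        popfd                       ; Recover flags while esp still points at them
        mov     esp, StackPntr      ; Unroll stack; mov leaves the flags intact
```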

dedndave

prescott w/htt...
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
6827    cycles for inc byte ptr [reg32]
6148    cycles for inc dword ptr [reg32]
6369    cycles for inc byte ptr Counter
5972    cycles for inc dword ptr Counter
8615    cycles for mov/inc/mov byte ptr Counter
8234    cycles for movzx/inc/mov byte ptr Counter

6433    cycles for inc byte ptr [reg32]
7257    cycles for inc dword ptr [reg32]
5931    cycles for inc byte ptr Counter
6027    cycles for inc dword ptr Counter
9330    cycles for mov/inc/mov byte ptr Counter
7974    cycles for movzx/inc/mov byte ptr Counter

dedndave

Tight

you might try this, rather than using POPF or POPFD
(and - stack misalignment is bad, whether you use an API or not)

        pushfd
;
;
;
        pop     eax
        test    eax,FlagBits
        jz      SomeLabel

this may seem like a little more code, but avoids setting the flags directly   :U
lemme see - JA is jump if no carry and not zero - both those bits must be cleared to branch

Tight_Coder_Ex

Quote from: dedndave on December 14, 2010, 09:03:00 PM
stack misalignment is bad, whether you use an API or not

Good point, and I do usually try to avoid bad habits.

dedndave

here is the evidence   :bg
POP EAX/TEST
7 10 8 9 7 7 9 8 9 7 9 7 9 6 7 8
POPFD
100 103 101 98 98 99 98 99 97 100 99 100 97 100 98 99


you can replace this code
        popfd
        ja      @F


with this code
        pop     eax
        test    eax,41h
        jz      @F
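where the 41h comes from, as a sketch (the constant names are made up for the example): CF is bit 0 and ZF is bit 6 of EFLAGS, and JA branches only when both are clear.

```asm
CF_MASK equ     01h                 ; carry flag, EFLAGS bit 0
ZF_MASK equ     40h                 ; zero flag,  EFLAGS bit 6
JA_MASK equ     CF_MASK OR ZF_MASK  ; 41h

        pop     eax                 ; eax = flags saved earlier by pushfd
        test    eax, JA_MASK        ; result is zero only if saved CF = 0 and ZF = 0
        jz      @F                  ; taken exactly where JA would have been taken
```

keep in mind that this clobbers EAX, and that afterwards the live flags reflect the TEST, not the saved values, so any later code that relied on the restored flags would need adjusting.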

Tight_Coder_Ex

Thanks Dave, and I'll definitely make a note of it, as in high-iteration loops it would make a big difference, especially where video is involved.

dedndave

i hope that helps a lot of coders   :U