I probably shouldn't be bothered by something that is so irrelevant, but this business of instruction timing has got me baffled.
I will not believe in a million years that the instruction @ C7, when you consider memory fetches etc., only takes 1 cycle versus the one @ BB taking 3.
Of course, that MS would leave in something from earlier versions that has no meaning at all now wouldn't surprise me either.
000000BB 3 FE 03 inc byte ptr [ebx] ; Set parameter table active status
000000BD 1 8B 65 FC mov esp, StackPntr ; Unroll stack in case pointer misaligned
000000C0 6p 66| 9D popf ; Recover flags from previous operation
000000C2 1 77 10 ja @F ; Description string had to be truncated
000000C4 1 83 EC 10 sub esp, 16 ; Point to parameters required by MessageBox
000000C7 1 C7 44 24 04 mov [esp + 4], offset ErrString
0000004C R
So if anybody has any idea why this is, please share.
While I certainly wouldn't trust the timings, the fact that the inc would take longer than the mov is correct. After all, 'inc [ebx]' requires moving the value from memory to the CPU, incrementing the value, and then moving the value back, which is three times as many steps as just moving to memory.
But again, all these timings are generally meaningless with CPUs the way they are today.
-r
inc also modifies EFLAGS and preserves the CARRY flag.
what is killing you is PUSHFD / POPFD (moreso the latter)
we cannot see the first part of the code that sets the flag for the JA instruction
so, can't make suggestions on how to avoid POPFD
show us a more complete snippet
right off hand, i would say you could duplicate the INC byte ptr [ebx] and MOV esp,StackPntr instructions so they are executed whether JA branches or not
ja @F
inc byte ptr [ebx]
mov esp,StackPntr
;
;
;
@@: inc byte ptr [ebx]
mov esp,StackPntr
another thing you might do would be to inc dword ptr [ebx] - make them dwords if it is practical to do so
i also try to avoid CLD, STD, SAHF :P
instructions that directly alter the flags are sluggish - i haven't tested CLC, CMC, STC
for example, if i need to STD, i may pushfd/pop eax and test to see if it is cleared, first - only setting it if needed
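A minimal sketch of that check, assuming 32-bit MASM. The DF_MASK name is illustrative and not from the original post, and whether the test actually beats an unconditional STD depends on the CPU:

```
DF_MASK equ 0400h           ; EFLAGS bit 10 = direction flag (illustrative name)

    pushfd                  ; push a 32-bit copy of EFLAGS
    pop     eax             ; EAX = copy of EFLAGS, ESP back where it was
    test    eax, DF_MASK    ; is DF already set?
    jnz     @F              ; yes: skip the STD
    std                     ; no: set it now
@@:
```

EAX is clobbered here, so pick a scratch register that is free at that point.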
oh - and.....
you are using PUSHF/POPF - that misaligns the stack :bg
use PUSHFD/POPFD if you must
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
6348 cycles for inc byte ptr [reg32]
6362 cycles for inc dword ptr [reg32]
6346 cycles for inc byte ptr Counter
6346 cycles for inc dword ptr Counter
6342 cycles for mov/inc/mov byte ptr Counter
5247 cycles for movzx/inc/mov byte ptr Counter
Thanks guys. As this procedure is only called once every time I initialize a new linked list, I'm not too concerned about cycles; it's more from an interest point of view. As the application progresses there will be a need to pay closer attention, as some loops may have 2 to 3 million iterations. Therefore I had hoped this listing would be a little more meaningful.
Incidentally Dave, you caused me to catch a mistake. I hadn't looked at the implications, meaning that if there wasn't any API between PUSHF & POPF I wasn't too concerned about alignment, but it did make me catch this.
pushf ; Save result of test
push eax
push ListD
lea eax, [ebx].Description
push eax ; Place in buffer
call lstrcpyn ; Move
inc byte ptr [ebx] ; Set parameter table active status
mov esp, StackPntr ; Unroll stack in case pointer misaligned
popf ; Recover flags from previous operation
Probably not a good idea changing the stack pointer just before one pops something off the stack.
prescott w/htt...
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
6827 cycles for inc byte ptr [reg32]
6148 cycles for inc dword ptr [reg32]
6369 cycles for inc byte ptr Counter
5972 cycles for inc dword ptr Counter
8615 cycles for mov/inc/mov byte ptr Counter
8234 cycles for movzx/inc/mov byte ptr Counter
6433 cycles for inc byte ptr [reg32]
7257 cycles for inc dword ptr [reg32]
5931 cycles for inc byte ptr Counter
6027 cycles for inc dword ptr Counter
9330 cycles for mov/inc/mov byte ptr Counter
7974 cycles for movzx/inc/mov byte ptr Counter
Tight
you might try this, rather than using POPF or POPFD
(and - stack misalignment is bad, whether you use an API or not)
pushfd
;
;
;
pop eax
test eax,FlagBits
jz SomeLabel
this may seem like a little more code, but avoids setting the flags directly :U
lemme see - JA is jump if no carry and not zero - both those bits must be cleared to branch
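For the JA case, the FlagBits mask in the snippet above could be spelled out like this. A sketch only: CF_MASK and ZF_MASK are illustrative names, and the bit positions are the standard EFLAGS layout (CF is bit 0, ZF is bit 6):

```
CF_MASK  equ 0001h              ; EFLAGS bit 0 = carry flag
ZF_MASK  equ 0040h              ; EFLAGS bit 6 = zero flag
FlagBits equ CF_MASK or ZF_MASK ; = 41h

    pushfd                      ; save the flags from the earlier compare
    ; ... intervening code, which must leave ESP pointing at the saved flags ...
    pop     eax                 ; EAX = saved EFLAGS
    test    eax, FlagBits       ; result is zero only if saved CF and ZF were both clear
    jz      SomeLabel           ; so this branches in the same cases JA would have
```

Other conditions need a different mask and jump; for instance a plain JC would test only CF_MASK and branch on JNZ.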
Quote from: dedndave on December 14, 2010, 09:03:00 PM
stack misalignment is bad, whether you use an API or not
Good Point and I do usually try to avoid bad habits
here is the evidence :bg
POP EAX/TEST
7 10 8 9 7 7 9 8 9 7 9 7 9 6 7 8
POPFD
100 103 101 98 98 99 98 99 97 100 99 100 97 100 98 99
you can replace this code
popfd
ja @F
with this code
pop eax
test eax,41h
jz @F
Thanks Dave, and I'll definitely make a note of it, as in high-iteration loops it would make a big difference, especially where video is involved.
i hope that helps a lot of coders :U