What is your opinion on:
loop
vs
sub, Jcc
In the Intel optimization manual, loop has a latency of 8 cycles and a throughput of 1.5 cycles.
sub has a latency of 1 cycle and a throughput of 0.5 cycles.
Jcc has a throughput of 0.5 cycles.
Which means that:
sub + Jcc has a combined throughput of only 1 cycle, but since Jcc depends on the flags that sub produces, the pair accumulates to at least 2 cycles
loop has a throughput of 1.5 cycles
In total, a loop built with the loop instruction has an overhead that is 75% of that of sub + Jcc (1.5 vs. 2 cycles). In addition, since the loop instruction depends on ecx, I'm not sure how efficient it is. But loop instructions are not executed back to back, so the latency applies on every iteration rather than the throughput. Correct me if I'm wrong.
So 8 cycles overhead for the loop instruction and
2 cycles for sub/Jcc
(assuming the loop content is over 8 cycles long)
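For reference, the two forms being compared look roughly like this. It is only a minimal sketch assuming 32-bit MASM: the label names, the ITERATIONS count and the empty loop bodies are placeholders, not something taken from the manual.

```asm
.686
.model flat, stdcall
option casemap:none

ITERATIONS equ 1000          ; hypothetical iteration count

.code
start:
    ; Form 1: LOOP decrements ecx and jumps while ecx is non-zero.
    ; It does not modify the flags.
    mov ecx, ITERATIONS
with_loop:
    ; ... loop content goes here ...
    loop with_loop

    ; Form 2: explicit decrement followed by a conditional jump.
    ; sub sets ZF, jnz consumes it.
    mov ecx, ITERATIONS
with_jcc:
    ; ... loop content goes here ...
    sub ecx, 1
    jnz with_jcc

    xor eax, eax
    ret                      ; returning from the entry point ends the program
end start
```

Both forms run the body ITERATIONS times. Note that loop only has a short (8-bit) displacement, so the loop content has to stay within about 127 bytes, while Jcc also has a near form.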
LOOP is slow - but still handy when speed isn't critical
the same is true for JECXZ
Take these instructions:
mov eax, ecx
mov edx, ebx
mov esi, edi
mov ebp, esp
This should finish in 2.5 cycles.
the funny thing is that these:
mov eax, ecx
mov eax, edx
will finish in 2 cycles.
Test your luck... for my puter it's 4 cycles slower.
If I am not mistaken, I used 06_2FH, which is the code for Sandy Bridge; the timings are probably a bit different on yours. Or perhaps I miscalculated? However, the optimization manual states that the timings are not guaranteed in real, practical code.
11033 loop
7961 jnz
11047 loop
7838 jnz
I randomly threw together some instructions to fill the 8 cycles, but maybe there is a combination that makes loop faster than jnz...
EatCycles MACRO
  ; filler instructions, meant to add up to roughly 8 cycles
  push eax
  push edx
  xchg eax, edx
  inc eax
  dec edx
  nops 5
  pop edx
  xchg eax, ebx
  mov ebx, edx
  sub ebx, ecx
  pop eax
  sub edx, eax
  add eax, edx
ENDM
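A rough way to repeat this kind of comparison is to wrap each variant in rdtsc readings. The following is a sketch under assumptions, not the harness that produced the numbers above: the ITERATIONS count, the label names and the ticks_* variables are hypothetical, rdtsc is not serialized here, and the raw counts include the cost of the filler, so only the difference between the two results means anything.

```asm
.686
.model flat, stdcall
option casemap:none

ITERATIONS equ 1000000       ; hypothetical iteration count

.data
ticks_loop dd 0              ; elapsed TSC ticks, LOOP variant
ticks_jnz  dd 0              ; elapsed TSC ticks, SUB/JNZ variant

.code
start:
    push ebx                 ; callee-saved, in case the filler uses it
    push esi

    ; Variant 1: LOOP
    rdtsc                    ; edx:eax = time stamp counter
    mov esi, eax             ; keep the low dword as the start value
    mov ecx, ITERATIONS
time_loop:
    ; filler goes here (must preserve ecx and esi)
    loop time_loop
    rdtsc
    sub eax, esi             ; low dword difference is enough for short runs
    mov ticks_loop, eax

    ; Variant 2: SUB + JNZ
    rdtsc
    mov esi, eax
    mov ecx, ITERATIONS
time_jnz:
    ; same filler goes here
    sub ecx, 1
    jnz time_jnz
    rdtsc
    sub eax, esi
    mov ticks_jnz, eax

    pop esi
    pop ebx
    xor eax, eax
    ret
end start
```

The two results end up in ticks_loop and ticks_jnz, to be inspected in a debugger or printed with whatever output routine you normally use. Running each variant several times and taking the lowest count helps to filter out interrupts and cache effects.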
mov eax, ecx
mov eax, edx
some processors may foresee that the first instruction need not be executed
Quote from: dedndave on December 22, 2011, 06:20:57 PM
mov eax, ecx
mov eax, edx
some processors may foresee that the first instruction need not be executed
And others will see it as a stall (both target eax).
Dave.