Title: loop instruction
Post by: zemtex on December 22, 2011, 01:24:45 PM

What is your opinion on:

loop

vs

sub, Jcc

In the Intel optimization manual, loop has a latency of 8 cycles and a throughput of 1.5 cycles

sub has a latency of 1 and throughput of 0.5 cycles
Jcc has throughput of 0.5 cycles

Which means that:

sub + Jcc has a combined throughput of only 1 cycle, but since Jcc depends on the flags that sub produces, the pair will accumulate to at least 2 cycles
loop has a throughput of 1.5 cycles

Going by throughput alone, a loop built with the loop instruction has an overhead that is 75% of that of sub + Jcc (1.5 vs. 2 cycles). In addition, since the loop instruction depends on ecx, I'm not sure how efficient it is. But loop instructions are not executed back to back, so the full latency will apply on every iteration. Correct me if I'm wrong.

So that is 8 cycles of overhead for the loop instruction and
2 cycles for sub/Jcc

(assuming the loop body is more than 8 cycles long)
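
For reference, the two loop forms being compared can be sketched as below. This is only an illustration in MASM syntax: the COUNT constant, the label names and the nop standing in for the loop body are assumptions, not anything from the post.

```asm
.686
.model flat, stdcall
option casemap:none

COUNT equ 100                ; hypothetical iteration count

.code
start:
    ; Variant 1: LOOP decrements ecx and branches while ecx != 0.
    ; LOOP leaves the flags untouched.
    mov ecx, COUNT
with_loop:
    nop                      ; stand-in for the real loop body
    loop with_loop

    ; Variant 2: SUB writes the flags, JNZ reads them.
    mov ecx, COUNT
with_sub:
    nop                      ; stand-in for the real loop body
    sub ecx, 1
    jnz with_sub

    ret                      ; return from the entry point
end start
```

Both variants run the body COUNT times. Two differences are worth keeping in mind when swapping one for the other: sub changes the flags while loop does not, and loop can only reach a target within a short (8-bit) displacement.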

Title: Re: loop instruction
Post by: dedndave on December 22, 2011, 03:11:56 PM

LOOP is slow - but still handy when speed isn't critical
the same is true for JECXZ

Title: Re: loop instruction
Post by: zemtex on December 22, 2011, 04:21:43 PM

take these instructions:

mov eax, ecx
mov edx, ebx
mov esi, edi
mov ebp, esp

This should finish in 2.5 cycles.

the funny thing is that these:

mov eax, ecx
mov eax, edx

will finish in 2 cycles.

Title: Re: loop instruction
Post by: jj2007 on December 22, 2011, 04:46:57 PM

Test your luck... on my machine it's 4 cycles slower.

Title: Re: loop instruction
Post by: zemtex on December 22, 2011, 05:02:31 PM

If I am not mistaken I used 06_2FH, which is the code for Sandy Bridge, so the timings on yours are probably a bit different. Or perhaps I miscalculated it? However, the optimization manual states that the timings are not guaranteed in real practical examples.

Title: Re: loop instruction
Post by: zemtex on December 22, 2011, 05:06:29 PM

11033 loop
7961 jnz

11047 loop
7838 jnz
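
How the figures above were measured is not shown in the thread. One common way to get a rough comparison of this kind is to read the time stamp counter around each variant, as in the sketch below. The names ITERATIONS, ticksLoop and ticksJnz are hypothetical, the loop bodies are left empty, and the results are only stored in memory (to be inspected in a debugger) rather than printed.

```asm
.686
.model flat, stdcall
option casemap:none

ITERATIONS equ 1000000       ; hypothetical iteration count

.data
ticksLoop dd 0               ; elapsed TSC ticks (low dword), LOOP variant
ticksJnz  dd 0               ; elapsed TSC ticks (low dword), SUB/JNZ variant

.code
start:
    push ebx                 ; cpuid clobbers ebx, so preserve it
    push esi

    ; --- time the LOOP variant ---
    xor eax, eax
    cpuid                    ; serialize before reading the counter
    rdtsc                    ; edx:eax = time stamp counter
    mov esi, eax             ; keep the low dword as the start time
    mov ecx, ITERATIONS
t_loop:
    loop t_loop
    xor eax, eax
    cpuid                    ; serialize again (also clobbers ecx and edx)
    rdtsc
    sub eax, esi             ; elapsed ticks, valid while below 2^32
    mov ticksLoop, eax

    ; --- time the SUB/JNZ variant ---
    xor eax, eax
    cpuid
    rdtsc
    mov esi, eax
    mov ecx, ITERATIONS
t_jnz:
    sub ecx, 1
    jnz t_jnz
    xor eax, eax
    cpuid
    rdtsc
    sub eax, esi
    mov ticksJnz, eax

    pop esi
    pop ebx
    ret                      ; return from the entry point
end start
```

The cpuid/rdtsc overhead is included in both measurements, so only the difference between the two values is meaningful, and single runs can vary a lot from one machine and one run to the next.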

Title: Re: loop instruction
Post by: jj2007 on December 22, 2011, 05:10:55 PM

I had randomly thrown together some instructions to get the 8 cycles together, but maybe there is a combination that makes loop faster than jnz...

EatCycles MACRO
  push eax
  push edx
  xchg eax, edx
  inc eax
  dec edx
  nops 5
  pop edx
  xchg eax, ebx
  mov ebx, edx
  sub ebx, ecx
  pop eax
  sub edx, eax
  add eax, edx
ENDM

Title: Re: loop instruction
Post by: dedndave on December 22, 2011, 06:20:57 PM

mov eax, ecx
mov eax, edx

some processors may foresee that the first instruction need not be executed

Title: Re: loop instruction
Post by: KeepingRealBusy on December 22, 2011, 10:50:19 PM

Quote from: dedndave on December 22, 2011, 06:20:57 PM

mov eax, ecx
mov eax, edx

some processors may foresee that the first instruction need not be executed

And others will see it as a stall (both target eax).

Dave.

The MASM Forum Archive 2004 to 2012

General Forums => The Colosseum => Topic started by: zemtex on December 22, 2011, 01:24:45 PM