Hi,
I need to know which of these instruction sequences is faster:
Set 1:
mov ecx, 0100h
label:
dec ecx
jnz label
Or set 2:
mov ecx, 0100h
label:
loop label
Because I need speed in a crc32 calculation.
Thank you in advance.
Set 1 is faster on most Pentiums (if not all),
but you can measure it yourself.
The first post of the first thread in the Laboratory sub-forum has timing macros.
Give me a few minutes and I will write a little test program.
Pentium 4 Prescott:
Loop
592 clock cycles
581 clock cycles
596 clock cycles
Dec ECX
436 clock cycles
434 clock cycles
436 clock cycles
Hi,
Gackelfish!
Regards,
Steve N
P-III
G:\WORK>loopdec
Loop
1483 clock cycles
1483 clock cycles
1483 clock cycles
Dec ECX
522 clock cycles
523 clock cycles
523 clock cycles
Press any key to continue ...
AMD Athlon X2 QL-62
Loop
788 clock cycles
778 clock cycles
778 clock cycles
Dec ECX
521 clock cycles
521 clock cycles
522 clock cycles
Loop
529 clock cycles
528 clock cycles
536 clock cycles
Dec ECX
399 clock cycles
402 clock cycles
403 clock cycles
Press any key to continue ...
Intel Quad core 9550
Loop
1301 clock cycles
1301 clock cycles
1302 clock cycles
Dec ECX
280 clock cycles
281 clock cycles
281 clock cycles
Celeron M
Loop
1615 clock cycles
1613 clock cycles
1613 clock cycles
Dec ECX
290 clock cycles
290 clock cycles
290 clock cycles
CoreDuo 2, 1.89 GHz
Loop
1325 clock cycles
1325 clock cycles
1324 clock cycles
Dec ECX
286 clock cycles
286 clock cycles
286 clock cycles
It's still surprising to see significant variations for both code sequences between the various processors, especially for the "Dec ECX" case, which would be expected to take almost the same number of clock cycles everywhere.
BTW, I also added a test replacing the "Dec ECX" instruction with "Sub ECX,1". The timings were identical.
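For reference, a minimal sketch of what the "Sub ECX,1" variant of set 1 could look like. This is not the actual test program; the label name count_down is made up for illustration, and the fragment is meant to sit inside an existing MASM procedure.

```asm
    mov ecx, 0100h      ; iteration count, as in the original question
count_down:             ; hypothetical label name
    ; ... loop body goes here ...
    sub ecx, 1          ; sets ZF like dec ecx, but also updates CF
    jnz count_down      ; repeat until ecx reaches zero
```

The only architectural difference is in the flags: dec leaves the carry flag untouched, while sub writes it. That matters only if the loop body relies on CF surviving across iterations.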
Hi,
The answers are clear.
I'll use dec ecx.
Thanks a lot.
Hi,
As a rough guide, if you look in the MASM32 help file opcodes.chm you will see that on a 486 loop takes 6 cycles and dec only 1. As the above results show though, "your mileage may vary".
Best regards,
Robin.
AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
Loop
781 clock cycles
781 clock cycles
781 clock cycles
Dec ECX
524 clock cycles
523 clock cycles
524 clock cycles
Loop
1294 clock cycles
1294 clock cycles
1294 clock cycles
Dec ECX
279 clock cycles
279 clock cycles
279 clock cycles
on my Core2Duo E8400. Seems the Pentium 4 and AMD failed. Anyone with an i5 or i7?
Loop
1300 clock cycles
1300 clock cycles
1299 clock cycles
Dec ECX
280 clock cycles
280 clock cycles
281 clock cycles
Press any key to continue ...
E5200 P4 DualCore, 2.7 GHz
Quote from: untio on February 14, 2010, 11:38:37 AM
Because I need speed in a crc32 calculation.
What specific CRC32 polynomial are you using?
This might be of interest: http://www.masm32.com/board/index.php?topic=13420.0
Also, if you are iterating the loop a lot, the top of the loop should be aligned on a 16-byte boundary; a cache-line boundary may be even better.
-Clive
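A minimal sketch of the alignment suggestion, assuming MASM syntax and a code segment whose own alignment is at least 16 bytes (MASM rejects an ALIGN value larger than the segment alignment). The label name crc_loop is made up for illustration, and the fragment is meant to sit inside an existing procedure.

```asm
    mov ecx, 0100h      ; iteration count from the original question
    align 16            ; pad so the next instruction starts on a 16-byte boundary
crc_loop:               ; hypothetical label name, top of the hot loop
    ; ... CRC32 update for one byte goes here ...
    dec ecx
    jnz crc_loop        ; the branch target is the aligned address
```

The padding bytes inserted by align are executed once on the way into the loop, so the cost is negligible compared with 256 iterations. Whether it helps at all is worth timing on the target CPU, like everything else in this thread.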