What is faster

Started by untio, February 14, 2010, 11:38:37 AM

untio

Hi,
I need to know which of these instruction sequences is faster:
Set 1:
mov ecx, 0100h      ; loop counter
again:              ; renamed: LABEL is a reserved word in MASM
dec ecx
jnz again
Or set 2:
mov ecx, 0100h      ; loop counter
again:
loop again
Because I need speed in a crc32 calculation.

Thank you in advance.

dedndave

set 1 is faster on most pentiums (if not all)
but you can measure it yourself
the timing macros are in the first post of the first thread in the Laboratory sub-forum

give me a few minutes and i will write a little program
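
For anyone without MASM at hand, the "measure it" advice can be sketched in C. This is only an illustration, not the timing macros from the Laboratory sub-forum: `countdown` and `best_seconds` are hypothetical names, and `clock()` reports processor time in coarse ticks rather than clock cycles. What it tries to show is the habit of repeating a measurement several times and keeping the smallest value, so that interruptions inflate the result less.

```c
#include <time.h>

/* Hypothetical helper: count down from n to zero.
   The volatile counter keeps the compiler from deleting the empty loop.
   Returns the number of iterations actually executed. */
unsigned long countdown(unsigned long n)
{
    volatile unsigned long counter = n;
    unsigned long iterations = 0;

    while (counter != 0) {
        counter--;
        iterations++;
    }
    return iterations;
}

/* Hypothetical helper: time countdown(n) 'runs' times with clock()
   and return the smallest elapsed time in seconds.
   Returns -1.0 if runs is zero or negative. */
double best_seconds(int runs, unsigned long n)
{
    double best = -1.0;

    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        unsigned long done = countdown(n);
        clock_t stop = clock();
        double elapsed = (double)(stop - start) / CLOCKS_PER_SEC;

        /* keep the fastest run that completed all n iterations */
        if (done == n && (best < 0.0 || elapsed < best))
            best = elapsed;
    }
    return best;
}
```

Something like `best_seconds(5, 100000000UL)` should give a figure large enough to read. A count of 0100h as in the question is far too short for `clock()` to resolve, which is one reason cycle-counting macros are the better tool for this kind of test.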

dedndave

Pentium 4 Prescott:
Loop
592     clock cycles
581     clock cycles
596     clock cycles
Dec ECX
436     clock cycles
434     clock cycles
436     clock cycles

FORTRANS

Hi,

   Gackelfish!

Regards,

Steve N


P-III

G:\WORK>loopdec
Loop
1483    clock cycles
1483    clock cycles
1483    clock cycles
Dec ECX
522     clock cycles
523     clock cycles
523     clock cycles
Press any key to continue ...

donkey

AMD Athlon X2 QL-62

Loop
788     clock cycles
778     clock cycles
778     clock cycles
Dec ECX
521     clock cycles
521     clock cycles
522     clock cycles
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

Gunner

Loop
529     clock cycles
528     clock cycles
536     clock cycles
Dec ECX
399     clock cycles
402     clock cycles
403     clock cycles
Press any key to continue ...
~Rob (Gunner)
- IE Zone Editor
- Gunners File Type Editor
http://www.gunnerinc.com

Neil

Intel Quad core 9550

Loop
1301      clock cycles
1301      clock cycles
1302      clock cycles
Dec ECX
280        clock cycles
281        clock cycles
281        clock cycles

jj2007

Celeron M

Loop
1615    clock cycles
1613    clock cycles
1613    clock cycles
Dec ECX
290     clock cycles
290     clock cycles
290     clock cycles

raymond

CoreDuo 2, 1.89 GHz

Loop
1325   clock cycles
1325   clock cycles
1324   clock cycles
Dec ECX
286    clock cycles
286    clock cycles
286    clock cycles

It's still surprising to see significant variations for both code sequences across the various processors, especially for the "Dec ECX" case, which would be expected to be almost the same everywhere based on clock cycles.

BTW, I also added a test replacing the "Dec ECX" instruction with "Sub ECX,1". The timings were identical.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

untio

Hi,
The answers are clear.
I'll use dec ecx.

Thanks a lot.

Astro

Hi,

As a rough guide, if you look in the MASM32 help file opcodes.chm you will see that loop takes 6 cycles and dec only 1 on a 486. As the above results show, though, "your mileage may vary".

Best regards,
Robin.

dacid

AMD Athlon(tm) 64 X2 Dual Core Processor 6000+

Loop
781     clock cycles
781     clock cycles
781     clock cycles
Dec ECX
524     clock cycles
523     clock cycles
524     clock cycles

BlackVortex

Loop
1294    clock cycles
1294    clock cycles
1294    clock cycles
Dec ECX
279     clock cycles
279     clock cycles
279     clock cycles

on my Core2Duo e8400. Seems pentium4+AMD failed. Anyone with an i5 or i7?

Ghandi


Loop
1300    clock cycles
1300    clock cycles
1299    clock cycles
Dec ECX
280     clock cycles
280     clock cycles
281     clock cycles
Press any key to continue ...


E5200 P4 DualCore 2.7ghz

clive

Quote from: untio on February 14, 2010, 11:38:37 AM
Because I need speed in a crc32 calculation.

What specific CRC32 polynomial are you using?

This might be of interest http://www.masm32.com/board/index.php?topic=13420.0
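
To show why the polynomial matters, here is a minimal C sketch, assuming the common reflected CRC-32 polynomial 0EDB88320h with an initial value and final XOR of 0FFFFFFFFh (the variant used by zlib and PNG). The original poster has not said which polynomial is in use, so that choice is an assumption, and the function names are hypothetical. The bitwise version does eight shift/XOR steps per byte; the table-driven version replaces them with one lookup, which usually matters far more for speed than the choice of loop-control instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: reflected CRC-32 polynomial (zlib/PNG variant) */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Bitwise CRC-32: eight shift/XOR steps for every input byte */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* initial value */

    while (len != 0) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit of crc is set */
            uint32_t mask = (uint32_t)0 - (crc & 1u);
            crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED & mask);
        }
        len--;
    }
    return crc ^ 0xFFFFFFFFu;                /* final XOR */
}

static uint32_t crc32_table[256];

/* Fill the 256-entry lookup table; call once before crc32_table_driven */
void crc32_build_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY_REFLECTED : crc >> 1;
        crc32_table[i] = crc;
    }
}

/* Table-driven CRC-32: one table lookup per input byte */
uint32_t crc32_table_driven(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len != 0) {
        crc = (crc >> 8) ^ crc32_table[(crc ^ *data++) & 0xFFu];
        len--;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Both functions should agree on any input; the usual check string "123456789" gives 0CBF43926h for this variant. A different polynomial or bit order needs a different constant and possibly a different shift direction.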

Also, if you are iterating the loop a lot, the top of the loop should be aligned on a 16-byte boundary; a cache-line boundary may be even better.

-Clive
It could be a random act of randomness. Those happen a lot as well.