Faster alternative to .While ... .Endw

Started by jj2007, December 27, 2009, 09:32:10 AM

Gunner

Here is mine...
Intel(R) Pentium(R) 4 CPU 2.40GHz (SSE2)
6       cycles for inline loop, add, no align
4       cycles for inline loop, sub, no align
7       cycles for inline loop, cmp two byte regs, no align
3       cycles for inline loop, cmp two immediates, no align
5       cycles for LoopJmpLingo
9       cycles for LoopJmpLingoJ

4       cycles for inline loop, add, no align
8       cycles for inline loop, sub, no align
2       cycles for inline loop, cmp two byte regs, no align
2       cycles for inline loop, cmp two immediates, no align
5       cycles for LoopJmpLingo
8       cycles for LoopJmpLingoJ

-2      cycles for inline loop, add, no align
-2      cycles for inline loop, sub, no align
6       cycles for inline loop, cmp two byte regs, no align
-4      cycles for inline loop, cmp two immediates, no align
0       cycles for LoopJmpLingo
8       cycles for LoopJmpLingoJ

-7      cycles for inline loop, add, no align
5       cycles for inline loop, sub, no align
-2      cycles for inline loop, cmp two byte regs, no align
-7      cycles for inline loop, cmp two immediates, no align
-2      cycles for LoopJmpLingo
4       cycles for LoopJmpLingoJ

Sizes:
18      inline, add
18      inline, sub
20      inline, cmp, two byte regs
12      inline, cmp, two byte immediates
34      LoopJmpLingo
22      LoopJmpLingoJ
--- ok ---
~Rob (Gunner)
- IE Zone Editor
- Gunners File Type Editor
http://www.gunnerinc.com
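
The negative counts in the results above are most likely a measurement artifact rather than real timings. Assuming the test harness subtracts a separately measured empty-loop overhead from each raw reading (an assumption; the harness source is not shown in this thread), any reading that comes in below the overhead estimate produces a negative result. A minimal Python sketch of that arithmetic, with made-up numbers and a hypothetical helper name:

```python
def net_cycles(raw_total, overhead_total, iterations):
    """Cycles per iteration after subtracting the calibration overhead.

    raw_total      -- cycle count measured around the code under test
    overhead_total -- cycle count measured around an empty reference loop
    iterations     -- number of repetitions both totals cover
    All three values are hypothetical inputs for illustration.
    """
    # Floor division: a raw reading below the overhead gives a negative result.
    return (raw_total - overhead_total) // iterations


# Made-up readings, not taken from any post in this thread.
print(net_cycles(10_600, 10_000, 100))  # raw reading above the overhead
print(net_cycles(9_300, 10_000, 100))   # raw reading below the overhead
```

If that is what happens here, results within a few cycles of zero say little more than "too short to measure on this CPU".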

BlackVortex

Quote from: WryBugz on February 23, 2010, 10:12:44 PM
The other one....

Loop
991     clock cycles
1054    clock cycles
1045    clock cycles
Dec ECX
505     clock cycles
505     clock cycles
513     clock cycles
Press any key to continue ...


From the link.
Why does it take twice as many cycles as mine? Not very efficient.

FairLight

...more data...


Intel(R) Core(TM)2 Duo CPU     E6850  @ 3.00GHz (SSE4)
7       cycles for LoopDecAl
7       cycles for LoopDecZx
9       cycles for LoopDec
9       cycles for LoopWhile
9       cycles for LoopJmpAl
9       cycles for LoopJmp

5       cycles for LoopDecAl
5       cycles for LoopDecZx
5       cycles for LoopDec
7       cycles for LoopWhile
7       cycles for LoopJmpAl
7       cycles for LoopJmp

3       cycles for LoopDecAl
3       cycles for LoopDecZx
3       cycles for LoopDec
5       cycles for LoopWhile
5       cycles for LoopJmpAl
5       cycles for LoopJmp

1       cycles for LoopDecAl
1       cycles for LoopDecZx
1       cycles for LoopDec
3       cycles for LoopWhile
3       cycles for LoopJmpAl
3       cycles for LoopJmp

Sizes:
19      LoopDecAl
26      LoopDecZx
19      LoopDec
20      LoopWhile
20      LoopJmpAl
20      LoopJmp
--- ok ---

Intel(R) Core(TM)2 Duo CPU     E6850  @ 3.00GHz (SSE4)
9       cycles for LoopJmpAlInc
9       cycles for LoopJmpAlAdd
9       cycles for LoopJmpAlSub
9       cycles for LoopJmpZxInc
9       cycles for LoopJmpZxAdd
9       cycles for LoopJmpZxSub
7       cycles for LoopJmpLingo

7       cycles for LoopJmpAlInc
7       cycles for LoopJmpAlAdd
7       cycles for LoopJmpAlSub
7       cycles for LoopJmpZxInc
7       cycles for LoopJmpZxAdd
7       cycles for LoopJmpZxSub
5       cycles for LoopJmpLingo

5       cycles for LoopJmpAlInc
5       cycles for LoopJmpAlAdd
5       cycles for LoopJmpAlSub
5       cycles for LoopJmpZxInc
5       cycles for LoopJmpZxAdd
5       cycles for LoopJmpZxSub
3       cycles for LoopJmpLingo

3       cycles for LoopJmpAlInc
3       cycles for LoopJmpAlAdd
3       cycles for LoopJmpAlSub
3       cycles for LoopJmpZxInc
3       cycles for LoopJmpZxAdd
3       cycles for LoopJmpZxSub
2       cycles for LoopJmpLingo

Sizes:
20      LoopJmpAlInc
22      LoopJmpAlAdd
22      LoopJmpAlSub
21      LoopJmpZxInc
23      LoopJmpZxAdd
23      LoopJmpZxSub
43      LoopJmpLingo
--- ok ---

Intel(R) Core(TM)2 Duo CPU     E6850  @ 3.00GHz (SSE4)
6       cycles for inline loop, add, no align
9       cycles for inline loop, sub, no align
3       cycles for inline loop, cmp two byte regs, no align
2       cycles for inline loop, cmp two immediates, no align
8       cycles for LoopJmpLingo
11      cycles for LoopJmpLingoJ

23      cycles for inline loop, add, no align
22      cycles for inline loop, sub, no align
1       cycles for inline loop, cmp two byte regs, no align
1       cycles for inline loop, cmp two immediates, no align
6       cycles for LoopJmpLingo
8       cycles for LoopJmpLingoJ

20      cycles for inline loop, add, no align
10      cycles for inline loop, sub, no align
0       cycles for inline loop, cmp two byte regs, no align
0       cycles for inline loop, cmp two immediates, no align
4       cycles for LoopJmpLingo
5       cycles for LoopJmpLingoJ

0       cycles for inline loop, add, no align
0       cycles for inline loop, sub, no align
0       cycles for inline loop, cmp two byte regs, no align
0       cycles for inline loop, cmp two immediates, no align
1       cycles for LoopJmpLingo
2       cycles for LoopJmpLingoJ

Sizes:
18      inline, add
18      inline, sub
20      inline, cmp, two byte regs
12      inline, cmp, two byte immediates
34      LoopJmpLingo
22      LoopJmpLingoJ
--- ok ---

Loop
1294    clock cycles
1294    clock cycles
1294    clock cycles
Dec ECX
279     clock cycles
279     clock cycles
279     clock cycles
Press any key to continue ...
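
As a rough comparison, the Loop versus Dec ECX figures from this run and from the WryBugz run quoted earlier can be reduced to a single ratio. The Python sketch below only does the arithmetic on the posted numbers; the function and variable names are illustrative and not part of any test program in this thread.

```python
from statistics import mean


def loop_to_dec_ratio(loop_cycles, dec_cycles):
    """Ratio of the average Loop timing to the average Dec ECX timing."""
    return mean(loop_cycles) / mean(dec_cycles)


# Numbers copied from the two outputs posted in the thread.
quoted_run = loop_to_dec_ratio([991, 1054, 1045], [505, 505, 513])
this_run = loop_to_dec_ratio([1294, 1294, 1294], [279, 279, 279])

print(f"quoted run: Loop is {quoted_run:.2f}x Dec ECX")
print(f"this run:   Loop is {this_run:.2f}x Dec ECX")
```

On these numbers the ratio is roughly 2 for the quoted run and about 4.6 for this run, so the gap between the two variants depends heavily on the CPU.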

joemc


Intel(R) Core(TM)2 CPU         T5600  @ 1.83GHz (SSE4)
8       cycles for LoopDecAl
10      cycles for LoopDec
13      cycles for LoopWhile
13      cycles for LoopJmpAl
13      cycles for LoopJmp

6       cycles for LoopDecAl
6       cycles for LoopDec
9       cycles for LoopWhile
9       cycles for LoopJmpAl
9       cycles for LoopJmp

4       cycles for LoopDecAl
4       cycles for LoopDec
7       cycles for LoopWhile
7       cycles for LoopJmpAl
7       cycles for LoopJmp

2       cycles for LoopDecAl
2       cycles for LoopDec
4       cycles for LoopWhile
4       cycles for LoopJmpAl
4       cycles for LoopJmp

Sizes:
19      LoopDecAl
19      LoopDec
20      LoopWhile
20      LoopJmpAl
20      LoopJmp

-------------

3       cycles for inline loop, add, align 16
6       cycles for inline loop, sub, align 4
3       cycles for inline loop, cmp two byte regs, align 4
3       cycles for inline loop, cmp two immediates, align 4
6       cycles for LoopJmpLingo
7       cycles for LoopJmpLingoJ

3       cycles for inline loop, add, align 16
2       cycles for inline loop, sub, align 4
1       cycles for inline loop, cmp two byte regs, align 4
1       cycles for inline loop, cmp two immediates, align 4
5       cycles for LoopJmpLingo
7       cycles for LoopJmpLingoJ

0       cycles for inline loop, add, align 16
0       cycles for inline loop, sub, align 4
0       cycles for inline loop, cmp two byte regs, align 4
0       cycles for inline loop, cmp two immediates, align 4
2       cycles for LoopJmpLingo
3       cycles for LoopJmpLingoJ

0       cycles for inline loop, add, align 16
0       cycles for inline loop, sub, align 4
0       cycles for inline loop, cmp two byte regs, align 4
0       cycles for inline loop, cmp two immediates, align 4
2       cycles for LoopJmpLingo
2       cycles for LoopJmpLingoJ

Sizes:
23      inline, add
19      inline, sub
20      inline, cmp, two byte regs
14      inline, cmp, two byte immediates
34      LoopJmpLingo
22      LoopJmpLingoJ
--- ok ---

Greenhorn__

Hi,

here are my results ...

LoopDecWhile.exe (2nd version)

AMD Phenom(tm) II X4 955 Processor (SSE3)
16 cycles for LoopDecAl
16 cycles for LoopDecZx
9 cycles for LoopDec
15 cycles for LoopWhile
14 cycles for LoopJmpAl
15 cycles for LoopJmp


8 cycles for LoopDecAl
17 cycles for LoopDecZx
8 cycles for LoopDec
26 cycles for LoopWhile
29 cycles for LoopJmpAl
26 cycles for LoopJmp


7 cycles for LoopDecAl
7 cycles for LoopDecZx
7 cycles for LoopDec
6 cycles for LoopWhile
6 cycles for LoopJmpAl
6 cycles for LoopJmp


4 cycles for LoopDecAl
4 cycles for LoopDecZx
6 cycles for LoopDec
2 cycles for LoopWhile
2 cycles for LoopJmpAl
2 cycles for LoopJmp

Sizes:
19 LoopDecAl
26 LoopDecZx
19 LoopDec
20 LoopWhile
20 LoopJmpAl
20 LoopJmp
--- ok ---


... and for IncAddSub.exe

AMD Phenom(tm) II X4 955 Processor (SSE3)
14 cycles for LoopJmpAlInc
15 cycles for LoopJmpAlAdd
15 cycles for LoopJmpAlSub
15 cycles for LoopJmpZxInc
15 cycles for LoopJmpZxAdd
15 cycles for LoopJmpZxSub


27 cycles for LoopJmpAlInc
29 cycles for LoopJmpAlAdd
27 cycles for LoopJmpAlSub
29 cycles for LoopJmpZxInc
27 cycles for LoopJmpZxAdd
27 cycles for LoopJmpZxSub


6 cycles for LoopJmpAlInc
6 cycles for LoopJmpAlAdd
6 cycles for LoopJmpAlSub
6 cycles for LoopJmpZxInc
6 cycles for LoopJmpZxAdd
6 cycles for LoopJmpZxSub


2 cycles for LoopJmpAlInc
2 cycles for LoopJmpAlAdd
2 cycles for LoopJmpAlSub
2 cycles for LoopJmpZxInc
2 cycles for LoopJmpZxAdd
2 cycles for LoopJmpZxSub

Sizes:
20 LoopJmpAlInc
22 LoopJmpAlAdd
22 LoopJmpAlSub
21 LoopJmpZxInc
23 LoopJmpZxAdd
23 LoopJmpZxSub
--- ok ---


Regards
Greenhorn
You can fool some of the people all of the time, and all of the people some of the time, but you can not fool all of the people all of the time.
(Abraham Lincoln)