Trace Stack Test

Started by dedndave, August 24, 2010, 05:46:18 PM

clive

Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)

JMP ECX

203     clock cycles
199     clock cycles
199     clock cycles
201     clock cycles
199     clock cycles

RET

172     clock cycles
156     clock cycles
155     clock cycles
152     clock cycles
152     clock cycles
It could be a random act of randomness. Those happen a lot as well.

dedndave

any comments or suggestions regarding the validity of the test method/code ?

i was trying to load the trace stack with some (10, in this case) return addresses
then, test whether "skipping" one with pop/jmp causes problems with prediction of the others
that way, the pop/jmp is executed once, while the other returns are executed 10 times
that should minimize the difference in the pop/jmp vs ret code time
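
A toy model may help picture the idea being tested. The sketch below is not the test code from this thread. It assumes the "trace stack" behaves like a plain LIFO predictor: each CALL pushes a return address, each RET pops one as its prediction, and a pop/jmp exit leaves the predictor untouched. The names `count_mispredicted_rets`, `depth` and `jmp_levels` are made up for illustration, and real CPUs may resynchronize in ways this model ignores.

```python
def count_mispredicted_rets(depth, jmp_levels):
    """Toy LIFO return-predictor model (an assumption, not real hardware).

    depth      -- number of nested CALLs (level 1 = outermost)
    jmp_levels -- set of nesting levels that exit with pop/jmp instead of RET
    Returns how many RETs get a wrong predicted return address.
    """
    predictor = []  # hypothetical hardware return-address stack
    stack = []      # return addresses on the real program stack

    # Each CALL pushes its return address onto both stacks.
    for level in range(1, depth + 1):
        predictor.append(level)
        stack.append(level)

    mispredicted = 0
    # Unwind from the innermost level outwards.
    for level in range(depth, 0, -1):
        actual = stack.pop()
        if level in jmp_levels:
            # pop/jmp consumes the real return address but, in this model,
            # leaves the predictor entry in place.
            continue
        predicted = predictor.pop() if predictor else None
        if predicted != actual:
            mispredicted += 1
    return mispredicted


if __name__ == "__main__":
    print(count_mispredicted_rets(10, set()))  # every level uses RET
    print(count_mispredicted_rets(10, {10}))   # innermost level uses pop/jmp
```

In this model the all-RET case mispredicts nothing, while a single pop/jmp at the innermost level leaves the predictor one entry out of step, so all 9 remaining RETs mispredict. Whether a given CPU actually behaves this way is what the timings are meant to show.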

clive

Quote from: dedndave on August 25, 2010, 06:53:02 PM
any comments or suggestions regarding the validity of the test method/code ?

Personally, I'd be apt to significantly reduce the loop count and do a lot more iterations within the loop itself. Another thing to look at is the number of instructions between the POP ECX and the JMP ECX; lingo does the POP early so it becomes less of a dependency.

dedndave

thanks Clive
i only used 10, because i didn't want to bump into the trace cache 16 limit

as for the dependency
i figured, seeing as the "End" routines are executed only once, a few clock cycles wouldn't hurt
we could ignore a small difference in the result
but, the time differences we are seeing are more than a few clock cycles

i tried a couple other variations, with pretty much the same result on my P4
1) where the reference End routine used push ebp, mov ebp,esp, and mov eax,[ebp+8]
   the jmp ecx routine used pop ecx, pop eax to load a parm
2) another version - same jmp ecx routine as above
   in the reference routine, i used mov eax,[esp+4]

clive

No, I understand the 10 (16 limit). I'm talking about a REPT x; XOR EAX,EAX; CALL test; ENDM construct to reduce the significance of the inner loop.

Non HTT Prescott

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

JMP ECX last (100)

17359   clock cycles
17997   clock cycles
17140   clock cycles
18362   clock cycles
18015   clock cycles

RET all (100)

14813   clock cycles
14467   clock cycles
14919   clock cycles
14521   clock cycles
14584   clock cycles

JMP ECX all (100)

24593   clock cycles
24388   clock cycles
27049   clock cycles
24570   clock cycles
24478   clock cycles

JMP ECX mostly (100)

28926   clock cycles
28382   clock cycles
30173   clock cycles
28154   clock cycles
28588   clock cycles

RET mostly (100)

17025   clock cycles
17326   clock cycles
17862   clock cycles
17137   clock cycles
19619   clock cycles


The "JMP ECX mostly (100)" is an interesting case as the RET is buried in the nesting and looks to seriously foul the timing, unless I screwed up the code.
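
To put rough numbers on that, here is a small sketch that takes the median of each group from the non-HTT Prescott run above and compares it with "RET all". The cycle counts are copied from the post. The choice of the median and of "RET all" as the baseline is an assumption made for illustration, and `summarize` is a hypothetical helper name.

```python
from statistics import median

# Cycle counts copied from the non-HTT Prescott run posted above.
prescott_runs = {
    "JMP ECX last":   [17359, 17997, 17140, 18362, 18015],
    "RET all":        [14813, 14467, 14919, 14521, 14584],
    "JMP ECX all":    [24593, 24388, 27049, 24570, 24478],
    "JMP ECX mostly": [28926, 28382, 30173, 28154, 28588],
    "RET mostly":     [17025, 17326, 17862, 17137, 19619],
}


def summarize(runs, baseline="RET all"):
    """Return {name: (median cycles, ratio to the baseline median)}."""
    base = median(runs[baseline])
    return {name: (median(vals), median(vals) / base)
            for name, vals in runs.items()}


if __name__ == "__main__":
    for name, (med, ratio) in summarize(prescott_runs).items():
        print(f"{name:15s} median {med:6d}  x{ratio:.2f} vs RET all")
```

On these figures the "JMP ECX mostly" median (28588) comes out at about 1.96 times the "RET all" median (14584). With only five samples per group this is a rough indication, not a measurement.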

jj2007

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

JMP ECX last (100)

6335    clock cycles
6343    clock cycles
6335    clock cycles
6341    clock cycles
6335    clock cycles

RET all (100)

5135    clock cycles
5184    clock cycles
5143    clock cycles
5135    clock cycles
5143    clock cycles

JMP ECX all (100)

11070   clock cycles
11078   clock cycles
11275   clock cycles
11134   clock cycles
11152   clock cycles

JMP ECX mostly (100)

11369   clock cycles
11317   clock cycles
11323   clock cycles
11359   clock cycles
11373   clock cycles

RET mostly (100)

6434    clock cycles
6491    clock cycles
6435    clock cycles
6449    clock cycles
6435    clock cycles

clive

Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)

JMP ECX last (100)

15314   clock cycles
15725   clock cycles
15802   clock cycles
15771   clock cycles
15731   clock cycles

RET all (100)

14953   clock cycles
15690   clock cycles
15674   clock cycles
15718   clock cycles
15716   clock cycles

JMP ECX all (100)

18393   clock cycles
18856   clock cycles
18878   clock cycles
18867   clock cycles
18871   clock cycles

JMP ECX mostly (100)

23632   clock cycles
24144   clock cycles
24233   clock cycles
24265   clock cycles
24293   clock cycles

RET mostly (100)

19615   clock cycles
20225   clock cycles
20139   clock cycles
20238   clock cycles
20105   clock cycles

dedndave

interesting

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

JMP ECX last (100)

16694   clock cycles
16913   clock cycles
21244   clock cycles
16927   clock cycles
16897   clock cycles

RET all (100)

14278   clock cycles
14270   clock cycles
14287   clock cycles
14288   clock cycles
14397   clock cycles

JMP ECX all (100)

24158   clock cycles
24195   clock cycles
24171   clock cycles
24158   clock cycles
24133   clock cycles

JMP ECX mostly (100)

27883   clock cycles
29341   clock cycles
27852   clock cycles
27850   clock cycles
27845   clock cycles

RET mostly (100)

17595   clock cycles
16701   clock cycles
17094   clock cycles
17199   clock cycles
16947   clock cycles

Antariy


Prescott, w/out :) HT and many other technologies :)


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

JMP ECX last (100)

17527   clock cycles
19281   clock cycles
20132   clock cycles
17529   clock cycles
17534   clock cycles

RET all (100)

17244   clock cycles
16141   clock cycles
16479   clock cycles
14937   clock cycles
15011   clock cycles

JMP ECX all (100)

28735   clock cycles
25446   clock cycles
26784   clock cycles
26794   clock cycles
25414   clock cycles

JMP ECX mostly (100)

31973   clock cycles
31013   clock cycles
29242   clock cycles
30805   clock cycles
29498   clock cycles

RET mostly (100)

17452   clock cycles
17841   clock cycles
20516   clock cycles
18214   clock cycles
17607   clock cycles

Press any key to continue ...




Alex

FORTRANS

Hi,

   As before.  Odd?

Regards,

Steve N.


pre-P4 (SSE1)

JMP ECX last (100)

9211 clock cycles
9197 clock cycles
9194 clock cycles
9208 clock cycles
9197 clock cycles

RET all (100)

8061 clock cycles
8070 clock cycles
8122 clock cycles
8058 clock cycles
8072 clock cycles

JMP ECX all (100)

11313 clock cycles
11314 clock cycles
11303 clock cycles
11306 clock cycles
11297 clock cycles

JMP ECX mostly (100)

14741 clock cycles
14734 clock cycles
14745 clock cycles
14727 clock cycles
14725 clock cycles

RET mostly (100)

9203 clock cycles
9190 clock cycles
9188 clock cycles
9196 clock cycles
9185 clock cycles

Press any key to continue ...

pre-P4

JMP ECX last (100)

9461 clock cycles
9505 clock cycles
9696 clock cycles
9490 clock cycles
9452 clock cycles

RET all (100)

8986 clock cycles
8987 clock cycles
9009 clock cycles
9129 clock cycles
8955 clock cycles

JMP ECX all (100)

9898 clock cycles
9935 clock cycles
9898 clock cycles
9914 clock cycles
9907 clock cycles

JMP ECX mostly (100)

11099 clock cycles
11095 clock cycles
11177 clock cycles
11092 clock cycles
11088 clock cycles

RET mostly (100)

10559 clock cycles
9892 clock cycles
9945 clock cycles
9933 clock cycles
9914 clock cycles

Press any key to continue ...