The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: dedndave on August 24, 2010, 05:46:18 PM

Title: Trace Stack Test
Post by: dedndave on August 24, 2010, 05:46:18 PM
here is a simple test to compare POP ECX|JMP ECX with RET
i use a recursive routine to load the trace stack, then execute the selected test routine

result on a prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

JMP ECX

175     clock cycles
181     clock cycles
177     clock cycles
175     clock cycles
180     clock cycles

RET

113     clock cycles
114     clock cycles
114     clock cycles
113     clock cycles
116     clock cycles
Title: Re: Trace Stack Test
Post by: Neil on August 24, 2010, 05:55:15 PM
Intel Quad CPU Q9550 @2.83 GHz (SSE4)

JMP ECX

85          clock cycles
86          clock cycles
86          clock cycles
86          clock cycles
86          clock cycles

RET

41          clock cycles
43          clock cycles
43          clock cycles
43          clock cycles
43          clock cycles
Title: Re: Trace Stack Test
Post by: dioxin on August 24, 2010, 06:02:26 PM
AMD Phenom(tm) II X4 945 Processor (SSE3)

JMP ECX

90      clock cycles
90      clock cycles
90      clock cycles
90      clock cycles
90      clock cycles

RET

60      clock cycles
60      clock cycles
60      clock cycles
60      clock cycles
60      clock cycles
Title: Re: Trace Stack Test
Post by: dedndave on August 24, 2010, 06:24:43 PM
thanks guys
it looks like that works as documented
messing up the trace stack has a penalty
i wonder how it will look on a PIII
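
To put a rough number on that penalty, the three results posted so far can be averaged. This is only a sketch: the cycle counts are copied from the posts above, the dictionary keys are shortened labels, and `mean_penalty` is a name made up for this example. Each figure covers one whole test pass (all the nested returns plus overhead), not a single return.

```python
# Cycle counts copied from the first three result posts in this thread.
# Each entry is (JMP ECX runs, RET runs).
results = {
    "Pentium 4 3.00GHz (Prescott)": ([175, 181, 177, 175, 180],
                                     [113, 114, 114, 113, 116]),
    "Q9550 2.83GHz": ([85, 86, 86, 86, 86], [41, 43, 43, 43, 43]),
    "Phenom II X4 945": ([90, 90, 90, 90, 90], [60, 60, 60, 60, 60]),
}


def mean_penalty(jmp_cycles, ret_cycles):
    """Mean extra cycles of the JMP ECX runs over the RET runs.

    Assumes both lists hold the same number of runs.
    """
    return (sum(jmp_cycles) - sum(ret_cycles)) / len(jmp_cycles)


penalties = {cpu: mean_penalty(j, r) for cpu, (j, r) in results.items()}
for cpu, extra in penalties.items():
    print(f"{cpu}: about {extra:.1f} extra cycles per test pass")
```

On these numbers the pop/jmp version costs roughly 30 to 64 cycles more per pass, depending on the CPU.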
Title: Re: Trace Stack Test
Post by: jj2007 on August 24, 2010, 07:01:34 PM
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
JMP ECX  76 clock cycles
RET      45 clock cycles
Title: Re: Trace Stack Test
Post by: FORTRANS on August 24, 2010, 08:28:37 PM
Hi,

   PIII as requested.

Regards,

Steve N.


G:\WORK>tstest
pre-P4 (SSE1)

JMP ECX

86      clock cycles
86      clock cycles
86      clock cycles
86      clock cycles
86      clock cycles

RET

62      clock cycles
62      clock cycles
62      clock cycles
62      clock cycles
62      clock cycles

Press any key to continue ...


Pentium MMX, since it ran.


pre-P4
JMP ECX

95 clock cycles
95 clock cycles
95 clock cycles
95 clock cycles
95 clock cycles

RET

90 clock cycles
90 clock cycles
90 clock cycles
90 clock cycles
90 clock cycles

Press any key to continue ...


Title: Re: Trace Stack Test
Post by: Antariy on August 24, 2010, 08:49:43 PM
Hi, Dave!

These are my timings:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

JMP ECX

176     clock cycles
176     clock cycles
176     clock cycles
175     clock cycles
174     clock cycles

RET

118     clock cycles
118     clock cycles
117     clock cycles
118     clock cycles
119     clock cycles

Press any key to continue ...


I am confident that even on an 8086, "pop ecx; jmp ecx" CANNOT be faster, because work done in software CANNOT beat work done in hardware (the CPU's own implementation of restoring the stack and changing (E)IP). On newer CPUs, with their prediction mechanisms, it cannot be faster in any way in the real world (in a "clock test" the difference may not be very big, because the timing macro uses cpuid).



Alex
Title: Re: Trace Stack Test
Post by: dedndave on August 24, 2010, 08:53:09 PM
thanks guys
i didn't know you had a pentium mmx, Steve - lol
i have one here, too, although it isn't connected to the internet
i use it for troubleshooting, mostly
it's a 200 MHz, but it runs win98 ok at 225   :P
Title: Re: Trace Stack Test
Post by: Antariy on August 24, 2010, 08:57:50 PM
Quote from: dedndave on August 24, 2010, 08:53:09 PM
thanks guys
i didn't know you had a pentium mmx, Steve - lol
i have one here, too, although it isn't connected to the internet

Dave, nowadays even hand calculators have a connection to the internet  :P
You need to modernize your PMMX  :P



Alex
Title: Re: Trace Stack Test
Post by: dedndave on August 24, 2010, 09:00:01 PM
well - it's just a tool, really
i can plug any old IDE drive in it and it will recognize it
same for older video and sound cards, as well as floppy drives
i am used to XP, now - i don't want to work on win98, any more - lol
i have a NIC for it - but no wireless
i am too lazy to run the cable - lol
Title: Re: Trace Stack Test
Post by: Antariy on August 24, 2010, 09:04:39 PM
Quote from: dedndave on August 24, 2010, 09:00:01 PM
well - it's just a tool, really
i can plug any old IDE drive in it and it will recognize it
same for older video and sound cards, as well as floppy drives
i am used to XP, now - i don't want to work on win98, any more - lol
i have a NIC for it - but no wireless
i am too lazy to run the cable - lol

Sorry, Dave, that was only a *joke*.
I would also like to have old hardware for testing, but I cannot find any.



Alex
Title: Re: Trace Stack Test
Post by: FORTRANS on August 24, 2010, 09:09:09 PM
Hi Dave,

Quote from: dedndave on August 24, 2010, 08:53:09 PM
thanks guys
i didn't know you had a pentium mmx, Steve - lol

   I believe I posted some results before.  I think it was
for a CPU ID program?  Anyway...

Quote
i have one here, too, although it isn't connected to the internet
i use it for troubleshooting, mostly
it's a 200 MHz, but it runs win98 ok at 225   :P

   Mine can hook to the internet with a PCMCIA NIC, but it is
floppy SneakerNet for now.  166 MHz laptop with Win 98.
LOL indeed.

Cheers,

Steve
Title: Re: Trace Stack Test
Post by: Antariy on August 24, 2010, 09:36:45 PM
Quote from: FORTRANS on August 24, 2010, 09:09:09 PM
   Mine can hook to the internet with a PCMCIA NIC, but it is
floppy SneakerNet for now.  166 MHz laptop with Win 98.
LOL indeed.

Is this not HP's white notebook with the green track-point? I saw one once.



Alex
Title: Re: Trace Stack Test
Post by: hutch-- on August 25, 2010, 08:54:00 AM
Dave,


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)

JMP ECX

86      clock cycles
86      clock cycles
86      clock cycles
86      clock cycles
86      clock cycles

RET

43      clock cycles
43      clock cycles
43      clock cycles
43      clock cycles
43      clock cycles

Press any key to continue ...
Title: Re: Trace Stack Test
Post by: FORTRANS on August 25, 2010, 12:25:23 PM
Quote from: Antariy on August 24, 2010, 09:36:45 PM
Is this not HP's white notebook with the green track-point? I saw one once.

Hi Alex,

   No.  It is an HP OmniBook 800 with the "paw" style mouse.
The mouse pops out of the side on a strip of plastic.  Dark
gray case.

Regards,

Steve
Title: Re: Trace Stack Test
Post by: clive on August 25, 2010, 04:08:30 PM
Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)

JMP ECX

203     clock cycles
199     clock cycles
199     clock cycles
201     clock cycles
199     clock cycles

RET

172     clock cycles
156     clock cycles
155     clock cycles
152     clock cycles
152     clock cycles
Title: Re: Trace Stack Test
Post by: dedndave on August 25, 2010, 06:53:02 PM
any comments or suggestions regarding the validity of the test method/code ?

i was trying to load the trace stack with some (10, in this case) return addresses
then, test whether "skipping" one with pop/jmp causes problems with prediction of the others
that way, the pop/jmp is executed once, while the other returns are executed 10 times
that should minimize the difference in the pop/jmp vs ret code time
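
The idea behind the test can be sketched with a toy model of a return-address predictor. This is not the test program from this thread and not a description of any particular CPU. It assumes only that CALL pushes a predicted return address, RET pops one and compares it with the real address, and POP ECX / JMP ECX pops nothing. The function name and the addresses are hypothetical.

```python
def count_ret_mispredictions(return_addrs, innermost_uses_jmp):
    """Count RETs whose predicted target differs from the real one.

    return_addrs: real return addresses, outermost call first.
    Model assumption: CALL pushes onto a predictor stack, RET pops it,
    and POP ECX / JMP ECX leaves the predictor stack untouched.
    """
    predictor = list(return_addrs)  # what the hardware expects
    stack = list(return_addrs)      # what is really on the program stack
    mispredictions = 0
    first_exit = True
    while stack:
        actual = stack.pop()
        if first_exit and innermost_uses_jmp:
            first_exit = False
            continue                # pop ecx / jmp ecx: predictor not popped
        first_exit = False
        predicted = predictor.pop()
        if predicted != actual:
            mispredictions += 1
    return mispredictions


# Hypothetical addresses: ten nested calls, each from a different call site.
distinct = [0x1000 + 16 * level for level in range(10)]
print(count_ret_mispredictions(distinct, innermost_uses_jmp=False))  # 0
print(count_ret_mispredictions(distinct, innermost_uses_jmp=True))   # 9
```

In this model one skipped RET leaves every later prediction off by one, so all nine remaining returns miss. If the nine inner calls all come from the same call site, as they could in a simple self-recursive routine, the stale entry matches most of the real addresses and the same model reports a single miss (try `[0x2000] + [0x1000] * 9`). How the call sites are laid out in the real test code may therefore matter when reading the timings.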
Title: Re: Trace Stack Test
Post by: clive on August 25, 2010, 07:29:39 PM
Quote from: dedndave on August 25, 2010, 06:53:02 PM
any comments or suggestions regarding the validity of the test method/code ?

Personally, I'd be apt to significantly reduce the loop count, and do a lot more iterations within the loop itself. Another thing to look at is the number of instructions between the POP ECX and the JMP ECX; lingo does the POP early so it becomes less of a dependency.
Title: Re: Trace Stack Test
Post by: dedndave on August 25, 2010, 07:36:45 PM
thanks Clive
i only used 10, because i didn't want to bump into the trace stack's 16-entry limit

as for the dependency
i figured, seeing as the "End" routines are executed only once, a few clock cycles wouldn't hurt
we could ignore a small difference in the result
but, the time differences we are seeing are more than a few clock cycles

i tried a couple other variations, with pretty much the same result on my P4
1) where the reference End routine used push ebp, mov ebp,esp, and mov eax,[ebp+8]
   the jmp ecx routine used pop ecx, pop eax to load a parm
2) another version - same jmp ecx routine as above
   in the reference routine, i used mov eax,[esp+4]
Title: Re: Trace Stack Test
Post by: clive on August 25, 2010, 08:21:44 PM
No I understand the 10 (16 limit), I'm talking about a REPT x; XOR EAX,EAX; CALL test; ENDM construct to reduce the significance of the inner loop.

Non HTT Prescott

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

JMP ECX last (100)

17359   clock cycles
17997   clock cycles
17140   clock cycles
18362   clock cycles
18015   clock cycles

RET all (100)

14813   clock cycles
14467   clock cycles
14919   clock cycles
14521   clock cycles
14584   clock cycles

JMP ECX all (100)

24593   clock cycles
24388   clock cycles
27049   clock cycles
24570   clock cycles
24478   clock cycles

JMP ECX mostly (100)

28926   clock cycles
28382   clock cycles
30173   clock cycles
28154   clock cycles
28588   clock cycles

RET mostly (100)

17025   clock cycles
17326   clock cycles
17862   clock cycles
17137   clock cycles
19619   clock cycles


The "JMP ECX mostly (100)" is an interesting case as the RET is buried in the nesting and looks to seriously foul the timing, unless I screwed up the code.
Title: Re: Trace Stack Test
Post by: jj2007 on August 25, 2010, 08:24:37 PM
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

JMP ECX last (100)

6335    clock cycles
6343    clock cycles
6335    clock cycles
6341    clock cycles
6335    clock cycles

RET all (100)

5135    clock cycles
5184    clock cycles
5143    clock cycles
5135    clock cycles
5143    clock cycles

JMP ECX all (100)

11070   clock cycles
11078   clock cycles
11275   clock cycles
11134   clock cycles
11152   clock cycles

JMP ECX mostly (100)

11369   clock cycles
11317   clock cycles
11323   clock cycles
11359   clock cycles
11373   clock cycles

RET mostly (100)

6434    clock cycles
6491    clock cycles
6435    clock cycles
6449    clock cycles
6435    clock cycles
Title: Re: Trace Stack Test
Post by: clive on August 25, 2010, 08:31:34 PM
Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)

JMP ECX last (100)

15314   clock cycles
15725   clock cycles
15802   clock cycles
15771   clock cycles
15731   clock cycles

RET all (100)

14953   clock cycles
15690   clock cycles
15674   clock cycles
15718   clock cycles
15716   clock cycles

JMP ECX all (100)

18393   clock cycles
18856   clock cycles
18878   clock cycles
18867   clock cycles
18871   clock cycles

JMP ECX mostly (100)

23632   clock cycles
24144   clock cycles
24233   clock cycles
24265   clock cycles
24293   clock cycles

RET mostly (100)

19615   clock cycles
20225   clock cycles
20139   clock cycles
20238   clock cycles
20105   clock cycles
Title: Re: Trace Stack Test
Post by: dedndave on August 25, 2010, 08:38:20 PM
interesting

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

JMP ECX last (100)

16694   clock cycles
16913   clock cycles
21244   clock cycles
16927   clock cycles
16897   clock cycles

RET all (100)

14278   clock cycles
14270   clock cycles
14287   clock cycles
14288   clock cycles
14397   clock cycles

JMP ECX all (100)

24158   clock cycles
24195   clock cycles
24171   clock cycles
24158   clock cycles
24133   clock cycles

JMP ECX mostly (100)

27883   clock cycles
29341   clock cycles
27852   clock cycles
27850   clock cycles
27845   clock cycles

RET mostly (100)

17595   clock cycles
16701   clock cycles
17094   clock cycles
17199   clock cycles
16947   clock cycles
Title: Re: Trace Stack Test
Post by: Antariy on August 28, 2010, 11:34:30 PM

Prescott, w/out :) HT and many other technologies :)


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

JMP ECX last (100)

17527   clock cycles
19281   clock cycles
20132   clock cycles
17529   clock cycles
17534   clock cycles

RET all (100)

17244   clock cycles
16141   clock cycles
16479   clock cycles
14937   clock cycles
15011   clock cycles

JMP ECX all (100)

28735   clock cycles
25446   clock cycles
26784   clock cycles
26794   clock cycles
25414   clock cycles

JMP ECX mostly (100)

31973   clock cycles
31013   clock cycles
29242   clock cycles
30805   clock cycles
29498   clock cycles

RET mostly (100)

17452   clock cycles
17841   clock cycles
20516   clock cycles
18214   clock cycles
17607   clock cycles

Press any key to continue ...




Alex
Title: Re: Trace Stack Test
Post by: FORTRANS on August 29, 2010, 02:27:12 PM
Hi,

   As before.  Odd?

Regards,

Steve N.


pre-P4 (SSE1)

JMP ECX last (100)

9211 clock cycles
9197 clock cycles
9194 clock cycles
9208 clock cycles
9197 clock cycles

RET all (100)

8061 clock cycles
8070 clock cycles
8122 clock cycles
8058 clock cycles
8072 clock cycles

JMP ECX all (100)

11313 clock cycles
11314 clock cycles
11303 clock cycles
11306 clock cycles
11297 clock cycles

JMP ECX mostly (100)

14741 clock cycles
14734 clock cycles
14745 clock cycles
14727 clock cycles
14725 clock cycles

RET mostly (100)

9203 clock cycles
9190 clock cycles
9188 clock cycles
9196 clock cycles
9185 clock cycles

Press any key to continue ...

pre-P4
JMP ECX last (100)

9461 clock cycles
9505 clock cycles
9696 clock cycles
9490 clock cycles
9452 clock cycles

RET all (100)

8986 clock cycles
8987 clock cycles
9009 clock cycles
9129 clock cycles
8955 clock cycles

JMP ECX all (100)

9898 clock cycles
9935 clock cycles
9898 clock cycles
9914 clock cycles
9907 clock cycles

JMP ECX mostly (100)

11099 clock cycles
11095 clock cycles
11177 clock cycles
11092 clock cycles
11088 clock cycles

RET mostly (100)

10559 clock cycles
9892 clock cycles
9945 clock cycles
9933 clock cycles
9914 clock cycles

Press any key to continue ...