here is a simple test to compare POP ECX|JMP ECX with RET
i use a recursive routine to load the trace stack, then execute the selected test routine
results on a Prescott w/ HTT
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
JMP ECX
175 clock cycles
181 clock cycles
177 clock cycles
175 clock cycles
180 clock cycles
RET
113 clock cycles
114 clock cycles
114 clock cycles
113 clock cycles
116 clock cycles
Intel Quad CPU Q9550 @2.83 GHz (SSE4)
JMP ECX
85 clock cycles
86 clock cycles
86 clock cycles
86 clock cycles
86 clock cycles
RET
41 clock cycles
43 clock cycles
43 clock cycles
43 clock cycles
43 clock cycles
AMD Phenom(tm) II X4 945 Processor (SSE3)
JMP ECX
90 clock cycles
90 clock cycles
90 clock cycles
90 clock cycles
90 clock cycles
RET
60 clock cycles
60 clock cycles
60 clock cycles
60 clock cycles
60 clock cycles
thanks guys
it looks like that works as documented
messing up the trace stack has a penalty
i wonder how it will look on a PIII
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
JMP ECX 76 clock cycles
RET 45 clock cycles
Hi,
PIII as requested.
Regards,
Steve N.
G:\WORK>tstest
pre-P4 (SSE1)
JMP ECX
86 clock cycles
86 clock cycles
86 clock cycles
86 clock cycles
86 clock cycles
RET
62 clock cycles
62 clock cycles
62 clock cycles
62 clock cycles
62 clock cycles
Press any key to continue ...
Pentium MMX, since it ran.
pre-P4
JMP ECX
95 clock cycles
95 clock cycles
95 clock cycles
95 clock cycles
95 clock cycles
RET
90 clock cycles
90 clock cycles
90 clock cycles
90 clock cycles
90 clock cycles
Press any key to continue ...
Hi, Dave!
These are my timings:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
JMP ECX
176 clock cycles
176 clock cycles
176 clock cycles
175 clock cycles
174 clock cycles
RET
118 clock cycles
118 clock cycles
117 clock cycles
118 clock cycles
119 clock cycles
Press any key to continue ...
I am confident that even on an 8086 "pop ecx; jmp ecx" CANNOT be faster, because software work CANNOT beat hardware work (the CPU's own implementation of restoring the stack and changing (E)IP). On new CPUs, with their prediction mechanisms, it cannot be faster in any way in the real world (in a "clock test" the difference might not be very big, because the macro uses CPUID).
Alex
thanks guys
i didn't know you had a pentium mmx, Steve - lol
i have one here, too, although it isn't connected to the internet
i use it for troubleshooting, mostly
it's a 200 MHz, but it runs win98 ok at 225 :P
Quote from: dedndave on August 24, 2010, 08:53:09 PM
thanks guys
i didn't know you had a pentium mmx, Steve - lol
i have one here, too, although it isn't connected to the internet
Dave, nowadays even pocket calculators have an internet connection :P
You need to modernize your PMMX :P
Alex
well - it's just a tool, really
i can plug any old IDE drive in it and it will recognize it
same for older video and sound cards, as well as floppy drives
i am used to XP, now - i don't want to work on win98, any more - lol
i have a NIC for it - but no wireless
i am too lazy to run the cable - lol
Quote from: dedndave on August 24, 2010, 09:00:01 PM
well - it's just a tool, really
i can plug any old IDE drive in it and it will recognize it
same for older video and sound cards, as well as floppy drives
i am used to XP, now - i don't want to work on win98, any more - lol
i have a NIC for it - but no wireless
i am too lazy to run the cable - lol
Sorry, Dave, this is just a *joke*.
I would also like to have old hardware for testing, but I cannot find any.
Alex
Hi Dave,
Quote from: dedndave on August 24, 2010, 08:53:09 PM
thanks guys
i didn't know you had a pentium mmx, Steve - lol
I believe I posted some results before. I think it was
for a CPU ID program? Anyway...
Quote
i have one here, too, although it isn't connected to the internet
i use it for troubleshooting, mostly
it's a 200 MHz, but it runs win98 ok at 225 :P
Mine can hook to the internet with a PCMCIA NIC, but it is
floppy SneakerNet for now. 166 MHz laptop with Win 98.
LOL indeed.
Cheers,
Steve
Quote from: FORTRANS on August 24, 2010, 09:09:09 PM
Mine can hook to the internet with a PCMCIA NIC, but it is
floppy SneakerNet for now. 166 MHz laptop with Win 98.
LOL indeed.
Isn't it HP's white notebook with the green trackpoint? I saw one once.
Alex
Dave,
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
JMP ECX
86 clock cycles
86 clock cycles
86 clock cycles
86 clock cycles
86 clock cycles
RET
43 clock cycles
43 clock cycles
43 clock cycles
43 clock cycles
43 clock cycles
Press any key to continue ...
Quote from: Antariy on August 24, 2010, 09:36:45 PM
Isn't it HP's white notebook with the green trackpoint? I saw one once.
Hi Alex,
No. It is an HP OmniBook 800 with the "paw" style mouse.
The mouse pops out of the side on a strip of plastic. Dark
gray case.
Regards,
Steve
Intel(R) Atom(TM) CPU N450 @ 1.66GHz (SSE4)
JMP ECX
203 clock cycles
199 clock cycles
199 clock cycles
201 clock cycles
199 clock cycles
RET
172 clock cycles
156 clock cycles
155 clock cycles
152 clock cycles
152 clock cycles
any comments or suggestions regarding the validity of the test method/code?
i was trying to load the trace stack with some (10, in this case) return addresses
then, test whether "skipping" one with pop/jmp causes problems with prediction of the others
that way, the pop/jmp is executed once, while the other returns are executed 10 times
that should minimize the difference in the pop/jmp vs ret code time
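The effect this test is probing can be illustrated with a toy model of the return-prediction stack (what this thread calls the trace stack). This is a minimal sketch under stated assumptions, not a model of any particular CPU: CALL pushes its return address, RET pops the top entry as its prediction, an indirect JMP neither uses nor pops that stack, and all return addresses are hypothetical values. It shows why skipping one RET with POP/JMP leaves the stack off by one for the returns that follow.

```python
from collections import deque


class ReturnStackBuffer:
    """Toy return stack buffer (RSB); a simplified assumption, not a real CPU.

    CALL pushes its return address, RET pops the top entry and uses it as
    the predicted target, and an indirect JMP (POP ECX / JMP ECX) leaves the
    buffer untouched. When full, the oldest entry is dropped.
    """

    def __init__(self, entries=16):
        self.stack = deque(maxlen=entries)
        self.mispredicts = 0

    def call(self, return_addr):
        self.stack.append(return_addr)

    def ret(self, actual_target):
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1

    def jmp_indirect(self, actual_target):
        # Not predicted from the RSB and does not pop it, so the entry
        # pushed by the matching CALL is left behind as a stale top.
        pass


def unwind(return_addrs, pop_jmp_innermost):
    """Nest one CALL per address, then unwind; return the RET mispredict count.

    return_addrs[0] is the outermost call. If pop_jmp_innermost is True,
    the innermost level exits with POP ECX / JMP ECX instead of RET.
    """
    rsb = ReturnStackBuffer()
    for addr in return_addrs:
        rsb.call(addr)
    for level, addr in enumerate(reversed(return_addrs)):
        if pop_jmp_innermost and level == 0:
            rsb.jmp_indirect(addr)
        else:
            rsb.ret(addr)
    return rsb.mispredicts


# Ten distinct (hypothetical) call sites.
distinct = [0x1000 + 0x10 * i for i in range(10)]
print(unwind(distinct, pop_jmp_innermost=False))  # 0
print(unwind(distinct, pop_jmp_innermost=True))   # 9

# A recursive routine: one outer call site, then nine calls from the same site.
recursive = [0x2000] + [0x3000] * 9
print(unwind(recursive, pop_jmp_innermost=True))  # 1
```

In this model, whether the off-by-one costs one mispredicted return or all of them depends on whether the return addresses differ, and the entry left over at the end would also mislead the caller's next RET. Real predictors differ in detail, so treat these counts as illustration only.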
Quote from: dedndave on August 25, 2010, 06:53:02 PM
any comments or suggestions regarding the validity of the test method/code?
Personally, I'd be apt to significantly reduce the loop count and do a lot more iterations within the loop itself. Another thing to look at is the number of instructions between the POP ECX and the JMP ECX; lingo does the POP early so it becomes less of a dependency.
thanks Clive
i only used 10, because i didn't want to bump into the trace cache 16 limit
as for the dependency
i figured, seeing as the "End" routines are executed only once, a few clock cycles wouldn't hurt
we could ignore a small difference in the result
but, the time differences we are seeing are more than a few clock cycles
i tried a couple other variations, with pretty much the same result on my P4
1) the reference End routine used push ebp / mov ebp,esp / mov eax,[ebp+8]
   and the jmp ecx routine used pop ecx / pop eax to load a parameter
2) another version, with the same jmp ecx routine as above
   and mov eax,[esp+4] in the reference routine
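Why staying under 16 nested calls matters can be shown with a similar toy model. This is a minimal sketch, assuming a 16-entry circular return stack that drops its oldest entry on overflow, distinct hypothetical return addresses, and a miss whenever a RET finds the stack empty; real CPUs handle overflow and underflow in their own ways.

```python
from collections import deque


def ret_mispredicts(depth, rsb_entries=16):
    """Count mispredicted RETs for `depth` nested CALLs unwound with plain RET.

    Toy model (an assumption, not a specific CPU): the return stack keeps
    only the newest `rsb_entries` addresses, and a RET that finds it empty
    counts as a miss.
    """
    rsb = deque(maxlen=rsb_entries)
    return_addrs = [0x1000 + 0x10 * i for i in range(depth)]  # hypothetical, all distinct
    for addr in return_addrs:
        rsb.append(addr)  # CALL: oldest entry falls off when the stack is full
    mispredicts = 0
    for addr in reversed(return_addrs):
        predicted = rsb.pop() if rsb else None  # RET: predict from the top entry
        if predicted != addr:
            mispredicts += 1
    return mispredicts


for depth in (10, 16, 20):
    print(depth, ret_mispredicts(depth))
# 10 0
# 16 0
# 20 4
```

With 10 levels every return stays predictable in this model, which is the margin the test relies on; past 16, the outermost returns lose their entries even when every routine uses RET.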
No, I understand the 10 (16 limit). I'm talking about a REPT x / XOR EAX,EAX / CALL test / ENDM construct to reduce the significance of the inner loop.
Non-HTT Prescott
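The REPT suggestion comes down to simple arithmetic: the timing loop's fixed per-iteration cost gets spread over x calls instead of one. A minimal sketch; the 10-cycle loop overhead and 100-cycle call cost are made-up figures chosen only to show the trend, not measurements from this thread.

```python
def overhead_share(loop_overhead, call_cost, calls_per_iteration):
    """Fraction of each timed iteration spent on loop overhead.

    loop_overhead: cycles for the loop's own bookkeeping (per iteration)
    call_cost: cycles for one XOR EAX,EAX / CALL test sequence
    calls_per_iteration: the REPT count x
    """
    total = loop_overhead + calls_per_iteration * call_cost
    return loop_overhead / total


# Hypothetical figures, chosen only to show the trend.
for x in (1, 10, 100):
    print(x, f"{overhead_share(10, 100, x):.2%}")
# 1 9.09%
# 10 0.99%
# 100 0.10%
```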
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
JMP ECX last (100)
17359 clock cycles
17997 clock cycles
17140 clock cycles
18362 clock cycles
18015 clock cycles
RET all (100)
14813 clock cycles
14467 clock cycles
14919 clock cycles
14521 clock cycles
14584 clock cycles
JMP ECX all (100)
24593 clock cycles
24388 clock cycles
27049 clock cycles
24570 clock cycles
24478 clock cycles
JMP ECX mostly (100)
28926 clock cycles
28382 clock cycles
30173 clock cycles
28154 clock cycles
28588 clock cycles
RET mostly (100)
17025 clock cycles
17326 clock cycles
17862 clock cycles
17137 clock cycles
19619 clock cycles
The "JMP ECX mostly (100)" is an interesting case as the RET is buried in the nesting and looks to seriously foul the timing, unless I screwed up the code.
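With five runs per variant and the occasional outlier (27049 and 19619 above), the median is a steadier summary than any single run. A small sketch using the non-HTT Prescott numbers posted above; dividing by 100 assumes the "(100)" in each label is the number of calls per timed iteration, which the post does not state explicitly.

```python
from statistics import median

# Non-HTT Prescott results copied from the post above (cycles per timed iteration).
runs = {
    "JMP ECX last": [17359, 17997, 17140, 18362, 18015],
    "RET all": [14813, 14467, 14919, 14521, 14584],
    "JMP ECX all": [24593, 24388, 27049, 24570, 24478],
    "JMP ECX mostly": [28926, 28382, 30173, 28154, 28588],
    "RET mostly": [17025, 17326, 17862, 17137, 19619],
}

CALLS_PER_ITERATION = 100  # assumption: read from the "(100)" in each label

summary = {name: median(cycles) for name, cycles in runs.items()}
for name, med in summary.items():
    print(f"{name:15} median {med:6} ~{med / CALLS_PER_ITERATION:.1f} per call")
```

The same dictionary can be filled in with any of the other CPUs' results for a side-by-side comparison.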
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
JMP ECX last (100)
6335 clock cycles
6343 clock cycles
6335 clock cycles
6341 clock cycles
6335 clock cycles
RET all (100)
5135 clock cycles
5184 clock cycles
5143 clock cycles
5135 clock cycles
5143 clock cycles
JMP ECX all (100)
11070 clock cycles
11078 clock cycles
11275 clock cycles
11134 clock cycles
11152 clock cycles
JMP ECX mostly (100)
11369 clock cycles
11317 clock cycles
11323 clock cycles
11359 clock cycles
11373 clock cycles
RET mostly (100)
6434 clock cycles
6491 clock cycles
6435 clock cycles
6449 clock cycles
6435 clock cycles
Intel(R) Atom(TM) CPU N450 @ 1.66GHz (SSE4)
JMP ECX last (100)
15314 clock cycles
15725 clock cycles
15802 clock cycles
15771 clock cycles
15731 clock cycles
RET all (100)
14953 clock cycles
15690 clock cycles
15674 clock cycles
15718 clock cycles
15716 clock cycles
JMP ECX all (100)
18393 clock cycles
18856 clock cycles
18878 clock cycles
18867 clock cycles
18871 clock cycles
JMP ECX mostly (100)
23632 clock cycles
24144 clock cycles
24233 clock cycles
24265 clock cycles
24293 clock cycles
RET mostly (100)
19615 clock cycles
20225 clock cycles
20139 clock cycles
20238 clock cycles
20105 clock cycles
interesting
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
JMP ECX last (100)
16694 clock cycles
16913 clock cycles
21244 clock cycles
16927 clock cycles
16897 clock cycles
RET all (100)
14278 clock cycles
14270 clock cycles
14287 clock cycles
14288 clock cycles
14397 clock cycles
JMP ECX all (100)
24158 clock cycles
24195 clock cycles
24171 clock cycles
24158 clock cycles
24133 clock cycles
JMP ECX mostly (100)
27883 clock cycles
29341 clock cycles
27852 clock cycles
27850 clock cycles
27845 clock cycles
RET mostly (100)
17595 clock cycles
16701 clock cycles
17094 clock cycles
17199 clock cycles
16947 clock cycles
Prescott, w/out :) HT and many other technologies :)
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
JMP ECX last (100)
17527 clock cycles
19281 clock cycles
20132 clock cycles
17529 clock cycles
17534 clock cycles
RET all (100)
17244 clock cycles
16141 clock cycles
16479 clock cycles
14937 clock cycles
15011 clock cycles
JMP ECX all (100)
28735 clock cycles
25446 clock cycles
26784 clock cycles
26794 clock cycles
25414 clock cycles
JMP ECX mostly (100)
31973 clock cycles
31013 clock cycles
29242 clock cycles
30805 clock cycles
29498 clock cycles
RET mostly (100)
17452 clock cycles
17841 clock cycles
20516 clock cycles
18214 clock cycles
17607 clock cycles
Press any key to continue ...
Alex
Hi,
As before. Odd?
Regards,
Steve N.
pre-P4 (SSE1)
JMP ECX last (100)
9211 clock cycles
9197 clock cycles
9194 clock cycles
9208 clock cycles
9197 clock cycles
RET all (100)
8061 clock cycles
8070 clock cycles
8122 clock cycles
8058 clock cycles
8072 clock cycles
JMP ECX all (100)
11313 clock cycles
11314 clock cycles
11303 clock cycles
11306 clock cycles
11297 clock cycles
JMP ECX mostly (100)
14741 clock cycles
14734 clock cycles
14745 clock cycles
14727 clock cycles
14725 clock cycles
RET mostly (100)
9203 clock cycles
9190 clock cycles
9188 clock cycles
9196 clock cycles
9185 clock cycles
Press any key to continue ...
pre-P4
JMP ECX last (100)
9461 clock cycles
9505 clock cycles
9696 clock cycles
9490 clock cycles
9452 clock cycles
RET all (100)
8986 clock cycles
8987 clock cycles
9009 clock cycles
9129 clock cycles
8955 clock cycles
JMP ECX all (100)
9898 clock cycles
9935 clock cycles
9898 clock cycles
9914 clock cycles
9907 clock cycles
JMP ECX mostly (100)
11099 clock cycles
11095 clock cycles
11177 clock cycles
11092 clock cycles
11088 clock cycles
RET mostly (100)
10559 clock cycles
9892 clock cycles
9945 clock cycles
9933 clock cycles
9914 clock cycles
Press any key to continue ...