MovSx slower than MovZx

Started by jj2007, March 29, 2011, 07:41:55 AM

jj2007

One is inclined to think that "twins" like movsx and movzx behave similarly...

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
141     cycles for cwde
95      cycles for movzx
120     cycles for movsx
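
For readers comparing the three variants, the following C sketch models what each instruction leaves in EAX. It illustrates the architectural semantics only and is not the benchmark code from this thread, which is not shown here. The function names are made up for this example, and the model says nothing about timing.

```c
#include <stdint.h>

/* movzx eax, word ptr [mem]: the upper 16 bits of EAX are cleared. */
uint32_t model_movzx(uint16_t w) {
    return (uint32_t)w;
}

/* movsx eax, word ptr [mem]: bit 15 of the word is copied into bits 16..31. */
uint32_t model_movsx(uint16_t w) {
    return (w & 0x8000u) ? (0xFFFF0000u | w) : (uint32_t)w;
}

/* mov ax, word ptr [mem] followed by cwde.
   The 16-bit mov leaves the old upper half of EAX in place;
   cwde then overwrites it with copies of bit 15 of AX. */
uint32_t model_mov_ax_cwde(uint32_t old_eax, uint16_t w) {
    uint32_t eax = (old_eax & 0xFFFF0000u) | w;      /* mov ax, w */
    uint32_t ax  = eax & 0xFFFFu;
    return (ax & 0x8000u) ? (0xFFFF0000u | ax) : ax; /* cwde */
}
```

In this model the mov/cwde pair and movsx always produce the same value, whatever EAX held before, so any timing difference between them comes from how the CPU executes them and not from what they compute.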


sinsi

Depends on who's watching

AMD Phenom(tm) II X6 1100T Processor (SSE3)
66      cycles for cwde
51      cycles for movzx
62      cycles for movsx

50      cycles for cwde
62      cycles for movzx
51      cycles for movsx

Light travels faster than sound, that's why some people seem bright until you hear them.

six_L

Intel(R) Core(TM) i3 CPU         550  @ 3.20GHz (SSE4)
64   cycles for cwde
45   cycles for movzx
63   cycles for movsx

58   cycles for cwde
63   cycles for movzx
45   cycles for movsx

64   cycles for cwde
44   cycles for movzx
63   cycles for movsx

58   cycles for cwde
63   cycles for movzx
44   cycles for movsx

64   cycles for cwde
44   cycles for movzx
63   cycles for movsx


--- ok ---

regards

jj2007

Thanks, Sinsi & six_L. My P4 consistently favours the Z (movzx); I will see tonight how the Celeron behaves.
Not that it matters much: you rarely have a choice between the two instructions ;-)

MichaelW

P3:

pre-P4 (SSE1)
70      cycles for cwde
70      cycles for movzx
70      cycles for movsx

70      cycles for cwde
70      cycles for movzx
70      cycles for movsx

70      cycles for cwde
70      cycles for movzx
70      cycles for movsx

71      cycles for cwde
70      cycles for movzx
70      cycles for movsx

70      cycles for cwde
70      cycles for movzx
70      cycles for movsx
eschew obfuscation

FORTRANS

Hi,

   P-III and two laptops.

Regards,

Steve N.


++ P-III
pre-P4 (SSE1)
70      cycles for cwde
71      cycles for movzx
71      cycles for movsx

72      cycles for cwde
71      cycles for movzx
71      cycles for movsx

71      cycles for cwde
71      cycles for movzx
71      cycles for movsx

71      cycles for cwde
71      cycles for movzx
71      cycles for movsx

71      cycles for cwde
71      cycles for movzx
71      cycles for movsx


--- ok ---

++ P-MMX
pre-P4
136   cycles for cwde
115   cycles for movzx
114   cycles for movsx

135   cycles for cwde
114   cycles for movzx
113   cycles for movsx

136   cycles for cwde
122   cycles for movzx
114   cycles for movsx

136   cycles for cwde
113   cycles for movzx
113   cycles for movsx

134   cycles for cwde
114   cycles for movzx
113   cycles for movsx


--- ok ---
++ P-4?
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
60      cycles for cwde
59      cycles for movzx
62      cycles for movsx

61      cycles for cwde
61      cycles for movzx
60      cycles for movsx

60      cycles for cwde
66      cycles for movzx
64      cycles for movsx

61      cycles for cwde
60      cycles for movzx
61      cycles for movsx

60      cycles for cwde
61      cycles for movzx
60      cycles for movsx


--- ok ---

lingo

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (SSE4)
43      cycles for cwde
26      cycles for movzx
57      cycles for movsx

31      cycles for cwde
43      cycles for movzx
27      cycles for movsx

44      cycles for cwde
26      cycles for movzx
56      cycles for movsx

45      cycles for cwde
59      cycles for movzx
27      cycles for movsx

43      cycles for cwde
26      cycles for movzx
56      cycles for movsx


--- ok ---

Intel(R) Core(TM)2 Duo CPU  E8500  @ 3.16GHz (SSE4)
57      cycles for cwde
37      cycles for movzx
57      cycles for movsx

37      cycles for cwde
57      cycles for movzx
37      cycles for movsx

87      cycles for cwde
58      cycles for movzx
86      cycles for movsx

37      cycles for cwde
57      cycles for movzx
37      cycles for movsx

57      cycles for cwde
59      cycles for movzx
57      cycles for movsx


--- ok ---

oex

AMD Sempron(tm) Processor 3100+ (SSE3)
73      cycles for cwde
68      cycles for movzx
68      cycles for movsx

73      cycles for cwde
68      cycles for movzx
68      cycles for movsx

73      cycles for cwde
68      cycles for movzx
68      cycles for movsx

75      cycles for cwde
68      cycles for movzx
68      cycles for movsx

74      cycles for cwde
68      cycles for movzx
69      cycles for movsx
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

six_L

Intel(R) Xeon(R) CPU           E7520  @ 1.87GHz (SSE4)
104   cycles for cwde
78   cycles for movzx
104   cycles for movsx

102   cycles for cwde
102   cycles for movzx
78   cycles for movsx

105   cycles for cwde
78   cycles for movzx
104   cycles for movsx

102   cycles for cwde
103   cycles for movzx
78   cycles for movsx

105   cycles for cwde
78   cycles for movzx
105   cycles for movsx


--- ok ---

regards

jj2007

Thanks to everybody. What is really odd is that several CPUs show an alternating pattern. In contrast, my Celeron favours cwde and yields very stable timings:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
59      cycles for cwde
62      cycles for movzx
62      cycles for movsx

59      cycles for cwde
62      cycles for movzx
62      cycles for movsx

59      cycles for cwde
62      cycles for movzx
62      cycles for movsx

dedndave

#10
i get unrepeatable results - but i always do - lol

to be fair, the CWDE test uses a MOV with a size-override
so, i added a CWDE test with a dword MOV

for comparison with MOVZX, i also added a test with AND EAX,0FFFFh   :P

see reply #13 for attachment
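
The extra variants can be checked for equivalence with a small C model. The instruction sequences named in the comments are an assumption about what the added tests look like, since the attachment is not reproduced here, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed sequence: mov eax, dword ptr [mem] / cwde
   cwde replaces the upper half of the loaded dword with copies of bit 15. */
uint32_t cwde_after_dword_mov(uint32_t mem_dword) {
    uint32_t low = mem_dword & 0xFFFFu;
    return (low & 0x8000u) ? (low | 0xFFFF0000u) : low;
}

/* Assumed sequence: mov eax, dword ptr [mem] / and eax, 0FFFFh */
uint32_t and_after_dword_mov(uint32_t mem_dword) {
    return mem_dword & 0xFFFFu;
}

/* movsx eax, word ptr [mem]. On little-endian x86 the word at [mem] is the
   low half of the dword at [mem]. Written with the xor/subtract identity so
   that it is an independent check of the cwde model above. */
uint32_t movsx_word(uint32_t mem_dword) {
    uint32_t low = mem_dword & 0xFFFFu;
    return (low ^ 0x8000u) - 0x8000u;   /* wraps modulo 2^32 */
}

/* movzx eax, word ptr [mem] */
uint32_t movzx_word(uint32_t mem_dword) {
    return (uint16_t)mem_dword;
}
```

In this model the dword load plus cwde matches movsx, and the dword load plus and matches movzx. One practical difference the model hides: the dword loads read two bytes beyond the word, so the operand needs at least four readable bytes.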

jj2007

Dave,
All around 60 cycles, no winner. The and eax, 0FFFFh is sometimes slower but that could be outliers.

ragdog

Hehe, it's enough to make you cry:


AMD Turion(tm) 64 X2 Mobile Technology TL-52 (SSE3)
213     cycles for cwde
-69     cycles for movzx
-69     cycles for movsx

74      cycles for cwde
-69     cycles for movzx
345     cycles for movsx

75      cycles for cwde
69      cycles for movzx
69      cycles for movsx

75      cycles for cwde
69      cycles for movzx
75      cycles for movsx

-55     cycles for cwde
69      cycles for movzx
69      cycles for movsx

dedndave

i got better results by restricting execution to a single core...

prescott w/htt:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
144     cycles for cwde (mov word)
126     cycles for cwde (mov dword)
113     cycles for and 0FFFFh
90      cycles for movzx
111     cycles for movsx

141     cycles for cwde (mov word)
147     cycles for cwde (mov dword)
127     cycles for and 0FFFFh
95      cycles for movzx
111     cycles for movsx

200     cycles for cwde (mov word)
125     cycles for cwde (mov dword)
116     cycles for and 0FFFFh
107     cycles for movzx
129     cycles for movsx

140     cycles for cwde (mov word)
126     cycles for cwde (mov dword)
111     cycles for and 0FFFFh
95      cycles for movzx
111     cycles for movsx

152     cycles for cwde (mov word)
146     cycles for cwde (mov dword)
125     cycles for and 0FFFFh
97      cycles for movzx
111     cycles for movsx


see attached...
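
For anyone who wants to restrict a test program to a single core from inside the program, a minimal C sketch follows. It assumes a Windows build and uses the Win32 call SetProcessAffinityMask. The wrapper name is made up for this example, and it is not necessarily how the attached test does it. On other platforms the sketch simply reports failure.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Pin the current process to logical processor 0.
   Returns 0 on success, -1 on failure or on a platform this sketch
   does not cover. */
int pin_process_to_first_core(void) {
#ifdef _WIN32
    /* Bit 0 of the affinity mask selects logical processor 0. */
    return SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)1) ? 0 : -1;
#else
    return -1;  /* non-Windows systems are not handled here */
#endif
}
```

Call it once at startup, before the first timing loop. Keeping the process on one core means the start and end readings of each timed run come from the same core's cycle counter.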

MichaelW

Try adding about a 3 second delay at the start of the code to allow time for the system activities involved in launching an app to finish.