The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on March 29, 2011, 07:41:55 AM

Title: MovSx slower than MovZx
Post by: jj2007 on March 29, 2011, 07:41:55 AM
One is inclined to think that "twins" like movsx and movzx behave similarly...

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
141     cycles for cwde
95      cycles for movzx
120     cycles for movsx

Title: Re: MovSx slower than MovZx
Post by: sinsi on March 29, 2011, 08:02:49 AM
Depends on who's watching

AMD Phenom(tm) II X6 1100T Processor (SSE3)
66      cycles for cwde
51      cycles for movzx
62      cycles for movsx

50      cycles for cwde
62      cycles for movzx
51      cycles for movsx

Title: Re: MovSx slower than MovZx
Post by: six_L on March 29, 2011, 08:32:17 AM
Intel(R) Core(TM) i3 CPU         550  @ 3.20GHz (SSE4)
64   cycles for cwde
45   cycles for movzx
63   cycles for movsx

58   cycles for cwde
63   cycles for movzx
45   cycles for movsx

64   cycles for cwde
44   cycles for movzx
63   cycles for movsx

58   cycles for cwde
63   cycles for movzx
44   cycles for movsx

64   cycles for cwde
44   cycles for movzx
63   cycles for movsx


--- ok ---

Title: Re: MovSx slower than MovZx
Post by: jj2007 on March 29, 2011, 08:47:04 AM
Thanks, Sinsi & six_L. My P4 consistently favours the Z; I'll see tonight how the Celeron behaves.
Not that it matters much: you rarely have a choice between the two instructions ;-)
Title: Re: MovSx slower than MovZx
Post by: MichaelW on March 29, 2011, 08:55:13 AM
P3:

pre-P4 (SSE1)
70      cycles for cwde
70      cycles for movzx
70      cycles for movsx

70      cycles for cwde
70      cycles for movzx
70      cycles for movsx

70      cycles for cwde
70      cycles for movzx
70      cycles for movsx

71      cycles for cwde
70      cycles for movzx
70      cycles for movsx

70      cycles for cwde
70      cycles for movzx
70      cycles for movsx
Title: Re: MovSx slower than MovZx
Post by: FORTRANS on March 29, 2011, 12:31:48 PM
Hi,

   P-III and two laptops.

Regards,

Steve N.


++ P-III
pre-P4 (SSE1)
70      cycles for cwde
71      cycles for movzx
71      cycles for movsx

72      cycles for cwde
71      cycles for movzx
71      cycles for movsx

71      cycles for cwde
71      cycles for movzx
71      cycles for movsx

71      cycles for cwde
71      cycles for movzx
71      cycles for movsx

71      cycles for cwde
71      cycles for movzx
71      cycles for movsx


--- ok ---

++ P-MMX
pre-P4
136   cycles for cwde
115   cycles for movzx
114   cycles for movsx

135   cycles for cwde
114   cycles for movzx
113   cycles for movsx

136   cycles for cwde
122   cycles for movzx
114   cycles for movsx

136   cycles for cwde
113   cycles for movzx
113   cycles for movsx

134   cycles for cwde
114   cycles for movzx
113   cycles for movsx


--- ok ---
++ P-4?
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
60      cycles for cwde
59      cycles for movzx
62      cycles for movsx

61      cycles for cwde
61      cycles for movzx
60      cycles for movsx

60      cycles for cwde
66      cycles for movzx
64      cycles for movsx

61      cycles for cwde
60      cycles for movzx
61      cycles for movsx

60      cycles for cwde
61      cycles for movzx
60      cycles for movsx


--- ok ---
Title: Re: MovSx slower than MovZx
Post by: lingo on March 29, 2011, 02:43:47 PM
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (SSE4)
43      cycles for cwde
26      cycles for movzx
57      cycles for movsx

31      cycles for cwde
43      cycles for movzx
27      cycles for movsx

44      cycles for cwde
26      cycles for movzx
56      cycles for movsx

45      cycles for cwde
59      cycles for movzx
27      cycles for movsx

43      cycles for cwde
26      cycles for movzx
56      cycles for movsx


--- ok ---

Intel(R) Core(TM)2 Duo CPU  E8500  @ 3.16GHz (SSE4)
57      cycles for cwde
37      cycles for movzx
57      cycles for movsx

37      cycles for cwde
57      cycles for movzx
37      cycles for movsx

87      cycles for cwde
58      cycles for movzx
86      cycles for movsx

37      cycles for cwde
57      cycles for movzx
37      cycles for movsx

57      cycles for cwde
59      cycles for movzx
57      cycles for movsx


--- ok ---
Title: Re: MovSx slower than MovZx
Post by: oex on March 29, 2011, 04:03:47 PM
AMD Sempron(tm) Processor 3100+ (SSE3)
73      cycles for cwde
68      cycles for movzx
68      cycles for movsx

73      cycles for cwde
68      cycles for movzx
68      cycles for movsx

73      cycles for cwde
68      cycles for movzx
68      cycles for movsx

75      cycles for cwde
68      cycles for movzx
68      cycles for movsx

74      cycles for cwde
68      cycles for movzx
69      cycles for movsx
Title: Re: MovSx slower than MovZx
Post by: six_L on March 29, 2011, 05:03:27 PM
Intel(R) Xeon(R) CPU           E7520  @ 1.87GHz (SSE4)
104   cycles for cwde
78   cycles for movzx
104   cycles for movsx

102   cycles for cwde
102   cycles for movzx
78   cycles for movsx

105   cycles for cwde
78   cycles for movzx
104   cycles for movsx

102   cycles for cwde
103   cycles for movzx
78   cycles for movsx

105   cycles for cwde
78   cycles for movzx
105   cycles for movsx


--- ok ---

Title: Re: MovSx slower than MovZx
Post by: jj2007 on March 29, 2011, 05:43:54 PM
Thanks to everybody. What is really odd is that several CPUs show an alternating pattern. In contrast, my Celeron favours cwde and yields very stable timings:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
59      cycles for cwde
62      cycles for movzx
62      cycles for movsx

59      cycles for cwde
62      cycles for movzx
62      cycles for movsx

59      cycles for cwde
62      cycles for movzx
62      cycles for movsx
Title: Re: MovSx slower than MovZx
Post by: dedndave on March 29, 2011, 05:47:45 PM
i get unrepeatable results - but i always do - lol

to be fair, the CWDE test uses a MOV with a size-override
so, i added a CWDE test with a dword MOV

for comparison with MOVZX, i also added a test with AND EAX,0FFFFh   :P

see reply #13 for attachment
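
For readers comparing the variants, here is a minimal C sketch of what each sequence computes. It assumes the tests load a 16-bit value into eax; the actual test source is in the attachment and is not reproduced here. All function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* movzx eax, word ptr [mem]: zero-extend 16 -> 32 bits */
uint32_t zero_extend16(uint16_t w) {
    return (uint32_t)w;
}

/* movsx eax, word ptr [mem]: sign-extend 16 -> 32 bits.
   Flipping bit 15 and subtracting 8000h avoids relying on how the
   compiler converts out-of-range values to int16_t. */
int32_t sign_extend16(uint16_t w) {
    return (int32_t)(w ^ 0x8000) - 0x8000;
}

/* mov ax, [mem] followed by cwde: the 16-bit mov leaves the upper half
   of eax alone, then cwde sign-extends ax into eax. */
int32_t load_then_cwde(uint32_t eax_before, uint16_t w) {
    uint32_t eax = (eax_before & 0xFFFF0000u) | w;   /* mov ax, [mem] */
    return sign_extend16((uint16_t)(eax & 0xFFFFu)); /* cwde */
}

/* mov eax, [mem] followed by and eax, 0FFFFh: clears the upper half. */
uint32_t load_then_and(uint32_t dword) {
    return dword & 0xFFFFu;
}

/* The pairs agree for every 16-bit input. */
void check_equivalences(void) {
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t w = (uint16_t)v;
        assert(load_then_cwde(0xDEADBEEFu, w) == sign_extend16(w));
        assert(load_then_and(0xABCD0000u | w) == zero_extend16(w));
    }
}
```

So cwde pairs up with movsx and the AND pairs up with movzx. The two groups only give the same result when bit 15 of the source word is clear.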
Title: Re: MovSx slower than MovZx
Post by: jj2007 on March 29, 2011, 06:58:26 PM
Dave,
All around 60 cycles, no winner. The and eax, 0FFFFh is sometimes slower but that could be outliers.
Title: Re: MovSx slower than MovZx
Post by: ragdog on March 29, 2011, 08:33:40 PM
Hehe, it's enough to make you cry :bdg


AMD Turion(tm) 64 X2 Mobile Technology TL-52 (SSE3)
213     cycles for cwde
-69     cycles for movzx
-69     cycles for movsx

74      cycles for cwde
-69     cycles for movzx
345     cycles for movsx

75      cycles for cwde
69      cycles for movzx
69      cycles for movsx

75      cycles for cwde
69      cycles for movzx
75      cycles for movsx

-55     cycles for cwde
69      cycles for movzx
69      cycles for movsx
Title: Re: MovSx slower than MovZx
Post by: dedndave on March 30, 2011, 08:55:01 AM
i got better results by restricting execution to a single core...

prescott w/htt:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
144     cycles for cwde (mov word)
126     cycles for cwde (mov dword)
113     cycles for and 0FFFFh
90      cycles for movzx
111     cycles for movsx

141     cycles for cwde (mov word)
147     cycles for cwde (mov dword)
127     cycles for and 0FFFFh
95      cycles for movzx
111     cycles for movsx

200     cycles for cwde (mov word)
125     cycles for cwde (mov dword)
116     cycles for and 0FFFFh
107     cycles for movzx
129     cycles for movsx

140     cycles for cwde (mov word)
126     cycles for cwde (mov dword)
111     cycles for and 0FFFFh
95      cycles for movzx
111     cycles for movsx

152     cycles for cwde (mov word)
146     cycles for cwde (mov dword)
125     cycles for and 0FFFFh
97      cycles for movzx
111     cycles for movsx


see attached...
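
A possible reason pinning helps, offered as an assumption rather than something measured in this thread: on some multi-core CPUs the per-core time-stamp counters are not synchronized. If the thread moves to another core between the two RDTSC reads, the difference can come out wildly off or even negative, like the -69 readings above. A small C sketch of that arithmetic follows. The function names, the plausibility ceiling, and the "keep the smallest sane sample" policy are illustrative choices, not what the test program does.

```c
#include <assert.h>
#include <stdint.h>

/* Signed difference of two counter readings, so a bad pair shows up as a
   negative number instead of a huge unsigned one. Assumes the two readings
   are less than 2^63 apart. */
int64_t tsc_delta(uint64_t start, uint64_t end) {
    if (end >= start)
        return (int64_t)(end - start);
    return -(int64_t)(start - end);
}

/* A sample is kept only if it is positive and not above a caller-chosen ceiling. */
int sample_is_plausible(int64_t cycles, int64_t ceiling) {
    return cycles > 0 && cycles <= ceiling;
}

/* Smallest plausible sample, or -1 if none qualifies. */
int64_t best_sample(const int64_t *samples, int n, int64_t ceiling) {
    assert(n >= 0);
    int64_t best = -1;
    for (int i = 0; i < n; i++) {
        if (!sample_is_plausible(samples[i], ceiling))
            continue;
        if (best < 0 || samples[i] < best)
            best = samples[i];
    }
    return best;
}
```

With the thread restricted to one core, both reads come from the same counter, so the negative case should not arise from this cause.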
Title: Re: MovSx slower than MovZx
Post by: MichaelW on March 30, 2011, 09:35:32 AM
Try adding about a 3 second delay at the start of the code to allow time for the system activities involved in launching an app to finish.
Title: Re: MovSx slower than MovZx
Post by: clive on March 30, 2011, 10:15:52 AM
Intel(R) Atom(TM) CPU N270   @ 1.60GHz (SSE4)
148     cycles for cwde (mov word)
218     cycles for cwde (mov dword)
299     cycles for and 0FFFFh
257     cycles for movzx
126     cycles for movsx

147     cycles for cwde (mov word)
147     cycles for cwde (mov dword)
148     cycles for and 0FFFFh
152     cycles for movzx
154     cycles for movsx

210     cycles for cwde (mov word)
177     cycles for cwde (mov dword)
178     cycles for and 0FFFFh
152     cycles for movzx
191     cycles for movsx

230     cycles for cwde (mov word)
222     cycles for cwde (mov dword)
221     cycles for and 0FFFFh
193     cycles for movzx
191     cycles for movsx

225     cycles for cwde (mov word)
222     cycles for cwde (mov dword)
255     cycles for and 0FFFFh
193     cycles for movzx
197     cycles for movsx
Title: Re: MovSx slower than MovZx
Post by: dedndave on March 30, 2011, 05:04:58 PM
by increasing LOOP_COUNT from 1,000,000 to 10,000,000, i get considerably more repeatable results
this value makes each test ~0.5 seconds
the culprit seems to be the CPUID instructions used to serialize
i guess we already knew that   :P
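
Taking the CPUID explanation at face value, the LOOP_COUNT effect is what you would expect if each test carries a roughly fixed but jittery cost from the serializing instructions. That cost is paid once per test, so its share per iteration shrinks as the loop count grows. A back-of-the-envelope C sketch of that model follows. The function names are hypothetical, and the overhead and jitter figures in the usage note are invented for illustration, not measured.

```c
#include <assert.h>
#include <stdint.h>

/* Model: total_cycles = overhead + loop_count * per_iteration_cost.
   Returns the per-iteration estimate after subtracting the nominal overhead. */
double per_iter_estimate(double total_cycles, double nominal_overhead,
                         uint64_t loop_count) {
    assert(loop_count > 0);
    return (total_cycles - nominal_overhead) / (double)loop_count;
}

/* Worst-case per-iteration error when the real overhead deviates from the
   nominal value by up to `jitter` cycles. */
double per_iter_error(double jitter, uint64_t loop_count) {
    assert(loop_count > 0);
    return jitter / (double)loop_count;
}
```

For example, per_iter_error(500.0, 1000000) is ten times per_iter_error(500.0, 10000000), so raising LOOP_COUNT from 1,000,000 to 10,000,000 cuts the per-iteration share of any fixed-cost jitter by a factor of ten.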