The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on May 14, 2011, 07:43:54 AM

Title: mov [reg32], 0 or and [reg32], 0?
Post by: jj2007 on May 14, 2011, 07:43:54 AM
and [reg32], 0 is 3 bytes shorter than mov [reg32], 0 but theory suggests it should be slower, since and has to both get the value from mem and then write it back...
Evidence says it doesn't matter, but I am curious how that behaves on other CPUs.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
130     cycles for 100*mov
131     cycles for 100*and

130     cycles for 100*mov
131     cycles for 100*and
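
For reference, the 3-byte size difference can be checked from the machine-code encodings. This is a minimal sketch, not part of the original test: it assumes eax as the base register (the thread only says reg32) and a dword operand, using the standard opcode forms C7 /0 id for mov and 83 /4 ib for and with a sign-extended 8-bit immediate. The names MOV_MEM_ZERO, AND_MEM_ZERO and size_saving are made up for illustration.

```python
# Encodings for a dword store of zero through [eax].
# Assumption: base register eax, no displacement; esp or ebp as base
# need an extra SIB or displacement byte in both forms alike.
MOV_MEM_ZERO = bytes.fromhex("C7 00 00 00 00 00")  # mov dword ptr [eax], 0
AND_MEM_ZERO = bytes.fromhex("83 20 00")           # and dword ptr [eax], 0


def size_saving(long_form: bytes, short_form: bytes) -> int:
    """Return how many bytes the shorter encoding saves."""
    return len(long_form) - len(short_form)


print(len(MOV_MEM_ZERO), len(AND_MEM_ZERO), size_saving(MOV_MEM_ZERO, AND_MEM_ZERO))
```

The saving comes from the immediate: mov has no short-immediate form for memory operands, so it carries a full 32-bit zero, while and can use an 8-bit immediate.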
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: ERNST on May 14, 2011, 08:08:24 AM
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
112     cycles for 100*mov
132     cycles for 100*and

116     cycles for 100*mov
133     cycles for 100*and


--- ok ---
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: Neil on May 14, 2011, 08:25:38 AM
Intel(R) Core(TM)2 Quad   CPU    Q9550  @ 2.83GHz  (SSE4)
124      cycles for 100*mov
141      cycles for 100*and

123      cycles for 100*mov
120      cycles for 100*and
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: MichaelW on May 14, 2011, 08:42:51 AM
P3:

pre-P4 (SSE1)
130     cycles for 100*mov
177     cycles for 100*and

131     cycles for 100*mov
177     cycles for 100*and

Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: hutch-- on May 14, 2011, 08:49:53 AM

Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
102     cycles for 100*mov
120     cycles for 100*and

101     cycles for 100*mov
121     cycles for 100*and


--- ok ---


I would be inclined to try a more complex test, as MOV may have some advantage in tight looping that AND does not. Try a timed framework that has enough other instructions in it, then try both out. Make sure you don't use other instructions around it that stall, or you will get unreliable readings, as both MOV and AND will fill a hole left by a stall.
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: jj2007 on May 14, 2011, 09:59:37 AM
Thanks to everybody. Here is a version that checks the difference between a "real" loop and REPEAT 100 (shown as REP in the output). In any case, we are talking here about the rather hypothetical difference between 2.1 and 2.2 cycles, so in all real life apps it won't matter. Except that and occupies less space in the instruction cache....

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
214     cycles for 100*mov, loop
224     cycles for 100*and, loop
130     cycles for 100*mov, REP
131     cycles for 100*and, REP
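
The "2.1 and 2.2 cycles" figure is simply the loop totals divided by the 100 operations in each run. A small sketch of that arithmetic, using the Celeron M numbers above; the names results, OPS_PER_RUN and cycles_per_op are illustrative only.

```python
# Cycle counts copied from the Celeron M 420 results above (per 100 operations).
results = {
    "mov, loop": 214,
    "and, loop": 224,
    "mov, REP": 130,
    "and, REP": 131,
}
OPS_PER_RUN = 100


def cycles_per_op(total_cycles: int, ops: int = OPS_PER_RUN) -> float:
    """Average cycles spent per operation in one timed run."""
    return total_cycles / ops


for label, total in results.items():
    print(f"{label:10s} {cycles_per_op(total):.2f} cycles per operation")
```

On these numbers the loop variants come out at 2.14 and 2.24 cycles per operation, and the unrolled variants at 1.30 and 1.31.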
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: hutch-- on May 14, 2011, 10:21:09 AM

Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
128     cycles for 100*mov, loop
209     cycles for 100*and, loop
100     cycles for 100*mov, REP
120     cycles for 100*and, REP

118     cycles for 100*mov, loop
209     cycles for 100*and, loop
95      cycles for 100*mov, REP
120     cycles for 100*and, REP


--- ok ---
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: sinsi on May 14, 2011, 11:50:28 AM
Am I the lone AMD?

AMD Phenom(tm) II X6 1100T Processor (SSE3)
278     cycles for 100*mov, loop
186     cycles for 100*and, loop
79      cycles for 100*mov, REP
129     cycles for 100*and, REP

184     cycles for 100*mov, loop
274     cycles for 100*and, loop
80      cycles for 100*mov, REP
130     cycles for 100*and, REP

Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: dedndave on May 14, 2011, 11:56:11 AM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
292     cycles for 100*mov, loop
337     cycles for 100*and, loop
188     cycles for 100*mov, REP
207     cycles for 100*and, REP

294     cycles for 100*mov, loop
334     cycles for 100*and, loop
186     cycles for 100*mov, REP
213     cycles for 100*and, REP
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: FORTRANS on May 14, 2011, 01:40:40 PM
   Strange.

pre-P4
1153   cycles for 100*mov, loop
413   cycles for 100*and, loop
1024   cycles for 100*mov, REP
407   cycles for 100*and, REP

1020   cycles for 100*mov, loop
416   cycles for 100*and, loop
1018   cycles for 100*mov, REP
409   cycles for 100*and, REP


--- ok ---

pre-P4 (SSE1)
308   cycles for 100*mov, loop
309   cycles for 100*and, loop
131   cycles for 100*mov, REP
178   cycles for 100*and, REP

308   cycles for 100*mov, loop
308   cycles for 100*and, loop
131   cycles for 100*mov, REP
178   cycles for 100*and, REP


--- ok ---
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: hutch-- on May 14, 2011, 02:20:46 PM
JJ,

I had a quick play with 2 loops, one with AND, the other with MOV, and as soon as you start adding identical instructions to both loops the timings become close enough to identical. It is probably because both AND and MOV are preferred instructions that pair through pipelines, so I would imagine they have very similar times in most contexts.
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: jj2007 on May 14, 2011, 05:53:28 PM
Yes, they are almost identical. Which is surprising as initially stated, since in theory and [reg32] implies two actions, a read plus a write, while mov requires only one write. Since inc behaves very differently, see below, the reason for the "fast" and might be some special circuitry.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
215     cycles for 100*mov, loop
225     cycles for 100*and, loop
130     cycles for 100*mov, REP
132     cycles for 100*and, REP
642     cycles for 100*inc, loop
601     cycles for 100*inc, REP
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: mineiro on May 14, 2011, 08:08:42 PM
Intel(R) Pentium(R) Dual  CPU  E2160  @ 1.80GHz (SSE4)
129     cycles for 100*mov, loop
209     cycles for 100*and, loop
101     cycles for 100*mov, REP
121     cycles for 100*and, REP
608     cycles for 100*inc, loop
595     cycles for 100*inc, REP

118     cycles for 100*mov, loop
209     cycles for 100*and, loop
97      cycles for 100*mov, REP
122     cycles for 100*and, REP
607     cycles for 100*inc, loop
595     cycles for 100*inc, REP
--- ok ---
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: hutch-- on May 15, 2011, 12:39:25 AM
For a long time Intel have recommended using ADD over INC and it's probably the case of putting a redundant instruction back to a lower priority in terms of die layout. The x86 instruction set that we see is in fact an interface to whatever lies below, which varies from one processor core to another, but it appears that they use a statistically derived instruction priority stacking that puts the most commonly used instructions a lot closer to the silicon and the less used ones back into the microcode. The very late Intel hardware is a lot faster with SSE2/3/4 than the earlier PIVs and it seems that technology advances in die size are mainly being used for the SSE instruction sets, with only a subset of the integer instructions being in the fast lane.
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: jj2007 on May 15, 2011, 12:50:18 AM
Quote from: hutch-- on May 15, 2011, 12:39:25 AM
For a long time Intel have recommended using ADD over INC

Add & inc behave identically on my Celeron:
640     cycles for 100*inc, loop
598     cycles for 100*inc, REP
640     cycles for 100*add, loop
598     cycles for 100*add, REP


But that's not the point. For an and mem you need to know what is in mem, so you need to read it, AND it, and write it back. That is not the case for a mov mem, immediate - no read necessary. That is why and mem should be slower than mov mem, but evidence shows it isn't slower.
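
The read-modify-write argument can be sketched with a toy memory that counts accesses. This only illustrates the reasoning and is not a model of any real CPU: the CountingMemory class and both helper functions are hypothetical, and the flags that and also updates are ignored.

```python
class CountingMemory:
    """Toy memory that counts loads and stores (not a CPU model)."""

    def __init__(self, size: int, fill: int = 0xFFFFFFFF):
        self.cells = [fill] * size
        self.loads = 0
        self.stores = 0

    def load(self, addr: int) -> int:
        self.loads += 1
        return self.cells[addr]

    def store(self, addr: int, value: int) -> None:
        self.stores += 1
        self.cells[addr] = value & 0xFFFFFFFF


def mov_mem_imm(mem: CountingMemory, addr: int, imm: int) -> None:
    # mov [addr], imm: the old contents are irrelevant, so only a store.
    mem.store(addr, imm)


def and_mem_imm(mem: CountingMemory, addr: int, imm: int) -> None:
    # and [addr], imm: read the old value, combine it, write it back.
    mem.store(addr, mem.load(addr) & imm)


m_mov, m_and = CountingMemory(4), CountingMemory(4)
mov_mem_imm(m_mov, 0, 0)
and_mem_imm(m_and, 0, 0)
print("mov:", m_mov.cells[0], m_mov.loads, m_mov.stores)
print("and:", m_and.cells[0], m_and.loads, m_and.stores)
```

Both cells end up as zero, but the and path records one load in addition to the store. The timings in this thread suggest that on several of the CPUs tested the extra load costs little or nothing in this particular test.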
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: hutch-- on May 15, 2011, 03:20:12 AM
You will probably find that at the bit level a copy bit occurs at about the same speed as a modify bit so if the source is an immediate in both, they should take a similar amount of time if they have similar circuitry in hardware.
Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: azdps on May 15, 2011, 03:54:18 AM
Intel(R) Atom(TM) CPU N475   @ 1.83GHz (SSE4)
430     cycles for 100*mov, loop
424     cycles for 100*and, loop
412     cycles for 100*mov, REP
407     cycles for 100*and, REP

424     cycles for 100*mov, loop
424     cycles for 100*and, loop
410     cycles for 100*mov, REP
411     cycles for 100*and, REP

Title: Re: mov [reg32], 0 or and [reg32], 0?
Post by: Rockoon on May 22, 2011, 10:51:22 PM
AMD Phenom(tm) II X6 1055T Processor (SSE3)
308     cycles for 100*mov, loop
208     cycles for 100*and, loop
88      cycles for 100*mov, REP
146     cycles for 100*and, REP
696     cycles for 100*inc, loop
695     cycles for 100*inc, REP

207     cycles for 100*mov, loop
307     cycles for 100*and, loop
88      cycles for 100*mov, REP
146     cycles for 100*and, REP
695     cycles for 100*inc, loop
695     cycles for 100*inc, REP