Which is faster? Mul or 32 times addition? If Mul is slower I will reinvent the mul instruction.
Hi Onan,
If I understand correctly, you want to multiply a number by 32. Is that correct? If this is the case, why not use the shl instruction?
shl eax,5 ; multiply eax by 2^5 = 32
MUL is surprisingly fast ~8 clock cycles
but, i am sure Erol is right - SHL is probably ~3 clock cycles or less
No, not that, Erol. I mean something like this:
http://en.wikipedia.org/wiki/Binary_multiplier#Example
      1011     (this is 11 in binary)
    x 1110     (this is 14 in binary)
    ======
      0000     (this is 1011 x 0)
     1011      (this is 1011 x 1, shifted one position to the left)
    1011       (this is 1011 x 1, shifted two positions to the left)
 + 1011        (this is 1011 x 1, shifted three positions to the left)
 =========
  10011010     (this is 154 in binary)
Well, I guess the Intel guys were far more sophisticated than me on this stuff.
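For anyone following along, the binary long multiplication in the worked example above can be sketched in C roughly like this. It is only an illustration of the shift-and-add idea, not how any particular CPU implements MUL, and the function name shift_add_mul is made up for this example.

```c
#include <stdint.h>

/* Shift-and-add multiply (hypothetical helper, for illustration only).
   For every set bit in b, add a copy of a shifted left by that bit's
   position. The result wraps modulo 2^32, like the low half of a
   32-bit MUL. */
uint32_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) {
            product += a;   /* this bit of b is 1: add the shifted a */
        }
        a <<= 1;            /* next partial product sits one place further left */
        b >>= 1;            /* move on to the next bit of b */
    }
    return product;
}
```

Calling shift_add_mul(11, 14) walks through the same four partial products as the example and returns 154. The loop runs once per bit of b up to its highest set bit, which is why the thread goes on to compare this approach against a single MUL.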
that method of multiplying was faster than MUL on older processors (like the 8088)
on any modern processor, MUL or IMUL is much faster (and much smaller :P )
ok - little surprise here :eek
i used a multiplier constant of 10
for that constant, only 2 bits are set (1010)
if the constant has more than 2 bits set, MUL or IMUL will be faster :P
prescott w/htt:
Pentium 4 Prescott (2005+), MMX, SSE3
X 10 SHL/ADD: 6 6 6 6 6
X 10 MUL: 7 7 7 7 7
X 10 IMUL: 7 7 7 7 7
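To make the "only 2 bits set" remark concrete, here is a small C sketch of the arithmetic behind a SHL/ADD multiply by 10. The exact instruction sequence that was timed is not shown in the post, so this only mirrors the math, and the names mul10_shift_add and set_bits are invented for the example.

```c
#include <stdint.h>

/* 10 is 1010 in binary, i.e. 8 + 2, so x*10 = (x << 3) + (x << 1).
   Hypothetical helper; it shows the arithmetic, not the timed code. */
uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Count the set bits in a constant. Each set bit contributes one
   shifted term, so the count gives the number of terms to add up. */
unsigned set_bits(uint32_t c)
{
    unsigned n = 0;
    while (c != 0) {
        c &= c - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Here set_bits(10) is 2, so the shift/add version needs two shifted terms and one addition. A constant with more set bits needs proportionally more shifts and adds, which fits the observation that MUL or IMUL wins once more than 2 bits are set. Note that an optimizing C compiler is free to pick its own instructions for this code.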
If you are multiplying by a constant 10, how fast in comparison would the following be:
lea eax,[eax*4+eax]
add eax,eax
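As a quick sanity check on the LEA/ADD pair above, the same two steps can be written out in C. This is just a sketch of the arithmetic (5x, then doubled), with a made-up function name, and it says nothing about timing.

```c
#include <stdint.h>

/* Mirrors the two instructions step by step (hypothetical helper). */
uint32_t mul10_lea_add(uint32_t x)
{
    uint32_t t = x * 4 + x;  /* lea eax,[eax*4+eax]  ->  t = 5*x  */
    return t + t;            /* add eax,eax          ->  2*t = 10*x */
}
```

Both steps wrap modulo 2^32, so the result matches the low 32 bits of a real multiply by 10 for every input.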
thanks for reminding me, Ray :P
Pentium 4 Prescott (2005+), MMX, SSE3
X 10 SHL/ADD: 6 6 6 6 6
X 10 MUL: 7 7 7 7 7
X 10 IMUL: 7 7 7 7 7
X 10 LEA [EAX*4+EAX]: 7 7 7 7 7
X 10 ADD [EAX*8+EAX]: 7 7 7 7 7
i believe those methods will be faster on most processors newer than the P4's
Core (2006+), MMX, SSE3
X 10 SHL/ADD: 4 4 4 4 4
X 10 MUL: 6 6 6 6 6
X 10 IMUL: 6 6 6 6 6
X 10 LEA [EAX*4+EAX]: 3 3 3 3 3
X 10 ADD [EAX*8+EAX]: 3 3 3 3 3
P3 (2000+), MMX, SSE1
X 10 SHL/ADD: 6 6 6 6 6
X 10 MUL: 6 6 6 6 6
X 10 IMUL: 7 7 7 7 7
X 10 LEA [EAX*4+EAX]: 6 6 6 6 6
X 10 ADD [EAX*8+EAX]: 6 6 6 6 6
I bet the Intel guys did not use one IC for the MUL, but multiple. On the MCS-51, INC takes 24 clocks, ADD takes 12 clocks, and MUL and DIV take 48 clocks each, so obviously they did something special, except for the INC instruction.