This is very uncontentious stuff but I thought someone may see a faster way to do it.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
umul proc num:DWORD, mult:DWORD
xor edx, edx ; clear EDX
mov eax, [esp+4] ; load number into EAX
mov ecx, [esp+8] ; load multiplier into ECX
mul ecx ; perform unsigned multiply
ret 8
umul endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Hutch,
Just delete the xor edx,edx, it will be overwritten. This is not necessary for a multiply which yields the result in edx,eax. DIV needs to have edx cleared because the divide is edx,eax by the divisor yielding the quotient in eax, the remainder in edx (or overflow if the quotient exceeds 32 bits).
Dave.
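Dave's point can be checked with a small C model of the two instructions. This is only a sketch: the `regs` struct and the functions `model_mul` and `model_div` are hypothetical names, and the model ignores the flags and the divide-overflow fault.

```c
#include <stdint.h>

/* Hypothetical stand-in for the EDX:EAX register pair. */
typedef struct { uint32_t eax, edx; } regs;

/* MUL r/m32: EDX:EAX = EAX * src. The old EDX value is never read. */
static regs model_mul(regs r, uint32_t src) {
    uint64_t product = (uint64_t)r.eax * src;
    r.eax = (uint32_t)product;          /* low 32 bits  */
    r.edx = (uint32_t)(product >> 32);  /* high 32 bits */
    return r;
}

/* DIV r/m32: divides the 64-bit value EDX:EAX by src, so the old EDX matters.
   Assumes src != 0 and a quotient that fits in 32 bits (the CPU faults otherwise). */
static regs model_div(regs r, uint32_t src) {
    uint64_t dividend = ((uint64_t)r.edx << 32) | r.eax;
    r.eax = (uint32_t)(dividend / src); /* quotient  */
    r.edx = (uint32_t)(dividend % src); /* remainder */
    return r;
}
```

Calling `model_mul` with EDX preset to zero or to garbage gives the same pair, while `model_div` gives a different quotient as soon as EDX is nonzero.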
I may be having a senior moment, but what's wrong with
mul dword ptr [esp+8]
instead of using ecx? Is that slower?
I thought it had to be an immediate or a reg, let me check.
Dave.
Jimg,
Wrong (me), right (you). The operand can be either a reg or a mem (MUL has no immediate form), thus only occupying 2 regs, eax and edx. I'm not sure about the timing consequences of reg vs mem for the mul itself, but when you have to load the value into a reg to begin with, it just takes one more instruction, so your way should be faster.
Dave.
And it seems to me that a macro that just inserted the instructions would be smaller and faster than invoking a proc, so Hutch must have had some ulterior motives for this whole thread.
That saves pushing the arguments and the return address, and the ret with its stack adjustment (all of which take cycles).
Running on a P3 and using an unsigned multiply I cannot find any coding that is significantly faster.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul proc num:DWORD, mult:DWORD
xor edx, edx ; clear EDX
mov eax, [esp+4] ; load number into EAX
mov ecx, [esp+8] ; load multiplier into ECX
mul ecx ; perform unsigned multiply
ret 8
umul endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul1 proc num:DWORD, mult:DWORD
mov eax, [esp+4]
mov ecx, [esp+8]
mul ecx
ret 8
umul1 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul2 proc num:DWORD, mult:DWORD
mov eax, [esp+4]
mul DWORD PTR [esp+8]
ret 8
umul2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul3 proc num:DWORD, mult:DWORD
mov eax, [esp+4]
imul eax, [esp+8]
ret 8
umul3 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
invoke Sleep, 3000
counter_begin 1000, HIGH_PRIORITY_CLASS
invoke umul, 123, 456
invoke umul, 123, 456
invoke umul, 123, 456
invoke umul, 123, 456
counter_end
print ustr$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
invoke umul1, 123, 456
invoke umul1, 123, 456
invoke umul1, 123, 456
invoke umul1, 123, 456
counter_end
print ustr$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
invoke umul2, 123, 456
invoke umul2, 123, 456
invoke umul2, 123, 456
invoke umul2, 123, 456
counter_end
print ustr$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
invoke umul3, 123, 456
invoke umul3, 123, 456
invoke umul3, 123, 456
invoke umul3, 123, 456
counter_end
print ustr$(eax),13,10
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
30
29
30
24
These are the timings on my old PIV.
66
58
58
38
Press any key to exit...
Inlining the code is obviously faster which is what I have normally done for years but I wanted a callable procedure so I could put it into the masm32 library so others could use it as they were learning. The XOR EDX, EDX was to ensure the EDX register was not loaded with a random value so you could reliably check for numbers larger than 4 gig. I tend to do the 1 to 10 range with combinations of shifts and LEA.
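As an illustration of the shift and LEA approach for small multipliers, here is a C sketch. The particular decompositions are an assumption for illustration, not sequences posted in this thread, and `lea` is a hypothetical helper that mimics what `lea dst, [base + index*scale]` computes.

```c
#include <stdint.h>

/* Models "lea dst, [base + index*scale]"; the CPU only allows scale 1, 2, 4 or 8. */
static uint32_t lea(uint32_t base, uint32_t index, uint32_t scale) {
    return base + index * scale;
}

static uint32_t times3(uint32_t x)  { return lea(x, x, 2); }      /* x + 2x        */
static uint32_t times5(uint32_t x)  { return lea(x, x, 4); }      /* x + 4x        */
static uint32_t times9(uint32_t x)  { return lea(x, x, 8); }      /* x + 8x        */
static uint32_t times6(uint32_t x)  { return times3(x) << 1; }    /* 3x, then shl  */
static uint32_t times7(uint32_t x)  { return (x << 3) - x; }      /* 8x - x        */
static uint32_t times10(uint32_t x) { return times5(x) << 1; }    /* 5x, then shl  */
```

Like the two-operand imul, these keep only the low 32 bits of the result, so they are not a substitute when the high half in EDX is needed.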
Quote from: MichaelW on November 04, 2008, 08:07:50 PM
Running on a P3 and using an unsigned multiply I cannot find any coding that is significantly faster.
Celeron M:
24
24
20
20
Hutch,
I still don't see where the XOR is needed. You are only loading EAX and ECX with a DWORD each, so you would never see if one of them was > 32 bits; in fact, the values are pushed on the stack, where they are accessed as DWORDs. If someone pushed a 64-bit number on the stack, then the loads would get strange results and the ret 8 would not return to the correct point. Once the mul is done, EDX will always be overwritten with the upper half of the product, which can be checked upon return from the function to detect a result > 32 bits. It makes no difference what was in EDX at the time of the call.
Dave.
Is there something wrong with my code?? I always get zero cycles for the macro version... :dazzled:
24
24
20
20
0
123*456=56088
umul4 MACRO accu:REQ, mult:REQ
ifdif <accu>, <eax>
mov eax, accu
endif
imul eax, mult
ENDM
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul proc num:DWORD, mult:DWORD
xor edx, edx ; clear EDX
mov eax, [esp+4] ; load number into EAX
mov ecx, [esp+8] ; load multiplier into ECX
mul ecx ; perform unsigned multiply
ret 8
umul endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul1 proc num:DWORD, mult:DWORD
mov eax, [esp+4]
mov ecx, [esp+8]
mul ecx
ret 8
umul1 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul2 proc num:DWORD, mult:DWORD
mov eax, [esp+4]
mul DWORD PTR [esp+8]
ret 8
umul2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
umul3 proc num:DWORD, mult:DWORD
mov eax, [esp+4]
imul eax, [esp+8]
ret 8
umul3 endp
umul4 MACRO accu:REQ, mult:REQ
ifdif <accu>, <eax>
mov eax, accu
endif
imul eax, mult
ENDM
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
invoke Sleep, 1000
Loops = 10000000
counter_begin Loops, HIGH_PRIORITY_CLASS
invoke umul, 123, 456
invoke umul, 123, 456
invoke umul, 123, 456
invoke umul, 123, 456
counter_end
print ustr$(eax),13,10
counter_begin Loops, HIGH_PRIORITY_CLASS
invoke umul1, 123, 456
invoke umul1, 123, 456
invoke umul1, 123, 456
invoke umul1, 123, 456
counter_end
print ustr$(eax),13,10
counter_begin Loops, HIGH_PRIORITY_CLASS
invoke umul2, 123, 456
invoke umul2, 123, 456
invoke umul2, 123, 456
invoke umul2, 123, 456
counter_end
print ustr$(eax),13,10
counter_begin Loops, HIGH_PRIORITY_CLASS
invoke umul3, 123, 456
invoke umul3, 123, 456
invoke umul3, 123, 456
invoke umul3, 123, 456
counter_end
print ustr$(eax),13,10
counter_begin Loops, HIGH_PRIORITY_CLASS
umul4 123, 456
umul4 123, 456
umul4 123, 456
umul4 123, 456
counter_end
print ustr$(eax),13,10,10
print "123*456="
umul4 123, 456
print ustr$(eax),13,10
; inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
JJ,
Please correct me if I'm wrong, but "ifdif..." should evaluate to an assembly time compare of the content of the parameter "accu" and the content of the EAX register. You need a run time check to see if this is true, but this check will involve the same memory access that the MOV would, and would have an extra jump to skip the MOV if EAX just happened to contain the value to begin with. What have you gained?
Dave.
Quote from: KeepingRealBusy on November 04, 2008, 11:58:25 PM
JJ,
Please correct me if I'm wrong, but "ifdif..." should evaluate to an assembly time compare of the content of the parameter "accu" and the content of the EAX register. You need a run time check to see if this is true, but this check will involve the same memory access that the MOV would, and would have an extra jump to skip the MOV if EAX just happened to contain the value to begin with. What have you gained?
Dave.
Dave,
The ifdif checks at assembly time whether the argument is the text eax, and thus avoids an unnecessary mov eax, eax.
smul4 MACRO accu:REQ, mult:REQ
ifdif <accu>, <eax>
mov eax, accu
endif
imul eax, mult
ENDM
...
mov eax, 12345678h
nop
smul4 eax, 456h
nop
mov ecx, 12345678h
nop
smul4 ecx, 456h
nop
Disassembly:
Address Hex dump Command Comments
00401039 ³. B8 78563412 mov eax, 12345678
0040103E ³. 90 nop
0040103F ³? 69C0 56040000 imul eax, eax, 456 <--- one instruction
00401045 ³. 90 nop
00401046 ³? B9 78563412 mov ecx, 12345678
0040104B ³? 90 nop
0040104C ³. 8BC1 mov eax, ecx <--- additional mov
0040104E ³? 69C0 56040000 imul eax, eax, 456
Analogously, you can check twice for eax and edx when using the unsigned multiply (I use edx as the second register because it's trashed anyway):
umul4 MACRO accu:REQ, mult:REQ
ifdif <accu>, <eax>
mov eax, accu
endif
ifdif <mult>, <edx>
mov edx, mult
endif
mul edx
ENDM
I might go for this version:
xmul MACRO accu:REQ, mult:REQ
ifdif <accu>, <eax>
mov eax, accu
endif
imul eax, mult
EXITM <eax>
ENDM
.data
V1u dd 456
V2s SDWORD 456
Result dd 0
.code
start:
mov Result, xmul(V1u, V2s)
push xmul(V1u, V2s)
pop eax
I had a look at the postings, Dave was right, no need to XOR edx. Here is a test piece with 4 macros, 2 for mnemonic format and 2 as functions.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
; -------------
; mnemonic form
; -------------
umul MACRO num, mult
mov eax, num
mov ecx, mult
mul ecx
ENDM
smul MACRO num, mult
mov eax, num
mov ecx, mult
imul ecx
ENDM
; -------------
; function form
; -------------
fnumul MACRO num, mult
mov eax, num
mov ecx, mult
mul ecx
EXITM <eax>
ENDM
fnsmul MACRO num, mult
mov eax, num
mov ecx, mult
imul ecx
EXITM <eax>
ENDM
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
numb equ <2000000000>
mulb equ <2>
smul numb, mulb
print sdword$(eax),13,10
umul numb, mulb
print udword$(eax),13,10
print sdword$(fnsmul(numb,mulb)),13,10
print udword$(fnumul(numb,mulb)),13,10
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Odd or Even? :bdg
Quote from: hutch-- on November 05, 2008, 12:53:34 AM
fnumul MACRO num, mult
mov eax, num
mov ecx, mult
mul ecx
EXITM <eax>
ENDM
fnsmul MACRO num, mult
mov eax, num
mov ecx, mult
imul ecx
EXITM <eax>
ENDM
You might check for redundant mov eax, eax, and avoid using ecx:
fnsmul MACRO accu:REQ, mult:REQ
ifdifi <accu>, <eax>
mov eax, accu
endif
imul eax, mult
EXITM <eax>
ENDM
fnumul MACRO accu:REQ, mult:REQ
ifdifi <accu>, <eax>
mov eax, accu
endif
ifdifi <mult>, <edx>
mov edx, mult
endif
mul edx
EXITM <eax>
ENDM
I would definitely avoid using an instruction in a macro such as:
imul eax, mult
regardless if it is used for a signed or unsigned multiplication.
If the user knows what he's doing, he probably doesn't need a macro anyway. For a "newby", if the result should exceed 32 bits, it may throw an exception and possibly crash the program.
jj2007's latest proposed macro for a signed multiplication should thus be:
fnsmul MACRO accu:REQ, mult:REQ
ifdifi <accu>, <eax>
mov eax, accu
endif
ifdifi <mult>, <edx>
mov edx, mult
endif
imul edx
EXITM <eax>
ENDM
The user of these multiplication macros should then be advised that the multiplication result, whether signed or unsigned, will always be returned as a 64-bit value in the EDX:EAX pair (the imul eax,mult instruction would only return a 32-bit result in the EAX register while the mul edx instruction would return 64 bits).
Spot the difference or the instruction count using the macro I posted and JJs "improvement". By my count 4 instructions is 4 instructions.
This,
nop
mov edx, fnumuljj(numb, mulb)
nop
mov edx, fnumul(numb,mulb)
nop
mov edx, fnumuljj(esi, mulb)
nop
mov edx, fnumul(esi,mulb)
nop
mov edx, fnumuljj(esi, mem)
nop
mov edx, fnumul(esi, mem)
nop
mov edx, fnumuljj(esi, edi)
nop
mov edx, fnumul(esi, edi)
nop
mov edx, fnumuljj(mem, edi)
nop
mov edx, fnumul(mem, edi)
nop
Produces,
0040103E 90 nop
0040103F B800943577 mov eax,77359400h
00401044 BA02000000 mov edx,2
00401049 F7E2 mul edx
0040104B 8BD0 mov edx,eax
0040104D 90 nop
0040104E B800943577 mov eax,77359400h
00401053 B902000000 mov ecx,2
00401058 F7E1 mul ecx
0040105A 8BD0 mov edx,eax
0040105C 90 nop
0040105D 8BC6 mov eax,esi
0040105F BA02000000 mov edx,2
00401064 F7E2 mul edx
00401066 8BD0 mov edx,eax
00401068 90 nop
00401069 8BC6 mov eax,esi
0040106B B902000000 mov ecx,2
00401070 F7E1 mul ecx
00401072 8BD0 mov edx,eax
00401074 90 nop
00401075 8BC6 mov eax,esi
00401077 8B55FC mov edx,[ebp-4]
0040107A F7E2 mul edx
0040107C 8BD0 mov edx,eax
0040107E 90 nop
0040107F 8BC6 mov eax,esi
00401081 8B4DFC mov ecx,[ebp-4]
00401084 F7E1 mul ecx
00401086 8BD0 mov edx,eax
00401088 90 nop
00401089 8BC6 mov eax,esi
0040108B 8BD7 mov edx,edi
0040108D F7E2 mul edx
0040108F 8BD0 mov edx,eax
00401091 90 nop
00401092 8BC6 mov eax,esi
00401094 8BCF mov ecx,edi
00401096 F7E1 mul ecx
00401098 8BD0 mov edx,eax
0040109A 90 nop
0040109B 8B45FC mov eax,[ebp-4]
0040109E 8BD7 mov edx,edi
004010A0 F7E2 mul edx
004010A2 8BD0 mov edx,eax
004010A4 90 nop
004010A5 8B45FC mov eax,[ebp-4]
004010A8 8BCF mov ecx,edi
004010AA F7E1 mul ecx
004010AC 8BD0 mov edx,eax
004010AE 90 nop
Quote from: hutch-- on November 05, 2008, 04:26:43 AM
Spot the difference or the instruction count using the macro I posted and JJs "improvement". By my count 4 instructions is 4 instructions.
Odd or Even? I prefer Odd :bg
JJ macro:
ecx contained 12345, now its value is 12345
Result=500000
Codesize=5
Hutch macro:
ecx contained 12345, now its value is 500
Result=500000
Codesize=8
include \masm32\include\masm32rt.inc
; **** CONSOLE assembly ****
fnsmul MACRO accu:REQ, mult:REQ
ifdifi <accu>, <eax>
mov eax, accu
endif
imul eax, mult
EXITM <eax>
ENDM
fnumul MACRO accu:REQ, mult:REQ
ifdifi <accu>, <eax>
mov eax, accu
endif
ifdifi <mult>, <edx>
mov edx, mult
endif
mul edx
EXITM <eax>
ENDM
fnumulHutch MACRO num, mult
mov eax, num
mov ecx, mult
mul ecx
EXITM <eax>
ENDM
fnsmulHutch MACRO num, mult
mov eax, num
mov ecx, mult
imul ecx
EXITM <eax>
ENDM
.code
start:
print chr$(13, 10, "ecx contained 12345, now its value is ")
mov ecx, 12345
mov eax, 1000
mov edx, 500
mov esi, esi ; marker for Olly - jj macro starts
inJJ:
mov edi, fnsmul(eax, edx)
outJJ:
mov esi, esi ; marker for Olly - jj macro ends
print str$(ecx)
print chr$(13, 10, "Result=")
print str$(edi)
print chr$(13, 10, "Codesize=")
print str$(offset outJJ-inJJ), 13, 10
print chr$(13, 10, "ecx contained 12345, now its value is ")
mov ecx, 12345
mov eax, 1000
mov edx, 500
mov ah, ah ; marker for Olly - Hutch macro starts
inHutch:
mov edi, fnsmulHutch(eax, edx)
outHutch:
mov ah, ah ; marker for Olly - Hutch macro ends
print str$(ecx)
print chr$(13, 10, "Result=")
print str$(edi)
print chr$(13, 10, "Codesize=")
print str$(offset outHutch-inHutch), 13, 10, 10
inkey "Hit any key to get outta here"
exit
end start
Quote from: raymond on November 05, 2008, 03:15:26 AM
For a "newby", if the result should exceed 32 bits, it may throw an exception and possibly crash the program.
Raymond, what you write is technically absolutely correct. However, if any of my code produces an integer that exceeds 32 bits, I would love to see it crash - at least, it would force me to insert an int 3 and launch Olly to see what's wrong with the code. I guess newbies would also appreciate that kind of behaviour.
But of course, there are people who work with 64 bit integers. As you rightly say, these people probably don't need a macro anyway.
EDIT: By the way, under which conditions does imul eax, mult crash? I have tested it with a global variable and two high 32-bit values, but it won't do me the favour of crashing... :eek
JJ,
I cheat, I look in the second column of the disassembly I posted as that tells you the byte count of each instruction. Add up each digit in hex and Bingo, you have the byte size.
Ray,
What do you see as the problem, overflow is handled in EDX in the normal manner. With a macro of this type, this would need to be in the documentation but then this is already the case with the macros that MASM32 uses. Here is the example of an overflow result and it just shows this result in EDX.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL var1 :DWORD
LOCAL var2 :DWORD
mov var1, 4000000000
mov var2, 2
mov eax, var1
mov ecx, var2
mul ecx
push edx
print ustr$(eax),13,10
pop edx
print ustr$(edx),13,10
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Quote from: hutch-- on November 05, 2008, 09:32:15 AM
JJ,
I cheat, I look in the second column of the disassembly I posted as that tells you the byte count of each instruction. Add up each digit in hex and Bingo, you have the byte size.
Hey, is cheating allowed by the forum rules?? ::)
And what about recklessly destroying the contents of ecx without any need??
mov edi, fnumul(eax, edx)
0040101A |. 8BF6 mov esi, esi <-marker start
0040101C |. F7E2 mul edx
0040101E |. 8BF8 mov edi, eax
00401020 |. 8BF6 mov esi, esi <-marker end
mov edi, fnumulHutch(eax, edx)
0040109A |. 8AE4 mov ah, ah <-marker start
0040109C |. 8BC0 mov eax, eax
0040109E |. 8BCA mov ecx, edx
004010A0 |. F7E1 mul ecx
004010A2 |. 8BF8 mov edi, eax
004010A4 |. 8AE4 mov ah, ah <-marker end
:bg
JJ,
> And what about recklessly destroying the contents of ecx without any need??
Who cares, the register convention allows ECX to be overwritten. :bg
Quote: What do you see as the problem, overflow is handled in EDX in the normal manner.
You did not read what I wrote. I will repeat it once more.
Quote: I would definitely avoid using an instruction in a macro such as:
imul eax, mult
If the result of such an instruction exceeds 32 bits, it DOES NOT overflow into the EDX register. Your "test" was based on the regular mul instruction with a single parameter which obviously returns the result in the EDX:EAX registers.
jj
Quote: By the way, under which conditions does imul eax, mult crash?
I had not verified if an overflow would throw an exception, that's why I mentioned "it may throw ...". I have now verified it and it does NOT throw any exception nor does it crash the program. An overflow simply leaves in EAX the same low 32 bits that the regular mul instruction would have produced, any overflow being discarded. With overflow, the result in EAX would thus be erroneous.
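The difference between the two forms can be sketched in C. The function names `mul_wide` and `imul_two_operand` are hypothetical, and the model only covers the result value, not the flags.

```c
#include <stdint.h>

/* One-operand MUL: the full 64-bit product, i.e. what lands in EDX:EAX. */
static uint64_t mul_wide(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}

/* Two-operand IMUL dst, src: only the low 32 bits are kept in dst.
   The low half of a product is the same bit pattern for signed and
   unsigned operands, so unsigned arithmetic is enough for this model. */
static uint32_t imul_two_operand(uint32_t dst, uint32_t src) {
    return (uint32_t)((uint64_t)dst * src);
}
```

For 4000000000 * 2 the wide product is 8000000000 (EDX = 1, EAX = 3705032704), while the two-operand form returns only 3705032704 with nothing to indicate the lost high half apart from the CF and OF flags.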
Ray,
Tolerate me here for the moment, I did not understand your complaint.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL var1 :DWORD
LOCAL var2 :DWORD
; ------------
; within range
; ------------
mov var1, 1000000000
mov var2, 2
mov eax, var1
mov ecx, var2
imul ecx
push edx
print sstr$(eax),13,10
pop edx
print sstr$(edx),13,10
; --------
; overflow
; --------
mov var1, 4000000000
mov var2, 2
mov eax, var1
mov ecx, var2
imul ecx
push edx
print sstr$(eax),13,10
pop edx
print sstr$(edx),13,10
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
2000000000
0
-589934592
-1
Press any key to continue ...
On both forms I tested EDX and its result shows if the result exceeds the size of 32 bit.
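For reading the four numbers above, a small C model of the one-operand imul may help. It is a sketch under assumptions: `pair`, `imul_wide` and `fits_in_32` are hypothetical names, and the casts rely on the usual two's-complement behaviour of mainstream compilers. Under that model, 4000000000 stored in a DWORD reads as -294967296 when treated as signed, so the second product still fits in 32 bits and EDX = -1 is just the sign extension of EAX. A signed result has outgrown EAX only when EDX differs from that sign extension.

```c
#include <stdint.h>

/* Hypothetical holder for the signed view of EDX:EAX. */
typedef struct { int32_t eax, edx; } pair;

/* One-operand IMUL: signed 32x32 -> 64 multiply, split into two halves. */
static pair imul_wide(int32_t a, int32_t b) {
    uint64_t bits = (uint64_t)((int64_t)a * b);   /* raw 64-bit pattern */
    pair r;
    r.eax = (int32_t)(uint32_t)bits;              /* low half  */
    r.edx = (int32_t)(uint32_t)(bits >> 32);      /* high half */
    return r;
}

/* The signed product fits in 32 bits exactly when EDX is the sign extension of EAX. */
static int fits_in_32(pair r) {
    return r.edx == (r.eax < 0 ? -1 : 0);
}
```

With 2000000000 * 2 the model gives EDX = 0 but a negative EAX, which is the case where the signed result really no longer fits.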
Hutch and Raymond,
If you multiply two 32 bit numbers, each containing 080000000h (a 1 bit followed by 31 zero bits), the result will only be 63 bits. Do the math in binary, it looks just like doing decimal. You will end up with a 1 bit followed by 2*31 zeros. The result needs all 64 bits only when it reaches 2^63, which is guaranteed when BOTH numbers start with "11".
Dave.
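The bit-count argument can be checked with a few lines of C. This is a sketch with hypothetical helper names (`bit_length`, `product_bits`), taking 80000000h as the value made of a 1 bit followed by 31 zero bits.

```c
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint64_t v) {
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Width of the full 64-bit product of two DWORDs. */
static int product_bits(uint32_t a, uint32_t b) {
    return bit_length((uint64_t)a * b);
}
```

`product_bits(0x80000000, 0x80000000)` comes out as 63, and `product_bits(0xC0000000, 0xC0000000)` as 64.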
Hutch,
When you use mul (for unsigned multiplication) or imul (for signed multiplication) with a single parameter, the result is returned as 64 bits in the EDX:EAX pair. One of the multiplicands is expected to be in the EAX register and the other is the single parameter which can be a register or a memory operand. An immediate operand is not allowed as the single parameter.
When you use imul with more than one parameter, only the lower 32 bits of the result are returned, strictly in the 32-bit destination; any overflow gets discarded. None of the multiplicands needs to be in the EAX register. They can be almost anywhere according to the parameters (Ex.: imul ebx,esi). An immediate operand is also allowed as one of the multiplicands in this format.
That is why I'm opposed to the use of an instruction such as
imul eax, mult (notice the two parameters)
in a macro, or whatever, to be used by "newbies" who may not know the difference.
Raymond,
I didn't know that. My book with Intel specs (Kip Irving) doesn't mention that at all. And I don't think that my AMD spec does either. I will have to check into this more.
Is this an Intel or AMD or both quirk? The hardware must be involved, it cannot be just an assembler allowance. I mean, not affecting the xDx reg. There must be some special preface byte generated.
This simple multiply is getting weird!
Dave.
Quote: Is this an Intel or AMD or both quirk?
I don't know about AMD but I would assume with confidence that its processor would not be any different than Intel's. Directly from the Intel manual itself regarding the IMUL instruction:
Quote: With the two- and three- operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register.
Raymond,
You are absolutely correct. Here it is straight from the AMD spec. I didn't realize this, but I have only really dealt with unsigned multiplies up to this point. My book from Kip Irving describing Intel operation (which came with MASM 615) gave the same formats (less the encodings), but did not even mention the two and three operand storage results.
Instruction Reference
24594 Rev. 3.12 September 2006 AMD64 Technology
IMUL Instruction Reference
Multiplies two signed operands. The number of operands determines the form of the instruction.
If a single operand is specified, the instruction multiplies the value in the specified general-purpose
register or memory location by the value in the AL, AX, EAX, or RAX register (depending on the
operand size) and stores the product in AX, DX:AX, EDX:EAX, or RDX:RAX, respectively.
If two operands are specified, the instruction multiplies the value in a general-purpose register (first
operand) by an immediate value or the value in a general-purpose register or memory location (second
operand) and stores the product in the first operand location.
If three operands are specified, the instruction multiplies the value in a general-purpose register or
memory location (second operand), by an immediate value (third operand) and stores the product in a
register (first operand).
The IMUL instruction sign-extends an immediate operand to the length of the other register/memory
operand.
The CF and OF flags are set if, due to integer overflow, the double-width multiplication result cannot
be represented in the half-width destination register. Otherwise the CF and OF flags are cleared.
IMUL Signed Multiply
Mnemonic                      Opcode    Description
IMUL reg/mem8                 F6 /5     Multiply the contents of AL by the contents of an 8-bit memory or register operand and put the signed result in AX.
IMUL reg/mem16                F7 /5     Multiply the contents of AX by the contents of a 16-bit memory or register operand and put the signed result in DX:AX.
IMUL reg/mem32                F7 /5     Multiply the contents of EAX by the contents of a 32-bit memory or register operand and put the signed result in EDX:EAX.
IMUL reg/mem64                F7 /5     Multiply the contents of RAX by the contents of a 64-bit memory or register operand and put the signed result in RDX:RAX.
IMUL reg16, reg/mem16         0F AF /r  Multiply the contents of a 16-bit destination register by the contents of a 16-bit register or memory operand and put the signed result in the 16-bit destination register.
IMUL reg32, reg/mem32         0F AF /r  Multiply the contents of a 32-bit destination register by the contents of a 32-bit register or memory operand and put the signed result in the 32-bit destination register.
IMUL reg64, reg/mem64         0F AF /r  Multiply the contents of a 64-bit destination register by the contents of a 64-bit register or memory operand and put the signed result in the 64-bit destination register.
IMUL reg16, reg/mem16, imm8   6B /r ib  Multiply the contents of a 16-bit register or memory operand by a sign-extended immediate byte and put the signed result in the 16-bit destination register.
IMUL reg32, reg/mem32, imm8   6B /r ib  Multiply the contents of a 32-bit register or memory operand by a sign-extended immediate byte and put the signed result in the 32-bit destination register.
IMUL reg64, reg/mem64, imm8   6B /r ib  Multiply the contents of a 64-bit register or memory operand by a sign-extended immediate byte and put the signed result in the 64-bit destination register.
IMUL reg16, reg/mem16, imm16  69 /r iw  Multiply the contents of a 16-bit register or memory operand by a sign-extended immediate word and put the signed result in the 16-bit destination register.
IMUL reg32, reg/mem32, imm32  69 /r id  Multiply the contents of a 32-bit register or memory operand by a sign-extended immediate double and put the signed result in the 32-bit destination register.
IMUL reg64, reg/mem64, imm32  69 /r id  Multiply the contents of a 64-bit register or memory operand by a sign-extended immediate double and put the signed result in the 64-bit destination register.
Dave.
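The flag rule quoted from the spec (CF and OF set when the double-width result cannot be represented in the half-width destination) can be modelled in C for the 32-bit two-operand form. This is a sketch, and `imul32_sets_cf_of` is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

/* Two-operand IMUL reg32, reg/mem32: CF and OF are set together when the
   signed 64-bit product does not fit in the signed 32-bit destination. */
static bool imul32_sets_cf_of(int32_t a, int32_t b) {
    int64_t product = (int64_t)a * b;
    return product < INT32_MIN || product > INT32_MAX;
}
```

So a macro built on imul eax, mult could still detect a truncated result by testing the carry or overflow flag right after the multiply, for example with jc or jo.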
So all that means we need a better way of handling this. Proposal:
smul MACRO accu:REQ, mult:REQ
ifdifi <accu>, <eax>
mov eax, accu
endif
if opattr (mult) eq 36 ; immediate
mov edx, mult
imul edx
else
imul mult
endif
EXITM <eax>
ENDM
.data
mem4 dd 10000h
.code
start:
print chr$(13, 10, "edx=")
mov edi, smul(12345678h, 10000h) ; immediate * immediate
print hex$(edx)
print ", eax="
print hex$(edi)
print chr$(13, 10, "edx=")
mov edi, smul(12345678h, mem4) ; immediate * mem32
print hex$(edx)
print ", eax="
print hex$(edi)
print chr$(13, 10, "edx=")
mov edx, 10000h
mov edi, smul(12345678h, edx) ; immediate * reg32
print hex$(edx)
print ", eax="
print hex$(edi)
print chr$(13, 10, "edx=")
mov edx, 10000h
mov eax, 12345678h
mov edi, smul(eax, edx) ; reg32 * reg32
print hex$(edx)
print ", eax="
print hex$(edi)
print chr$(13, 10, "edx=")
mov edx, 10000h
mov mem4, 12345678h
mov edi, smul(mem4, edx) ; mem32 * reg32
print hex$(edx)
print ", eax="
print hex$(edi)
Output:
edx=00001234, eax=56780000
edx=00001234, eax=56780000
edx=00001234, eax=56780000
edx=00001234, eax=56780000
edx=00001234, eax=56780000
I'm using a redundant EDX a lot in this calculation section .... I haven't finalised it but it's close to the same idea as the topic
;--- EXTENDED GREGORIAN DATE SECTION ---
Mov Eax, myMonth ;Load Month (M)
Mov Ebx, myYear ;load Year (Y)
Cmp Eax, c003 ;Is Month - Jan or Feb
Jge @F ;Nope - skip next instruction
Dec Ebx ;Decrement year value
@@: ;(Y + (M-9)/7)
Mov Eax, Ebx ;Move year value
Mov Ebx, c100 ;Load denominator (= 100)
Cdq ;Sign extend EDX:EAX for DIV
Div Ebx ;((Y + (M-9)/7) / 100)
Add Eax, c001 ;((Y + (M-9)/7) / 100) + 1
Mov Ebx, c003 ;Load 3
Mul Ebx ;3*(((Y + (M-9)/7) / 100) + 1)
Mov Ebx, c004 ;Load denominator (= 4)
Cdq ;Sign extend EDX:EAX for DIV
Div Ebx ;(3*(((Y + (M-9)/7) / 100) + 1)) / 4
Mov myJulianDay, Eax ;Saved
Now looking at it, i can improve via swopping EAX, EBX parameter storage... just as well I read this topic... :lol
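For comparison, the same steps can be written out in C. This is a sketch that assumes a positive year and a month in 1..12; `century_term` is a hypothetical name, not something from the listing. With positive values each Cdq leaves EDX at zero, which is the state the unsigned Div expects, and the Mul of such small numbers also leaves EDX at zero on its own.

```c
#include <stdint.h>

/* Mirrors the listing: 3 * ((Y' / 100) + 1) / 4, where Y' is the year,
   reduced by one for January and February (the (M-9)/7 adjustment). */
static uint32_t century_term(uint32_t year, uint32_t month) {
    if (month < 3) {
        year -= 1;                  /* Jan/Feb count toward the previous year */
    }
    uint32_t t = year / 100 + 1;    /* Div by 100, then Add 1 */
    t = t * 3;                      /* Mul by 3               */
    return t / 4;                   /* Div by 4               */
}
```

For example, `century_term(2008, 11)` evaluates to 15.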