The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: hutch-- on November 04, 2008, 01:09:12 PM

Title: Simple multiply algorithm.
Post by: hutch-- on November 04, 2008, 01:09:12 PM
This is very uncontentious stuff but I thought someone may see a faster way to do it.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

umul proc num:DWORD, mult:DWORD

    xor edx, edx        ; clear EDX
    mov eax, [esp+4]    ; load number into EAX
    mov ecx, [esp+8]    ; load multiplier into ECX
    mul ecx             ; perform unsigned multiply

    ret 8

umul endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 04, 2008, 01:51:50 PM
Hutch,

Just delete the xor edx,edx; EDX will be overwritten anyway. Clearing it is not necessary for a multiply, which yields its result in edx:eax. DIV needs to have edx cleared because the divide is edx:eax by the divisor, yielding the quotient in eax and the remainder in edx (or a divide overflow if the quotient exceeds 32 bits).

Dave.
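
A minimal sketch of the point above, assuming the MASM32 SDK (masm32rt.inc) and a console build. It leaves junk in EDX before the MUL to show that the instruction overwrites it, then clears EDX before a DIV, where it does matter. The values are chosen only for illustration.

```masm
; Sketch only: assumes the MASM32 SDK and "CONSOLE ASSEMBLE AND LINK".
    include \masm32\include\masm32rt.inc

    .code
start:
    mov edx, 0DEADBEEFh     ; deliberately leave junk in EDX
    mov eax, 123
    mov ecx, 456
    mul ecx                 ; EDX:EAX = 123 * 456 = 56088, so EDX becomes 0

    push edx                ; the print macro may clobber EAX, ECX and EDX
    print ustr$(eax),13,10  ; low dword of the product
    pop edx
    print ustr$(edx),13,10  ; high dword of the product

    xor edx, edx            ; DIV divides EDX:EAX, so the high dword must be set up
    mov eax, 56088
    mov ecx, 456
    div ecx                 ; EAX = quotient (123), EDX = remainder (0)
    print ustr$(eax),13,10

    inkey "Press any key to exit..."
    exit
end start
```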
Title: Re: Simple multiply algorithm.
Post by: Jimg on November 04, 2008, 02:43:33 PM
I may be having a senior moment, but what's wrong with
mul dword ptr [esp+8]
instead of using ecx?  Is that slower?
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 04, 2008, 03:21:50 PM
I thought it had to be an immediate or a reg, let me check.

Dave.
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 04, 2008, 03:31:14 PM
Jimg,

Wrong (me), right (you). It can be either reg or mem (but not an immediate). Thus it occupies only 2 regs, eax and edx. I'm not sure about the timing consequences of reg vs mem for the mul itself, but when you have to load the value into a reg to begin with, that takes one more instruction, so your way should be faster.

Dave.
Title: Re: Simple multiply algorithm.
Post by: Jimg on November 04, 2008, 03:55:26 PM
And it seems to me that a macro that just inserted the instructions would be smaller and faster than invoking a proc, so Hutch must have had some ulterior motives for this whole thread.
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 04, 2008, 06:09:03 PM
A macro saves the pushes of the arguments and of the return address (EIP), and then the ret with the stack adjustment (all of these take cycles).
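
To make the overhead concrete, here is a rough sketch assuming the MASM32 SDK, whose masm32rt.inc selects the stdcall model. It spells out what an invoke of a two-argument procedure amounts to and places the inline equivalent next to it. The procedure is the umul2 variant from this thread; nothing here is timed.

```masm
; Sketch only: assumes the MASM32 SDK and a console build.
    include \masm32\include\masm32rt.inc

    .code

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

umul2 proc num:DWORD, mult:DWORD
    mov eax, [esp+4]
    mul DWORD PTR [esp+8]
    ret 8                   ; pops the return address and the two arguments
umul2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
  ; roughly what "invoke umul2, 123, 456" assembles to under stdcall
    push 456                ; arguments are pushed right to left
    push 123
    call umul2              ; pushes the return address, then jumps
    print ustr$(eax),13,10

  ; inline equivalent: no pushes, no call, no ret
    mov eax, 123
    mov ecx, 456
    mul ecx
    print ustr$(eax),13,10

    exit
end start
```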
Title: Re: Simple multiply algorithm.
Post by: MichaelW on November 04, 2008, 08:07:50 PM
Running on a P3 and using an unsigned multiply I cannot find any coding that is significantly faster.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul proc num:DWORD, mult:DWORD

    xor edx, edx        ; clear EDX
    mov eax, [esp+4]    ; load number into EAX
    mov ecx, [esp+8]    ; load multiplier into ECX
    mul ecx             ; perform unsigned multiply

    ret 8

umul endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul1 proc num:DWORD, mult:DWORD

    mov eax, [esp+4]
    mov ecx, [esp+8]
    mul ecx

    ret 8

umul1 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul2 proc num:DWORD, mult:DWORD

    mov eax, [esp+4]
    mul DWORD PTR [esp+8]

    ret 8

umul2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul3 proc num:DWORD, mult:DWORD

    mov eax, [esp+4]
    imul eax, [esp+8]

    ret 8

umul3 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke Sleep, 3000

    counter_begin 1000, HIGH_PRIORITY_CLASS
      invoke umul, 123, 456
      invoke umul, 123, 456
      invoke umul, 123, 456
      invoke umul, 123, 456
    counter_end
    print ustr$(eax),13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      invoke umul1, 123, 456
      invoke umul1, 123, 456
      invoke umul1, 123, 456
      invoke umul1, 123, 456
    counter_end
    print ustr$(eax),13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      invoke umul2, 123, 456
      invoke umul2, 123, 456
      invoke umul2, 123, 456
      invoke umul2, 123, 456
    counter_end
    print ustr$(eax),13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      invoke umul3, 123, 456
      invoke umul3, 123, 456
      invoke umul3, 123, 456
      invoke umul3, 123, 456
    counter_end
    print ustr$(eax),13,10

    inkey "Press any key to exit..."
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


30
29
30
24


Title: Re: Simple multiply algorithm.
Post by: hutch-- on November 04, 2008, 10:51:42 PM
These are the timings on my old PIV.


66
58
58
38
Press any key to exit...


Inlining the code is obviously faster, which is what I have normally done for years, but I wanted a callable procedure so I could put it into the masm32 library for others to use while they are learning. The XOR EDX, EDX was to ensure the EDX register was not loaded with a random value, so you could reliably check for results larger than 4 gig. I tend to do the 1 to 10 range with combinations of shifts and LEA.
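
A small sketch of the shift and LEA approach mentioned above, assuming the MASM32 SDK and a console build. The input value 7 is arbitrary, and these are common idioms rather than code taken from the masm32 library.

```masm
; Sketch only: assumes the MASM32 SDK and a console build.
    include \masm32\include\masm32rt.inc

    .code
start:
    mov eax, 7
    lea eax, [eax+eax*2]    ; x3  -> 21
    print ustr$(eax),13,10

    mov eax, 7
    lea eax, [eax+eax*4]    ; x5  -> 35
    print ustr$(eax),13,10

    mov eax, 7
    shl eax, 3              ; x8  -> 56
    print ustr$(eax),13,10

    mov eax, 7
    lea eax, [eax+eax*8]    ; x9  -> 63
    print ustr$(eax),13,10

    mov eax, 7
    lea eax, [eax+eax*4]    ; x5
    add eax, eax            ; x2, so x10 in total -> 70
    print ustr$(eax),13,10

    inkey "Press any key to exit..."
    exit
end start
```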
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 04, 2008, 10:59:25 PM
Quote from: MichaelW on November 04, 2008, 08:07:50 PM
Running on a P3 and using an unsigned multiply I cannot find any coding that is significantly faster.


Celeron M:
24
24
20
20
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 04, 2008, 11:06:47 PM
Hutch,

I still don't see where the XOR is needed. You are only loading EAX and ECX with a DWORD so you would never see if one of them was > 32 bits; in fact, the values are pushed on the stack where they are accessed as DWORDS. If someone pushed a 64 bit number on the stack, then the loads would get strange results and the ret 8 would not return to the correct point. Once the mul is done, EDX will always be overwritten with the upper half of the product, which can be checked upon return from the function for a value > 32 bits. It makes no difference what was in EDX at the time of the call.

Dave.
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 04, 2008, 11:31:08 PM
Is there something wrong with my code?? I always get zero cycles for the macro version... :dazzled:

24
24
20
20
0

123*456=56088

umul4 MACRO accu:REQ, mult:REQ
  ifdif <accu>, <eax>
   mov eax, accu
  endif
  imul eax, mult
ENDM

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul proc num:DWORD, mult:DWORD

    xor edx, edx        ; clear EDX
    mov eax, [esp+4]    ; load number into EAX
    mov ecx, [esp+8]    ; load multiplier into ECX
    mul ecx             ; perform unsigned multiply

    ret 8

umul endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul1 proc num:DWORD, mult:DWORD

    mov eax, [esp+4]
    mov ecx, [esp+8]
    mul ecx

    ret 8

umul1 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul2 proc num:DWORD, mult:DWORD

    mov eax, [esp+4]
    mul DWORD PTR [esp+8]

    ret 8

umul2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

umul3 proc num:DWORD, mult:DWORD

    mov eax, [esp+4]
    imul eax, [esp+8]

    ret 8

umul3 endp

umul4 MACRO accu:REQ, mult:REQ
  ifdif <accu>, <eax>
mov eax, accu
  endif
  imul eax, mult
ENDM

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke Sleep, 1000
    Loops = 10000000

    counter_begin Loops, HIGH_PRIORITY_CLASS
      invoke umul, 123, 456
      invoke umul, 123, 456
      invoke umul, 123, 456
      invoke umul, 123, 456
    counter_end
    print ustr$(eax),13,10

    counter_begin Loops, HIGH_PRIORITY_CLASS
      invoke umul1, 123, 456
      invoke umul1, 123, 456
      invoke umul1, 123, 456
      invoke umul1, 123, 456
    counter_end
    print ustr$(eax),13,10

    counter_begin Loops, HIGH_PRIORITY_CLASS
      invoke umul2, 123, 456
      invoke umul2, 123, 456
      invoke umul2, 123, 456
      invoke umul2, 123, 456
    counter_end
    print ustr$(eax),13,10

    counter_begin Loops, HIGH_PRIORITY_CLASS
      invoke umul3, 123, 456
      invoke umul3, 123, 456
      invoke umul3, 123, 456
      invoke umul3, 123, 456
    counter_end
    print ustr$(eax),13,10

    counter_begin Loops, HIGH_PRIORITY_CLASS
      umul4 123, 456
      umul4 123, 456
      umul4 123, 456
      umul4 123, 456
    counter_end
    print ustr$(eax),13,10,10
    print "123*456="
    umul4 123, 456
    print ustr$(eax),13,10

    ; inkey "Press any key to exit..."
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 04, 2008, 11:58:25 PM
JJ,

Please correct me if I'm wrong, but "ifdif..." should evaluate to an assembly time compare of the content of the parameter "accu" and the content of the EAX register. You need a run time check to see if this is true, but this check will involve the same memory access that the MOV would, and would have an extra jump to skip the MOV if EAX just happened to contain the value to begin with. What have you gained?

Dave.
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 05, 2008, 12:20:22 AM
Quote from: KeepingRealBusy on November 04, 2008, 11:58:25 PM
JJ,

Please correct me if I'm wrong, but "ifdif..." should evaluate to an assembly time compare of the content of the parameter "accu" and the content of the EAX register. You need a run time check to see if this is true, but this check will involve the same memory access that the MOV would, and would have an extra jump to skip the MOV if EAX just happened to contain the value to begin with. What have you gained?

Dave.

Dave,
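The ifdif (ifdifi is the case-insensitive variant) checks at assembly time whether the argument text is eax, and thus avoids an unnecessary mov eax, eax. It compares the macro argument as text, not the run-time contents of the register.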

smul4 MACRO accu:REQ, mult:REQ
  ifdif <accu>, <eax>
mov eax, accu
  endif
  imul eax, mult
ENDM

...
   mov eax, 12345678h
   nop
   smul4 eax, 456h
   nop

   mov ecx, 12345678h
   nop
   smul4 ecx, 456h
   nop


Disassembly:

Address    Hex dump                       Command                                     Comments
00401039   |. B8 78563412                 mov eax, 12345678
0040103E   |. 90                          nop
0040103F   |? 69C0 56040000               imul eax, eax, 456   <--- one instruction
00401045   |. 90                          nop
00401046   |? B9 78563412                 mov ecx, 12345678
0040104B   |? 90                          nop
0040104C   |. 8BC1                        mov eax, ecx            <--- additional mov
0040104E   |? 69C0 56040000               imul eax, eax, 456
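
The difference between the two directives can be sketched as follows, assuming the MASM32 SDK and a console build. The macro names mul_cs and mul_ci are hypothetical and exist only for this comparison. IFDIF compares text case-sensitively, so an argument written as EAX is treated as different from eax and the redundant mov is emitted; IFDIFI ignores case and skips it.

```masm
; Sketch only: assumes the MASM32 SDK and a console build.
; mul_cs and mul_ci are hypothetical names used for this comparison.
    include \masm32\include\masm32rt.inc

  ; case-sensitive compare: the text EAX differs from eax
    mul_cs MACRO accu:REQ, mult:REQ
      ifdif <accu>, <eax>
        mov eax, accu
      endif
      imul eax, mult
    ENDM

  ; case-insensitive compare: EAX and eax are treated as equal
    mul_ci MACRO accu:REQ, mult:REQ
      ifdifi <accu>, <eax>
        mov eax, accu
      endif
      imul eax, mult
    ENDM

    .code
start:
    mov eax, 1000
    mul_cs EAX, 5           ; emits a redundant mov eax, EAX before the imul
    print ustr$(eax),13,10

    mov eax, 1000
    mul_ci EAX, 5           ; emits only the imul
    print ustr$(eax),13,10

    inkey "Press any key to exit..."
    exit
end start
```

Both calls compute the same product; the difference is visible only in the disassembly.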
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 05, 2008, 12:31:24 AM
Analogously, you can check twice for eax and edx when using the unsigned multiply (I use edx as the second register because it's trashed anyway):

umul4 MACRO accu:REQ, mult:REQ
  ifdif <accu>, <eax>
mov eax, accu
  endif
  ifdif <mult>, <edx>
mov edx, mult
  endif
  mul edx
ENDM


I might go for this version:

xmul MACRO accu:REQ, mult:REQ
  ifdif <accu>, <eax>
mov eax, accu
  endif
  imul eax, mult
  EXITM <eax>
ENDM

.data

V1u dd 456
V2s SDWORD 456
Result dd 0

.code
start:

  mov Result, xmul(V1u, V2s)
  push xmul(V1u, V2s)
  pop eax
Title: Re: Simple multiply algorithm.
Post by: hutch-- on November 05, 2008, 12:53:34 AM
I had a look at the postings, Dave was right, no need to XOR edx. Here is a test piece with 4 macros, 2 for mnemonic format and 2 as functions.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

  ; -------------
  ; mnemonic form
  ; -------------
    umul MACRO num, mult
      mov eax, num
      mov ecx, mult
      mul ecx
    ENDM

    smul MACRO num, mult
      mov eax, num
      mov ecx, mult
      imul ecx
    ENDM

  ; -------------
  ; function form
  ; -------------
    fnumul MACRO num, mult
      mov eax, num
      mov ecx, mult
      mul ecx
      EXITM <eax>
    ENDM

    fnsmul MACRO num, mult
      mov eax, num
      mov ecx, mult
      imul ecx
      EXITM <eax>
    ENDM

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    numb equ <2000000000>
    mulb equ <2>

    smul numb, mulb
    print sdword$(eax),13,10

    umul numb, mulb
    print udword$(eax),13,10

    print sdword$(fnsmul(numb,mulb)),13,10
    print udword$(fnumul(numb,mulb)),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: Simple multiply algorithm.
Post by: drizz on November 05, 2008, 01:06:07 AM
Odd or Even?  :bdg
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 05, 2008, 01:15:05 AM
Quote from: hutch-- on November 05, 2008, 12:53:34 AM
    fnumul MACRO num, mult
      mov eax, num
      mov ecx, mult
      mul ecx
      EXITM <eax>
    ENDM

    fnsmul MACRO num, mult
      mov eax, num
      mov ecx, mult
      imul ecx
      EXITM <eax>
    ENDM


You might check for redundant mov eax, eax, and avoid using ecx:

fnsmul MACRO accu:REQ, mult:REQ
  ifdifi <accu>, <eax>
mov eax, accu
  endif
  imul eax, mult
  EXITM <eax>
ENDM

fnumul MACRO accu:REQ, mult:REQ
  ifdifi <accu>, <eax>
mov eax, accu
  endif
  ifdifi <mult>, <edx>
mov edx, mult
  endif
  mul edx
  EXITM <eax>
ENDM
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 05, 2008, 01:46:44 AM
Quote from: drizz on November 05, 2008, 01:06:07 AM
Odd or Even?  :bdg

Straight!
Title: Re: Simple multiply algorithm.
Post by: raymond on November 05, 2008, 03:15:26 AM
I would definitely avoid using an instruction in a macro such as:

imul eax, mult

regardless if it is used for a signed or unsigned multiplication.

If the user knows what he's doing, he probably doesn't need a macro anyway. For a "newby", if the result should exceed 32 bits, it may throw an exception and possibly crash the program.
jj2007's latest proposed macro for a signed multiplication should thus be:

fnsmul MACRO accu:REQ, mult:REQ
  ifdifi <accu>, <eax>
mov eax, accu
  endif
  ifdifi <mult>, <edx>
mov edx, mult
  endif
  imul edx
  EXITM <eax>
ENDM


The user of these multiplication macros should then be advised that the multiplication result, whether signed or unsigned, will always be returned as a 64-bit value in the EDX:EAX pair (the imul eax,mult instruction would only return a 32-bit result in the EAX register while the mul edx instruction would return 64 bits).
Title: Re: Simple multiply algorithm.
Post by: hutch-- on November 05, 2008, 04:26:43 AM
Spot the difference or the instruction count using the macro I posted and JJs "improvement". By my count 4 instructions is 4 instructions.

This,


    nop
    mov edx, fnumuljj(numb, mulb)
    nop
    mov edx, fnumul(numb,mulb)
    nop
    mov edx, fnumuljj(esi, mulb)
    nop
    mov edx, fnumul(esi,mulb)
    nop
    mov edx, fnumuljj(esi, mem)
    nop
    mov edx, fnumul(esi, mem)
    nop
    mov edx, fnumuljj(esi, edi)
    nop
    mov edx, fnumul(esi, edi)
    nop
    mov edx, fnumuljj(mem, edi)
    nop
    mov edx, fnumul(mem, edi)
    nop


Produces,


0040103E 90                     nop
0040103F B800943577             mov     eax,77359400h
00401044 BA02000000             mov     edx,2
00401049 F7E2                   mul     edx
0040104B 8BD0                   mov     edx,eax
0040104D 90                     nop
0040104E B800943577             mov     eax,77359400h
00401053 B902000000             mov     ecx,2
00401058 F7E1                   mul     ecx
0040105A 8BD0                   mov     edx,eax
0040105C 90                     nop
0040105D 8BC6                   mov     eax,esi
0040105F BA02000000             mov     edx,2
00401064 F7E2                   mul     edx
00401066 8BD0                   mov     edx,eax
00401068 90                     nop
00401069 8BC6                   mov     eax,esi
0040106B B902000000             mov     ecx,2
00401070 F7E1                   mul     ecx
00401072 8BD0                   mov     edx,eax
00401074 90                     nop
00401075 8BC6                   mov     eax,esi
00401077 8B55FC                 mov     edx,[ebp-4]
0040107A F7E2                   mul     edx
0040107C 8BD0                   mov     edx,eax
0040107E 90                     nop
0040107F 8BC6                   mov     eax,esi
00401081 8B4DFC                 mov     ecx,[ebp-4]
00401084 F7E1                   mul     ecx
00401086 8BD0                   mov     edx,eax
00401088 90                     nop
00401089 8BC6                   mov     eax,esi
0040108B 8BD7                   mov     edx,edi
0040108D F7E2                   mul     edx
0040108F 8BD0                   mov     edx,eax
00401091 90                     nop
00401092 8BC6                   mov     eax,esi
00401094 8BCF                   mov     ecx,edi
00401096 F7E1                   mul     ecx
00401098 8BD0                   mov     edx,eax
0040109A 90                     nop
0040109B 8B45FC                 mov     eax,[ebp-4]
0040109E 8BD7                   mov     edx,edi
004010A0 F7E2                   mul     edx
004010A2 8BD0                   mov     edx,eax
004010A4 90                     nop
004010A5 8B45FC                 mov     eax,[ebp-4]
004010A8 8BCF                   mov     ecx,edi
004010AA F7E1                   mul     ecx
004010AC 8BD0                   mov     edx,eax
004010AE 90                     nop
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 05, 2008, 09:09:13 AM
Quote from: hutch-- on November 05, 2008, 04:26:43 AM
Spot the difference or the instruction count using the macro I posted and JJs "improvement". By my count 4 instructions is 4 instructions.

Odd or Even? I prefer Odd :bg

JJ macro:
ecx contained 12345, now its value is 12345
Result=500000
Codesize=5

Hutch macro:
ecx contained 12345, now its value is 500
Result=500000
Codesize=8

include \masm32\include\masm32rt.inc

; **** CONSOLE assembly ****

fnsmul MACRO accu:REQ, mult:REQ
  ifdifi <accu>, <eax>
mov eax, accu
  endif
  imul eax, mult
  EXITM <eax>
ENDM

fnumul MACRO accu:REQ, mult:REQ
  ifdifi <accu>, <eax>
mov eax, accu
  endif
  ifdifi <mult>, <edx>
mov edx, mult
  endif
  mul edx
  EXITM <eax>
ENDM

fnumulHutch MACRO num, mult
      mov eax, num
      mov ecx, mult
      mul ecx
      EXITM <eax>
ENDM

fnsmulHutch MACRO num, mult
      mov eax, num
      mov ecx, mult
      imul ecx
      EXITM <eax>
ENDM

.code
start:
print chr$(13, 10, "ecx contained 12345, now its value is ")
mov ecx, 12345
mov eax, 1000
mov edx, 500
mov esi, esi ; marker for Olly - jj macro starts
inJJ:
mov edi, fnsmul(eax, edx)
outJJ:
mov esi, esi ; marker for Olly - jj macro ends
print str$(ecx)
print chr$(13, 10, "Result=")
print str$(edi)
print chr$(13, 10, "Codesize=")
print str$(offset outJJ-inJJ), 13, 10

print chr$(13, 10, "ecx contained 12345, now its value is ")
mov ecx, 12345
mov eax, 1000
mov edx, 500
mov ah, ah ; marker for Olly - Hutch macro starts
inHutch:
mov edi, fnsmulHutch(eax, edx)
outHutch:
mov ah, ah ; marker for Olly - Hutch macro ends
print str$(ecx)
print chr$(13, 10, "Result=")
print str$(edi)
print chr$(13, 10, "Codesize=")
print str$(offset outHutch-inHutch), 13, 10, 10

inkey "Hit any key to get outta here"
exit

end start
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 05, 2008, 09:14:57 AM
Quote from: raymond on November 05, 2008, 03:15:26 AM
For a "newby", if the result should exceed 32 bits, it may throw an exception and possibly crash the program.

Raymond, what you write is technically absolutely correct. However, if any of my code produces an integer that exceeds 32 bit, I would love to see it crash - at least, it would force me to insert an int 3 and launch Olly to see what's wrong with the code. I guess newbies would also appreciate that kind of behaviour.

But of course, there are people who work with 64 bit integers. As you rightly say, these people probably don't need a macro anyway.

EDIT: By the way, under which conditions does imul eax, mult crash? I have tested it with a global variable and two high 32 bits values, but it won't do me the favour to crash... :eek
Title: Re: Simple multiply algorithm.
Post by: hutch-- on November 05, 2008, 09:32:15 AM
JJ,

I cheat, I look in the second column of the disassembly I posted as that tells you the byte count of each instruction. Count up each pair of hex digits and Bingo, you have the byte size.

Ray,

What do you see as the problem, overflow is handled in EDX in the normal manner. With a macro of this type, this would need to be in the documentation but then this is already the case with the macros that MASM32 uses. Here is the example of an overflow result and it just shows this result in EDX.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL var1  :DWORD
    LOCAL var2  :DWORD

    mov var1, 4000000000
    mov var2, 2

    mov eax, var1
    mov ecx, var2
    mul ecx

    push edx
    print ustr$(eax),13,10
    pop edx
    print ustr$(edx),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 05, 2008, 09:44:56 AM
Quote from: hutch-- on November 05, 2008, 09:32:15 AM
JJ,
I cheat, I look in the second column of the disassembly I posted as that tells you the byte count of each instruction. Count up each pair of hex digits and Bingo, you have the byte size.

Hey, is cheating allowed by the forum rules?? ::)
And what about recklessly destroying the contents of ecx without any need??

mov edi, fnumul(eax, edx)
0040101A   |. 8BF6             mov esi, esi <-marker start
0040101C   |. F7E2             mul edx
0040101E   |. 8BF8             mov edi, eax
00401020   |. 8BF6             mov esi, esi <-marker end


mov edi, fnumulHutch(eax, edx)
0040109A   |. 8AE4             mov ah, ah <-marker start
0040109C   |. 8BC0             mov eax, eax
0040109E   |. 8BCA             mov ecx, edx
004010A0   |. F7E1             mul ecx
004010A2   |. 8BF8             mov edi, eax
004010A4   |. 8AE4             mov ah, ah <-marker end

:bg
Title: Re: Simple multiply algorithm.
Post by: hutch-- on November 05, 2008, 12:25:48 PM
JJ,

> And what about recklessly destroying the contents of ecx without any need??

Who cares, the register convention allows ECX to be overwritten.  :bg
Title: Re: Simple multiply algorithm.
Post by: raymond on November 06, 2008, 01:24:37 AM
Quote:
What do you see as the problem, overflow is handled in EDX in the normal manner.

You did not read what I wrote. I will repeat it once more.
Quote:
I would definitely avoid using an instruction in a macro such as:

imul eax, mult

If the result of such an instruction exceeds 32 bits, it DOES NOT overflow into the EDX register. Your "test" was based on the regular mul instruction with a single parameter which obviously returns the result in the EDX:EAX registers.

jj
Quote:
By the way, under which conditions does imul eax, mult crash?

I had not verified if an overflow would throw an exception, that's why I mentioned "it may throw ...". I have now verified it and it does NOT throw any exception nor does it crash the program. An overflow simply leaves in EAX the same low 32 bits that the regular mul instruction would have produced, any overflow being discarded. With overflow, the result in EAX would thus be erroneous.
Title: Re: Simple multiply algorithm.
Post by: hutch-- on November 06, 2008, 02:01:46 AM
Ray,

Tolerate me here for the moment, I did not understand your complaint.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL var1  :DWORD
    LOCAL var2  :DWORD

  ; ------------
  ; within range
  ; ------------
    mov var1, 1000000000
    mov var2, 2

    mov eax, var1
    mov ecx, var2
    imul ecx

    push edx
    print sstr$(eax),13,10
    pop edx
    print sstr$(edx),13,10

  ; --------
  ; overflow
  ; --------
    mov var1, 4000000000
    mov var2, 2

    mov eax, var1
    mov ecx, var2
    imul ecx

    push edx
    print sstr$(eax),13,10
    pop edx
    print sstr$(edx),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start



2000000000
0
-589934592
-1
Press any key to continue ...


In both cases I tested EDX, and its value shows whether the result exceeds 32 bits.
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 06, 2008, 02:35:20 AM
Hutch and Raymond,

If you multiply two 32 bit numbers, each containing 080000000h (a 1 bit followed by 31 zero bits), the result will only be 63 bits. Do the math in binary, it looks just like doing decimal. You will end up with a 1 bit followed by 2*31 zeros. Only if both numbers are large enough (for example, when BOTH start with "11") will the result end up with 64 bits.

Dave.
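
A quick way to check the bit-length argument is sketched below, assuming the MASM32 SDK and a console build. It multiplies 80000000h by itself (a 1 bit followed by 31 zero bits) and then 0FFFFFFFFh by itself, the largest 32 bit case, printing the high dword of each product in decimal.

```masm
; Sketch only: assumes the MASM32 SDK and a console build.
    include \masm32\include\masm32rt.inc

    .code
start:
    mov eax, 80000000h
    mov ecx, 80000000h
    mul ecx                 ; 2^31 * 2^31 = 2^62: EDX = 40000000h, EAX = 0
    print ustr$(edx),13,10  ; 1073741824, bit 62 of the product is the top bit

    mov eax, 0FFFFFFFFh
    mov ecx, 0FFFFFFFFh
    mul ecx                 ; (2^32-1)^2: EDX = 0FFFFFFFEh, EAX = 1
    print ustr$(edx),13,10  ; 4294967294, all 64 bits are needed here

    inkey "Press any key to exit..."
    exit
end start
```

The largest possible product, (2^32-1)^2, is still below 2^64, so a 32 bit by 32 bit MUL can never overflow the EDX:EAX pair.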
Title: Re: Simple multiply algorithm.
Post by: raymond on November 06, 2008, 02:49:06 AM
Hutch,

When you use mul (for unsigned multiplication) or imul (for signed multiplication) with a single parameter, the result is returned as 64 bits in the EDX:EAX pair. One of the multiplicands is expected to be in the EAX register and the other is the single parameter which can be a register or a memory operand. An immediate operand is not allowed as the single parameter.

When you use imul with more than one parameter, only the lower 32 bits of the result is returned strictly in the 32-bit destination, any overflow gets discarded. None of the multiplicands needs to be in the EAX register. They can be almost anywhere according to the parameters (Ex.: imul  ebx,esi). An immediate operand is also allowed as one of the multiplicands in this format.

That is why I'm opposed to the use of an instruction such as
imul eax, mult (notice the two parameters)
in a macro, or whatever, to be used by "newbies" who may not know the difference.
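
The difference between the two forms can be sketched as follows, assuming the MASM32 SDK and a console build. The operand values are the ones used earlier in the thread. The label name fits is hypothetical, and the overflow test relies on the documented behaviour that IMUL sets CF and OF when the product does not fit in the destination.

```masm
; Sketch only: assumes the MASM32 SDK and a console build.
    include \masm32\include\masm32rt.inc

    .code
start:
  ; one-operand form: the full signed 64-bit product lands in EDX:EAX
    mov eax, 2000000000
    mov ecx, 2
    imul ecx                ; EDX = 0, EAX = 0EE6B2800h (4000000000)
    push eax                ; the print macro may clobber EAX
    print ustr$(edx),13,10  ; high dword
    pop eax
    print ustr$(eax),13,10  ; low dword, shown as unsigned

  ; two-operand form: only the low 32 bits are kept, EDX is not written
    mov eax, 2000000000
    mov ecx, 2
    imul eax, ecx           ; EAX = 0EE6B2800h, CF and OF are set
    jno fits                ; OF clear would mean the signed result fits in 32 bits
    print "two-operand imul: result truncated",13,10
fits:
    inkey "Press any key to exit..."
    exit
end start
```

A macro built on the two-operand form could test OF in this way if it needs to report truncation to the caller.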

Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 06, 2008, 03:00:07 AM
Raymond,

I didn't know that. My book with Intel specs (Kip Irvine) doesn't mention that at all. And I don't think that my AMD spec does either. I will have to check into this more.

Is this an Intel or AMD or both quirk? The hardware must be involved, it cannot be just an assembler allowance. I mean, not affecting the xDX reg (EDX/RDX). There must be some special prefix byte generated.

This simple multiply is getting weird!

Dave.
Title: Re: Simple multiply algorithm.
Post by: raymond on November 06, 2008, 03:37:42 AM
Quote:
Is this an Intel or AMD or both quirk?

I don't know about AMD but I would assume with confidence that its processor would not be any different than Intel's. Directly from the Intel manual itself regarding the IMUL instruction:

Quote:
With the two- and three- operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register.
Title: Re: Simple multiply algorithm.
Post by: KeepingRealBusy on November 06, 2008, 06:11:52 AM
Raymond,

You are absolutely correct. Here it is straight from the AMD spec. I didn't realize this, but I have only really dealt with unsigned multiplies up to this point. My book from Kip Irvine describing Intel operation (which came with MASM 615) gave the same formats (less the encodings), but did not even mention the two and three operand storage results.


Instruction Reference

24594   Rev. 3.12   September 2006 AMD64 Technology

Multiplies two signed operands. The number of operands determines the form of the instruction.

If a single operand is specified, the instruction multiplies the value in the specified general-purpose
register or memory location by the value in the AL, AX, EAX, or RAX register (depending on the
operand size) and stores the product in AX, DX:AX, EDX:EAX, or RDX:RAX, respectively.

If two operands are specified, the instruction multiplies the value in a general-purpose register (first
operand) by an immediate value or the value in a general-purpose register or memory location (second
operand) and stores the product in the first operand location.

If three operands are specified, the instruction multiplies the value in a general-purpose register or
memory location (second operand), by an immediate value (third operand) and stores the product in a
register (first operand).

The IMUL instruction sign-extends an immediate operand to the length of the other register/memory
operand.

The CF and OF flags are set if, due to integer overflow, the double-width multiplication result cannot
be represented in the half-width destination register. Otherwise the CF and OF flags are cleared.

IMUL Signed Multiply

Mnemonic                      Opcode    Description

IMUL reg/mem8                 F6 /5     Multiply the contents of AL by the contents of an 8-bit
                                        memory or register operand and put the signed result in AX.

IMUL reg/mem16                F7 /5     Multiply the contents of AX by the contents of a 16-bit
                                        memory or register operand and put the signed result in DX:AX.

IMUL reg/mem32                F7 /5     Multiply the contents of EAX by the contents of a 32-bit
                                        memory or register operand and put the signed result in EDX:EAX.

IMUL reg/mem64                F7 /5     Multiply the contents of RAX by the contents of a 64-bit
                                        memory or register operand and put the signed result in RDX:RAX.

IMUL reg16, reg/mem16         0F AF /r  Multiply the contents of a 16-bit destination register by
                                        the contents of a 16-bit register or memory operand and
                                        put the signed result in the 16-bit destination register.

IMUL reg32, reg/mem32         0F AF /r  Multiply the contents of a 32-bit destination register by
                                        the contents of a 32-bit register or memory operand and
                                        put the signed result in the 32-bit destination register.

IMUL reg64, reg/mem64         0F AF /r  Multiply the contents of a 64-bit destination register by
                                        the contents of a 64-bit register or memory operand and
                                        put the signed result in the 64-bit destination register.

IMUL reg16, reg/mem16, imm8   6B /r ib  Multiply the contents of a 16-bit register or memory
                                        operand by a sign-extended immediate byte and put the
                                        signed result in the 16-bit destination register.

IMUL reg32, reg/mem32, imm8   6B /r ib  Multiply the contents of a 32-bit register or memory
                                        operand by a sign-extended immediate byte and put the
                                        signed result in the 32-bit destination register.

IMUL reg64, reg/mem64, imm8   6B /r ib  Multiply the contents of a 64-bit register or memory
                                        operand by a sign-extended immediate byte and put the
                                        signed result in the 64-bit destination register.

IMUL reg16, reg/mem16, imm16  69 /r iw  Multiply the contents of a 16-bit register or memory
                                        operand by a sign-extended immediate word and put the
                                        signed result in the 16-bit destination register.

IMUL reg32, reg/mem32, imm32  69 /r id  Multiply the contents of a 32-bit register or memory
                                        operand by a sign-extended immediate double and put
                                        the signed result in the 32-bit destination register.

IMUL reg64, reg/mem64, imm32  69 /r id  Multiply the contents of a 64-bit register or memory
                                        operand by a sign-extended immediate double and put
                                        the signed result in the 64-bit destination register.


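To make the difference between the forms concrete for 32-bit operands, here is a rough model in C rather than MASM. It is only a sketch of the behaviour described in the quoted text, not code from the AMD manual; the names `product32`, `imul_wide` and `imul_trunc` are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch: both halves of a 64-bit product. */
typedef struct {
    uint32_t low;      /* what lands in EAX */
    uint32_t high;     /* what lands in EDX */
    int      overflow; /* 1 when CF and OF would be set */
} product32;

/* Models the one-operand form: EDX:EAX = EAX * src, nothing is lost. */
product32 imul_wide(int32_t eax, int32_t src)
{
    int64_t product = (int64_t)eax * (int64_t)src;
    product32 r;
    r.low  = (uint32_t)((uint64_t)product & 0xFFFFFFFFu);
    r.high = (uint32_t)((uint64_t)product >> 32);
    /* CF/OF are set when the product does not fit in a signed 32-bit value */
    r.overflow = (product < INT32_MIN || product > INT32_MAX);
    return r;
}

/* Models the two- and three-operand forms: only the low 32 bits are
   stored in the destination, and EDX is not written at all. */
uint32_t imul_trunc(int32_t dest, int32_t src, int *overflow)
{
    assert(overflow != NULL);
    product32 r = imul_wide(dest, src);
    *overflow = r.overflow;   /* the only hint that bits were dropped */
    return r.low;
}
```

With 12345678h and 10000h, `imul_wide` yields high = 00001234h and low = 56780000h, while `imul_trunc` returns only 56780000h and reports the overflow. For small values such as -2 * 3 both forms agree on the low dword and no overflow is reported.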
Dave.
Title: Re: Simple multiply algorithm.
Post by: jj2007 on November 06, 2008, 06:54:15 AM
So all that means we need a better way of handling this. Proposal:

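smul MACRO accu:REQ, mult:REQ
  ifdifi <accu>, <eax>
    mov eax, accu          ; load the multiplicand unless it is already in eax
  endif
  if opattr (mult) eq 36   ; immediate (one-operand imul cannot take one)
    mov edx, mult          ; edx is overwritten by imul anyway
    imul edx
  else
    imul mult              ; reg32 or mem32 (eax only works here if accu is eax too)
  endif
  EXITM <eax>              ; low dword in eax, high dword in edx
ENDM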

.data
mem4 dd 10000h

.code
start:
print chr$(13, 10, "edx=")
mov edi, smul(12345678h, 10000h) ; immediate * immediate
print hex$(edx)
print ", eax="
print hex$(edi)

print chr$(13, 10, "edx=")
mov edi, smul(12345678h, mem4) ; immediate * mem32
print hex$(edx)
print ", eax="
print hex$(edi)

print chr$(13, 10, "edx=")
mov edx, 10000h
mov edi, smul(12345678h, edx) ; immediate * reg32
print hex$(edx)
print ", eax="
print hex$(edi)

print chr$(13, 10, "edx=")
mov edx, 10000h
mov eax, 12345678h
mov edi, smul(eax, edx) ; reg32 * reg32
print hex$(edx)
print ", eax="
print hex$(edi)

print chr$(13, 10, "edx=")
mov edx, 10000h
mov mem4, 12345678h
mov edi, smul(mem4, edx) ; mem32 * reg32
print hex$(edx)
print ", eax="
print hex$(edi)

Output:

edx=00001234, eax=56780000
edx=00001234, eax=56780000
edx=00001234, eax=56780000
edx=00001234, eax=56780000
edx=00001234, eax=56780000
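Both test operands above are positive, so the printed values would be the same with an unsigned `mul`. The signed and unsigned one-operand multiplies only differ in the high dword, and only when an operand has its top bit set. A small C model of that, as a sketch only (the function names `mul_edx_eax`, `imul_edx_eax` and `split_edx_eax` are made up here and are not part of the macro):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models unsigned `mul src`: returns EDX:EAX as one 64-bit value. */
uint64_t mul_edx_eax(uint32_t eax, uint32_t src)
{
    return (uint64_t)eax * (uint64_t)src;
}

/* Models signed one-operand `imul src`: returns the EDX:EAX bit pattern. */
uint64_t imul_edx_eax(int32_t eax, int32_t src)
{
    return (uint64_t)((int64_t)eax * (int64_t)src);
}

/* Splits a 64-bit product into the two register halves. */
void split_edx_eax(uint64_t product, uint32_t *edx, uint32_t *eax)
{
    assert(edx != NULL && eax != NULL);
    *edx = (uint32_t)(product >> 32);
    *eax = (uint32_t)(product & 0xFFFFFFFFu);
}
```

For 12345678h * 10000h both models give edx=00001234, eax=56780000, matching the output above. For FFFFFFFFh * 2 the unsigned model gives edx=00000001 and the signed model (operand read as -1) gives edx=FFFFFFFF, with eax=FFFFFFFE in both cases.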
Title: Re: Simple multiply algorithm.
Post by: vanjast on December 22, 2008, 09:39:45 PM
I'm using a redundant EDX a lot in this calculation section .... I haven't finalised it, but it's close to the same idea as the topic

;--- EXTENDED GREGORIAN DATE SECTION ---
    Mov Eax, myMonth        ;Load Month (M)
    Mov Ebx, myYear         ;Load Year (Y)
    Cmp Eax, c003           ;Is Month Jan or Feb?
    Jge @F                  ;Nope - skip next instruction
    Dec Ebx                 ;Decrement year value

@@:                         ;Ebx = (Y + (M-9)/7)
    Mov Eax, Ebx            ;Move year value
    Mov Ebx, c100           ;Load denominator (= 100)
    Cdq                     ;Sign extend EAX into EDX (0 for a positive year; DIV is unsigned)
    Div Ebx                 ;((Y + (M-9)/7) / 100)
    Add Eax, c001           ;((Y + (M-9)/7) / 100) + 1
    Mov Ebx, c003           ;Load 3
    Mul Ebx                 ;3*(((Y + (M-9)/7) / 100) + 1), high dword in EDX
    Mov Ebx, c004           ;Load denominator (= 4)
    Cdq                     ;Sign extend EAX into EDX (already 0 after the MUL)
    Div Ebx                 ;(3*(((Y + (M-9)/7) / 100) + 1)) / 4
    Mov myJulianDay, Eax    ;Saved
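
For anyone checking the arithmetic, here is the same term in C, plus a model of what `cdq` followed by an unsigned `div` does. This is a sketch under assumptions: `c001`, `c003`, `c004` and `c100` are taken to be the constants 1, 3, 4 and 100, and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 3 * ((Y / 100) + 1) / 4, with Y reduced by one for January and
   February, which is what (Y + (M-9)/7) works out to for months 1..12. */
uint32_t century_term(uint32_t year, uint32_t month)
{
    uint32_t y = (month < 3) ? year - 1 : year;
    return (3 * (y / 100 + 1)) / 4;
}

/* Models `cdq` followed by unsigned `div divisor`.
   Returns 1 and stores the quotient when it fits in 32 bits; returns 0
   where the CPU would raise a divide error instead. */
int cdq_then_div(int32_t eax, uint32_t divisor, uint32_t *quotient)
{
    assert(quotient != NULL);
    if (divisor == 0)
        return 0;
    /* cdq copies the sign bit of EAX into every bit of EDX */
    uint32_t edx = (eax < 0) ? 0xFFFFFFFFu : 0u;
    uint64_t dividend = ((uint64_t)edx << 32) | (uint32_t)eax;
    uint64_t q = dividend / divisor;
    if (q > 0xFFFFFFFFu)
        return 0;
    *quotient = (uint32_t)q;
    return 1;
}
```

For December 2008, `century_term(2008, 12)` gives 3 * 21 / 4 = 15. `cdq_then_div(2008, 100, &q)` succeeds with q = 20, but a negative EAX makes EDX all ones and the unsigned quotient no longer fits, so `cdq` before `div` is only safe while the value stays non-negative; clearing EDX with `xor edx, edx` avoids that case.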


Now, looking at it, I can improve it by swapping the EAX and EBX parameter storage... just as well I read this topic... :lol