The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Posit on May 28, 2005, 07:23:13 AM

Title: FPU Question
Post by: Posit on May 28, 2005, 07:23:13 AM
Is there an FPU instruction to multiply two unsigned DWORD integers and store the result in 64-bit format? I'm looking for a faster alternative to the MUL instruction, which stores its result in EDX:EAX.
Title: Re: FPU Question
Post by: AeroASM on May 28, 2005, 07:31:12 AM

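.data
in1 dd 2
in2 dd 3
result dq 0      ; named result because OUT is a reserved word in MASM
.code
fild in1         ; load in1 as a signed integer
fimul in2        ; multiply ST(0) by the signed integer in2
fistp result     ; store the product as a 64-bit signed integer and pop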


Because you declared the variables as dd and dq, MASM will generate the right data size encoding.
Title: Re: FPU Question
Post by: Posit on May 28, 2005, 09:09:20 AM
In what format is the QWORD stored? I've tried moving the low DWORD into EAX and the high DWORD into EDX, and vice versa, but the values are different from what MUL yields.

According to the Intel docs, "The FIMUL instructions convert an integer source operand to double extended-precision floating-point format before performing the multiplication." I'm probably missing something here, but I don't want floating-point format, I want integer format.
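
For reference, FISTP with a QWORD operand stores a 64-bit two's-complement signed integer in little-endian order, so the low DWORD sits at the lower address. The following is only a minimal sketch of reading it back into EDX:EAX; the variable name result is a hypothetical choice for illustration. Note that FILD and FIMUL treat an operand with bit 31 set as negative, which is one way the stored value can differ from what MUL yields.

```asm
    .data
        in1     dd 2
        in2     dd 3
        result  dq 0                      ; hypothetical name for the 64-bit product
    .code
    fild  in1                             ; ST(0) = 2.0
    fimul in2                             ; ST(0) = 6.0
    fistp result                          ; store as a signed 64-bit integer and pop
    mov   eax, DWORD PTR result           ; low 32 bits  (EAX = 6)
    mov   edx, DWORD PTR result+4         ; high 32 bits (EDX = 0)
```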
Title: Re: FPU Question
Post by: MichaelW on May 28, 2005, 11:16:57 AM
Hi Posit,

Also from the Intel documents:
Quote
Internally, the FPU holds all numbers in a uniform 80-bit extended format. Operands that may be represented in memory as 16-, 32-, or 64-bit integers, 32-, 64-, or 80-bit floating point numbers, or 18-digit packed BCD numbers, are automatically converted into extended format as they are loaded into the FPU registers.

http://www.website.masmforum.com/tutorials/fptute/fpuchap2.htm#real10

In his example Aero forgot to deal with the problem that FIMUL performs a signed multiply.

http://www.website.masmforum.com/tutorials/fptute/fpuchap9.htm

And I doubt that using the FPU will be faster on any processor.

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .586                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\kernel32.inc

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\kernel32.lib

    include \masm32\macros\macros.asm

    include timers.asm
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        dword1  dd 2
        dword2  dd 4
        dword3  dd 2
        dword4  dd 80000000h
        result  dq 0
    .code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    mov   eax, dword1
    mul   dword2
    mov   ebx, eax
    print uhex$(edx)
    print uhex$(ebx), 13, 10

    fild  dword1
    fimul dword2
    fabs   
    fistp result
    print uhex$(DWORD PTR result+4)
    print uhex$(DWORD PTR result), 13, 10

    mov   eax, dword3
    mul   dword4
    mov   ebx, eax
    print uhex$(edx)
    print uhex$(ebx),13,10

    fild  dword3
    fimul dword4
    fabs   
    fistp result
    print uhex$(DWORD PTR result+4)
    print uhex$(DWORD PTR result),13,10,13,10
       
    LOOP_COUNT    EQU 10000000
    REPEAT_COUNT  EQU 10

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        mov   eax, dword1
        mul   dword2
        mov   DWORD PTR result, eax
        mov   DWORD PTR result+4, edx
      ENDM       
    counter_end
    print ustr$(eax)
    print chr$(" cycles (* 10)",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        fild  dword1
        fimul dword2
        fabs   
        fistp result
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" cycles (* 10)",13,10)

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


Result on my P3:

0000000000000008
0000000000000008
0000000100000000
0000000100000000

32 cycles (* 10)
94 cycles (* 10)

Title: Re: FPU Question
Post by: Posit on May 28, 2005, 11:47:34 AM
Can FMUL somehow be used, or does it just yield a DWORD result?
Title: Re: FPU Question
Post by: MichaelW on May 28, 2005, 12:11:30 PM
FMUL also performs a signed multiplication, and it cannot handle integer operands.

http://www.website.masmforum.com/tutorials/fptute/fpuchap8.htm#fmul

I think for a speed increase you will need to use MMX, SSE, or SSE2.
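
As one illustration of the SSE2 route, PMULUDQ multiplies the low unsigned DWORDs of its two operands and produces a 64-bit unsigned product. This is a minimal sketch, assuming an assembler version with SSE2 support and reusing the dword3, dword4, and result names from the test program above. Whether it is actually faster than MUL is not measured here.

```asm
    .686
    .xmm                                  ; enable the SSE/SSE2 mnemonics
    .model flat, stdcall
    .data
        dword3  dd 2
        dword4  dd 80000000h
        result  dq 0
    .code
    movd    xmm0, dword3                  ; low DWORD = 2, upper bits zeroed
    movd    xmm1, dword4                  ; low DWORD = 80000000h, upper bits zeroed
    pmuludq xmm0, xmm1                    ; low QWORD = unsigned 2 * 80000000h
    movq    QWORD PTR result, xmm0        ; result = 0000000100000000h
```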


Title: Re: FPU Question
Post by: GregL on May 29, 2005, 04:21:38 AM
Wait a minute, FMUL will work just fine if you load both integers into FPU registers.

.data
in1 dd 2
in2 dd 3
result dq 0      ; named result because OUT is a reserved word in MASM
.code
fild in1         ; in1 is converted from a DWORD integer to a REAL10, ST(0) = 2.0
fild in2         ; in2 is converted from a DWORD integer to a REAL10, ST(0) = 3.0, ST(1) = 2.0
fmul             ; multiply and pop, ST(0) = 6.0
fistp result     ; ST(0) is converted from a REAL10 to a QWORD integer and saved to result


Once an integer is loaded into an FPU register with FILD it is converted to REAL10 floating-point format. The values in FPU registers are in REAL10 format. The only time you have to worry about different formats is when loading from or storing to memory, or when accessing values in memory. FIMUL is for multiplying ST(0) by an integer located in memory.

AeroASM's code would work just fine; you don't need the FABS. In fact, you don't want the FABS.

.data
in1 dd 2
in2 dd 3
result dq 0      ; named result because OUT is a reserved word in MASM
.code
fild in1         ; in1 is converted from a DWORD integer to a REAL10, ST(0) = 2.0
fimul in2        ; in2 is converted from a DWORD integer to a REAL10, ST(0) = ST(0) * 3.0 = 6.0
fistp result     ; ST(0) is converted from a REAL10 to a QWORD integer and saved to result



FABS clears the sign bit of the REAL10 value in ST(0). You don't want to do that here; it would cause errors for negative values, e.g. if in1 = -2 and in2 = 3.
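
A minimal sketch of that signed case, using in1 = -2 and in2 = 3 as in the sentence above. The variable name result is a hypothetical choice for illustration.

```asm
    .data
        in1     dd -2
        in2     dd 3
        result  dq 0                      ; hypothetical name for the 64-bit product
    .code
    ; Without FABS the signed product is preserved.
    fild  in1                             ; ST(0) = -2.0
    fimul in2                             ; ST(0) = -6.0
    fistp result                          ; result = FFFFFFFFFFFFFFFAh (-6)

    ; With FABS the sign is discarded.
    fild  in1                             ; ST(0) = -2.0
    fimul in2                             ; ST(0) = -6.0
    fabs                                  ; ST(0) = 6.0
    fistp result                          ; result = 0000000000000006h (sign lost)
```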

No offense MichaelW, I just couldn't let that be.

Title: Re: FPU Question
Post by: MichaelW on May 29, 2005, 09:57:23 AM
Greg,

No offense taken. My goal here is to provide correct answers, and if I don't then I should be, and would prefer to be, corrected.

Regarding the FABS, from Posit's initial post, emphasis added:
Quote
Is there an FPU instruction to multiply two unsigned DWORD integers and store the result in 64-bit format?
AeroASM's code will not work over the full range of unsigned values, because FILD and FIMUL treat any operand with bit 31 set as negative. The FABS was an attempt to correct the result that, unfortunately, also fails over the full range of unsigned values. At this point I can't think of any clean method of converting the value, and in any case the FPU version will still be slower than the ALU version.
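
To make the failure concrete, here is a small sketch with both operands set to 0FFFFFFFFh. The names dword5 and dword6 are hypothetical additions in the style of the test program. Both operands load as -1, so the FPU product is 1 and FABS has nothing to correct.

```asm
    .data
        dword5  dd 0FFFFFFFFh             ; 4294967295 unsigned, -1 signed
        dword6  dd 0FFFFFFFFh
        result  dq 0
    .code
    mov   eax, dword5
    mul   dword6                          ; EDX:EAX = FFFFFFFE00000001h (unsigned product)

    fild  dword5                          ; ST(0) = -1.0 (operand is read as signed)
    fimul dword6                          ; ST(0) = -1.0 * -1.0 = 1.0
    fabs                                  ; ST(0) = 1.0, already positive
    fistp result                          ; result = 0000000000000001h, not the MUL value
```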

Regarding FMUL, it will not accept an integer memory operand. When I considered using FMUL it seemed to me that the extra instruction would make the code execute slower. Now that I test it, on a P3, the FMUL sequence and the FIMUL sequence both take 72 cycles (without the FABS). FMUL may be faster on other processors, but again, the FPU version will still be slower than the ALU version.
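
One partial workaround, not taken from this thread and offered only as a sketch, is to zero-extend each DWORD to a QWORD in memory and load it with the 64-bit form of FILD, so the value is always read as non-negative. It has two assumptions: the FPU must be in 64-bit precision mode (FINIT selects it), and the product must be below 2^63, because FISTP stores a signed QWORD and writes 8000000000000000h when the value does not fit. The names u1, u2, tmp1, and tmp2 are hypothetical. It is not a full replacement for MUL and adds extra memory traffic.

```asm
    .data
        u1      dd 0FFFFFFFFh             ; unsigned operand with bit 31 set
        u2      dd 2
        tmp1    dq 0                      ; zero-extended copy of u1
        tmp2    dq 0                      ; zero-extended copy of u2
        result  dq 0
    .code
    finit                                 ; control word 037Fh: 64-bit precision
    mov   eax, u1
    mov   DWORD PTR tmp1, eax             ; low DWORD = u1
    mov   DWORD PTR tmp1+4, 0             ; high DWORD = 0
    mov   eax, u2
    mov   DWORD PTR tmp2, eax             ; low DWORD = u2
    mov   DWORD PTR tmp2+4, 0             ; high DWORD = 0
    fild  tmp1                            ; ST(0) = 4294967295.0
    fild  tmp2                            ; ST(0) = 2.0, ST(1) = 4294967295.0
    fmul                                  ; multiply and pop, ST(0) = 8589934590.0
    fistp result                          ; result = 00000001FFFFFFFEh, same as MUL
```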
Title: Re: FPU Question
Post by: GregL on May 29, 2005, 06:55:06 PM
Hi MichaelW,

After re-reading the posts and running your code, I see where you were coming from. I was taking part of what you were saying in the wrong context.

You're right, for speed Posit is best off with the ALU code.