Fast Floating-Point to Integer Truncation

Started by GregL, September 14, 2007, 02:23:41 AM

GregL

After reading the following in the Intel Optimization Manual:

Quote: User/Source Coding Rule 16. (H impact, ML generality)

Use fast float-to-int routines, FISTTP, or SSE2 instructions. If coding these
routines, use the FISTTP instruction if SSE3 is available, or the CVTTSS2SI and
CVTTSD2SI instructions if coding with Streaming SIMD Extensions 2. Many
libraries generate X87 code that does more work than is necessary. The FISTTP
instruction in SSE3 can convert floating-point values to 16-bit, 32-bit, or
64-bit integers using truncation without accessing the floating-point control
word (FCW). The instructions CVTTSS2SI and CVTTSD2SI save many µops and some
store-forwarding delays over some compiler implementations. This avoids
changing the rounding mode.
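
To make the difference concrete, here is a small fragment comparing FISTP, which honors the rounding mode in the FCW, with FISTTP, which always truncates. It is only an illustrative sketch: the variable names are made up, it assumes the FPU is in its default round-to-nearest mode, and it assumes an assembler that knows FISTTP (MASM 8.0, or a macro substitute such as the one posted further down in this thread).

```masm
    .data
    x        REAL8  -2.7      ; hypothetical test value
    nearest  SDWORD ?
    toward0  SDWORD ?

    .code
    fld    x
    fistp  nearest            ; uses the FCW rounding mode; round-to-nearest stores -3
    fld    x
    fisttp toward0            ; SSE3: ignores the FCW and truncates; stores -2
```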

I came up with a couple of procedures for truncation: one that uses FISTTP if
SSE3 is available, and another that uses CVTTSD2SI if SSE2 is available. See the
attached file. They are written for REAL8 variables, but they could easily be
modified for REAL10 or REAL4 variables.

I did some timing tests using MichaelW's timing procedures and got the following results
on my Pentium D 940. The Truncate procedure uses the standard method: it
changes the FPU rounding mode to truncate, converts, and then restores the
previous rounding mode.

TruncateSSE3: 12 cycles
TruncateSSE2: 16 cycles
Truncate:     40 cycles
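
Since the attachment is no longer available, the following is only a sketch of the control-word method described above, not the code that was actually timed. The name TruncateFCW and its parameters are made up for illustration, and it assumes a typical MASM32 setup (.686, flat model, stdcall).

```masm
; Hypothetical sketch: truncate a REAL8 to a 32-bit integer by
; temporarily switching the FPU rounding mode to "round toward zero".
TruncateFCW PROC pR8:PTR REAL8, pInt32:PTR SDWORD
    LOCAL oldCW:WORD, truncCW:WORD

    mov    edx, pR8
    mov    ecx, pInt32
    fnstcw oldCW                ; save the current control word
    movzx  eax, oldCW
    or     eax, 0C00h           ; RC field (bits 10-11) = 11b, truncate
    mov    truncCW, ax
    fldcw  truncCW              ; switch the rounding mode to truncate
    fld    REAL8 PTR [edx]
    fistp  SDWORD PTR [ecx]     ; convert with the current rounding mode, pop
    fldcw  oldCW                ; restore the caller's rounding mode
    ret
TruncateFCW ENDP
```

The two FLDCW instructions and the extra store and reload of the control word are the overhead that FISTTP and CVTTSD2SI avoid.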


Any comments are welcome.


[attachment deleted by admin]

GregL

Macro versions:

FISTTP MACRO adr:REQ
    ;; This macro is for MASM 6.x and 7.x. If you are using
    ;; MASM 8.0 you don't need this MACRO.
    ;; FISTTP is only available on CPUs that support SSE3.
    LOCAL x,y
    IF (OPATTR(adr)) AND 00010100y  ;; register or const
        .ERR <Invalid operand, dst must be an address in memory!>
    ELSEIF (TYPE (adr) EQ WORD) OR (TYPE (adr) EQ SWORD)
      x:
        fimul adr
      y:
        org  x
        byte 0DFh
        org  y
    ELSEIF (TYPE (adr) EQ DWORD) OR (TYPE (adr) EQ SDWORD)
      x:
        fimul adr
      y:
        org  x
        byte 0DBh
        org  y
    ELSEIF (TYPE (adr) EQ QWORD)
      x:
        fmul adr
      y:
        org  x
        byte 0DDh
        org  y
    ELSE
        .ERR <Invalid operand, dst can be 2, 4 or 8 bytes address only!>
    ENDIF
ENDM

TruncateSSE3 MACRO pR8:REQ, pInt32:REQ
    ;; This macro requires SSE3 support
    mov edx, pR8
    mov eax, pInt32
    fld REAL8 PTR [edx]
    FISTTP SDWORD PTR [eax]
ENDM

TruncateSSE2 MACRO pR8:REQ, pInt32:REQ
    ;; This macro requires SSE2 support
    mov edx, pR8
    mov ecx, pInt32
    ;;cvttsd2si eax, REAL8 PTR [edx]  ;; the cvttsd2si SSE2 instruction requires ML 6.15 or later
    DB 0F2h,0Fh,02Ch,02h              ;; ML 6.14: same instruction as raw bytes (F2 0F 2C /r, ModRM 02h = eax,[edx])
    mov DWORD PTR [ecx], eax
ENDM


I don't take credit for the FISTTP macro; I got it here.
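
Because the macros above require SSE3 or SSE2, a program would normally check for those features before choosing one. The following is a minimal sketch using CPUID function 1; the procedure name GetTruncLevel and its return values are made up for illustration. It assumes the CPU supports the CPUID instruction and that the operating system has enabled SSE state, neither of which is checked here.

```masm
    .586                        ; cpuid needs .586 or later

; Hypothetical helper: returns eax = 2 if SSE3 is reported,
; 1 if only SSE2 is reported, 0 if neither.
GetTruncLevel PROC USES ebx     ; cpuid overwrites ebx
    mov  eax, 1
    cpuid                       ; feature flags returned in ecx and edx
    xor  eax, eax               ; assume neither is available
    bt   edx, 26                ; CPUID.1:EDX bit 26 = SSE2
    jnc  done
    mov  eax, 1
    bt   ecx, 0                 ; CPUID.1:ECX bit 0 = SSE3
    jnc  done
    mov  eax, 2
done:
    ret
GetTruncLevel ENDP
```

A caller could then use TruncateSSE3 when the result is 2, TruncateSSE2 when it is 1, and fall back to the control-word method otherwise.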

cekic

Hi :)
You wrote some good code, but I couldn't understand any of it.
Could you explain the code, and tell me about FISTTP, CVTTSS2SI, SSE2 and SSE3?
What do they mean?
I can't find these things in e-books or in other sources.
Which sources do you use?
Regards

GregL

cekic,

You can read about those instructions in the Intel Manuals that you can download here. You can also download the AMD Manuals here.

cekic

Quote from: Greg on September 18, 2007, 05:29:08 PM
cekic,

You can read about those instructions in the Intel Manuals that you can download here. You can also download the AMD Manuals here.


Dear Greg, thanks for your links.
I understand what they are about now.
I think these things are for advanced programmers.
I will be a good assembly programmer in the future, if God gives me a long life.
Regards,
God bless you