After reading the following in the Intel Optimization Manual:
Quote
User/Source Coding Rule 16. (H impact, ML generality)
Use fast float-to-int routines, FISTTP, or SSE2 instructions. If coding these
routines, use the FISTTP instruction if SSE3 is available, or the CVTTSS2SI and
CVTTSD2SI instructions if coding with Streaming SIMD Extensions 2. Many
libraries generate X87 code that does more work than is necessary. The FISTTP
instruction in SSE3 can convert floating-point values to 16-bit, 32-bit, or
64-bit integers using truncation without accessing the floating-point control
word (FCW). The instructions CVTTSS2SI and CVTTSD2SI save many µops and some
store-forwarding delays over some compiler implementations. This avoids
changing the rounding mode.
I came up with a couple of procedures for truncation: one that uses FISTTP if
SSE3 is available and another that uses CVTTSD2SI if SSE2 is available. See attached
file. They are for REAL8 variables, but they could easily be modified for REAL10 or
REAL4 variables.
I did some timing tests using MichaelW's timing procedures and got the following results
on my Pentium D 940. The Truncate procedure uses the standard method that
changes the FPU rounding mode to truncate and then changes it back to what it
was.
TruncateSSE3: 12 cycles
TruncateSSE2: 16 cycles
Truncate: 40 cycles
Any comments are welcome.
[attachment deleted by admin]
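Since the attachment is no longer available, here is a rough sketch of what the "standard method" baseline (Truncate) described above might look like. This is not the original attached code: the procedure signature and the local names oldCW and newCW are assumptions made for illustration. It saves the FPU control word, sets the rounding-control field to truncate, converts, and then restores the saved control word.

```asm
.686
.model flat, stdcall
option casemap:none

.code
; Hypothetical reconstruction of the baseline: truncate a REAL8 to a
; 32-bit integer by temporarily switching the FPU rounding mode.
Truncate PROC pR8:PTR REAL8, pInt32:PTR SDWORD
    LOCAL oldCW:WORD, newCW:WORD      ; assumed names, not from the attachment

    mov     edx, pR8
    mov     ecx, pInt32
    fnstcw  oldCW                     ; save the current control word
    movzx   eax, oldCW
    or      eax, 0C00h                ; RC field (bits 10-11) = 11b: truncate
    mov     newCW, ax
    fldcw   newCW                     ; switch the FPU to truncation
    fld     REAL8 PTR [edx]
    fistp   SDWORD PTR [ecx]          ; convert using the current mode and pop
    fldcw   oldCW                     ; restore the caller's rounding mode
    ret
Truncate ENDP
END
```

The two FLDCW instructions are the extra work that FISTTP and CVTTSD2SI avoid, which is consistent with the cycle counts listed above.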
Macro versions:
FISTTP MACRO adr:REQ
    ;; This macro is for MASM 6.x and 7.x. If you are using
    ;; MASM 8.0 you don't need this macro.
    ;; FISTTP is only available on CPUs that support SSE3.
    ;; It assembles an x87 instruction with the same ModR/M form (/1),
    ;; then goes back and overwrites the opcode byte.
    LOCAL x,y
    IF (OPATTR(adr)) AND 00010100y ;; register or const
        .ERR <Invalid operand, dst must be an address in memory!>
    ELSEIF (TYPE (adr) EQ WORD) OR (TYPE (adr) EQ SWORD)
      x:
        fimul adr       ;; FIMUL m16int = DE /1
      y:
        org x
        byte 0DFh       ;; DF /1 = FISTTP m16int
        org y
    ELSEIF (TYPE (adr) EQ DWORD) OR (TYPE (adr) EQ SDWORD)
      x:
        fimul adr       ;; FIMUL m32int = DA /1
      y:
        org x
        byte 0DBh       ;; DB /1 = FISTTP m32int
        org y
    ELSEIF (TYPE (adr) EQ QWORD)
      x:
        fmul adr        ;; FMUL m64fp = DC /1
      y:
        org x
        byte 0DDh       ;; DD /1 = FISTTP m64int
        org y
    ELSE
        .ERR <Invalid operand, dst can be 2, 4 or 8 bytes address only!>
    ENDIF
ENDM
TruncateSSE3 MACRO pR8:REQ, pInt32:REQ
    ;; This macro requires SSE3 support
    mov edx, pR8
    mov eax, pInt32
    fld REAL8 PTR [edx]
    FISTTP SDWORD PTR [eax]
ENDM
TruncateSSE2 MACRO pR8:REQ, pInt32:REQ
    ;; This macro requires SSE2 support
    mov edx, pR8
    mov ecx, pInt32
    ;;cvttsd2si eax, REAL8 PTR [edx] ;; the cvttsd2si SSE2 instruction requires ML 6.15 or later
    DB 0F2h,0Fh,02Ch,02h ;; ML 6.14 can use equivalent opcodes.
    mov DWORD PTR [ecx], eax
ENDM
I don't take credit for the FISTTP macro; I got it here (http://www.intel.com/cd/ids/developer/asmo-na/eng/167741.htm?page=9).
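Both Truncate macros assume the CPU supports the relevant extension. As a minimal sketch of how a caller might check for that with CPUID before picking a macro: the procedure name TruncateAuto, the labels useSSE3 and useSSE2, and the variables value and result are all hypothetical, and the macros above are assumed to be defined earlier in the same source file. The feature bits used are CPUID function 1, ECX bit 0 for SSE3 and EDX bit 26 for SSE2.

```asm
.686
.model flat, stdcall
option casemap:none

; (the FISTTP, TruncateSSE3 and TruncateSSE2 macros go here)

.data
value   REAL8  -2.75                  ; hypothetical test input
result  SDWORD 0                      ; receives the truncated value

.code
; Returns EAX = 1 if a conversion was done, 0 if neither SSE3 nor SSE2
; is reported. EBX is preserved because CPUID overwrites it.
TruncateAuto PROC USES ebx
    mov     eax, 1
    cpuid                             ; feature flags returned in ECX and EDX
    test    ecx, 1                    ; CPUID.1:ECX bit 0 = SSE3
    jnz     useSSE3
    test    edx, 04000000h            ; CPUID.1:EDX bit 26 = SSE2
    jnz     useSSE2
    xor     eax, eax                  ; no fast path available
    ret
useSSE3:
    TruncateSSE3 OFFSET value, OFFSET result
    mov     eax, 1
    ret
useSSE2:
    TruncateSSE2 OFFSET value, OFFSET result
    mov     eax, 1
    ret
TruncateAuto ENDP
END
```

With the input above, truncation toward zero should leave -2 in result on either path. Note that the macros overwrite EAX, ECX and EDX, so a caller has to treat those registers as scratch.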
Hi :)
You wrote some good code, but I couldn't understand any of it.
Could you explain the code, and tell me about FISTTP, CVTTSS2SI, SSE2 and SSE3?
What do they mean?
I can't find these things, or some other things, in ebooks.
Which sources do you use?
Regards
cekic,
You can read about those instructions in the Intel Manuals that you can download here (http://www.intel.com/products/processor/manuals/index.htm). You can also download the AMD Manuals here (http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_739_7044,00.html).
Quote from: Greg on September 18, 2007, 05:29:08 PM
cekic,
You can read about those instructions in the Intel Manuals that you can download here (http://www.intel.com/products/processor/manuals/index.htm). You can also download the AMD Manuals here (http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_739_7044,00.html).
Dear Greg, thanks for your links.
I now understand what they are about.
I think these things are for advanced programmers.
I will be a good assembly programmer in the future, if God gives me a long life.
Regards,
God bless you