Fast SSE floor() function

Started by GregL, July 18, 2008, 05:02:20 AM

Neil

Intel Quad core 9550

Floor timings:

MySingle1       REAL4 12.34567890123456 floor= 12
MyDouble1       REAL8 12.34567890123456 floor= 12
MySingle2       REAL4 -12.345678901234  floor= -13
MyDouble2       REAL8 -12.345678901234  floor= -13
MySingle3       REAL4 0.99999999        floor= 1        (sic!)
MyDouble3       REAL8 0.99999999        floor= 0

95      cycles for Floor8
20      cycles for Floor4
23      cycles for Floor4a
20      cycles for Floor4b
23      cycles for Floor4c

95      cycles for Floor8
20      cycles for Floor4
22      cycles for Floor4a
20      cycles for Floor4b
22      cycles for Floor4c

96      cycles for Floor8
20      cycles for Floor4
23      cycles for Floor4a
20      cycles for Floor4b
24      cycles for Floor4c

96      cycles for Floor8
20      cycles for Floor4
22      cycles for Floor4a
20      cycles for Floor4b
22      cycles for Floor4c
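The code for the Floor4 variants is not part of this excerpt, so the following is not any of them. It is a minimal sketch of one common branch-free way to compute a single-precision floor: truncate toward zero, then subtract one when truncation rounded a negative non-integer upward. On x86-64, compilers typically emit the SSE instruction CVTTSS2SI for the `(int)` cast. The name `floor_trunc_fix` is hypothetical, and the sketch assumes the input fits in an `int`.

```c
/* Hypothetical single-precision floor: truncate, then correct.
   Assumes x is within the range of int. */
static int floor_trunc_fix(float x) {
    int t = (int)x;               /* truncates toward zero */
    /* For negative non-integers truncation moved the value up,
       so the comparison yields 1 and we step down by one. */
    return t - (x < (float)t);
}
```

For example, `floor_trunc_fix(-12.345678f)` truncates to -12, sees that the input is below -12, and returns -13; exact integers such as -3.0f pass through unchanged.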


jj2007

Quote from: Mark Jones on March 12, 2009, 03:27:13 PM
Let's see if I get a "sporty" reply... :P

Floor4b seems to perform well; see also herge's and Neil's timings. Now one of the other sportsmen will probably cry foul that a fast floor() is a waste of the forum's bandwidth... :bg

(To be honest, I have never used floor() in any real programming, but I guess there are applications where huge chunks of data need to be processed fast.)

MichaelW

This procedure leaves the return value on the FPU stack and runs in 50 cycles on a P3:

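OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

; Computes floor(x) as round(2x - 0.5) / 2, rounding the halving down.
; Assumes the FPU is in its default round-to-nearest-even mode.
align 4
_floor8 proc double:REAL8           ; argument at [esp+4], no stack frame
    fld FP8(2.0)                    ; st(0) = 2.0
    fmul QWORD PTR [esp+4]          ; st(0) = 2x
    fadd FP8(-0.5)                  ; st(0) = 2x - 0.5
    sub esp, 8                      ; 8-byte scratch slot
    fistp QWORD PTR [esp]           ; int64 = round-to-nearest(2x - 0.5)
    sar DWORD PTR [esp+4], 1        ; arithmetic shift keeps the sign of negatives
    rcr DWORD PTR [esp], 1          ; carry from high dword enters low dword
    fild QWORD PTR [esp]            ; st(0) = floor(x), returned on FPU stack
    add esp, 8                      ; release scratch slot
    ret 8                           ; pop the REAL8 argument
_floor8 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef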
