SSE VS FPU

Started by Farabi, February 27, 2012, 11:02:16 AM


jj2007


OceanJeff32

That wasn't my fireworks demo, btw, just clarifying, but I did experiment with it, and learned about MMX and SSE and how they work from that.

Very cool stuff!

it's been a while since i've visited ronybc.com, but his website looks like it's been taken over by BUGS and ADS! Beware upon visiting.

later,

jeff c
:U
Any good programmer knows, every large and/or small job, is equally large, to the programmer!

qWord

Quote from: Farabi on March 03, 2012, 10:01:42 AM
Why not use Div and Mul as substitutes for floating point?
floating point <> integer
FPU in a trice: SmplMath
It's that simple!

Farabi

Mul 1 ms, fmul 918 ms.
Those who had universe knowledges can control the world by a micro processor.
http://www.wix.com/farabio/firstpage

"Etos siperi elegi"

Farabi

We could use what they call fixed point as a substitute for floating point. I proposed the mul and div instructions for the precision, but a 32-bit shr is a lot faster than doing so.
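
To make the fixed-point idea concrete, here is a minimal sketch (not code from this thread). It assumes a 16.16 format, i.e. 16 integer bits and 16 fraction bits; the names fxA and fxB are made up for illustration. The multiply leaves a 64-bit product in edx:eax, and one shrd scales it back to 16.16.

```masm
include \masm32\include\masm32rt.inc

.data
    fxA dd 00018000h        ; hypothetical operand: 1.5  in 16.16 fixed point
    fxB dd 00024000h        ; hypothetical operand: 2.25 in 16.16 fixed point
.code
start:
    mov  eax, fxA
    imul fxB                ; edx:eax = 64-bit product (32.32 format)
    shrd eax, edx, 16       ; shift the 64-bit product right by 16: back to 16.16
    printf("%08X\n", eax)   ; eax = 00036000h, i.e. 3.375 in 16.16
    inkey
    exit
end start
```

The trade-off is range and precision: a 16.16 value only covers roughly -32768 to 32767 with a step of 1/65536, and the product is truncated rather than rounded.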

oex

Quote from: OceanJeff32 on March 03, 2012, 10:42:51 AM
That wasn't my fireworks demo, btw, just clarifying, but I did experiment with it, and learned about MMX and SSE and how they work from that.

Very cool stuff!

it's been a while since i've visited ronybc.com, but his website looks like it's been taken over by BUGS and ADS! Beware upon visiting.

later,

jeff c
:U

:lol Hi Jeff, I checked the code briefly but didn't find the offending Intel instruction. It was good code though.... Sorry, it wasn't an accusation, just a heads up :lol, I wondered if you would see it
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

jj2007

Quote from: Farabi on March 03, 2012, 12:33:32 PM
Mul 1 ms, fmul 918 ms.

     timer_begin 10000000, REALTIME_PRIORITY_CLASS
      fmul
    timer_end


RT..., e.g. this one

dedndave

Jochen,
i think his times are for completion of entire functions, one using mul and one using fmul
much of the results will depend on how they are written, of course   :P

jj2007

Dave,

He puts a simple fmul between timer_begin and timer_end. After (in the most optimistic scenario) the 8th iteration, he gets an exception, and the FPU grinds to a halt. I had mentioned this already in reply #4, but why read posts if you can boldly state that the FPU is shit, and SSE is the future?

dedndave

the FPU is pretty fast
Raymond has proved that on more than one occasion   :P

the difference here is between floats and integers, i think

jj2007

Well, not really... it's a bit more complex:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
494     cycles for 100*mul eax
438     cycles for 100*fmul, properly used
17274   cycles for 100*fmul, improperly used

494     cycles for 100*mul eax
438     cycles for 100*fmul, properly used
17252   cycles for 100*fmul, improperly used


P.S.: Google just told me we've treated that already, not so long ago :bg

Farabi

I don't get it. So in my code, the FPU simply raises an error and halts?

dedndave

read the FPU tutorial by Ray   :U

when you "put something" into the FPU, it gets pushed onto the internal stack
when the FPU stack is full, bad things can happen   :P
to make space, pop something out
this can generally be done by using an instruction that pops and saves to memory at the same time
but - it also means there has to be an empty register to start with - you get 8 of them

i am surprised that i do not see more use of local variables for storage of reals...
        fstp real8 ptr [ebp-16]
very efficient   :U

of course, if you make a local for a real10, it should be 12 bytes   :P
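
As a minimal sketch of the push/pop balance described above (the proc name BalancedDemo and the local tmp are made up for illustration): two loads push two values, the no-operand fmul multiplies and pops one, and fstp stores the result to a local and pops the other, so the FPU stack is empty again on exit.

```masm
include \masm32\include\masm32rt.inc

.code
; Hypothetical demo proc: leaves the FPU stack exactly as it found it.
BalancedDemo proc
    local tmp:real8         ; 8-byte local on the stack frame
    fld1                    ; push #1: st(0) = 1.0
    fldpi                   ; push #2: st(0) = pi, st(1) = 1.0
    fmul                    ; no-operand form: st(1) = st(1)*st(0), then pop
    fstp tmp                ; store to the local and pop: stack is empty again
    printf("%f\n", tmp)     ; prints the product (pi * 1.0)
    ret
BalancedDemo endp

start:
    finit
    call BalancedDemo
    inkey
    exit
end start
```

Every instruction that pushes is matched by one that pops, which is why this can be called in a loop any number of times without filling the 8 registers.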

qWord

Quote from: dedndave on March 04, 2012, 11:49:30 AM
i am surprised that i do not see more use of local variables for storage of reals...
        fstp real8 ptr [ebp-16]
very efficient   :U
I do not know why, but I got the impression that assembler programmers especially seem to be infested by the use-globals-as-much-as-possible pest  :bg

MichaelW

Onan,

By default the FPU handles exceptions internally, so the only evidence you see of the exceptions is incorrect results and, if you bother to check the execution time, much slower execution. This code detects the exceptions by checking the FPU status word.

;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
.data
    junk real8 ?
.code
;==============================================================================
ShowStatusWord proc
    local sw:word
    fstsw sw
    test sw, 1111111b
    jnz @F
    printf(".")
  @@:
    test sw, 0000001b
    jz  @F
    printf("I")
  @@:
    test sw, 0000010b
    jz  @F
    printf("D")
  @@:
    test sw, 0000100b
    jz  @F
    printf("Z")
  @@:
    test sw, 0001000b
    jz  @F
    printf("O")
  @@:
    test sw, 0010000b
    jz  @F
    printf("U")
  @@:
    test sw, 0100000b
    jz  @F
    printf("P")
  @@:
    test sw, 1000000b
    jz  @F
    printf("S")
  @@:
    ret
ShowStatusWord endp
;==============================================================================
start:
;==============================================================================

    ;----------------------------------------------------
    ; The exception flags are identified as follows:
    ; I = invalid operation
    ; D = denormalized
    ; Z = zero divide
    ; O = overflow
    ; U = underflow
    ; P = precision
    ; S = stack fault
    ; See Raymond's FPU Tutorial for more information.
    ;----------------------------------------------------

    ; Case 1: fmul on an empty FPU stack, so every pass faults.
    finit
    mov ebx, 20
    .while ebx
        fmul
        call ShowStatusWord
        dec ebx
    .endw
    printf("\n")

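    ; Case 2: operands are loaded but the result is never popped,
    ; so each pass leaves one more value until the 8 registers fill up.
    finit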
    mov ebx, 20
    .while ebx
        fld1
        fld1
        fmul
        call ShowStatusWord
        dec ebx
    .endw
    printf("\n")

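    ; Case 3: the result is stored and popped with fstp,
    ; so the stack stays balanced and no exception is raised.
    finit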
    mov ebx, 20
    .while ebx
        fld1
        fld1
        fmul
        fstp junk
        call ShowStatusWord
        dec ebx
    .endw
    printf("\n\n")

    inkey
    exit
;==============================================================================
end start


ISISISISISISISISISISISISISISISISISISISIS
.......ISISISISISISISISISISISISIS
....................


eschew obfuscation