Fpu is very slow....

Started by Airam, August 09, 2009, 11:23:23 PM

dedndave

i have wondered about that (i.e. returns and out-of-order execution)
that means that a return from a subroutine is a serializing instruction ?
MichaelW was saying that cpuid, iret, and bound (i think it was bound) were the only serializing instructions
i tried using iret to serialize rdtsc, but it was slower than cpuid
a regular ret should be much faster, if it will work
now, i trust Michael - he has done a lot of research on this subject
but, his source of information could be wrong
it would make sense that ret serializes instructions
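
For reference, the cpuid-serialized rdtsc pattern being discussed looks roughly like this. It is only a sketch: TimeIt and code_under_test are made-up names, 32-bit MASM with .586 or later is assumed, and the code being timed is assumed to preserve esi and edi (as stdcall procedures do).

```asm
; Sketch only: TimeIt and code_under_test are hypothetical names.
; Returns the elapsed TSC count in edx:eax.
; cpuid overwrites eax, ebx, ecx and edx, so ebx is saved by USES.
TimeIt proc uses ebx esi edi
    xor   eax, eax
    cpuid                   ; serialize: earlier instructions complete first
    rdtsc                   ; edx:eax = start count
    mov   esi, eax
    mov   edi, edx

    call  code_under_test   ; placeholder for the code being timed

    xor   eax, eax
    cpuid                   ; serialize again before the second read
    rdtsc                   ; edx:eax = end count
    sub   eax, esi
    sbb   edx, edi          ; edx:eax = end - start
    ret
TimeIt endp
```

The count includes the overhead of the second cpuid and of the call itself, so an empty code_under_test has to be timed the same way and subtracted before the numbers mean much.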

Airam

I have sent all of you a private message.

Thank you!!

Airam

I forgot to tell you how to use it when you run it.
Go to: Función -> Insertar Función -> Calcular
And then comment/uncomment the invoke SendMessage line of wndMDIMain.

Astro

Where can I find these functions?

EDIT: Found them

Best regards,
Astro.

Astro

My gut is telling me "race condition" but I can't see where to start finding the source.

You haven't made a typo somewhere, have you? Put a ret or a call in the middle of a function, or something bizarre like that?

Worse, your function doesn't end up calling itself through a circular reference, does it?

Best regards,
Astro.

MichaelW

Airam,

I got your PM and your code. I'm not set up to do much with it, but I did patch together an executable with the essential components, cleared the interrupt masks, and used Dr. Watson to catch the exceptions.

I started with:

clearintmasks UM or OM or ZM or IM

I left out DM and PM on the assumption that these are less serious than the others. The first exception was EXCEPTION_FLT_DIVIDE_BY_ZERO in this code (the fault was detected at the last instruction, because the x87 reports an unmasked exception at the next waiting FPU instruction, but the problem is obviously the FDIVR instruction):

Fld Real10 Ptr [Edi]  ; x | -
Fld St(0)             ; x | x | -
Fld EIGHT             ; 8.0 | x | x | -
Fdivr                 ; 8.0 / x | x | -
Fld St(0)             ; 8.0 / x | 8.0 / x | x | -
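
One way to keep the FDIVR from ever seeing a zero divisor is to test x first. The following is a sketch under assumptions, not the code from the PM: EightOverX and bad_x are made-up names, EIGHT is assumed to be a REAL10 holding 8.0 (the thread shows only its name), and the invalid-operation exception is assumed to be masked while FTST runs, since FTST reports a NaN operand as invalid.

```asm
.data
EIGHT real10 8.0            ; assumed definition; only the name appears in the thread

.code
; In:  edi -> REAL10 x
; Out: CF clear, st(0) = 8.0 / x
;      CF set, FPU stack unchanged, if x is zero or NaN
; Clobbers ax.
EightOverX proc
    fld   real10 ptr [edi]  ; x | -
    ftst                    ; compare x with 0.0
    fnstsw ax               ; condition codes -> ah
    sahf                    ; C3 -> ZF, C2 -> PF, C0 -> CF
    jz    bad_x             ; C3 set: x = 0, or x is NaN (unordered)
    fld   EIGHT             ; 8.0 | x | -
    fdivr                   ; 8.0 / x | -
    clc
    ret
bad_x:
    fstp  st(0)             ; discard x
    stc
    ret
EightOverX endp
```

What the caller should do when CF comes back set depends on what a zero x means in the algorithm, which only the author can decide.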


I then eliminated ZM from the mask, and the next exception was EXCEPTION_FLT_INVALID_OPERATION in this code (fault detected at the last instruction, so the problem is probably the FSUBR instruction):

Fmul St(0), St(3) ; cos(xx) · ans1 | sin(xx) · ans2 · z | ans2 | ans1 | z | x | -
Fwait
Fsubr             ; cos(xx) · ans1 - sin(xx) · ans2 · z | ans2 | ans1 | z | x | -
Fstp St(1)        ; cos(xx) · ans1 - sin(xx) · ans2 · z | ans1 | z | x | -
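
The clearintmasks macro itself is not shown in this thread. Assuming it does nothing more than clear mask bits in the x87 control word, the second run (ZM dropped from the list, so zero-divide stays masked) would amount to something like the sketch below. UnmaskFpu and the FPU_xM equates are made-up names; the bit positions are the standard control word layout.

```asm
; x87 control word exception mask bits (a set bit means the exception is masked)
FPU_IM equ 01h              ; invalid operation
FPU_DM equ 02h              ; denormal operand
FPU_ZM equ 04h              ; zero divide
FPU_OM equ 08h              ; overflow
FPU_UM equ 10h              ; underflow
FPU_PM equ 20h              ; precision

; Hypothetical stand-in for: clearintmasks UM or OM or IM
; Clobbers ax.
UnmaskFpu proc
    local ctrl:word
    fnclex                  ; drop pending flags so unmasking does not fault at once
    fnstcw ctrl             ; read the current control word
    mov   ax, ctrl
    and   ax, not (FPU_UM or FPU_OM or FPU_IM)  ; ZM, DM and PM stay masked
    mov   ctrl, ax
    fldcw ctrl              ; load it back: UM, OM and IM now raise exceptions
    ret
UnmaskFpu endp
```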


Judging from the complexity of the code, I suspect that other exceptions are being generated, possibly many others. And judging from the size of your arrays, if most of the elements are being used, this would explain the slow execution.
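
To see which exceptions a block of code is raising without unmasking anything or involving Dr. Watson, the sticky flags in the status word can be read after the block has run. A minimal sketch, with GetFpuFlags as a made-up name:

```asm
; Returns in eax the sticky exception flags from the x87 status word:
;   bit 0 IE invalid    bit 1 DE denormal   bit 2 ZE zero divide
;   bit 3 OE overflow   bit 4 UE underflow  bit 5 PE precision
; The flags are cleared afterwards so the next check starts fresh.
GetFpuFlags proc
    fnstsw ax               ; ax = status word, without checking for pending faults
    and   eax, 3Fh          ; keep only the six exception flags
    fnclex                  ; clear them
    ret
GetFpuFlags endp
```

Calling it after each suspect block and testing eax (for example, test eax, 04h for a zero divide) narrows down where the flags are being set, at the cost of a little extra run time.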
eschew obfuscation