Quick test for NaN?

Started by jj2007, June 23, 2009, 04:00:04 PM


jj2007

With esi as a pointer to a REAL10, I try to test for a NaN before restoring a value on the FPU:

movzx eax, word ptr [esi+8]
inc ax
.if Zero?
fld1 ; set a harmless value
.else
fld REAL10 ptr [esi]
.endif


I have a suspicion that this hack does not cover all situations. Is there any better, fast and foolproof way to do that? fxam is horribly slow...

qWord

iirc, you have to test bits 64-78 (the sign bit is used to distinguish +/- INFINITY).
my suggestion:

movzx eax,WORD ptr [esi+8]
or eax,08000h
xor eax,0ffffh
jz @NAN

dedndave

compare the value to itself
NANs are never equal to anything
you just have to do it before you store it into memory

        fcom    st(0)
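
For reference, a minimal sketch of the compare-with-itself idea as a complete MASM32 program. It assumes the default control word (all exceptions masked) and uses fucom rather than fcom, because fcom also flags an invalid operation for quiet NaNs. The label names and the 0/0 used to produce a NaN are only for illustration.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    fldz
    fldz
    fdivp st(1), st         ; 0/0 with exceptions masked leaves a quiet NaN in ST0
    fucom st(0)             ; unordered compare of ST0 with itself
    fnstsw ax               ; copy the status word to AX
    sahf                    ; C0 -> CF, C2 -> PF, C3 -> ZF
    jp  FoundNaN            ; C2 set means "unordered", i.e. ST0 is a NaN
    print "ST0 is a number", 13, 10
    jmp Finished
FoundNaN:
    print "ST0 is a NaN", 13, 10
Finished:
    fstp st                 ; drop the test value
    fnclex                  ; clear the IE flag raised by 0/0
    exit
end start
```

The value has to be on the FPU stack for this to work, so it costs a load plus the fnstsw/sahf pair; whether that beats an integer test on the memory image would need timing.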

jj2007

Quote from: qWord on June 23, 2009, 04:42:53 PM
iirc, you have to test bits 64-78 (the sign bit is used to distinguish +/- INFINITY).
my suggestion:

movzx eax,WORD ptr [esi+8]
or eax,08000h
xor eax,0ffffh
jz @NAN


Thanks, looks convincing and works :U

jj2007

Quote from: dedndave on June 23, 2009, 04:42:59 PM
compare the value to itself
NANs are never equal to anything
you just have to do it before you store it into memory

        fcom    st(0)


Sounds interesting, too. My problem is as follows:
- I need three FPU registers for a routine
- I store ST0...ST2 to memory
- that works fine most of the time
- except when they are empty
- then they become BAD NaN's after loading them into the FPU...

Anyproc proc
LOCAL f2sST0:REAL10, f2sST1:REAL10, f2sST2:REAL10
fstp f2sST0 ; save & free 3 registers: 3*(6+6)=36 cycles, in practice 10 per FPU register, 30 cycles
fstp f2sST1 ; No. 2 saved
fstp f2sST2 ; No. 3 saved

... do stuff with FPU ...

lea esi, f2sST2
m2m ecx, 3
.Repeat
movzx eax, word ptr [esi+8]
or eax, 08000h
xor eax, 0ffffh
.if Zero?
fld1 ; set a harmless value
.else
fld REAL10 ptr [esi]
.endif
lea esi, [esi+10]
dec ecx
.Until Zero?


That prevents NaN in the FPU.
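
A possible alternative, sketched under assumptions: instead of inspecting the saved REAL10 afterwards, the status word can be read right after the fstp. Storing from an empty register is a stack underflow, which sets both IE (bit 0) and SF (bit 6), so a genuine NaN that the caller put there can be told apart from an empty register. The variable and label names below are made up for the demo, and finit is used only to get a known empty stack.

```asm
include \masm32\include\masm32rt.inc

.data?
SavedST0 REAL10 ?

.code
start:
    finit                   ; empty stack, exceptions masked, status word cleared
    fstp SavedST0           ; ST0 is empty: masked stack underflow stores the indefinite QNaN
    fnstsw ax               ; read the status word
    and eax, 41h            ; keep IE (bit 0) and SF (bit 6)
    cmp eax, 41h
    jne HadValue
    print "ST0 was empty", 13, 10
    jmp Finished
HadValue:
    print "ST0 held a value", 13, 10
Finished:
    fnclex                  ; note: this also wipes any flags the caller had set
    exit
end start
```

The catch is that IE and SF are sticky, so a real routine would need an fnclex before the first fstp to be sure the flags come from that store, and that changes the caller's FPU state.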

dedndave

i see - well - it is prolly faster than fcom, also
but....

        lea     esi,[esi+10]

same as ???

        add     esi,10

or is the lea a faster, shorter way ?
EDIT
oh - btw - to be a NaN, the mantissa must also be non-zero
if the exponent bits are all 1's and the mantissa bits are all 0's, it evaluates to signed infinity
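
A sketch of a test that makes this distinction on the memory image of a REAL10. IsNaN10 is a hypothetical helper name, not a MASM32 library routine. It treats "exponent all ones and any fraction bit set" as NaN, and it does not try to classify the unsupported pseudo-NaN and pseudo-infinity encodings separately.

```asm
include \masm32\include\masm32rt.inc

.data
; REAL10 images built from raw fields: 64-bit significand first, then sign+exponent
PosInf  dd 0, 80000000h     ; only the explicit integer bit (bit 63) set
        dw 7FFFh            ; exponent all ones, sign clear: +infinity
Indef   dd 0, 0C0000000h    ; integer bit plus top fraction bit
        dw 0FFFFh           ; "real indefinite" QNaN
One     REAL10 1.0

.code
; returns eax = 1 if the REAL10 at pSrc is a NaN, else eax = 0
IsNaN10 proc pSrc:DWORD
    mov edx, pSrc
    movzx eax, word ptr [edx+8]
    and eax, 7FFFh          ; ignore the sign bit
    cmp eax, 7FFFh
    jne NotNaN              ; exponent not all ones: ordinary number
    mov eax, [edx+4]        ; high dword of the significand
    add eax, eax            ; shift out the explicit integer bit
    or  eax, [edx]          ; any fraction bit set?
    jz  NotNaN              ; fraction all zero: infinity
    mov eax, 1
    ret
NotNaN:
    xor eax, eax
    ret
IsNaN10 endp

start:
    invoke IsNaN10, addr PosInf
    print str$(eax), " <- infinity", 13, 10
    invoke IsNaN10, addr Indef
    print str$(eax), " <- indefinite QNaN", 13, 10
    invoke IsNaN10, addr One
    print str$(eax), " <- 1.0", 13, 10
    exit
end start
```

Compared with the exponent-only test, this costs two extra memory reads and one more branch, and only when the exponent is already all ones.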

jj2007

Quote from: dedndave on June 23, 2009, 08:36:34 PM
i see - well - it is prolly faster than fcom, also
but....

        lea     esi,[esi+10]

same as ???

        add     esi,10

or is the lea a faster, shorter way ?
EDIT
oh - btw - to be a NaN, the mantissa must also be non-zero
if the exponent bits are all 1's and the mantissa bits are all 0's, it evaluates to signed infinity

> lea     esi,[esi+10]
> add     esi,10

lea does not modify flags and is marginally faster on most CPUs.

> the mantissa must also be non-zero

testing that would make it a bit slow...

In the meantime, I inserted ffree ST(7) in front of any fld. That helps to prevent NaNs at the source :bg

dedndave

that is what i was thinking - testing conditions prior to creating the NaN's to avoid them may be better altogether
but, if you are treating infinity as though it was a NaN, you are alright
for example, don't take the square root of a negative number
or trig functions that are out of valid range - that kind of thing

Jimg

This all sounds suspiciously like the same problem I've been fighting.
You're writing a general purpose routine.  You have to return the fpu in exactly the condition the user gave it to you.
You don't want to do an fsave/frstor and pay the 500 cycle penalty, so you try to save just what you use.  Unfortunately, besides the registers, you have to save and restore several status words, and it all adds up to about the same.  And you have to do an fnclex or you will crash if he already had an exception.  If you do the fnclex, you can't return the fpu in the same condition.
My final solution was I only had to do 4 real10 multiplies, so I wrote a real10 fmul emulator.  Seems like it would be slow, but it's faster than fsave/frstor.

dedndave

it's funny you say that
i was looking for a simple way to display 64-bit integers as decimal ASCII
i could have used the FPU to do it
but, being a bit rusty at assembler coding, i decided to write a multiple-precision divide loop and make my own routine
i was surprised at how fast my code was
i want to spend some more time on it, as i think i can improve on it even further
i suspect a lot of simpler math can be done without the FPU
of course, if you need exponential or trig functions, it would be hard to improve on the FPU for speed and accuracy, combined

raymond

Quote:
And you have to do an fnclex or you will crash if he already had an exception.

Exceptions on the FPU will never crash a program by themselves. They strictly get recorded as such in the Status Word. It is then the programmer's responsibility to check whichever exception flag may be of significance whenever necessary, and possibly provide exception handling code if deemed appropriate.

The above does not deny the fact that a program may later crash when erroneous FPU results due to some exceptions are allowed to continue being used.
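
A small sketch of what checking the Status Word can look like, assuming the default control word with all exceptions masked. The square root of -1 is just a convenient way to provoke an invalid operation; the label name is made up.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    fnclex                  ; start with clean exception flags
    fld1
    fchs                    ; ST0 = -1.0
    fsqrt                   ; invalid operation: the masked response is a QNaN in ST0
    fnstsw ax               ; read the status word
    test al, 1              ; bit 0 = IE (invalid operation)
    jz  NoInvalid
    print "invalid operation recorded", 13, 10
NoInvalid:
    fstp st                 ; discard the result
    fnclex                  ; leave the flags clean
    exit
end start
```

If an exception is unmasked in the control word, the behaviour is different: the next waiting FPU instruction raises the exception instead of only recording the flag.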

Jimg

I stand corrected.  But doing an fnclex cured my problem.

jj2007

Here is the complete sequence I use:

.nolist
include \masm32\include\masm32rt.inc

.code
start:
; int 3 ; uncomment to see what happens in Olly
push 1234567890
fild dword ptr [esp] ; 1234567890 in the FPU, NEAR 53
add esp, 4
call MyTest ; call the FPU proc that uses ST0...ST3

push 1111111111
REPEAT 7
fild dword ptr [esp] ; 7*1111111111 in the FPU, 1234567890 in ST(7), NEAR 53
ENDM
add esp, 4
; int 3 ; another breakpoint for Olly
call MyTest ; call the proc again, this time with a fully loaded FPU
; int 3 ; another breakpoint for Olly

print "OK"
getkey
exit

MyTest proc
LOCAL f2sST0:REAL10, f2sST1:REAL10, f2sST2:REAL10
LOCAL dummy1:WORD, f2sConWold:WORD, f2sConDy:WORD, f2sConW:WORD, dummy3:WORD, dummy4:dword
fstcw f2sConWold ; status word could be saved as fstsw ax,
mov eax, dword ptr f2sConWold ; but no such instruction for Control word
or ax, 011100000000b ; rounding mode DOWN,
mov dword ptr f2sConW, eax ; precision max=64
fldcw f2sConW ; set new control word
fstp f2sST0 ; save & free 3 registers: 3*(6+6)=36 cycles, in practice 10 per FPU register, 30 cycles
fstp f2sST1 ; No. 2 saved
fstp f2sST2 ; No. 3 saved

push 11111111
fild dword ptr [esp] ; just for fun
pop eax
fstp st

lea esi, f2sST2
m2m ecx, 3
.Repeat
movzx eax, word ptr [esi+8]
or eax, 08000h
xor eax, 0ffffh
.if Zero?
fldz ; push zero and
ffree ST ; free the register
.else
fld REAL10 ptr [esi]
.endif
lea esi, [esi+10]
dec ecx
.Until Zero?
fldcw f2sConWold ; restore control word ->NEAR 53
ret
MyTest endp

end start



On return, the FPU is back to NEAR 53, has 1234567890 in ST0 plus two NaNs in ST1 and ST2. In Olly, this looks ugly, but in real life it should not have any negative impact on other code that uses the FPU. If there was valid stuff in the FPU, it is still there.

EDIT: New code above tests for NaN restored from memory, and empties registers if it finds one. This should be foolproof, but testers welcome :bg

SteveCurtis

Quote from: jj2007 on June 23, 2009, 04:00:04 PM
With esi as a pointer to a REAL10, I try to test for a NaN before restoring a value on the FPU:

movzx eax, word ptr [esi+8]
inc ax
.if Zero?
fld1 ; set a harmless value
.else
fld REAL10 ptr [esi]
.endif


I have a suspicion that this hack does not cover all situations. Is there any better, fast and foolproof way to do that? fxam is horribly slow...

Hi jj2007,

I know this does not represent a 'quick' anything, but when I got interested in doing Digital Signal Processing functions on the x86 processor I began looking for anything about floating point format and limitations of use etc and was staggered to discover how many things we need to be aware of!  :eek

This link is titled "What Every Computer Scientist Should Know About Floating-Point Arithmetic" and is worth a read for background anyway, if you haven't seen it.

http://docs.sun.com/source/806-3568/ncg_goldberg.html

Also have a look on www.embedded.com and scout around what 'Jack W. Crenshaw' says. I'm sure he had a few tricks for the NAN problem and the 'denormal' number problem, by hacking/reading the floating point format in memory. Some of this is also treated by Goldberg in detail, who I think is the oracle for floating point stuff anyway.

Goldberg also goes into quite a bit of detail about using 'integer arithmetic'  on the floating format to tell if one float equals another float 'or close thereto'. Very fast and reliable.

Regards,
Steve






SteveCurtis

Quote from: dedndave on June 24, 2009, 02:28:51 AM
it's funny you say that
i was looking for a simple way to display 64-bit integers as decimal ASCII
i could have used the FPU to do it
but, being a bit rusty at assembler coding, i decided to write a multiple-precision divide loop and make my own routine
i was surprised at how fast my code was
i want to spend some more time on it, as i think i can improve on it even further
i suspect a lot of simpler math can be done without the FPU
of course, if you need exponential or trig functions, it would be hard to improve on the FPU for speed and accuracy, combined

Hi Dave,

I got a book by Jack Crenshaw that goes into the development of the trig functions from first principles (Chebyshev coefficient optimisation of Taylor series etc) and allows you to tailor your functions for speed or accuracy (or both!). He shows how to do integer based versions of everything also, and that was why I got into it (for 8051 and small end processors; doing a square root can be time consuming unless you optimise on a small processor!)

Although most C library functions should be optimised for speed, they are not always intrinsically safe for all numbers. He shows how to guard against accidental FPU errors and wrap calls so they will always be safe, and how to write your own functions if they are not supported in your compiler implementation.

ArcTan, ArcTan2, Tan, Cos, Sine and modulo arithmetic for floats and integers and 'pi' definitions etc are all covered, plus Simpson's rule for integration, plus matrix operations and a host of other ideas to speed things along and make things as reliable as possible (he hates exceptions as they are slow and says you should guard your code and silently handle any errors, depending on the outcome of course!).

The book is called "Math Toolkit for Real-Time Programming" by Jack W. Crenshaw.  I liked it a lot and have haunted the embedded.com website for other high quality information also. There is a heap of stuff about my pet subject (protected mode in embedded applications for example!)  :wink

Regards,
Steve