The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: jj2007 on June 23, 2009, 04:00:04 PM

Title: Quick test for NaN?
Post by: jj2007 on June 23, 2009, 04:00:04 PM
With esi as a pointer to a REAL10, I try to test for a NaN before restoring a value on the FPU:

movzx eax, word ptr [esi+8]
inc ax
.if Zero?
fld1 ; set a harmless value
.else
fld REAL10 ptr [esi]
.endif


I have a suspicion that this hack does not cover all situations. Is there any better, fast and foolproof way to do that? fxam is horribly slow...
Title: Re: Quick test for NaN?
Post by: qWord on June 23, 2009, 04:42:53 PM
iirc, you have to test bits 64-78 (the sign bit, bit 79, only distinguishes +/- INFINITY).
my suggestion:

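movzx eax,WORD ptr [esi+8] ; sign bit + 15-bit exponent of the REAL10
or eax,08000h ; force the sign bit to 1
xor eax,0ffffh ; result is zero only if all exponent bits were set
jz @NAN ; exponent all ones: NaN (or infinity)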
Title: Re: Quick test for NaN?
Post by: dedndave on June 23, 2009, 04:42:59 PM
compare the value to itself
NANs are never equal to anything
you just have to do it before you store it into memory

        fcom    st(0)
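
A minimal sketch of how the self-compare could be turned into a usable result, assuming the value is already in ST(0) and FPU exceptions are masked. The proc name IsNaNST0 is made up for illustration. It uses fucom rather than fcom, because fcom flags an invalid operation for any NaN while fucom only does so for signalling NaNs. An empty ST(0) should also compare as unordered, so it would be reported as a NaN too.

```asm
; IsNaNST0 is a hypothetical helper, not from the thread.
; in:  value to test in ST(0) (left in place)
; out: eax = 1 if ST(0) compares as unordered (NaN), 0 otherwise
IsNaNST0 proc
    fucom st(0)     ; compare ST(0) with itself
    fnstsw ax       ; condition codes C3/C2/C0 end up in ah
    sahf            ; C3 -> ZF, C2 -> PF, C0 -> CF
    setp al         ; PF is set only for the "unordered" result
    movzx eax, al
    ret
IsNaNST0 endp
```

Usage would be something like: fld REAL10 ptr [esi], call IsNaNST0, then test eax.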
Title: Re: Quick test for NaN?
Post by: jj2007 on June 23, 2009, 08:02:49 PM
Quote from: qWord on June 23, 2009, 04:42:53 PM
iirc, you have to test bits 64-78 (the sign bit, bit 79, only distinguishes +/- INFINITY).
my suggestion:

movzx eax,WORD ptr [esi+8]
or eax,08000h
xor eax,0ffffh
jz @NAN


Thanks, looks convincing and works :U
Title: Re: Quick test for NaN?
Post by: jj2007 on June 23, 2009, 08:30:32 PM
Quote from: dedndave on June 23, 2009, 04:42:59 PM
compare the value to itself
NANs are never equal to anything
you just have to do it before you store it into memory

        fcom    st(0)


Sounds interesting, too. My problem is as follows:
- I need three FPU registers for a routine
- I store ST0...ST2 to memory
- that works fine most of the time
- except when they are empty
- then they become BAD NaN's after loading them into the FPU...

Anyproc proc
LOCAL f2sST0:REAL10, f2sST1:REAL10, f2sST2:REAL10
fstp f2sST0 ; save & free 3 registers: 3*(6+6)=36 cycles, in practice 10 per FPU register, 30 cycles
fstp f2sST1 ; No. 2 saved
fstp f2sST2 ; No. 3 saved

... do stuff with FPU ...

lea esi, f2sST2
m2m ecx, 3
.Repeat
movzx eax, word ptr [esi+8]
or eax, 08000h
xor eax, 0ffffh
.if Zero?
fld1 ; set a harmless value
.else
fld REAL10 ptr [esi]
.endif
lea esi, [esi+10]
dec ecx
.Until Zero?


That prevents NaN in the FPU.
Title: Re: Quick test for NaN?
Post by: dedndave on June 23, 2009, 08:36:34 PM
i see - well - it is prolly faster than fcom, also
but....

        lea     esi,[esi+10]

same as ???

        add     esi,10

or is the lea a faster, shorter way ?
EDIT
oh - btw - to be a NaN, the mantissa must also be non-zero
if the exponent bits are all 1's and the mantissa bits are all 0's, it evaluates to signed infinity
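
The distinction between NaN and infinity could be tested in memory roughly like this. It is only a sketch: IsNaN10 is a made-up name, esi is assumed to point to the REAL10 as in the earlier posts, and masking out bit 63 (the explicit integer bit) is an assumption about how the odd encodings should be treated.

```asm
; IsNaN10 is a hypothetical helper, not from the thread.
; in:  esi -> REAL10 in memory
; out: eax = 1 if the value is a NaN, 0 otherwise (infinity returns 0)
IsNaN10 proc
    movzx eax, word ptr [esi+8]   ; sign bit + 15-bit exponent
    and eax, 7FFFh                ; drop the sign bit
    cmp eax, 7FFFh                ; exponent all ones?
    jne notnan                    ; no: an ordinary number
    mov eax, dword ptr [esi+4]    ; mantissa bits 32-63
    and eax, 7FFFFFFFh            ; ignore the explicit integer bit (bit 63)
    or eax, dword ptr [esi]       ; merge mantissa bits 0-31
    jz notnan                     ; fraction all zero: infinity
    mov eax, 1                    ; fraction non-zero: NaN
    ret
notnan:
    xor eax, eax
    ret
IsNaN10 endp
```

The extra cost over the exponent-only test is paid only when the exponent is all ones, which should be the rare case.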
Title: Re: Quick test for NaN?
Post by: jj2007 on June 23, 2009, 09:06:59 PM
Quote from: dedndave on June 23, 2009, 08:36:34 PM
i see - well - it is prolly faster than fcom, also
but....

        lea     esi,[esi+10]

same as ???

        add     esi,10

or is the lea a faster, shorter way ?
EDIT
oh - btw - to be a NaN, the mantissa must also be non-zero
if the exponent bits are all 1's and the mantissa bits are all 0's, it evaluates to signed infinity

> lea     esi,[esi+10]
> add     esi,10

lea does not modify flags and is marginally faster on most CPUs.

> the mantissa must also be non-zero

testing that would make it a bit slow...

In the meantime, I inserted ffree ST(7) in front of any fld. That helps to prevent NaNs at the source :bg
Title: Re: Quick test for NaN?
Post by: dedndave on June 23, 2009, 09:13:39 PM
that is what i was thinking - testing conditions prior to creating the NaN's to avoid them may be better altogether
but, if you are treating infinity as though it was a NaN, you are alright
for example, don't take the squareroot of a negative number
or trig functions that are out of valid range - that kind of thing
Title: Re: Quick test for NaN?
Post by: Jimg on June 23, 2009, 10:53:32 PM
This all sounds suspiciously like the same problem I've been fighting.
You're writing a general purpose routine.  You have to return the fpu in exactly the condition the user gave it to you.
You don't want to do an fsave/frstor and pay the 500 cycle penalty, so you try to save just what you use.  Unfortunately, besides the registers, you have to save and restore several status words, and it all adds up to about the same.  And you have to do an fnclex or you will crash if he already had an exception.  If you do the fnclex, you can't return the fpu in the same condition.
My final solution was I only had to do 4 real10 multiplies, so I wrote a real10 fmul emulator.  Seems like it would be slow, but it's faster than fsave/frstor.
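
For comparison, the full save/restore that is being avoided here could look roughly like this. A sketch only: SaveAllDemo and its placeholder body are made up, and the 108-byte buffer assumes 32-bit protected mode.

```asm
; SaveAllDemo is a hypothetical name; the FPU work in the middle is a placeholder.
SaveAllDemo proc
LOCAL fpuState[108]:BYTE    ; room for the 32-bit protected-mode FSAVE image
    fsave fpuState          ; store the complete FPU state, then reinitialise the FPU
    fldpi                   ; ... any FPU work, all 8 registers are free here ...
    fstp st(0)
    frstor fpuState         ; registers, tags, control and status word are restored
    ret
SaveAllDemo endp
```

This hands the FPU back as it was received, including any exception flags that were already set, at the cost discussed above.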
Title: Re: Quick test for NaN?
Post by: dedndave on June 24, 2009, 02:28:51 AM
it's funny you say that
i was looking for a simple way to display 64-bit integers as decimal ASCII
i could have used the FPU to do it
but, being a bit rusty at assembler coding, i decided to write a multiple-precision divide loop and make my own routine
i was surprised at how fast my code was
i want to spend some more time on it, as i think i can improve on it even further
i suspect a lot of simpler math can be done without the FPU
of course, if you need exponential or trig functions, it would be hard to improve on the FPU for speed and accuracy, combined
Title: Re: Quick test for NaN?
Post by: raymond on June 24, 2009, 02:38:08 AM
Quote: And you have to do an fnclex or you will crash if he already had an exception.

Exceptions on the FPU will never crash a program by themselves. They strictly get recorded as such in the Status Word. It is then the programmer's responsibility to check whichever exception flag may be of significance whenever necessary, and possibly provide exception handling code if deemed appropriate.

The above does not deny the fact that a program may later crash when erroneous FPU results due to some exceptions are allowed to continue being used.
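
Checking one of those flags could be sketched as follows, assuming the exceptions are masked (as they are after finit). InvalidOpSeen is a made-up name; bit 0 of the Status Word is the invalid-operation flag.

```asm
; InvalidOpSeen is a hypothetical helper, not from the thread.
; out: eax = 1 if the invalid-operation flag (IE) is set, 0 otherwise
InvalidOpSeen proc
    fnstsw ax       ; copy the Status Word to ax without waiting for pending exceptions
    and eax, 1      ; keep bit 0 = IE (invalid operation, including stack faults)
    ret
InvalidOpSeen endp
```

The flags are sticky, so they stay set until fnclex (or a reload of the state) clears them.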
Title: Re: Quick test for NaN?
Post by: Jimg on June 24, 2009, 03:20:21 AM
I stand corrected.  But doing an fnclex cured my problem.
Title: Re: Quick test for NaN?
Post by: jj2007 on June 24, 2009, 08:07:01 AM
Here is the complete sequence I use:

.nolist
include \masm32\include\masm32rt.inc

.code
start:
; int 3 ; uncomment to see what happens in Olly
push 1234567890
fild dword ptr [esp] ; 1234567890 in the FPU, NEAR 53
add esp, 4
call MyTest ; call the FPU proc that uses ST0...ST2

push 1111111111
REPEAT 7
fild dword ptr [esp] ; 7*1111111111 in the FPU, 1234567890 in ST(7), NEAR 53
ENDM
add esp, 4
; int 3 ; another breakpoint for Olly
call MyTest ; call the proc again, this time with a fully loaded FPU
; int 3 ; another breakpoint for Olly

print "OK"
getkey
exit

MyTest proc
LOCAL f2sST0:REAL10, f2sST1:REAL10, f2sST2:REAL10
LOCAL dummy1:WORD, f2sConWold:WORD, f2sConDy:WORD, f2sConW:WORD, dummy3:WORD, dummy4:dword
fstcw f2sConWold ; status word could be saved as fstsw ax,
mov eax, dword ptr f2sConWold ; but no such instruction for Control word
or ax, 011100000000b ; rounding mode DOWN,
mov dword ptr f2sConW, eax ; precision max=64
fldcw f2sConW ; set new control word
fstp f2sST0 ; save & free 3 registers: 3*(6+6)=36 cycles, in practice 10 per FPU register, 30 cycles
fstp f2sST1 ; No. 2 saved
fstp f2sST2 ; No. 3 saved

push 11111111
fild dword ptr [esp] ; just for fun
pop eax
fstp st

lea esi, f2sST2
m2m ecx, 3
.Repeat
movzx eax, word ptr [esi+8]
or eax, 08000h
xor eax, 0ffffh
.if Zero?
fldz ; push zero and
ffree ST ; free the register
.else
fld REAL10 ptr [esi]
.endif
lea esi, [esi+10]
dec ecx
.Until Zero?
fldcw f2sConWold ; restore control word ->NEAR 53
ret
MyTest endp

end start



On return, the FPU is back to NEAR 53, has 1234567890 in ST0 plus two NaNs in ST1 and ST2. In Olly, this looks ugly, but in real life it should not have any negative impact on other code that uses the FPU. If there was valid stuff in the FPU, it is still there.

EDIT: New code above tests for NaN restored from memory, and empties registers if it finds one. This should be foolproof, but testers welcome :bg
Title: Re: Quick test for NaN?
Post by: SteveCurtis on June 24, 2009, 08:50:43 AM
Quote from: jj2007 on June 23, 2009, 04:00:04 PM
With esi as a pointer to a REAL10, I try to test for a NaN before restoring a value on the FPU:

movzx eax, word ptr [esi+8]
inc ax
.if Zero?
fld1 ; set a harmless value
.else
fld REAL10 ptr [esi]
.endif


I have a suspicion that this hack does not cover all situations. Is there any better, fast and foolproof way to do that? fxam is horribly slow...

Hi jj2007,

I know this does not represent a 'quick' anything, but when I got interested in doing Digital Signal Processing functions on the x86 processor I began looking for anything about floating point format and limitations of use etc and was staggered to discover how many things we need to be aware of!  :eek

This link is titled "What Every Computer Scientist Should Know About Floating-Point Arithmetic" and is worth a read for background anyway, if you haven't seen it.

http://docs.sun.com/source/806-3568/ncg_goldberg.html

Also have a look on www.embedded.com and scout around what 'Jack W. Crenshaw' says. I'm sure he had a few tricks for the NAN problem and the 'denormal' number problem, by hacking/reading the floating point format in memory. Some of this is also treated by Goldberg in detail, who I think is the oracle for floating point stuff anyway.

Goldberg also goes into quite a bit of detail about using 'integer arithmetic'  on the floating format to tell if one float equals another float 'or close thereto'. Very fast and reliable.

Regards,
Steve





Title: Re: Quick test for NaN?
Post by: SteveCurtis on June 24, 2009, 09:12:15 AM
Quote from: dedndave on June 24, 2009, 02:28:51 AM
it's funny you say that
i was looking for a simple way to display 64-bit integers as decimal ASCII
i could have used the FPU to do it
but, being a bit rusty at assembler coding, i decided to write a multiple-precision divide loop and make my own routine
i was surprised at how fast my code was
i want to spend some more time on it, as i think i can improve on it even further
i suspect a lot of simpler math can be done without the FPU
of course, if you need exponential or trig functions, it would be hard to improve on the FPU for speed and accuracy, combined

Hi Dave,

I got a book by Jack Crenshaw that goes into the development of the trig functions from first principles (Chebyshev coefficient optimisation of Taylor series etc) and allows you to tailor your functions for speed or accuracy (or both!). He shows how to do integer based versions of everything also, and that was why I got into it (for the 8051 and small end processors; doing a square root can be time consuming unless you optimise on a small processor!)

Although most C library functions should be optimised for speed, they are not always intrinsically safe for all numbers. He shows how to guard against accidental FPU errors and wrap calls so they will always be safe, and how to write your own functions if they are not supported in your compiler implementation.

ArcTan, ArcTan2, Tan, Cos, Sine and modulo arithmetic for floats and integers and 'pi' definitions etc are all covered, plus Simpson's rule for integration, plus matrix operations and a host of other ideas to speed things along and make things as reliable as possible (he hates exceptions as they are slow and says you should guard your code and silently handle any errors, depending on the outcome of course!).

The book is called "Math Toolkit for Real-Time Programming" by Jack W. Crenshaw.  I liked it a lot and have haunted the embedded.com website for other high quality information also. There is a heap of stuff about my pet subject (protected mode in embedded applications, for example!)  :wink

Regards,
Steve
Title: Re: Quick test for NaN?
Post by: jj2007 on June 24, 2009, 09:33:11 AM
Thanks, Steve. The FPU is tricky indeed ;-)
The problem here is that if you need, say, 3 registers, and you want to save the current content to memory and restore it later, empty registers turn into NaNs when reloaded.

That *may* become a problem if other code pushes 6 values on the FPU without using ffree st(7).

Testing if ST(0) is empty is pretty slow and difficult, unfortunately (via FSTENV). But fortunately, the value of a saved empty register in memory can be tested easily for NaN - see my edited code above.
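
For reference, the slow but direct test for an empty ST(0) could be sketched with fxam instead of FSTENV. IsST0Empty is a made-up name, and this is not the method used in the code above. fxam reports "empty" as C3=1, C2=0, C0=1.

```asm
; IsST0Empty is a hypothetical helper, not from the thread.
; out: eax = 1 if ST(0) is tagged empty, 0 otherwise
IsST0Empty proc
    fxam            ; classify ST(0) into C3, C2, C0
    fnstsw ax       ; condition codes end up in ah
    and ah, 45h     ; keep C3 (40h), C2 (04h), C0 (01h)
    cmp ah, 41h     ; C3=1, C2=0, C0=1 means "empty"
    sete al
    movzx eax, al
    ret
IsST0Empty endp
```

Unlike the memory test, this has to run before the register is stored, and it pays the fxam cost mentioned in the first post.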
Title: Re: Quick test for NaN?
Post by: dedndave on June 24, 2009, 11:47:59 AM
here is the FPU reference i use, from Washington University...
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_14/CH14-1.html

Steve, yes, i have heard of Jack and read a few of his articles
from what i understand, he quit doing them  :(

i suppose a multi-tasking system takes the fun out of the FPU
i don't like sharing - lol
Title: Re: Quick test for NaN?
Post by: SteveCurtis on June 24, 2009, 12:36:12 PM
Quote from: dedndave on June 24, 2009, 11:47:59 AM

http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_14/CH14-1.html
Steve, yes, i have heard of Jack and read a few of his articles
from what i understand, he quit doing them  :(

Hi Dave,
Thanks mate.. Just added to my favs list for a read on the cold winter nights.   :wink
Loved Jack's rambling style. Apparently he worked on the early Apollo missions for NASA. Still had very much the 'down to earth' quality about him I really liked. Picked up his book at Borders and read the first few lines and promptly bought the book.

Saw the Mr Bean Vid. Yay thanks. I'll also pass it on to my 2 girls who got a real kick out of him when they were younger.

Regards,
Steve

Title: Re: Quick test for NaN?
Post by: jj2007 on June 24, 2009, 02:16:12 PM
Just stumbled into a strange behaviour of the popular fstp ST instruction:

459     cycles for 100*fdecstp, fincstp
502     cycles for 100*fldz, fincstp
104813  cycles for 100*fdecstp, fstp ST
107337  cycles for 100*fldz, fstp ST


Here are the four timed instruction sequences:
fdecstp
ffree st(0) ; rotate the barrel, free ST
fincstp ; (rotate back for next test below)

fldz
ffree st(0) ; push a zero, free ST
fincstp ; (rotate back for next test below)

fdecstp
ffree st(0) ; rotate the barrel, free ST
fstp ST ; (pop ST for next test below)

fldz
ffree st(0) ; push a zero, free ST
fstp ST ; (pop ST for next test below...)


Source and executable attached. Any explanations?

[attachment deleted by admin]
Title: Re: Quick test for NaN?
Post by: Jimg on June 24, 2009, 03:01:58 PM
amd results:
110     cycles for 100*fdecstp, fincstp
193     cycles for 100*fldz, fincstp
196     cycles for 100*fdecstp, fstp ST
142     cycles for 100*fldz, fstp ST

--- ok ---
Title: Re: Quick test for NaN?
Post by: jj2007 on June 24, 2009, 04:39:12 PM
Seems to be an Intel problem. Timings above were for a Prescott P4, these are for an Intel(R) Pentium(R) 4 CPU 2.40GHz (SSE2):

685     cycles for 100*fdecstp, fincstp
487     cycles for 100*fldz, fincstp
85383   cycles for 100*fdecstp, fstp ST
88323   cycles for 100*fldz, fstp ST
Title: Re: Quick test for NaN?
Post by: dedndave on June 24, 2009, 04:42:35 PM
i could paint my house in that many clock cycles
well, i could get the brush out, at least
Title: Re: Quick test for NaN?
Post by: qWord on June 24, 2009, 05:19:08 PM
jj,

you are trying to pop an "empty" register. I think this causes the speed issues.

regards qWord
Title: Re: Quick test for NaN?
Post by: jj2007 on June 24, 2009, 05:27:57 PM
Quote from: qWord on June 24, 2009, 05:19:08 PM
jj,

you are trying to pop an "empty" register. I think this causes the speed issues.

regards qWord

Sounds plausible, interesting though that AMD can deal with it.

@Dave: You are fast, can you come over to paint my house? I pay 1,000 bucks per hour :bg
Title: Re: Quick test for NaN?
Post by: dedndave on June 24, 2009, 06:00:45 PM
ok - i didn't say it would look good
however, i will look great doing it - lol
probably just as well - i can present less danger painting a house than writing code, eh?
Title: Re: Quick test for NaN?
Post by: jj2007 on June 24, 2009, 07:26:32 PM
Quote from: dedndave on June 24, 2009, 06:00:45 PM
ok - i didn't say it would look good
however, i will look great doing it - lol
probably just as well - i can present less danger painting a house than writing code, eh?

Hey, I wasn't particularly interested in a beauty contest. But you claimed to be able to do it in around 100,000 cycles. I would be generous and grant you 1 Million cycles, but still, at 2 GHz that makes 1000000/2E9/3600*1000 = 0.000139 US$ :bg

My 2 cts worth for today :thumbu