Quick test for NaN?

jj2007 · June 23, 2009, 04:00:04 PM

With esi as a pointer to a REAL10, I try to test for a NaN before restoring a value on the FPU:

		movzx eax, word ptr [esi+8]
		inc ax
		.if Zero?
			fld1		; set a harmless value
		.else
			fld REAL10 ptr [esi]
		.endif

I have a suspicion that this hack does not cover all situations. Is there any better, fast and foolproof way to do that? fxam is horribly slow...

qWord · June 23, 2009, 04:42:53 PM

iirc, you have to test bits 64-78 (the sign bit, bit 79, is only used to distinguish +/- INFINITY).
my suggestion:

	movzx eax,WORD ptr [esi+8]
	or eax,08000h
	xor eax,0ffffh
	jz @NAN
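
A minimal C sketch of the two word tests discussed so far, working on the raw 10 bytes of a REAL10 as stored in memory (little-endian). The function names are made up for illustration. It shows why the original `inc ax` test only catches the pattern with the sign bit set, while the `or`/`xor` version fires for any value whose 15 exponent bits are all ones, infinities included. An empty register saved with fstp (invalid-operation exception masked) should be stored as the "real indefinite" FFFF C000000000000000h, which both tests catch.

```c
#include <stdint.h>

/* Read the 16-bit sign+exponent word at offset 8 of a little-endian REAL10. */
static uint16_t real10_se_word(const unsigned char r[10])
{
    return (uint16_t)(r[8] | (r[9] << 8));
}

/* Original test (inc ax / jz): true only for FFFFh,
   i.e. sign bit set AND all exponent bits set. */
static int test_inc(const unsigned char r[10])
{
    return (uint16_t)(real10_se_word(r) + 1) == 0;
}

/* Suggested test (or 8000h / xor 0FFFFh / jz): true whenever the
   15 exponent bits are all ones, whatever the sign bit is. */
static int test_or_xor(const unsigned char r[10])
{
    return ((real10_se_word(r) | 0x8000u) ^ 0xFFFFu) == 0;
}
```

With the bytes `00 00 00 00 00 00 00 C0 FF 7F` (a positive quiet NaN) the first test misses and the second one fires; the second one also fires for +/- infinity.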

dedndave · June 23, 2009, 04:42:59 PM

compare the value to itself
NANs are never equal to anything
you just have to do it before you store it into memory

fcom st(0)
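
The same idea expressed in C, as a rough sketch: a value is a NaN exactly when it does not compare equal to itself. The function name is hypothetical, and the volatile copy is only there as a precaution so that an optimizing compiler does not fold the comparison away.

```c
#include <math.h>

/* Returns 1 if x is a NaN, using only a self-comparison. */
static int is_nan_self_compare(double x)
{
    volatile double v = x;   /* keep the compiler from assuming v == v */
    return v != v;           /* unordered compare: true only for NaN */
}
```

Unlike the exponent-word test, this one does not report infinities, since an infinity compares equal to itself.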

jj2007 · June 23, 2009, 08:02:49 PM

Quote from: qWord on June 23, 2009, 04:42:53 PM
iirc, you have to test bits 64-78 (the sign bit, bit 79, is only used to distinguish +/- INFINITY).
my suggestion:

	movzx eax,WORD ptr [esi+8]
	or eax,08000h
	xor eax,0ffffh
	jz @NAN

Thanks, looks convincing and works :U

jj2007 · June 23, 2009, 08:30:32 PM

Quote from: dedndave on June 23, 2009, 04:42:59 PM
compare the value to itself
NANs are never equal to anything
you just have to do it before you store it into memory

fcom st(0)

Sounds interesting, too. My problem is as follows:
- I need three FPU registers for a routine
- I store ST0...ST2 to memory
- that works fine most of the time
- except when they are empty
- then they become BAD NaN's after loading them into the FPU...

Anyproc proc
LOCAL f2sST0:REAL10, f2sST1:REAL10, f2sST2:REAL10
	fstp f2sST0					; save & free 3 registers: 3*(6+6)=36 cycles, in practice 10 per FPU register, 30 cycles
	fstp f2sST1					; No. 2 saved
	fstp f2sST2					; No. 3 saved

	; ... do stuff with FPU ...

	lea esi, f2sST2
	m2m ecx, 3
	.Repeat
		movzx eax, word ptr [esi+8]
		or eax, 08000h
		xor eax, 0ffffh
		.if Zero?
			fld1		; set a harmless value
		.else
			fld REAL10 ptr [esi]
		.endif
		lea esi, [esi+10]
		dec ecx
	.Until Zero?
	ret
Anyproc endp

That prevents NaN in the FPU.

dedndave · June 23, 2009, 08:36:34 PM

i see - well - it is prolly faster than fcom, also
but....

lea esi,[esi+10]

same as ???

add esi,10

or is the lea a faster, shorter way ?
EDIT
oh - btw - to be a NaN, the mantissa must also be non-zero
if the exponent bits are all 1's and the mantissa bits are all 0's, it evaluates to signed infinity
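
A sketch in C of the full distinction, again on the raw 10 bytes of a REAL10. One detail specific to this format: the integer bit (bit 63) is stored explicitly, so an infinity has mantissa 8000000000000000h (integer bit set, fraction zero) rather than a mantissa of all zeros. The enum and function names are invented for the example, and lumping every other all-ones-exponent pattern under "NaN" (including encodings the FPU treats as unsupported) is an assumption made to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum real10_class { R10_FINITE, R10_INFINITY, R10_NAN };

/* Classify a little-endian REAL10 by looking at exponent and mantissa. */
static enum real10_class real10_classify(const unsigned char r[10])
{
    uint64_t mant = 0;
    uint16_t se;
    int i;

    assert(r != NULL);
    for (i = 7; i >= 0; i--)            /* bytes 0..7: 64-bit mantissa */
        mant = (mant << 8) | r[i];
    se = (uint16_t)(r[8] | (r[9] << 8)); /* bytes 8..9: sign + exponent */

    if ((se & 0x7FFFu) != 0x7FFFu)
        return R10_FINITE;               /* exponent not all ones */
    if (mant == 0x8000000000000000ull)
        return R10_INFINITY;             /* integer bit set, fraction zero */
    return R10_NAN;                      /* any other mantissa */
}
```

This costs a 64-bit compare on top of the word test, which is the extra work being weighed against simply treating infinity like a NaN.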

jj2007 · June 23, 2009, 09:06:59 PM

Quote from: dedndave on June 23, 2009, 08:36:34 PM
i see - well - it is prolly faster than fcom, also
but....

lea esi,[esi+10]

same as ???

add esi,10

or is the lea a faster, shorter way ?
EDIT
oh - btw - to be a NaN, the mantissa must also be non-zero
if the exponent bits are all 1's and the mantissa bits are all 0's, it evaluates to signed infinity

> lea esi,[esi+10]
> add esi,10

lea does not modify flags and is marginally faster on most CPUs.

> the mantissa must also be non-zero

testing that would make it a bit slow...

In the meantime, I inserted ffree ST(7) in front of any fld. That helps to prevent NaNs at the source :bg

dedndave · June 23, 2009, 09:13:39 PM

that is what i was thinking - testing conditions prior to creating the NaN's to avoid them may be better altogether
but, if you are treating infinity as though it was a NaN, you are alright
for example, don't take the squareroot of a negative number
or trig functions that are out of valid range - that kind of thing

Jimg · June 23, 2009, 10:53:32 PM

This all sounds suspiciously like the same problem I've been fighting.
You're writing a general purpose routine. You have to return the fpu in exactly the condition the user gave it to you.
You don't want to do an fsave/frstor and pay the 500 cycle penalty, so you try to save just what you use. Unfortunately, besides the registers, you have to save and restore several status words, and it all adds up to about the same. And you have to do an fnclex or you will crash if he already had an exception. If you do the fnclex, you can't return the fpu in the same condition.
My final solution was I only had to do 4 real10 multiplies, so I wrote a real10 fmul emulator. Seems like it would be slow, but it's faster than fsave/frstor.

dedndave · June 24, 2009, 02:28:51 AM

it's funny you say that
i was looking for a simple way to display 64-bit integers as decimal ASCII
i could have used the FPU to do it
but, being a bit rusty at assembler coding, i decided to write a multiple-precision divide loop and make my own routine
i was surprised at how fast my code was
i want to spend some more time on it, as i think i can improve on it even further
i suspect a lot of simpler math can be done without the FPU
of course, if you need exponential or trig functions, it would be hard to improve on the FPU for speed and accuracy, combined

raymond · June 24, 2009, 02:38:08 AM

Quote
And you have to do an fnclex or you will crash if he already had an exception.

Exceptions on the FPU will never crash a program by themselves as long as they are masked, which is the default after finit. They strictly get recorded as such in the Status Word. It is then the programmer's responsibility to check whichever exception flag may be of significance whenever necessary, and possibly provide exception handling code if deemed appropriate.

The above does not deny the fact that a program may later crash when erroneous FPU results due to some exceptions are allowed to continue being used.

Jimg · June 24, 2009, 03:20:21 AM

I stand corrected. But doing an fnclex cured my problem.

jj2007 · June 24, 2009, 08:07:01 AM

Here is the complete sequence I use:

.nolist
include \masm32\include\masm32rt.inc

.code
start:
;	int 3			; uncomment to see what happens in Olly
	push 1234567890
	fild dword ptr [esp]	; 1234567890 in the FPU, NEAR 53
	add esp, 4
	call MyTest		; call the FPU proc that uses ST0...ST2

	push 1111111111
	REPEAT 7
	fild dword ptr [esp]	; 7*1111111111 in the FPU, 1234567890 in ST(7), NEAR 53
	ENDM
	add esp, 4
;	int 3			; another breakpoint for Olly
	call MyTest		; call the proc again, this time with a fully loaded FPU
;	int 3			; another breakpoint for Olly

	print "OK"
	getkey
	exit

MyTest proc
LOCAL f2sST0:REAL10, f2sST1:REAL10, f2sST2:REAL10
LOCAL dummy1:WORD, f2sConWold:WORD, f2sConDy:WORD, f2sConW:WORD, dummy3:WORD, dummy4:dword
	fstcw f2sConWold		; status word could be saved as fstsw ax,
	mov eax, dword ptr f2sConWold	; but no such instruction for Control word
	or ax, 011100000000b		; rounding mode DOWN,
	mov dword ptr f2sConW, eax	; precision max=64
	fldcw f2sConW	; set new control word
	fstp f2sST0	; save & free 3 registers: 3*(6+6)=36 cycles, in practice 10 per FPU register, 30 cycles
	fstp f2sST1	; No. 2 saved
	fstp f2sST2	; No. 3 saved

	push 11111111
	fild dword ptr [esp]	; just for fun
	pop eax
	fstp st

	lea esi, f2sST2
	m2m ecx, 3
	.Repeat
		movzx eax, word ptr [esi+8]
		or eax, 08000h
		xor eax, 0ffffh
		.if Zero?
			fldz		; push zero and
			ffree ST	; free the register
		.else
			fld REAL10 ptr [esi]
		.endif
		lea esi, [esi+10]
		dec ecx
	.Until Zero?
	fldcw f2sConWold	; restore control word ->NEAR 53
	ret
MyTest endp

end start

On return, the FPU is back to NEAR 53, has 1234567890 in ST0 ~~plus two NaNs in ST1 and ST2. In Olly, this looks ugly, but in real life it should not have any negative impact on other code that uses the FPU.~~ If there was valid stuff in the FPU, it is still there.

EDIT: New code above tests for NaN restored from memory, and empties registers if it finds one. This should be foolproof, but testers welcome :bg
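
For readers who want to check what the `or ax, 011100000000b` does to the control word, here is a small C sketch that decodes the two fields involved. The helper names are invented, and the starting value 027Fh (round to nearest, 53-bit precision, what Olly shows as NEAR 53) is an assumption about the usual Windows default. Note that a plain OR only yields round-down when the rounding field was 00 to begin with; the last helper shows a variant that clears the field first.

```c
#include <stdint.h>

/* x87 control word: bits 8-9 = precision control (0=24, 2=53, 3=64 bits),
   bits 10-11 = rounding control (0=nearest, 1=down, 2=up, 3=truncate). */
static unsigned cw_precision(uint16_t cw) { return (cw >> 8) & 3u; }
static unsigned cw_rounding(uint16_t cw)  { return (cw >> 10) & 3u; }

/* Mirror of: or ax, 011100000000b */
static uint16_t cw_after_or(uint16_t cw)
{
    return (uint16_t)(cw | 0x0700u);
}

/* Variant: clear PC and RC first, so the result is always
   round-down with 64-bit precision, whatever the caller had set. */
static uint16_t cw_force_down_64(uint16_t cw)
{
    return (uint16_t)((cw & ~0x0F00u) | 0x0700u);
}
```

Starting from 027Fh the OR gives 077Fh (round down, 64 bits). Starting from a caller that had selected round-up (0A7Fh), the OR gives 0F7Fh, which is truncate rather than down.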

SteveCurtis · June 24, 2009, 08:50:43 AM

Quote from: jj2007 on June 23, 2009, 04:00:04 PM
With esi as a pointer to a REAL10, I try to test for a NaN before restoring a value on the FPU:

		movzx eax, word ptr [esi+8]
		inc ax
		.if Zero?
			fld1		; set a harmless value
		.else
			fld REAL10 ptr [esi]
		.endif

I have a suspicion that this hack does not cover all situations. Is there any better, fast and foolproof way to do that? fxam is horribly slow...

Hi jj2007,

I know this does not represent a 'quick' anything, but when I got interested in doing Digital Signal Processing functions on the x86 processor I began looking for anything about floating point format and limitations of use etc and was staggered to discover how many things we need to be aware of! :eek

This link is titled "What Every Computer Scientist Should Know About Floating-Point Arithmetic" and is worth a read for background anyway, if you haven't seen it.

http://docs.sun.com/source/806-3568/ncg_goldberg.html

Also have a look on www.embedded.com and scout around what 'Jack W. Crenshaw' says. I'm sure he had a few tricks for the NAN problem and the 'denormal' number problem, by hacking/reading the floating point format in memory. Some of this is also treated by Goldberg in detail, who I think is the oracle for floating point stuff anyway.

Goldberg also goes into quite a bit of detail about using 'integer arithmetic' on the floating format to tell if one float equals another float 'or close thereto'. Very fast and reliable.
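
One widely used form of this idea, sketched in C (not necessarily the formulation used in the article): reinterpret the bits of two floats as integers, remap negative values so that integer order matches float order, and compare the difference against a small tolerance measured in units in the last place. The function names and the tolerance parameter are hypothetical, and the code assumes 32-bit IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer ordered like the float value. */
static int32_t float_ordered_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);        /* reinterpret the bits safely */
    if (i < 0)
        i = INT32_MIN - i;           /* sign-magnitude -> two's complement */
    return i;
}

/* Returns 1 if a and b are at most max_ulps representable floats apart. */
static int float_almost_equal(float a, float b, int32_t max_ulps)
{
    int64_t d;
    if (a != a || b != b)
        return 0;                    /* a NaN is never equal to anything */
    d = (int64_t)float_ordered_bits(a) - (int64_t)float_ordered_bits(b);
    if (d < 0)
        d = -d;
    return d <= max_ulps;
}
```

With this mapping +0.0 and -0.0 come out as the same integer, and two neighbouring floats differ by exactly 1.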

Regards,
Steve

SteveCurtis · June 24, 2009, 09:12:15 AM

Quote from: dedndave on June 24, 2009, 02:28:51 AM
it's funny you say that
i was looking for a simple way to display 64-bit integers as decimal ASCII
i could have used the FPU to do it
but, being a bit rusty at assembler coding, i decided to write a multiple-precision divide loop and make my own routine
i was surprised at how fast my code was
i want to spend some more time on it, as i think i can improve on it even further
i suspect a lot of simpler math can be done without the FPU
of course, if you need exponential or trig functions, it would be hard to improve on the FPU for speed and accuracy, combined

Hi Dave,

I got a book by Jack Crenshaw that goes into the development of the trig functions from first principles (Chebyshev coefficient optimisation of Taylor series etc) and allows you to tailor your functions for speed or accuracy (or both!). He shows how to do integer based versions of everything also and that was why I got into it (for 8051 and small end processors, -- doing a square root can be time consuming unless you optimise in small processor!)

Although most C library functions should be optimised for speed, they are not always intrinsically safe for all numbers. He shows how to guard against accidental FPU errors and wrap calls so they will always be safe and to write your own functions if they are not supported in your compiler implementation.

ArcTan, ArcTan2, Tan, Cos, Sine and Modulo arithmetic for floats and integers and 'pi' definitions etc are all covered, plus Simpson's rule for integration, plus matrix operations and a host of other ideas to speed things along and make things as reliable as possible (he hates exceptions as they are slow and says you should guard your code and silently handle any errors, depending on the outcome of course!).

The book is called "Math Toolkit for Real-Time Programming" by Jack W. Crenshaw. I liked it a lot and have haunted the embedded.com website for other high quality information also. There is a heap of stuff about my pet subject (protected mode in embedded applications for example!) :wink

Regards,
Steve
