
Quick test for NaN?

Started by jj2007, June 23, 2009, 04:00:04 PM


jj2007

Thanks, Steve. The FPU is tricky indeed ;-)
The problem here is that if you need, say, 3 registers and you want to save their current content to memory and restore it later, empty registers turn into NaNs when reloaded.

That *may* become a problem if other code pushes 6 values onto the FPU stack without using ffree st(7), because 3 reloaded registers plus 6 pushes exceed the 8 FPU registers.

Unfortunately, testing whether ST(0) is empty (via FSTENV) is pretty slow and awkward. Fortunately, the value of a saved empty register can easily be tested for NaN in memory; see my edited code above.

dedndave

here is the FPU reference i use, from Washington University...
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_14/CH14-1.html

Steve, yes, i have heard of Jack and read a few of his articles
from what i understand, he quit doing them  :(

i suppose a multi-tasking system takes the fun out of the FPU
i don't like sharing - lol

SteveCurtis

Quote from: dedndave on June 24, 2009, 11:47:59 AM

http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_14/CH14-1.html
Steve, yes, i have heard of Jack and read a few of his articles
from what i understand, he quit doing them  :(

Hi Dave,
Thanks mate. Just added it to my favs list for reading on cold winter nights.   :wink
Loved Jack's rambling style. Apparently he worked on the early Apollo missions for NASA. He still had a very 'down to earth' quality about him that I really liked. I picked up his book at Borders, read the first few lines, and promptly bought it.

Saw the Mr Bean vid. Yay, thanks. I'll also pass it on to my 2 girls, who got a real kick out of him when they were younger.

Regards,
Steve


jj2007

Just stumbled upon a strange behaviour of the popular fstp ST instruction:

459     cycles for 100*fdecstp, fincstp
502     cycles for 100*fldz, fincstp
104813  cycles for 100*fdecstp, fstp ST
107337  cycles for 100*fldz, fstp ST


Here are the four timed instruction sequences:
fdecstp
ffree st(0) ; rotate the barrel, free ST
fincstp ; (rotate back for next test below)

fldz
ffree st(0) ; push a zero, free ST
fincstp ; (rotate back for next test below)

fdecstp
ffree st(0) ; rotate the barrel, free ST
fstp ST ; (pop ST for next test below)

fldz
ffree st(0) ; push a zero, free ST
fstp ST ; (pop ST for next test below...)


Source and executable attached. Any explanations?

[attachment deleted by admin]

Jimg

AMD results:
110     cycles for 100*fdecstp, fincstp
193     cycles for 100*fldz, fincstp
196     cycles for 100*fdecstp, fstp ST
142     cycles for 100*fldz, fstp ST

--- ok ---

jj2007

Seems to be an Intel problem. The timings above were for a Prescott P4; these are for an Intel(R) Pentium(R) 4 CPU 2.40GHz (SSE2):

685     cycles for 100*fdecstp, fincstp
487     cycles for 100*fldz, fincstp
85383   cycles for 100*fdecstp, fstp ST
88323   cycles for 100*fldz, fstp ST

dedndave

i could paint my house in that many clock cycles
well, i could get the brush out, at least

qWord

jj,

you are trying to pop an "empty" register. I think this causes the speed issues.

regards qWord

jj2007

Quote from: qWord on June 24, 2009, 05:19:08 PM
jj,

you are trying to pop an "empty" register. I think this causes the speed issues.

regards qWord

Sounds plausible; interesting, though, that AMD can deal with it.

@Dave: You are fast, can you come over to paint my house? I pay 1,000 bucks per hour :bg

dedndave

ok - i didn't say it would look good
however, i will look great doing it - lol
probably just as well - i can present less danger painting a house than writing code, eh?

jj2007

Quote from: dedndave on June 24, 2009, 06:00:45 PM
ok - i didn't say it would look good
however, i will look great doing it - lol
probably just as well - i can present less danger painting a house than writing code, eh?

Hey, I wasn't particularly interested in a beauty contest. But you claimed to be able to do it in around 100,000 cycles. I'd be generous and grant you 1 million cycles, but still, at 2 GHz and 1,000 bucks per hour that makes 1000000/2E9/3600*1000 ≈ 0.000139 US$ :bg

My 2 cts worth for today :thumbu