Hi,
after some FPU calculations, i store all 8 registers in an array of real8. Then i go through lots and lots of macros, pushing nesting to its limits (i think).
Sometimes the registers are 0.0 values, and when i output them to text, they show up properly as "0.0".
However, when i use FCOM on the real8 values (with fstsw and sahf), if one is 0.0, it often (not always) gives an indeterminate result - meaning "jpe" is activated.
Also, when i do FLD MyReal8Array[index] and then FISTP MyInteger, the integer value becomes -2147483648.
If the value is anything other than 0.0 (like 0.2 or -0.545 or whatever) both FCOM and FISTP work properly.
Note: before using FCOM i always do FINIT so that's not it.
I tried reproducing the problem in a separate project, but i couldn't, because then everything worked as expected.
I am starting to think i might somehow be messing up the stored values with all of my code, or that i flipped some flags somewhere by accident.
The code is very complex by now and i can't even keep track of all the macros and conditionals, so it took a lot to just make it work, let alone figure out this mystery easily.
What could it be?
Should i convert more of my macros to procs? Could it be the MASM assembler? I'm clueless.
Thanks.
You want to compare two REAL-values for equality? That's not simple because the numbers can differ in the least significant fraction bits.
One simple solution would be to test for a range: e.g.
if (abs(a-b) <= threshold)
equal
else
not equal
BTW: use FCOMI[P] instead of FCOM
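The range test above can be sketched in MASM32 roughly like this. It is only a sketch: valA, valB, threshold and the label names are made up for illustration, and the tolerance has to be chosen for your own data. FCOMIP needs a P6-class CPU (hence .686) and writes CF/ZF/PF directly, so no fstsw/sahf is required; PF set means the comparison was unordered (a NaN was involved).

```masm
include \masm32\include\masm32rt.inc
.686

.data
valA      REAL8 0.1             ; hypothetical operands
valB      REAL8 0.1000000001
threshold REAL8 1.0e-6          ; hypothetical tolerance

.code
start:
    finit
    fld   threshold             ; st0 = threshold
    fld   valA                  ; st0 = a, st1 = threshold
    fsub  valB                  ; st0 = a - b
    fabs                        ; st0 = |a - b|
    fcomip st, st(1)            ; compare |a - b| with threshold, pop |a - b|
    fstp  st                    ; pop threshold (EFLAGS untouched), stack is empty again
    jp    is_unordered          ; PF=1: a or b was a NaN, check this first
    jbe   is_equal              ; CF=1 or ZF=1: |a - b| <= threshold
    print "not equal",13,10
    jmp   is_done
is_equal:
    print "equal within threshold",13,10
    jmp   is_done
is_unordered:
    print "unordered (NaN operand)",13,10
is_done:
    exit
end start
```

The jp test comes before jbe because an unordered result sets CF and ZF as well, so a NaN would otherwise be reported as "equal".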
1) the FPU registers are real10 - not real8
2) it sounds like the tag word may not be restored properly
however, if you address issue 1, this may go away
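otherwise, you may want to look at the FSAVE/FRSTOR instructions, which save/restore the entire FPU state, including registers (FSTENV/FLDENV cover only the environment: control, status and tag words plus the instruction/data pointers)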
Quote from: dedndave on March 29, 2012, 07:20:55 PM
1) the FPU registers are real10 - not real8
but you can save and load also other formats (double,float,sdword,sword,sqword) :wink
true
but, without seeing his test code, i am taking shots in the dark :P
Quote from: qWord on March 29, 2012, 07:18:29 PM
You want to compare two REAL-values for equality?
BTW: use FCOMI[P] instead of FCOM
- no, i am comparing only for > and <, i am comparing two real8 values in memory by loading one into st0 and using FCOM[P] to compare it to the other.
- I used FCOMI but it really makes no difference. I just do FSTSW and SAHF by myself after FCOM. I also tried using FWAIT but it didn't change anything.
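For reference, a minimal sketch of the FCOM / FSTSW / SAHF sequence described above, with the unordered case tested before the > and < branches. The helper CompareWithB, the variable names and the NaN test value are assumptions made up for this illustration, not code from the project being discussed.

```masm
include \masm32\include\masm32rt.inc

.data
valA  REAL8 0.0
valB  REAL8 10.0
qnan  dd 0, 7FF80000h           ; hypothetical test value: quiet NaN bit pattern, low dword first

.code
; Hypothetical helper: compares the REAL8 at [esi] with valB and prints the outcome.
CompareWithB PROC
    fld   REAL8 PTR [esi]       ; st0 = value under test
    fcomp valB                  ; compare st0 with valB, then pop st0
    fstsw ax                    ; AX = status word
    sahf                        ; C0 -> CF, C2 -> PF, C3 -> ZF
    jp    cmp_unordered         ; PF first: unordered sets C3, C2 and C0 together
    ja    cmp_greater
    jb    cmp_less
    print "equal",13,10
    ret
cmp_greater:
    print "greater",13,10
    ret
cmp_less:
    print "less",13,10
    ret
cmp_unordered:
    print "unordered (NaN operand)",13,10
    ret
CompareWithB ENDP

start:
    finit
    mov   esi, OFFSET valA
    call  CompareWithB          ; 0.0 against 10.0
    mov   esi, OFFSET qnan
    call  CompareWithB          ; NaN against 10.0
    exit
end start
```

If the second call lands on the unordered branch while the debug output shows the value as 0.0, the stored data itself is suspect rather than the comparison.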
Quote from: dedndave on March 29, 2012, 07:20:55 PM
1) the FPU registers are real10 - not real8
2) it sounds like the tag word may not be restored properly
however, if you address issue 1, this may go away
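otherwise, you may want to look at the FSAVE/FRSTOR instructions, which save/restore the entire FPU state, including registers (FSTENV/FLDENV cover only the environment: control, status and tag words plus the instruction/data pointers)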
1) ok this might be the campus but i'm not that much of a noob. :) my code has 5271 lines by now and i know the SimplyFPU document by heart (even some byte opcodes).
2) i thought that finit restores the tag word too, no? i use finit before each command so how can that be it? maybe you just didn't read i mentioned FINIT, or im wrong about it restoring the tag word?
Quote from: dedndave on March 29, 2012, 07:39:18 PM
true
but, without seeing his test code, i am taking shots in the dark :P
I can give you code that works and that has sections that look identical to what i'm doing in my more complex code (only obviously something is different). The full code i really can't post, it's too huge and contains private info sadly.
Anyway, the main question could be now: Does FINIT restore the tag word, and if not, what would be the best way to do it? I couldn't find any documentation on that.
This might be the reason:
Proc1 calls Proc2.
The FPU is initialized in Proc2, gets the values for the registers and returns from the proc (with ret) back to Proc1.
It does nothing else but load zeros into registers and some other non-zero numbers and performs FPU calculations and commands.
When we return to Proc1, within Proc1 it then stores the FPU into my array of real8 (using FSTP 8 times). This is done immediately after Proc2 returns.
Could the push/pop of this proc call (and the ret) somehow mess up the tag word?
Edit: never mind, i just tested storing st(0) into a single variable within the same proc, and when compared to a constant of 10.0, it jumps to "jpe", so still errorish.
braincell,
did you ever use a debugger?
I would suggest you place an INT 3 before the point of interest and then use OllyDbg to step through the code.
Quote from: braincell on March 29, 2012, 08:08:52 PM
i use finit before each command so how can that be it?
we may have found the problem, right there :P
i suggest FINIT once at the beginning of the program
then - try not to stuff more than 8 values into the FPU at any given time
oh - and read the FINIT part of Ray's tutorial one more time :bg
Quote from: qWord on March 29, 2012, 08:28:49 PM
braincell,
did you ever use a debugger?
I would suggest you place an INT 3 before the point of interest and then use OllyDbg to step through the code.
Did i ever? I use Olly every day. I can't get it to work with INT3 because this is a dll called from .Net (managed code) it just hangs. I debug by outputting to a text file. :)
Quote from: dedndave on March 29, 2012, 08:30:26 PM
Quote from: braincell on March 29, 2012, 08:08:52 PM
i use finit before each command so how can that be it?
we may have found the problem, right there :P
i suggest FINIT once at the beginning of the program
then - try not to stuff more than 8 values into the FPU at any given time
oh - and read the FINIT part of Ray's tutorial one more time :bg
I just removed all FINIT from my code except for the first one. I wasn't sure it would work, but obviously i was careful enough with it, so the code worked.
However the problem is still there.
I read it again 1 hour ago. Did i miss something for the 5th time reading it?
Never mind the comparison problem, i can do FTST for zeros and simply scoot around the issue.
The bigger problem is why FISTP returns a -217million value to the integer when it's a 0.0 value.
when you ask these questions, it is hard for us to see what the real (pun) problems are without seeing specific code :P
for the most part, you can assume that the FPU works correctly
so - and i do this, myself - lol - "if it doesn't work, it must be me" :bg
-217 million sounds like a stored real being interpreted (or treated) as a large signed integer
from Ray...
Quote: This instruction initializes the FPU by resetting all the registers and flags to their default values.
Quote from: braincell on March 29, 2012, 08:38:45 PM
I can't get it to work with INT3 because this is a dll called from .Net (managed code) it just hangs. I debug by outputting to a text file. :)
should not be a problem, because VS can show you the x86 disassembly of the program when debugging it.
The odd result can easily be reproduced with this snippet:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://www.masm32.com/board/index.php?topic=12460)
Init
Dim MyR8(9) As REAL8
FpuFill ; pushes 1001....1007 on the FPU
deb 4, "Fpu", ST(0), ST(1), ST(2), ST(3), ST(4), ST(5), ST(6), ST(7), 80000000h
fld MyR8(0)
push eax
fist dword ptr [esp]
pop eax
deb 4, "Result", eax, ST(0)
fstp st
Inkey "ok"
Exit
end start
Fpu
ST(0) 1001.00000000000000
ST(1) 1002.00000000000000
ST(2) 1003.00000000000000
ST(3) 1004.00000000000000
ST(4) 1005.00000000000000
ST(5) 1006.00000000000000
ST(6) 0.0 <<<<<<<<< deb cannot display the last two regs, so no surprise
ST(7) 0.0
80000000h -2147483648 <<<<<<<< the 217 Mio is just the FPU's way to express anger about unfair treatment ;-)
Result
eax -2147483648 <<<<<<<<<<<<<<<<<<<<<<<<
ST(0) 0.0
Most probably, you push values on the FPU in a loop, and after iteration 7 the FPU is full. Which results in 80000000h popped into an integer...
Try to put an ffree ST(7) before each and every fld or fild command.
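A minimal sketch of how the overflow can be made visible, assuming the default (masked) exception settings after FINIT; the variable names sw and result are made up for the example. Loading a ninth value onto a full stack does not halt anything: the FPU writes a NaN ("real indefinite") into st0 and records the event in the status word as IE (bit 0) + SF (bit 6), with C1 (bit 9) set to mark overflow rather than underflow. Storing that NaN with FISTP to a dword is where the 80000000h comes from.

```masm
include \masm32\include\masm32rt.inc

.data
sw     dw 0                     ; hypothetical: copy of the status word
result dd 0                     ; hypothetical: integer written by FISTP

.code
start:
    finit
    REPEAT 9
        fldz                    ; the ninth load hits a register that is still in use
    ENDM
    fstsw ax
    mov   sw, ax                ; keep the status word before any macro clobbers EAX
    fistp result                ; st0 is now a NaN: expected to store 80000000h
    finit                       ; leave the FPU clean again

    mov   ax, sw
    and   ax, 0241h             ; IE (bit 0), SF (bit 6), C1 (bit 9)
    cmp   ax, 0241h
    jne   no_overflow
    print "stack overflow flagged (IE, SF and C1 set)",13,10
no_overflow:
    print str$(result),13,10    ; signed decimal view of what FISTP stored
    exit
end start
```

Checking the status word like this right after a suspicious block is a cheap way to find the FLD that pushes one value too many.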
Quote from: dedndave on March 29, 2012, 08:42:50 PM
when you ask these questions, it is hard for us to see what the real (pun) problems are without seeing specific code :P
for the most part, you can assume that the FPU works correctly
so - and i do this, myself - lol - "if it doesn't work, it must be me" :bg
-217 million sounds like a stored real being interpreted (or treated) as a large signed integer
from Ray...
Quote: This instruction initializes the FPU by resetting all the registers and flags to their default values.
Hehe it's the realest problem i've had so far, that's why im posting. ;)
Of course it's "just me" and not the FPU, it's just a mystery. Yeah -214million (-217 was a mistake) is exactly that. I think it's probably what jj said below.
Quote from: qWord on March 29, 2012, 08:44:46 PM
should not be a problem, because VS can show you the x86 disassembly of the program when debugging it.
Hmm, it hung the last few times i tried it. I will try again soon.
Quote from: jj2007 on March 29, 2012, 08:58:55 PM
Most probably, you push values on the FPU in a loop, and after iteration 7 the FPU is full. Which results in 80000000h popped into an integer...
Try to put an ffree ST(7) before each and every fld or fild command.
Oh wow, now that's a really good answer. I thought that if i did FLD into a full register that i would get a stack overflow, or some other kind of program-halting problem. I had no idea it could be silent, so i didn't even look into it.
I'll check all my FLDs and let you know how that goes. Thanks!
I think i found the problem.
I am creating an array of bytes as FPU opcodes which i insert at a label within a VirtualProtected proc.
When i allow only FADD opcodes to be inserted, even when the value is 0.0, the error is not there.
Obviously, some of the more complex opcodes i am inserting are messing up the FPU. The question is now to find out which ones those are.
I can't really "see" the code as it's only passed as an array of bytes, randomly generated (but within logical constraints to not break anything), but maybe Olly can help.
I'll try to get Olly running with INT3 again, so thanks for that suggestion.
Thanks jj for suggesting your idea too, obviously, that solved it for me.
I'm glad i keep adding features to my code, otherwise i would miss bugs like this.
Thanks everyone!
Quote from: braincell on March 29, 2012, 09:23:27 PM
I can't really "see" the code as it's only passed as an array of bytes, randomly generated
:bg
when you're done, you can write a book...
Learning Assembly Language, The Hard Way
Err yea :8)
For what it's worth, the problem ended up being this: my array of Double (real8) included NaNs, and when they reached my MASM dll and were used in some computations, the debug output only showed them as 0.0, so i didn't notice.
Any computation which included a NaN ended up being well ... just that.
I guess i expected exceptions where there aren't any.
This is a "well duh" moment, but yeah, i learn a lot. The hard way.
BTW, what's the best way to check if a number on the FPU (REAL10) or in memory (REAL8) is NaN?
What's better of these two:
Method #1
fld MyReal8
fxtract
fstp st(0)
fistp dbg_test_Xponent
mov ax, dbg_test_Xponent
cmp ax, 32768
jne dbg_Not_NaN
deb 5, " exponent detected NaN value ..."
dbg_Not_NaN:
Method #2
fld MyReal8
fcomp SomeValidConstant
fstsw ax
sahf
jpe dbg_IsNaN
jmp dbg_IsNotNaN
dbg_IsNaN:
;NaN detection code
dbg_IsNotNaN:
;no NaN detected...
the status word is the easiest way
there are a few value formats that constitute NaN's, so it would probably take more code to check all of them
if all the exponent bits are 1's and one or more of the mantissa bits is 1, it is a NaN
you'd have to time it, but i suspect the status word is faster
push eax
fstsw word ptr [esp]
pop eax ;AX = status word
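The bit-pattern rule above (all exponent bits 1, at least one mantissa bit 1) can be checked on a REAL8 in memory with plain integer instructions, and FXAM answers the same question for a value already in st0. This is only a sketch: IsNaN8, zeroVal and nanVal are names invented for the example.

```masm
include \masm32\include\masm32rt.inc

IsNaN8 PROTO :DWORD

.data
zeroVal REAL8 0.0
nanVal  dd 0, 7FF80000h         ; hypothetical test data: quiet NaN bit pattern, low dword first

.code
; Hypothetical helper: returns EAX = 1 if the REAL8 at pReal8 is a NaN, else 0.
IsNaN8 PROC pReal8:DWORD
    mov   edx, pReal8
    mov   eax, [edx+4]          ; high dword: sign, 11 exponent bits, top 20 mantissa bits
    and   eax, 7FFFFFFFh        ; ignore the sign bit
    cmp   eax, 7FF00000h
    ja    nan_yes               ; exponent all ones and upper mantissa bits non-zero
    jb    nan_no                ; exponent not all ones: ordinary finite value
    cmp   DWORD PTR [edx], 0    ; exponent all ones, upper mantissa zero: look at the low dword
    jne   nan_yes               ; (all mantissa bits zero would be an infinity)
nan_no:
    xor   eax, eax
    ret
nan_yes:
    mov   eax, 1
    ret
IsNaN8 ENDP

start:
    invoke IsNaN8, ADDR zeroVal
    print str$(eax),13,10       ; result for 0.0
    invoke IsNaN8, ADDR nanVal
    print str$(eax),13,10       ; result for the NaN pattern

    ; Same question for a value on the FPU: FXAM classifies st0 into C3/C2/C0.
    finit
    fld   REAL8 PTR nanVal
    fxam
    fstsw ax
    fstp  st                    ; keep the stack balanced
    and   ax, 4500h             ; keep C3 (bit 14), C2 (bit 10), C0 (bit 8)
    cmp   ax, 0100h             ; C3=0, C2=0, C0=1 is the NaN class
    jne   fxam_done
    print "FXAM: st0 was a NaN",13,10
fxam_done:
    exit
end start
```

The integer test never touches the FPU, so it cannot raise or hide an invalid-operation flag, which makes it handy for validating an incoming array before any computation starts.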
as far as "strange" code goes, i am a big fan - lol
a while back, Jochen and i had a personal contest to see who could build a table in the fewest bytes of code
i think i won by a few bytes - look at the Gen10 and Cor10 PROC's....
Hmm ok i'll have to test it a bit. Right now only Sqrt() of negative numbers and 0 division is giving me problems. I can throw out those operators completely as they don't contribute much to my solutions, and i keep the speed AND integrity. Nifty.
I had a look at the code.
Are you that guy that coded the program for a Pacemaker? 4k of memory due to low power, using 1 bit of memory for multiple purposes, etc? Yeah, crazy.
I'd probably need pencil and paper to draw out what your code does, but it looks clever. Maybe when i go on holiday that can be my hobby. :)
nah - that wasn't me
i am going to be famous for winning half a billion in the mega lottery :U
then i am gonna change my name, so i won't be famous again
Quote from: braincell on March 29, 2012, 09:07:24 PM
I thought that if i did FLD into a full register that i would get a stack overflow, or some other kind of program-halting problem.
The FPU normally handles exceptions internally, but you can enable the FPU to generate external interrupts for specific exceptions, and so trigger an exception that a debugger can catch.
;==============================================================================
include \masm32\include\masm32rt.inc
.686
;==============================================================================
;-----------------------------------------
; Values for the FPU interrupt mask bits.
;-----------------------------------------
FIM_INVALID equ 1
FIM_DENORMALIZED equ 2
FIM_ZERODIVIDE equ 4
FIM_OVERFLOW equ 8
FIM_UNDERFLOW equ 16
FIM_PRECISION equ 32
FIM_ANY equ 63
;--------------------------------------------------------
; This macro selectively clears the FPU interrupt masks.
; Example usage: FCLEARIM FIM_ZERODIVIDE or FIM_OVERFLOW
;--------------------------------------------------------
FCLEARIM MACRO maskbits
push eax
fstcw [esp]
pop eax
and ax, NOT (maskbits)
push eax
fldcw [esp]
pop eax
ENDM
;==============================================================================
.data
r8 REAL8 ?
r10max REAL10 1.18E4932 ; Approximate
r10min REAL10 3.37E-4932 ; Approximate
.code
;==============================================================================
start:
;==============================================================================
finit
fldpi
fldz
fdiv
fstp r8
inkey
;FCLEARIM FIM_ZERODIVIDE
;fldpi
;fldz
;fdiv
;fstp r8
;FCLEARIM FIM_INVALID
;REPEAT 9
; fldpi
;ENDM
;fstp r8
;FCLEARIM FIM_OVERFLOW
;fld r10max
;fld r10max
;fmul
;fstp r8
FCLEARIM FIM_UNDERFLOW
fld r10min
fld r10max
fdiv
fstp r8
inkey
exit
;==============================================================================
end start
#define STATUS_FLOAT_DENORMAL_OPERAND ((NTSTATUS)0xC000008DL)
#define STATUS_FLOAT_DIVIDE_BY_ZERO ((NTSTATUS)0xC000008EL)
#define STATUS_FLOAT_INEXACT_RESULT ((NTSTATUS)0xC000008FL)
#define STATUS_FLOAT_INVALID_OPERATION ((NTSTATUS)0xC0000090L)
#define STATUS_FLOAT_OVERFLOW ((NTSTATUS)0xC0000091L)
#define STATUS_FLOAT_STACK_CHECK ((NTSTATUS)0xC0000092L)
#define STATUS_FLOAT_UNDERFLOW ((NTSTATUS)0xC0000093L)
Quote from: MichaelW on March 30, 2012, 12:54:35 AM
The FPU normally handles exceptions internally, but you can enable the FPU to generate external interrupts for specific exceptions, and so trigger an exception that a debugger can catch.
Ah! So that's how the pros do it. Ok, cheers, code saved for future use.
I would suggest that you read at least once more the description of the "Status word" in Chap. 1 of Simply FPU.
Whenever there may be a risk of an invalid instruction in my programs, I simply add the following code before continuing to use a result from a computation:
fstsw ax
test al,1
jz everything_valid
fclex ;rezero the exception flags, once set, they don't get reset otherwise
;include code or set flags to cope with an invalid operation
May I suggest going through Raymond's Simply FPU thoroughly...as well as Intel's doccies :wink
Ray can only do so much :P
generating random code to see what happens is a bit "hit-and-miss" - lol
better to try each instruction, individually - play with it until you are comfortable, and move on to the next one
for all its complexities, i always thought the FPU was pretty simple to learn
a little trickier back in the old days - the 8087 had a few more buttons to mash
things like "affine" or "projective" infinity - huh ? :eek
maybe Ray could add an 8087 addendum to his tutorials (lest it be forgotten in the annals of time)
should be required learning for math majors in college :bg