Comparing real variables: fcmp

jj2007 · December 28, 2009, 09:51:59 PM

A quick way to compare real variables:

Quote
fcmp MyReal10, MyOtherReal8
.if Sign?
print "lower"
.elseif Zero?
print "equal"
.else
print "higher"
.endif

The first parameter must be a real variable.
The second parameter can be blank (compare against zero), a real variable, an immediate integer, or a reg32.

Code Select

include \masm32\include\masm32rt.inc

fcmp MACRO cmp1:REQ, cmp2	; floatcmp
LOCAL oa
  ffree st(7)
  ifb <cmp2>
	fldz				; no second arg; compare against zero
  else
	oa = (opattr cmp2) AND 127
	if (oa eq 36) or (oa eq 48)
		push cmp2
		fild dword ptr [esp]	; integer or reg32 on stack, then on FPU
		pop eax
	else
		fld cmp2		; real on FPU
	endif
  endif
  ffree st(7)
  fld cmp1
  call fcmpP
ENDM
.code
fcmpP_proc:
fcmpP proc
	xor edx, edx	; default retval: equal
	fcompp		; pop twice
	fstsw ax	; move FPU flags C1 etc to ax
	test ah, 64	; C3 is set if ST(0) equals ST(1) (bt eax, 14)
	jne @F		; equal, edx=0
	test ah, 1	; C0 (bt eax, 8)
	je fcPos
	dec edx		; negative
	ret
@@:	dec edx		; or zero (0-1+1=0)
fcPos:	inc edx		; positive
	ret
fcmpP endp
fcmpP_endp:

; --------------------- some test variables: ---------------------
f4	REAL4	4.4
f8	REAL8	8.8
f10	REAL10	0.0

start:
	print chr$("expected: higher ", 9)
C1_start:
	fcmp f4		; only one argument: compare against zero
C1_end:
	.if Sign?
		print "lower"
	.elseif Zero?
		print "equal"
	.else
		print "higher"
	.endif

	print chr$(13, 10, "expected: lower  ", 9)
C2_start:
	fcmp f4, f8			; compare a real4 against a real8
C2_end:
	.if Sign?
		print "lower"
	.elseif Zero?
		print "equal"
	.else
		print "higher"
	.endif

	print chr$(13, 10, "expected: higher  ", 9)
	mov eax, -1
C3_start:
	fcmp f10, eax		; compare a real10 against eax
C3_end:
	.if Sign?
		print "lower"
	.elseif Zero?
		print "equal"
	.else
		print "higher"
	.endif

	print chr$(13, 10, "expected: equal  ", 9)
C4_start:
	fcmp f10		; compare a real10 against zero
C4_end:
	.if Sign?
		print "lower"
	.elseif Zero?
		print "equal"
	.else
		print "higher"
	.endif

	print chr$(13, 10, 10, "Code sizes:", 13, 10, "proc: ", 9)
	mov eax, fcmpP_endp
	sub eax, fcmpP_proc
	print str$(eax), 13, 10, "call 1:", 9
	mov eax, C1_end
	sub eax, C1_start
	print str$(eax), 13, 10, "call 2:", 9
	mov eax, C2_end
	sub eax, C2_start
	print str$(eax), 13, 10, "call 3:", 9
	mov eax, C3_end
	sub eax, C3_start
	print str$(eax), 13, 10, "call 4:", 9
	mov eax, C4_end
	sub eax, C4_start
	print str$(eax)
	getkey
	exit
end start

(the fcmpP_proc and fcmpP_endp labels are just for getting the code size)
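
For readers who want to check the result convention outside an assembler, here is a minimal C sketch of the same three-way outcome (lower, equal, higher). The name fcmp3 is hypothetical and not part of the MASM code above; the sketch does not treat NaNs specially.

```c
/* Hypothetical C counterpart of the fcmp result convention:
   returns -1 if a < b, 0 if a == b, +1 if a > b.
   Assumption: NaN operands are not handled specially here. */
int fcmp3(long double a, long double b)
{
    /* Each relational test yields 0 or 1, so the difference is -1, 0 or +1. */
    return (a > b) - (a < b);
}
```

Mixed operand types (float, double, int) are promoted to long double by the call, much as the macro loads everything onto the FPU stack before comparing.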

dedndave · December 28, 2009, 10:34:09 PM

great idea JJ
and you don't have to worry about saving the FPU state/stack :U

here it is - i thought you may find this interesting...
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

jj2007 · December 28, 2009, 11:55:14 PM

Quote from: dedndave on December 28, 2009, 10:34:09 PM
great idea JJ
and you don't have to worry about saving the FPU state/stack :U

here it is - i thought you may find this interesting...
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

"AlmostEqual" looks suspiciously close to my old Parser routine. Check attachment for RealComp ;-)

Code Select

G1=12.34, G2=43.21 (both global REAL10)
f1=11.11, f2=22.22 (both local REAL10)

Calculating 68.16122-G2*(f1+3*1.8**2.2)+G2*f1 :
Expected value is       -404.2335
Calculated value is     -404.2335
-- Second operand is smaller

Calculating 4-G1*(f2-3.4)+G2*f1 :
Expected value is       251.8243
Calculated value is     251.8243
** Operands are equal or almost equal at 8 digits precision
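
One common reading of "equal at 8 digits precision" is a relative tolerance test. The C sketch below shows that idea only; it is not the RealComp routine from the attachment, and the name almost_equal and the rel_tol parameter are assumptions made for illustration.

```c
/* Hypothetical relative-tolerance test, not the thread's RealComp routine.
   Returns 1 if a and b differ by at most rel_tol times the larger magnitude. */
static double abs_d(double x)
{
    return x < 0 ? -x : x;
}

int almost_equal(double a, double b, double rel_tol)
{
    double diff  = abs_d(a - b);
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return diff <= rel_tol * scale;   /* also true when both are exactly zero */
}
```

With rel_tol = 1e-8 the two values agree in roughly their first eight significant digits. A purely relative test becomes very strict near zero, so some applications add an absolute tolerance as well.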

GregL · December 29, 2009, 03:30:56 AM

Here are some macros I wrote a while back for comparing real numbers, with help from the example code in Raymond's 'Simply FPU'.

Note to jj: I wasn't concerned about speed, just functionality. :bg

Code Select


; ====================================
isgreater MACRO r1:REQ, r2:REQ
    LOCAL error, true, false, clear
    finit
    fld r2
    fld r1
    fcom
    fstsw ax
    fwait
    sahf
    jpe   error
    ja    true
    jbe   false
  error:
    mov eax, -1
    jmp clear
  true:
    mov eax, 1
    jmp clear
  false:
    xor eax, eax
  clear:
    fstp st(0)
    fstp st(0)
    EXITM <eax>
ENDM
; ====================================
isgreaterequal MACRO r1:REQ, r2:REQ
    LOCAL error, true, false, clear
    finit
    fld r2
    fld r1
    fcom
    fstsw ax
    fwait
    sahf
    jpe   error
    jae   true
    jb    false
  error:
    mov eax, -1
    jmp clear
  true:
    mov eax, 1
    jmp clear
  false:
    xor eax, eax
  clear:
    fstp st(0)
    fstp st(0)
    EXITM <eax>
ENDM
; ====================================
isless MACRO r1:REQ, r2:REQ
    LOCAL error, true, false, clear
    finit
    fld r2
    fld r1
    fcom
    fstsw ax
    fwait
    sahf
    jpe   error
    jae   false
    jb    true
  error:
    mov eax, -1
    jmp clear
  true:
    mov eax, 1
    jmp clear
  false:
    xor eax, eax
  clear:
    fstp st(0)
    fstp st(0)
    EXITM <eax>
ENDM
; ====================================
islessequal MACRO r1:REQ, r2:REQ
    LOCAL error, true, false, clear
    finit
    fld r2
    fld r1
    fcom
    fstsw ax
    fwait
    sahf
    jpe   error
    ja    false
    jbe   true
  error:
    mov eax, -1
    jmp clear
  true:
    mov eax, 1
    jmp clear
  false:
    xor eax, eax
  clear:
    fstp st(0)
    fstp st(0)
    EXITM <eax>
ENDM
; ====================================
isnotequal MACRO r1:REQ, r2:REQ
    LOCAL error, true, false, clear
    finit
    fld r2
    fld r1
    fcom
    fstsw ax
    fwait
    sahf
    jpe   error
    ja    true
    jb    true
    jz    false
  error:
    mov eax, -1
    jmp clear
  true:
    mov eax, 1
    jmp clear
  false:
    xor eax, eax
  clear:
    fstp st(0)
    fstp st(0)
    EXITM <eax>
ENDM
; ====================================
isequal MACRO r1:REQ, r2:REQ
    LOCAL error, true, false, clear
    finit
    fld r2
    fld r1
    fcom
    fstsw ax
    fwait
    sahf
    jpe   error
    ja    false
    jb    false
    jz    true
  error:
    mov eax, -1
    jmp clear
  true:
    mov eax, 1
    jmp clear
  false:
    xor eax, eax
  clear:
    fstp st(0)
    fstp st(0)
    EXITM <eax>
ENDM
; ====================================
islessgreater MACRO r1:REQ, r2:REQ
    EXITM <isnotequal(r1, r2)>
ENDM
; ====================================
isapproxequal MACRO r1:REQ, r2:REQ, tolerance:REQ
    LOCAL diff
    .DATA?
        diff REAL10 ?
    .CODE
    finit
    .IF isgreater(r1, r2)
        fld   r2
        fld   r1
    .ELSE
        fld   r1
        fld   r2
    .ENDIF
    fsub
    fstp  diff
    fwait
    EXITM <islessequal(diff, tolerance)>
ENDM
; ====================================

jj2007 · December 29, 2009, 07:54:14 AM

Quote from: Greg Lyon on December 29, 2009, 03:30:56 AM
Here are some macros I wrote a while back for comparing real numbers, with help from the example code in Raymond's 'Simply FPU'.

Greg, looks nice, but finit trashes the whole FPU content. Especially in a context of comparing real variables, one should assume that the user is working with the FPU, and needs its content. My macro leaves all FPU settings intact and trashes only the top registers, st 6+7 - it is extremely unlikely that a user uses all 8 registers simultaneously when calling the fcmp macro, so I guess that "loss" can be tolerated.

Re functionality, imho one macro is enough - there is also only one cmp eax, nnn for integers. The right jumps can be set e.g. like this:

Code Select

	fcmp f4, f8				; compare a real4 against a real8
	.if Sign? && !Zero?
		print "lower"
	.elseif !Sign? && !Zero?
		print "higher"
	.elseif Zero?
		print "equal"
	.elseif Sign? || Zero?
		print "lower or equal"
	.elseif Zero? || !Sign?
		print "higher or equal"
	.endif

Below is a modified version that preserves edx (but not eax). At 23 bytes for the proc, and 17...21 bytes for calling the macro, it is still fairly contained in size.

Quote
fcmp MACRO cmp1:REQ, cmp2   ; -------- floatcmp --------
LOCAL oa
ffree st(7)
ifb <cmp2>
   fldz                        ; no second arg; compare against zero
else
   oa = (opattr cmp2) AND 127
   if (oa eq 36) or (oa eq 48)
      push cmp2
      fild dword ptr [esp]      ; integer or reg32 on stack, then on FPU
      pop eax
   else
      fld cmp2               ; real on FPU
   endif
endif
ffree st(7)
fld cmp1
call fcmpP
ENDM
.code   ; --------- end of macro --------

fcmpP proc
   push edx
   xor edx, edx   ; clear the flag register
   fcompp      ; compare ST(0) with ST(1) and pop twice
   fstsw ax      ; move FPU flags C1 etc to ax
   test ah, 64   ; C3 is set if ST(0) equals ST(1) (bt eax, 14)
   jne @F      ; equal, edx=0
   test ah, 1      ; C0 (bt eax, 8)
   je fcPos
   dec edx      ; negative (-2+1=-1)
@@:   dec edx      ; or zero (0-1+1=0)
fcPos:   inc edx      ; positive
   pop edx
   ret
fcmpP endp

jj2007 · December 29, 2009, 09:51:43 AM

Coming back to Dave's suggestion to look at the "almost equal" question: Here is a testbed for one approach that consists of saving the two operands to memory as REAL4. The limitation of this approach is that the exponent cannot exceed the REAL4 range.

Code Select

include \masm32\include\masm32rt.inc

.data
aeL1	REAL10	3.9999999
aeH1	REAL10	4.0
aeL2	REAL10	4.0
aeH2	REAL10	4.0000001
aeL3	REAL10	-4.0000001
aeH3	REAL10	-4.0
aeL4	REAL10	-4.000001e-39	; will falsely report equal for exponents above 37 or below -39
aeH4	REAL10	-4.0e-39
aeL5	REAL10	-4.000001		; this one not equal
aeH5	REAL10	-4.0

.code
start:
	ct = 0
	REPEAT 5
		ct = ct +1
		@CatStr(<fld aeL>, %ct)
		@CatStr(<fld aeH>, %ct)
		sub esp, 8		; allocate two REAL4 slots
		fstp REAL4 ptr [esp]
		pop eax
		fstp REAL4 ptr [esp]
		pop edx
		sub edx, eax
		.if edx
			print "NOT equal", 13, 10
		.else
			print "equal", 13, 10
		.endif
	ENDM
	getkey
	exit
end start
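
The same idea can be tried in C: narrow both operands to single precision and compare the stored bit patterns. This is a sketch under the assumption of IEEE 754 single and double formats; it starts from double rather than REAL10, and the name equal_as_float is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a and b round to the same 32-bit float.
   Assumption: float is IEEE 754 binary32. */
int equal_as_float(double a, double b)
{
    float fa = (float)a;            /* narrowing store, like fstp REAL4 ptr [esp] */
    float fb = (float)b;
    uint32_t ia, ib;
    memcpy(&ia, &fa, sizeof ia);    /* reinterpret the bits without aliasing issues */
    memcpy(&ib, &fb, sizeof ib);
    return ia == ib;
}
```

As in the testbed, 3.9999999 and 4.0 compare equal while -4.000001 and -4.0 do not. A bit-pattern test also treats +0.0 and -0.0 as different values.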

GregL · December 29, 2009, 07:30:07 PM

Quote from: jj2007
Greg, looks nice, but finit trashes the whole FPU content. Especially in a context of comparing real variables, one should assume that the user is working with the FPU, and needs its content.

I knew you were going to say something like that. Usually when I'm working with the FPU I do some calculations and save the result to a memory variable, then do some more calculations and save that result to a memory variable. By the time I do the compare, the FPU calculations are done. I rarely, if ever, need to preserve the FPU contents. I like to use finit to be sure I'm getting the precision I want, because API calls can change the precision.

Quote from: jj2007
Re functionality, imho one macro is enough - there is also only one cmp eax, nnn for integers.

I disagree, if I'm testing for a specific thing, like "isgreater", that's the macro I am going to want to use.

With these macros it doesn't matter what the data type is, they handle REAL4, REAL8 or REAL10.

I'm sick and tired of posting code here only to have it torn apart, torn down and criticized. I should have learned by now. I know these macros work because I have used them and they do the job for me. If you don't like them, don't use them.

jj2007 · December 29, 2009, 10:12:57 PM

Quote from: Greg Lyon on December 29, 2009, 07:30:07 PM
I'm sick and tired of posting code here only to have it torn apart, torn down and criticized.

Sorry Greg if I stepped on your toes. My apologies. And thanks for reminding me of the sahf instruction, it is really handy.

I have modified the fcmp macro, and stumbled over a problem with the error detection:

Code Select

;  Typical call:
;  ffree st(7)
;  fldz				; no second arg; compare against zero
;  ffree st(7)
;  fld MyReal4
;  call fcmpP
fcmpP proc
	push edx
	xor edx, edx	; clear the flag register
	fucompp		; compare ST(0) with ST(1) and pop twice
	fstsw ax	; move FPU flags C1 etc to ax
	sahf	; translate flags
	jpo @F		; jpo=no error
	dec edx		; produce -2 as error flag
@@:	ja fcPos
	je @F
	dec edx		; negative (-2+1=-1)
@@:	dec edx		; or zero (0-1+1=0)
fcPos:	inc edx		; positive
	xchg edx, [esp]	; save edx, use eax as retval
	pop eax
	ret
fcmpP endp

This works fine and is reasonably short (once 24 bytes for the proc, plus 17...21 bytes per call). However, I can't convince the FPU to set the error flag, as described in FPU Chapter 7, fcom.

I tried the following to provoke an error:

Code Select

.data
f10Err	REAL10	1.0e99	; will be written to f4Err
f4Err	REAL4	0.0	; will receive a bad number, exponent too high for a REAL4
.code
	fld f10Err
	fstp f4Err			; for error testing
	finit
	fclex
	; int 3				; start Olly here
	fcmp f4Err		; only one argument: compare against zero
	cmp eax, -2
	je FatError			; there should be an error!!!

Tracing this with Olly reveals that ST0 does get a BAD number, but the C1...C3 flags are not being set. Can somebody tell me where I am wrong?? Full code attached.

qWord · December 29, 2009, 10:37:26 PM

Quote from: jj2007 on December 29, 2009, 10:12:57 PM
Tracing this with Olly reveals that ST0 does get a BAD number, but the C1...C3 flags are not being set.

It is the OE-Flag (Overflow Exception) that indicates too large values.

BTW: For my own macros, I'm using fcomi/fcomip - both AMD and Intel suggest using them for comparing FPU values. These instructions set rFLAGS directly.

qWord

jj2007 · December 29, 2009, 10:58:05 PM

Quote from: qWord on December 29, 2009, 10:37:26 PM
Quote from: jj2007 on December 29, 2009, 10:12:57 PM
Tracing this with Olly reveals that ST0 does get a BAD number, but the C1...C3 flags are not being set.
It is the OE-Flag (Overflow Exception) that indicates too large values.

Thanks, qWord. But how would one detect it? Do you have a practical example? Simply FPU states that the C2 flag should be used, but I can't get it to work.

Quote
BTW: For my own macros, I'm using fcomi/fcomip - both AMD and Intel suggest using them for comparing FPU values. These instructions set rFLAGS directly.

Yes, faster and shorter, but it seems to require a P6. Some members here still use a P3. On the other hand, I also went for SSE2 code in my own library. Difficult to say where one should draw the line...

qWord · December 29, 2009, 11:08:24 PM

Quote from: jj2007 on December 29, 2009, 10:58:05 PM
Do you have a practical example?

What about this:

Code Select


    .data
        f10Err  REAL10  1.0e99
        f4Err   REAL4   0.0
    .code
    
    fld f10Err
    fstp f4Err
    fstsw ax
    .if ax&01000y
        fn MessageBox,0,"value too large",0,0
    .else
        fn MessageBox,0,"OK","OK",0
    .endif

EDIT:
Another method is to test (after the store) for the value +INFINITY = 07f800000h.

Code Select

    fld f10Err
    fstp f4Err
    .if f4Err == 07f800000h
        fn MessageBox,0,"value too large",0,0
    .else
        fn MessageBox,0,"OK","OK",0
    .endif
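
The second method (checking the stored REAL4 for the +INFINITY bit pattern) can be sketched in C as follows. The function name is hypothetical, and the sketch assumes IEEE 754 behaviour with round-to-nearest, where an out-of-range narrowing conversion produces an infinity; the C standard alone does not guarantee that.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if narrowing x to a 32-bit float
   yields +INFINITY (bit pattern 0x7F800000).
   Assumption: IEEE 754 arithmetic, round-to-nearest. */
int narrows_to_pos_inf(double x)
{
    float f = (float)x;             /* like fld f10Err / fstp f4Err */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits == 0x7F800000u;
}
```

Note that this test only catches positive overflow. A large negative value would store as -INFINITY (0FF800000h) and needs its own check.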

jj2007 · December 30, 2009, 12:04:39 AM

Hmmpfffff....! Your example works, but now i am thoroughly confused. Does fstsw report past overflow??
After some testing, it seems the answer is yes. The O flag is set by fstp, and remains set until cleared by fclex.

Code Select

    fld f10Err  <<< O flag still clear
    fstp f4Err  <<< after this instruction, the FPU is empty but the O and P flags are set
    fstsw ax

There is a good description of the status register here (by Randy Hyde); except that I cannot confirm "Bit seven of the status register is set if any error condition bit is set". Although bits 3 and 5 are set, bit 7 remains clear.
One more unclear bit is why Simply FPU recommends "jpe error_handler ;the comparison was indeterminate" after sahf - this does not seem to work.

Anyway, the corrected code is now as follows (bloated to 31 bytes :8)):

Code Select

;  Typical call:
;  ffree st(7)
;  fldz			; no second arg; compare against zero
;  ffree st(7)
;  fld MyReal4
;  call fcmpP
fcmpP proc
	push edx
	xor edx, edx	; clear the flag register
	fucompp		; compare ST(0) with ST(1) and pop twice
	fstsw ax	; move FPU flags C1 etc to ax
	test al, 8+32	; test if overflow or precision flags are set
	je @F		; neither flag set: skip the error path
	sub edx, 127	; produce error code
	fclex		; clear exceptions
@@:	sahf
	ja fcPos
	je @F
	dec edx		; negative (-2+1=-1)
@@:	dec edx		; or zero (0-1+1=0)
fcPos:	inc edx		; positive
	xchg edx, [esp]	; save edx, use eax as retval
	pop eax
	ret
fcmpP endp

qWord · December 30, 2009, 02:03:52 AM

Quote from: jj2007 on December 30, 2009, 12:04:39 AM
Hmmpfffff....! Your example works, but now i am thoroughly confused. Does fstsw report past overflow??
After some testing, it seems the answer is yes. The O flag is set by fstp, and remains set until cleared by fclex.

fstsw copies the whole status word. The exception flags are in the low byte.

Quote from: jj2007 on December 30, 2009, 12:04:39 AM
There is a good description of the status register here (by Randy Hyde); except that I cannot confirm "Bit seven of the status register is set if any error condition bit is set". Although bits 3 and 5 are set, bit 7 remains clear.

A quote from AMD's developer manuals:

Quote
Exception Status (ES). Bit 7. The processor calculates the value of this bit at each instruction boundary and sets the bit to 1 when one or more unmasked floating-point exceptions occur...

By default, OE is masked (in the control register), so bit 7 will not be set.

Quote from: jj2007 on December 30, 2009, 12:04:39 AM
jpe error_handler ;the comparison was indeterminate after sahf - this does not seem to work.

This only applies if you are comparing values: if the operands (one or both, depending on the instruction) are not comparable (e.g. NaNs), the C2 flag is set. C2 becomes the parity flag when AH is copied into rFLAGS with sahf.

Here are two good references (primary literature :green2) for FPU stuff (IMO):
AMD64 Architecture Programmer's Manual Volume 1: Application Programming (chapter 6)
AMD64 Architecture Programmer's Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions
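
The "not comparable" case can be reproduced in C with the standard isunordered macro. The sketch below borrows the -2 error code from the fcmpP proc earlier in the thread; the function name fcmp_checked is hypothetical.

```c
#include <math.h>

/* Hypothetical checked comparison:
   -2 if the operands are unordered (at least one NaN),
   -1 if a < b, 0 if a == b, +1 if a > b. */
int fcmp_checked(double a, double b)
{
    if (isunordered(a, b))
        return -2;      /* the case an FPU compare reports through C2 */
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}
```

An overflowed store that produced an infinity is still ordered, so it does not take the -2 path. That is consistent with the observation above that the parity test only fires for NaN-like operands.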

MichaelW · December 30, 2009, 07:58:27 AM

Quote
but it seems to require a P6. Some members here still use a P3

fcomi and fcomip work fine on a Pentium III. P6 normally refers to the sixth generation x86 processors that started with the Pentium Pro.

http://en.wikipedia.org/wiki/Intel_P6

Also, for EQ, GT, LT, etc you should be able to compare the values as integers and interpret the flags just as you would for integers.
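
A minimal C sketch of the integer-compare idea, assuming IEEE 754 single precision and no NaN operands. Raw bit patterns of negative floats sort in reverse as signed integers, so this sketch first maps them to an order-preserving key; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a float's bits to a signed integer whose
   ordering matches the float ordering. Assumption: IEEE 754 binary32,
   NaNs excluded by the caller. */
int32_t float_order_key(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    if (i < 0)
        i = INT32_MIN - i;  /* sign bit set: key becomes minus the magnitude bits */
    return i;
}

/* Returns -1, 0 or +1, using integer comparisons only. */
int compare_as_int(float a, float b)
{
    int32_t ka = float_order_key(a);
    int32_t kb = float_order_key(b);
    return (ka > kb) - (ka < kb);
}
```

With this mapping +0.0 and -0.0 get the same key and compare equal. For operands known to be non-negative, the raw bit patterns can be compared directly without the remapping step.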

dedndave · December 30, 2009, 10:19:14 AM

i think if you weed out special values like NaN's, infinities, negative 0's, etc, that is correct
but comparing reals depends on the application
when you have a case where an epsilon is applicable, everything becomes difficult (most of the time, i guess)
the point is, no one solution will cover all the cases
i wonder why we haven't heard from Ray ? - lol
