MasmLib FloatToStr does not work for me

Started by jj2007, August 18, 2008, 09:24:46 AM

jj2007

I use a little routine that takes a qword (REAL8) from the FPU stack and writes it to a string buffer:
  ; esi is the string buffer address, DefNum is the number of digits to display

  fst FpuRes8 ; store ST(0) to FpuRes8 (fst does not pop; fstp would)
  if UseCrt
   invoke crt__gcvt, FpuRes8, DefNum, esi
  else
   invoke FloatToStr2, FpuRes8, esi ;, no DefNum here
  endif

So far, so simple. The problem is that it works perfectly for the crt version (apart from the inaccuracies), but only partly for the MasmLib version (see attachment). Any idea what could cause this problem?

[attachment deleted by admin]

jj2007

Attached are the executables, including a third version that uses Raymond's FpuLib, which also works fine. To run them, extract all of them to a temporary folder (they need a dummy parser.exe).

  if 0      ; parserCrt.exe
    fst FpuRes8
    invoke crt__gcvt, FpuRes8, GfaDefNum, esi
  elseif 0  ; parserFloatToStr.exe
    fst FpuRes8
    invoke FloatToStr2, FpuRes8, esi ;, GfaDefNum
  else      ; parserRay.exe
    invoke FpuFLtoA, 0, GfaDefNum, esi, SRC1_FPU or SRC2_DIMM or STR_REG
  endif

[attachment deleted by admin]

jj2007

Since nobody answered, I had to solve it myself:

  fst FpuRes8
  ffree st(5)
  invoke FloatToStr, FpuRes8, addr buffer

I don't know why it has to be No. 5, but now it works perfectly.
The error apparently occurs only with a well-filled FPU, which is the normal case after complex calculations.

raymond

Quote: which is the normal case after complex calculations.

AND improper FPU register management! :eek :( :( :(

Regardless of calculation complexities, the FPU registers should be cleaned up if the data they contain is no longer useful.

The FloatToStr2 function probably needs 3 free registers to perform the operation, and you were probably hogging 6 of the 8 registers, leaving only 2 free. The function worked once you freed a 3rd one. FpuLib was designed for programmers who may not yet be aware of such register management: each function saves the register contents before doing any computation, performs the computation on a fresh FPU, and restores the registers on exit. That is why it still worked with your full registers.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

jj2007

Quote from: raymond on August 19, 2008, 02:29:41 AM
Quote: which is the normal case after complex calculations.

AND improper FPU register management! :eek :( :( :(

Regardless of calculation complexities, the FPU registers should be cleaned up if the data they contain is no longer useful.

Thanks, Raymond, I was not aware of that. I tried to find this rule in chapter 3.3.2 on Webster, but without success. When were the rules about FPU use added to the ABI? Do you have any reference that I could study?

raymond

Quote: When were the rules about FPU use added to the ABI? Do you have any reference that I could study?

The FPU registers are NOT covered by the ABI. That's why the WinXP programmers decided to use MMX instructions (which use the same registers as for FPU instructions) whenever they pleased without preserving anything. FloatToStr2 is probably written with the same philosophy.

The programmer must also be aware that, unlike the ALU (CPU) registers, an FPU register cannot be loaded with data unless it is free.

Rsir

When using the coprocessor, I always start with FINIT.
Is that the reason I never see problems as mentioned above?
Rsir

jj2007

Quote from: Rsir on August 20, 2008, 03:45:12 PM
When using the coprocessor, I always start with FINIT.
Is that the reason I never see problems as mentioned above?
Rsir

Hi Rsir,
finit is a very useful instruction: it resets the FPU to standard values, e.g. 80 bits precision, and frees all registers. However, it costs many cycles and should therefore only be used once at the beginning of an application, or at the beginning of a lengthy calculation.
Inside a procedure, ffree does the most important thing: It frees a register for usage.
As Raymond explained above, the ABI does not cover the FPU, so everybody is free to decide how to manage the FPU. If I had to cook up my own "FPU ABI", it would follow these rules:

- finit once at the beginning of an application (edit: Windows provides a clean FPU, but at 64 bits precision only, see Raymond's post below)

- "my home is my castle": inside our thread, we can do whatever we like with the FPU

- but be aware that you may be forced to use:
  a) Windows API calls - they may occasionally alter the FPU, although evidence is scarce, and no list of "dangerous" APIs exists
  b) little helpers such as FloatToStr - they are part of the Masm32 library, and should in principle behave well, i.e. they should not destroy FPU contents, and not rely on us using expensive finit. The latter point is also important because frequently we need to print intermediate results with FloatToStr, so it could be quite embarrassing if a little helper was allowed to throw away our results.

Having said that, it might be difficult for a little helper to do its job without any FPU registers. So a compromise might be that our own procedures do not rely on ST6 and ST7 being preserved. The "helper" could then ffree st(6) and ffree st(7), push two values (which land in ST1 and ST0), perform its calculation, and pop ST1 and ST0 to restore the previous state. Such a compromise would resemble the register convention that lets us use esi, edi and ebx without the risk of their being trashed by a call.
Does that sound plausible?

raymond

Quote: finit once at the beginning of an application (perhaps Windows does it for us anyway)

You are provided with a clean FPU at the beginning of your program. However, the precision control is only set for 64-bit double-precision floats by Windows. If you need extended double-precision (full 80 bits precision), either you modify the Control Word accordingly or use finit.

jj2007

Quote from: raymond on August 20, 2008, 11:47:04 PM
Quote: finit once at the beginning of an application (perhaps Windows does it for us anyway)

You are provided with a clean FPU at the beginning of your program. However, the precision control is only set for 64-bit double-precision floats by Windows. If you need extended double-precision (full 80 bits precision), either you modify the Control Word accordingly or use finit.


Thanks, Raymond. So, given that 17 cycles outside the loop don't matter, and that the main reason to use the FPU instead of SSE is precision, the "FPU ABI" should say "finit at app start is good practice".

raymond

Quote: "finit at app start is good practice"

I generally agree with that statement.

However, if you are VERY concerned about cycles
AND you only need single precision floats,
AND don't want to (or can't) use SSE,
it would be advisable to modify the precision control of the FPU to single precision instead. The finit is then neither necessary nor advisable.

jj2007

Quote from: raymond on August 21, 2008, 05:52:29 PM
Quote: "finit at app start is good practice"

I generally agree with that statement.

However, if you are VERY concerned about cycles
AND you only need single precision floats,
AND don't want to (or can't) use SSE,
it would be advisable to modify the precision control of the FPU to single precision instead. The finit is then neither necessary nor advisable.


Raymond, I tested the finit yes or no question with the attached attempt to replace FloatToStr with a more robust function (look for the UseInit flag). Apparently it makes no difference...

xStrProc is my own baby. It preserves all FPU registers, in contrast to FloatToStr. Same for sprintf, it also preserves them, but this crt function is roughly 15 times slower than xStrProc...

finit is ON

FloatToStr      size=895, ST 6-8 trashed
xStrProc        size=274, no ST killed
sprintf         size=000, no ST killed

471 cycles with FloatToStr      1.234568e+099
378 cycles with xStr            1.234568e+099
5893 cycles with sprintf        1.234568e+099

---------
471 cycles with FloatToStr      1.234568e+100
394 cycles with xStr            1.234568e+100
5771 cycles with sprintf        1.234568e+100

---------
478 cycles with FloatToStr      -1.234568e+099
412 cycles with xStr            -1.234568e+099
6005 cycles with sprintf        -1.234568e+099

---------
475 cycles with FloatToStr      -0.000000e+098    <---- no such number ;-) ######
419 cycles with xStr            -1.000000e+099
5827 cycles with sprintf        -1e+099

---------
11 cycles with FloatToStr       0
383 cycles with xStr            0.0
681 cycles with sprintf         0


finit is OFF

FloatToStr      size=895, ST 6-8 trashed
xStrProc        size=274, no ST killed
sprintf         size=000, no ST killed

515 cycles with FloatToStr      1.234568e+099
379 cycles with xStr            1.234568e+099
5980 cycles with sprintf        1.234568e+099

---------
517 cycles with FloatToStr      1.234568e+100
393 cycles with xStr            1.234568e+100
5725 cycles with sprintf        1.234568e+100

---------
521 cycles with FloatToStr      -1.234568e+099
412 cycles with xStr            -1.234568e+099
5996 cycles with sprintf        -1.234568e+099

---------
519 cycles with FloatToStr      -0.000000e+098
409 cycles with xStr            -1.000000e+099
5839 cycles with sprintf        -1e+099

---------
13 cycles with FloatToStr       0
387 cycles with xStr            0.0
683 cycles with sprintf         0


Re SSE, by far the most expensive instruction in the code is
fyl2x ; log, 196-329 cycles
Since there is, afaik, no logarithm instruction in SSE, I doubt whether it's worth the hassle...

EDIT: New version attached - a little bit faster and shorter

[attachment deleted by admin]

raymond

jj: Congratulations. I had a quick glance at your proc and it looks good. I haven't tested it though.

You can now see that converting floats to ASCII is quite complex if all possibilities must be taken into account.

jj2007

Quote from: raymond on August 22, 2008, 04:45:21 AM
jj: Congratulations. I had a quick glance at your proc and it looks good. I haven't tested it though.

Thanxalot, Raymond. With so much encouragement, I could not resist also implementing your routine:
  invoke FpuFLtoA, edx, 6, offset f2sBuffer, SRC1_REAL or SRC2_DIMM or STR_SCI

In the meantime, I have modified my own xStr routine so that it accepts Real4, Real8 and Real10 variables. A macro checks the size and passes it to xStrProc via ecx. Simple but effective  :wink

finit is ON
FloatToStr      size=895, ST 6-8 trashed
xStrProc        size=269, no ST killed
Ray's lib       size=ca. 700, no ST killed
sprintf         size=000, no ST killed

509 cycles with FloatToStr      1.234568e+099
363 cycles with xStr, Real4     1.234568e+035
364 cycles with xStr, Real8     1.234568e+099
361 cycles with xStr, Real10    1.234568e+099
1126 cycles with Ray's lib       1.234568E+0099
5917 cycles with sprintf        1.234568e+099

---------
511 cycles with FloatToStr      1.234568e+100
363 cycles with xStr, Real4     1.234568e+038
379 cycles with xStr, Real8     1.234568e+100
375 cycles with xStr, Real10    1.234568e+100
1126 cycles with Ray's lib       1.234568E+0100
5750 cycles with sprintf        1.234568e+100

---------
496 cycles with FloatToStr      -1.234568e-099
410 cycles with xStr, Real4     -1.234568e-035
394 cycles with xStr, Real8     -1.234568e-099
400 cycles with xStr, Real10    -1.234568e-099
1126 cycles with Ray's lib      -1.234568E-0099
5936 cycles with sprintf        -1.234568e-099

---------
513 cycles with FloatToStr      -0.000000e+098
404 cycles with xStr, Real4     -1.000000e+020
377 cycles with xStr, Real8     -1.000000e+099
386 cycles with xStr, Real10    -1.000000e+099
1128 cycles with Ray's lib      -1.000000E+0099
5821 cycles with sprintf        -1e+099

---------
1810 cycles with FloatToStr     -0.000000e+647
376 cycles with xStr, Real4     0.0
379 cycles with xStr, Real8     0.0
379 cycles with xStr, Real10    0.0
343 cycles with Ray's lib        0
680 cycles with sprintf         0

[attachment deleted by admin]

jj2007

Can anybody confirm that FloatToStr has a problem with 0.0??

---------
437 cycles with FloatToStr      -1.234568e-003
403 cycles with xStr, Real4     -1.234568e-09
591 cycles with xStr, Real8     -1.234568e-03
417 cycles with xStr, Real10    -1.234568e-02
1139 cycles with Ray's lib      -1.234568E-0002
4380 cycles with sprintf        -0.00123457

---------
476 cycles with FloatToStr      -0.000000e+098
396 cycles with xStr, Real4     -1.000000e+020
400 cycles with xStr, Real8     -1.000000e+099
404 cycles with xStr, Real10    -1.000000e+099
1131 cycles with Ray's lib      -1.000000E+0099
5829 cycles with sprintf        -1e+099

---------
1778 cycles with FloatToStr     -0.000000e+647
398 cycles with xStr, Real4     0.0
396 cycles with xStr, Real8     0.0
396 cycles with xStr, Real10    0.0
344 cycles with Ray's lib        0
674 cycles with sprintf         0

See the problem with FloatToStr for Real8=0.0?
Result=-0.000000e+647


P.S.: The new flo$ macro automatically recognises the size of a real variable:
We divide MyR4 (=1.2345678e9) by 12345678
and add 11: Result=1.110000e+02 aka 111
Now isn't that a cool macro?

Usage:
.data
MyR4 REAL4 1.2345678e9
.code
print flo$("\n\nNew flo$ macro:\nWe divide MyR4 (=1.2345678e9) by 12345678\nand add 11: Result=%f aka 111\nNow isn't that a cool macro?\n\n", MyR4/12345678+11)

[attachment deleted by admin]