float$ macro and algo for testing

Started by jj2007, August 26, 2008, 03:48:53 PM

jj2007


sinsi

Hey jj can you reduce the size of the posts? There's a lot of b i g posts here...and here's another.
Q6600 2.4GHz - I can test it on an Athlon 2600+ if you like (it's a small hassle).

Credits to drizz for the qwtoa algo

462 cycles for FloatToStr       1.234568e-004
295 cycles for float$ REAL4     1.23456e-05
300 cycles for float$ REAL8     1.23456e-04
299 cycles for float$ REAL10    1.23456e-03
1174 cycles for Ray's lib       0.001235
4132 cycles for sprintf         0.0001234568

---------
504 cycles for FloatToStr       0.1234568
278 cycles for float$ REAL4     0.1234568
277 cycles for float$ REAL8     0.1234568
272 cycles for float$ REAL10    0.1234568
1178 cycles for Ray's lib       0.123457
3341 cycles for sprintf         0.1234568

---------
682 cycles for FloatToStr       1.234568
279 cycles for float$ REAL4     1.234568
277 cycles for float$ REAL8     1.234568
278 cycles for float$ REAL10    1.234568
1185 cycles for Ray's lib       1.234568
4577 cycles for sprintf         1.234568

---------
679 cycles for FloatToStr       1234.568
277 cycles for float$ REAL4     1234.568
283 cycles for float$ REAL8     1234.568
282 cycles for float$ REAL10    1234.568
1239 cycles for Ray's lib       1234.567890
4306 cycles for sprintf         1234.568

---------
499 cycles for FloatToStr       1.234568e+123
300 cycles for float$ REAL4     1.23456e+23
317 cycles for float$ REAL8     1.23456e+123
311 cycles for float$ REAL10    1.23456e+123
1241 cycles for Ray's lib       1.234567890123457E+0123
5340 cycles for sprintf         1.234568e+123

---------
484 cycles for FloatToStr       -1.234568e-123
302 cycles for float$ REAL4     -1.23456e-23
351 cycles for float$ REAL8     -1.23456e-123
350 cycles for float$ REAL10    -1.23456e-123
1189 cycles for Ray's lib       -0.000000
5708 cycles for sprintf         -1.234568e-123

---------
9 cycles for FloatToStr         0
126 cycles for float$ REAL4     0
128 cycles for float$ REAL8     0
127 cycles for float$ REAL10    0
398 cycles for Ray's lib        0
585 cycles for sprintf          0

Funny how we use 'code' tags to quote something...
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Quote from: sinsi on September 12, 2008, 02:03:49 PM
Hey jj can you reduce the size of the posts? There's a lot of b i g posts here...and here's another.
OK, the next version will output only the "typical" range, i.e. 123.456 etc., although that is unfair to the MasmLib FloatToStr, which is particularly slow in that range.

Quote
Q6600 2.4GHz - I can test it on an Athlon 2600+ if you like (it's a small hassle).
Yes please. The AMD seems to show the smallest improvement against the MasmLib algo.
I could shave off a few cycles by cutting down the macro, but then... I wanted to have this feature:

.data
Sales2006 REAL8 300.0
Sales2007 REAL8 309.6
.code
print float$('\nMarketing report:\nSales were up %2f% in 2007\n', Sales2007/Sales2006-1*100)

Output:
Marketing report:
Sales were up 3.2% in 2007
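
Note that the 3.2% result implies the expression is evaluated strictly left to right, i.e. ((Sales2007/Sales2006)-1)*100; with normal operator precedence it would be 309.6/300 - 100. A minimal sketch of the same calculation in plain FPU code, assuming only the MASM32 SDK (Hundred, Growth and buffer are hypothetical names; FloatToStr is the MasmLib routine from the timings above):

```asm
include \masm32\include\masm32rt.inc   ; standard MASM32 runtime include

.data
Sales2006 REAL8 300.0
Sales2007 REAL8 309.6
Hundred   REAL8 100.0        ; hypothetical helper constant, not in the original post

.data?
Growth    REAL8 ?            ; hypothetical result variable
buffer    db 32 dup (?)      ; hypothetical output buffer

.code
start:
    fld   Sales2007          ; ST0 = 309.6
    fdiv  Sales2006          ; ST0 = 309.6 / 300.0, about 1.032
    fld1                     ; ST0 = 1.0, ST1 = about 1.032
    fsubp st(1), st          ; ST0 = ST1 - ST0, about 0.032
    fmul  Hundred            ; ST0 = about 3.2
    fstp  Growth             ; store and pop, so the FPU stack is balanced again
    invoke FloatToStr, Growth, addr buffer
    print addr buffer, 13, 10
    exit
end start
```

The macro does this kind of stepwise evaluation for you; the sketch only makes the order of operations visible.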

:bg

sinsi

Your wish is my command...
Athlon XP 2600+ at 2.13GHz, XP Home SP3, 1.5GB RAM

584 cycles for FloatToStr       1.234568e-004
489 cycles for float$ REAL4     1.23456e-05
525 cycles for float$ REAL8     1.23456e-04
528 cycles for float$ REAL10    1.23456e-03
1017 cycles for Ray's lib       0.001235
4811 cycles for sprintf         0.0001234568

---------
502 cycles for FloatToStr       0.1234568
416 cycles for float$ REAL4     0.1234568
407 cycles for float$ REAL8     0.1234568
408 cycles for float$ REAL10    0.1234568
1006 cycles for Ray's lib       0.123457
3915 cycles for sprintf         0.1234568

---------
520 cycles for FloatToStr       1.234568
453 cycles for float$ REAL4     1.234568
450 cycles for float$ REAL8     1.234568
442 cycles for float$ REAL10    1.234568
1023 cycles for Ray's lib       1.234568
4638 cycles for sprintf         1.234568

---------
505 cycles for FloatToStr       1234.568
427 cycles for float$ REAL4     1234.568
433 cycles for float$ REAL8     1234.568
430 cycles for float$ REAL10    1234.568
1035 cycles for Ray's lib       1234.567890
4612 cycles for sprintf         1234.568

---------
601 cycles for FloatToStr       1.234568e+123
482 cycles for float$ REAL4     1.23456e+23
540 cycles for float$ REAL8     1.23456e+123
520 cycles for float$ REAL10    1.23456e+123
1068 cycles for Ray's lib       1.234567890123457E+0123
6097 cycles for sprintf         1.234568e+123

---------
592 cycles for FloatToStr       -1.234568e-123
512 cycles for float$ REAL4     -1.23456e-23
548 cycles for float$ REAL8     -1.23456e-123
552 cycles for float$ REAL10    -1.23456e-123
1002 cycles for Ray's lib       -0.000000
6164 cycles for sprintf         -1.234568e-123

---------
12 cycles for FloatToStr        0
143 cycles for float$ REAL4     0
141 cycles for float$ REAL8     0
142 cycles for float$ REAL10    0
361 cycles for Ray's lib        0
941 cycles for sprintf          0

Not as bad as I thought.

jj, I am in the mood to try and overclock my quadcore beastie.
It seems that a lot of these speed tests rely on clock speed.
What do you reckon? Write the code and I will roll with the hoops...

jj2007

Quote from: sinsi on September 12, 2008, 02:46:50 PM
Your wish is my command...
Athlon XP 2600+ at 2.13GHz, XP Home SP3, 1.5GB RAM
Grazie!

Quote
It seems that a lot of these speed tests rely on clock speed.
Normally Michael's counterXX macro should produce cycles independently of clock speed. But I might be wrong.
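
To illustrate the point: cycle counters of this kind are built on RDTSC, which returns a tick count rather than elapsed wall-clock time. A bare-bones sketch of the idea follows. It is not Michael's macro (which, as far as I know, also repeats the test many times and subtracts the loop overhead), and the workload loop is only a placeholder:

```asm
include \masm32\include\masm32rt.inc
.586                         ; cpuid and rdtsc need at least a Pentium

.code
start:
    push ebx                 ; cpuid overwrites ebx
    xor  eax, eax
    cpuid                    ; serialize: earlier instructions finish first
    rdtsc                    ; edx:eax = time-stamp counter
    mov  esi, eax            ; keep the low dword as start value

    mov  ecx, 1000           ; placeholder workload (hypothetical)
@@: dec  ecx
    jnz  @B

    xor  eax, eax
    cpuid                    ; serialize again before reading the counter
    rdtsc
    sub  eax, esi            ; elapsed ticks, low dword only
    pop  ebx
    print ustr$(eax), " ticks", 13, 10
    exit
end start
```

The figure includes the overhead of cpuid and rdtsc themselves, so a single run like this is much noisier than the averaged numbers posted in this thread.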

Attached one more for the road. Minor bug fixes (the exponent was one position too far to the left), 10% smaller, 10% faster.

EDIT: Obsolete - see end of page 2 for March 30 version

[attachment deleted by admin]

sinsi

Tried it at 2.4 (normal), 2.6, 2.8 and 3GHz with no real differences, so clock speed doesn't have anything to do with it I guess.
Here's the latest at 2.4

428 cycles for FloatToStr       1.234568e-004
282 cycles for float$ REAL4     1.234568e-05
256 cycles for float$ REAL8     0.0001234568
267 cycles for float$ REAL10    0.001234568
1171 cycles for Ray's lib       0.001235
4061 cycles for sprintf         0.0001234568

---------
633 cycles for FloatToStr       1.234568
265 cycles for float$ REAL4     1.234568
272 cycles for float$ REAL8     1.234568
273 cycles for float$ REAL10    1.234568
1191 cycles for Ray's lib       1.234568
4456 cycles for sprintf         1.234568

---------
635 cycles for FloatToStr       1234.568
260 cycles for float$ REAL4     1234.568
263 cycles for float$ REAL8     1234.568
264 cycles for float$ REAL10    1234.568
1233 cycles for Ray's lib       1234.567890
4261 cycles for sprintf         1234.568



[attachment deleted by admin]

jj2007

Quote from: sinsi on September 13, 2008, 03:21:02 AM
Tried it at 2.4 (normal), 2.6, 2.8 and 3GHz with no real differences, so clock speed doesn't have anything to do with it I guess.
Quod erat demonstrandum :U

Quote

633 cycles for FloatToStr       1.234568
265 cycles for float$ REAL4     1.234568

I love it. Unfortunately, I am no good at SSE2...

jj2007

After quite a bit of testing, here is the float$ macro for casual use (attached; the usual disclaimers apply). It is pretty flexible, does not trash any FPU registers, and in its basic variant it is roughly twice as fast as the MasmLib FloatToStr routine, not to mention the C runtime sprintf routine, which is about 15 times slower:

282 cycles for 4*float$         1234.568
618 cycles for 4*FloatToStr     1234.568

float$: reg32, Real4, Real8, Real10
MasmLib FloatToStr: 4*Real8

617 cycles for FloatToStr       1.234568
277 cycles for float$ REAL4     1.234568
258 cycles for float$ REAL8     1.234568
277 cycles for float$ REAL10    1.234568
1113 cycles for Ray's lib       1.234568
4533 cycles for sprintf         1.234568

Code sizes and FPU register preservation:
FloatToStr      size=895, ST 6-8 trashed
float$          size=823, all ST regs preserved
Ray's lib       size=700, all ST regs preserved
crt sprintf     size=???, all ST regs preserved

Usage:
   Basic:
   print float$(MyReal10)
   print float$(MyInt32)
   mov al, 123
   print float$(al)

   Simple:
   print float$("The number %f is very high", MyReal10)

   Four-digit precision (n=1-15, use uppercase A-F for 10-15 digits):
   print float$("The number %4f is very high", MyReal10)

   Simple calculations: You can mix up to 5 registers (1-4 bytes), integers, immediate numbers, local and global variables:
   mov ecx, 3000   ; Caution: edx cannot be used here, eax not after an immediate integer
   print float$("Divide ecx by 10, add Sales2005,\nadd 10, mul 111.111;\nresult=%f", ecx/10+@Sales2005+10*111.1111)
   Use %f (or %Af for 10-digit precision) as placeholder for the number; use \n for newline, \t for tab

EDIT: Obsolete - see end of page 2 for March 30 version

[attachment deleted by admin]

jj2007

Updated with slightly higher precision:

PI = 3.14159265358979324

Extract all to a temporary folder, then double-click on Float2Asc_INSTALL.bat

Precision will be somewhat lower for very high and very low exponents.
The usual disclaimers apply.

EDIT: Obsolete - see end of page 2

[attachment deleted by admin]

herge

 Hi jj2007:

The latest results from My Computer.

Saturday, March 28, 2009 3:32 PM
------- New float$ macro: ---------------------
Divide MyReal10 (=1.2345678e9)
by 12345678 (=1.2e7, in eax)
add 11.1111    (an immediate real)
Result= 111.1111
-- This para printed by one line of code! -----


Code sizes and FPU register preservation:
FloatToStr      size=895, ST 6-8 trashed
float$          size=771, all ST regs preserved
Ray's lib       size=700, all ST regs preserved
crt sprintf     size=???, all ST regs preserved

finit is ON Version 1.1, 1 September 2008
415 cycles for FloatToStr       1.234568e-004
428 cycles for float$ Real5     1.234568e-05
429 cycles for float$ Real8     1.234568e-04
424 cycles for float$ Real10    0.001234568
1167 cycles for Ray's lib       0.001235
4103 cycles for sprintf         0.0001234568

---------
448 cycles for FloatToStr       0.1234568
424 cycles for float$ Real5     0.1234568
422 cycles for float$ Real8     0.1234568
421 cycles for float$ Real10    0.1234568
1169 cycles for Ray's lib       0.123457
3323 cycles for sprintf         0.1234568

---------
634 cycles for FloatToStr       1.234568
425 cycles for float$ Real5     1.234568
424 cycles for float$ Real8     1.234568
422 cycles for float$ Real10    1.234568
1169 cycles for Ray's lib       1.234568
4468 cycles for sprintf         1.234568

---------
634 cycles for FloatToStr       1234.568
426 cycles for float$ Real5     1234.568
426 cycles for float$ Real8     1234.568
424 cycles for float$ Real10    1234.568
1234 cycles for Ray's lib       1234.567890
4346 cycles for sprintf         1234.568

---------
461 cycles for FloatToStr       1.234568e+123
431 cycles for float$ Real5     1.234568e+23
448 cycles for float$ Real8     1.234568e+123
449 cycles for float$ Real10    1.234568e+123
1228 cycles for Ray's lib       1.234567890123457E+0123
5335 cycles for sprintf         1.234568e+123

---------
430 cycles for FloatToStr       -1.234568e-123
430 cycles for float$ Real5     -1.234568e-23
447 cycles for float$ Real8     -1.234568e-123
448 cycles for float$ Real10    -1.234568e-123
1164 cycles for Ray's lib       -0.000000
5688 cycles for sprintf         -1.234568e-123

---------
10 cycles for FloatToStr        0
66 cycles for float$ Real5      0
65 cycles for float$ Real8      0
57 cycles for float$ Real10     0
400 cycles for Ray's lib        0
580 cycles for sprintf          0


Regards herge
// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

herge

 Hi jj2007:

I thought PI was a constant you loaded into the FPU.

D9 EB FLDPI


Regards herge

UtillMasm

Question 1
C:\FloatStr.asm
  This file contains characters which can be lost in current encoding.
  Do you want to select one of other encoding options?
==================================
Windows-936 is not right
  Line: 1656
   ;                                                                                                                               ?
Windows-1252 is ok?
  Line: 1656
   ; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
Question 2
==================================
Line: 1659
  OPT_Linker   link
  OPT_Icon   Calc
  OPT_WAIT   1
  OPT_Susy   CONSOLE
  OPT_Tmp2Asm   1
What do these lines mean?

jj2007

Quote from: herge on March 30, 2009, 01:10:09 AM
Hi jj2007:

I thought PI was a constant you loaded into the FPU.

D9 EB FLDPI


Regards herge


Hi Herge,
You are right, it could be done that way, but I don't see an elegant way to tell the Float2Asc proc that it should take PI instead of a "normal" number. It would take an extra magic number and/or a branch, which is too much of a hassle. That's why I chose a REAL10 in .data to show PI.
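
For what it's worth, one way to use FLDPI without touching the conversion proc would be to store the FPU constant into a REAL10 variable at run time and then pass that like any other number. A minimal sketch, assuming the MASM32 SDK; PiFpu is a hypothetical name, and the commented-out float$ call assumes the macro from this thread is included:

```asm
include \masm32\include\masm32rt.inc

.data?
PiFpu REAL10 ?              ; hypothetical variable

.code
start:
    fldpi                   ; opcode D9 EB: push pi onto the FPU stack
    fstp PiFpu              ; store as REAL10 and pop, FPU stack stays balanced
    ; print float$(PiFpu)   ; assumption: needs the float$ macro from this thread
    exit
end start
```

This costs two instructions and ten bytes of uninitialised data, so a REAL10 constant in .data is the simpler choice when the value is only needed for display.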

Quote from: UtillMasm on March 30, 2009, 07:34:35 AM
Question 1
C:\FloatStr.asm
  This file contains characters which can be lost in current encoding.
  Do you want to select one of other encoding options?
==================================
Windows-936 is not right
  Line: 1656
   ;                                                                                                                               ?
Windows-1252 is ok?
  Line: 1656

Hi UtillMasm,

Which editor/IDE/assembler are you using? The float$ macro uses ° (Ascii 176). I never saw this error message using RichMasm, and the usual assemblers (ml, JWasm) swallow it without problems.

Quote

Question 2
==================================
Line: 1659
  OPT_Linker   link
  OPT_Icon   Calc
  OPT_WAIT   1
  OPT_Susy   CONSOLE
  OPT_Tmp2Asm   1
What do these lines mean?

RichMasm options specifying which linker to use, an icon, subsystem CONSOLE, and whether you want to keep the (otherwise temporary) FloatStr.asm file.

Obsolete, see next post: Since both Herge and UtillMasm downloaded a much older version, I have updated everything and attach it here. Unzip to a temporary folder and double-click on Float2Asc_INSTALL.bat (the usual disclaimers apply).

[attachment deleted by admin]

jj2007

float$ was somewhat inexact for numbers close to 10^n, so I rewrote it and called it Str$.
Download it from this thread.

ToutEnMasm


The math.sdk file (translated from VC++ 2008) has some important constant values defined.
Quote
M_PI   equ   < 3.14159265358979323846>
jj2007: PI = 3.14159265358979324
It is better to use this file. It can be added to windows.inc, with translate.inc coming before any .sdk file.
It should also be verified that the MASM32 float functions follow the IEEE specification. I am not sure that they do (some features may be missing).
The IEEE standard for floating point arithmetic
http://www.psc.edu/general/software/packages/ieee/ieee.php