I need a table of REAL10 variables ranging from 1.0e-150 to 1.0e+150. With the FPU, I get small rounding errors. Is it possible to do that without the FPU, working directly on the mantissa and exponent of the REAL10?
Below is a snippet for displaying REAL10 content.
invoke PrintR10, jr1
PrintR10 proc arg10:Real10
LOCAL LocBuffer[100]:BYTE
pushad
lea esi, LocBuffer
print chr$(13, 10, "9876543210987654", 13, 10)
mov eax, dword ptr arg10
mov edx, dword ptr arg10[4]
movzx ecx, word ptr arg10[8]
pushad
invoke dw2bin_ex, ecx, esi
lea edx, [esi+16]
print edx, 13, 10
print "32109876543210987654321098765432 10987654321098765432109876543210", 13, 10
popad
pushad
invoke dw2bin_ex, edx, esi
print esi, " "
popad
invoke dw2bin_ex, eax, esi
print esi, 13, 10
popad
ret
PrintR10 endp
you mean randomly generated values ?
the 80-bit float format is relatively simple
1 bit for sign
15 bits for exponent
64 bits for mantissa
exponent bias = 16383
all 1's in the exponent is a special value - do not use to make normals
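The field layout above is easy to check with a short sketch. Python is used here (rather than the thread's MASM) because its big integers make the bit fields easy to verify; the helper names `encode_real10`/`decode_real10` are made up for illustration:

```python
def encode_real10(sign: int, exponent: int, mantissa: int) -> bytes:
    # 1 sign bit, 15 exponent bits (bias 16383), explicit 64-bit mantissa
    assert 0 <= exponent < 0x7FFF      # all-1's exponent is reserved (NaN/Inf)
    raw = (sign << 79) | (exponent << 64) | mantissa
    return raw.to_bytes(10, "little")  # x86 stores REAL10 little-endian

def decode_real10(b: bytes):
    raw = int.from_bytes(b, "little")
    return raw >> 79, (raw >> 64) & 0x7FFF, raw & ((1 << 64) - 1)

# 1.0: unbiased exponent 0, mantissa 0x8000000000000000 (explicit integer bit)
one = encode_real10(0, 16383, 0x8000000000000000)
```

Note the explicit integer bit: unlike REAL4/REAL8, the leading 1 of a normal number is stored, so 1.0 encodes as 3FFF8000000000000000.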
Quote from: dedndave on July 09, 2009, 12:37:20 PM
you mean randomly generated values ?
No, I mean exact values:
1.0e-150
1.0e-149
...
1.0e+149
1.0e+150
Multiplying a seed of 1.0e-150 repeatedly by 10.0 (exact, Real10) produces very small rounding errors, so I wonder whether working directly on the number in memory could yield exact ones... ::)
ohhhhhhhhhhhhhhhhhhh
well - sure - a "str2float" is what you want
i dunno what you are asking for, i guess, Jochen
still early, here - lol - first cup of coffee
Quote from: dedndave on July 09, 2009, 12:56:39 PM
ohhhhhhhhhhhhhhhhhhh
well - sure - a "str2float" is what you want
i dunno what you are asking for, i guess, Jochen
still early, here - lol - first cup of coffee
> str2float
No strings involved yet - just Real10 to Real10
:bg
The exponent is easy enough to get (multiply by log2(10)), but if the mantissa were easy, we'd apply the process to converting any float to string.
You already know the initial value of 1.0e-150 is incorrect (inexact), so we can't get correct values by simply multiplying.
Perhaps if you started with a higher-precision mantissa (e.g. 24 bytes would give 192 bits to lose). It's easy to multiply exactly by 10 manually.
It would be really nice if we could efficiently generate these values as needed. 300 values isn't too bad for real8's, but 9800 for real10's is rather excessive.
I'm using a modification of the Tim Roberts scheme in masmlib. Takes about 100 predefined values to cover the full range of real10's, and three multiplies to get a final value at execution.
So far the timing tests are very encouraging if I could just figure out how to efficiently round up .9999999999999999999999 to 1. before I generated all the digits :(
well - for the 80-bit value of Pi, I used my 64-bit unsigned integer to ascii decimal routine to convert the mantissa
(my routine is the slowest one presented, but I have a great degree of confidence in the result string)
i then performed division by 2 on the string 62 times and placed a decimal point at the appropriate place to arrive at the string
(62 = the exponent with the bias removed)
this is a bit tedious and slow, but it demonstrates one method of converting the value to a string
it results in "the" correct string - i.e. it represents the true binary value with no rounding error
many of the decimal digits were unusable, of course - only 20 digits maximum may be reliable
i am not suggesting this as a method, but it shows how it works
of course, if you are only generating the table one time, you don't care how fast it is
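The evaluation method described above (treat the 64-bit mantissa as an integer, then scale by twos according to the unbiased exponent) can be sketched exactly with Python's Fraction type; `real10_exact_value` is an illustrative name, not code from the thread:

```python
from fractions import Fraction

def real10_exact_value(sign: int, exponent: int, mantissa: int) -> Fraction:
    # true value = (-1)^sign * mantissa * 2^(exponent - 16383 - 63):
    # remove the bias (16383), then account for the 63 bits below the MSB
    e = exponent - 16383 - 63
    v = Fraction(mantissa) * Fraction(2) ** e
    return -v if sign else v

# the 80-bit value of pi: biased exponent 0x4000, mantissa 0xC90FDAA22168C235,
# so the scale factor is 2^-62 -- hence the 62 divisions by 2 mentioned above
pi_exact = real10_exact_value(0, 0x4000, 0xC90FDAA22168C235)
```

Printed in full, `pi_exact` is "the" correct decimal string with no rounding error, though only the first 19-20 digits are usable, exactly as noted above.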
for generating the table you want, it may be simpler to write a small program
to generate the ASM text file
dt 1.0e-150
dt 1.0e-149
...
dt 1.0e+149
dt 1.0e+150
of course, a little toil with copy/paste could do the same thing
it's only 30 decades - lol
create one decade (or 2, maybe) by hand, then copy it and modify the copy
about 15 minutes using the toil method - lol
that'll be $50, Jochen, my friend - dunno what that is in lira
dt 1.0e-150,1.0e-149,1.0e-148,1.0e-147,1.0e-146
dt 1.0e-145,1.0e-144,1.0e-143,1.0e-142,1.0e-141
dt 1.0e-140,1.0e-139,1.0e-138,1.0e-137,1.0e-136
dt 1.0e-135,1.0e-134,1.0e-133,1.0e-132,1.0e-131
dt 1.0e-130,1.0e-129,1.0e-128,1.0e-127,1.0e-126
dt 1.0e-125,1.0e-124,1.0e-123,1.0e-122,1.0e-121
dt 1.0e-120,1.0e-119,1.0e-118,1.0e-117,1.0e-116
dt 1.0e-115,1.0e-114,1.0e-113,1.0e-112,1.0e-111
dt 1.0e-110,1.0e-109,1.0e-108,1.0e-107,1.0e-106
dt 1.0e-105,1.0e-104,1.0e-103,1.0e-102,1.0e-101
dt 1.0e-100,1.0e-99,1.0e-98,1.0e-97,1.0e-96
dt 1.0e-95,1.0e-94,1.0e-93,1.0e-92,1.0e-91
dt 1.0e-90,1.0e-89,1.0e-88,1.0e-87,1.0e-86
dt 1.0e-85,1.0e-84,1.0e-83,1.0e-82,1.0e-81
dt 1.0e-80,1.0e-79,1.0e-78,1.0e-77,1.0e-76
dt 1.0e-75,1.0e-74,1.0e-73,1.0e-72,1.0e-71
dt 1.0e-70,1.0e-69,1.0e-68,1.0e-67,1.0e-66
dt 1.0e-65,1.0e-64,1.0e-63,1.0e-62,1.0e-61
dt 1.0e-60,1.0e-59,1.0e-58,1.0e-57,1.0e-56
dt 1.0e-55,1.0e-54,1.0e-53,1.0e-52,1.0e-51
dt 1.0e-50,1.0e-49,1.0e-48,1.0e-47,1.0e-46
dt 1.0e-45,1.0e-44,1.0e-43,1.0e-42,1.0e-41
dt 1.0e-40,1.0e-39,1.0e-38,1.0e-37,1.0e-36
dt 1.0e-35,1.0e-34,1.0e-33,1.0e-32,1.0e-31
dt 1.0e-30,1.0e-29,1.0e-28,1.0e-27,1.0e-26
dt 1.0e-25,1.0e-24,1.0e-23,1.0e-22,1.0e-21
dt 1.0e-20,1.0e-19,1.0e-18,1.0e-17,1.0e-16
dt 1.0e-15,1.0e-14,1.0e-13,1.0e-12,1.0e-11
dt 1.0e-10,1.0e-9,1.0e-8,1.0e-7,1.0e-6
dt 1.0e-5,1.0e-4,1.0e-3,1.0e-2,1.0e-1
dt 1.0
dt 1.0e+1,1.0e+2,1.0e+3,1.0e+4,1.0e+5
dt 1.0e+6,1.0e+7,1.0e+8,1.0e+9,1.0e+10
dt 1.0e+11,1.0e+12,1.0e+13,1.0e+14,1.0e+15
dt 1.0e+16,1.0e+17,1.0e+18,1.0e+19,1.0e+20
dt 1.0e+21,1.0e+22,1.0e+23,1.0e+24,1.0e+25
dt 1.0e+26,1.0e+27,1.0e+28,1.0e+29,1.0e+30
dt 1.0e+31,1.0e+32,1.0e+33,1.0e+34,1.0e+35
dt 1.0e+36,1.0e+37,1.0e+38,1.0e+39,1.0e+40
dt 1.0e+41,1.0e+42,1.0e+43,1.0e+44,1.0e+45
dt 1.0e+46,1.0e+47,1.0e+48,1.0e+49,1.0e+50
dt 1.0e+51,1.0e+52,1.0e+53,1.0e+54,1.0e+55
dt 1.0e+56,1.0e+57,1.0e+58,1.0e+59,1.0e+60
dt 1.0e+61,1.0e+62,1.0e+63,1.0e+64,1.0e+65
dt 1.0e+66,1.0e+67,1.0e+68,1.0e+69,1.0e+70
dt 1.0e+71,1.0e+72,1.0e+73,1.0e+74,1.0e+75
dt 1.0e+76,1.0e+77,1.0e+78,1.0e+79,1.0e+80
dt 1.0e+81,1.0e+82,1.0e+83,1.0e+84,1.0e+85
dt 1.0e+86,1.0e+87,1.0e+88,1.0e+89,1.0e+90
dt 1.0e+91,1.0e+92,1.0e+93,1.0e+94,1.0e+95
dt 1.0e+96,1.0e+97,1.0e+98,1.0e+99,1.0e+100
dt 1.0e+101,1.0e+102,1.0e+103,1.0e+104,1.0e+105
dt 1.0e+106,1.0e+107,1.0e+108,1.0e+109,1.0e+110
dt 1.0e+111,1.0e+112,1.0e+113,1.0e+114,1.0e+115
dt 1.0e+116,1.0e+117,1.0e+118,1.0e+119,1.0e+120
dt 1.0e+121,1.0e+122,1.0e+123,1.0e+124,1.0e+125
dt 1.0e+126,1.0e+127,1.0e+128,1.0e+129,1.0e+130
dt 1.0e+131,1.0e+132,1.0e+133,1.0e+134,1.0e+135
dt 1.0e+136,1.0e+137,1.0e+138,1.0e+139,1.0e+140
dt 1.0e+141,1.0e+142,1.0e+143,1.0e+144,1.0e+145
dt 1.0e+146,1.0e+147,1.0e+148,1.0e+149,1.0e+150
Quote from: dedndave on July 09, 2009, 03:33:39 PM
about 15 minutes using the toil method - lol
that'll be $50, Jochen, my friend - dunno what that is in lira
About 1 minute in Excel, Dave - but that table costs 3 kBytes, and that is exactly the reason why I want to generate it :bg
ok - but
assemble that text
look at the binary values generated for the table
see if you can sherlock an algorithm to generate it
also, the errors are cumulative because, in each pass of the algorithm, you use the value from the previous pass
see if you can eliminate that accumulation by making more direct calculations
Quote from: dedndave on July 09, 2009, 06:06:42 PM
ok - but
assemble that text
look at the binary values generated for the table
see if you can sherlock an algorithm to generate it
also, the errors are cumulative because, in each pass of the algorithm, you use the value from the previous pass
see if you can eliminate that accumulation by making more direct calculations
Well, that's exactly what I have been doing so far... the + and - indicate for which n a correction was needed.
7+ 1 1.000000000000000000e-137
8- 2 1.000000000000000000e-136
9- 3 1.000000000000000000e-135
10+ 4 1.000000000000000000e-134
17+ 5 1.000000000000000000e-127
18- 6 1.000000000000000000e-126
19- 7 1.000000000000000000e-125
20+ 8 1.000000000000000000e-124
21+ 9 1.000000000000000000e-123
22- 10 1.000000000000000000e-122
25+ 11 1.000000000000000000e-119
26- 12 1.000000000000000000e-118
27- 13 1.000000000000000000e-117
...
296+ 131 1.000000000000000000e+152
297- 132 1.000000000000000000e+153
298- 133 1.000000000000000000e+154
299+ 134 1.000000000000000000e+155
well, i think you are back to str2float
because you can surely generate the strings without rounding errors - then use str2float to get the real
this method relies on the str2float to generate good values, as well
EDIT
no way you wrote that spreadsheet in 1 minute - lol
EDIT
it may help if you generate the table, starting with 1.0e+150 and divide instead of multiplying - i dunno
it would make sense because you are gaining resolution as you go, in a way
if all else fails, you can make a very long string in allocated memory and do the division by moving an imaginary decimal point (i.e. dec a ptr)
let me give this some thought - i will get back to you later
The idea to use StrToFloat is cute but yields only REAL8 results: 1.0000000000000000200E-100
Same for a2r10, strangely enough.
ok - i have a plan - lol
for one thing, you should start with 1.0 and divide to make the lower portion of the table
and start with 1.0 and multiply to make the upper portion
1.0 expressed as a real is precise (i.e. the decimal evaluation is exactly equal to the binary representation)
i think i have a way to do it, but i have some outside work i wanted to do today before it gets any hotter
i will let it tumble around in my head while i take care of that
Quote from: dedndave on July 09, 2009, 07:03:45 PM
1.0 expressed as a real is precise
Yes, but after two or three multiplications/divisions it is no longer precise...
yes - but that will get rid of a large part of the errors - i know it will make a vast improvement on your current routine
next, we need to construct the reals ourselves - i am working on a routine
here is the 3010 byte table generated by masm
we can use it as a reference to test against - not sure how precise even that is, though
i would be more inclined to believe my table
[attachment deleted by admin]
Will check asap. My last hope was Raymond's lib, but FpuAtoFL yields 1.0000000000000000127E-100 :(
Hi,
Well multiplies should be okay from 1 to whenever the mantissa overflows.
There is a mild problem with the numbers less than 1 though, 1/10 does
not have an exact representation in binary. Five is relatively prime with
respect to two.
0000:0000:0000:0000:0000:0000:0000:0000■0000:0010:1000:1111:0101:1100:0010:1001
0000:0000■028F:5C29
0000000000.01000000000931322574615478515625
Regards,
Steve N.
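Steve's point (5 is relatively prime to 2, so 1/10 has no finite binary expansion) is easy to confirm; a small Python sketch generating the fraction bits of 0.1:

```python
from fractions import Fraction

# generate the first 12 fraction bits of 1/10 exactly;
# they repeat forever with period 0011, so no finite mantissa is exact
bits = ""
x = Fraction(1, 10)
for _ in range(12):
    x *= 2
    bits += "1" if x >= 1 else "0"
    if x >= 1:
        x -= 1
```

The result is 0.000110011001... in binary, which is why every REAL10 in the lower half of the table must be rounded somewhere.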
The guts of this are from raymonds fpulib
.data?
tempex dd ?
ansx real10 301 dup (?)
.code
mov tempex,-150
lea edi,ansx
finit
.repeat
fild tempex ; start of Raymonds code
fldl2t
fmul ;->log2(10)*exponent
fld st(0)
frndint ;get the characteristic of the log
fxch
fsub st,st(1) ;get only the fractional part but keep the characteristic
f2xm1 ;->2^(fractional part)-1
fld1
fadd ;add 1 back
fscale ;re-adjust the exponent part of the REAL number
fstp st(1) ;get rid of the characteristic of the log
fstp real10 ptr [edi] ;end of raymonds code
add edi,10
inc tempex
.until sdword ptr tempex>150
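The FPU sequence above computes 10^n as 2^(n·log2 10), splitting off the integer part for FSCALE and feeding the fractional part to F2XM1. A rough Python equivalent (double precision rather than the FPU's 80-bit internals, so only illustrative; the function name is made up):

```python
import math

def pow10_via_log2(n: int) -> float:
    t = n * math.log2(10.0)          # fild + fldl2t + fmul
    i = math.floor(t)                # frndint: the characteristic
    f = t - i                        # fsub: fractional part, |f| < 1 for f2xm1
    return math.ldexp(2.0 ** f, i)   # (f2xm1 + fld1 + fadd), then fscale
```

As the thread observes, the result is close but not guaranteed correctly rounded: any error in n·log2(10) is amplified through the exponential, which is where the last-digit discrepancies come from.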
Quote from: Jimg on July 09, 2009, 07:59:33 PM
The guts of this are from raymonds fpulib
Thanks, Jim, but even this one is inexact: 9.9999999999999999500E+54
My current precision is stuck at the very last digit...
that looks more like a float2str problem - i.e. your real may be correct
Quote from: dedndave on July 09, 2009, 08:13:37 PM
that looks more like a float2str problem - i.e. your real may be correct
Copied from Olly...
lol - the assembler generated a NAN for 1.0e+0
EDIT
i must be reading it wrong
3FFF8000000000000000
the sign bit is 0
the 15 exponent bits are 1's
i thought that was a NAN
the mantissa is right
IIRC, representing each decade exactly (or a multiplier table thereof) is impossible because the mantissa would need many more bits to represent all the possible values. As the value increases in magnitude from zero, the granularity increases, which in turn decreases the likelihood of "landing on" any particular standard exact integer. This curve itself would be non-linear, unsure what it is exactly, possibly a bell curve.
The link below hosts a conversion tool to tinker with several float types. Using the IEEE754 32-bit schema, it is easy to see that encoding increasing 1.0 decades causes more mantissa bits to be used, and at some point (1.0e10 == 9999998976), decade values cannot be represented exactly any longer.
http://www.piclist.com/techref/microchip/math/fpconvert.htm
One idea could be to delta-encode the difference, and account for that somehow. Or just do things the "hard way," with a huge BCD and sign bit. Zero error in that.
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
http://msdn.microsoft.com/en-us/library/0b34tf65.aspx
http://www.cs.grinnell.edu/~stone/courses/fundamentals/IEEE-reals.html
well - going up from 1.0, you are good up to 1.0e+19
all those should be precise
after that, small errors begin to accumulate
this is because you shift bits off the right end and increment the exponent
once the MSB 1 is in bit 63, you stop shifting
then, you use the carry flag to round the LSB - this is the error that accumulates
by retaining the right-shifted bits for the next successive multiplication, you can eliminate the errors
i am working on a routine right now to generate the upper portion of the table without rounding errors
i am using shl/add to multiply by 10, but it may be simpler to use multiple-precision mul's
speed is not really an issue here
well - except for how fast i can write it - lol
Zara comes first - so i work on it when she is sleeping
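The guard-bit argument above can be tried in miniature: round to 64 bits after every multiply (as repeated REAL10 stores do), versus keeping every shifted-out bit and rounding once at the end. A Python sketch with big integers (round-half-up for brevity, so ties may differ from the FPU's round-to-nearest-even):

```python
def round_each_step(n: int) -> int:
    # mantissa of 10^n, rounding to 64 bits after every multiply by 10
    m = 1 << 63
    for _ in range(n):
        m *= 10                      # shl/add in the asm version
        while m.bit_length() > 64:
            m = (m >> 1) + (m & 1)   # shift right, round via the carry
    return m

def round_once(n: int) -> int:
    # keep all the bits: form 10^n * 2^63 exactly, then round a single time
    m = 10 ** n << 63
    excess = m.bit_length() - 64
    if excess <= 0:
        return m
    return (m >> excess) + ((m >> (excess - 1)) & 1)
```

Up to 1.0e+19 the two agree exactly (nothing is shifted off); beyond that, only `round_once` is guaranteed to land on the nearest representable mantissa.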
Quote from: Mark Jones on July 10, 2009, 12:25:29 AM
One idea could be to delta-encode the difference, and account for that somehow.
Quote
I tried that already, hoping to find some kind of regular pattern, e.g. one up each ten numbers, but nope. 300 numbers with bit-encoded corrections means roughly 10 dwords or 40 bytes, which would be more acceptable than 3k, of course.
Or just do things the "hard way," with a huge BCD and sign bit. Zero error in that.
Since speed is not an issue, this could be interesting (how would 1.0e-150 look in BCD??)
Thanks, Mark & Dave :thumbu
in packed BCD, it would be 75 bytes long, mostly 0's, with a 1 in the low nybble of the last byte
in binary, it is a little shorter - oops - when you divide they all have lost bits
1 is 1 bit long
10 is 4 bits (add 3)
100 is 7 bits (add 3)
1000 is 10 bits (add 3)
10000 is 14 bits (add 4)
notice the 3,3,3,4 pattern - my theory is that, at some point it is 4,3,3,3,4,3,3,4,3,3,3,4 (a 3 gets dropped)
that's only a guess, though - i have been wanting to see what that pattern is - maybe 150 is enough to see the hiccup
the length is important to know because, by keeping track, you know how many bytes require shift/add during mul by 10
you can speed up the shl/add mul by 10 procedure by only shifting bytes needed
with packed BCD, it is always 4, of course
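The 3,3,3,4 pattern comes from log2(10) ≈ 3.3219 bits per decade; a quick Python check over the table's range (10^0 to 10^150), enough to look for the hiccup guessed at above:

```python
# bit lengths of 10^0 .. 10^150 and the per-decade increments
lengths = [(10 ** n).bit_length() for n in range(151)]
deltas = [b - a for a, b in zip(lengths, lengths[1:])]
# every increment is a 3 or a 4; where the 4 lands drifts slowly,
# which is the "hiccup" in the 3,3,3,4 pattern
```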
Quote from: dedndave on July 09, 2009, 09:19:20 PM
lol - the assembler generated a NAN for 1.0e+0
EDIT
i must be reading it wrong
3FFF8000000000000000
the sign bit is 0
the 15 exponent bits are 1's
i thought that was a NAN
the mantissa is right
Let's see: 2 bits for the three, four bits for each F, you
want to add a bunch of even numbers to get an odd number...
To paraphrase the movie; "I'm sorry Dave, I can't do that".
14 of the 15 exponent bits are 1's.
Cheers,
Steve N.
Thanks, Steve
you're right - lol - i was asleep
i missed that first one - the sign bit (which i knew was 0) threw me off for some reason
when i say stupid stuff, ignore me - lol - i am getting old and my mind ain't as sharp as it used to be
ok - new day - rethunk - new approach
here is what i am working on, now
1) define constant data for 80-bit reals, values 1.0 and 1.0e+1
2) allocate 3010 bytes heap space and copy the 2 constants into the appropriate locations
3) generate 1.0e+2 to 1.0e+19 by using FPU multiplication
that gives us 2 decades
these should be precise with no rounding, in theory
in fact, they are the only values in the table that will be
4) generate 1.0e+20 to 1.0e+150 using the initial 2 decades
there are 2 ways this can be accomplished - testing will determine the best approach (first method is preferred)
a) multiply 1.0e+1 by 1.0e+19 to obtain 1.0e+20
then multiply the original 2 decades of data by 1.0e+20 to obtain the next 2 decades
then multiply that resultant decade by 1.0e+20 again to obtain the next 2 decades
and so on, until the upper half of the table is full
b) the other approach is to store constants for 1.0e+20, 1.0e+40, 1.0e+60, and so on, up to 1.0e+140
then use the original 2 decades to create the others
5) once the upper half of the table is complete (and if required, corrections applied),
use the inverse of those values to generate the lower half of the table
then, apply corrections to the lower portion, if required
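The five steps above, sketched with exact integers standing in for the FPU so the bookkeeping is visible (in the real routine every REAL10 product gets rounded, which is where the corrections come from); `build_upper_table` is an illustrative name:

```python
def build_upper_table():
    table = [10 ** n for n in range(20)]   # steps 1-3: 1.0e+0 .. 1.0e+19
    e20 = 10 * table[19]                   # step 4a: 1.0e+1 * 1.0e+19
    block = table[:]
    while len(table) < 151:                # extend two decades at a time
        block = [v * e20 for v in block]   # previous block * 1.0e+20
        table.extend(block[:151 - len(table)])
    return table                           # 1.0e+0 .. 1.0e+150

upper = build_upper_table()
```

With integers every product is exact; with REAL10 each multiply can be off by an lsb, which is what steps 4-5 have to correct for.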
here is my current source
i will update it as each section is completed
both attachments moved to July 10th, 2009 09:12:17 PM post
Hi,
I threw together a quick and dirty comparison program.
Uses FPU to multiply by ten (FIMUL) and an extended
precision integer routine, with a shift to compare with the
mantissa of the float value. Just multiplying by tens works
exactly out to 10^27.
F424:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:
0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:
6 4012F424000000000000
CECB:8F27:F420:0F3A:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:
0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:
27 4058CECB8F27F4200F3A
88D8:762B:F324:CD0F:A588:0A69:FB6A:C800:0000:0000:0000:0000:0000:0000:0000:0000:
0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:
50 40A588D8762BF324CD11
9C69:A972:84B5:78D7:FF2A:7604:1453:6EFB:CA75:8CBF:4FBB:7447:65D2:CCCB:79B7:6532:
9A3A:BF23:CE4A:0095:A804:C648:0000:0000:0000:0000:0000:0000:0000:0000:0000:0000:
150 41F19C69A97284B578DA
FWIW.
I will try and get the display to make a little more sense,
maybe.
Regards,
Steve N.
that may be the way to fly - lol - i am having problems with the simplest of instructions
i want to do this:
fmul st,st(1) ;st(0) * st(1)
fst tbyte ptr [edi] ;store the result
the assembler gives me "invalid operand error 2007" for the fst instruction
trying things out, i changed it to fstp and it assembles and works, sort of - still not what i want
fmul st,st(1) ;st(0) * st(1)
fstp tbyte ptr [edi] ;store the result
fld tbyte ptr [edi] ;crazy, huh
that code works, kinda
the values stored in memory always have the last 2 bytes set to 0 (kind of like real8)
how do i store the st(0) value in 80-bit precision without popping it ?
ok - having read the intel instruction set reference, i see that fst only stores real4 and real8's
fstp, on the other hand, is supposed to store the extended precision format provided the operand is a tbyte
so my code should work ok as is - i can live with that
now - instead of re-loading the thing, i will make a copy in st(1) first, then store it
maybe that will alleviate my headache - lol
ok - got it - apparently, they are 0's - i let the loop run a few more times and the lower bits fill in
this does not make sense to me because the 80-bit extended real format has a 64-bit mantissa
that means that 18,446,744,073,709,551,615 is the largest value it can hold without rounding
ignoring this point for the time being and moving forward - lol
both attachments moved to July 10th, 2009 09:12:17 PM post
i also modified the program to generate a bin file, rather than looking at the table with a debugger
in a future update, i will probably define the entire table as data, then run a comparison test
Hi Dave,
48 bytes that may change your life :wink
Here is the table with the necessary corrections:
SHL ebx, 1
.if Carry?
SHL ebx, 1
.if Carry?
fmul 1.0000xxx1
.else
fmul 0.9999xxxx
.endif
.endif
Only 31 bits are valid, so the next dword must be loaded after 31 shifts.
01234567890123456789012345678901
dd 00000001000100000011100100010000b
dd 10000000000010010000110000000010b
dd 01011001100100001100000000000000b
dd 10110110000000011000011011110110b
dd 00001110000010000110110000001000b
dd 00000000001101100110000000000000b
dd 00000100000000000011001000000000b
dd 01000010010001010000000001100110b
dd 11010001011011000001110011001100b
dd 00001110001100000000000100100000b
dd 00000100000000001100000100010000b
dd 00001000010000000000000000000000b
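A sketch of how the correction bitstream above decodes, as a hypothetical Python decoder. It follows the convention of this post: a 1 flag bit means "correct this entry", and the next bit picks the 1.0000xxx1 or 0.9999xxxx multiplier (the later TestCurve listing uses the opposite polarity for the second bit). Only 31 bits of each dword are consumed before the next dword is loaded:

```python
def decode_corrections(dwords, count):
    # returns (table index, +1/-1) for every entry that needs a nudge
    bits = "".join(f"{d:032b}"[:31] for d in dwords)   # 31 valid bits per dword
    out, i = [], 0
    for n in range(count):
        flag, i = bits[i], i + 1                       # SHL ebx,1 -> carry?
        if flag == "1":
            direction, i = bits[i], i + 1              # second SHL picks the factor
            out.append((n, +1 if direction == "1" else -1))
    return out
```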
ok - the upper portion of the table is generated
the last value matches exactly that generated by a masm define
let me create the lower portion
both attachments moved to July 10th, 2009 09:12:17 PM post
corrections ? - what corrections - lol
or are you already done with it ?
ok - the first value and the last value look good with no corrections
verification time
the d/l in the above post is updated with a version that appears to generate the entire table correctly (at first glance)
now working on comparison tests
a few 1 and 2 lsb errors by comparing files
i don't understand your post, Jochen
Quote from: dedndave on July 10, 2009, 08:52:28 PM
corrections ? - what corrections - lol
or are you already done with it ?
Well, kind of - I generated an algo that compares (in REAL10 memory, not FPU) the last bit of the real thing against the last bit of the generated one. If they differ, it sets the current bit in a register, and the next bit to 1 if eax>0 and 0 if eax<0. But I just saw I forgot a tiny little step when the correction does not produce the desired result. Given that I am very tired, and Saturday is booked, you might get the good one only on Sunday. So you still have a chance to catch up, but don't neglect Z!
Z told me to go play
over time, i have learned not to argue with her - lol
(her "lappy" is all fixed up - she is having fun online at the moment)
in the upper portion of my table, i have 45 values that need 1 or 2 lsb correction (i.e. +/- 1 or 2)
if those corrections are applied prior to creating the lower portion, it appears it will be generated with no need for correction
before i correct anything, i am going to take a look at a few values and make sure masm is right and i am wrong
EDIT
it does not look like my routine generates the cumulative type errors i might have expected
for example, my value for 1.0e+48 is 1 lsb lower than that generated by masm (this is my first generated error, btw - 2 thru 47 are good)
that value is used to generate 1.0e+68, 1.0e+88, 1.0e+108, and so on
my generated values for 1.0e+88 and 1.0e+108 match those generated by masm
if i corrected that value as it was generated, it may or may not cause errors down the road - hard to know, really, without trying it
only 3 of my values are off by +/-2 - the other 42 errors are +/- 1 lsb
the type of pattern suggests that masm may be off on some and i may be off on others
if all the errors were mine, i would expect them to propagate into all the values that are based on those first-generated errors
also, the differences are flat 1 or 2 lsb errors
if my routine was generating cumulative errors, they would become more severe as generation progressed
what all this means is, i have to work harder now, to verify which are correct and which aren't - lol
i have to write a routine that will evaluate the binaries to at least a few digits more than those usable
then, see which table has the value closest to the desired value
i guess i could work it the other way and write a routine that generates the table
to a high degree of accuracy, round those results to nearest, then compare
i think the first method may be easier - lemme think on it
it is interesting to note that my ordinals are pretty good
i use my generated values for 1.0e+20, 1.0e+40, 1.0e+60, 1.0e+80, 1.0e+100, 1.0e+120, and 1.0e+140
to generate the 2-decade group that follows
of all those values, only the last one disagrees with masm (by 1 lsb)
both attachments moved to July 10th, 2009 09:12:17 PM post
a simple little change
i got it down to 35 errors in upper portion
all of them are +/- 1 lsb
by the way - i tested a few of the values - masm has the right answers, at least for the ones i tested
both files updated on this post
[attachment deleted by admin]
FORTRANS summarized it well. Powers of 10 will be exact only in the range of 10^0 to 10^19 for floats when converted back to decimal. They would however still be exact up to 10^27 before conversion to decimal, the 27 trailing bits all being 0 anyway if you write 10^27 in binary (90 bits required).
All other powers of 10 will thus lose some relative precision, equivalent to at least +/- 2^-63 in the REAL10 format. I believe that the best precision with the FPU would be obtainable using the procedure based on log2/antilog2 as suggested in a previous post. That would most probably be the approach used by the assembler when declaring such values in the .data section.
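As a reference against which any FPU-generated table can be checked, the correctly rounded REAL10 for 10^n can be computed with exact integer arithmetic (round-to-nearest-even, matching the FPU's default rounding); a Python sketch with an illustrative function name:

```python
def real10_pow10(n: int):
    # exact value of 10^n as a ratio num/den
    num, den = (10 ** n, 1) if n >= 0 else (1, 10 ** -n)
    # unbiased exponent e: 2^e <= num/den < 2^(e+1)
    e = num.bit_length() - den.bit_length()
    if (num << max(0, -e)) < (den << max(0, e)):
        e -= 1
    # 64-bit mantissa = (num/den) * 2^(63-e), rounded to nearest even
    shift = 63 - e
    sn = num << max(0, shift)
    sd = den << max(0, -shift)
    q, r = divmod(sn, sd)
    if 2 * r > sd or (2 * r == sd and q & 1):
        q += 1
    if q == 1 << 64:            # rounding carried out of the mantissa
        q, e = 1 << 63, e + 1
    return e + 16383, q         # (biased exponent, 64-bit mantissa)
```

For 1.0e+29 this reproduces exactly the bytes masm emits, so it can arbitrate the "masm vs my table" disagreements directly.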
hmmmmmmmmmmmm
log2 and antilog2
they sound tailor made for base conversion
i.e. what are we doing trying to mul and divide by 10 - lol
use logarithms to convert reals to decimal and vice versa
then, you don't need the table
loga X = logn X / logn a
should be bloody fast, too
Quote from: dedndave on July 10, 2009, 09:24:23 PM
the type of pattern suggests that masm may be off on some and i may be off on others
Good grief - I thought Masm was perfect :wink
Below my slightly adjusted table - I am down to 0 fails, i.e. generated and "original" table are identical. Have to leave the whole of today, tonight I want to see results, my friend :bg
.data
f2sMore REAL10 1.0000000000000000001
f2sLess REAL10 0.9999999999999999999
; 01234567890123456789012345678901
f2sMulTable\
dd 00000000000000001110011000000000b ; 0
dd 00000001001100011000000001000110b ; 4
dd 00000000110000000000000011000000b ; 8
dd 00000100001101011000000011100000b ; 12
dd 10000110010000001100000000000011b ; 16
dd 01100100000000000000000000000000b ; 20
dd 00000000011000000000000100000110b ; 24
dd 00100000011001011001000110010000b ; 28
dd 01110011000000000010001100000000b ; 32
dd 00010000011000000110000000001100b ; 36
dd 00010001001100000011000000000000b ; 40
TestCurve proc uses esi edi ebx
LOCAL f2sShCt:SDWORD, f2sShPos:DWORD, fail
mov eax, Str$(1) ; init calibrated table
mov f2sShPos, offset f2sMulTable
mov ebx, f2sMulTable
fld f2s0dot1 ; 0.1 ------- create multiplier table --------
fld f2sSeed ; 1.0e-144
mov ecx, ExRange*2-1
mov edx, offset f2sR10b ; generated table
mov edi, offset f2sR10 ; compare against the real thing
add edx, ExRange*2*10-2*10
add edi, ExRange*2*10-2*10
; 12345678901234567890123456789012
NumBits = 31
m2m f2sShCt, NumBits
and fail, 0
xor esi, esi
; SHL ebx, 1 ; carry set = correction needed
@@:
inc esi
fmul st, st(1) ; create new entry
fld st ; create a copy
fstp REAL10 ptr [edx] ; store & pop
mov eax, [edx]
sub eax, [edi]
; .if eax
; int 3
; .endif
SHL ebx, 1 ; carry set = correction needed
; int 3
.if Carry?
SHL ebx, 1
.if !Carry?
fld f2sMore
.else
fld f2sLess
.endif
fmul
fld st ; create a copy
fstp REAL10 ptr [edx] ; store & pop
mov eax, [edx]
sub eax, [edi]
.if eax
inc fail
mov eax, f2sShPos
sub eax, offset f2sMulTable
pushad
print str$(eax), 9
print str$(esi), 13, 10
popad
; int 3
.endif
dec f2sShCt
.endif
dec f2sShCt
.if Sign?
mov ebx, f2sShPos
add ebx, 4
mov f2sShPos, ebx
mov ebx, [ebx]
m2m f2sShCt, NumBits
.endif
sub edx, 10
sub edi, 10
dec ecx
jne @B
fstp st
fstp st
print Str$("Fail=%i", fail)
getkey
exit
ret
TestCurve endp
i was thinking of diddling the RC bits in the FPU to handle the offsets - lol
but, this logarithm thing has my curiosity up a bit
i may play with a logarithm routine to convert 64-bit integers to decimal strings
once you get that going on, you can use a similar technique to convert reals
i may take some play time, myself - lol
enjoy your day, Jochen
The attached table makes the values look exact in OllyDbg. I wonder, though, whether Masm codes bad values, or whether Olly displays good values badly. Any idea how to check that??
[attachment deleted by admin]
well, i have checked a few points manually
for the ones i checked, masm had the right stuff
here is an example of how i did them....
take the 64-bit mantissa and convert it to decimal, just as though it were a 64-bit integer
then, take the exponent, subtract 16383 (the bias), subtract 63 (63 trailing bits under the MSB)
that value tells you how many times to multipy or divide by 2
this is masm's value for 1.0e+29
E5 0B B9 36 D7 07 8F A1 5F 40
exponent
40 5F (=16479 decimal: 16479-16383-63=33 that means we need to multiply the mantissa by 2^33, if it were negative, then divide)
mantissa
A1 8F 07 D7 36 B9 0B E5
in decimal, that is
11,641,532,182,693,481,445
so to see the result, multiply
11,641,532,182,693,481,445 X 2^33
i use my 64-bit unsigned integer routine
then i made a quicky routine to multiply and divide decimal strings repeatedly by 2
slow and crude, but it is a basic 80-bit real to decimal ascii routine with no formatting
it is capable of displaying the precise evaluation of reals, even though many of the trailing digits are unusable
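The worked example for 1.0e+29 checks out; the same arithmetic with Python integers (the 64-bit-to-decimal step plus the 33 doublings):

```python
mantissa = 0xA18F07D736B90BE5   # bytes A1 8F 07 D7 36 B9 0B E5, as an integer
exponent = 0x405F               # 16479; 16479 - 16383 - 63 = 33
value = mantissa << 33          # multiply by 2^33, i.e. 33 doublings
error = 10 ** 29 - value        # distance from the intended value
```

The error comes out to 2684354560, well under half an ulp (the ulp here is 2^33 = 8589934592), so masm's value is indeed the nearest representable one.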
Now it gets philosophical: Do we want the correct values, or those that are needed to display correct results after multiplication...??
; Input for Masm ; display in OllyDbg hex in OllyDbg
e62a REAL10 1.000000000000000000043e-62 ; 1.0000000000000000001E-62, 3F31 83A3EEEE F9153E8A
e62b REAL10 1.000000000000000000042e-62 ; 9.9999999999999999990E-63, 3F31 83A3EEEE F9153E89
e62c REAL10 1.000000000000000000000e-62 ; 9.9999999999999999990E-63, 3F31 83A3EEEE F9153E89
I had the bright idea to launch WinDbg for comparison, but the coward stops at 16 digits precision for displaying an 80-bit float :tdown
we want the closest value, of course
i think a really good real2str routine would allow me to choose how many places i want to display
if i elect to see 20 decimal digits, then i should be prepared to see a lot of "9"s
if i set the length to 18, they should all go away
at 19, i think you may still see some "9"s
you might even provide a setting that calculates the "usable digits" and sets the length to that
you could have a routine where the digit count is set as a constant - so long as it remains set to one value, the routine spits em out that long
(rather than having to pass the length as a parameter each time)
of course, you could have the routine call under two different names and do both
when i want to insure they display that way, i add a small delta, then truncate - the routine could provide that function
another possibility is to have a rounding control, similar to the fpu rc bits
the deluxe has all of the above - lol
the point is, it is the duty of the display routine
we always want the value that is nearest the intended value, especially if it is to be used for calculation, later
remember, extended reals weren't meant to be used for display
they were made for intermediate calculations, to retain more precision than required for presentation
i am trying to think of an application where i would want to generate thousands upon thousands of these numbers
this is the app that would require the routine to be very fast
otherwise, we either don't need a super-fast routine, or we don't need that many digits
the only answer i can come up with would be a special scientific app
there may be many apps that may store the reals, but not display them (CAD, animation, etc) - no conversion needed for that
accounting apps aren't likely to see that many digits unless we are printing the check that the US gov't pays to Halliburton
if that check were off by 1 cent, the tax-payers might have a reason to get really upset
at any rate, the features mentioned above would make the routine much more useful
i think this is a case where features outweigh speed
of course, as assembler programmers, we want both - lol
i wanted to clean up my extended real evaluation program a bit and share it
i didn't spend a lot of time on it, but it works - lol - try not to be too critical of my code
before i post it, i am going to add the decimal point, sign, and clean up leading and trailing zeros - lol
i also want to detect QNaN's, SNaN's, infinity, and unsupported values
these values are defined a little differently for the extended real format, due to the explicit MSB
the information wasn't easy to find and i thought i might post this table for those who are interested...
(http://img34.imageshack.us/img34/5912/extreal.gif)
EDIT
notice that Signaling NaN's require at least one of the lower 62 bits to be set
this is not so for Quiet NaN's
EDIT
Quiet NaN's and Signaling NaN's may have the sign bit set to 1
EDIT
the value: ffff c0000000 00000000 is an "Indefinite"
a special case of the set of Quiet NaN values
EDIT
all other values are "Invalid"
one thing i have not been able to find out
perhaps someone out there knows the answer (hint, hint Raymond)
is it possible for NaN's to be negative ? - or are those invalid values
i suppose i could try loading one and multiply it, huh
Here's what I have in the tutorial. I most probably researched it at that time.
Quote: Apart from the INFINITY and INDEFINITE values which can be generated by the FPU, there is a very large number of other NANs with all the possible permutations of fraction bits and sign bit being set to 1 when all the bits in the exponent field are set to 1.
ok Ray - thank you for helping out - i have modified the post above to clarify a few values
here is something for you to chew on, Jochen - lol
Quote: we want the closest value, of course
now - how do we define "closest"
i think i know the answer, but it may be a matter of opinion/interpretation, as well
the application could affect what "the right answer" is, too
let's say we have a hypothetical numbering system
we want to represent the value 100.000000
in this numbering system, the values closest to 100.000000 are 99.999000 and 100.001000
at first glance, you might say either answer will work equally well
|100.001000-100.000000| = 0.001
|100.000000-99.999000| = 0.001
BUT !
100.001000/100.000000 = 1.00001
100.000000/99.999000 = 1.000010000100001.....
it is easy to see that, in intel extended reals, the closest value algebraically
may be one value, while the closest value geometrically may be the other
i think the geometrically closest value is the right answer in most applications
in accounting, however, the algebraically closest may be correct
this poses a problem to us of a different nature
finding the geometrically closest value requires division, which is slower
we need Ray, again - lol
When you start comparing apples, you have to continue comparing apples.
100.001000/100.000000 and 100.000000/99.999000 both have effectively 8 significant digits.
If you want to compare the results of divisions 1.00001000 to 1.000010000100001....., you should retain only 8 significant digits, thus comparing 1.00001000 to 1.00001000 which are identical in this example.
well - i chose a bad example, perhaps
the same conclusions could be drawn from 100.999, 101.000, and 101.001
Quote from: dedndave on July 13, 2009, 10:44:58 PM
here is something for you to chew on, Jochen - lol
Quote: we want the closest value, of course
now - how do we define "closest"
That depends on the context, I guess. For an everyday routine like print Str$(1/3) (see my update, using the new algo for generating the multiplier table (http://www.masm32.com/board/index.php?topic=11781.msg89751#msg89751)), the rule should be "choose the one that yields the most usable results". For climate change modelling, you will simply choose an algo with a defined precision, e.g. 128 bits - slow and bulky but as precise as you need it. Here is an example of how IBM tested higher precision (http://svn.python.org/projects/sandbox/trunk/decimal-c/dt2/log10.decTest).
By the way, I found it difficult to google up exact values for constants such as PI = 3.14159265358979323846
Does Google have a wildcard option??
i googled "precise pi"
found this interesting goody on codeproject - Pi = 4 * Atn(1)
anyways, i found several links by googling "precise constant tables" or "precise constant table pi"
this is the wiki page - the reference material at the bottom has lots of nice links...
http://en.wikipedia.org/wiki/Mathematical_constant
http://mathworld.wolfram.com/topics/Constants.html
as you said - context - in most applications, intermediate values eventually get multiplied or divided
in such cases, the geometrically closest value is the one to have
of course, when you have 64 bits of precision to begin with, +/-1 lsb is kind of splitting hairs
but, i was trying to approach it from a theoretical viewpoint
i can, however, see scientific applications where it might make a difference
these are the applications where extended reals are most valuable
i suppose, by use of a bit of trickery, extended-extended reals are possible
i.e. using 2 80-bit extended real values to perform the equivalent of 128-bit precision math (probably 127 bit equivalent at best)
in these cases, the correct result is a bit more important
of course, they can make any precision system they want, outside the intel 86/87 framework
for my own applications, i have many times performed calculations using high-precision,
then used single-precision final results
in electronics, very few instruments are capable of displaying values beyond 4 or 5 digits
frequency counters are one exception - i have used counters with 12 or 13 digits - and they were pretty damn close to accurate, too
(time and frequency are two measurements on which we spend much toil - lol)
components with electrical tolerances of +/- 0.1% are considered to be very good (resistors, etc)
so, to calculate the value of a resistor with 64 bit precision is a waste of time - lol
if i cut a board to within +/- 1/32", i am doing pretty well - it will likely swell and shrink with temperature and humidity more than that
with dial calipers, i can measure a small item to about .00025" accuracy
electron beam microscopes can get down to very tiny resolutions, i dunno anyone who has one in their pocket
for many scientific needs, we rely on the National Institute of Standards and Technology for accurate measurements...
http://physics.nist.gov/cuu/
here you go, Jochen - an updated version of what i was using - lol - very handy
Numerical Computation Guide
http://docs.sun.com/app/docs/doc/819-3693
direct link
http://dlc.sun.com/pdf/819-3693/819-3693.pdf
Can I ask what the aim of these tables is? Apologies if it is obvious.
Jochen (jj2007) is writing a floating point library
he wants the table for float to string conversion
this particular table is for extended reals (80-bit)
the table may be defined as data, but it takes up 3,010 bytes
he was looking for a simple way to generate the table during library initialization and make the library a little smaller
i got the code+data down to 196 bytes - he is at 188 - lol
but, the last change i made makes the routine take twice as long
i am going to try a different approach
Quote from: dedndave on July 16, 2009, 11:47:09 PM
he wants the table for float to string conversion
this particular table is for extended reals (80-bit)
Test it (http://www.masm32.com/board/index.php?topic=11781.msg89037#msg89037)
Jochen, I started a new thread with my version of the table generator.
I wanted to use it to try and resolve some timing issues....
http://www.masm32.com/board/index.php?topic=11908.msg90299#msg90299