The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: Gunther on October 25, 2010, 10:03:19 PM

Title: What is it worth?
Post by: Gunther on October 25, 2010, 10:03:19 PM
The IBM super computer at the Los Alamos laboratory in New Mexico is one of the fastest machines in the world. It is used there for nuclear weapons calculations and the simulation of nuclear tests. The machine delivers approximately 1 peta flop. Peta stands for 10^15, and flop stands for floating point operations per second. Twenty years ago the Cray 2 was the fastest machine, with 1 giga flop. From 2001 to 2004, the Japanese Earth Simulator in Yokohama was the fastest machine, with 37 tera flops. Tera stands for 10^12.

I'll try to illustrate these numbers a bit. Let's start with the giga flop machines; an ordinary PC with an average Intel or AMD processor can reach that range. We can try to print out the numbers which a giga flop computer produces in a second. If we use small letters, we can print 100 rows and 5 columns (500 numbers) on one side of a normal printer sheet. By using both sides, that gives 1000 numbers per sheet. A package of printer paper has a height of approximately 10 cm; 10 packages are 1 m. In 1 second, a giga flop computer produces enough numbers for a paper stack 100 m high; that's more than 1/4 of the height of the Empire State Building.

A tera flop computer is 1000 times faster. It produces a paper stack 100 km high in the same time. That's approximately the distance between New York and Philadelphia. If such a computer calculates for 1 hour (3600 seconds), it would lead to a paper stack of 360 000 km; that is approximately the distance between earth and moon.

A peta flop computer is again 1000 times faster. It will produce a paper stack of 100 000 km per second. In 25 minutes it gives a paper stack of 150 000 000 km. That is the distance between sun and earth. The floating point speed of modern computers has indeed reached astronomical dimensions.
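
The paper stack figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant SHEETS_PER_PACKAGE is an assumption (1000 sheets in a 10 cm package), chosen because it is what the 100 m figure implies, and the helper name stack_height_m is made up for the illustration.

```python
# Paper stack arithmetic from the post.
NUMBERS_PER_SHEET = 1000      # 500 numbers per side, both sides used
SHEETS_PER_PACKAGE = 1000     # assumption implied by the 100 m figure
PACKAGE_HEIGHT_M = 0.10       # one package is about 10 cm high


def stack_height_m(flops, seconds=1.0):
    """Height in metres of the printout of flops * seconds numbers."""
    sheets = flops * seconds / NUMBERS_PER_SHEET
    return sheets / SHEETS_PER_PACKAGE * PACKAGE_HEIGHT_M


giga_m = stack_height_m(1e9)                          # one second at 1 giga flop
tera_hour_km = stack_height_m(1e12, 3600) / 1000      # one hour at 1 tera flop
peta_25min_km = stack_height_m(1e15, 25 * 60) / 1000  # 25 minutes at 1 peta flop

print(f"giga flop, 1 s    : {giga_m:,.0f} m")
print(f"tera flop, 1 h    : {tera_hour_km:,.0f} km")
print(f"peta flop, 25 min : {peta_25min_km:,.0f} km")
```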

The Japanese Earth Simulator solves, for example, a system of linear equations with 1 000 000 variables in 1 000 000 equations in 5 hours. That's impressive. But how reliable are the results? The calculations are usually done with REAL 8 numbers (double), a 64 bit word, which gives approximately 16 decimal digits.

The attached test program adds the 5 elements of a double array to calculate the array sum. Speed isn't the main question here; we only want the right result (which is 137, as you can see without any computer). It's always the same vector, but with the elements in a different order. The results are mostly wrong. But that's not all. In the new 64 bit world (both Windows and Unix), practically all floating point operations are done with XMM registers; the old FPU has nothing to do there. That's very dangerous, because the FPU results of the test program are not as "false" as the XMM results. The reason is that the FPU uses its internal 80 bit format for calculations, which isn't possible with XMM registers.
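
The effect can be reproduced without the attachment. The following is not the attached program, just a minimal Python sketch: a Python float is an IEEE double, so the naive loop should mirror the XMM column. The six orderings are the ones listed later in this thread; naive_sum is a made-up helper name, and math.fsum is used as the reference because it tracks the low-order bits that plain addition throws away.

```python
import math

# Six orderings of the same five values (as listed later in the thread).
VECTORS = [
    [1.0e20, 17.0, -10.0, 130.0, -1.0e20],
    [1.0e20, -10.0, 130.0, -1.0e20, 17.0],
    [1.0e20, 17.0, -1.0e20, -10.0, 130.0],
    [1.0e20, -10.0, -1.0e20, 130.0, 17.0],
    [1.0e20, -1.0e20, 17.0, -10.0, 130.0],
    [1.0e20, 17.0, 130.0, -1.0e20, -10.0],
]


def naive_sum(values):
    """Add left to right, rounding to double after every addition.

    An explicit loop is used on purpose: the built-in sum() in recent
    CPython versions applies compensated summation to floats.
    """
    total = 0.0
    for v in values:
        total += v
    return total


naive = [naive_sum(v) for v in VECTORS]
exact = [math.fsum(v) for v in VECTORS]

for i, (n, e) in enumerate(zip(naive, exact), start=1):
    print(f"Sum {i} (naive double) = {n:7.2f}   Sum {i} (fsum) = {e:7.2f}")
```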

The program is written for gcc with a bit of inline assembly (Intel syntax). It shouldn't be too hard to implement it with VC.

Gunther
Title: Re: What is it worth?
Post by: Antariy on October 25, 2010, 10:16:17 PM
I got these results:

Sum 1 (FPU) = 136.00 Sum 1 (XMM) = 0.00
Sum 2 (FPU) = 137.00 Sum 2 (XMM) = 17.00
Sum 3 (FPU) = 136.00 Sum 3 (XMM) = 120.00
Sum 4 (FPU) = 139.00 Sum 4 (XMM) = 147.00
Sum 5 (FPU) = 137.00 Sum 5 (XMM) = 137.00
Sum 6 (FPU) = 134.00 Sum 6 (XMM) = -10.00

Right Sum   = 137.00




Alex
Title: Re: What is it worth?
Post by: Gunther on October 25, 2010, 10:20:24 PM
Alex,

yes, that's exactly the problem. If you have a look at the source, you'll find out that the results are mostly wrong. But now I have to check your pretty nice RAM utility.

Gunther
Title: Re: What is it worth?
Post by: Gunther on October 25, 2010, 10:42:05 PM
Are you joking, Alex? Is that all that VC can compute?

Gunther
Title: Re: What is it worth?
Post by: Antariy on October 25, 2010, 10:49:47 PM
Quote from: Gunther on October 25, 2010, 10:42:05 PM
Are you joking, Alex? Is that all that VC can compute?

Many apologies  :green2 The previous version was buggy. It's late and I'm tired.

Well, here is the corrected version.

Results:

Sum 1 (FPU) = 0.00 Sum 1 (XMM) = 0.00
Sum 2 (FPU) = 17.00 Sum 2 (XMM) = 17.00
Sum 3 (FPU) = 120.00 Sum 3 (XMM) = 120.00
Sum 4 (FPU) = 147.00 Sum 4 (XMM) = 147.00
Sum 5 (FPU) = 137.00 Sum 5 (XMM) = 137.00
Sum 6 (FPU) = -10.00 Sum 6 (XMM) = -10.00

Right Sum   = 137.00




Alex
Title: Re: What is it worth?
Post by: Antariy on October 25, 2010, 11:37:06 PM
In MSVC10.ZIP:

..........
    fstp    qword ptr [esp]    ; <--- really nice thing :) the store rounds the value to a 64-bit double
    fld     qword ptr [esp]    ; reload the already rounded value
    pop     ecx
..........

Title: Re: What is it worth?
Post by: raymond on October 26, 2010, 03:35:07 AM
Such results should certainly NOT be a surprise. They are entirely predictable. One should always be aware of the precision available with any instrument.

How would you like to try measuring accurately thicknesses of a few micrometers added onto a block of concrete with an ordinary ruler???

The FPU has a maximum precision of 64 bits, equivalent to some 19 decimal digits when used in extended double precision. The XMM uses only double precision with 53 bits of precision, equivalent to some 16 decimal digits. Thus, if you mix large numbers with 20+ decimal digits (as in this case 1e20) with small numbers, some precision loss is bound to happen, more of it with the less precise instruments. Try your program with 1e25 instead of 1e20 and the FPU won't fare any better than the XMM. :(
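
Both statements can be checked with a small simulation. This is a sketch in Python, not FPU code: since all the values are whole numbers, rounding to a 53-bit or 64-bit significand can be imitated on integers. The helpers round_to_bits, rounded_sum and orderings are made-up names, round-to-nearest-even is assumed, the exponent range is ignored, and the 64-bit case assumes the FPU really runs in extended precision mode.

```python
def round_to_bits(n, bits):
    """Round the integer n to `bits` significant bits, ties to even."""
    sign = -1 if n < 0 else 1
    m = abs(n)
    shift = m.bit_length() - bits
    if shift <= 0:
        return n                      # already fits, nothing is lost
    q, r = divmod(m, 1 << shift)
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return sign * (q << shift)


def rounded_sum(values, bits):
    """Left-to-right sum with rounding after loading and after each add."""
    total = 0
    for v in values:
        total = round_to_bits(total + round_to_bits(v, bits), bits)
    return total


def orderings(big):
    """The six orderings from the thread, with `big` in place of 1e20."""
    return [
        [big, 17, -10, 130, -big],
        [big, -10, 130, -big, 17],
        [big, 17, -big, -10, 130],
        [big, -10, -big, 130, 17],
        [big, -big, 17, -10, 130],
        [big, 17, 130, -big, -10],
    ]


for big in (10**20, 10**25):
    for bits in (53, 64):
        sums = [rounded_sum(v, bits) for v in orderings(big)]
        print(f"big = {big:.0e}, {bits} bits: {sums}")
```

With 10**20 the 64-bit run keeps some of the small terms because the spacing between neighbouring values is 8 there instead of 16384; with 10**25 the spacing is about a million even at 64 bits, so both precisions lose the small terms in the same way.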

Are you sure that the IBM super computer is not running a few giga flops faster than you reported? :bg

Edit: One thing I noticed was the size of the EXE which was some 4x larger than the source code. So much for bloating with these HLLs.
Title: Re: What is it worth?
Post by: Twister on October 26, 2010, 04:27:04 AM
What if we increase the precision using software instead of the cpu handling the whole calculation part?

I do remember someone talking about this but with strings. It could go up to 5.7697487348734872348734873482734892347289237483258752395728 x 10^456
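
As an illustration of doing the arithmetic in software: a minimal Python sketch using the standard fractions module, which works on exact rational numbers built from arbitrary-size integers. It is much slower than hardware floating point, and it is not the string-based approach mentioned above, only one possible way to get the idea across.

```python
from fractions import Fraction

values = [1.0e20, 17.0, -10.0, 130.0, -1.0e20]

# Each double is converted exactly; the additions are then exact as well,
# so the order of the operands no longer matters.
exact_sum = sum(Fraction(v) for v in values)
reversed_sum = sum(Fraction(v) for v in reversed(values))

# Integers grow as needed, so very large magnitudes are no problem either.
huge = Fraction(10) ** 456 + 1

print(exact_sum, reversed_sum)
print(len(str(huge.numerator)), "digits")
```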
Title: Re: What is it worth?
Post by: jj2007 on October 26, 2010, 07:49:48 AM
Quote from: raymond on October 26, 2010, 03:35:07 AM
The FPU has a maximum precision of 64 bits, equivalent to some 19 decimal digits when used in extended double precision. The XMM uses only double precision with 53 bits of precision, equivalent to some 16 decimal digits. Thus, if you mix large numbers with 20+ decimal digits (as in this case 1e20) with small numbers, some precision loss is bound to happen, more of it with the less precise instruments. Try your program with 1e25 instead of 1e20 and the FPU won't fare any better than the XMM. :(

Exactly. Here is what you get using the FPU:
Quote
FPU, Real8
Sum=128.0000
Sum=129.0000
Sum=136.0000
Sum=131.0000
Sum=137.0000
Sum=134.0000

Note that results are much closer to 137 than Alex' second VC version attached above (while the first version posted as reply #1 yields the same results as the MB code below - why is the second version less precise???). Most probably, VC uses only the 53 bit mode of the FPU.

Just for fun, I also added a version in which the variables are REAL10, but it does not change anything because fld V1 yields exactly the same value as fld R10. It is the subsequent steps that cheat the FPU, i.e. 1.0e20+17=1.00...2, -10.0=1.00....0 etc

Only V5 and R105 yield a correct result: 1.0e20+-1.0e20=0 (exactly), +17-10+130=137

include \masm32\MasmBasic\MasmBasic.inc
.data
V1 REAL8 1.0e20, 17.0, -10.0, 130.0, -1.0e20
V2 REAL8 1.0e20, -10.0, 130.0, -1.0e20, 17.0
V3 REAL8 1.0e20, 17.0, -1.0e20, -10.0, 130.0
V4 REAL8 1.0e20, -10.0, -1.0e20, 130.0, 17.0
V5 REAL8 1.0e20, -1.0e20, 17.0, -10.0, 130.0 ; this one yields the correct result
V6 REAL8 1.0e20, 17.0, 130.0, -1.0e20, -10.0

R101 REAL10 1.0e20, 17.0, -10.0, 130.0, -1.0e20
R102 REAL10 1.0e20, -10.0, 130.0, -1.0e20, 17.0
R103 REAL10 1.0e20, 17.0, -1.0e20, -10.0, 130.0
R104 REAL10 1.0e20, -10.0, -1.0e20, 130.0, 17.0
R105 REAL10 1.0e20, -1.0e20, 17.0, -10.0, 130.0
R106 REAL10 1.0e20, 17.0, 130.0, -1.0e20, -10.0

Init
Print "FPU, Real8", CrLf$
Print Str$("Sum=%f\n", V1+V1[8]+V1[16]+V1[24]+V1[32])
Print Str$("Sum=%f\n", V2+V2[8]+V2[16]+V2[24]+V2[32])
Print Str$("Sum=%f\n", V3+V3[8]+V3[16]+V3[24]+V3[32])
Print Str$("Sum=%f\n", V4+V4[8]+V4[16]+V4[24]+V4[32])
Print Str$("Sum=%f\n", V5+V5[8]+V5[16]+V5[24]+V5[32])
Print Str$("Sum=%f\n\n", V6+V6[8]+V6[16]+V6[24]+V6[32])

Print "FPU, Real10", CrLf$
Print Str$("Sum=%f\n", R101+R101[10]+R101[20]+R101[30]+R101[40])
Print Str$("Sum=%f\n", R102+R102[10]+R102[20]+R102[30]+R102[40])
Print Str$("Sum=%f\n", R103+R103[10]+R103[20]+R103[30]+R103[40])
Print Str$("Sum=%f\n", R104+R104[10]+R104[20]+R104[30]+R104[40])
Print Str$("Sum=%f\n", R105+R105[10]+R105[20]+R105[30]+R105[40])
Print Str$("Sum=%f\n\n", R106+R106[10]+R106[20]+R106[30]+R106[40])

Inkey Str$("Your puter has run %3f hours since the last boot, give it a break!", Timer()/3600000)
Exit
end start
Title: Re: What is it worth?
Post by: hutch-- on October 26, 2010, 08:49:23 AM
 :bg

BCD anyone ?
Title: Re: What is it worth?
Post by: vanjast on October 26, 2010, 12:12:59 PM
Quote from: hutch-- on October 26, 2010, 08:49:23 AM
:bg

BCD anyone ?
with adjustments...
Title: Re: What is it worth?
Post by: raymond on October 26, 2010, 04:48:12 PM
BCD is one way to go for best accuracy, limited only by the amount of memory available. Even then, the extent of fractional errors must be fully understood to estimate the accuracy of the least significant digits.

The advantage of BCD is that conversion to ascii is not a problem if the result must be displayed in readable form by the average human.
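
To show why the conversion is trivial, here is a minimal Python sketch for packed BCD (two digits per byte). The function name packed_bcd_to_ascii is made up, and the digit order (most significant byte first) is an assumption for the example; real formats such as the FPU's packed BCD store the bytes the other way round.

```python
def packed_bcd_to_ascii(data):
    """Convert packed BCD bytes (high nibble first) to a digit string."""
    digits = []
    for byte in data:
        hi, lo = byte >> 4, byte & 0x0F
        if hi > 9 or lo > 9:
            raise ValueError(f"invalid BCD byte: {byte:#04x}")
        # Each nibble maps directly to one ASCII digit, no division needed.
        digits.append(chr(ord("0") + hi))
        digits.append(chr(ord("0") + lo))
    return "".join(digits)


text = packed_bcd_to_ascii(bytes([0x01, 0x37]))
print(text)
```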
Title: Re: What is it worth?
Post by: Gunther on October 26, 2010, 08:39:55 PM
Raymond,

Quote from: raymond, October 26, 2010, at 05:48:12 PM
BCD is one way to go for best accuracy,

Right, it especially avoids conversion errors between the decimal and binary systems. But the usual BCD arithmetic won't help much with our problem. What's dangerous? For example, subtracting two numbers which are approximately equal will lead to a significant loss of accuracy. That's made worse by rounding after every operation. What would we need? A long accumulator to accumulate the interim results, with rounding only when the calculation is finished. We could round up and round down, and we would get the result inside an interval (the idea comes from interval arithmetic). That would save us a lot of numerical surprises.
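
The interval idea can be sketched with Python's decimal module, which lets each operation round in a chosen direction. This is only an illustration under assumptions: 16 decimal digits stand in for double precision, interval_sum is a made-up helper, and a real package would work in binary and in assembly.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR


def interval_sum(values, digits=16):
    """Return (lower, upper) bounds of the sum.

    The lower bound rounds every addition down, the upper bound rounds
    every addition up, so the true sum always lies in between.
    """
    down = Context(prec=digits, rounding=ROUND_FLOOR)
    up = Context(prec=digits, rounding=ROUND_CEILING)
    lo = hi = Decimal(0)
    for v in values:
        d = Decimal(v)            # exact conversion of the integer input
        lo = down.add(lo, d)
        hi = up.add(hi, d)
    return lo, hi


bad_order = [10**20, 17, -10, 130, -10**20]
good_order = [10**20, -10**20, 17, -10, 130]

lo_bad, hi_bad = interval_sum(bad_order)
lo_good, hi_good = interval_sum(good_order)
print("bad order :", lo_bad, "...", hi_bad)
print("good order:", lo_good, "...", hi_good)
```

A wide interval does not give the right sum, but it does warn that the result of that ordering cannot be trusted, while the narrow one confirms the result.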

Quote from: raymond, October 26, 2010, at 04:35:07 AM
Are you sure that the IBM super computer is not running a few giga flops faster than you reported

Maybe it runs faster, but that doesn't change anything. The question isn't speed, but accuracy.

Quote from: raymond, October 26, 2010, at 04:35:07 AM
One thing I noticed was the size of the EXE which was some 4x larger than the source code. So much for bloating with these HLLs.

Raymond, it's clear that a standalone assembly language application can beat every HLL implementation in size and speed. But is that really necessary for such a small test program? On the other hand, the program runs under Windows, Linux, and BSD without modifications, and 20 KB isn't really bloatware.

I'm trying to write a package for arbitrary-precision floating point operations (with a rounding interval); it will surely be written in assembly language. But I would need a bit of help.

Gunther
Title: Re: What is it worth?
Post by: Antariy on October 26, 2010, 09:34:45 PM
Quote from: raymond on October 26, 2010, 03:35:07 AM
Edit: One thing I noticed was the size of the EXE which was some 4x larger than the source code. So much for bloating with these HLLs.

:bg

Yes, agree.

The new version is attached; it is updated to compile with MSVC. The results are also slightly closer to the right answer, because I reinitialize the FPU in the main function. When optimization is on, the compiler tries to keep all parameters in registers, so even though MSVC does not support 80bit precision, we can get it manually, because no precision is lost as long as the numbers are not stored to memory. But this is not a good approach, of course; it's just a note.

The attached archive contains the updated source (© Gunther), where I inserted FINIT in the main function.

Also, the executable file is now smaller than the source  :bg

Results for this program:

Sum 1 (FPU) = 136.00 Sum 1 (XMM) = 0.00
Sum 2 (FPU) = 137.00 Sum 2 (XMM) = 17.00
Sum 3 (FPU) = 136.00 Sum 3 (XMM) = 120.00
Sum 4 (FPU) = 139.00 Sum 4 (XMM) = 147.00
Sum 5 (FPU) = 137.00 Sum 5 (XMM) = 137.00
Sum 6 (FPU) = 134.00 Sum 6 (XMM) = -10.00

Right Sum   = 137.00



The FPU results are "better", but this has no meaning at all - they are just close, still not right.



Alex
Title: Re: What is it worth?
Post by: dioxin on October 26, 2010, 09:36:41 PM
There's no reason binary can't give results just as exact as BCD.
The only advantage of BCD is the conversion to/from readable numbers, but there's a huge reduction in calculation speed when calculating in BCD, so for non-trivial calculations it's usually better to use binary for all calculations and just convert to readable digits at the end if needed.

Paul.
Title: Re: What is it worth?
Post by: Gunther on October 26, 2010, 10:00:45 PM
Paul,

Quote from: dioxin, October 26, 2010, at 10:36:41 PM
There's no reason binary can't give results just as exactly as BCD.

That's right for integer numbers. But you should try to convert 0.1 (decimal) into the binary system. That effect happens with 0.1 and most multiples of it.
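
A quick way to see what is actually stored for 0.1: a minimal Python sketch (Python floats are IEEE doubles). Decimal and Fraction both convert a float exactly, so they show the stored binary value rather than the literal that was typed.

```python
from decimal import Decimal
from fractions import Fraction

tenth = 0.1
stored = Fraction(tenth)         # exact value of the double nearest to 0.1

print(Decimal(tenth))            # the stored value written out in decimal
print(stored)                    # the same value as a ratio of integers
print(stored == Fraction(1, 10))         # False: 0.1 is not representable
print(0.1 + 0.2 == 0.3)                  # False: the errors do not cancel
print(Fraction(0.5) == Fraction(1, 2))   # True: 0.5 is a power of two
```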

Gunther
Title: Re: What is it worth?
Post by: Antariy on October 26, 2010, 10:11:10 PM
Quote from: Gunther on October 26, 2010, 10:00:45 PM
That's right for integer numbers. But you should try to convert 0.1 (decimal) into the binary system. That effect happens with 0.1 and most multiples of it.

Yes, for example, nice results can happen when you multiply something by such a number:

3FFB E3 8E 38 E3 8E 38 E3 8E


That is the binary form of 0.(1), i.e. 1/9, as an 80bit number.
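
The bytes above can be recomputed from 1/9. A Python sketch, assuming the usual 80-bit layout (15-bit exponent with bias 16383, 64-bit significand with an explicit leading 1) and round-to-nearest; encode_extended is a made-up name and handles positive normal values only.

```python
from fractions import Fraction

EXPONENT_BIAS = 16383


def encode_extended(x):
    """Return 'EEEE MMMMMMMMMMMMMMMM' (hex) for a positive Fraction x."""
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale so that the significand lies in [2**63, 2**64), then round.
    significand = round(x / Fraction(2) ** (e - 63))
    if significand == 1 << 64:       # rounding carried into a new bit
        significand >>= 1
        e += 1
    return f"{e + EXPONENT_BIAS:04X} {significand:016X}"


ninth = encode_extended(Fraction(1, 9))
print(ninth)
```

The repeating E3 8E 38 pattern is the hexadecimal view of the repeating binary fraction 0.000111000111..., just as 0.111... repeats in decimal.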




Alex
Title: Re: What is it worth?
Post by: Gunther on October 26, 2010, 10:16:16 PM
Here is some equipment, which I use  :bg

(http://www.masm32.com/board/index.php?action=dlattach;topic=15160.0;id=8303)

That's the MADAS (stands for Multiplication, Automatic Division, Addition and Subtraction) of the company Egli in Zurich from 1936. Please have a special look at the long accumulator at the top of the machine, to accumulate interim results without accuracy loss.

(http://www.masm32.com/board/index.php?action=dlattach;topic=15160.0;id=8304)

That's the Monroe, Model Monochromatic, produced by Monroe Calculating in Orange, New Jersey from 1956. It has a long accumulator, too.

With both of those machines I've calculated the right vector sum. Unfortunately, Antariy's new memory tool won't work with that equipment  :(.

But joking aside, these are old principles (the long accumulator) and should give us the idea.

Gunther
Title: Re: What is it worth?
Post by: Antariy on October 26, 2010, 10:21:51 PM
Quote from: Gunther on October 26, 2010, 10:16:16 PM
Here is some equipment, which I use  :bg
....
With both of those machines I've calculated the right vector sum. Unfortunately, Antariy's new memory tool won't work with that equipment  :(.

Well, that is only a question of instruction sets :green2

These machines have greater accuracy than 80bit FP.  :bg



Alex
Title: Re: What is it worth?
Post by: Gunther on October 26, 2010, 10:26:01 PM
Quote from: Antariy, October 26, 2010, at 11:21:51 PM
These machines have bigger accuracy than 80bit FP.

That's for sure, Alex.

Gunther
Title: Re: What is it worth?
Post by: dioxin on October 26, 2010, 10:32:21 PM
Gunther,
Quote
That's right for integer numbers. But you should try to convert 0.1 (decimal) into the binary system. That effect happens with 0.1 and most multiples of it.
The same applies when using BCD.
In binary 1/256 = 0.00000001 (1 byte after the point)
In decimal it's 0.00390625  (4 BCD bytes after the point)
If you're restricted to 1 byte after the point then binary gets it right and BCD fails.
And for calculation results, look at 1/3.
In binary 1/3 = 0.01010101 to 1 byte (an error of 0.39%)
In BCD 1/3 = 0.33  to 1 byte (an error of 1.01%)


1/3 is not representable in BCD either so why drop binary because it can't represent all numbers and then choose BCD which not only can't represent all numbers but on average represents them less accurately than binary and has the additional huge calculation overhead?
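
The one-byte comparison can be recomputed exactly. A Python sketch using exact fractions; truncate and rel_error are made-up helper names, one byte is taken as 8 binary digits or 2 BCD digits, and the error is measured relative to the truncated value, which appears to be how the percentages above were obtained.

```python
from fractions import Fraction


def truncate(x, base, digits):
    """Keep `digits` fractional digits of a non-negative x in `base`."""
    scale = base ** digits
    return Fraction(x.numerator * scale // x.denominator, scale)


def rel_error(x, approx):
    """Error of approx relative to approx itself."""
    return (x - approx) / approx


third = Fraction(1, 3)
third_binary = truncate(third, 2, 8)     # 0.01010101 in binary
third_bcd = truncate(third, 10, 2)       # 0.33 in decimal

print(f"binary: {float(rel_error(third, third_binary)) * 100:.2f}%")
print(f"BCD   : {float(rel_error(third, third_bcd)) * 100:.2f}%")

# 1/256 fits in one binary byte, but vanishes in two BCD digits.
print(truncate(Fraction(1, 256), 2, 8), truncate(Fraction(1, 256), 10, 2))
```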

It's not binary that is at fault.


And I didn't mean to just use binary as the FPU does. Use it as fixed point binary; after all, BCD is just a modified fixed point binary.

Paul.
Title: Re: What is it worth?
Post by: jj2007 on October 26, 2010, 10:37:02 PM
Quote from: Antariy on October 26, 2010, 09:34:45 PM
The attached archive contains the updated source (© Gunther), where I inserted FINIT in the main function.

Yep, that explains the differences. Note that you can also manipulate results by choosing the FPU rounding mode - depending on the values to sum up, results can differ quite a bit.
Title: Re: What is it worth?
Post by: Antariy on October 26, 2010, 10:47:07 PM
Quote from: jj2007 on October 26, 2010, 10:37:02 PM
Quote from: Antariy on October 26, 2010, 09:34:45 PM
The attached archive contains the updated source (© Gunther), where I inserted FINIT in the main function.

Yep, that explains the differences. Note that you can also manipulate results by choosing the FPU rounding mode - depending on the values to sum up, results can differ quite a bit.

The rounding mode matters for conversions of numbers.
But, roughly, more bits of precision give an opportunity to accumulate much bigger numbers. This has a drastic effect with division and roots, for example.

But, again, this is NOT a good approach, of course. Moreover, in a big project it makes the behavior of the program unpredictable.



Alex
Title: Re: What is it worth?
Post by: dioxin on October 26, 2010, 10:49:50 PM
Gunther,
both IEEE and Cray formats do include more accurate formats. IEEE defines Quadruple precision giving 34 decimal digits (although it needs to be implemented in software as the FPU doesn't support it) and Cray defines a Double precision giving about 30 decimal digits.
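
The decimal digit counts quoted in this thread follow from the number of significand bits. A small Python sketch; the bit counts 53, 64 and 113 are the significand widths of IEEE double, x87 extended and IEEE quadruple, and decimal_digits is a made-up helper name.

```python
import math


def decimal_digits(significand_bits):
    """Approximate number of decimal digits carried by a binary significand."""
    return significand_bits * math.log10(2)


FORMATS = [("IEEE double", 53), ("x87 extended", 64), ("IEEE quadruple", 113)]

for name, bits in FORMATS:
    print(f"{name:15s} {bits:3d} bits  ~ {decimal_digits(bits):.1f} decimal digits")
```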
It's up to the programmer to use a suitable format for the task.

Paul.
Title: Re: What is it worth?
Post by: raymond on October 26, 2010, 11:51:37 PM
Quote
stands for Multiplication, Automatic Division, Addition and Subtraction

It's also possible to extract square roots with those mechanical calculators. Have you ever learned how to do it? :wink
Title: Re: What is it worth?
Post by: Antariy on October 26, 2010, 11:54:02 PM
Quote from: jj2007 on October 26, 2010, 07:49:48 AM
Note that results are much closer to 137 than Alex' second VC version attached above (while the first version posted as reply #1 yields the same results as the MB code below - why is the second version less precise???). Most probably, VC uses only the 53 bit mode of the FPU.

Yes, Win32 MSVC uses 64bit FPs. This is a big discussion - just search on MS's site.

That is the reason why I added reinitialization of the FPU in the main function. After doing that, simple optimized MSVC code behaves like BCC (at least 5.0, which I have), which uses 80bit mode.



Alex
Title: Re: What is it worth?
Post by: dioxin on October 27, 2010, 09:21:56 AM
Quote
It's also possible to extract square roots with those mechanical calculators
I learned that at school when I was about 15. It's a lot of subtracts and shifts.
I then programmed a PET computer to calculate square roots in the same way using 256 digit registers in screen memory so you could see the calculation in progress; it took about 2 seconds to do the full 256 digit result.

The exact same principle can be used in binary to calculate square roots. It's slightly easier in binary, as each step either succeeds or fails and gives the next bit of the result, and you then move on to the next bit; in decimal, each digit may need a number of successive subtractions to yield the digit.
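
The binary version of that method can be sketched in a few lines of Python. This is the common shift-and-subtract integer square root, shown here as an illustration of the principle rather than as the PET program; isqrt_bitwise is a made-up name, and math.isqrt is used only to check it.

```python
import math


def isqrt_bitwise(n):
    """Integer square root, producing one result bit per step."""
    if n < 0:
        raise ValueError("square root of a negative number")
    bit = 1
    while bit * 4 <= n:          # highest power of four not above n
        bit *= 4
    result = 0
    while bit:
        if n >= result + bit:    # trial subtraction succeeds: bit is 1
            n -= result + bit
            result = (result >> 1) + bit
        else:                    # trial subtraction fails: bit is 0
            result >>= 1
        bit >>= 2
    return result


print(isqrt_bitwise(10**20), math.isqrt(10**20))
```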

Paul.
Title: Re: What is it worth?
Post by: MichaelW on October 27, 2010, 09:40:33 AM
The attachment contains a test based on Alan Miller's quadruple precision arithmetic module ( link (http://jblevins.org/mirror/amiller/#quad)). A FreeBASIC port of the module ( link (http://www.freebasic.net/forum/viewtopic.php?t=16531)), using the FPU version instead of the SSE2 version, produced virtually identical results. I'm not sure that the Fortran compiler does the additions in the order listed, but I know that the FreeBASIC compiler does.
Title: Re: What is it worth?
Post by: Gunther on October 27, 2010, 12:28:35 PM
Paul,

Quote from: dioxin, October 26, 2010, 11:32:21 pm
And I didn't mean to just use binary as the FPU does. Use it as fixed point binary; after all, BCD is just a modified fixed point binary.

That's clear, we're talking about fixed point arithmetic. But it's not natively supported by our processors. I won't argue against floating point arithmetic in binary. Floating point numbers are very important in scientific calculations. But everyone should know that in some cases the entire calculation breaks down. And the "modern" compiler design in the 64 bit world will lead to many more such breakdowns.

Quote from: dioxin, October 26, 2010, 11:32:21 pm
1/3 is not representable in BCD either so why drop binary because it can't represent all numbers and then choose BCD which not only can't represent all numbers but on average represents them less accurately than binary and has the additional huge calculation overhead?

Yes, and 1/7 is another example of that. These are infinite periodic decimal fractions and can't be represented exactly inside a finite state machine. But you know that 1/10, 2/10, 3/10, ... etc. give infinite periodic binary fractions and can't be represented exactly, either. Those fractions are very common in our decimal number system. Therefore BCD arithmetic is also very important. I don't have enough information about the situation overseas or in Australia, but in Europe 85 - 90% of financial sector software is written in COBOL with BCD arithmetic.
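
Why this matters for money can be shown with ten additions of 0.10. A Python sketch: the decimal module is software decimal arithmetic, not COBOL and not hardware BCD, but it behaves the same way for this purpose.

```python
from decimal import Decimal

float_total = 0.0
decimal_total = Decimal("0.00")

for _ in range(10):
    float_total += 0.1                  # binary double, rounded each time
    decimal_total += Decimal("0.10")    # decimal digits, exact for cents

print(float_total == 1.0)
print(decimal_total == Decimal("1.00"))
print(repr(float_total), decimal_total)
```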

Quote from: dioxin, October 26, 2010, 11:49:50 pm
both IEEE and Cray formats do include more accurate formats. IEEE defines Quadruple precision giving 34 decimal digits (although it needs to be implemented in software as the FPU doesn't support it) and Cray defines a Double precision giving about 30 decimal digits.

That's clear. Super computers have a lot of other data formats. But the problems with floating point arithmetic remain. We only have a tool that is like a comb with fewer and fewer teeth, and we try to run it through the hair.

Gunther