News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

pi program to calculate millions of digits fast

Started by Mark_Larson, October 11, 2006, 03:42:47 PM

Previous topic - Next topic

Wistrik

Mark,

No, I have standard 32-bit Windows XP Professional SP2. It's running on an AMD64 processor but not in native 64-bit mode.

Mark_Larson

#16
Quote from: Wistrik on October 12, 2006, 08:34:23 PM
Mark,

No, I have standard 32-bit Windows XP Professional SP2. It's running on an AMD64 processor but not in native 64-bit mode.

  Do you have a K7 or K8?  If you have a K7 you probably have the issue I mentioned earlier with the gcc compile switch.

EDIT: My bad Wistrik.  You said AMD64 and that's K8.  I was busy at work when I read your reply and missed that detail. It still works on my K8, so I am going to try the version attached to the forum in case it got corrupted.  If that's not it, I'll try to make a special debug version with no optimiations turned on to see if that is it.

EDIT2: It was the zip file being corrupted.  I didn't have a copy at work when I posted this, so I downloaded it from the pi-hacks newsgroup.  I just downloaded that version to my computer and got the same error as ya'll saw.  So I zipped up my current versions of chudp4 and chudk7.  It should work now.  Let me know if you have any problems.
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

EduardoS

Working now:

Processor Name: AMD Athlon(tm) 64 Processor 3200+
Processor Speed: 2000 MHz
Number of processors: 1
#terms=70513, depth=18
..................................................
total   time =  2.828
pi = ...


Note 1: Your 2.2GHz AMD needs 2.82 seconds, shouldn't it be a bit faster than my 2.0GHz?

Note 2: Why the output result is 0.31...e1 instead of 3.1...e0?

Wistrik

Mark,

ChudK7 now works on my home computer. Here's my output, clipped for brevity:

Processor Name: AMD Athlon(tm) 64 Processor 3200+
Processor Speed: 2000 MHz
Number of processors: 1
#terms=73938, depth=18
..................................................
total   time =  2.969
pi =
0.31415926535897932<snip>e1


Edit: I'm not running with anything overclocked, but I do have a WinXP processor driver from AMD that varies processor speed in proportion to load, so it tends to run slower (and cooler) when the computer is 'idling', and full speed when I'm doing something like playing media or running a game.


Mark_Larson

Quote from: Wistrik on October 12, 2006, 11:37:59 PM
Edit: I'm not running with anything overclocked, but I do have a WinXP processor driver from AMD that varies processor speed in proportion to load, so it tends to run slower (and cooler) when the computer is 'idling', and full speed when I'm doing something like playing media or running a game.

  COOL!!!  where can I get a copy!!?!?!? :)


Quote from: EduardoS on October 12, 2006, 11:02:21 PM
Note 1: Your 2.2GHz AMD needs 2.82 seconds, shouldn't it be a bit faster than my 2.0GHz?

There's gonna be variation from running it several times, as well as what you have running in the background.  I usually have a zillion windows open doing all sorts of stuff. 

Quote from: EduardoS on October 12, 2006, 11:02:21 PM
Note 2: Why the output result is 0.31...e1 instead of 3.1...e0?

  floating point notation.  That's the format I picked.  For consistency with C I should have gone with the second format.

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Wistrik

Mark,

Click here for AMD's Athlon64 utilities. At the top of the list is a dual-core optimizer I don't use, but you might find handy. Further down look for AMD Athlon™ 64/FX Processor Driver for Windows XP and Windows Server 2003 Version (x86 and x64 exe) 1.3.2.16 ; this is the driver I use.

You might find other utilities by first going to the information page for a particular processor, then clicking on any drivers links near the bottom. I did that by selecting the AMD Athlon64 from the list of processors.

Mark_Larson

 thanks my electric bill has been bad lately.  Anything to cut back on power :)  This is actually my first AMD processor.  I swore never to by an Intel processor again unless they made major changes to how they do things after I got a P4. 

  I am looking at alternate methods to calculate PI that might eventually lead to a faster solution than using Chud's formula.  One of the formulas I am looking at is using Sin() to compue PI.  You can check out this webpage ( search for "sin")

http://numbers.computation.free.fr/Constants/Pi/iterativePi.html

  It's a series to compute PI that you can add any number of digits of precision of PI per iteration.  Chud does 14 bits of PI per iteration.  You can extend the sin algorithm to use 100 digits of precision per iteration or a million.  It's completly flexible.

The coefficients in front of the sin in the formula can be easily calculated using a formula, thus making it easy to extend the equation.


n = n-th digit.

numerator = 1;
for ( int i = 1; i <= n; i++)
numerator *= (2*i - 1);

denominator = (2*n + 1);
for ( int i = 1; i <= n; i++)
denominator *= (2*i);


example of using 3 digits per iteration, 5, 7, and 9

3 digits

a    = a + sin(a )
k+1    k       k


5 digits
                       1
a    = a + sin(a ) +  --- * sin^3(a )
k+1    k       k       6           k


7 digits

                       1                  3
a    = a + sin(a ) +  --- * sin^3(a ) +  --- * sin^5(a )
k+1    k       k       6           k     40           k



9 digits
                       1                  3                  5
a    = a + sin(a ) +  --- * sin^3(a ) +  --- * sin^5(a ) + ---- * sin^7(a )
k+1    k       k       6           k     40           k     112          k




you'll notice in the above algorithms you are really only calculating one sin per iteration, since all of them are using sin(a )
                                                                                                                                                                           k

You can calculate sin using the following formula

            n^3   n^5   n^7   n^9
sin n = n - --- + --- - --- + ---
             3!    5!    7!    9!




sin is an expensive operation that is why no one has done this method before.  But what if we do just ONE iteration and calculate one sin, and have enough sin's in the formula to calculate the number of digits of precision of PI we are looking for?

each additional sin in the formula above adds 2 digits of precision.  you start with 1 digit of precision.  so if you wanted to calculate 1048576 digits of PI you would have 524,288 sins in your formula
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Wistrik

I run a P4 at work and I'm not impressed. My home system is an order of magnitude faster (perceived performance) even though the clock speed is similar. It helps that I have little to no hard drive bottleneck in my home system (dual 250Gb SATA drives in RAID 0).

Yes, the trigonometric functions are spendy. In my old Commodore 64 I took a shortcut with SIN (in 6502 Assembly) by creating a table of precalculated values, so instead of feeding the angle into SIN, I used it as an index into the table. I seem to recall that SIN repeats every 90 degrees or so, like the sign bit flips or something. It's been awhile. You can use that to create a smaller table.

Edit: I notice the website author mentions accuracy with trig functions. I thought about that while writing the above. The problem is that high level languages only take their math functions out to so many digits of precision, so if you're striving for millions of digits, inaccuracy might sneak in while using the HLL functions. Or maybe I'm barking up the wrong tree...

Mark_Larson

Quote from: Wistrik on October 13, 2006, 03:05:19 PM
Yes, the trigonometric functions are spendy. In my old Commodore 64 I took a shortcut with SIN (in 6502 Assembly) by creating a table of precalculated values, so instead of feeding the angle into SIN, I used it as an index into the table. I seem to recall that SIN repeats every 90 degrees or so, like the sign bit flips or something. It's been awhile. You can use that to create a smaller table.

  The values for us passed to SIN will range  between 0 and PI, because you are passing the current computed value of PI into the SIN function.

Quote from: Wistrik on October 13, 2006, 03:05:19 PM
Edit: I notice the website author mentions accuracy with trig functions. I thought about that while writing the above. The problem is that high level languages only take their math functions out to so many digits of precision, so if you're striving for millions of digits, inaccuracy might sneak in while using the HLL functions. Or maybe I'm barking up the wrong tree...

  sorry I didn't go into more detail with the SIN function above.  It's actually an infinite series that you can run as many times as you need to get as many digits of precision you need.  So if we were computing PI to a million digits, we'd want a million digits of precision for the SIN.  To be able to do the above formula you have to be able to support large integer arithmetic ( which I already have for my PI code).  So I have routines that can do multiplications, divisions, subtract, addition, etc all on numbers that are millions of digits long.

http://mathworld.wolfram.com/images/equations/Sine/equation4.gif

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

EduardoS

Hi Mark,
I try it on a K-7, it crashes after printing the time and "pi =", i guess it is due to the compiler-generated K-8 specific code (SSE2) you talk about.

A little off-topic: How long your math lib take to calculate multiply and square root of a 1 million decimals digits number?

Mark_Larson

Quote from: EduardoS on October 13, 2006, 10:18:15 PM
Hi Mark,
I try it on a K-7, it crashes after printing the time and "pi =", i guess it is due to the compiler-generated K-8 specific code (SSE2) you talk about.

   I'll get to it when I get a free moment.  I am going to make 2 programs chudk7 ( uses k7 compiler switch) and chudk8 ( uses k8 compiler switch) and time them on my K8, if there isn't a big performance difference, I am simply gonna use the k7 version.

Quote from: EduardoS on October 13, 2006, 10:18:15 PM
A little off-topic: How long your math lib take to calculate multiply and square root of a 1 million decimals digits number?

  I don't have the code in front of me, but I had done timings on a 1,048,576 digit number times a 1,048,576 digit number, and it was taking 8.59 milliseconds.

  I am looking at doing some stuff to speed it up.
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

EduardoS

Quote from: Mark_Larson on October 16, 2006, 03:13:22 PM
  I don't have the code in front of me, but I had done timings on a 1,048,576 digit number times a 1,048,576 digit number, and it was taking 8.59 milliseconds.
That's very fast, with a also fast square root will blow up your chudk7.exe :bdg