News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Dear Math and FPU-Experts - help needed!

Started by cobold, August 16, 2008, 03:47:03 AM

Previous topic - Next topic

Mark_Larson

Quote from: raymond on August 26, 2008, 02:05:38 AM
codewarp mentionned to "think outside the box".

Although Fibonacci numbers may contain an infinite number of digits, the problem specifies to find the first one where the last 9 digits (least significant) are pandigital AND the first 9 digits (most significant) are also pandigital. Therefore, there is no necessity to know ALL the digits in between.

The last 9 digits (least significant) are easy to compute in numerous ways. It's evident that adding binary numbers would be the fastest. However, each number must also be checked for pandigitality. Converting a 9-digit binary number to a decimal string may thus be significantly slower than using the slower BCDs for the additions of at most 9 digits (unless someone knows how to test for pandigitality without converting to decimals).

Adding packed BCDs is also faster than adding the same number of digits using unpacked BCDs. However, unless a fast algo is developed to check for pandigitality of packed BCDs, the unpacking process may exceed the time savings of the additions.

As for extracting the first 9 decimal digits of Fibonacci numbers generated from binary additions, I am still waiting to see valid code which will not need to compute the entire number. In my original program, I retained only the most significant 18 digits. When the addition overflowed to 19 digits, I would check the value of the least significant of those digits (of both numbers) and use it to round the remaining digits before dropping those last digits. (N.B. For this specific problem, retaining 15 digits is the minimum if rounding is done properly.)

Finally, there is no need to check the pandigitality of both the first and last 9 digits of each Fibonacci number. If one of them is not pandigital, no use checking the other.

As for my data section,
- The tailsrc/taildes are the addresses of the source and destination buffers for the last 9 digits. The content of the source is added to the content of the destination, overwriting the latter. The addresses are interchanged before computing each Fibonacci number. (The same applies for the headscr/headdes variables related to the first 9 digits.)
- The headbuf/headbuf1 buffers are for computing the first 9 digits using up to 19 digits. One of them becomes the source, the other one becomes the destination. The tailbuf/tailbuf1 buffers are for computing the last 9 digits. (The extra declared bytes are mainly for alignment and/or simplified code.)
- fibnum is the Fibonacci number of the data currently in the destination buffers.
- dest0 is to hold the TickCount at the start of the program.
- bytecnt is to hold the digit count for the source buffer of the last 9 digits until it exceeds 9 (a copy of the digits is then made to the buffers for the first 9 digits), and then to hold the digit count for the source buffer of the first 9 digits until the end of the program.

The improved program in a few days!!!

*kicks you*  now I've been sitting here all morning trying to figure it out.  my numbers are backwards form yours.  My least significant is the front of the buffer.
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

raymond

QuoteMy least significant is the front of the buffer.

For this type of application, I keep the digits in reverse order (i.e. little-endian or least significant at lower address). As indicated in my tutorial,
"The reverse order has the advantage that the number can be increased without the need to shift all the digits when a new most significant digit becomes necessary; it simply gets added in the next available byte."

However, for the most significant "9 digits", I still had to shift the 18 most significant digits after rounding when there was overflow to 19 digits. I didn't want to exceed the 24-byte buffers.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

Mark_Larson

Quote from: raymond on August 26, 2008, 08:30:13 PM
QuoteMy least significant is the front of the buffer.

For this type of application, I keep the digits in reverse order (i.e. little-endian or least significant at lower address). As indicated in my tutorial,
"The reverse order has the advantage that the number can be increased without the need to shift all the digits when a new most significant digit becomes necessary; it simply gets added in the next available byte."

However, for the most significant "9 digits", I still had to shift the 18 most significant digits after rounding when there was overflow to 19 digits. I didn't want to exceed the 24-byte buffers.


I guess there was some confusion.  I also keep my least significant in the lowest.  I thought you were saying the reverse.  The number in my buffer increases to the RIGHT in the array.  The array starts at offset 0.  So offset 0 in the array is also where the head pointer is.  You were saying that was where your tail pointer was.  that is why my pointers in my loops increase.  Do yours decrease?

Mark
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

dsouza123

Timings of FibB06_g.asm from an earlier post.

Using Process Explorer, XP, Athlon 1.2 Ghz @ 1.172

modifying the loop exit condition

from

      inc fib
   .until ((last9 == 1) && (first9 == 1))

k = 329468  0.140 seconds

to

      inc fib
   .until (fib == 5000000)

k = 5000000  2.022 seconds

fib is the variable that tracks the iteration in the fibonacci sequence aka fibonacci(k)

It uses extended precision floating point, real10, for both the first9 and last9

The last9 are kept to below 1 000 000 000.0, exported to packed BCD,
only 5 of the bytes are converted to unpacked BCD, 10 bytes
the first byte is given a value of 10 which evaluates to 0 on the lookup table,
tested for pandigital

I tried to keep the first9 to about 1.0e20 a routine brings it down to below 1.0e18
exported to packed BCD, using 9 bytes, converted to unpacked BCD, 18 bytes
tested for pandigital.

A possible optimization could be the last9 using a dword, loaded to real8,
exported to packed BCD, unpacked etc.

Using a dword to hold last9 would involve far less instructions
for both adding and keeping it under 1 billion.


   mov eax, yy
   add zz, eax



   mov eax, 1000 000 000
   .if zz >= eax
      sub zz, eax
   .endif


Load it into floating point, export to packed BCD

   fild  dword ptr zz
   fbstp pbcd

raymond

QuoteDo yours decrease?

tailsrc, taildes  headsrc and headdes all hold pointers to offset 0 of the respective buffers. The two "tail" pointers are interchanged after each Fibonacci number as well as the two "head" pointers. I used "tail" to refer to the last 9 digits (the tail end of Fibonacci numbers) and "head" to refer to the first 18 most significant digits.

(The registers loaded with those pointers obviously increase during the additions.)

Quotek = 329468  0.140 seconds

Sounds great. I'll be looking at your code shortly. Did not have time yet today.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

drizz

Hi all!

I hope i'm not too late for the party  :P
I'm using Binet's formula for the first 9 digits, the last 9 are normally computed.
this test checks for K up to 999999999, i've verified only 2nd found result K=16999467 with Mathematica.

[attachment deleted by admin]
The truth cannot be learned ... it can only be recognized.

raymond

I ran a few tests to verify some of the assumptions and ideas.

First, I found it is slightly faster to add 5 packed decimals and unpack them as opposed to adding 9 unpacked decimals.

I also tested the relative speed of some of the FPU instructions. Running them under similar conditions provided the following timings for 1,000,000 iterations.

fadd  <1 ms
fmul  <1 ms
fdiv  15 ms
fbstp 265 ms (which would mean at least 80 ms for 325,000 iterations)

As expected, the fbstp is VERY slowwww. However, combined with fast additions on the FPU, it did prove to be faster than adding BCDs.

I added a timer (only to get a timing on my box) to dsouza123's latest algo _g which uses the FPU for both the first and last 9 digits and it ran in 170 ms on my old P4. I then rewrote it for better efficiency and my timing came down to 125 ms.

----------------------------

Fibonacci numbers are closely related to the "golden ratio", generally represented by the greek letter phi. That golden ratio is computed as follows:
phi = (1+sqrt(5))/2

Fibonacci numbers can then be computed accurately using Binet's formula (referred to by drizz) as follows:
F(k) = round(phi^k/sqrt(5))

Note: For those interested in learning more about Fibonacci numbers and the golden ratio, you may want to look at:
http://mathworld.wolfram.com/FibonacciNumber.html

The number of digits of Fibonacci numbers will exceed the capacity of signed 64 bits with F(93). The accuracy of the last digits would thereafter be lost and the equation is not useful for keeping track of the last 9 digits for larger numbers.

However, the equation can be used to compute the first 9 digits very accurately with the use of logs. This is how I could reduce the timing from 0.7 sec to 0.25 sec with my original algo. Inserting such a procedure for the first 9 digits with dsouza123's optimized algo using the FPU for the last 9 digits, the timing dropped from 125 ms to 110 ms.
That gain in speed is not enormous because the log and antilog FPU instructions are also slow ones, but slightly faster than additions requiring occasional adjustments to maintain the sums below 10^18.

A few more ms could possibly be cut with additinal tweaking but the major hurdle will always remain the fbstp instruction.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

herge

 Hi drizz:

Re Fibfun.asm
It needs the files:

  \masm32\include\stdio.inc
  \masm32\include\asmlib.inc

Regards herge
// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

Mark_Larson

Quote from: raymond on August 29, 2008, 01:51:15 AM
I ran a few tests to verify some of the assumptions and ideas.

First, I found it is slightly faster to add 5 packed decimals and unpack them as opposed to adding 9 unpacked decimals.

I also tested the relative speed of some of the FPU instructions. Running them under similar conditions provided the following timings for 1,000,000 iterations.

fadd  <1 ms
fmul  <1 ms
fdiv  15 ms
fbstp 265 ms (which would mean at least 80 ms for 325,000 iterations)

As expected, the fbstp is VERY slowwww. However, combined with fast additions on the FPU, it did prove to be faster than adding BCDs.

I added a timer (only to get a timing on my box) to dsouza123's latest algo _g which uses the FPU for both the first and last 9 digits and it ran in 170 ms on my old P4. I then rewrote it for better efficiency and my timing came down to 125 ms.

----------------------------

Fibonacci numbers are closely related to the "golden ratio", generally represented by the greek letter phi. That golden ratio is computed as follows:
phi = (1+sqrt(5))/2

Fibonacci numbers can then be computed accurately using Binet's formula (referred to by drizz) as follows:
F(k) = round(phi^k/sqrt(5))

Note: For those interested in learning more about Fibonacci numbers and the golden ratio, you may want to look at:
http://mathworld.wolfram.com/FibonacciNumber.html

The number of digits of Fibonacci numbers will exceed the capacity of signed 64 bits with F(93). The accuracy of the last digits would thereafter be lost and the equation is not useful for keeping track of the last 9 digits for larger numbers.

However, the equation can be used to compute the first 9 digits very accurately with the use of logs. This is how I could reduce the timing from 0.7 sec to 0.25 sec with my original algo. Inserting such a procedure for the first 9 digits with dsouza123's optimized algo using the FPU for the last 9 digits, the timing dropped from 125 ms to 110 ms.
That gain in speed is not enormous because the log and antilog FPU instructions are also slow ones, but slightly faster than additions requiring occasional adjustments to maintain the sums below 10^18.

A few more ms could possibly be cut with additinal tweaking but the major hurdle will always remain the fbstp instruction.


since you got an answer and it runs under 1 second, do you want to see if we can make it faster?
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

drizz

Quote from: herge on August 31, 2008, 05:10:00 PM
Re Fibfun.asm
It needs the files:

  \masm32\include\stdio.inc
  \masm32\include\asmlib.inc
I'm using japheth's include files : http://www.japheth.de/Win32Inc.html
asmlib is just my lib renamed.

[attachment deleted by admin]
The truth cannot be learned ... it can only be recognized.

raymond

Quotethe timing dropped from 125 ms to 110 ms.
A few more ms could possibly be cut with additinal tweaking

Actually, quite a few more ms have been cut after having a look at drizz' algo. It's now down to 15 ms.

The major improvement was to minimize the use of the fbstp by first checking if those last 9 digits were a multiple of 9 (which is necessary for them to be pandigital). This probably bypassed the pandigitality test some 90% of the time.

Nice work drizz. :clap: :clap:
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

raymond

Quotesince you got an answer and it runs under 1 second, do you want to see if we can make it faster?

I'm certain that the 15 ms is not the ultimate lower limit.

However, I would not have taken a second look at my initial program if it had not been for that plea for help from cobold which started this thread. I must thus admit that I have learned a few more things in the process.

And I'm sure that cobold never expected to see such a low timing.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

cobold

Quote from: raymond on September 01, 2008, 04:33:50 AM
However, I would not have taken a second look at my initial program if it had not been for that plea for help from cobold which started this thread. I must thus admit that I have learned a few more things in the process.
Frankly, I have learned a lot from this thread! So let me say 'thank you' to every contributor - especially to Raymond. It was a complete surprise to me that it's possible to solve the problem without calculating the whole numbers (which I did - thats one of the reasons why my program needs 3 minutes + to solve the enigma, the other one being my kind of testing for pandigital)

Quote from: raymond on September 01, 2008, 04:33:50 AM
And I'm sure that cobold never expected to see such a low timing.
Yes, you are absolutely right! I felt that it must be possible much faster than 3 minutes, but I NEVER expected a time under 1 second!
Results like this one are only possible if one has a good/deep knowlegde BOTH of the problem at hand AND his programming-language

V Coder

Hey, drizz, re Fibfun,

I downloaded and ran your program on a Core 2 Duo laptop. It gave a long list of
Fk Found! K=329468, First9="245681739", Last9="352786941", Ticks=31
Fk Found! K=16999467, First9="139428765", Last9="346192578", Ticks=390
Fk Found! K=64284481, First9="435912768", Last9="725943681", Ticks=1092
Fk Found! K=80516961, First9="213798456", Last9="234517986", Ticks=375
Fk Found! K=88730553, First9="346785291", Last9="513467298", Ticks=187
...
...
Fk Found! K=998229590, First9="123685497", Last9="639812745", Ticks=46

Fib test finished. Tested up to K=999999999


But your calculated approximations to the first digits are apparently wrong.

I carried my test to the same Fibonacci iteration limit but keeping at least 24 of the leading and and trailing digits. Actually, when 32 digits were reached, I purged 8 and continued the additions. In 98 seconds I got the list with both ends having pandigitals.

However, while K=329468, and K=16999467, match on both lists, many others did not.

** Commenced Fibonacci Series Generator at 02:34:23 on Nov 30, 2010
   Processing from initial number 1
   Pentium 4 Code.
   Both pandigital, Iteration = 329468  Digits = 68855
   Both pandigital, Iteration = 6496145  Digits = 1357614
   Both pandigital, Iteration = 6677673  Digits = 1395551
   Both pandigital, Iteration = 6811853  Digits = 1423593
   Both pandigital, Iteration = 16999467  Digits = 3552679
   Both pandigital, Iteration = 21552527  Digits = 4504212
   Both pandigital, Iteration = 25305770  Digits = 5288593
   Both pandigital, Iteration = 32318407  Digits = 6754148
   Both pandigital, Iteration = 50855937  Digits = 10628262
   Both pandigital, Iteration = 51855023  Digits = 10837059
   Both pandigital, Iteration = 65090237  Digits = 13603055
   Both pandigital, Iteration = 66974956  Digits = 13996938
   Both pandigital, Iteration = 79845893  Digits = 16686805
   Both pandigital, Iteration = 80516961  Digits = 16827050
   Both pandigital, Iteration = 88730553  Digits = 18543589
   Both pandigital, Iteration = 102043269  Digits = 21325782
   Both pandigital, Iteration = 127093394  Digits = 26560949
   Both pandigital, Iteration = 131820039  Digits = 27548759
   Both pandigital, Iteration = 134140880  Digits = 28033786
   Both pandigital, Iteration = 142201377  Digits = 29718330
   Both pandigital, Iteration = 147049461  Digits = 30731520
   Both pandigital, Iteration = 153131825  Digits = 32002659
   Both pandigital, Iteration = 160510990  Digits = 33544813
   Both pandigital, Iteration = 161078182  Digits = 33663349
   Both pandigital, Iteration = 185914377  Digits = 38853807
   Both pandigital, Iteration = 202034202  Digits = 42222651
   Both pandigital, Iteration = 203177632  Digits = 42461614
   Both pandigital, Iteration = 204979604  Digits = 42838204
   Both pandigital, Iteration = 221509963  Digits = 46292845
   Both pandigital, Iteration = 233026833  Digits = 48699728
   Both pandigital, Iteration = 258238420  Digits = 53968638
   Both pandigital, Iteration = 264978993  Digits = 55377335
   Both pandigital, Iteration = 267543737  Digits = 55913334
   Both pandigital, Iteration = 281186547  Digits = 58764513
   Both pandigital, Iteration = 283112441  Digits = 59167001
   Both pandigital, Iteration = 294180323  Digits = 61480052
   Both pandigital, Iteration = 317000777  Digits = 66249244
   Both pandigital, Iteration = 323453494  Digits = 67597783
   Both pandigital, Iteration = 325565937  Digits = 68039257
   Both pandigital, Iteration = 331377167  Digits = 69253732
   Both pandigital, Iteration = 332796942  Digits = 69550448
   Both pandigital, Iteration = 345408479  Digits = 72186103
   Both pandigital, Iteration = 348947969  Digits = 72925813
   Both pandigital, Iteration = 376259511  Digits = 78633587
   Both pandigital, Iteration = 391170038  Digits = 81749703
   Both pandigital, Iteration = 392363473  Digits = 81999116
   Both pandigital, Iteration = 397976539  Digits = 83172178
   Both pandigital, Iteration = 400745248  Digits = 83750804
   Both pandigital, Iteration = 401140543  Digits = 83833416
   Both pandigital, Iteration = 416071912  Digits = 86953887
   Both pandigital, Iteration = 421177833  Digits = 88020962
   Both pandigital, Iteration = 421872036  Digits = 88166041
   Both pandigital, Iteration = 423998878  Digits = 88610525
   Both pandigital, Iteration = 449452236  Digits = 93929962
   Both pandigital, Iteration = 457310669  Digits = 95572278
   Both pandigital, Iteration = 465035461  Digits = 97186664
   Both pandigital, Iteration = 487523647  Digits = 101886417
   Both pandigital, Iteration = 501660400  Digits = 104840823
   Both pandigital, Iteration = 502463411  Digits = 105008643
   Both pandigital, Iteration = 502732956  Digits = 105064974
   Both pandigital, Iteration = 504642412  Digits = 105464027
   Both pandigital, Iteration = 505274211  Digits = 105596065
   Both pandigital, Iteration = 510400594  Digits = 106667416
   Both pandigital, Iteration = 512627526  Digits = 107132817
   Both pandigital, Iteration = 523493170  Digits = 109403602
   Both pandigital, Iteration = 527291459  Digits = 110197398
   Both pandigital, Iteration = 531304100  Digits = 111035990
   Both pandigital, Iteration = 536057785  Digits = 112029452
   Both pandigital, Iteration = 537641890  Digits = 112360510
   Both pandigital, Iteration = 538171243  Digits = 112471138
   Both pandigital, Iteration = 541903075  Digits = 113251045
   Both pandigital, Iteration = 542170973  Digits = 113307032
   Both pandigital, Iteration = 546023226  Digits = 114112106
   Both pandigital, Iteration = 546389584  Digits = 114188670
   Both pandigital, Iteration = 552888752  Digits = 115546916
   Both pandigital, Iteration = 556209121  Digits = 116240832
   Both pandigital, Iteration = 572442708  Digits = 119633451
   Both pandigital, Iteration = 595109376  Digits = 124370504
   Both pandigital, Iteration = 599363255  Digits = 125259512
   Both pandigital, Iteration = 600612514  Digits = 125520592
   Both pandigital, Iteration = 619346653  Digits = 129435796
   Both pandigital, Iteration = 620917234  Digits = 129764028
   Both pandigital, Iteration = 634889226  Digits = 132684001
   Both pandigital, Iteration = 646307092  Digits = 135070194
   Both pandigital, Iteration = 649357364  Digits = 135707663
   Both pandigital, Iteration = 654145054  Digits = 136708231
   Both pandigital, Iteration = 660169572  Digits = 137967281
   Both pandigital, Iteration = 668508682  Digits = 139710052
   Both pandigital, Iteration = 675187349  Digits = 141105811
   Both pandigital, Iteration = 686294555  Digits = 143427080
   Both pandigital, Iteration = 687974843  Digits = 143778239
   Both pandigital, Iteration = 688613767  Digits = 143911766
   Both pandigital, Iteration = 690226007  Digits = 144248705
   Both pandigital, Iteration = 691159696  Digits = 144443834
   Both pandigital, Iteration = 702889134  Digits = 146895142
   Both pandigital, Iteration = 712239048  Digits = 148849158
   Both pandigital, Iteration = 713428773  Digits = 149097796
   Both pandigital, Iteration = 714086028  Digits = 149235154
   Both pandigital, Iteration = 723902815  Digits = 151286741
   Both pandigital, Iteration = 724367929  Digits = 151383944
   Both pandigital, Iteration = 726913161  Digits = 151915866
   Both pandigital, Iteration = 730846154  Digits = 152737813
   Both pandigital, Iteration = 735954354  Digits = 153805364
   Both pandigital, Iteration = 742643442  Digits = 155203301
   Both pandigital, Iteration = 750777461  Digits = 156903210
   Both pandigital, Iteration = 793050502  Digits = 165737753
   Both pandigital, Iteration = 798377165  Digits = 166850960
   Both pandigital, Iteration = 802828360  Digits = 167781205
   Both pandigital, Iteration = 803412647  Digits = 167903313
   Both pandigital, Iteration = 806892715  Digits = 168630605
   Both pandigital, Iteration = 813475015  Digits = 170006224
   Both pandigital, Iteration = 814245907  Digits = 170167331
   Both pandigital, Iteration = 840802238  Digits = 175717276
   Both pandigital, Iteration = 844754680  Digits = 176543287
   Both pandigital, Iteration = 850849258  Digits = 177816979
   Both pandigital, Iteration = 854085201  Digits = 178493251
   Both pandigital, Iteration = 859005575  Digits = 179521548
   Both pandigital, Iteration = 867961167  Digits = 181393156
   Both pandigital, Iteration = 874522811  Digits = 182764459
   Both pandigital, Iteration = 877317215  Digits = 183348455
   Both pandigital, Iteration = 882734391  Digits = 184480577
   Both pandigital, Iteration = 883261779  Digits = 184590795
   Both pandigital, Iteration = 904260463  Digits = 188979260
   Both pandigital, Iteration = 930442422  Digits = 194450966
   Both pandigital, Iteration = 931154576  Digits = 194599798
   Both pandigital, Iteration = 932590746  Digits = 194899939
   Both pandigital, Iteration = 955682651  Digits = 199725862
   Both pandigital, Iteration = 956363488  Digits = 199868149
   Both pandigital, Iteration = 957471700  Digits = 200099751
   Both pandigital, Iteration = 958707201  Digits = 200357956
   Both pandigital, Iteration = 975352084  Digits = 203836531
   Both pandigital, Iteration = 978302792  Digits = 204453192
   Both pandigital, Iteration = 984129746  Digits = 205670953
   Both pandigital, Iteration = 989182420  Digits = 206726900
   Both pandigital, Iteration = 993812068  Digits = 207694439


You will notice three others between your first and second: f6496145 with 1357614 digits, f6677673 with 1395551 digits and f6811853 with 1423593 digits. And there are several other disagreements.

So, I ran my own big number BCD addition code, which for accuracy and precision did not truncate the digits. It took 50 minutes to get to f6811853.

The least significant digits of f6496145 (starting with the least signficant) are 462153978, and the most significant digits are 197283645 (again starting with the least significant).

So, the calculated approximation to the first digits is very fast, but wrong for very large fibonacci series numbers.

drizz

Yes you are right there was an error. If the fraction was 0.0xxxxx it tested 8 digits instead of 9. The crux of the computation is unchanged. 11 seconds for all results.

.686
.model flat,stdcall
option casemap:none
include windows.inc
include stdio.inc
include asmlib.inc

.code

IsPandigital9 proc c n
LOCAL AllDigits
mov ecx,n
mov edx,3817748708;MagicNumber for div 9
mov eax,ecx
mul edx
shr edx,3
mov eax,ecx
lea edx,[edx*2+edx];*3
xor ecx,ecx
lea edx,[edx*2+edx];*3
mov AllDigits,ecx
.if eax == edx && eax >= 123456789; passed divisibility test & smallest test
rept 9
mov ecx,eax
mov edx,3435973837;MagicNumber for div 10
mul edx
shr edx,3
lea eax,[edx*4+edx]
add eax,eax
sub ecx,eax
mov eax,edx
mov edx,1
shl edx,cl
or AllDigits,edx
endm
.if AllDigits == 1111111110b;; restricted pandigital test passed
return 1
.endif
.endif
return 0
IsPandigital9 endp


PanFib proc uses esi edi ebx maxk
LOCAL K:DWORD,T1:DWORD
LOCAL syscw:word,mycw_trunc:word, mycw_roundup:word

.data
ALIGN 16
LOG10PHI REAL10 0.20898764024997873376927208924;Log10((Sqrt[5] + 1)/2)
ALIGN 16
RECSQRT5 REAL10 0.44721359549995793928183473375;1/Sqrt[5]
ALIGN 16
LOG105O2 REAL10 0.34948500216800940239313055264
.code

FNINIT


FNSTCW syscw
xor ebx,ebx
xor esi,esi
lea edi,[ebx+1]
xor eax,eax
inc ebx
mov ax,syscw
or ax,110000000000b;; BITS 10 11 Trunc
mov mycw_trunc,ax

and ax,not 110000000000b
or ax,     100000000000b
mov mycw_roundup,ax


FLD LOG10PHI
FLD st

.repeat
mov K,ebx
; Check Last 9 Digits
lea ecx,[edi+esi]
mov edi,esi
cmp ecx,1000000000
sbb esi,esi
xor esi,-1
and esi,-1000000000
add esi,ecx
invoke IsPandigital9,esi
.if eax

; Check First 9 Digits

;;
;; Binet's Formula
;;

;for small k the result is incorrect
FILD K
FMUL
FLD ST

FLD ST
FLD LOG105O2
FSUB

FLDCW mycw_roundup
FRNDINT
FLDCW syscw
FISTP T1

FLDCW mycw_trunc
FRNDINT
FLDCW syscw
FSUB; Get fractional part

; Perform ( 10^FractionalPart/Sqrt(5) )
fldl2t
fmul
fld  st
FLDCW mycw_trunc
FRNDINT;round it to an integer
FLDCW syscw
fsub st(1),st ; fraction
fxch st(1)
f2xm1
fld1
fadd
fscale
fstp st(1)

;; Multiply By 1/SQRT(5)
FLD RECSQRT5
        FMUL
.data
C_10 tbyte 10.0
C_1E9M1 tbyte 99999999.0
.code
@@:
FLD C_1E9M1
FLD st(1)
FCOMPP
FNSTSW ax
SAHF
JA @F
FLD C_10; 10.0
FMUL
jmp @B
@@:
        PUSH EAX
FLDCW mycw_trunc
FISTP dword ptr [esp]
FLDCW syscw
; Check for Pandigitality
call IsPandigital9;,eax
POP EDX
.if eax
invoke printf,T('Fk Found! K=%u, First9="%u", Last9="%u", Digs="%u"',13,10),K,edx,esi,T1
.endif
FLD ST

.endif
inc ebx
.until ebx >= maxk
FSTP ST
ret

PanFib endp


main proc C uses esi edi ebx argc:dword,argv:ptr dword

invoke InitConsole
invoke TextColor,ccWhite
invoke SetConsoleTitle,T('Fib test')
invoke GetTickCount
push eax
mov ebx,999999999
invoke PanFib,ebx
invoke GetTickCount
pop edx
sub eax,edx
invoke printf,T(13,10,'Fib test finished. Tested up to K=%u for %u ms'),ebx,eax

invoke getchar
return ERROR_SUCCESS

main endp

end
The truth cannot be learned ... it can only be recognized.