here is one of the projects I have been working on
this is not the final version, as I think I know a way to speed this up a bit
even still, it is fairly fast
on my p4 prescott, it is around 39,700 clock cycles for a 1024 bit unsigned integer (all 1's)
(that's about twice as fast as my previous divide-and-conquer version)
I want to thank Drizz for his help :U
this version will handle any practical length integer, signed or unsigned (mode selectable)
EDIT - version 1a - fixed negative value design mistake - see Oct 4, 2009 post for details
EDIT - added the base 1,000,000,000 versions (LLKF9)
NOTE - LLKF9_1 is the one you want - LLKF1a is retained for comaprison purposes only
EDIT - also in this thread, a Ling Long Kai Fang big integer exponental routine
post - http://www.masm32.com/board/index.php?topic=12363.msg100435#msg100435
d/l - http://www.masm32.com/board/index.php?action=dlattach;topic=12363.0;id=7070
i see people downloading - then no response - lol
that probably means 1 of 2 things:
1) they are so overwhelmed by the magnificance of my coding ability that they are lost for words
2) my code sux rox - lol
i am guessing the latter :'(
I saw this :
QuoteLing Long Kai Fang BigNum
and I shit brix
lol - now, that's funny
the math behind it was used by the Chinese as early as the 1st century B.C.
some guy played with it (around 1800 A.D.) and it got his name (William George Horner)
well - sure - back then, it wasn't "discovered" until a white guy played with it - lol
i figure it's about time we give the Chinese their props
btw - "ling long kai fang" translates to something like "polynomial ladder solution"
it's nothing to be afraid of
they aren't dirty words or anything - lol
Since I can't find anything on google/bing, to summarize this thing it just converts large numbers to an ascii string? is there a max size to the input number? also "39,700 clock" wtf! sounds like this things crunching the cure to aids. :'(
oh and nice work btw :U your comments,error checking, and clean code is refreshing.
lol - thanks E^Cube
well - you must understand - that is a 1024 bit integer - a 309 character string is returned
as for the max size - it is whatever you can fit in there - lol
it does not modify the original, so it does not need to make a copy of it
i have jammed integers over 64 Kb in there and it spits out what looks like the right stuff (150 Kb+ string)
there is also no penalty for negative numbers - should be about the same time
it does not use fpu/mmx/sse - so those registers are free
as for the reference to ling long kai fang - it is out there someplace
the polynomial identity has origins in a very old Chinese book.......
http://en.wikipedia.org/wiki/The_Nine_Chapters_on_the_Mathematical_Art
you might also look for a later Chinese mathematician, Qin Jiushao
he wrote a book in 1247 and used the equation to find roots of a 7th order polynomial
he is the one who called it "ling long kai fang" - which was not really a "title" for it - just a calling name
the guy was a dirty politician, but in his spare time, a great mathematician
my fave is still Archimedes, though - Archie and i go way back - lol - he was way ahead of his time
doesn't 8 bits = 1 byte? and 1024/8 = 128? or how did you get a 309 character string from a 1024 bit interger?
well - with a 4 byte integer, you can get a 9 byte decimal string (4,294,967,295)
from 64 bits (8 bytes), you can get 20
the decimal string is not as effiecient in terms of storage, because many of the bits are essentially unused
when converting integers, you can estimate the decimal string length by L = 2.414 B (L = decimal string bytes, B = binary integer bytes)
Quotewhen converting integers, you can estimate the decimal string length by L = 2.414 B (L = decimal string bytes, B = binary integer bytes)
And if anyone is wondering how Dave arrived at that estimate,
2.4 = 8*log
10(2) = 8*0.3
hiya Ray
i dunno - that formula doesn't yeild good results
it always leaves you short
1+sqr(2) ~ 2.414 seems to work better
8*log(2) ~ 2.408
year 1247:
.....
(x1+x2)2=x12+2x1x2+x22=x12+(2x1+x2)x2
.....
X4+4x1X3+6x12x2+4x13x=N-x14
.....
秦九韶称为"开连枝某乘方";而方程的奇欢幂系数都是零时,称为"开玲珑某乘方"。
.....
long long ago, hahahah
Doing the conversion for a large number of 32-bit integers from a good RNG I get (typically) a mean length of 9.741048 digits, or 2.435262 digits per byte.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
include \masm32\include\advapi32.inc
includelib \masm32\lib\advapi32.lib
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
meanLength REAL8 ?
digitsPerByte REAL8 ?
counts dd 0, 0
buff db 20 dup(0)
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
;------------------------------------------------------------------------
; This proc collects random bytes generated by the cryptographic service
; provider (CSP), and returns a random 32-bit integer.
;
; Everything is in a single procedure to make automatic release of the
; CSP handle simple. The large buffer is a quick and dirty correction
; for the slowness of the CSP generator.
;------------------------------------------------------------------------
PROV_RSA_FULL EQU 1
CRYPT_VERIFYCONTEXT EQU 0F0000000h
.data
hProv dd 0
rand_buffer dd 10000 dup(0)
pNext dd 0
.code
csp_rand proc uses ebx
mov ebx, pNext
.IF ebx == 0
invoke CryptAcquireContext, ADDR hProv,
NULL,
NULL,
PROV_RSA_FULL,
CRYPT_VERIFYCONTEXT
.IF eax == 0
ret
.ENDIF
invoke CryptGenRandom, hProv, 10000*4, ADDR rand_buffer
.IF eax == 0
ret
.ENDIF
invoke CryptReleaseContext, hProv, 0
.ENDIF
mov eax, [rand_buffer+ebx]
add ebx, 4
.IF ebx > 9999*4
xor ebx, ebx
.ENDIF
mov pNext, ebx
ret
csp_rand endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
xor edi, edi
mov ebx, 1000000
.WHILE ebx
call csp_rand
invoke crt_sprintf, ADDR buff, cfm$("%u"), eax
add edi, eax
dec ebx
.ENDW
push edi
fild DWORD PTR [esp]
pop edi
fld8 1000000.0
fdiv
fst meanLength
fld8 4.0
fdiv
fstp digitsPerByte
invoke crt_printf, cfm$("mean length: %f\ndigits per byte: %f\n\n"),
meanLength,
digitsPerByte
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
@dedndave,
very nice, but you asm is hard to follow :P even with all the comments :P
did you test against "div by 10" routine to see how it performs in comparison to it?
@other,
seriously, N-bit binary number has Trunc(N*Log10(2))+1 decimal digits
lol UtillMasm - that is: 1247 = x(x(x(1) + 2) + 4) + 7, where x = 10
i was hoping you or Onan (Farabi) could give us a better translation of the phrase "ling long kai fang"
my Chinese is worse than my Masm - lol
Michael,
we are talking about the maximum length
no need to generate random numbers - just set all the bits to 1
Drizz,
after playing with the length formulas - Drizz's is perfecto - lol - i will use that in the future
i checked it against the length of a 65,536 byte integer, which is 157,287 decimal digits
in fact - i will use it to speed up the routine :U
i can precalculate the required buffer length and avoid checking each time it is lengthened
i just need to use enough precision to make it work with extremely long values (and make sure i include the minus sign - lol)
i had hoped the comments would be enough
but, i do have an outline of the operation that i will add on the next version
it steps through the process and details the number of bits required and maximum values along the way
the version i posted converts base 4,294,967,296 to base 100,000,000, then converts to base 10 in the ASCII loop
i am currently writing a version that converts base 4,294,967,296 to base 1,000,000,000, then to base 10
this will slow down the ASCII loop a little bit, but should speed up the ling long kai fang considerably
it will have an added advantage in that the output string will be left justified in the output buffer field
it will be even harder to follow the code - lol
i will try to explain it's operation thoroughly
i may put it in a seperate read_me file to avoid clutter in the proc
as for comparison to the divide-and-conquer method, i think divide-and-conquer might win out slightly over the current version
well - for shorter integers, anyways
except for negative signed values, where ling long kai fang will always win
it appears that the real advantage of ling long kai fang is more prominent with longer integers
the newer version should beat divide-and-conquer in all cases above 64 bits
before we can really compare, i want to spend time to make divide-and-conquer as fast as i can
i also want to make the routine equal in all other aspects - error checking, no fpu/mmx/sse instructions, etc
it won't take nearly as long because divide-and-conquer is simpler to write
i know Drizz and Jochen probably already have routines for this
but i am guessing they use the fpu or sse registers
this thing could run like a bullet if it were written with sse
i haven't learned sse yet, but i wanted to avoid it for this routine, anyway
a bignum program is likely to use those registers for other purposes (i.e. more time critical than string output)
Quoteseriously, N-bit binary number has Trunc(N*Log10(2))+1 decimal digits
It has a
maximum of Trunc(N*Log10(2))+1 decimal digits.
It has a
minimum of Trunc(N*Log10(2)) decimal digits.
QuoteMichael,
we are talking about the maximum length
no need to generate random numbers - just set all the bits to 1
I didn't see that stated anywhere. Testing the formula for the maximum length is trivial for small bit lengths, and the value that you specified is much closer to the mean length, without leading zeros and over the full range of values, than to the maximum, so I guessed that the goal was the mean length.
And my backup excuse :lol is that I did this after I woke up in the middle of the night and could not immediately go back to sleep, but it was too early to start in with the coffee.
Quote from: raymond on September 25, 2009, 03:56:39 PM
Quoteseriously, N-bit binary number has Trunc(N*Log10(2))+1 decimal digits
It has a maximum of Trunc(N*Log10(2))+1 decimal digits.
It has a minimum of Trunc(N*Log10(2)) decimal digits.
lol I wrote "(maximum)" at the end of sentence but deleted it before posting... ::)
Next time i'll remember to be precise :)
Michael - you don't need any "excuse" around here - lol
we are all half-crazy, afterall
besides, you have contributed much to our progress, so we can't really complain - lol
i have written a small program that grabs maximum allocatable process heap,
splits it into integer/buffer space, fills the integer with 1's and cranks
the integer is 611,607,020 bytes long
the output buffer space is 1,472,896,500 bytes (i made it a tad bigger than it needed to be - lol)
it is cranking on the decimal string, now
i will let you know how many decimal digits it makes (as soon as the dang thing finishes - lol)
...still cranking...
lol
i gave up on that one - lol
too ling long
i may play with some numbers and see what size returns in a practical length of time
here are some practical sizes...
integer length: 262,144 bytes
decimal string: 631,306 bytes
Drizz formula: 631,306 bytes
integer length: 524,288 bytes
decimal string: 1,262,612 bytes
Drizz formula: 1,262,612 bytes
1Mb takes more than 10 minutes to crank out a result
at least we know what sizes are practical - lol
my appologies Ray - i must have been punching the buttons wrong or something
your log formula seems to work ok
Drizz's little touch makes it perfect for calculating required buffer length
dialing in on it by the bit...
All 1 bits
4194296 Integer Bits 1262609 Decimal Digits
4194297 Integer Bits 1262610 Decimal Digits
4194298 Integer Bits 1262610 Decimal Digits
4194299 Integer Bits 1262610 Decimal Digits
4194300 Integer Bits 1262611 Decimal Digits
4194301 Integer Bits 1262611 Decimal Digits
4194302 Integer Bits 1262611 Decimal Digits
4194303 Integer Bits 1262612 Decimal Digits
4194304 Integer Bits 1262612 Decimal Digits
Single 1 bit
4194296 Integer Bits 1262609 Decimal Digits
4194297 Integer Bits 1262609 Decimal Digits
4194298 Integer Bits 1262610 Decimal Digits
4194299 Integer Bits 1262610 Decimal Digits
4194300 Integer Bits 1262610 Decimal Digits
4194301 Integer Bits 1262611 Decimal Digits
4194302 Integer Bits 1262611 Decimal Digits
4194303 Integer Bits 1262611 Decimal Digits
4194304 Integer Bits 1262612 Decimal Digits
Kai Fang == extraction of a root; evolution
Ling Long == a small, cabinet, smart way
it's too old,,,, this just a literal translation.
cool - that helps :U
what i found using the "traditional" Chinese online translators was that ling/long was a ladder(or lattice)/solution
"smart way" may be what he meant
"extraction of a root" is, i am sure what was meant in the original text
the equation may be used for that, as well
Quote from: dedndave on September 25, 2009, 02:34:46 PM
i checked it against the length of a 65,536 byte integer, which is 157,287 decimal digits
Shouldn't that be 157,827 decimal digits for a 65,536 byte unsigned integer with all bits == 1 ?
I mean either it's a typo or the routine gives wrong results? - I checked this with a routine of mine and my result is 157,827 decimal digits for a 64k integer
best regards
cobold
oops - was a typo :bg
ok guys
i got version 2 up and running
i need to do some clean-up and finish the documentation
looks like i shaved almost 3,000 clock cycles off the 1024-bit integer all 1's test :bg
and - the output is left justified in the buffer field
that ascii conversion loop was tricky - lol
i will post asap
uh oh
i was testing version 2 and found a design flaw
some negative values are displayed incorrectly
i went back to see if version 1 had the same flaw - and it does
i appologize if this has caused any inconvenience
i thought i had tested it thoroughly
the values that are displayed incorrectly were either not tested or escaped my attention
i have removed the version 1 d/l and will let you know what i find
as you know, it is extremely impractical to test all the possible values
ok i have found the troubles
to evaluate (signed) negative values, i invert the input values as they are read and add one to the accumulator
the problem was - i was adding one to the accumulator prior to evaluating the input value
that means - for input values larger than a dword, that 1 gets multiplied up
smaller values do not get multiplied
the fix is to add the 1 AFTER evaluation is complete (actually, during evaluation of the last input dword)
i will fix version 1 and post the d/l asap
again, i am sorry if this caused any problems for you
Dave
the mistake has been fixed
i have tested and verified a larger variety of values
the d/l is in the first post of the thread, and here...
http://www.masm32.com/board/index.php?action=dlattach;topic=12363.0;id=6725
Ray (and Drizz, too - lol)
i have been working out the details of this buffer size calculation
if the maximum decimal string length can be found by:
L = Trunc( 8 * B * Log10(2) ) + 1
or
L = Trunc( b * Log10(2) ) + 1
then it seems to make sense that the minimum decimal string length can be found by:
L = Trunc( (8 * B - 1) * Log10(2) ) + 1
or
L = Trunc( (b - 1) * Log10(2) ) + 1
where:
L = decimal string length
B = binary bytes
b = binary bits
the logic behind this is that the minimum for N bits is the same as the maximum for N-1 bits
a decimal rollover never occurs simultaneously with a binary rollover
for example, the decimal string length for, say, 01111111 is the same as for 10000000
Why not simply
L = Trunc( b * Log10(2) )
for the minimum.
Your other suggestion may be correct if you are trying to determine the exact number of digits. However, if you only need the information to allocate sufficient memory, that +/-1 seems relatively immaterial for big numbers.
i am not allocating the memory - the caller does that
in this case, i want to verify that they allocated sufficient buffer space to evaluate the input without overflow
making a calculation at the front end speeds things up considerably
because it eliminates the need to check for buffer overflow inside the math conversion loop
there is some additional testing at the very end of the routine during ASCII conversion
that cannot be avoided because N bits can yield a string with a length variation of 1 byte
even that test is simplified if i know in advance they may only be 1 byte short
but, it would be nice to reject a large integer early if the buffer is short
if someone wants to evaluate an extremely large integer, it can take a while to process
i am trying to come up with an equation that works over a wide range of input lengths
my equation may not be correct, either - lol - we all know how well my equations work out
i may write a little program to test the two equations against each other
and spit out the first length where they yield different results, then see which is valid
Quotein this case, i want to verify that they allocated sufficient buffer space to evaluate the input without overflow
Then use your formula for the
maximum to verify if sufficient memory has been allocated. You could also add 1 for a minus sign when handling negative numbers.
well - if they have a buffer large enough for the minimum of a certain sized integer, i have to try and evaluate it
yes - got the minus sign - lol
that was the easiest part, as i use an XOR mask to handle the signed integers
it is a dword: -1 for negatives, otherwise it's 0
all i have to do to add the length for the minus sign is subtract that XOR mask - no testing or branching :bg
i have the code for the length now - it is a little hairy because of register usage, but i got it ~35 clock cycles
got the fast scanner re-written - it is very fast, now
i just need a little time to finish the doc's for it
that will be LLKF version 2
i want to take some of the code and tricks from the new one, and apply them to version 1, as well
when i am done, we will have a divide-and-conquer, and 2 versions of LLKF to compare
i want to break them out and make graphs for each in 3 sections;
1) initialization, 2) math loop, 3) ASCII conversion loop
the divide-and-conquer combines the math loop with the ASCII conversion
otherwise, the ASCII conversion is the same as for the base 100,000,000 LLKF
so, i can subtract the ASCII time to estimate the math time
then, the graphs will show how they each perform over a range of input sizes
we can see if there is an advantage to processing 9-digit intermediates over 8-digit
it should be interesting to see how they compare
FINALLY !!!!
i got the base 1,000,000,000 ling long kai fang routines finished !!!! - lol
there are three routines:
signed
unsigned
signed/unsigned mode selectable
that way, you can pick the one that best suits your app
the unsigned version is slightly faster - the mode selectable version requires an additional parameter
you can find the attachment on the first post of the thread, or here...
http://www.masm32.com/board/index.php?action=dlattach;topic=12363.0;id=6926
Quote from: dedndave on November 19, 2009, 09:17:24 AM
or here...
http://www.masm32.com/board/index.php?action=dlattach;topic=12363.0;id=6925
Direct links to attachments work only on your own puter, Dave: "You are not allowed to access this section"
:thumbu
i changed the file Jochen - set the timestamps
try it again - notice the new number
Interesting. It works ::)
rolling your eyes at my code again ? - lol
i know - you are talking about the link :bg
My appologies to Hutch for mis-spelling his name
it will be corrected on the next update
oh - i forgot to mention
the new base 1,000,000,000 version is ~20% faster than the current base 100,000,000 version
that comparison is made using 1024-bit all 1's unsigned integers
however, that is a little misleading
i have learned a few new trix while writing the base 1,000,000,000 code
i need to apply some of those trix to the base 100,000,000 routine to make a fair comparison
working on it.......
i was having a little fun with bignums and ling long kai fang
a friend of mine (a C programmer) was interested in calculating the odds of creating the first DNA
he wanted to evaluate some rather large exponentials (his numbers - not mine - lol)
i wrote a special little ling long kai fang routine that converts from a variable base to do the unsigned integer exponentiation
then, i used the LLKF9 routine to evaluate the resulting integers
these values are calculated and evaluated to full precision
(i.e., the last one is evaluated to the full 821727 decimal places)
scientific notation is used to abbreviate the results for display
22^300 Binary Exponential Length: 168 Bytes
5.330945549698723250305294228225903087386839891140036229539747963875875010e+402
44^300 Binary Exponential Length: 205 Bytes
1.085932787261652233516845109331492005702511128627953792159861291779059911e+493
22^500000 Binary Exponential Length: 278715 Bytes
2.189833539391172228072796932353901294631929097129937858803130860608658e+671211
44^500000 Binary Exponential Length: 341215 Bytes
2.178929073473699914235546735939051662312488506450443207720800467312438e+821726
i will post the code after some clean-up
and here it is !
a freebie for you guys - an integer exponentiation routine
i haven't optimized it, although, i doubt i can squeeze a lot out of it
i did not bother adding the timing routines - it was just something fun to do
source code included...
update
found a little bug that arose only for exponent value = 4,294,967,295
improved the app notes
managed to squeeze a slight speed improvement out of the exponentiation loop
Hi dedndave:
Bignum Unsigned Integer Exponentiation Demo - DednDave - Dec 2009
These values are calculated and evaluated at full precision.
(i.e., 44^500000 is calculated to 821727 decimal places)
Scientific notation is used to abbreviate the results for display.
In this version, I am reasonably pleased with the performance of the
exponentiation routine (completion is indicated by display of the Exponential
Integer Size). I have not spent any time optimizing this routine, as yet.
The integer to decimal ASCII routine takes much longer to evaluate the
large integers. The last two examples generate some very long strings.
22^300 Exponential Integer Size: 168 Bytes
5.330945549698723250305294228225903087386839891140036229539747963875875010e+402
44^300 Exponential Integer Size: 205 Bytes
1.085932787261652233516845109331492005702511128627953792159861291779059911e+493
22^500000 Exponential Integer Size: 278715 Bytes
2.189833539391172228072796932353901294631929097129937858803130860608658e+671211
44^500000 Exponential Integer Size: 341215 Bytes
2.178929073473699914235546735939051662312488506450443207720800467312438e+821726
Press any key to continue ...
Regards herge
thanks, Herge - not a time test - just a toy to play with :P