The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: bruce1948 on August 17, 2009, 12:21:03 AM

Title: Multiprecision square root
Post by: bruce1948 on August 17, 2009, 12:21:03 AM
I'm currently working on a multiprecision unsigned integer package and find that the algorithm I'm using does not always give accurate results. 75% seem to be OK but about 25% are off base. The small ones (<= 96 bits are spot on) but as i go towards 512 bits some are ok and and some are not. Any one  know of a better way of calculating large unsigned square roots. Code follows:

; *********************************************************************
;
; Procedure: mpSqrt
;
; Function: Computes integer square root result = sqrt(x)
;           where result and x are multiprecision integers of ndigits
;           Uses xk1 = 1/2(xk + x / xk) and iterates until x/xk >= xk
;           leaving xk1 as the square root.
;
; Entry:    x   =   a pointer to the multiprecision number whose
;                   square root is to be taken
;
;           result  =   a pointer to the multiprecision number to
;                       hold the square root of x.
;
; Exit:     result  =   square root of x
;
; *********************************************************************
mpSqrt proc USES esi ebx ecx edx edi, result:unsigned_ptr, x:unsigned_ptr, ndigits:size_t       
        local xk:unsigned_ptr
        local xk1:unsigned_ptr
        local t:unsigned_ptr
        local tmp:unsigned_ptr
        ; *************************************************************
        ; Allocate memory for variables.
        ; Do not need to zero allocated memory as mpAlloc uses LMEM_ZEROINIT
        ; *************************************************************
        mov ebx, ndigits
        shl ebx, 2                                  ; need 4 * ndigits bytes for each local
        push ebx                                    ; save ebx
        shl ebx, 2                                  ; need 4 locals
        invoke mpAlloc, ebx
        pop ebx
        mov xk, eax                                 ; store the returned pointer
        add eax, ebx                                ; point at next local
        mov xk1, eax                                ; store the pointer
        add eax, ebx                                ; point at next local
        mov t, eax                                  ; store the pointer
        add eax, ebx                                ; point at next local
        mov tmp, eax                                ; store the pointer
        ; *************************************************************
        ; if x <= 1 copy x to result
        ; *************************************************************
        cld
        invoke mpShortCmp, x, 1, ndigits
        .if sdword ptr eax == -1 || eax == 0
            mov esi, x
            mov edi, result
            xor ecx, ecx                            ; loop counter
            .while ecx < ndigits
                movsd                               ; copy x to result
                inc ecx
            .endw
            jmp mpSqrtEnd
        .endif
        ; xk = x / 2
        invoke mpShiftRight, xk, x, 1, ndigits
        ; *************************************************************
        ; Set up complete - find the square root
        ; *************************************************************
        mov ebx, 1
        .while ebx == 1
            ; t = x / xk (tmp is used for the unused remainder)
            invoke mpDivide, t, tmp, x, ndigits, xk, ndigits
            ; xk1 = (xk + t)/2 (tmp holds the reult of the addition)
            invoke mpAdd, tmp, xk, t, ndigits
            invoke mpShiftRight, xk1, tmp, 1, ndigits
            ; if (t >= xk)  break
            invoke mpCompare, t, xk, ndigits
            .break .if sdword ptr eax >= 0
            ; Set xk = xk1
            mov esi, xk1
            mov edi, xk
            xor ecx, ecx                            ; loop counter
            .while ecx < ndigits
                movsd                               ; copy xk1 to xk
                inc ecx
            .endw
        .endw
        ; *************************************************************
        ; copy square root of x (xk1) to result
        ; *************************************************************
        mov esi, xk1
        mov edi, result
        xor ecx, ecx                            ; loop counter
        .while ecx < ndigits
            movsd                               ; copy xk1 to result
            inc ecx
        .endw
mpSqrtEnd:
        invoke mpFree, xk       
        ret
mpSqrt endp

Title: Re: Multiprecision square root
Post by: dedndave on August 17, 2009, 12:26:13 AM
long ago i wrote a simple one for 16-bit integers (i think it was - lol)
let me see if i can find it - it is not what you want, but you can get the concept
it is a loop - i more or less did what they taught us in grade-school - lol
Title: Re: Multiprecision square root
Post by: dedndave on August 17, 2009, 12:36:31 AM
ok - so it's a 24-bit routine - lol
as i said, it's a loop, so it's not very fast
but, the loop is all shifts and adds - it could be re-written to be much faster on a pentium

;-----------------------------------------------
;
SQINT   PROC    NEAR
;
;  This is a 24-bit integer squareroot routine.
; It calculates the squareoroot, then rounds
; it to the nearest integer.
;
; Call With: DL:AX = 24 bit integer (0-FF:FFFFh)
;
;   Returns: AX = rounded squareroot (0-1000h)
;            BX = DX = CH = 0
;            CF = clear
;            CL, BP, SI, DI invalid
;
        MOV     DH,DL
        MOV     DL,AH
        MOV     BH,AL
        MOV     AX,0C000h
        MOV     BL,AL
        MOV     CX,8
;
SQINT0: TEST    AX,DX
        JNZ     SQINT3
;
        ROR     AX,1
        ROR     AX,1
        LOOP    SQINT0
;
        MOV     CL,4
;
SQINT1: TEST    AH,BH
        JNZ     SQINT2
;
        SHR     AH,1
        SHR     AH,1
        LOOP    SQINT1
;
        RET
;
SQINT2: SUB     CL,4
;
SQINT3: SHL     CL,1
        ADD     CL,9
        XOR     AX,AX
        MOV     SI,AX
        MOV     DI,AX
        MOV     BP,AX
;
SQINT4: SHL     BX,1
        RCL     DX,1
        RCL     DI,1
        RCL     BP,1
        SHL     BX,1
        RCL     DX,1
        RCL     DI,1
        RCL     BP,1
        STC
        RCL     SI,1
        RCL     AX,1
        SUB     DI,SI
        SBB     BP,AX
        JNC     SQINT6
;
        ADD     DI,SI
        ADC     BP,AX
        DEC     SI
        LOOP    SQINT4
;
SQINT5: SHL     SI,1
        RCL     AX,1
        SHL     SI,1
        RCL     AX,1
        SHL     SI,1
        ADC     AX,CX
        RET
;
SQINT6: INC     SI
        LOOP    SQINT4
;
        JMP     SQINT5
;
SQINT   ENDP
;
;-----------------------------------------------
Title: Re: Multiprecision square root
Post by: bruce1948 on August 17, 2009, 12:43:12 AM
I shall study it and see if I can work anything out.

thanks

Bruce
Title: Re: Multiprecision square root
Post by: Rockoon on August 17, 2009, 08:40:50 AM
If you cook up a multiprecision fixed/float log() and exp() functions, the accurate calculation of sqrt's becomes trivial

sqrt(x) = exp(0.5 * log(x))

I dont know how performance will compare with newtons progressive approximation, but just the same having log() and exp() are quite usefull for bignum stuff, because bignum multiplication and division are expensive where log() and exp() can turn those into additions and subtractions
Title: Re: Multiprecision square root
Post by: raymond on August 17, 2009, 03:03:35 PM
If it's for inclusion in a bignum library, I have to assume that you wish to keep everything in binary.

In the MixLib package available at

http://www.ray.masmcode.com/fixmath.html

you would find the source code of two procedures for extracting the square root, one for 32-bit numbers and the other for 64-bit numbers. Based on the latter (sqrt64.asm), you should be able to expand the process to any size number, using memory instead of registers.

If you wanted to extract the square root of a number in decimal string format and return the result in decimal string format with up to 9999 decimal digits, you would find code in the BCDtut package available at

http://www.ray.masmcode.com/BCDtut.html
Title: Re: Multiprecision square root
Post by: dedndave on August 17, 2009, 04:49:14 PM
i was just looking at Ray's 64-bit squareroot routine
i like his much better than my old 24-bit (16-bit code) one - lol
it's much cleaner and easier to expand to 512 bits, i think
log and exp functions is a good idea, too
if i remember correctly, those routines could be used for sine and cosine functions, as well
Title: Re: Multiprecision square root
Post by: bruce1948 on August 17, 2009, 06:19:54 PM
Plenty to think about here.


Thanks all

Bruce
Title: Re: Multiprecision square root
Post by: FORTRANS on August 17, 2009, 09:25:04 PM
Thanks Raymond,

   I am putting together a set of fixed point routines, and
looking at your code had me review some of what I am doing.
Which now seem to have some design flaws.  You are using
a different algorithm for converting a fraction to ASCII than I
am using.  Do you have a quick pointer to a description?  I
like that you truncate properly.  I am printing some bogus
characters on the end.  I am varying precision and haven't
worked out the numbers yet.

   One of my test numbers is Pi with 100 fraction digits.


Debug large fixed point routines.
16:240
3.1415926535897932384626433832795028841971693993751058209749445923078164060462507223188
3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679

Debug large fixed point routines.
16:368
3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706789999999999838420518372309708
3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679

Debug large fixed point routines.

FixDiv algorithm.
Divisor   2.0
Dividend  3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706789999999999838420518372309708
Result    1.57079632679489661923132169163975144209858469968755291048747229615390820314310449931401741267105853394999999999836046981060995660


Regards,

Steve N.
Title: Re: Multiprecision square root
Post by: dedndave on August 17, 2009, 09:30:17 PM
i don't think that's got enough precision, Steve ! - lol
those digits may be the true evaluation of the binary representation
if you want to know how many decimal digits a binary format can support, change the lsb by one and observe the difference
Title: Re: Multiprecision square root
Post by: raymond on August 18, 2009, 03:19:44 AM
QuoteYou are using a different algorithm for converting a fraction to ASCII than I am using.

The algo I'm using is quite simple. In my Mixlib, I'm only carrying 16 bits for the fraction (which is equivalent to 5 decimal places). Those 16 bits xxxxh are thus a fraction of 10000h. The decimal fraction digits are thus
xxxxh * 100000 / 10000h
I then simply convert the result to ascii as for any other dword. A similar approach could be used for up to 31 fractional bits.

However, you are combining fixed point with BIG numbers which cannot use this simple algo. The only way I could see at present would be to use BCDs.

The number of significant decimal digits should be: number of fraction bits * 0.3

Start with a string of 0s
The most significant bit = 50000000........ with as many 0s as needed; add it if set
Divide by 2
The 2nd significant bit = 250000000.......; add it if set
Divide by 2
The 3rd significant bit = 125000000.......; add it if set
etc.
Convert all the BCD digits to ascii by adding 30h.

It's obviously slow but you only need it to display the occasional result.
Title: Re: Multiprecision square root
Post by: dedndave on August 18, 2009, 03:53:40 AM
QuoteThe number of significant decimal digits should be: number of fraction bits * 0.3
what is the math behind that formula Ray ?
is it an estimate ?
Title: Re: Multiprecision square root
Post by: raymond on August 18, 2009, 04:13:18 AM
log102 = 0.30103
Title: Re: Multiprecision square root
Post by: dedndave on August 18, 2009, 04:23:12 AM
lol
i knew that - didn't know it could be used to calc digits - very cool
you might consider adding that little tid-bit to your tute - pardon me if it is already in there - i missed it
we use that ratio often - if you double power, it is ~3 Db gain: 10 log(P2/P1)
it is ~6 Db gain for doubling voltage (or current): 20 log(E2/E1)
a Db is a decibel (or 10 bel's), named after Alexander Graham Bell, of course
Title: Re: Multiprecision square root
Post by: Eddy on August 18, 2009, 09:08:18 AM
Bruce,

(Referring to your first post)
Newtons method is an iterative method that always converges for floating point numbers, but for integer numbers, caution is necessary.
You must start with an 'initial value' that is guaranteed smaller than the root.
The following new calculated value will be LARGER than the root and will decrease with each iteration.
Iteration must stop when the new value starts to INCREASE (becomes larger than the previous value).
For maximum speed, the initial value is very important. The closer to the root, the faster.
Worst case, a non-optimal intial value could cause the algorithm to 'oscillate' and never converge to a solution, atleast for integers.

Kind regards
Title: Re: Multiprecision square root
Post by: Rockoon on August 18, 2009, 10:06:25 AM
Quote from: Eddy on August 18, 2009, 09:08:18 AM
Bruce,

(Referring to your first post)
Newtons method is an iterative method that always converges for floating point numbers, but for integer numbers, caution is necessary.
You must start with an 'initial value' that is guaranteed smaller than the root.
The following new calculated value will be LARGER than the root and will decrease with each iteration.
Iteration must stop when the new value starts to INCREASE (becomes larger than the previous value).
For maximum speed, the initial value is very important. The closer to the root, the faster.
Worst case, a non-optimal intial value could cause the algorithm to 'oscillate' and never converge to a solution, atleast for integers.

Kind regards

Yes and that leads to a good first approximation by simply taking half as many bits as the input value. If the input value is a 100 bit number, than the sqrt is a 50 bit number.
Title: Re: Multiprecision square root
Post by: Eddy on August 18, 2009, 11:03:39 AM
Quote from: Rockoon on August 18, 2009, 10:06:25 AM
... a good first approximation by simply taking half as many bits as the input value. If the input value is a 100 bit number, than the sqrt is a 50 bit number.
Correct!
And in your first approximation (50 bits in your example), if you simply set the most significant bit to one (or more accurate: the 50th bit !) and zeroes for all other bits, this will make sure that the first approximation is always smaller or equal than the exact root.
Title: Re: Multiprecision square root
Post by: dedndave on August 18, 2009, 01:01:15 PM
you can take the upper bits of a bignum and use the fpu to get very close
if you have a 512-bit value, use the upper 64 bits to create an extended real (removing the low-order 448 bits)
use 3FFFh for the sign and exponent
take the squareroot and pad 224 low order bits (0's) to the result (replacing the 448 bits you removed)

of course, if bit 511 is a zero, you must left-justify the value into the 64 bit mantissa
if you remove an odd number of bits from the bottom, bump the exponent/sign to 4000h
then pad the result with half of one less than as many zeros
you get the idea
the basis is that the squareroot of 4 is 2, of 16 is 4, of 64 is 8, and so on

in writing such a library, i would think you could use the fpu to find many such "partials"
certainly, it could easily be used for partial multiplication
it should be possible to use it for partial logs, as well
not sure about exponentiation - i'd have to give that one some thought - lol
for division, i think you are stuck with long division if you have a large divisor - i may be wrong on that one
if the divisor is 64 bits or less, you can cascade divisions
Ray may know some tricks to using the fpu for extending precision
Title: Re: Multiprecision square root
Post by: FORTRANS on August 18, 2009, 02:42:29 PM
Hi Raymond,

   Thanks for the explanation.  I'll look at your code again and
puzzle it out.  In the end we must be using variants of the
same algorithm.  I used BCD math in the fixed point conversion
program I posted in a different thread.  But that was not a
way I wanted to go with these longer fractions.  I was using
look-up tables, and that gets ridiculous.

Regards,

Steve N.
Title: Re: Multiprecision square root
Post by: bruce1948 on August 18, 2009, 07:05:20 PM
Quote
Correct!
And in your first approximation (50 bits in your example), if you simply set the most significant bit to one (or more accurate: the 50th bit !) and zeroes for all other bits, this will make sure that the first approximation is always smaller or equal than the exact root.

Thanks Eddy and Rockoon, I've come across a lot on how important the first estimation is, but nothing puts it that clearly :U


Bruce
Title: Re: Multiprecision square root
Post by: dsouza123 on August 23, 2009, 02:54:58 PM
I freshly recoded an unsigned integer square root routine from
http://www.masm32.com/board/index.php?topic=3334.msg26094
now with more documentation but the number conversion routines removed.
It has a fixed number of iterations, it is half the count of bits of the number.
It currently is in the form of a program with fixed 32 dword allocations for the number, square root, remainder.
At the end it displays a MessageBox with the number and square root in hex.

; Loop for each bit pair of the number from highest to lowest
;    root * 2
;    remainder * 4 + current pair
;    if root < remainder
;      remainder - ( root + 1)
;      root + 2
;    endif
; endLoop
; root \ 2
; if remainder == 0
;   root is from a perfect square  ie root*root == number
; else
;   the number doesn't have an integer square root

It worked with the test value but extensive testing with other values is yet to be done.
Title: Re: Multiprecision square root
Post by: dsouza123 on August 23, 2009, 04:35:01 PM
Bug squashed.
After testing with a few more numbers, the bug came to light.
It affected the high dword of the result, because it also,
along with all the other dwords in the result, need to be shr 1 bit.


past2rtdiv:
  ; mov [brt + edi*4 - 4], 0

past2rtdiv:
  shr [brt + edi*4 - 4], 1
Title: Re: Multiprecision square root
Post by: dsouza123 on August 23, 2009, 10:20:01 PM
Added hex string to binary routine (stores binary value in an array of dwords),
along with a 1024 bit unsigned integer number as a hex string to test.

Added routine to convert the 3 binary values to hex strings
that is displayed in the MessageBox.

The output agrees with the values from the old code, referenced earlier.
Title: Re: Multiprecision square root
Post by: dsouza123 on August 24, 2009, 02:56:42 AM
Found out the code doesn't handle a number only 1 dword in length.

Changing the code that works well for 2+ dwords, for the special case of 1 dword
wasn't a good tradeoff.

So put in code to check if the number is 1 dword,
if so it runs a small 32 bit square root routine instead.
Title: Re: Multiprecision square root
Post by: drizz on August 24, 2009, 10:15:36 PM
This algo from wikipedia is FAST
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_.28base_2.29

use bsr, and, bts to construct the first approximation.

rearange the loop from:
while (one != 0) {
            if (op >= res + one) {
                op -= res + one;
                res = (res >> 1) + one;
            }
            else {
                res >>= 1;
            }
            one >>= 2;
}


to:while (one != 0) {
            tmp = res + one;           
            res >>= 1;
            if (op >= tmp) {
                op -= tmp
                res += one;
            }
            one >>= 2;
}