The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: bitRAKE on January 17, 2005, 11:50:59 PM

Title: Factor WITHOUT division / multiplication:
Post by: bitRAKE on January 17, 2005, 11:50:59 PM
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE


Sqrt PROC uint32:DWORD
xor ecx, ecx
xor eax, eax
bsr edx, [esp][4]
and edx, -2 ; even
bts ecx, edx

s0: add eax, ecx
sub [esp][4], eax
jnc s1
add [esp][4], eax
sub eax, ecx
jmp s2

s1: add eax, ecx
s2: shr eax, 1
shr ecx, 2
jne s0
sX:
retn 4
Sqrt ENDP


Factor PROC num:DWORD

invoke Sqrt, [esp][4] ; num

xor edx, edx
test eax, 1
sete dl
sub eax, edx

mov ecx, eax
mul eax
lea edx, [ecx*2]
mov ecx, [esp][4]
sub ecx, eax
mov eax, edx

_0: cmp ecx, eax
jnc _1
sub eax, 4
add ecx, edx
_1: add edx, 4
sub ecx, eax
jne _0

shr eax, 1
shr edx, 1
retn 4
Factor ENDP
The above algorithm only finds two factors of a number (EAX and EDX on return). If EAX is equal to one then EDX is prime, else call Factor recursively on EAX and EDX. Yes, in general it is faster than trial division and does not need an array of primes - although a prime array would help end recursion early.

Factor does not work for even numbers.

I would appreciate credit being given for use.
Title: Re: Factor WITHOUT division / multiplication:
Post by: hutch-- on January 18, 2005, 12:37:51 AM
Rickey,

This stuff looks good but remember I count with my fingers so just a little documentation here would help. The code says no stack frame on either but I would like to see a bit more data on what they both do.
Title: Re: Factor WITHOUT division / multiplication:
Post by: bitRAKE on January 18, 2005, 04:01:38 AM
The square root algorithm is an optimized translation of the itterative method outlined in this (http://venus.elfak.ni.ac.yu/projects/IMPEG/DigitalSystem.pdf) document. It is very similar to what one might do with pencil and paper. It has been described several times but no one has given the code for it. I have verified the output for all 32-bit numbers except zero is undefined due to the nature of BSR. :'( Maybe, someone can find an optimal fix for it - I would be most grateful.

Sqrt is passed an unsigned 32-bit value, and returns the integer portion of the square root of that number in EAX. Factor is passed an odd composite number, and returns two numbers that multiply together to equal that number in EAX and EDX (EAX <= EDX). The standard windows calling convention is used in this instance.

There is an ever increasing volume of algorithms on my web site.  When I get board creating the algorithms then I will put some work into describing their use and derivation in great detail. Until then it would be best to read the code - the instructions are few and the practice is good.
Title: Re: Factor WITHOUT division / multiplication:
Post by: raymond on January 18, 2005, 05:04:07 AM
Quoteexcept zero is undefined due to the nature of BSR.  Maybe, someone can find an optimal fix for it - I would be most grateful

You could simply modify the start of the proc as follows (using your own syntax):

   xor  eax,eax
   cmp  [esp][4],0
   jz   sX
   xor  ecx,ecx
   .....


hutch

bitRAKE's algo for the sqrt of a 32-bit integer could also be considered as an optimization of the sqrt32 proc in the MixLib library issued a few years ago. The primary optimization is the setting of the MSB with the bsr instruction, improving the speed for the smaller input numbers. The computing loops are similar.

One difference is that bitRAKE's proc returns the truncated square root while sqrt32 returns the square root rounded to the nearest integer.

Raymond


Title: Re: Factor WITHOUT division / multiplication:
Post by: bitRAKE on January 18, 2005, 05:53:36 AM
Thanks, Raymond but I was hoping for something more optimal and less obvious - I try to not pass zero currently. Is MixLib still distributed with MASM32 as I can only find two FPU Sqrt routines in MASM32.
Title: Re: Factor WITHOUT division / multiplication:
Post by: dioxin on January 18, 2005, 02:23:51 PM
Bitrake,
   an optimised SQRT? Have you tried comparing your method to this:

push number
fild dword ptr [esp]
fsqrt
fistp dword ptr [esp]
pop answer


It runs a lot quicker than yours on my machine.

Paul.
Title: Re: Factor WITHOUT division / multiplication:
Post by: raymond on January 18, 2005, 05:58:33 PM
bitRAKE

MixLib is not distributed with MASM32. However, it has been available on hutch's site as the last item of his MASM Source Code section:

http://www.movsd.com/source.htm

Kind of quaint that I was reviewing the code of that library yesterday, before I even saw your post, to remove redundant instructions. The two sqrt functions (32 and 64 bits) were my primary targets.

Raymond
Title: Re: Factor WITHOUT division / multiplication:
Post by: raymond on January 18, 2005, 06:32:58 PM
dioxin

I agree with you that the FPU route is faster, at least for the upper range of values. However, the CPU route should not be neglected when you could do it in parallel when the FPU may be busy doing other computations.

Raymond
Title: Re: Factor WITHOUT division / multiplication:
Post by: Randall Hyde on January 18, 2005, 09:48:32 PM
Quote from: bitRAKE on January 18, 2005, 04:01:38 AM
The square root algorithm is an optimized translation of the itterative method outlined in this (http://venus.elfak.ni.ac.yu/projects/IMPEG/DigitalSystem.pdf) document. It is very similar to what one might do with pencil and paper. It has been described several times but no one has given the code for it. I have verified the output for all 32-bit numbers except zero is undefined due to the nature of BSR. :'( Maybe, someone can find an optimal fix for it - I would be most grateful.
What's wrong with testing the zero flag after executing BSR?
Cheers,
Randy Hyde
Title: Re: Factor WITHOUT division / multiplication:
Post by: bitRAKE on January 19, 2005, 12:35:18 AM
Quote from: Randall Hyde on January 18, 2005, 09:48:32 PM
What's wrong with testing the zero flag after executing BSR?
Thanks, that is exactly what is needed.
Title: Re: Factor WITHOUT division / multiplication:
Post by: kflorek on April 22, 2005, 02:32:42 PM
Quote from: hutch-- on January 18, 2005, 12:37:51 AM
Rickey,

This stuff looks good but remember I count with my fingers so just a little documentation here would help.

Yeah, I couldn't follow it, minus comments. Sure, I can follow the instructions, but not how and why it works. It is very compactly and neatly done, therefore incomprehensible. So here is the factoring code with my own comments:

; For odd x, finds odd y and z such that y*z = x
; Since x = x*1, it will always succeed. The question
; is if it suceeds before getting down to the factor 1.
;
; You don't need to mulltiply (y+2) times z  to get the next trial product. You
; can calculate (y+2)*z by adding 2*z to y*z, eliminating multiplication in the main loop.

Factor PROC num:DWORD

   invoke Sqrt, [esp][4] ; num
      ;y = eax = the integer square root of x
      ;which is the largest possible smaller factor
   xor edx, edx
   test eax, 1   ;is it odd?
   sete dl      ;dl=1 if even
   sub eax, edx   ;y if odd, else y - 1
      ;eax = the largest odd y such that y^2 <= x

   mov ecx, eax   ;y
   mul eax      ;y^2 = largest odd square < x
   lea edx, [ecx*2]   ;2*y
   mov ecx, [esp][4]   ;x
   sub ecx, eax   ;x - y^2
   mov eax, edx   ;2*z
      ;jz somewhere, to catch squares

;+loop
   ; y is the larger factor, z is the smaller
   ; that is y >= z
   ; at the start, z is the largest possible smaller factor
   ; and y and z are as close as possible,
   ; that is y = z
   ;ecx accumulates x - y*z
   ;eax accumulates twice z
   ;edx accumulates twice y
   ; increase y as far as possible without y*z going over x
   ; if y*z = x, exit.
   ; when the next value of y would make y*z go over x, decease z
   ; before increasing y
   ; and if y*z = z, exit
   ; (y+2)*(z-2) = y*z - 2*(y - z) - 4
   ; since y-z is >= 0 or greater, this  is <= y*z

_0:   cmp ecx, eax   ;is x - y*z >= 2*z, that is x >= (y+2)*z ?
   jnc _1      ;if yes, y can go higher without x - y*z going negative.
      ;if no, x <> (y+2)*z and y can't go higher, so decrease z by 2
      ;
   sub eax, 4   ;2 * (z-2), adjust to lower odd z
   add ecx, edx   ;x - y*z + 2*y = x - y*(z-2)

_1:
      ;either way, y up 2
   add edx, 4   ;2*(y+2)

   sub ecx, eax   ;x - (y+2)*z
         ;or x -(y+2)*(z-2)= x - y*z + 2*y - 2*z + 4
; But is there an x,y,z where
;  x < (y+2)*z , x > y*z
; and x < (y+2)*(z-2) = y*z - 2*(y - z) - 4
; no, if x >= y*z and y-z >= 0

   jne _0
;-loop

   shr eax, 1   ;smaller factor
   shr edx, 1   ;larger factor
   retn 4
Factor ENDP

The comments seemed clear enough when I wrote them. As usual, now that I look at it after a week or two,
it looks vague.

Instead of doing trial division, this algorithm tries products, always keeping trial y*z <= x, and y >= z. It starts with a pair of factors the smaller of which is the largest possible smaller factor. Instead of generating successive trial products by multiplication, it generates them by adding the difference. If we want to try (y+2) times z after we already know y*z, add 2*z to y*z, since  (y+2)* z = y*z + 2*z.

At an alternative branch, where both y and z need to be changed, (y+2)* (z-2)  is calculate by adding the difference, since (y+2)* (z-2)  = y*z + 2*z -2*y - 4.  Notice that  2*z -2*y - 4 is negative when z <= y. This is important because the loop needs the trial products always <= x.

After deciphering, it became clear that the code misses the equal factors of square numbers. And if x is the square of a prime, the code thinks x is a prime.  For instance 3*3 = 9 is missed. Not hard to fix.

It is also clear that the branch where y goes up 2 and z down 2 can never produce equality to x. That's because y*z is already < x  and  (y+2)* (z-2)  is < y*z.

It is interesting that the accumulated numbers are all even. Therefore you could get rid of the factors of 2 by revising the code. Although interesting, there does not seem to be any apparent advantage to it.

It looks like where the loop slows down is where z gets small, but to cut down on wasted passes would probably introduce some division.

If anybody is interested in this but me, I'll try to make the still obscure spots clearer.
Title: Re: Factor WITHOUT division / multiplication:
Post by: roticv on April 22, 2005, 03:55:53 PM
Hello,

It is based on Fermat's factoring which makes use of the fact that (a + b)(a - b) = a^2 - b^2. What he is doing is searching for values of a and b such that a^2 - b^2 = the number he wants to factor. This type of factoring is good for factors that are close to each other for example 79 * 83.
Title: Re: Factor WITHOUT division / multiplication:
Post by: AeroASM on April 22, 2005, 06:02:47 PM
Quote from: dioxin on January 18, 2005, 02:23:51 PM
Bitrake,
   an optimised SQRT? Have you tried comparing your method to this:

push number
fild dword ptr [esp]
fsqrt
fistp dword ptr [esp]
pop answer


It runs a lot quicker than yours on my machine.

Probably because you don't have the fwait which you need

Paul.
Title: Re: Factor WITHOUT division / multiplication:
Post by: kflorek on April 22, 2005, 10:01:43 PM
Quote from: roticv on April 22, 2005, 03:55:53 PM
Hello,

It is based on Fermat's factoring which makes use of the fact that (a + b)(a - b) = a^2 - b^2. What he is doing is searching for values of a and b such that a^2 - b^2 = the number he wants to factor. This type of factoring is good for factors that are close to each other for example 79 * 83.

I was going to mention that I initially analyzed the code assuming that's what it was based on (except my post was already long.) The initial multiplcation, which is a squaring, seemed to indicate that. The algebra works out OK that way, except that writing it all out leads to rather more involved expressions then what I presented. If Fermat factoring were the idea, then you would expect to calculate a+b and a-b at the end. And, strangely, nowhere are a+b or a-b calculated. Then I noticed all the expressions looked simpler expressed with the actual factors instead of a+b and a-b.

I don't know if the author started with Fermat factoring, but the only thing remaining in the code in common with it is that it starts with factors that are near the square root and works further away. The idea in this algorithm could be implemented the other way around, starting with a small factor. You really don't need Fermat's idea at all.

I've seen Fermat factoring mentioned on a few sites, but I never saw even a half way decent codiing of it. Apparently, the interest is historical, rather than practical. It is just going to be too slow compared to modern ideas.

Here's how the present algorithm proceeds numerically: Lets factor 39.

Find the integer square root of 39, which is 6. 6 is even, so subtract 1 to get an odd number. There is no sense trying even factors for an odd number.

  39 > 5*5 = 25
( _0: )
We can add 2 to 5 (the larger factor), 7*5 = (5+2)* 5 = 25 + 2*5 = 25 +10 = 35 and
39  >  35 ( skip to _1)

( _0: )
if we were to only add 2 to 7,  9*5 = (7+2)* 5 = 35 + 2*5 = 35+ 10 = 45, and
39 < 45,
so instead add 2 to 7 (the larger factor) and subtract 2 from 5 (the smaller factor).
9*3 = (7+2)* (5-2) = 35+ 2*5 - 2*7 - 4  = 35 - 8 = 27
  Of course 27 will be < 39  because 27 is already < 35

( _0: ) 
So add 2 to 9 (the larger factor).  11* 3 = (9+2)* 3 = 27 + 2*3 = 27 + 6 = 33 and
39 > 33, (go to _1)

( _0: ) 
So add 2 to 11 (the larger factor).  13*3 = (11+2)* 3 = 33 + 2*3 = 27 + 6 = 39 and
39 = 39, (go to _1)

It's possible to interpret this as finding the difference between two squares = 39, but it really doesn't help, and in fact makes the whole thing seem complicated.

All that happens in the loop is that the larger factor goes up by 2 for as long as possible without going over the number.. But when doing that would go over the number, the larger factor goes up 2 AND the smaller factor goes down 2. You keep that up until you hit equality. By adding the difference between the old trial factorization and the new one, you can avoid multiplication.



Title: Re: Factor WITHOUT division / multiplication:
Post by: MichaelW on April 23, 2005, 06:08:08 AM
Quote from: AeroASM on April 22, 2005, 06:02:47 PM
Quote from: dioxin on January 18, 2005, 02:23:51 PM
Bitrake,
   an optimised SQRT? Have you tried comparing your method to this:

push number
fild dword ptr [esp]
fsqrt
fistp dword ptr [esp]
pop answer


It runs a lot quicker than yours on my machine.

Probably because you don't have the fwait which you need

Paul.
On my P3 the difference is much larger than could be accounted for by a missing fwait, 207 versus 62 cycles, with no procedure call overhead.



[attachment deleted by admin]
Title: Re: Factor WITHOUT division / multiplication:
Post by: Mark Jones on April 26, 2005, 10:24:51 PM
Here's a version which uses pseudo-random values. I slopped the code together so maybe this is all wrong. Anyways here is my output:


Random number samples: 146891097 380636641 6240874 622665325 447979919

CPU Square Root Factoring: 112,167,130,136,147,156,138,136,159,154 cycles
FPU Square Root Factoring: 27,27,27,27,27,27,30,27,27,27 cycles

Press ENTER to exit...

[attachment deleted by admin]
Title: Re: Factor WITHOUT division / multiplication:
Post by: Posit on May 16, 2005, 02:17:10 AM
Quote from: kflorek on April 22, 2005, 10:01:43 PM
Quote from: roticv on April 22, 2005, 03:55:53 PM
Hello,

It is based on Fermat's factoring which makes use of the fact that (a + b)(a - b) = a^2 - b^2. What he is doing is searching for values of a and b such that a^2 - b^2 = the number he wants to factor. This type of factoring is good for factors that are close to each other for example 79 * 83.

I was going to mention that I initially analyzed the code assuming that's what it was based on (except my post was already long.) The initial multiplcation, which is a squaring, seemed to indicate that. The algebra works out OK that way, except that writing it all out leads to rather more involved expressions then what I presented. If Fermat factoring were the idea, then you would expect to calculate a+b and a-b at the end. And, strangely, nowhere are a+b or a-b calculated. Then I noticed all the expressions looked simpler expressed with the actual factors instead of a+b and a-b.

I don't know if the author started with Fermat factoring, but the only thing remaining in the code in common with it is that it starts with factors that are near the square root and works further away. The idea in this algorithm could be implemented the other way around, starting with a small factor. You really don't need Fermat's idea at all.

I've seen Fermat factoring mentioned on a few sites, but I never saw even a half way decent codiing of it. Apparently, the interest is historical, rather than practical. It is just going to be too slow compared to modern ideas.

Here's how the present algorithm proceeds numerically: Lets factor 39.

Find the integer square root of 39, which is 6. 6 is even, so subtract 1 to get an odd number. There is no sense trying even factors for an odd number.

  39 > 5*5 = 25
( _0: )
We can add 2 to 5 (the larger factor), 7*5 = (5+2)* 5 = 25 + 2*5 = 25 +10 = 35 and
39  >  35 ( skip to _1)

( _0: )
if we were to only add 2 to 7,  9*5 = (7+2)* 5 = 35 + 2*5 = 35+ 10 = 45, and
39 < 45,
so instead add 2 to 7 (the larger factor) and subtract 2 from 5 (the smaller factor).
9*3 = (7+2)* (5-2) = 35+ 2*5 - 2*7 - 4  = 35 - 8 = 27
  Of course 27 will be < 39  because 27 is already < 35

( _0: ) 
So add 2 to 9 (the larger factor).  11* 3 = (9+2)* 3 = 27 + 2*3 = 27 + 6 = 33 and
39 > 33, (go to _1)

( _0: ) 
So add 2 to 11 (the larger factor).  13*3 = (11+2)* 3 = 33 + 2*3 = 27 + 6 = 39 and
39 = 39, (go to _1)

It's possible to interpret this as finding the difference between two squares = 39, but it really doesn't help, and in fact makes the whole thing seem complicated.

All that happens in the loop is that the larger factor goes up by 2 for as long as possible without going over the number.. But when doing that would go over the number, the larger factor goes up 2 AND the smaller factor goes down 2. You keep that up until you hit equality. By adding the difference between the old trial factorization and the new one, you can avoid multiplication.

This is actually using an improvement to Fermat's method made by Gauss, who noted that to find the next perfect square in a series does not require multiplication; it can be accomplished with a simple addition. The difference between any perfect square and the next perfect square is 2 more than the difference to the last perfect square. So 1, 4 ,9, 16, 25... and so on have differences between them of 3, 5, 7, 9...., so keeping a running total of the difference lets you find the next perfect square by adding 11 to 25, as opposed to multiplying 6*6.

The algorithm above works by trying to find x^2 - y^2 = n, where n is the number being factored. But instead of keeping track of x and y, or even x^2 and y^2, it keeps track of an error term that equals x^2 - y^2 - n. If this error term is ever 0, then it has found two factors of n.

Starting with x^2 - n, it subtracts differences to subsequent perfect squares until the error term is <= 0. Then it starts adding differences to subsequent perfect squares until the error term >= 0. It keeps this up until it finds a factor.

As far as a good implementation, here is one written in C that seems to boil the idea down to the essence, taken from http://www.frenchfries.net/paul/src/fermat.cc

Integer factor(Integer n)
{
    Integer r, t, u, v, e;

    t = 0;

    if (n%2 == 0) return 2;
    r = sqrt(n);
    if (issquare(n))
   {
   if (isprime(r)) return r;
   else return factor(r);
   }
    r++;
    u = 2*r + 1; v = 1;                // start with possible candidates
    e = r*r - n;
    while (e != 0)                      // minimize the error term a la Gauss
  {
   while (e<0) {e += u; u+=2;}
   while (e>0) {e -= v; v+=2;}
  }
//    return (u+v-2)/2;              // either one will work
    return (u-v)/2;
}

In this code, e keeps track of x^2 - y^2 - n, while u and v keep track of the difference to the next perfect square of x and y respectively. Once the error term is zero, it returns (u+v-2)/2 because given two numbers, u and v, each of which represents the difference between subsequent perfect squares, then (u+v-2)/2 will give you x+y.  For example, if u=9 and v=11, then (u+v-2)/2 = 9, which is x+y, one of the factors. x is of course 4 (the difference between 4^2 and 5^2 iis 9) and y is 5 (the difference between 5^2 and 6^2 is 11).