Factor WITHOUT division / multiplication:

bitRAKE · January 17, 2005, 11:50:59 PM

	OPTION PROLOGUE:NONE
	OPTION EPILOGUE:NONE


Sqrt PROC uint32:DWORD
	xor ecx, ecx
	xor eax, eax
	bsr edx, [esp][4]
	and edx, -2	; even
	bts ecx, edx

s0:	add eax, ecx
	sub [esp][4], eax
	jnc s1
	add [esp][4], eax
	sub eax, ecx
	jmp s2

s1:	add eax, ecx
s2:	shr eax, 1
	shr ecx, 2
	jne s0
sX:
	retn 4
Sqrt ENDP


Factor PROC num:DWORD

	invoke Sqrt, [esp][4] ; num

	xor edx, edx
	test eax, 1
	sete dl
	sub eax, edx

	mov ecx, eax
	mul eax
	lea edx, [ecx*2]
	mov ecx, [esp][4]
	sub ecx, eax
	mov eax, edx

_0:	cmp ecx, eax
	jnc _1
	sub eax, 4
	add ecx, edx
_1:	add edx, 4
	sub ecx, eax
	jne _0

	shr eax, 1
	shr edx, 1
	retn 4
Factor ENDP

The above algorithm only finds two factors of a number (EAX and EDX on return). If EAX is equal to one then EDX is prime, else call Factor recursively on EAX and EDX. Yes, in general it is faster than trial division and does not need an array of primes - although a prime array would help end recursion early.

Factor does not work for even numbers.

I would appreciate credit being given for use.

hutch-- · January 18, 2005, 12:37:51 AM

Rickey,

This stuff looks good but remember I count with my fingers so just a little documentation here would help. The code says no stack frame on either but I would like to see a bit more data on what they both do.

bitRAKE · January 18, 2005, 04:01:38 AM

The square root algorithm is an optimized translation of the itterative method outlined in this document. It is very similar to what one might do with pencil and paper. It has been described several times but no one has given the code for it. I have verified the output for all 32-bit numbers except zero is undefined due to the nature of BSR. :'( Maybe, someone can find an optimal fix for it - I would be most grateful.

Sqrt is passed an unsigned 32-bit value, and returns the integer portion of the square root of that number in EAX. Factor is passed an odd composite number, and returns two numbers that multiply together to equal that number in EAX and EDX (EAX <= EDX). The standard windows calling convention is used in this instance.

There is an ever increasing volume of algorithms on my web site. When I get board creating the algorithms then I will put some work into describing their use and derivation in great detail. Until then it would be best to read the code - the instructions are few and the practice is good.

raymond · January 18, 2005, 05:04:07 AM

Quoteexcept zero is undefined due to the nature of BSR. Maybe, someone can find an optimal fix for it - I would be most grateful

You could simply modify the start of the proc as follows (using your own syntax):

Code Select

   xor  eax,eax
   cmp  [esp][4],0
   jz   sX
   xor  ecx,ecx
   .....

hutch

bitRAKE's algo for the sqrt of a 32-bit integer could also be considered as an optimization of the sqrt32 proc in the MixLib library issued a few years ago. The primary optimization is the setting of the MSB with the bsr instruction, improving the speed for the smaller input numbers. The computing loops are similar.

One difference is that bitRAKE's proc returns the truncated square root while sqrt32 returns the square root rounded to the nearest integer.

Raymond

bitRAKE · January 18, 2005, 05:53:36 AM

Thanks, Raymond but I was hoping for something more optimal and less obvious - I try to not pass zero currently. Is MixLib still distributed with MASM32 as I can only find two FPU Sqrt routines in MASM32.

dioxin · January 18, 2005, 02:23:51 PM

Bitrake,
an optimised SQRT? Have you tried comparing your method to this:

Code Select


push number
fild dword ptr [esp]
fsqrt
fistp dword ptr [esp]
pop answer

It runs a lot quicker than yours on my machine.

Paul.

raymond · January 18, 2005, 05:58:33 PM

bitRAKE

MixLib is not distributed with MASM32. However, it has been available on hutch's site as the last item of his MASM Source Code section:

http://www.movsd.com/source.htm

Kind of quaint that I was reviewing the code of that library yesterday, before I even saw your post, to remove redundant instructions. The two sqrt functions (32 and 64 bits) were my primary targets.

Raymond

raymond · January 18, 2005, 06:32:58 PM

dioxin

I agree with you that the FPU route is faster, at least for the upper range of values. However, the CPU route should not be neglected when you could do it in parallel when the FPU may be busy doing other computations.

Raymond

Randall Hyde · January 18, 2005, 09:48:32 PM

Quote from: bitRAKE on January 18, 2005, 04:01:38 AM
The square root algorithm is an optimized translation of the itterative method outlined in this document. It is very similar to what one might do with pencil and paper. It has been described several times but no one has given the code for it. I have verified the output for all 32-bit numbers except zero is undefined due to the nature of BSR. :'( Maybe, someone can find an optimal fix for it - I would be most grateful.

What's wrong with testing the zero flag after executing BSR?
Cheers,
Randy Hyde

bitRAKE · January 19, 2005, 12:35:18 AM

Quote from: Randall Hyde on January 18, 2005, 09:48:32 PM
What's wrong with testing the zero flag after executing BSR?

Thanks, that is exactly what is needed.

kflorek · April 22, 2005, 02:32:42 PM

Quote from: hutch-- on January 18, 2005, 12:37:51 AM
Rickey,

This stuff looks good but remember I count with my fingers so just a little documentation here would help.

Yeah, I couldn't follow it, minus comments. Sure, I can follow the instructions, but not how and why it works. It is very compactly and neatly done, therefore incomprehensible. So here is the factoring code with my own comments:

; For odd x, finds odd y and z such that y*z = x
; Since x = x*1, it will always succeed. The question
; is if it suceeds before getting down to the factor 1.
;
; You don't need to mulltiply (y+2) times z to get the next trial product. You
; can calculate (y+2)*z by adding 2*z to y*z, eliminating multiplication in the main loop.

Factor PROC num:DWORD

   invoke Sqrt, [esp][4] ; num
      ;y = eax = the integer square root of x
      ;which is the largest possible smaller factor
   xor edx, edx
   test eax, 1   ;is it odd?
   sete dl      ;dl=1 if even
   sub eax, edx   ;y if odd, else y - 1
      ;eax = the largest odd y such that y^2 <= x

   mov ecx, eax   ;y
   mul eax      ;y^2 = largest odd square < x
   lea edx, [ecx*2]   ;2*y
   mov ecx, [esp][4]   ;x
   sub ecx, eax   ;x - y^2
   mov eax, edx   ;2*z
      ;jz somewhere, to catch squares

;+loop
   ; y is the larger factor, z is the smaller
   ; that is y >= z
   ; at the start, z is the largest possible smaller factor
   ; and y and z are as close as possible,
   ; that is y = z
   ;ecx accumulates x - y*z
   ;eax accumulates twice z
   ;edx accumulates twice y
   ; increase y as far as possible without y*z going over x
   ; if y*z = x, exit.
   ; when the next value of y would make y*z go over x, decease z
   ; before increasing y
   ; and if y*z = z, exit
   ; (y+2)*(z-2) = y*z - 2*(y - z) - 4
   ; since y-z is >= 0 or greater, this is <= y*z

_0:   cmp ecx, eax   ;is x - y*z >= 2*z, that is x >= (y+2)*z ?
   jnc _1      ;if yes, y can go higher without x - y*z going negative.
      ;if no, x <> (y+2)*z and y can't go higher, so decrease z by 2
      ;
   sub eax, 4   ;2 * (z-2), adjust to lower odd z
   add ecx, edx   ;x - y*z + 2*y = x - y*(z-2)

_1:
      ;either way, y up 2
   add edx, 4   ;2*(y+2)

   sub ecx, eax   ;x - (y+2)*z
         ;or x -(y+2)*(z-2)= x - y*z + 2*y - 2*z + 4
; But is there an x,y,z where
; x < (y+2)*z , x > y*z
; and x < (y+2)*(z-2) = y*z - 2*(y - z) - 4
; no, if x >= y*z and y-z >= 0

   jne _0
;-loop

   shr eax, 1   ;smaller factor
   shr edx, 1   ;larger factor
   retn 4
Factor ENDP

The comments seemed clear enough when I wrote them. As usual, now that I look at it after a week or two,
it looks vague.

Instead of doing trial division, this algorithm tries products, always keeping trial y*z <= x, and y >= z. It starts with a pair of factors the smaller of which is the largest possible smaller factor. Instead of generating successive trial products by multiplication, it generates them by adding the difference. If we want to try (y+2) times z after we already know y*z, add 2*z to y*z, since (y+2)* z = y*z + 2*z.

At an alternative branch, where both y and z need to be changed, (y+2)* (z-2) is calculate by adding the difference, since (y+2)* (z-2) = y*z + 2*z -2*y - 4. Notice that 2*z -2*y - 4 is negative when z <= y. This is important because the loop needs the trial products always <= x.

After deciphering, it became clear that the code misses the equal factors of square numbers. And if x is the square of a prime, the code thinks x is a prime. For instance 3*3 = 9 is missed. Not hard to fix.

It is also clear that the branch where y goes up 2 and z down 2 can never produce equality to x. That's because y*z is already < x and (y+2)* (z-2) is < y*z.

It is interesting that the accumulated numbers are all even. Therefore you could get rid of the factors of 2 by revising the code. Although interesting, there does not seem to be any apparent advantage to it.

It looks like where the loop slows down is where z gets small, but to cut down on wasted passes would probably introduce some division.

If anybody is interested in this but me, I'll try to make the still obscure spots clearer.

roticv · April 22, 2005, 03:55:53 PM

Hello,

It is based on Fermat's factoring which makes use of the fact that (a + b)(a - b) = a^2 - b^2. What he is doing is searching for values of a and b such that a^2 - b^2 = the number he wants to factor. This type of factoring is good for factors that are close to each other for example 79 * 83.

AeroASM · April 22, 2005, 06:02:47 PM

Quote from: dioxin on January 18, 2005, 02:23:51 PM
Bitrake,
an optimised SQRT? Have you tried comparing your method to this:
Code Select Expand
push number fild dword ptr [esp] fsqrt fistp dword ptr [esp] pop answer

It runs a lot quicker than yours on my machine.

Probably because you don't have the fwait which you need

Paul.

kflorek · April 22, 2005, 10:01:43 PM

Quote from: roticv on April 22, 2005, 03:55:53 PM
Hello,

It is based on Fermat's factoring which makes use of the fact that (a + b)(a - b) = a^2 - b^2. What he is doing is searching for values of a and b such that a^2 - b^2 = the number he wants to factor. This type of factoring is good for factors that are close to each other for example 79 * 83.

I was going to mention that I initially analyzed the code assuming that's what it was based on (except my post was already long.) The initial multiplcation, which is a squaring, seemed to indicate that. The algebra works out OK that way, except that writing it all out leads to rather more involved expressions then what I presented. If Fermat factoring were the idea, then you would expect to calculate a+b and a-b at the end. And, strangely, nowhere are a+b or a-b calculated. Then I noticed all the expressions looked simpler expressed with the actual factors instead of a+b and a-b.

I don't know if the author started with Fermat factoring, but the only thing remaining in the code in common with it is that it starts with factors that are near the square root and works further away. The idea in this algorithm could be implemented the other way around, starting with a small factor. You really don't need Fermat's idea at all.

I've seen Fermat factoring mentioned on a few sites, but I never saw even a half way decent codiing of it. Apparently, the interest is historical, rather than practical. It is just going to be too slow compared to modern ideas.

Here's how the present algorithm proceeds numerically: Lets factor 39.

Find the integer square root of 39, which is 6. 6 is even, so subtract 1 to get an odd number. There is no sense trying even factors for an odd number.

39 > 5*5 = 25
( _0: )
We can add 2 to 5 (the larger factor), 7*5 = (5+2)* 5 = 25 + 2*5 = 25 +10 = 35 and
39 > 35 ( skip to _1)

( _0: )
if we were to only add 2 to 7, 9*5 = (7+2)* 5 = 35 + 2*5 = 35+ 10 = 45, and
39 < 45,
so instead add 2 to 7 (the larger factor) and subtract 2 from 5 (the smaller factor).
9*3 = (7+2)* (5-2) = 35+ 2*5 - 2*7 - 4 = 35 - 8 = 27
Of course 27 will be < 39 because 27 is already < 35

( _0: )
So add 2 to 9 (the larger factor). 11* 3 = (9+2)* 3 = 27 + 2*3 = 27 + 6 = 33 and
39 > 33, (go to _1)

( _0: )
So add 2 to 11 (the larger factor). 13*3 = (11+2)* 3 = 33 + 2*3 = 27 + 6 = 39 and
39 = 39, (go to _1)

It's possible to interpret this as finding the difference between two squares = 39, but it really doesn't help, and in fact makes the whole thing seem complicated.

All that happens in the loop is that the larger factor goes up by 2 for as long as possible without going over the number.. But when doing that would go over the number, the larger factor goes up 2 AND the smaller factor goes down 2. You keep that up until you hit equality. By adding the difference between the old trial factorization and the new one, you can avoid multiplication.

MichaelW · April 23, 2005, 06:08:08 AM

Quote from: AeroASM on April 22, 2005, 06:02:47 PM
Quote from: dioxin on January 18, 2005, 02:23:51 PM
Bitrake,
an optimised SQRT? Have you tried comparing your method to this:
Code Select Expand
push number fild dword ptr [esp] fsqrt fistp dword ptr [esp] pop answer

It runs a lot quicker than yours on my machine.

Probably because you don't have the fwait which you need

Paul.

On my P3 the difference is much larger than could be accounted for by a missing fwait, 207 versus 62 cycles, with no procedure call overhead.

[attachment deleted by admin]

News:

Factor WITHOUT division / multiplication:

bitRAKE

hutch--

bitRAKE

raymond

bitRAKE

dioxin

raymond

raymond

Randall Hyde

bitRAKE

kflorek

roticv

AeroASM

kflorek

MichaelW