Computing the square root of a 32-bit integer

Number1awa · February 04, 2011, 08:48:35 PM

I am trying to use the Bisection Method for computing the square root of a 32-bit integer

This is not for homework, I am just trying to make a simple calculator, and I wanted to try adding the sqrt function to it.

As it is right now, I have gotten lost in my code of many pops and pushes.

If you guys could help me with writing this or just come up with something your self that would be awesome thanks :D

dedndave · February 05, 2011, 04:25:13 AM

the easy way is to use the FPU
load integer, sqrt, store integer

but, if you want to write a "discrete" routine, this might be helpful...
http://www.masm32.com/board/index.php?topic=12116.msg92595#msg92595

clive · February 05, 2011, 08:58:50 AM

This converts to assembler quite easily

Code Select

// Jack Crenshaw's Integer Square Root

unsigned long integer_sqrt(unsigned long a)
{
  unsigned long rem = 0;
  unsigned long root = 0;
  unsigned long divisor = 0;
  int i;

  for(i=0; i<16; i++) // 32-bit, 2 at a time
  {
    root <<= 1;

    rem = ((rem << 2) + (a >> 30));

    a <<= 2;

    divisor = (root << 1) + 1;

    if (divisor <= rem)
    {
      rem -= divisor;
      root++;
    }
  }

  return(root);
}

Number1awa · February 05, 2011, 05:53:58 PM

thanks guys

but clive a few things in your c++ code

what does
root <<= 1

rem = ((rem << 2) + (a >> 30)
a <<= 2
divisor = (root << 1) + 1

what do the use of the '<<' and '>>' mean in these i know they are used for input and output but i have never seen them used in this way...:) (learning c++ later this year)

dedndave · February 05, 2011, 05:57:11 PM

shift left by 1, shift right by 1

jj2007 · February 05, 2011, 06:28:57 PM

Should look roughly like this and assembles fine but it always returns zero ::)

What does a <<= 2 mean??

Code Select

include \masm32\include\masm32rt.inc
.code
start:
	a equ eax
	rem equ edx
	root equ esi
	divisor equ edi
	i equ ecx

	mov a, 12345678
	call integer_sqrt

	inkey str$(root), " returned"
	exit
; unsigned long integer_sqrt(unsigned long a)
;{
;  unsigned long rem = 0;
;  unsigned long root = 0;
;  unsigned long divisor = 0;
;  int i;
integer_sqrt proc
  xor rem, rem
  xor root, root
  xor divisor, divisor
  xor i, i
  .Repeat
	shl root, 1	; root <<= 1
	shl rem, 2	; rem = ((rem << 2) + (a >> 30))
	shr a, 30
	add rem, a
	shl a, 2	; a <<= 2 ???
	shl root, 1
	mov divisor, root
	inc divisor
	.if divisor<rem
		sub rem, divisor	; rem -= divisor
		inc root	; root++
	.endif
	inc i
  .Until i>=16
  ret
integer_sqrt endp
end start

qWord · February 05, 2011, 06:30:42 PM

QuoteWhat does a <<= 2 mean

equal to: a= a << 2;
a = a SHL 2

clive · February 05, 2011, 07:01:34 PM

Quote from: Number1awa
but clive a few things in your c++ code

Actually it's pure C, ANSI rather than K&R. The << and >> are shift left and shift right respectively. With an equal sign they become assignments. root <<= 1, can be written in the longer form root = root << 1;

I can't say converting it to x86 has much point given the easy availability of float/XMM instructions to achieve better performance, and better fractional precision. When using it on an ARM or MIPS, one might also consider suitable scaling of the inputs and outputs to achieve the desired range.

Glenn9999 · February 05, 2011, 07:33:33 PM

Quote from: clive on February 05, 2011, 07:01:34 PM
I can't say converting it to x86 has much point given the easy availability of float/XMM instructions to achieve better performance, and better fractional precision.

It's hard to say. It could be someone trying to learn assembler and grasping at straws looking for good projects that are sufficiently rigorous to help foster understanding. I concur with you and dave, it's just better to use the sqrt functionality of the FPU instead of cooking up your own (unless you want FPU emulation, and there's been more than enough effort already done on that).

jj2007 · February 05, 2011, 08:01:14 PM

Still, just for the fun of it, any idea why my code above doesn't work? I have never used C, so it's probably something very trivial.
The error could be here...:

Code Select

	.if divisor<rem
			if 1
			mov rem, divisor
			neg rem
			else
			sub rem, divisor	; rem -= divisor
			endif
		inc root	; root++
	.endif

::)

jj2007 · February 05, 2011, 08:41:50 PM

I got this version working. Google, by the way, suggests this interesting link which seems to be based on Listing 4 :bg

Code Select

include \masm32\include\masm32rt.inc

.code
start:	push 152399025
	call integer_sqrt
	inkey str$(eax), " returned"
	exit

integer_sqrt proc
root	equ eax
tmp	equ ebx
i	equ ecx
rem	equ edx
number equ dword ptr [esp+4]

  xor rem, rem
  xor root, root
  xor i, i
  .Repeat
	shl root, 1			; root <<= 1
	mov tmp, number
	shl rem, 2			; rem = ((rem << 2) + (a >> 30))
	shr tmp, 30
	add rem, tmp
	shl number, 2			; a <<= 2 ???
	.if root<rem
		inc root
		sub rem, root		; rem -= root
		inc root
	.endif
	inc i
  .Until i>=16
  shr root, 1
  ret 4
integer_sqrt endp
end start

clive · February 05, 2011, 10:27:17 PM

Code Select

unsigned long asm_sqrt(unsigned long a)
{
	unsigned long root;

	__asm
	{
		xor		edx,edx 			; rem
		mov		eax,a
		xor		ebx,ebx				; root

		mov		ecx,16				; i, iterations
lp:
		shl		ebx,1					; root <<= 1

		add		eax,eax			; rem = ((rem << 2) + (a >> 30)); a <<= 2;
		adc		edx,edx
		add		eax,eax
		adc		edx,edx

		lea		edi,[ebx*2+1]	; divisor = (root << 1) + 1;
		cmp		edx,edi				; if (divisor <= rem)
		jc		ovr
		sub		edx,edi				; rem -= divisor;
		inc		ebx						; root++
ovr:
		dec		ecx						; i--
		jnz		lp
		mov		root,ebx
	}

	return(root);
}

jj2007 · February 05, 2011, 10:54:49 PM

Thanks, Clive, with LEA codesize is down to 40 bytes. But it's still not competitive with the FPU...

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
223     cycles for Jack C, result 12345
64      cycles for the FPU, result 12345
62      cycles for MasmBasic, result 12345

oex · February 06, 2011, 12:25:35 AM

A more sensible implimentation might build a table rather than just calculate once fast.... If used for a calculator no-one is going to notice the difference between 64 and 223 cycles on a button click and you want the best legible code :lol....

For an implementation that has to calculate the sqrt a lot you would want to generate a 256Kb table? in as few cycles as possible and then search table in far fewer than 60 cycles for result?....

Just my 2 cents worth :lol I may be far out, Maths/French is not my Fortay, but this is an interesting calculation I've not done before :bg

Number1awa · February 06, 2011, 01:25:23 AM

Thanks guys i like jj2007 version the best :)

it helped alot :)

News:

Computing the square root of a 32-bit integer

Number1awa

Number1awa

Number1awa