News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Computing the square root of a 32-bit integer

Started by Number1awa, February 04, 2011, 08:48:35 PM

Previous topic - Next topic

Number1awa

I am trying to use the Bisection Method for computing the square root of a 32-bit integer

This is not for homework, I am just trying to make a simple calculator, and I wanted to try adding the sqrt function to it.

As it is right now, I have gotten lost in my code of many pops and pushes.

If you guys could help me with writing this or just come up with something your self that would be awesome thanks :D

dedndave

the easy way is to use the FPU
load integer, sqrt, store integer

but, if you want to write a "discrete" routine, this might be helpful...
http://www.masm32.com/board/index.php?topic=12116.msg92595#msg92595

clive

This converts to assembler quite easily

// Jack Crenshaw's Integer Square Root

unsigned long integer_sqrt(unsigned long a)
{
  unsigned long rem = 0;
  unsigned long root = 0;
  unsigned long divisor = 0;
  int i;

  for(i=0; i<16; i++) // 32-bit, 2 at a time
  {
    root <<= 1;

    rem = ((rem << 2) + (a >> 30));

    a <<= 2;

    divisor = (root << 1) + 1;

    if (divisor <= rem)
    {
      rem -= divisor;
      root++;
    }
  }

  return(root);
}
It could be a random act of randomness. Those happen a lot as well.

Number1awa

thanks guys

but clive a few things in your c++ code

what does
root <<= 1

rem = ((rem << 2) + (a >> 30)
a <<= 2
divisor = (root << 1) + 1

what do the use of the '<<' and '>>' mean in these i know they are used for input and output but i have never seen them used in this way...:)    (learning c++ later this year)

dedndave


jj2007

Should look roughly like this and assembles fine but it always returns zero ::)

What does a <<= 2 mean??

include \masm32\include\masm32rt.inc
.code
start:
a equ eax
rem equ edx
root equ esi
divisor equ edi
i equ ecx

mov a, 12345678
call integer_sqrt

inkey str$(root), " returned"
exit
; unsigned long integer_sqrt(unsigned long a)
;{
;  unsigned long rem = 0;
;  unsigned long root = 0;
;  unsigned long divisor = 0;
;  int i;
integer_sqrt proc
  xor rem, rem
  xor root, root
  xor divisor, divisor
  xor i, i
  .Repeat
shl root, 1 ; root <<= 1
shl rem, 2 ; rem = ((rem << 2) + (a >> 30))
shr a, 30
add rem, a
shl a, 2 ; a <<= 2 ???
shl root, 1
mov divisor, root
inc divisor
.if divisor<rem
sub rem, divisor ; rem -= divisor
inc root ; root++
.endif
inc i
  .Until i>=16
  ret
integer_sqrt endp
end start

qWord

QuoteWhat does a <<= 2 mean
equal to: a= a << 2;
a = a SHL 2
FPU in a trice: SmplMath
It's that simple!

clive

Quote from: Number1awa
but clive a few things in your c++ code

Actually it's pure C, ANSI rather than K&R.  The << and >> are shift left and shift right respectively. With an equal sign they become assignments. root <<= 1, can be written in the longer form root = root << 1;

I can't say converting it to x86 has much point given the easy availability of float/XMM instructions to achieve better performance, and better fractional precision. When using it on an ARM or MIPS, one might also consider suitable scaling of the inputs and outputs to achieve the desired range.
It could be a random act of randomness. Those happen a lot as well.

Glenn9999

Quote from: clive on February 05, 2011, 07:01:34 PM
I can't say converting it to x86 has much point given the easy availability of float/XMM instructions to achieve better performance, and better fractional precision.

It's hard to say.  It could be someone trying to learn assembler and grasping at straws looking for good projects that are sufficiently rigorous to help foster understanding.  I concur with you and dave, it's just better to use the sqrt functionality of the FPU instead of cooking up your own (unless you want FPU emulation, and there's been more than enough effort already done on that).

jj2007

Still, just for the fun of it, any idea why my code above doesn't work? I have never used C, so it's probably something very trivial.
The error could be here...:

.if divisor<rem
if 1
mov rem, divisor
neg rem
else
sub rem, divisor ; rem -= divisor
endif
inc root ; root++
.endif

::)

jj2007

I got this version working. Google, by the way, suggests this interesting link which seems to be based on Listing 4 :bg

include \masm32\include\masm32rt.inc

.code
start: push 152399025
call integer_sqrt
inkey str$(eax), " returned"
exit

integer_sqrt proc
root equ eax
tmp equ ebx
i equ ecx
rem equ edx
number equ dword ptr [esp+4]

  xor rem, rem
  xor root, root
  xor i, i
  .Repeat
shl root, 1 ; root <<= 1
mov tmp, number
shl rem, 2 ; rem = ((rem << 2) + (a >> 30))
shr tmp, 30
add rem, tmp
shl number, 2 ; a <<= 2 ???
.if root<rem
inc root
sub rem, root ; rem -= root
inc root
.endif
inc i
  .Until i>=16
  shr root, 1
  ret 4
integer_sqrt endp
end start

clive

unsigned long asm_sqrt(unsigned long a)
{
unsigned long root;

__asm
{
xor edx,edx ; rem
mov eax,a
xor ebx,ebx ; root

mov ecx,16 ; i, iterations
lp:
shl ebx,1 ; root <<= 1

add eax,eax ; rem = ((rem << 2) + (a >> 30)); a <<= 2;
adc edx,edx
add eax,eax
adc edx,edx

lea edi,[ebx*2+1] ; divisor = (root << 1) + 1;
cmp edx,edi ; if (divisor <= rem)
jc ovr
sub edx,edi ; rem -= divisor;
inc ebx ; root++
ovr:
dec ecx ; i--
jnz lp
mov root,ebx
}

return(root);
}
It could be a random act of randomness. Those happen a lot as well.

jj2007

Thanks, Clive, with LEA codesize is down to 40 bytes. But it's still not competitive with the FPU...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
223     cycles for Jack C, result 12345
64      cycles for the FPU, result 12345
62      cycles for MasmBasic, result 12345

oex

A more sensible implimentation might build a table rather than just calculate once fast.... If used for a calculator no-one is going to notice the difference between 64 and 223 cycles on a button click and you want the best legible code :lol....

For an implementation that has to calculate the sqrt a lot you would want to generate a 256Kb table? in as few cycles as possible and then search table in far fewer than 60 cycles for result?....

Just my 2 cents worth :lol I may be far out, Maths/French is not my Fortay, but this is an interesting calculation I've not done before :bg
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

Number1awa

Thanks guys i like jj2007 version the best :)

it helped alot :)