found on german site a routine to calculate sqare root UNSIGNED 32-bit:
implement it, but doesnot work,-- gives wrong result wiht carry????
help appreciated:
[
Sqrt_Cobold proc number:DWORD
LOCAL i0,i1,i2,i3,i4,i5,k0,k1:DWORD
mov k1,0
mov i1,0
mov i5,32
mov i0,1 ;For i0 := 1 to 4 do Begin
ALIGN 4
.while i0 <= 4
sub i5,8
mov eax,number
mov ecx,i5
shr eax,cl
and eax,0000000FFh
mov i4,eax ;i4 := (number shr i5) and $000000FF
mov eax,i1
shl eax,8
or eax,i4
mov i1,eax ;i1 := (i1 shl 8) or i4
mov eax,k1
shl eax,5
inc eax
mov i2,eax ;i2 := (k1 shl 5) + 1
inc i0
.endw
mov k0,0
ALIGN 4
.repeat
mov eax,i1 ;i3 = i1 - i2
sub eax,i2
jc @F
mov i3,eax
.break .if i3 < 0
mov eax,i3
mov i1,eax ;i1 = i3
add i2,2 ;i2 = i2 + 2
adc i2,0
inc k0
.until i3 < 0
@@:
mov eax,k1 ;k1 := (k1 shl 4) + k0;
shl eax,4
adc eax,k0
ret
Sqrt_Cobold endp
/code]
I haven't analyzed the algo you posted to determine how it may work. However, if you are looking for an algo to extract the square root of unsigned 32-bit integers (and 64-bit unsigned integers) without using the FPU, you may want to look at the one(s) included in my MixLib package which you can find at:
http://www.ray.masmcode.com/fixmath.html
The source code(s) are included in the package. You can adapt the source code to return a truncated result instead of a rounded result by commenting out the very last "inc eax". Although the code may look long, part of it is repeated 16 times (for 32-bit integers) on the assumption that the timing would be improved by unrolling a loop.
Raymond,
thanks for the reply and for mixlib!
I believe FSQRT and the IntSqrt of masm treat all numbers as signed, so you cannot calculate the square root of numbers > 2147483648 - is my assumption correct?
Anyway, I forget about my routine and take a look at mixlib now!
regards
Cobold
Your library looks nice Raymond, you seem to have put a lot of time into it. Is it for the most part safe? I tried to understand all you documentation but it seemed to go back and forth. And if so, have you considered talking to hutch about adding it to masm32? i'd imagine countless people could benefit from this.
I issued that library many years ago (I still had a Pentium III at the time :() before I even issued the Fpulib and SimplyFPU tutorial. Although the basic principles of "fixed point" math can be usefull in some applications, it has become more of a curiosity with the advent of the FPU/MMX/SSE technologies.
However, I am available to answer any question regarding this subject if someone doesn't understand some of the explanations given at the above link or in the Mixlib package.
thought I'd play with that sample...
what I can work out in 30 seconds - I might be wrong :boohoo:
At the end of the While loop, you just end up with
I1 = number
I2 = 1
It actually does nuzzing except make I2 ??
Anyway off to the repeat loop, whcih at the end just
K0 = repeat loop Counter
K1 is intitilaised to zero and always remains there, yet it's used in the final calc ??
Result would be the repeat loop count (+1 if there is a 'stray carry' floating around )
It looks like the variables might be mixed up, or the author was smoking something
:bg
Quote from: cobold on November 26, 2008, 01:23:28 AM
found on german site a routine to calculate sqare root UNSIGNED 32-bit:
implement it, but doesnot work,-- gives wrong result wiht carry????
help appreciated:[\quote]
the floating point square root on a P4 and up is extremely fast. I recommend converting to floating point and then doing the square root and converting back. Assuming of course you are looking for speed.
I think the p4 does the fsqrt in like 12 cycles. So it's really fast now. The P3 was like 45 cycles. Most people don't realize that and try to do software implementations like yours.
Agner Fog shows 69 cycles for the P2/3 at the default 64-bit precision, with notes that the instruction is not pipelined, and that it will be faster at a lower precision. For the P4 he shows 43 cycles at the default precision, 30 at 53-bit precision, and 23 at 24-bit precision. I can't see any good way to test fsqrt in isolation, so I settled for testing:
fild xx
fsqrt
fistp yy
Running the code in the attachment on my P3, I get consistent results:
Control word PC field = 00000003h (default, 64 bits)
65 cycles
Control word PC field = 00000002h (53 bits)
54 cycles
Control word PC field = 00000000h (24 bits)
25 cycles
Running on a P4 the run to run variation will probably be relatively large.
[attachment deleted by admin]