magic number division for 64, 32 and 16 bit

qWord · October 01, 2009, 09:08:52 PM

dedndave,
On WinVista the button "copy to clipboard" works (the same with Copy and Past: Ctrl +c/v). Maybe the problem is the rich edit control - I'm using WM_GETTEXT to receive text. I'm also noticed that there are problems with the edit-control (were you type in the divisor): copying a number to it using CTRL+v/c fails, but through the context-menu it works. I fix this issues in the next version.

dedndave · October 01, 2009, 10:55:36 PM

cool - many thanks qWord

raymond · October 06, 2009, 02:59:42 AM

As promised, the attached zip file contains my four procedures to convert signed/unsigned dword/qword to ascii. They are in modular format ready for inclusion into a library. Or just copy/paste the procedure code.

You will notice that they all require the address of a buffer of adequate size to return the ascii string. Each procedure returns the address of the first byte of the string in the EAX register. Since conversions generate the digits in reverse order, this approach avoided code to place the string starting always at the first byte of the supplied buffer.

Dword to ascii algos have been beaten almost to death as to how fast they compare. I thus doubt that the attached ones would fare any better (nor much worse) than the best.

Qword to ascii algos have not been as studied. The attached ones may have an edge if anyone is interested in making comparisons. (Timings will obviously go down with 64-bit computers and related code.)

dedndave · October 06, 2009, 11:52:08 AM

very nice Ray
qwords have been worked over pretty well, also - lol
Lingo, Drizz, and Jochen all have pretty fast algo's
i do like yours, though - they are nice and small and should time favorably :U

this is a link to Drizz's u64toa
http://www.masm32.com/board/index.php?topic=11741.msg88897#msg88897

as you can see, it is quite a bit longer than yours
Lingo modified a P. Dixon routine that has a look-up table
i think that is the fastest one around, but it is fairly large

as you know, i am working on a bignum routine that combines ling long kai fang (horner's rule) and multiply-to-divide
i may take a shot at a 64-bit version of that once i get finished - lol

dedndave · October 06, 2009, 02:15:31 PM

here are some times...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

2147483647
-2147483648
4294967295
9223372036854775807
-9223372032559808512
18446744073709551615

smdtoa (2147483647) 266 267 265 clock cycles
smdtoa (-2147483648) 265 266 275 clock cycles
umdtoa (4294967295) 227 231 233 clock cycles
smqtoa (9223372036854775807) 1192 1181 1183 clock cycles
smqtoa (-9223372036854775808) 1173 1173 1169 clock cycles
umqtoa (18446744073709551615) 1084 1056 1054 clock cycles

it looks like smqtoa has a problem displaying the correct digits for a negative value
(i know that feeling - lol - my llkf bignum had a similar problem)
(program and source attached)

dedndave · October 06, 2009, 03:10:34 PM

this might be the problem...

test esi,esi
pushfd
.if SIGN?
not esi
neg ebx
.endif

here is one possible fix...

test esi,esi
pushfd
mov eax,esi
cdq
xor ebx,edx
xor esi,edx
sub ebx,edx
sbb esi,edx

the sign bit is extended into edx
then, if negative, invert both dwords and add 1
i applied that fix and smqtoa is now faster than umqtoa

dedndave · October 06, 2009, 03:39:37 PM

Ray,
if you want to speed up both signed routines.....
something i learned while writing llkf bignum is that pushfd and popfd are slow
rather than pushing and poping the flags, extend the sign bit into edx with cdq
then, push the edx value
at the end, pop it and....

pop edx ;recall extended sign
mov al,2Dh ;minus sign "-"
and eax,edx
jz @F

dec edi
mov [edi],al

@@:

here are some times for comparison...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

2147483647
-2147483648
4294967295
9223372036854775807
-9223372036854775808
18446744073709551615

smdtoa (2147483647) 197 197 198 clock cycles
smdtoa (-2147483648) 202 201 201 clock cycles
umdtoa (4294967295) 229 229 228 clock cycles
smqtoa (9223372036854775807) 1034 1032 1030 clock cycles
smqtoa (-9223372036854775808) 1018 1018 1049 clock cycles
umqtoa (18446744073709551615) 1057 1056 1055 clock cycles

(program and source attached)

raymond · October 06, 2009, 05:01:16 PM

Thanks Dave. Here's my correction for the stupid error I made negating the number. That's the standard way it should have been done in the first place.

Code Select

      .if   SIGN?
            not   esi
            not   ebx
            add   ebx,1
            adc   esi,0
      .endif

As for the pushfd, I'll take your word for it. However, I simply push the high dword and pop it at the end and retest it. Using cdq for the qword routine would have involved a few more instructions.

I've modified my atttachement in the previous post.

dedndave · October 06, 2009, 09:03:53 PM

very good, Ray
did you time the new fixes ?
you should be able to use that program as a skeleton

Jimg · October 06, 2009, 09:09:57 PM

I'm probably missing something here, but what do the last nine posts have to do with the original topic? Perhaps they should be moved to their own thread?

dedndave · October 06, 2009, 09:12:02 PM

i think Ray wanted to demonstrate the advantages of using multiply-to-divide
but, you may be right - lol - oh well - i am sure that qWord won't mind

dedndave · October 08, 2009, 03:15:23 PM

i have a good question for you guys...

is there some equation or algorithm that will calculate the minimum number of multiplier bits
required for some number of dividend bits and some number of divisor bits ?

it is not just to reduce the number of shifts,
but to determine how much precision is required for the multiplication for a given set of input parameters

i am going to try to write a simple program to test it for divisors that are multiples of 10
and see if i can "gleen" an equation - lol - we all know how well my equations work :lol

dedndave · October 09, 2009, 02:39:31 AM

qWord ?
Ray ?
anybody ?
is this a puzzle for project euler ???
don't make me be the one to solve it :P

qWord · October 09, 2009, 01:51:17 PM

dedndave ,

Not sure if I have understand you right, but even if you can calculate such thinks it makes not much sense on x86, because we have only 3 (or 4 on x86-64) kinds of multiplication: 8,16 and 32bit. In generally I would say that the needed precision depends on your dividend, nothing more.
But manly raymond is the man that could give you a right and more detailed answer.

qWord

BTW: have you received my PM? I've upload an new version of MagicNumber64.exe - did the 'copy to clipboard'-button now works on your machine?

dedndave · October 09, 2009, 02:33:06 PM

yes qWord - the clipboard works great :U

as for the resolution, i am working with some larger values that require multiple precision multiplication
at the moment, i am using double-double precision multiplication (64-bit x 64-bit)
i would like to reduce it to single-double precision (32-bit x 64-bit) if i can
there are too many possible input values to do a 100% value test (it would take centuries)
so, i have to show it works in theory if i want to reduce it

i have written a program that runs with divisors of 10 to 1,000,000,000 and reduces the multiplier
as soon as it is finished running, i will post the results (text file)

News:

magic number division for 64, 32 and 16 bit