News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

magic number division for 64, 32 and 16 bit

Started by qWord, September 18, 2008, 12:07:32 PM

Previous topic - Next topic

qWord

dedndave,
On WinVista the button "copy to clipboard" works (the same with Copy and Past: Ctrl +c/v). Maybe the problem is the rich edit control - I'm using WM_GETTEXT to receive text.  I'm also noticed that there are problems with the edit-control (were you type in the divisor): copying a number to it using CTRL+v/c fails, but through the context-menu it works. I fix this issues in the next version.
FPU in a trice: SmplMath
It's that simple!

dedndave


raymond

#17
As promised, the attached zip file contains my four procedures to convert signed/unsigned dword/qword to ascii. They are in modular format ready for inclusion into a library. Or just copy/paste the procedure code.

You will notice that they all require the address of a buffer of adequate size to return the ascii string. Each procedure returns the address of the first byte of the string in the EAX register. Since conversions generate the digits in reverse order, this approach avoided code to place the string starting always at the first byte of the supplied buffer.

Dword to ascii algos have been beaten almost to death as to how fast they compare. I thus doubt that the attached ones would fare any better (nor much worse) than the best.

Qword to ascii algos have not been as studied. The attached ones may have an edge if anyone is interested in making comparisons. (Timings will obviously go down with 64-bit computers and related code.)
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

dedndave

very nice Ray
qwords have been worked over pretty well, also - lol
Lingo, Drizz, and Jochen all have pretty fast algo's
i do like yours, though - they are nice and small and should time favorably   :U

this is a link to Drizz's u64toa
http://www.masm32.com/board/index.php?topic=11741.msg88897#msg88897

as you can see, it is quite a bit longer than yours
Lingo modified a P. Dixon routine that has a look-up table
i think that is the fastest one around, but it is fairly large

as you know, i am working on a bignum routine that combines ling long kai fang (horner's rule) and multiply-to-divide
i may take a shot at a 64-bit version of that once i get finished - lol

dedndave

here are some times...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

2147483647
-2147483648
4294967295
9223372036854775807
-9223372032559808512
18446744073709551615

smdtoa (2147483647)            266      267     265     clock cycles
smdtoa (-2147483648)           265      266     275     clock cycles
umdtoa (4294967295)            227      231     233     clock cycles
smqtoa (9223372036854775807)   1192     1181    1183    clock cycles
smqtoa (-9223372036854775808)  1173     1173    1169    clock cycles
umqtoa (18446744073709551615)  1084     1056    1054    clock cycles

it looks like smqtoa has a problem displaying the correct digits for a negative value
(i know that feeling - lol - my llkf bignum had a similar problem)
(program and source attached)

dedndave

this might be the problem...

      test  esi,esi
      pushfd
      .if   SIGN?
            not   esi
            neg   ebx
      .endif

here is one possible fix...

        test    esi,esi
        pushfd
        mov     eax,esi
        cdq
        xor     ebx,edx
        xor     esi,edx
        sub     ebx,edx
        sbb     esi,edx

the sign bit is extended into edx
then, if negative, invert both dwords and add 1
i applied that fix and smqtoa is now faster than umqtoa

dedndave

Ray,
if you want to speed up both signed routines.....
something i learned while writing llkf bignum is that pushfd and popfd are slow
rather than pushing and poping the flags, extend the sign bit into edx with cdq
then, push the edx value
at the end, pop it and....

        pop     edx         ;recall extended sign
        mov     al,2Dh      ;minus sign "-"
        and     eax,edx
        jz      @F

        dec     edi
        mov     [edi],al

@@:

here are some times for comparison...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

2147483647
-2147483648
4294967295
9223372036854775807
-9223372036854775808
18446744073709551615

smdtoa (2147483647)            197      197     198     clock cycles
smdtoa (-2147483648)           202      201     201     clock cycles
umdtoa (4294967295)            229      229     228     clock cycles
smqtoa (9223372036854775807)   1034     1032    1030    clock cycles
smqtoa (-9223372036854775808)  1018     1018    1049    clock cycles
umqtoa (18446744073709551615)  1057     1056    1055    clock cycles

(program and source attached)

raymond

Thanks Dave. Here's my correction for the stupid error I made negating the number. That's the standard way it should have been done in the first place.

      .if   SIGN?
            not   esi
            not   ebx
            add   ebx,1
            adc   esi,0
      .endif


As for the pushfd, I'll take your word for it. However, I simply push the high dword and pop it at the end and retest it. Using cdq for the qword routine would have involved a few more instructions.

I've modified my atttachement in the previous post.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

dedndave

very good, Ray
did you time the new fixes ?
you should be able to use that program as a skeleton

Jimg

I'm probably missing something here, but what do the last nine posts have to do with the original topic?  Perhaps they should be moved to their own thread?

dedndave

i think Ray wanted to demonstrate the advantages of using multiply-to-divide
but, you may be right - lol - oh well - i am sure that qWord won't mind

dedndave

i have a good question for you guys...

is there some equation or algorithm that will calculate the minimum number of multiplier bits
required for some number of dividend bits and some number of divisor bits ?

it is not just to reduce the number of shifts,
but to determine how much precision is required for the multiplication for a given set of input parameters

i am going to try to write a simple program to test it for divisors that are multiples of 10
and see if i can "gleen" an equation - lol - we all know how well my equations work   :lol

dedndave

qWord ?
Ray ?
anybody ?
is this a puzzle for project euler ???
don't make me be the one to solve it   :P

qWord

dedndave ,

Not sure if I have understand you right, but even if you can calculate such thinks it makes not much sense on x86, because we have only 3 (or 4 on x86-64) kinds of multiplication: 8,16 and 32bit. In generally I would say that the needed precision depends on your dividend, nothing more.
But manly raymond is the man that could give you a right and more detailed answer.

qWord

BTW: have you received my PM? I've upload an new version of MagicNumber64.exe - did the 'copy to clipboard'-button now works on your machine?
FPU in a trice: SmplMath
It's that simple!

dedndave

yes qWord - the clipboard works great   :U

as for the resolution, i am working with some larger values that require multiple precision multiplication
at the moment, i am using double-double precision multiplication (64-bit x 64-bit)
i would like to reduce it to single-double precision (32-bit x 64-bit) if i can
there are too many possible input values to do a 100% value test (it would take centuries)
so, i have to show it works in theory if i want to reduce it

i have written a program that runs with divisors of 10 to 1,000,000,000 and reduces the multiplier
as soon as it is finished running, i will post the results (text file)