The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: qWord on September 18, 2008, 12:07:32 PM

Title: magic number division for 64, 32 and 16 bit
Post by: qWord on September 18, 2008, 12:07:32 PM
Hi all,

In the last days I've written an tool for determining the "magic number" for unsigned and signed division of 64, 32 and 16 bit numbers.
The algorithm based on c-source that can be found in AMD's  "Software Optimization Guide".

I've test the program randomly and it looks like (for me  :bg) that it works propper - but there is no guarantee at the moment.
Hope people here can help me to improve the program - so, feel free and post helpful suggestions, criticism and of course failure reports.

In the attached file you found the executable and source.

regards, qWord

-----
28-09-2009: new version attached
02-10-2009: fixed some bugs - new version uploaded

[attachment deleted by admin]
Title: Re: magic number division for 64, 32 and 16 bit
Post by: drizz on September 18, 2008, 03:23:42 PM
This is great!
I also wanted to write that sort of program  ( damn you :P), but been working on other things and just wrote this Mathematica function:
(based on Agner Fog code)
<<Developer`
MagicDiv64[d_]:=
    If[FractionalPart[2^(64+Floor[Log[2,d]])/d]>=0.5,
      Print[BaseForm[Round[2^(64+Round[Log[2,d]])/d],16],"<-"],
      Print[BaseForm[Floor[2^(64+Floor[Log[2,d]])/d],16]," add 1"]];

And i have also compiled programs from AMD manual :)

About your program:
- i would like to see an editbox instead of a static for display
- make the startup position : centerscreen :)

Code Suggestion:
- Log2(N) == bsr eax,N (count leading zeros)
- use also the mu64l,div64,add64,sub64 from AMD manual :)

Anyway it was time to ditch the old The_Svin "Magic divider" prog :)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on September 18, 2008, 05:29:29 PM
hi drizz,

Quote from: drizz on September 18, 2008, 03:23:42 PM
I also wanted to write that sort of program  ( damn you :P) ...
that's life  :bdg

Quote
i would like to see an editbox instead of a static for display
I've replace the static with an RichEdit, so the user can select and copy code from it.

Quote
- make the startup position : centerscreen :)
-> done  :U

Log[sub]2[/sub](N) == bsr eax,N OMG , i should know this  :bg

The next days i will rewrite some part of the mathematic routins.

An updated version is attached in my first post.

regards qWord

Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on September 27, 2009, 03:43:29 AM
Good work qWord.

Recently, I had a more in-depth look at the MagicNumber64 source code. The original C code seemed to emanate from programmers such as those from MS using MMX registers to add 2+2. Maybe the language used doesn't lend itself to anything better. However, I was a wee bit disappointed to see it translated "verbatim" to assembly, specially the section for 32-bit numbers.

For example, from the C code (where t is the source dword divided by 2 until it becomes an odd number)

j = 0xFFFFFFFF % t

where j is declared as a U64; also declared as a qword in assembly. However, it is obvious that the modulo of a dword cannot be any larger than a dword.

That j is then used to compute k (also declared as a U64 in C and a qword in assembly)

k = (((U64) (1)) << (32 + l)) / ((U64) (0xffffffff - j))

From the above description of j, it cannot obviously ever be larger than 0xffffffff and the Cflag can never be set by the subtraction. But the assembly code is:

  mov DWORD ptr tmp[0],-1
  mov eax,DWORD ptr j
  sub DWORD ptr tmp[0],eax
  mov eax,DWORD ptr j+4
  sbb DWORD ptr tmp[0+4],eax


With a bit of analysis, I'm sure the entire code for the 32-bit divisors (and possibly for 16-bit divisors) could be simplified significantly. I will eventually have a crack at the 32-bit one.

Also, your displayed text for one of the cases looks as follows, having replaced the magic number and the shifting value by variable names:

; valid for: 0 <= X < 0ffffffffh
mov eax,X
mov edx,magic_number
inc eax
mul edx
shr edx,shift
; edx = quotient

; REMARKS:
; if it's possible that X gets 0ffffffffh then
; use following code:
; mov edx,X
; mov eax,magic_number
; mul edx
; add eax,magic_number
; adc edx,0
; shr edx,shift

The valid range and REMARKS could be eliminated with the following:

mov eax,X
mov edx,magic_number
add eax,1
.if !CARRY?
   mul edx
.endif
shr edx,shift
; edx = quotient
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on September 27, 2009, 05:53:18 AM
yes - i have been using qWord's tool to generate numbers - although i am using some code i snagged off Drizz   :bg
i find that some of the numbers may require incrementing, rather than incrementing the multiplicand as Ray suggested
something else that would be nice is a way to show the range of values, although, it is asking a bit much
what i mean is, for some applications, i am not using the full register width on the input
so, i can shr the magic number (sometimes incrementing it again) and reduce the number of shifts at the end to get the quotient
in cases where i have to use 32-bit multiple precision multiplication, this can save an ADC and sometimes the use of a register
in all cases, i am using Drizz's examples to further extend the result and extract the remainder, as well
Drizz has some neat tricks   :U
i think he should go ahead and write one - lol
Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on September 27, 2009, 02:56:28 PM
thanks for your contribution raymond,

The routines are 1:1 translate from c to asm - this is done so, because I was a bit scared to change the behave of the routine when changing datatypes or something else. I will clean up the code the next days and follow your suggestions. How ever, if you are working on a better routine ("cracked" one :bg) please inform me.

regards, qWord
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on September 27, 2009, 05:12:48 PM
Quoteplease inform me

I will certainly do. I should have a bit of time during the next few days to work on it. But, it will not be a "better" routine.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: ecube on September 28, 2009, 06:26:48 AM
How does a magic number work exactly? i've seen lingo use them alot to speed things up, but i'm abit fuzzy as to what they are.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: donkey on September 28, 2009, 07:25:51 AM
Quote from: E^cube on September 28, 2009, 06:26:48 AM
How does a magic number work exactly? i've seen lingo use them alot to speed things up, but i'm abit fuzzy as to what they are.

http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx
Title: Re: magic number division for 64, 32 and 16 bit
Post by: ecube on September 28, 2009, 07:47:09 AM
awesome  :U
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on September 28, 2009, 08:03:13 AM
nothing really "magic" E^Cube - lol
MUL is much faster than DIV
so - multiply by the inverse of the number you want to divide by
then - take the high order register as a result
d/l qWord's tool in the first post of this thread and try it out   :U
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on September 30, 2009, 12:31:53 AM
Attached is the source code of my simplified version for computing the magic number for unsigned 32-bit divisors. I've left in place the list of LOCALs used by qWord, commenting out those which had become unused. I've also left the C code translated by qWord and added my comments for the coding I used.

It returns the values required by qWord's program in a struct declared in his global variables. If you are not familiar with his source code and its goal, the output may be useless.

I've checked the output of this simplified version with several random numbers and all were identical to qWord's results.

I've also written procedures to convert dwords (signed and unsigned) and unsigned qwords (signed qwords still to come) using magic numbers (the qword one works all the way up to -1 with results identical to those obtained with Window's calculator). I'll post those within a few days after I add required comments. I may also add those procedures to the Fpulib when I get the time to update the help file so that their description will be included.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on September 30, 2009, 07:28:13 PM
wow - you have reduce the problem noticeable :U
looking your code I've found that the loops could be eliminated using bsr/bsf instruction:

; Reduce uDword until not divisible by 2

      xor   eax,eax           ; EAX = n for the 2^n factor of uDword
   @@:
      shr   ebx,1
      jc    @F                ; break when too far
      inc   eax
      dec   ecx
      jmp   @B

   @@:                        ;ECX = log2(t)
--------------------------------------------------
==>
      mov edx,ecx   ; save log2(uDword)
      xor ecx,ecx
      bsf ecx,ebx   ; ecx = n
      shr ebx,cl    ; reduce uDword ==> t
      mov eax,edx
      sub eax,ecx   ; log2(uDword)-n = log2(t)
      mov n,ecx     
      mov ecx,eax   ; ecx = log2(t)

the same for the last loop:

   @@:
      shr   edx,1
      jc    @F
      dec   ecx
      jmp   @B
   @@:
      lea   edx,[edx*2+1]

      add   eax,n
      mov   d32.m,edx
      mov   d32.s,eax
      ret

-------------------------------------------------
==>
   mov eax,ecx
   xor ecx,ecx
   bsf ecx,edx
   shr edx,cl
   sub eax,ecx

   add   eax,n
   mov   d32.m,edx
   mov   d32.s,eax
   ret




      .if   eax != ebx
            mov d32.a,0
         @@:
            dec   L
            mov   ecx,eax     ; m_low
            mov   edx,ebx     ; m_high
            shr   eax,1
            shr   ebx,1
            cmp   eax,ebx
            jnz   @B
            ; EDX = magic number, L = shift
      .else

with this loop I'm not sure (manly caused by 'jnz'):
m_low and m_high must be at least bit-equal with their highest bit, other wise this loop will never exit?

            mov d32.a,0
           
            dec L
            mov edx,ebx        ; save m_high to edx
            xor eax,ebx        ; make different bits "visible"
            xor ecx,ecx
            bsr ecx,eax        ; get the position of highest unequal bit ...
            shr edx,cl
            sub L,ecx

I've test it and it seems like it is correct - how ever, I'm not sure with this one.

regards, qWord

EDIT: just striked out some babel
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on September 30, 2009, 08:33:34 PM
Your additional code reductions are great. Using them, let's continue reducing.

Starting from the very begining, how about:

      mov   ebx,uDword
      bsr   ecx,ebx           ; get log2(uDword) in ECX
      mov edx,ecx   ; save log2(uDword)
;your code with some rearranging
;      xor ecx,ecx    zeroing ecx is not necessary, same with all your other bsr/bsf
      bsf ecx,ebx   ; ecx = n
      mov eax,edx
      sub eax,ecx   ; log2(uDword)-n = log2(t)

      .if   ZERO?      ;would have been a power of 2
            mov   d32.m,0
            mov   d32.s,edx
            mov   d32.a,3
            ret
      .endif

      mov n,ecx     
      shr ebx,cl    ; reduce uDword ==> t
      mov ecx,eax   ; ecx = log2(t)


Your other loop eliminations also look good. However, as I mentioned in the attachement, the very last loop could even be eliminated entirely, not being necessary anymore.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 01, 2009, 02:31:18 AM
i am using xp pro (mce2005) sp2
the copy to clipboard does not work for me - am i doing something wrong ?
Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on October 01, 2009, 09:08:52 PM
dedndave,
On WinVista the button "copy to clipboard" works (the same with Copy and Past: Ctrl +c/v). Maybe the problem is the rich edit control - I'm using WM_GETTEXT to receive text.  I'm also noticed that there are problems with the edit-control (were you type in the divisor): copying a number to it using CTRL+v/c fails, but through the context-menu it works. I fix this issues in the next version.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 01, 2009, 10:55:36 PM
cool - many thanks qWord
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on October 06, 2009, 02:59:42 AM
As promised, the attached zip file contains my four procedures to convert signed/unsigned dword/qword to ascii. They are in modular format ready for inclusion into a library. Or just copy/paste the procedure code.

You will notice that they all require the address of a buffer of adequate size to return the ascii string. Each procedure returns the address of the first byte of the string in the EAX register. Since conversions generate the digits in reverse order, this approach avoided code to place the string starting always at the first byte of the supplied buffer.

Dword to ascii algos have been beaten almost to death as to how fast they compare. I thus doubt that the attached ones would fare any better (nor much worse) than the best.

Qword to ascii algos have not been as studied. The attached ones may have an edge if anyone is interested in making comparisons. (Timings will obviously go down with 64-bit computers and related code.)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 06, 2009, 11:52:08 AM
very nice Ray
qwords have been worked over pretty well, also - lol
Lingo, Drizz, and Jochen all have pretty fast algo's
i do like yours, though - they are nice and small and should time favorably   :U

this is a link to Drizz's u64toa
http://www.masm32.com/board/index.php?topic=11741.msg88897#msg88897

as you can see, it is quite a bit longer than yours
Lingo modified a P. Dixon routine that has a look-up table
i think that is the fastest one around, but it is fairly large

as you know, i am working on a bignum routine that combines ling long kai fang (horner's rule) and multiply-to-divide
i may take a shot at a 64-bit version of that once i get finished - lol
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 06, 2009, 02:15:31 PM
here are some times...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

2147483647
-2147483648
4294967295
9223372036854775807
-9223372032559808512
18446744073709551615

smdtoa (2147483647)            266      267     265     clock cycles
smdtoa (-2147483648)           265      266     275     clock cycles
umdtoa (4294967295)            227      231     233     clock cycles
smqtoa (9223372036854775807)   1192     1181    1183    clock cycles
smqtoa (-9223372036854775808)  1173     1173    1169    clock cycles
umqtoa (18446744073709551615)  1084     1056    1054    clock cycles

it looks like smqtoa has a problem displaying the correct digits for a negative value
(i know that feeling - lol - my llkf bignum had a similar problem)
(program and source attached)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 06, 2009, 03:10:34 PM
this might be the problem...

      test  esi,esi
      pushfd
      .if   SIGN?
            not   esi
            neg   ebx
      .endif

here is one possible fix...

        test    esi,esi
        pushfd
        mov     eax,esi
        cdq
        xor     ebx,edx
        xor     esi,edx
        sub     ebx,edx
        sbb     esi,edx

the sign bit is extended into edx
then, if negative, invert both dwords and add 1
i applied that fix and smqtoa is now faster than umqtoa
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 06, 2009, 03:39:37 PM
Ray,
if you want to speed up both signed routines.....
something i learned while writing llkf bignum is that pushfd and popfd are slow
rather than pushing and poping the flags, extend the sign bit into edx with cdq
then, push the edx value
at the end, pop it and....

        pop     edx         ;recall extended sign
        mov     al,2Dh      ;minus sign "-"
        and     eax,edx
        jz      @F

        dec     edi
        mov     [edi],al

@@:

here are some times for comparison...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

2147483647
-2147483648
4294967295
9223372036854775807
-9223372036854775808
18446744073709551615

smdtoa (2147483647)            197      197     198     clock cycles
smdtoa (-2147483648)           202      201     201     clock cycles
umdtoa (4294967295)            229      229     228     clock cycles
smqtoa (9223372036854775807)   1034     1032    1030    clock cycles
smqtoa (-9223372036854775808)  1018     1018    1049    clock cycles
umqtoa (18446744073709551615)  1057     1056    1055    clock cycles

(program and source attached)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on October 06, 2009, 05:01:16 PM
Thanks Dave. Here's my correction for the stupid error I made negating the number. That's the standard way it should have been done in the first place.

      .if   SIGN?
            not   esi
            not   ebx
            add   ebx,1
            adc   esi,0
      .endif


As for the pushfd, I'll take your word for it. However, I simply push the high dword and pop it at the end and retest it. Using cdq for the qword routine would have involved a few more instructions.

I've modified my atttachement in the previous post.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 06, 2009, 09:03:53 PM
very good, Ray
did you time the new fixes ?
you should be able to use that program as a skeleton
Title: Re: magic number division for 64, 32 and 16 bit
Post by: Jimg on October 06, 2009, 09:09:57 PM
I'm probably missing something here, but what do the last nine posts have to do with the original topic?  Perhaps they should be moved to their own thread?
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 06, 2009, 09:12:02 PM
i think Ray wanted to demonstrate the advantages of using multiply-to-divide
but, you may be right - lol - oh well - i am sure that qWord won't mind
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 08, 2009, 03:15:23 PM
i have a good question for you guys...

is there some equation or algorithm that will calculate the minimum number of multiplier bits
required for some number of dividend bits and some number of divisor bits ?

it is not just to reduce the number of shifts,
but to determine how much precision is required for the multiplication for a given set of input parameters

i am going to try to write a simple program to test it for divisors that are multiples of 10
and see if i can "gleen" an equation - lol - we all know how well my equations work   :lol
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 09, 2009, 02:39:31 AM
qWord ?
Ray ?
anybody ?
is this a puzzle for project euler ???
don't make me be the one to solve it   :P
Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on October 09, 2009, 01:51:17 PM
dedndave ,

Not sure if I have understand you right, but even if you can calculate such thinks it makes not much sense on x86, because we have only 3 (or 4 on x86-64) kinds of multiplication: 8,16 and 32bit. In generally I would say that the needed precision depends on your dividend, nothing more.
But manly raymond is the man that could give you a right and more detailed answer.

qWord

BTW: have you received my PM? I've upload an new version of MagicNumber64.exe - did the 'copy to clipboard'-button now works on your machine?
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 09, 2009, 02:33:06 PM
yes qWord - the clipboard works great   :U

as for the resolution, i am working with some larger values that require multiple precision multiplication
at the moment, i am using double-double precision multiplication (64-bit x 64-bit)
i would like to reduce it to single-double precision (32-bit x 64-bit) if i can
there are too many possible input values to do a 100% value test (it would take centuries)
so, i have to show it works in theory if i want to reduce it

i have written a program that runs with divisors of 10 to 1,000,000,000 and reduces the multiplier
as soon as it is finished running, i will post the results (text file)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 09, 2009, 03:19:34 PM
here is a simple program i wrote to test multipliers, with the resultant text output
for the purpose of trying to find an equation, the values that work all the way to FFFFFFFFh aren't helping - lol
i may make them multiple precision to see where they die
Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on October 09, 2009, 03:22:46 PM
to proof your theory you can reduce it to 8bit x 16 bit. If this works, it will also work with 32bit x 64bit.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 09, 2009, 03:59:47 PM
i don't think that is really true, qWord
i would like to use simple algebraic statements to determine an equation

the problem, as i see it, is this:
in BASIC, we had a function named INT()
it truncated the fractional part of real number into an integer
this is a natural function in computing, as registers, values, etc have a defined number of bits
when we determine a multiplier, we use this function, as most multipliers have bits that are truncated
also, when we multiply to divide, we use the function again because the multiplication step limits the low-order bits
in algebra, however, there are no simple "rules" for working with this function
if the INT() function appears in an algebraic equation, it is hard to manipulate/evaluate/reduce
perhaps we can somehow define it in a form like this:
R = W + F, where R = a real value, W = the whole part, F = the fractional part
if we can seperate them algebraically, we can manipulate them in an equation
the problem here is how to define algebraically where the whole part ends and the fractional part starts - lol
let me toss it around in my brain for a while
Title: Re: magic number division for 64, 32 and 16 bit
Post by: PBrennick on October 09, 2009, 06:43:27 PM
dedndave,
What qword said has to be true because the difference between the two is a constant factor of 2x2. In other words
(8*4)*(16*4)=32*64

Paul
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 09, 2009, 08:12:08 PM
well - it may be that i can derive an equation using smaller registers - that is true

Quoteto proof your theory you can reduce it to 8bit x 16 bit. If this works, it will also work with 32bit x 64bit.

the problem with qWords statement is....
i don't have a theory, yet   :lol
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on October 09, 2009, 08:13:06 PM
If you are talking about multiplying an integer by a reciprocal instead of dividing by an integer, consider this:

- The size of the integer divisor is totally immaterial. Its reciprocal can be computed with an infinite number of decimal places with full precision. That reciprocal is a fraction, whether it be in decimal or binary.
- The number of significant digits (precision) in the product of an integer by a fraction will be dependant on the number of significant digits retained in the fraction.

In other words, if you have a 10 digit integer and you multiply it by a fraction having only 5 significant digits, the result will only be precise to 5 significant digits. The number of significant digits in the multiplying fraction must thus have as many significant digits as the integer being multiplied to result in full precision. (In binary, digit = bit)

Similarly, the precision of the product of an integer by a fraction will not be any more precise by using more digits than necessary in the fraction. As an example, if you need the circumference of a circle whose diameter is given with 3 significant digits, using a value of pi with 4 significant digits would be sufficient, The result would not be any more precise by using a value of pi with 30 significant digits.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 10, 2009, 01:53:33 PM
that makes sense, Raymond
more or less - lol
so - if i have a 57 bit maximum dividend, i need a 57 bit multiplier ?
that makes it easy

what about this....
i have a 62 bit maximum dividend
after the multiplication, i shift right 29 bits (essentially, "unneccessary resolution")
i have a 33 bit maximum quotient
does this mean i can get away with a 33 bit multiplier ? (62-29=33)

if i understand what you are saying, it is the size of the quotient that determines the required multiplier resolution
(which would also mean - the divisor does make a difference - larger divisor -> smaller quotient)

this seems to make sense
i can speed up my routine by not using a full 64x64 multiplication
i will see what i can come up with and test it
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 10, 2009, 03:04:32 PM
ok - we have a theory, now - lol
not only should the multiplier have as many bits as the max quotient, but....

Dave's Theory #1 (assuming it will change - lol)
The minimum multiplier should be equal to or greater than the maximum expected quotient.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: PBrennick on October 10, 2009, 05:14:23 PM
Hey Dave,

Good luck with your project. I am extremely interested.

btw - theories often give birth to theorems...

Paul
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 10, 2009, 05:39:44 PM
thanks Paul - well - "Dave's Multiplier Theory #1" looks great on paper
it may wind up on a roll in the bathroom - lol
Title: Re: magic number division for 64, 32 and 16 bit
Post by: PBrennick on October 10, 2009, 05:52:34 PM
S-o-o-o,

Either way it will prove useful; sounds like a win-win situation. ::)

Paul
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 10, 2009, 07:00:45 PM
ok - scratch Dave's Multiplier Theory #1 - lol
i have a good handle on it, now
i just need to state it mathematically
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 12, 2009, 11:20:30 PM
here is a case that never makes it to FFFF_FFFF_FFFF_FFFF
i am dividing by 1,000,000,000
i am using a multiplier of 8970_5F41_36B4_A598 and a shift of 29 (which is really /2^93)

                             HiA LoA
                          X  HiB LoB
                     ------------------

    2^96        2^64        2^32        2^0
                           LoALoB1     LoALoB0
               HiALoB1     HiALoB0
               LoAHiB1     LoAHiB0
   HiAHiB1     HiAHiB0

Multiplier: 89705F41_36B4A598  Shift: 29  Fail: AA4EA381_AE3507FF

A: 89705F41_36B4A598
B: AA4EA381_AE3507FF

carry:    1           2
                           253A1DA9    06F01A68
               5D86D94C    DFFEA8BF
               2464C3F1    FAC73998
   5B6ED745    FE1462C1
-------------------------------------------------
   5B6ED746    80000000    00000000    06F01A68

  5B6ED746_80000000 / 2^29 = 2_DB76_BA34

  result: 2_DB76_BA34   (5B6ED746_80000000 / 2^29)
expected: 2_DB76_BA33   (AA4EA381_AE3507FF / 1,000,000,000)

in this specific application, my maximum dividend is 3B9A_CA03_4B82_FA17, so i am ok
but, the multiply-to-divide should work all the way to at least FFFF_FFFF_FFFF_FFFE
i can't make any of our statements about multiplier resolution work for me, here
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 12, 2009, 11:54:05 PM
here is the test program and data (it takes a while to run - lol)
notice that there is little or no logic in the failure pattern as i shift the multiplier right

Multiplier: 89705F41_36B4A598  Shift: 29  Fail: AA4EA381_AE3507FF
Multiplier: 44B82FA0_9B5A52CC  Shift: 28  Fail: AA4EA381_AE3507FF
Multiplier: 225C17D0_4DAD2966  Shift: 27  Fail: AA4EA381_AE3507FF
Multiplier: 112E0BE8_26D694B3  Shift: 26  Fail: AA4EA381_AE3507FF
Multiplier: 089705F4_136B4A5A  Shift: 25  Fail: 0F9B0B1C_F79815FF
Multiplier: 044B82FA_09B5A52D  Shift: 24  Fail: 0F9B0B1C_F79815FF
Multiplier: 0225C17D_04DAD297  Shift: 23  Fail: 035E36C8_CDE0B3FF
Multiplier: 0112E0BE_826D694C  Shift: 22  Fail: 014FB4D0_15340DFF
Multiplier: 0089705F_4136B4A6  Shift: 21  Fail: 014FB4D0_15340DFF
Multiplier: 0044B82F_A09B5A53  Shift: 20  Fail: 014FB4D0_15340DFF
Multiplier: 00225C17_D04DAD2A  Shift: 19  Fail: 00390AF2_316B81FF
Multiplier: 00112E0B_E826D695  Shift: 18  Fail: 00390AF2_316B81FF
Multiplier: 00089705_F4136B4B  Shift: 17  Fail: 000D340E_991957FF
Multiplier: 00044B82_FA09B5A6  Shift: 16  Fail: 00053444_56AAADFF
Multiplier: 000225C1_7D04DAD3  Shift: 15  Fail: 00053444_56AAADFF
Multiplier: 000112E0_BE826D6A  Shift: 14  Fail: 0001852A_D5DF1BFF
Multiplier: 00008970_5F4136B5  Shift: 13  Fail: 0001852A_D5DF1BFF
Multiplier: 000044B8_2FA09B5B  Shift: 12  Fail: 00006591_D3EA21FF
Multiplier: 0000225C_17D04DAE  Shift: 11  Fail: 000028FC_FA86ADFF
Multiplier: 0000112E_0BE826D7  Shift: 10  Fail: 000028FC_FA86ADFF
Multiplier: 00000897_05F4136C  Shift:  9  Fail: 00000C1B_356D35FF
Multiplier: 0000044B_82FA09B6  Shift:  8  Fail: 00000C1B_356D35FF
Multiplier: 00000225_C17D04DB  Shift:  7  Fail: 00000C1B_356D35FF
Multiplier: 00000112_E0BE826E  Shift:  6  Fail: 000001D3_0EEADBFF
Multiplier: 00000089_705F4137  Shift:  5  Fail: 000001D3_0EEADBFF
Multiplier: 00000044_B82FA09C  Shift:  4  Fail: 0000006A_675299FF
Multiplier: 00000022_5C17D04E  Shift:  3  Fail: 0000006A_675299FF
Multiplier: 00000011_2E0BE827  Shift:  2  Fail: 0000006A_675299FF
Multiplier: 00000008_9705F414  Shift:  1  Fail: 0000000E_E6B27FFF
Multiplier: 00000004_4B82FA0A  Shift:  0  Fail: 0000000E_E6B27FFF
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on October 13, 2009, 12:05:48 AM
According to qWord's dialog box, the multiplier should be 89705F4136B4A597h (instead of 89705F4136B4A598h) and you then need to increment the dividend before the multiplication.

You then get the expected result.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 13, 2009, 12:20:43 AM
thanks Ray
oh - i thought we were incrementing the multipliers - lol
let me modify the test program and run it again
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 13, 2009, 12:28:24 AM
running it now, Ray
thanks again
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 13, 2009, 01:43:12 AM
that's a little better - lol
of course, i can make the first one pass easy enough
this test checks each point just before and just after the quotient rollover
i want to add all the points where the low dword is 00000000 and FFFFFFFF to the sequence, as well
that will make for a fairly comprehensive 64-bit test

Multiplier: 89705F41_36B4A597  Shift: 29  Fail: FFFFFFFF_FFFFFFFF
Multiplier: 44B82FA0_9B5A52CB  Shift: 28  Fail: 73348093_3BA6C600
Multiplier: 225C17D0_4DAD2965  Shift: 27  Fail: 2B0B3E03_EF940E00
Multiplier: 112E0BE8_26D694B2  Shift: 26  Fail: 131B7A69_5A7B4000
Multiplier: 089705F4_136B4A59  Shift: 25  Fail: 131B7A69_5A7B4000
Multiplier: 044B82FA_09B5A52C  Shift: 24  Fail: 05ED06A0_F2511800
Multiplier: 0225C17D_04DAD296  Shift: 23  Fail: 05ED06A0_F2511800
Multiplier: 0112E0BE_826D694B  Shift: 22  Fail: 05ED06A0_F2511800
Multiplier: 0089705F_4136B4A5  Shift: 21  Fail: 00E8B62F_54D78200
Multiplier: 0044B82F_A09B5A52  Shift: 20  Fail: 00566838_C33B0A00
Multiplier: 00225C17_D04DAD29  Shift: 19  Fail: 00566838_C33B0A00
Multiplier: 00112E0B_E826D694  Shift: 18  Fail: 0018957D_925BDC00
Multiplier: 00089705_F4136B4A  Shift: 17  Fail: 0018957D_925BDC00
Multiplier: 00044B82_FA09B5A5  Shift: 16  Fail: 0018957D_925BDC00
Multiplier: 000225C1_7D04DAD2  Shift: 15  Fail: 0003A7FD_3BAEAE00
Multiplier: 000112E0_BE826D69  Shift: 14  Fail: 0003A7FD_3BAEAE00
Multiplier: 00008970_5F4136B4  Shift: 13  Fail: 0000D47A_97ED2600
Multiplier: 000044B8_2FA09B5A  Shift: 12  Fail: 0000D47A_97ED2600
Multiplier: 0000225C_17D04DAD  Shift: 11  Fail: 0000D47A_97ED2600
Multiplier: 0000112E_0BE826D6  Shift: 10  Fail: 00001D93_E2A71A00
Multiplier: 00000897_05F4136B  Shift:  9  Fail: 00001D93_E2A71A00
Multiplier: 0000044B_82FA09B5  Shift:  8  Fail: 000006A8_51FFF000
Multiplier: 00000225_C17D04DA  Shift:  7  Fail: 0000029C_74EF6600
Multiplier: 00000112_E0BE826D  Shift:  6  Fail: 0000029C_74EF6600
Multiplier: 00000089_705F4136  Shift:  5  Fail: 000000C2_E1167200
Multiplier: 00000044_B82FA09B  Shift:  4  Fail: 000000C2_E1167200
Multiplier: 00000022_5C17D04D  Shift:  3  Fail: 00000032_FD6ACE00
Multiplier: 00000011_2E0BE826  Shift:  2  Fail: 00000014_B8D03A00
Multiplier: 00000008_9705F413  Shift:  1  Fail: 00000014_B8D03A00
Multiplier: 00000004_4B82FA09  Shift:  0  Fail: 00000006_0DB88400
Title: Re: magic number division for 64, 32 and 16 bit
Post by: raymond on October 13, 2009, 03:55:22 AM
QuoteMultiplier: 89705F41_36B4A597  Shift: 29  Fail: FFFFFFFF_FFFFFFFF

This only fails if you blindly add 1 resulting in 0, without considering the carry, and then multiplying. If you shift your multiplier by 29, you get the correct result (Look at my very first post in this thread for such an occurence.)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 13, 2009, 07:51:31 AM
i was more interested in all the other lines, Ray - lol
i was hoping to make some kind of correlation between the result and multiplier resolution
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on October 15, 2009, 12:39:47 AM
i have totally re-written this program
it checks all values where the low 32-bits are zero or FFFFFFFF
it checks all values where the high 32-bits are zero or FFFFFFFF
it checks all values just before/after quotient rollover (this is where most errors occur first)
it checks them in sequence
if an error is found, it backs up to the last "pass" dividend and checks one by one until the earliest fail is found
it also runs 64 shift values - some have been removed in the following text (they all fail like the last one after a point)
it takes a while to run - i have throttled it back to only use 25% CPU time
for a divisor of 1,000,000,000, it took about 4 1/2 hrs on my machine
for smaller divisors, it will take even longer
anyways, i thought you guys might like to have a 64 x 64 32-bit mul2div test program
source is attached...

Multiplier: 89705F41_36B4A597  Mult Divisor: 2^93  Pass: FFFFFFFF_FFFFFFFF
Multiplier: 44B82FA0_9B5A52CB  Mult Divisor: 2^92  Fail: 73348093_3BA6C600
Multiplier: 225C17D0_4DAD2965  Mult Divisor: 2^91  Fail: 2B0B3E03_EF940E00
Multiplier: 112E0BE8_26D694B2  Mult Divisor: 2^90  Fail: 131B7A69_5A7B4000
Multiplier: 089705F4_136B4A59  Mult Divisor: 2^89  Fail: 131B7A69_5A7B4000
Multiplier: 044B82FA_09B5A52C  Mult Divisor: 2^88  Fail: 05ED06A0_F2511800
Multiplier: 0225C17D_04DAD296  Mult Divisor: 2^87  Fail: 05ED06A0_F2511800
Multiplier: 0112E0BE_826D694B  Mult Divisor: 2^86  Fail: 05ED06A0_F2511800
Multiplier: 0089705F_4136B4A5  Mult Divisor: 2^85  Fail: 00E8B62F_54D78200
Multiplier: 0044B82F_A09B5A52  Mult Divisor: 2^84  Fail: 00566838_C33B0A00
Multiplier: 00225C17_D04DAD29  Mult Divisor: 2^83  Fail: 00566838_C33B0A00
Multiplier: 00112E0B_E826D694  Mult Divisor: 2^82  Fail: 0018957D_925BDC00
Multiplier: 00089705_F4136B4A  Mult Divisor: 2^81  Fail: 0018957D_925BDC00
Multiplier: 00044B82_FA09B5A5  Mult Divisor: 2^80  Fail: 0018957D_925BDC00
Multiplier: 000225C1_7D04DAD2  Mult Divisor: 2^79  Fail: 0003A7FD_3BAEAE00
Multiplier: 000112E0_BE826D69  Mult Divisor: 2^78  Fail: 0003A7FD_3BAEAE00
Multiplier: 00008970_5F4136B4  Mult Divisor: 2^77  Fail: 0000D47A_97ED2600
Multiplier: 000044B8_2FA09B5A  Mult Divisor: 2^76  Fail: 0000D47A_97ED2600
Multiplier: 0000225C_17D04DAD  Mult Divisor: 2^75  Fail: 0000D47A_97ED2600
Multiplier: 0000112E_0BE826D6  Mult Divisor: 2^74  Fail: 00001D93_E2A71A00
Multiplier: 00000897_05F4136B  Mult Divisor: 2^73  Fail: 00001D93_E2A71A00
Multiplier: 0000044B_82FA09B5  Mult Divisor: 2^72  Fail: 000006A8_51FFF000
Multiplier: 00000225_C17D04DA  Mult Divisor: 2^71  Fail: 0000029C_74EF6600
Multiplier: 00000112_E0BE826D  Mult Divisor: 2^70  Fail: 0000029C_74EF6600
Multiplier: 00000089_705F4136  Mult Divisor: 2^69  Fail: 000000C2_E1167200
Multiplier: 00000044_B82FA09B  Mult Divisor: 2^68  Fail: 000000C2_E1167200
Multiplier: 00000022_5C17D04D  Mult Divisor: 2^67  Fail: 00000032_FD6ACE00
Multiplier: 00000011_2E0BE826  Mult Divisor: 2^66  Fail: 00000014_B8D03A00
Multiplier: 00000008_9705F413  Mult Divisor: 2^65  Fail: 00000014_B8D03A00
Multiplier: 00000004_4B82FA09  Mult Divisor: 2^64  Fail: 00000006_0DB88400
Multiplier: 00000002_25C17D04  Mult Divisor: 2^63  Fail: 00000000_3B9ACA00


now, to put the finishing touches on my base 1,000,000,000 ling long kai fang bignum routine   :bg
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on November 19, 2009, 07:29:34 PM
i really like using qWords program
but, it occured to me that what it needs, now, is a nice icon - lol
the attached file has an ico with several sizes in it

(http://img98.imageshack.us/img98/3966/numbers128x128x32.png)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on June 26, 2011, 05:05:28 PM
i was playing with this..... again   :P

i want to write some code that uses multiply-to-divide, but with variable multiplier and shift values
when the divisor is changed, i calculate new mult and shift values and store them
then, the multiply-to-divide code uses those values until the divisor changes again
the problem here is, the code in the magic number generator program may be different from one divisor to another
i can probably overcome this by carefully choosing the multiplier

at any rate, i was looking at the code that is generated for an example divisor of 7...
mov eax,X
mov edx,092492492h
add eax,1
.if !CARRY?
     mul edx
.endif
shr edx,02h
; edx = quotient


when disassembled, it looks something like this...
        mov     eax,X
        mov     edx,92492492h
        add     eax,1
        jc      @F

        mul     edx

@@:     shr     edx,2


i was wondering why ADD EAX,1 and JC are used instead of INC EAX and JZ...
        mov     eax,X
        mov     edx,92492492h
        inc     eax
        jz      @F

        mul     edx

@@:     shr     edx,2


or, in code that everyone else but me seems to like   :P
mov eax,X
mov edx,092492492h
inc eax
.if !ZERO?
     mul edx
.endif
shr edx,02h
; edx = quotient
Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on June 26, 2011, 07:54:17 PM
Quote from: dedndave on June 26, 2011, 05:05:28 PM
i was wondering why ADD EAX,1 and JC are used instead of INC EAX and JZ...
why? - code size is irrelevant in this case.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: Tedd on July 01, 2011, 01:08:03 PM
Also, INC and DEC will usually cause a stall if the next instruction tests the flags (since they avoid modifying carry.)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on July 01, 2011, 02:12:03 PM
is that so for newer cores ?
i would think that instruction pair would be optimized to some degree   :P
Title: Re: magic number division for 64, 32 and 16 bit
Post by: jj2007 on July 01, 2011, 02:36:35 PM
Quote from: Tedd on July 01, 2011, 01:08:03 PM
Also, INC and DEC will usually cause a stall if the next instruction tests the flags (since they avoid modifying carry.)

That is a 0.2 cycles stall on my P4 - but why do we need an extra test anyway?

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
1648    cycles for dec
1665    cycles for sub

1898    cycles for dec + test
1698    cycles for sub + test


Title: Re: magic number division for 64, 32 and 16 bit
Post by: qWord on July 01, 2011, 02:46:38 PM
Quote from: jj2007 on July 01, 2011, 02:36:35 PMbut why do we need an extra test anyway?
which test ?
Title: Re: magic number division for 64, 32 and 16 bit
Post by: jj2007 on July 01, 2011, 02:55:44 PM
Quote from: Tedd on July 01, 2011, 01:08:03 PM... stall if the next instruction tests the flags

Maybe I misunderstood something ::)
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on July 01, 2011, 02:58:57 PM
that is a p4 - newer cores should be a little better   :P

the test is necessary if you want a multiply-to-divide routine to cover the full range
in my case, i am dividing small numbers and managed to select multipliers that do not require the INC/JZ
but some of these multipliers may fail if the dividend is large
Title: Re: magic number division for 64, 32 and 16 bit
Post by: Tedd on July 01, 2011, 04:40:17 PM
Sorry, bad wording on my part.

"INC and DEC will usually cause a stall if the next instruction tests depends on the flags" -- e.g. a conditional jump.

It's an out-of-order execution effect. The conditional instruction can't complete until the state of the flags is known, and partially modifying the flags causes an uncertainty (the full state also depends on the outcome of previous instructions.) So it's not something that can be worked around easily.
ADD and SUB fully modify the flags, so their outcome can be predicted, allowing the jump to be taken before the previous instructions have necessarily completed. So prefer ADD or SUB in place of INC or DEC when the next instruction is conditional.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: jj2007 on July 01, 2011, 04:56:40 PM
Tedd,

The theory sounds plausible, but the effect is 0.2 cycles on my P4 and 0.0 on my Celeron...
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on July 01, 2011, 05:06:03 PM
i have seen cases where it is advantageous to put an unrelated instruction between the 2
        cmp     eax,5
        mov     SomePlace,SomeThing
        ja      SomeLabel

but, i find this applies to ADD, SUB, CMP, INC, TEST - any instruction where the flags are tested, really
Title: Re: magic number division for 64, 32 and 16 bit
Post by: hutch-- on July 12, 2011, 07:55:29 AM
I have just finished reading this thread as I was after qWords technique but in relation to the last comment of Ted, Intel have been recommending ADD and SUB over INC and DEC for a long time in optimisation manuals. Often it does not matter but in enough places I have found it faster using their recommendation that INC DEC so I rarely ever use INC DEC any longer.
Title: Re: magic number division for 64, 32 and 16 bit
Post by: dedndave on July 12, 2011, 09:21:57 AM
don't toss them out altogether
they are great if the next instruction does not examine the flags, such as conditional branches
inc and dec do not alter the carry flag
that is a sometimes a desirable feature
also - inc/dec on a dword register is a single byte   :P
Title: Re: magic number division for 64, 32 and 16 bit
Post by: KeepingRealBusy on July 27, 2011, 12:49:43 AM
Quote from: qWord on September 18, 2008, 12:07:32 PM
Hi all,

In the last days I've written an tool for determining the "magic number" for unsigned and signed division of 64, 32 and 16 bit numbers.
The algorithm based on c-source that can be found in AMD's  "Software Optimization Guide".

I've test the program randomly and it looks like (for me  :bg) that it works propper - but there is no guarantee at the moment.
Hope people here can help me to improve the program - so, feel free and post helpful suggestions, criticism and of course failure reports.

In the attached file you found the executable and source.

regards, qWord

-----
28-09-2009: new version attached
02-10-2009: fixed some bugs - new version uploaded

qWord,

Thank you! Thank You! Thank You!

I had been using the old magic number routine and suddenly found that it failed for a value of 100 and so I was faced with resolving what was wrong with the old one, or getting your new one. Your new one works!

Dave.