magic number division for 64, 32 and 16 bit

qWord · September 18, 2008, 12:07:32 PM

Hi all,

In the last days I've written an tool for determining the "magic number" for unsigned and signed division of 64, 32 and 16 bit numbers.
The algorithm based on c-source that can be found in AMD's "Software Optimization Guide".

I've test the program randomly and it looks like (for me :bg) that it works propper - but there is no guarantee at the moment.
Hope people here can help me to improve the program - so, feel free and post helpful suggestions, criticism and of course failure reports.

In the attached file you found the executable and source.

regards, qWord

-----
28-09-2009: new version attached
02-10-2009: fixed some bugs - new version uploaded

[attachment deleted by admin]

drizz · September 18, 2008, 03:23:42 PM

This is great!
I also wanted to write that sort of program ( damn you :P), but been working on other things and just wrote this Mathematica function:
(based on Agner Fog code)

Code Select

<<Developer`
MagicDiv64[d_]:=
    If[FractionalPart[2^(64+Floor[Log[2,d]])/d]>=0.5,
      Print[BaseForm[Round[2^(64+Round[Log[2,d]])/d],16],"<-"],
      Print[BaseForm[Floor[2^(64+Floor[Log[2,d]])/d],16]," add 1"]];

And i have also compiled programs from AMD manual :)

About your program:
- i would like to see an editbox instead of a static for display
- make the startup position : centerscreen :)

Code Suggestion:
- Log₂(N) == bsr eax,N (count leading zeros)
- use also the mu64l,div64,add64,sub64 from AMD manual :)

Anyway it was time to ditch the old The_Svin "Magic divider" prog :)

qWord · September 18, 2008, 05:29:29 PM

hi drizz,

Quote from: drizz on September 18, 2008, 03:23:42 PM
I also wanted to write that sort of program ( damn you :P) ...

that's life :bdg

Quote
i would like to see an editbox instead of a static for display

I've replace the static with an RichEdit, so the user can select and copy code from it.

Quote
- make the startup position : centerscreen :)

-> done :U

Code Select

Log[sub]2[/sub](N) == bsr eax,N OMG , i should know this :bg

The next days i will rewrite some part of the mathematic routins.

An updated version is attached in my first post.

regards qWord

raymond · September 27, 2009, 03:43:29 AM

Good work qWord.

Recently, I had a more in-depth look at the MagicNumber64 source code. The original C code seemed to emanate from programmers such as those from MS using MMX registers to add 2+2. Maybe the language used doesn't lend itself to anything better. However, I was a wee bit disappointed to see it translated "verbatim" to assembly, specially the section for 32-bit numbers.

For example, from the C code (where t is the source dword divided by 2 until it becomes an odd number)

j = 0xFFFFFFFF % t

where j is declared as a U64; also declared as a qword in assembly. However, it is obvious that the modulo of a dword cannot be any larger than a dword.

That j is then used to compute k (also declared as a U64 in C and a qword in assembly)

k = (((U64) (1)) << (32 + l)) / ((U64) (0xffffffff - j))

From the above description of j, it cannot obviously ever be larger than 0xffffffff and the Cflag can never be set by the subtraction. But the assembly code is:

mov DWORD ptr tmp[0],-1
mov eax,DWORD ptr j
sub DWORD ptr tmp[0],eax
mov eax,DWORD ptr j+4
sbb DWORD ptr tmp[0+4],eax

With a bit of analysis, I'm sure the entire code for the 32-bit divisors (and possibly for 16-bit divisors) could be simplified significantly. I will eventually have a crack at the 32-bit one.

Also, your displayed text for one of the cases looks as follows, having replaced the magic number and the shifting value by variable names:

; valid for: 0 <= X < 0ffffffffh
mov eax,X
mov edx,magic_number
inc eax
mul edx
shr edx,shift
; edx = quotient

; REMARKS:
; if it's possible that X gets 0ffffffffh then
; use following code:
; mov edx,X
; mov eax,magic_number
; mul edx
; add eax,magic_number
; adc edx,0
; shr edx,shift

The valid range and REMARKS could be eliminated with the following:

mov eax,X
mov edx,magic_number
add eax,1
.if !CARRY?
mul edx
.endif
shr edx,shift
; edx = quotient

dedndave · September 27, 2009, 05:53:18 AM

yes - i have been using qWord's tool to generate numbers - although i am using some code i snagged off Drizz :bg
i find that some of the numbers may require incrementing, rather than incrementing the multiplicand as Ray suggested
something else that would be nice is a way to show the range of values, although, it is asking a bit much
what i mean is, for some applications, i am not using the full register width on the input
so, i can shr the magic number (sometimes incrementing it again) and reduce the number of shifts at the end to get the quotient
in cases where i have to use 32-bit multiple precision multiplication, this can save an ADC and sometimes the use of a register
in all cases, i am using Drizz's examples to further extend the result and extract the remainder, as well
Drizz has some neat tricks :U
i think he should go ahead and write one - lol

qWord · September 27, 2009, 02:56:28 PM

thanks for your contribution raymond,

The routines are 1:1 translate from c to asm - this is done so, because I was a bit scared to change the behave of the routine when changing datatypes or something else. I will clean up the code the next days and follow your suggestions. How ever, if you are working on a better routine ("cracked" one :bg) please inform me.

regards, qWord

raymond · September 27, 2009, 05:12:48 PM

Quoteplease inform me

I will certainly do. I should have a bit of time during the next few days to work on it. But, it will not be a "better" routine.

ecube · September 28, 2009, 06:26:48 AM

How does a magic number work exactly? i've seen lingo use them alot to speed things up, but i'm abit fuzzy as to what they are.

donkey · September 28, 2009, 07:25:51 AM

Quote from: E^cube on September 28, 2009, 06:26:48 AM
How does a magic number work exactly? i've seen lingo use them alot to speed things up, but i'm abit fuzzy as to what they are.

http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx

ecube · September 28, 2009, 07:47:09 AM

awesome :U

dedndave · September 28, 2009, 08:03:13 AM

nothing really "magic" E^Cube - lol
MUL is much faster than DIV
so - multiply by the inverse of the number you want to divide by
then - take the high order register as a result
d/l qWord's tool in the first post of this thread and try it out :U

raymond · September 30, 2009, 12:31:53 AM

Attached is the source code of my simplified version for computing the magic number for unsigned 32-bit divisors. I've left in place the list of LOCALs used by qWord, commenting out those which had become unused. I've also left the C code translated by qWord and added my comments for the coding I used.

It returns the values required by qWord's program in a struct declared in his global variables. If you are not familiar with his source code and its goal, the output may be useless.

I've checked the output of this simplified version with several random numbers and all were identical to qWord's results.

I've also written procedures to convert dwords (signed and unsigned) and unsigned qwords (signed qwords still to come) using magic numbers (the qword one works all the way up to -1 with results identical to those obtained with Window's calculator). I'll post those within a few days after I add required comments. I may also add those procedures to the Fpulib when I get the time to update the help file so that their description will be included.

qWord · September 30, 2009, 07:28:13 PM

wow - you have reduce the problem noticeable :U
looking your code I've found that the loops could be eliminated using bsr/bsf instruction:

Code Select


; Reduce uDword until not divisible by 2

      xor   eax,eax           ; EAX = n for the 2^n factor of uDword
   @@:
      shr   ebx,1
      jc    @F                ; break when too far
      inc   eax
      dec   ecx
      jmp   @B

   @@:                        ;ECX = log2(t)
--------------------------------------------------
==>
      mov edx,ecx   ; save log2(uDword)
      xor ecx,ecx
      bsf ecx,ebx   ; ecx = n
      shr ebx,cl    ; reduce uDword ==> t
      mov eax,edx
      sub eax,ecx   ; log2(uDword)-n = log2(t)
      mov n,ecx     
      mov ecx,eax   ; ecx = log2(t)

the same for the last loop:

Code Select


   @@:
      shr   edx,1
      jc    @F
      dec   ecx
      jmp   @B
   @@:
      lea   edx,[edx*2+1]

      add   eax,n
      mov   d32.m,edx
      mov   d32.s,eax
      ret

-------------------------------------------------
==>
   mov eax,ecx
   xor ecx,ecx
   bsf ecx,edx
   shr edx,cl
   sub eax,ecx 

   add   eax,n
   mov   d32.m,edx
   mov   d32.s,eax
   ret

Code Select


      .if   eax != ebx
            mov d32.a,0
         @@:
            dec   L
            mov   ecx,eax     ; m_low
            mov   edx,ebx     ; m_high
            shr   eax,1
            shr   ebx,1
            cmp   eax,ebx
            jnz   @B
            ; EDX = magic number, L = shift
      .else

with this loop I'm not sure (manly caused by 'jnz'):
m_low and m_high must be at least bit-equal with their highest bit, other wise this loop will never exit?

Code Select

 
            mov d32.a,0
            
            dec L
            mov edx,ebx        ; save m_high to edx
            xor eax,ebx        ; make different bits "visible"
            xor ecx,ecx
            bsr ecx,eax        ; get the position of highest unequal bit ...
            shr edx,cl
            sub L,ecx

I've test it and it seems like it is correct - how ever, I'm not sure with this one.

regards, qWord

EDIT: just striked out some babel

raymond · September 30, 2009, 08:33:34 PM

Your additional code reductions are great. Using them, let's continue reducing.

Starting from the very begining, how about:

Code Select

      mov   ebx,uDword
      bsr   ecx,ebx           ; get log2(uDword) in ECX
      mov edx,ecx   ; save log2(uDword)
;your code with some rearranging
;      xor ecx,ecx    zeroing ecx is not necessary, same with all your other bsr/bsf
      bsf ecx,ebx   ; ecx = n
      mov eax,edx
      sub eax,ecx   ; log2(uDword)-n = log2(t)

      .if   ZERO?      ;would have been a power of 2
            mov   d32.m,0
            mov   d32.s,edx
            mov   d32.a,3
            ret
      .endif

      mov n,ecx     
      shr ebx,cl    ; reduce uDword ==> t
      mov ecx,eax   ; ecx = log2(t)

Your other loop eliminations also look good. However, as I mentioned in the attachement, the very last loop could even be eliminated entirely, not being necessary anymore.

dedndave · October 01, 2009, 02:31:18 AM

i am using xp pro (mce2005) sp2
the copy to clipboard does not work for me - am i doing something wrong ?

News:

magic number division for 64, 32 and 16 bit

ecube

ecube