News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

magic number division for 64, 32 and 16 bit

Started by qWord, September 18, 2008, 12:07:32 PM

Previous topic - Next topic

dedndave

thanks Ray
oh - i thought we were incrementing the multipliers - lol
let me modify the test program and run it again

dedndave


dedndave

that's a little better - lol
of course, i can make the first one pass easy enough
this test checks each point just before and just after the quotient rollover
i want to add all the points where the low dword is 00000000 and FFFFFFFF to the sequence, as well
that will make for a fairly comprehensive 64-bit test

Multiplier: 89705F41_36B4A597  Shift: 29  Fail: FFFFFFFF_FFFFFFFF
Multiplier: 44B82FA0_9B5A52CB  Shift: 28  Fail: 73348093_3BA6C600
Multiplier: 225C17D0_4DAD2965  Shift: 27  Fail: 2B0B3E03_EF940E00
Multiplier: 112E0BE8_26D694B2  Shift: 26  Fail: 131B7A69_5A7B4000
Multiplier: 089705F4_136B4A59  Shift: 25  Fail: 131B7A69_5A7B4000
Multiplier: 044B82FA_09B5A52C  Shift: 24  Fail: 05ED06A0_F2511800
Multiplier: 0225C17D_04DAD296  Shift: 23  Fail: 05ED06A0_F2511800
Multiplier: 0112E0BE_826D694B  Shift: 22  Fail: 05ED06A0_F2511800
Multiplier: 0089705F_4136B4A5  Shift: 21  Fail: 00E8B62F_54D78200
Multiplier: 0044B82F_A09B5A52  Shift: 20  Fail: 00566838_C33B0A00
Multiplier: 00225C17_D04DAD29  Shift: 19  Fail: 00566838_C33B0A00
Multiplier: 00112E0B_E826D694  Shift: 18  Fail: 0018957D_925BDC00
Multiplier: 00089705_F4136B4A  Shift: 17  Fail: 0018957D_925BDC00
Multiplier: 00044B82_FA09B5A5  Shift: 16  Fail: 0018957D_925BDC00
Multiplier: 000225C1_7D04DAD2  Shift: 15  Fail: 0003A7FD_3BAEAE00
Multiplier: 000112E0_BE826D69  Shift: 14  Fail: 0003A7FD_3BAEAE00
Multiplier: 00008970_5F4136B4  Shift: 13  Fail: 0000D47A_97ED2600
Multiplier: 000044B8_2FA09B5A  Shift: 12  Fail: 0000D47A_97ED2600
Multiplier: 0000225C_17D04DAD  Shift: 11  Fail: 0000D47A_97ED2600
Multiplier: 0000112E_0BE826D6  Shift: 10  Fail: 00001D93_E2A71A00
Multiplier: 00000897_05F4136B  Shift:  9  Fail: 00001D93_E2A71A00
Multiplier: 0000044B_82FA09B5  Shift:  8  Fail: 000006A8_51FFF000
Multiplier: 00000225_C17D04DA  Shift:  7  Fail: 0000029C_74EF6600
Multiplier: 00000112_E0BE826D  Shift:  6  Fail: 0000029C_74EF6600
Multiplier: 00000089_705F4136  Shift:  5  Fail: 000000C2_E1167200
Multiplier: 00000044_B82FA09B  Shift:  4  Fail: 000000C2_E1167200
Multiplier: 00000022_5C17D04D  Shift:  3  Fail: 00000032_FD6ACE00
Multiplier: 00000011_2E0BE826  Shift:  2  Fail: 00000014_B8D03A00
Multiplier: 00000008_9705F413  Shift:  1  Fail: 00000014_B8D03A00
Multiplier: 00000004_4B82FA09  Shift:  0  Fail: 00000006_0DB88400

raymond

QuoteMultiplier: 89705F41_36B4A597  Shift: 29  Fail: FFFFFFFF_FFFFFFFF

This only fails if you blindly add 1 resulting in 0, without considering the carry, and then multiplying. If you shift your multiplier by 29, you get the correct result (Look at my very first post in this thread for such an occurence.)
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

dedndave

i was more interested in all the other lines, Ray - lol
i was hoping to make some kind of correlation between the result and multiplier resolution

dedndave

i have totally re-written this program
it checks all values where the low 32-bits are zero or FFFFFFFF
it checks all values where the high 32-bits are zero or FFFFFFFF
it checks all values just before/after quotient rollover (this is where most errors occur first)
it checks them in sequence
if an error is found, it backs up to the last "pass" dividend and checks one by one until the earliest fail is found
it also runs 64 shift values - some have been removed in the following text (they all fail like the last one after a point)
it takes a while to run - i have throttled it back to only use 25% CPU time
for a divisor of 1,000,000,000, it took about 4 1/2 hrs on my machine
for smaller divisors, it will take even longer
anyways, i thought you guys might like to have a 64 x 64 32-bit mul2div test program
source is attached...

Multiplier: 89705F41_36B4A597  Mult Divisor: 2^93  Pass: FFFFFFFF_FFFFFFFF
Multiplier: 44B82FA0_9B5A52CB  Mult Divisor: 2^92  Fail: 73348093_3BA6C600
Multiplier: 225C17D0_4DAD2965  Mult Divisor: 2^91  Fail: 2B0B3E03_EF940E00
Multiplier: 112E0BE8_26D694B2  Mult Divisor: 2^90  Fail: 131B7A69_5A7B4000
Multiplier: 089705F4_136B4A59  Mult Divisor: 2^89  Fail: 131B7A69_5A7B4000
Multiplier: 044B82FA_09B5A52C  Mult Divisor: 2^88  Fail: 05ED06A0_F2511800
Multiplier: 0225C17D_04DAD296  Mult Divisor: 2^87  Fail: 05ED06A0_F2511800
Multiplier: 0112E0BE_826D694B  Mult Divisor: 2^86  Fail: 05ED06A0_F2511800
Multiplier: 0089705F_4136B4A5  Mult Divisor: 2^85  Fail: 00E8B62F_54D78200
Multiplier: 0044B82F_A09B5A52  Mult Divisor: 2^84  Fail: 00566838_C33B0A00
Multiplier: 00225C17_D04DAD29  Mult Divisor: 2^83  Fail: 00566838_C33B0A00
Multiplier: 00112E0B_E826D694  Mult Divisor: 2^82  Fail: 0018957D_925BDC00
Multiplier: 00089705_F4136B4A  Mult Divisor: 2^81  Fail: 0018957D_925BDC00
Multiplier: 00044B82_FA09B5A5  Mult Divisor: 2^80  Fail: 0018957D_925BDC00
Multiplier: 000225C1_7D04DAD2  Mult Divisor: 2^79  Fail: 0003A7FD_3BAEAE00
Multiplier: 000112E0_BE826D69  Mult Divisor: 2^78  Fail: 0003A7FD_3BAEAE00
Multiplier: 00008970_5F4136B4  Mult Divisor: 2^77  Fail: 0000D47A_97ED2600
Multiplier: 000044B8_2FA09B5A  Mult Divisor: 2^76  Fail: 0000D47A_97ED2600
Multiplier: 0000225C_17D04DAD  Mult Divisor: 2^75  Fail: 0000D47A_97ED2600
Multiplier: 0000112E_0BE826D6  Mult Divisor: 2^74  Fail: 00001D93_E2A71A00
Multiplier: 00000897_05F4136B  Mult Divisor: 2^73  Fail: 00001D93_E2A71A00
Multiplier: 0000044B_82FA09B5  Mult Divisor: 2^72  Fail: 000006A8_51FFF000
Multiplier: 00000225_C17D04DA  Mult Divisor: 2^71  Fail: 0000029C_74EF6600
Multiplier: 00000112_E0BE826D  Mult Divisor: 2^70  Fail: 0000029C_74EF6600
Multiplier: 00000089_705F4136  Mult Divisor: 2^69  Fail: 000000C2_E1167200
Multiplier: 00000044_B82FA09B  Mult Divisor: 2^68  Fail: 000000C2_E1167200
Multiplier: 00000022_5C17D04D  Mult Divisor: 2^67  Fail: 00000032_FD6ACE00
Multiplier: 00000011_2E0BE826  Mult Divisor: 2^66  Fail: 00000014_B8D03A00
Multiplier: 00000008_9705F413  Mult Divisor: 2^65  Fail: 00000014_B8D03A00
Multiplier: 00000004_4B82FA09  Mult Divisor: 2^64  Fail: 00000006_0DB88400
Multiplier: 00000002_25C17D04  Mult Divisor: 2^63  Fail: 00000000_3B9ACA00


now, to put the finishing touches on my base 1,000,000,000 ling long kai fang bignum routine   :bg

dedndave

i really like using qWords program
but, it occured to me that what it needs, now, is a nice icon - lol
the attached file has an ico with several sizes in it


dedndave

i was playing with this..... again   :P

i want to write some code that uses multiply-to-divide, but with variable multiplier and shift values
when the divisor is changed, i calculate new mult and shift values and store them
then, the multiply-to-divide code uses those values until the divisor changes again
the problem here is, the code in the magic number generator program may be different from one divisor to another
i can probably overcome this by carefully choosing the multiplier

at any rate, i was looking at the code that is generated for an example divisor of 7...
mov eax,X
mov edx,092492492h
add eax,1
.if !CARRY?
     mul edx
.endif
shr edx,02h
; edx = quotient


when disassembled, it looks something like this...
        mov     eax,X
        mov     edx,92492492h
        add     eax,1
        jc      @F

        mul     edx

@@:     shr     edx,2


i was wondering why ADD EAX,1 and JC are used instead of INC EAX and JZ...
        mov     eax,X
        mov     edx,92492492h
        inc     eax
        jz      @F

        mul     edx

@@:     shr     edx,2


or, in code that everyone else but me seems to like   :P
mov eax,X
mov edx,092492492h
inc eax
.if !ZERO?
     mul edx
.endif
shr edx,02h
; edx = quotient

qWord

Quote from: dedndave on June 26, 2011, 05:05:28 PM
i was wondering why ADD EAX,1 and JC are used instead of INC EAX and JZ...
why? - code size is irrelevant in this case.
FPU in a trice: SmplMath
It's that simple!

Tedd

Also, INC and DEC will usually cause a stall if the next instruction tests the flags (since they avoid modifying carry.)
No snowflake in an avalanche feels responsible.

dedndave

is that so for newer cores ?
i would think that instruction pair would be optimized to some degree   :P

jj2007

Quote from: Tedd on July 01, 2011, 01:08:03 PM
Also, INC and DEC will usually cause a stall if the next instruction tests the flags (since they avoid modifying carry.)

That is a 0.2 cycles stall on my P4 - but why do we need an extra test anyway?

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
1648    cycles for dec
1665    cycles for sub

1898    cycles for dec + test
1698    cycles for sub + test



qWord

FPU in a trice: SmplMath
It's that simple!

jj2007

Quote from: Tedd on July 01, 2011, 01:08:03 PM... stall if the next instruction tests the flags

Maybe I misunderstood something ::)

dedndave

that is a p4 - newer cores should be a little better   :P

the test is necessary if you want a multiply-to-divide routine to cover the full range
in my case, i am dividing small numbers and managed to select multipliers that do not require the INC/JZ
but some of these multipliers may fail if the dividend is large