Magic divider for 16 bit operands

Started by LimoDriver, April 02, 2007, 01:45:33 PM

LimoDriver

Greetings, folks!

I'm searching for a 16 bit version of the magic division routine...

I'm doing some buffer operations in MMX - there are 4 packed unsigned words in a 64 bit register, which I need to divide by a fixed BYTE constant.
As you probably know, there are no MMX division instructions, so ...

I've searched around for quite a bit, but I couldn't locate code that would perform a multiplication / shift replacement for the division.
One solution is to unpack the MMX register into two 64 bit regs, and perform the MMX "magic division" routine for 32 bit values twice.

But this sounds pretty silly if it's possible to obtain the code necessary for 16 bit division.

Anyone, pretty please?
Thanx in advance.

PS: If you have a better idea, other than using the 16 bit magic division, please speak up, by all means.

MichaelW

Agner Fog details a way to do this in his optimizing_assembly pdf, under Problematic Instructions, Division.
eschew obfuscation

LimoDriver

Even before I read Mr. Agner's article, I just brute-forced the thing - wrote a program that should find each and every one of them...
For every divisor I checked every magic number against the dividends 0000h -> FFFEh, and if none of them worked, I tried again with the dividend increased by 1 before the multiply...

Here follows a list of magic numbers:

They've all been tested dividing numbers from 0x0000 to 0xFFFE, and most of them work for 0xFFFF too.
(never going to happen in my program)

ax / 0001 = shr ax, 00h !
ax / 0002 = shr ax, 01h !
ax / 0003 = mov dx, AAABh / mul dx / shr dx, 01h !
ax / 0004 = shr ax, 02h !
ax / 0005 = mov dx, CCCDh / mul dx / shr dx, 02h !
ax / 0006 = mov dx, AAABh / mul dx / shr dx, 02h !
ax / 0007 = inc ax / mov dx, 9249h / mul dx / shr dx, 02h !
ax / 0008 = shr ax, 03h !
ax / 0009 = mov dx, E38Fh / mul dx / shr dx, 03h !
ax / 000A = mov dx, CCCDh / mul dx / shr dx, 03h !
ax / 000B = mov dx, BA2Fh / mul dx / shr dx, 03h !
ax / 000C = mov dx, AAABh / mul dx / shr dx, 03h !
ax / 000D = mov dx, 9D8Ah / mul dx / shr dx, 03h !
ax / 000E = inc ax / mov dx, 9249h / mul dx / shr dx, 03h !
ax / 000F = mov dx, 8889h / mul dx / shr dx, 03h !
ax / 0010 = shr ax, 04h !
ax / 0011 = mov dx, F0F1h / mul dx / shr dx, 04h !
ax / 0012 = mov dx, E38Fh / mul dx / shr dx, 04h !
ax / 0013 = mov dx, D795h / mul dx / shr dx, 04h !
ax / 0014 = mov dx, CCCDh / mul dx / shr dx, 04h !
ax / 0015 = inc ax / mov dx, C30Ch / mul dx / shr dx, 04h !
ax / 0016 = mov dx, BA2Fh / mul dx / shr dx, 04h !
ax / 0017 = inc ax / mov dx, B216h / mul dx / shr dx, 04h !
ax / 0018 = mov dx, AAABh / mul dx / shr dx, 04h !
ax / 0019 = inc ax / mov dx, A3D7h / mul dx / shr dx, 04h !
ax / 001A = mov dx, 9D8Ah / mul dx / shr dx, 04h !
ax / 001B = inc ax / mov dx, 97B4h / mul dx / shr dx, 04h !
ax / 001C = inc ax / mov dx, 9249h / mul dx / shr dx, 04h !
ax / 001D = mov dx, 8D3Eh / mul dx / shr dx, 04h !
ax / 001E = mov dx, 8889h / mul dx / shr dx, 04h !
ax / 001F = inc ax / mov dx, 8421h / mul dx / shr dx, 04h !
ax / 0020 = shr ax, 05h !
ax / 0021 = mov dx, F83Fh / mul dx / shr dx, 05h !
ax / 0022 = mov dx, F0F1h / mul dx / shr dx, 05h !
ax / 0023 = mov dx, EA0Fh / mul dx / shr dx, 05h !
ax / 0024 = mov dx, E38Fh / mul dx / shr dx, 05h !
ax / 0025 = mov dx, DD68h / mul dx / shr dx, 05h !
ax / 0026 = mov dx, D795h / mul dx / shr dx, 05h !
ax / 0027 = inc ax / mov dx, D20Dh / mul dx / shr dx, 05h !
ax / 0028 = mov dx, CCCDh / mul dx / shr dx, 05h !
ax / 0029 = inc ax / mov dx, C7CEh / mul dx / shr dx, 05h !
ax / 002A = inc ax / mov dx, C30Ch / mul dx / shr dx, 05h !
ax / 002B = mov dx, BE83h / mul dx / shr dx, 05h !
ax / 002C = mov dx, BA2Fh / mul dx / shr dx, 05h !
ax / 002D = mov dx, B60Ch / mul dx / shr dx, 05h !
ax / 002E = inc ax / mov dx, B216h / mul dx / shr dx, 05h !
ax / 002F = inc ax / mov dx, AE4Ch / mul dx / shr dx, 05h !
ax / 0030 = mov dx, AAABh / mul dx / shr dx, 05h !
ax / 0031 = inc ax / mov dx, A72Fh / mul dx / shr dx, 05h !
ax / 0032 = inc ax / mov dx, A3D7h / mul dx / shr dx, 05h !
ax / 0033 = mov dx, A0A1h / mul dx / shr dx, 05h !
ax / 0034 = mov dx, 9D8Ah / mul dx / shr dx, 05h !
ax / 0035 = mov dx, 9A91h / mul dx / shr dx, 05h !
ax / 0036 = inc ax / mov dx, 97B4h / mul dx / shr dx, 05h !
ax / 0037 = inc ax / mov dx, 94F2h / mul dx / shr dx, 05h !
ax / 0038 = inc ax / mov dx, 9249h / mul dx / shr dx, 05h !
ax / 0039 = inc ax / mov dx, 8FB8h / mul dx / shr dx, 05h !
ax / 003A = mov dx, 8D3Eh / mul dx / shr dx, 05h !
ax / 003B = mov dx, 8AD9h / mul dx / shr dx, 05h !
ax / 003C = mov dx, 8889h / mul dx / shr dx, 05h !
ax / 003D = mov dx, 864Ch / mul dx / shr dx, 05h !
ax / 003E = inc ax / mov dx, 8421h / mul dx / shr dx, 05h !
ax / 003F = inc ax / mov dx, 8208h / mul dx / shr dx, 05h !
ax / 0040 = shr ax, 06h !
ax / 0041 = mov dx, FC10h / mul dx / shr dx, 06h !
ax / 0042 = mov dx, F83Fh / mul dx / shr dx, 06h !
ax / 0043 = mov dx, F48Ah / mul dx / shr dx, 06h !
ax / 0044 = mov dx, F0F1h / mul dx / shr dx, 06h !
ax / 0045 = inc ax / mov dx, ED73h / mul dx / shr dx, 06h !
ax / 0046 = mov dx, EA0Fh / mul dx / shr dx, 06h !
ax / 0047 = mov dx, E6C3h / mul dx / shr dx, 06h !
ax / 0048 = mov dx, E38Fh / mul dx / shr dx, 06h !
ax / 0049 = mov dx, E071h / mul dx / shr dx, 06h !
ax / 004A = mov dx, DD68h / mul dx / shr dx, 06h !
ax / 004B = inc ax / mov dx, DA74h / mul dx / shr dx, 06h !
ax / 004C = mov dx, D795h / mul dx / shr dx, 06h !
ax / 004D = mov dx, D4C8h / mul dx / shr dx, 06h !
ax / 004E = inc ax / mov dx, D20Dh / mul dx / shr dx, 06h !
ax / 004F = mov dx, CF65h / mul dx / shr dx, 06h !
ax / 0050 = mov dx, CCCDh / mul dx / shr dx, 06h !
ax / 0051 = mov dx, CA46h / mul dx / shr dx, 06h !
ax / 0052 = inc ax / mov dx, C7CEh / mul dx / shr dx, 06h !
ax / 0053 = mov dx, C566h / mul dx / shr dx, 06h !
ax / 0054 = inc ax / mov dx, C30Ch / mul dx / shr dx, 06h !
ax / 0055 = mov dx, C0C1h / mul dx / shr dx, 06h !
ax / 0056 = mov dx, BE83h / mul dx / shr dx, 06h !
ax / 0057 = mov dx, BC53h / mul dx / shr dx, 06h !
ax / 0058 = mov dx, BA2Fh / mul dx / shr dx, 06h !
ax / 0059 = inc ax / mov dx, B817h / mul dx / shr dx, 06h !
ax / 005A = mov dx, B60Ch / mul dx / shr dx, 06h !
ax / 005B = inc ax / mov dx, B40Bh / mul dx / shr dx, 06h !
ax / 005C = inc ax / mov dx, B216h / mul dx / shr dx, 06h !
ax / 005D = inc ax / mov dx, B02Ch / mul dx / shr dx, 06h !
ax / 005E = inc ax / mov dx, AE4Ch / mul dx / shr dx, 06h !
ax / 005F = mov dx, AC77h / mul dx / shr dx, 06h !
ax / 0060 = mov dx, AAABh / mul dx / shr dx, 06h !
ax / 0061 = inc ax / mov dx, A8E8h / mul dx / shr dx, 06h !
ax / 0062 = inc ax / mov dx, A72Fh / mul dx / shr dx, 06h !
ax / 0063 = mov dx, A57Fh / mul dx / shr dx, 06h !
ax / 0064 = inc ax / mov dx, A3D7h / mul dx / shr dx, 06h !
ax / 0065 = mov dx, A238h / mul dx / shr dx, 06h !
ax / 0066 = mov dx, A0A1h / mul dx / shr dx, 06h !
ax / 0067 = mov dx, 9F12h / mul dx / shr dx, 06h !
ax / 0068 = mov dx, 9D8Ah / mul dx / shr dx, 06h !
ax / 0069 = mov dx, 9C0Ah / mul dx / shr dx, 06h !
ax / 006A = mov dx, 9A91h / mul dx / shr dx, 06h !
ax / 006B = inc ax / mov dx, 991Fh / mul dx / shr dx, 06h !
ax / 006C = inc ax / mov dx, 97B4h / mul dx / shr dx, 06h !
ax / 006D = mov dx, 9650h / mul dx / shr dx, 06h !
ax / 006E = inc ax / mov dx, 94F2h / mul dx / shr dx, 06h !
ax / 006F = mov dx, 939Bh / mul dx / shr dx, 06h !
ax / 0070 = inc ax / mov dx, 9249h / mul dx / shr dx, 06h !
ax / 0071 = mov dx, 90FEh / mul dx / shr dx, 06h !
ax / 0072 = inc ax / mov dx, 8FB8h / mul dx / shr dx, 06h !
ax / 0073 = inc ax / mov dx, 8E78h / mul dx / shr dx, 06h !
ax / 0074 = mov dx, 8D3Eh / mul dx / shr dx, 06h !
ax / 0075 = mov dx, 8C09h / mul dx / shr dx, 06h !
ax / 0076 = mov dx, 8AD9h / mul dx / shr dx, 06h !
ax / 0077 = inc ax / mov dx, 89AEh / mul dx / shr dx, 06h !
ax / 0078 = mov dx, 8889h / mul dx / shr dx, 06h !
ax / 0079 = mov dx, 8768h / mul dx / shr dx, 06h !
ax / 007A = mov dx, 864Ch / mul dx / shr dx, 06h !
ax / 007B = inc ax / mov dx, 8534h / mul dx / shr dx, 06h !
ax / 007C = inc ax / mov dx, 8421h / mul dx / shr dx, 06h !
ax / 007D = inc ax / mov dx, 8312h / mul dx / shr dx, 06h !
ax / 007E = inc ax / mov dx, 8208h / mul dx / shr dx, 06h !
ax / 007F = inc ax / mov dx, 8102h / mul dx / shr dx, 06h !
ax / 0080 = shr ax, 07h !
ax / 0081 = mov dx, FE04h / mul dx / shr dx, 07h !
ax / 0082 = mov dx, FC10h / mul dx / shr dx, 07h !
ax / 0083 = mov dx, FA24h / mul dx / shr dx, 07h !
ax / 0084 = mov dx, F83Fh / mul dx / shr dx, 07h !
ax / 0085 = mov dx, F661h / mul dx / shr dx, 07h !
ax / 0086 = mov dx, F48Ah / mul dx / shr dx, 07h !
ax / 0087 = mov dx, F2BAh / mul dx / shr dx, 07h !
ax / 0088 = mov dx, F0F1h / mul dx / shr dx, 07h !
ax / 0089 = mov dx, EF2Fh / mul dx / shr dx, 07h !
ax / 008A = inc ax / mov dx, ED73h / mul dx / shr dx, 07h !
ax / 008B = mov dx, EBBEh / mul dx / shr dx, 07h !
ax / 008C = mov dx, EA0Fh / mul dx / shr dx, 07h !
ax / 008D = mov dx, E866h / mul dx / shr dx, 07h !
ax / 008E = mov dx, E6C3h / mul dx / shr dx, 07h !
ax / 008F = mov dx, E526h / mul dx / shr dx, 07h !
ax / 0090 = mov dx, E38Fh / mul dx / shr dx, 07h !
ax / 0091 = mov dx, E1FDh / mul dx / shr dx, 07h !
ax / 0092 = mov dx, E071h / mul dx / shr dx, 07h !
ax / 0093 = mov dx, DEEAh / mul dx / shr dx, 07h !
ax / 0094 = mov dx, DD68h / mul dx / shr dx, 07h !
ax / 0095 = mov dx, DBECh / mul dx / shr dx, 07h !
ax / 0096 = inc ax / mov dx, DA74h / mul dx / shr dx, 07h !
ax / 0097 = mov dx, D902h / mul dx / shr dx, 07h !
ax / 0098 = mov dx, D795h / mul dx / shr dx, 07h !
ax / 0099 = mov dx, D62Ch / mul dx / shr dx, 07h !
ax / 009A = mov dx, D4C8h / mul dx / shr dx, 07h !
ax / 009B = inc ax / mov dx, D368h / mul dx / shr dx, 07h !
ax / 009C = inc ax / mov dx, D20Dh / mul dx / shr dx, 07h !
ax / 009D = mov dx, D0B7h / mul dx / shr dx, 07h !
ax / 009E = mov dx, CF65h / mul dx / shr dx, 07h !
ax / 009F = mov dx, CE17h / mul dx / shr dx, 07h !
ax / 00A0 = mov dx, CCCDh / mul dx / shr dx, 07h !
ax / 00A1 = inc ax / mov dx, CB87h / mul dx / shr dx, 07h !
ax / 00A2 = mov dx, CA46h / mul dx / shr dx, 07h !
ax / 00A3 = mov dx, C908h / mul dx / shr dx, 07h !
ax / 00A4 = inc ax / mov dx, C7CEh / mul dx / shr dx, 07h !
ax / 00A5 = inc ax / mov dx, C698h / mul dx / shr dx, 07h !
ax / 00A6 = mov dx, C566h / mul dx / shr dx, 07h !
ax / 00A7 = inc ax / mov dx, C437h / mul dx / shr dx, 07h !
ax / 00A8 = inc ax / mov dx, C30Ch / mul dx / shr dx, 07h !
ax / 00A9 = mov dx, C1E5h / mul dx / shr dx, 07h !
ax / 00AA = mov dx, C0C1h / mul dx / shr dx, 07h !
ax / 00AB = inc ax / mov dx, BFA0h / mul dx / shr dx, 07h !
ax / 00AC = mov dx, BE83h / mul dx / shr dx, 07h !
ax / 00AD = inc ax / mov dx, BD69h / mul dx / shr dx, 07h !
ax / 00AE = mov dx, BC53h / mul dx / shr dx, 07h !
ax / 00AF = mov dx, BB3Fh / mul dx / shr dx, 07h !
ax / 00B0 = mov dx, BA2Fh / mul dx / shr dx, 07h !
ax / 00B1 = inc ax / mov dx, B921h / mul dx / shr dx, 07h !
ax / 00B2 = inc ax / mov dx, B817h / mul dx / shr dx, 07h !
ax / 00B3 = mov dx, B710h / mul dx / shr dx, 07h !
ax / 00B4 = mov dx, B60Ch / mul dx / shr dx, 07h !
ax / 00B5 = mov dx, B50Ah / mul dx / shr dx, 07h !
ax / 00B6 = inc ax / mov dx, B40Bh / mul dx / shr dx, 07h !
ax / 00B7 = mov dx, B310h / mul dx / shr dx, 07h !
ax / 00B8 = inc ax / mov dx, B216h / mul dx / shr dx, 07h !
ax / 00B9 = mov dx, B120h / mul dx / shr dx, 07h !
ax / 00BA = inc ax / mov dx, B02Ch / mul dx / shr dx, 07h !
ax / 00BB = mov dx, AF3Bh / mul dx / shr dx, 07h !
ax / 00BC = inc ax / mov dx, AE4Ch / mul dx / shr dx, 07h !
ax / 00BD = inc ax / mov dx, AD60h / mul dx / shr dx, 07h !
ax / 00BE = mov dx, AC77h / mul dx / shr dx, 07h !
ax / 00BF = mov dx, AB90h / mul dx / shr dx, 07h !
ax / 00C0 = mov dx, AAABh / mul dx / shr dx, 07h !
ax / 00C1 = inc ax / mov dx, A9C8h / mul dx / shr dx, 07h !
ax / 00C2 = inc ax / mov dx, A8E8h / mul dx / shr dx, 07h !
ax / 00C3 = mov dx, A80Bh / mul dx / shr dx, 07h !
ax / 00C4 = inc ax / mov dx, A72Fh / mul dx / shr dx, 07h !
ax / 00C5 = mov dx, A656h / mul dx / shr dx, 07h !
ax / 00C6 = mov dx, A57Fh / mul dx / shr dx, 07h !
ax / 00C7 = mov dx, A4AAh / mul dx / shr dx, 07h !
ax / 00C8 = inc ax / mov dx, A3D7h / mul dx / shr dx, 07h !
ax / 00C9 = mov dx, A307h / mul dx / shr dx, 07h !
ax / 00CA = mov dx, A238h / mul dx / shr dx, 07h !
ax / 00CB = inc ax / mov dx, A16Bh / mul dx / shr dx, 07h !
ax / 00CC = mov dx, A0A1h / mul dx / shr dx, 07h !
ax / 00CD = inc ax / mov dx, 9FD8h / mul dx / shr dx, 07h !
ax / 00CE = mov dx, 9F12h / mul dx / shr dx, 07h !
ax / 00CF = mov dx, 9E4Dh / mul dx / shr dx, 07h !
ax / 00D0 = mov dx, 9D8Ah / mul dx / shr dx, 07h !
ax / 00D1 = mov dx, 9CC9h / mul dx / shr dx, 07h !
ax / 00D2 = mov dx, 9C0Ah / mul dx / shr dx, 07h !
ax / 00D3 = mov dx, 9B4Dh / mul dx / shr dx, 07h !
ax / 00D4 = mov dx, 9A91h / mul dx / shr dx, 07h !
ax / 00D5 = inc ax / mov dx, 99D7h / mul dx / shr dx, 07h !
ax / 00D6 = inc ax / mov dx, 991Fh / mul dx / shr dx, 07h !
ax / 00D7 = mov dx, 9869h / mul dx / shr dx, 07h !
ax / 00D8 = inc ax / mov dx, 97B4h / mul dx / shr dx, 07h !
ax / 00D9 = inc ax / mov dx, 9701h / mul dx / shr dx, 07h !
ax / 00DA = mov dx, 9650h / mul dx / shr dx, 07h !
ax / 00DB = inc ax / mov dx, 95A0h / mul dx / shr dx, 07h !
ax / 00DC = inc ax / mov dx, 94F2h / mul dx / shr dx, 07h !
ax / 00DD = mov dx, 9446h / mul dx / shr dx, 07h !
ax / 00DE = mov dx, 939Bh / mul dx / shr dx, 07h !
ax / 00DF = inc ax / mov dx, 92F1h / mul dx / shr dx, 07h !
ax / 00E0 = inc ax / mov dx, 9249h / mul dx / shr dx, 07h !
ax / 00E1 = mov dx, 91A3h / mul dx / shr dx, 07h !
ax / 00E2 = mov dx, 90FEh / mul dx / shr dx, 07h !
ax / 00E3 = inc ax / mov dx, 905Ah / mul dx / shr dx, 07h !
ax / 00E4 = inc ax / mov dx, 8FB8h / mul dx / shr dx, 07h !
ax / 00E5 = mov dx, 8F18h / mul dx / shr dx, 07h !
ax / 00E6 = inc ax / mov dx, 8E78h / mul dx / shr dx, 07h !
ax / 00E7 = inc ax / mov dx, 8DDAh / mul dx / shr dx, 07h !
ax / 00E8 = mov dx, 8D3Eh / mul dx / shr dx, 07h !
ax / 00E9 = mov dx, 8CA3h / mul dx / shr dx, 07h !
ax / 00EA = mov dx, 8C09h / mul dx / shr dx, 07h !
ax / 00EB = inc ax / mov dx, 8B70h / mul dx / shr dx, 07h !
ax / 00EC = mov dx, 8AD9h / mul dx / shr dx, 07h !
ax / 00ED = mov dx, 8A43h / mul dx / shr dx, 07h !
ax / 00EE = inc ax / mov dx, 89AEh / mul dx / shr dx, 07h !
ax / 00EF = mov dx, 891Bh / mul dx / shr dx, 07h !
ax / 00F0 = mov dx, 8889h / mul dx / shr dx, 07h !
ax / 00F1 = mov dx, 87F8h / mul dx / shr dx, 07h !
ax / 00F2 = mov dx, 8768h / mul dx / shr dx, 07h !
ax / 00F3 = inc ax / mov dx, 86D9h / mul dx / shr dx, 07h !
ax / 00F4 = mov dx, 864Ch / mul dx / shr dx, 07h !
ax / 00F5 = inc ax / mov dx, 85BFh / mul dx / shr dx, 07h !
ax / 00F6 = inc ax / mov dx, 8534h / mul dx / shr dx, 07h !
ax / 00F7 = mov dx, 84AAh / mul dx / shr dx, 07h !
ax / 00F8 = inc ax / mov dx, 8421h / mul dx / shr dx, 07h !
ax / 00F9 = inc ax / mov dx, 8399h / mul dx / shr dx, 07h !
ax / 00FA = inc ax / mov dx, 8312h / mul dx / shr dx, 07h !
ax / 00FB = mov dx, 828Dh / mul dx / shr dx, 07h !
ax / 00FC = inc ax / mov dx, 8208h / mul dx / shr dx, 07h !
ax / 00FD = mov dx, 8185h / mul dx / shr dx, 07h !
ax / 00FE = inc ax / mov dx, 8102h / mul dx / shr dx, 07h !
ax / 00FF = mov dx, 8081h / mul dx / shr dx, 07h !
ax / 0100 = shr ax, 08h !
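
For anyone who wants to check a row by hand, here is a minimal C sketch of how one table entry is applied, assuming plain unsigned 16 bit arithmetic. The names magic_div16 and row_is_exact are made up for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* One table row: optional "inc ax", then "mul dx" by the magic word,
   keep the high word (dx), then "shr dx, shift". */
uint16_t magic_div16(uint16_t x, int inc, uint16_t mul, unsigned shift)
{
    assert(shift < 16);
    uint16_t ax = (uint16_t)(x + (inc ? 1 : 0));   /* wraps like a 16 bit inc */
    uint32_t product = (uint32_t)ax * mul;         /* dx:ax after mul dx */
    return (uint16_t)((product >> 16) >> shift);   /* dx, then shr dx */
}

/* Returns 1 if the row equals x / divisor for every x in 0000h..FFFEh. */
int row_is_exact(uint16_t divisor, int inc, uint16_t mul, unsigned shift)
{
    for (uint32_t x = 0; x < 0xFFFF; x++)
        if (magic_div16((uint16_t)x, inc, mul, shift) != x / divisor)
            return 0;
    return 1;
}
```

For example, magic_div16(277, 1, 0x9249, 2) evaluates the /7 row and returns 39. Rows that start with inc ax wrap FFFFh around to 0, so those rows cannot be right for that single input.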


#include <windows.h>
#include <stdio.h>

// Brute-force search for 16 bit magic division constants (MSVC, 32 bit inline asm)
int main()
{
    WORD divisor, result, tester;
    DWORD i, r;

    BYTE failure, shift;

    for ( divisor = 1; divisor <= 256; divisor ++ )
    {
        printf( "ax / %04X = ", divisor );

        // shift = floor( log2( divisor ) )
        for ( shift = 0; shift < 16; shift ++ ) if ( !( divisor >> ( shift + 1 ) ) ) break;

        // powers of two need nothing but the shift
        if ( divisor == ( 1 << shift ) )
        {
            printf( "shr ax, %02Xh !\n", shift );
            continue;
        }

        // low word of i = candidate multiplier, high word of i = optional inc ( 0 or 1 )
        for ( i = 0; i < 0x20000; i ++ )
        {
            failure = 0;

            // test every dividend from 0000h to FFFEh
            for ( r = 0; r < 0xFFFF; r ++ )
            {
                __asm
                {
                    // candidate: ( r + inc ) * multiplier, high word, then shift
                    mov ax, WORD PTR [ r ]
                    add ax, WORD PTR [ i + 2 ]
                    mov dx, WORD PTR [ i ]
                    mul dx
                    mov cl, shift
                    shr dx, cl
                    mov tester, dx

                    // reference: a real 16 bit division
                    mov ax, WORD PTR [ r ]
                    xor dx, dx
                    mov cx, divisor
                    div cx
                    mov result, ax
                }

                if ( tester != result )
                {
                    failure = 1;
                    break;
                }
            }

            if ( !failure )
            {
               if ( i > 0xFFFF) printf( "inc ax / " );

               printf( "mov dx, %04Xh / mul dx / shr dx, %02Xh !\n", (WORD) i, shift );
               break;
            }
        }

        // no candidate survived for this divisor
        if ( failure )
        {
            printf( " ...\n" );
        }
    }

    getchar();   // keep the console window open

    return 0;
}
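
The same search can be sketched without inline asm, for compilers other than MSVC. This is only a portable approximation of the program above, not a replacement for it. The Magic16 struct and the function names are hypothetical, and a mul of 0 is used here to mean "plain shift, no multiply".

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int found; int inc; uint16_t mul; unsigned shift; } Magic16;

/* floor(log2(d)) for d >= 1, same result as the original shift loop. */
unsigned floor_log2_16(uint16_t d)
{
    unsigned s = 0;
    while (d >> (s + 1)) s++;
    return s;
}

/* Smallest working constant for one divisor, trying "no inc" first. */
Magic16 find_magic16(uint16_t divisor)
{
    Magic16 m = { 0, 0, 0, 0 };
    assert(divisor != 0);
    m.shift = floor_log2_16(divisor);
    if (divisor == (1u << m.shift)) {      /* power of two: shr only */
        m.found = 1;
        return m;
    }
    for (uint32_t i = 0; i < 0x20000; i++) {
        uint32_t inc = i >> 16;            /* high word: 0 or 1 */
        uint32_t mul = i & 0xFFFF;         /* low word: candidate multiplier */
        uint32_t x;
        for (x = 0; x < 0xFFFF; x++) {     /* dividends 0000h..FFFEh */
            uint32_t q = (((x + inc) * mul) >> 16) >> m.shift;
            if (q != x / divisor) break;
        }
        if (x == 0xFFFF) {                 /* every dividend matched */
            m.found = 1;
            m.inc = (int)inc;
            m.mul = (uint16_t)mul;
            return m;
        }
    }
    return m;                              /* found stays 0 */
}
```

Calling find_magic16(7) should reproduce the /7 row of the list: inc 1, multiplier 9249h, shift 2.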

lingo

What to do?  :lol


.data
Value dd 277

If I use Douglas W. Jones method

; case ax/0007 = ...

mov eax, Value
mov edx, 2943h
cmp eax, 104859
jnl Use_method_2
mul edx              ; eax*2943h
shr eax, 16
add eax, Value   
shr eax, 3           ;eax= the quotient of Value/7   
;...                 ; eax=28h ->quotient of 277/7     
;...
Use_method_2:
;...
;...

I have eax=28h=40


If I use your method:


mov eax, Value
inc eax
mov edx, 9249h
mul dx
shr dx, 2

I have edx=27h=39

If I use MS Calculator Plus

I have 277/7 == 39.57



Regards,
Lingo
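
To make the two results easy to reproduce, here is a literal C translation of both snippets, assuming each one loads the same Value of 277. The function names are invented for this sketch; the constants are taken verbatim from the post.

```c
#include <assert.h>
#include <stdint.h>

/* First snippet: mul by 2943h, shr 16, add Value, shr 3. */
uint32_t first_snippet_div7(uint32_t value)
{
    assert(value < 104859);              /* the snippet jumps to method 2 above this */
    uint32_t eax = value * 0x2943u;      /* low dword of the product, as left in eax */
    eax >>= 16;
    eax += value;
    return eax >> 3;
}

/* Second snippet: inc, mul dx by 9249h, shr dx, 2. */
uint32_t table_div7(uint16_t value)
{
    uint16_t ax = (uint16_t)(value + 1);
    uint32_t product = (uint32_t)ax * 0x9249u;   /* dx:ax after mul dx */
    return (product >> 16) >> 2;                 /* dx, then shr dx, 2 */
}
```

With 277 as input the first function returns 40 and the second returns 39. C's truncating integer division 277 / 7 is also 39, the whole part of 39.57.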

LimoDriver

Thanx Lingo for pointing that out.

The thing is... I'm not too hot with MMX branching, in fact I just started learning MMX yesterday...

Currently, the magic divider MMX procedure looks like this:


paddw     mm1, mm3    // optional inc
pmulhuw   mm1, mm4    // optional multiplier
psrlw     mm1, mm5    // shift
packuswb  mm1, mm2    // pack back to ARGB
 

... and I don't have a clue how to code conditions in MMX... at least not yet...

Now, given that I'm writing a tray icon anti-aliasing thingymaboob...
I think that 0.75% rounding error is too small for anyone to notice :P

Anyhows, the default integer rounding would truncate that .75 anyways, so I guess I'm free of charge.

Rather than that, could someone help me out with how to code the optional multiplier part...
Currently I'm working on something like this:


BYTE const_mmx_inc[] = { 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0 };
BYTE const_mmx_shr[] = { 0, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8 };
WORD const_mmx_mul[] = { 1, 1, 0xE38F, 1, 0xA3D7, 0xE38F, 0xA72F, 1, 0xCA46, 0xA3D7, 0x8768, 0xE38F, 0xC1E5, 0xA72F, 0x91A3, 1 };


The fields correspond to dividers: 1*1, 2*2, 3*3 and so on to 16*16

So for instance if I'm dividing by 4*4, I do:


add mmx, 0
mul mmx, 1
shr mmx, 4


and if I'm dividing by 5*5, I do:


add mmx, 1
mul mmx, 0xA3D7
shr mmx, 4


the problem is that currently, that "mul" part looks ugly, so I would really like to conditionally skip the thing or something...
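
As a scalar illustration of the lookup, the three tables can be indexed by n - 1 to divide by n*n. This is a sketch under the assumption that a multiplier of 1 means "skip the multiply", since keeping only the high word of x * 1 would always give 0. The function name div_by_square is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tables copied from the post; index n-1 holds the constants for n*n. */
static const uint8_t  const_mmx_inc[16] = { 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0 };
static const uint8_t  const_mmx_shr[16] = { 0, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8 };
static const uint16_t const_mmx_mul[16] = { 1, 1, 0xE38F, 1, 0xA3D7, 0xE38F, 0xA72F, 1,
                                            0xCA46, 0xA3D7, 0x8768, 0xE38F, 0xC1E5, 0xA72F, 0x91A3, 1 };

/* Divide a 16 bit sum by n*n, for n in 1..16. */
uint16_t div_by_square(uint16_t sum, unsigned n)
{
    assert(n >= 1 && n <= 16);
    unsigned k = n - 1;
    uint16_t x = (uint16_t)(sum + const_mmx_inc[k]);              /* optional inc */
    if (const_mmx_mul[k] != 1)                                    /* 1 = no multiply */
        x = (uint16_t)(((uint32_t)x * const_mmx_mul[k]) >> 16);   /* high word only */
    return (uint16_t)(x >> const_mmx_shr[k]);                     /* final shift */
}
```

For instance, div_by_square(277, 5) divides by 25 and div_by_square(277, 4) is a plain shift by 4.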

LimoDriver

Bah, I'll just branch off without MMX comparisons:


                paddw       mm1, mm3

                cmp         WORD PTR [ mmx_mul ], 1h
                jz          SKIP_MUL
         
                movq        mm4, [ mmx_mul ]
                pmulhuw     mm1, mm4

SKIP_MUL:       psrlw       mm1, mm5
                packuswb    mm1, mm2


This way, if it's just a pure shr, I skip the mul entirely, and if it's something that needs to go through the magic divisor, pmulhuw automatically leaves only the high words of the four products (the 4 dx's in regular magic division) in the same MMX register.
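
The lane-wise behaviour of that sequence can be modelled in plain C. The sketch below is a scalar stand-in for the MMX code, not the MMX code itself: it assumes all four lanes share the same inc, multiplier and shift, and it leaves out the final packuswb. The name divide_lanes is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of four unsigned 16 bit lanes:
   paddw   = wrapping add,
   pmulhuw = high word of the unsigned 16x16 product,
   psrlw   = logical right shift (counts above 15 clear the lane). */
void divide_lanes(uint16_t lanes[4], uint16_t inc, uint16_t mul, unsigned shift)
{
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(lanes[i] + inc);                 /* paddw   mm1, mm3 */
        if (mul != 1)                                            /* cmp / jz SKIP_MUL */
            x = (uint16_t)(((uint32_t)x * mul) >> 16);           /* pmulhuw mm1, mm4 */
        lanes[i] = (shift < 16) ? (uint16_t)(x >> shift) : 0;    /* psrlw   mm1, mm5 */
    }
}
```

Passing inc 1, multiplier A3D7h and shift 4 divides every lane by 25; passing inc 0, multiplier 1 and shift 4 takes the skip path and divides by 16.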

lingo

"I think that 0.75% rounding error is too small for anyone to notice "

"You have just become the spokesman for "everybody"?" by Hutch   :lol


You can read an old discussion too:
http://www.asmcommunity.net/board/index.php?topic=4855.0

Regards,
Lingo

LimoDriver

Lemme explain one more time, please bear with me:

It's like this. I have a matrix of pixels.. 1x1, 2x2 up to 16x16, depending on the quality of anti-aliasing involved.

I make a const DWORD array, like this:


DWORD mmx_sim[] = { 0x00000000, 0x02000000, 0x0300E38F, 0x04000000, 0x0401A3D7, 0x0500E38F, 0x0501A72F, 0x06000000,
                    0x0600CA46, 0x0601A3D7, 0x06008768, 0x0700E38F, 0x0700C1E5, 0x0701A72F, 0x070091A3, 0x08000000 };


This is all the info I need, one for each matrix.
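
The post does not spell out how each DWORD is packed. Judging from the values, the top byte looks like the shift, the next byte the optional inc, and the low word the multiplier (0 for "no multiply"). Under that assumption, a minimal unpacking sketch could look like this; DivParams and unpack_params are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t shift; uint8_t inc; uint16_t mul; } DivParams;

/* Split one packed entry into its three fields (layout is an assumption). */
DivParams unpack_params(uint32_t packed)
{
    DivParams p;
    p.shift = (uint8_t)(packed >> 24);   /* bits 31..24: shift count  */
    p.inc   = (uint8_t)(packed >> 16);   /* bits 23..16: optional inc */
    p.mul   = (uint16_t)packed;          /* bits 15..0 : multiplier   */
    return p;
}

/* Example check using the entry listed for the 5x5 matrix. */
void unpack_example(void)
{
    DivParams p = unpack_params(0x0401A3D7u);
    assert(p.shift == 4 && p.inc == 1 && p.mul == 0xA3D7);
}
```

Read this way, the 5x5 entry carries the same three constants as the /25 row of the list: inc, A3D7h, shift 4.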

Now, during the MMX initialization, I put three factors into MMX registers:

1) Optional addition - example for divider 25, it will be +1
2) Optional multiplication
3) A shift which will always execute

Now, if the matrix is a power of two, let's say 4x4, I just add all the pixels, and do a shift on the colors ( 4x4 = 16 = shr 4 ).
But, if the matrix needs magic division, it goes through the addition / multiplication bit.

You are suggesting that I *fix* the division because you think that numbers with a fractional part should be rounded up.
But I prefer truncating toward zero for two reasons:

1) It is compatible with the shift operation result, which occurs if the matrix is a power of two
2) Lots of "shadowy" pixels around the transparent bitmap I'm blitting have very low alpha values, and if they're close to zero,
the truncation will remove them from the system. One pixel with a transparency of 1 is enough for me to have to include the
entire line of pixels in both directions... Bah, I'll attach the proggie so that you can take a peek :)

[attachment deleted by admin]