
128-bit integer arithmetic

Started by msoftprogramming, May 24, 2011, 11:29:57 AM


dedndave

yikes !!!!!!
you guys don't have your thinking caps on - lol

why not use SUB once, then SBB
get rid of all those JMPs
when you're done, zero and carry are set for you

also, it might be a little faster to push EBX
then, use alternating registers

dedndave

something like this
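cmp128  proc    lpNum1:DWORD,lpNum2:DWORD

        push    ebx                    ;EBX must be preserved
        mov     edx,lpNum1             ;EDX -> first 128-bit value
        mov     ebx,lpNum2             ;EBX -> second 128-bit value
        mov     eax,[edx]
        mov     ecx,[edx+4]
        sub     eax,[ebx]              ;SUB on the low dword starts the chain
        sbb     ecx,[ebx+4]            ;SBB carries the borrow upward
        mov     eax,[edx+8]            ;MOV leaves the flags alone
        mov     ecx,[edx+12]
        sbb     eax,[ebx+8]
        sbb     ecx,[ebx+12]           ;CF/SF/OF valid for 128 bits, ZF only for the top dword
        pop     ebx
        ret                            ;result is in the flags, not in EAX

cmp128  endp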
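To see which flags the SUB/SBB chain in cmp128 actually gets right, here is a rough Python model of it. It is written in Python rather than MASM so it can be checked quickly. `sub_sbb_chain` and `to_signed32` are made-up names and are not part of any posted code. The model walks the four dwords from low to high and reports only the flags left by the final SBB, as the CPU would.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(v):
    """Interpret a 32-bit value as two's complement."""
    return v - (1 << 32) if v & 0x80000000 else v

def sub_sbb_chain(a, b):
    """Model SUB, SBB, SBB, SBB on two unsigned 128-bit values stored as four dwords.

    Returns (flags, dwords): the CF/SF/ZF/OF left by the *last* SBB and the
    four 32-bit result dwords, lowest first.
    """
    borrow = 0
    dwords = []
    for i in range(4):
        x = (a >> (32 * i)) & MASK32           # dword i of the first value
        y = (b >> (32 * i)) & MASK32           # dword i of the second value
        borrow_in = borrow                     # CF going into this SUB/SBB (0 for the SUB)
        diff = x - y - borrow_in
        borrow = int(diff < 0)                 # CF coming out
        r = diff & MASK32
        dwords.append(r)
    # Flags from the final SBB, computed on the top dword only
    sdiff = to_signed32(x) - to_signed32(y) - borrow_in
    flags = {
        "CF": borrow,
        "SF": r >> 31,
        "ZF": int(r == 0),                     # looks at the top dword alone
        "OF": int(not -(1 << 31) <= sdiff < (1 << 31)),
    }
    return flags, dwords
```

For example, `sub_sbb_chain(1 << 32, 0)` reports ZF = 1 even though the two values differ, because only the top dword of the result is zero. CF, SF and OF, on the other hand, match a full 128-bit compare.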

sinsi


hutch: 0,-1,1,-1,1
sinsi: 0,-1,1,-1,1
dave:  0,-1,0,1,-2


dedndave

that's because mine does not return a value in EAX
it returns with the flags set

dedndave

gosh - what happened in here?

no one noticed that the flags returned by my proc are not correct   :P
you guys are asleep at the wheel - lol

well, the original poster wanted to test for equality/inequality only
that greatly simplifies the requirements
however, it would be nice to have a compare function that sets the overflow, sign, and carry flags, as well
that way it could be used for signed/unsigned greater/less comparisons
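For reference, here is a small Python sketch of how the standard x86 conditions map onto those flags, assuming the flags describe a full 128-bit `first - second`. The helpers `cmp128_flags` and `taken` are illustrative and do not come from the thread. Unsigned tests use CF and ZF. Signed tests use SF, OF and ZF.

```python
MOD128 = 1 << 128

def to_signed128(v):
    """Interpret an unsigned 128-bit value as two's complement."""
    return v - MOD128 if v >> 127 else v

def cmp128_flags(a, b):
    """Flags a correct 128-bit compare (a - b) should leave, from exact arithmetic."""
    diff = (a - b) % MOD128
    sdiff = to_signed128(a) - to_signed128(b)
    return {
        "CF": int(a < b),                                   # unsigned borrow
        "ZF": int(diff == 0),                               # whole 128-bit result is zero
        "SF": diff >> 127,                                  # sign bit of the truncated result
        "OF": int(not -(1 << 127) <= sdiff < (1 << 127)),   # signed overflow
    }

def taken(flags):
    """Standard x86 condition codes evaluated on those flags."""
    cf, zf, sf, of = flags["CF"], flags["ZF"], flags["SF"], flags["OF"]
    return {
        "JE": zf == 1, "JNE": zf == 0,
        "JB": cf == 1, "JAE": cf == 0,                      # unsigned
        "JBE": cf == 1 or zf == 1, "JA": cf == 0 and zf == 0,
        "JL": sf != of, "JGE": sf == of,                    # signed
        "JLE": zf == 1 or sf != of, "JG": zf == 0 and sf == of,
    }
```

This is why all four flags matter. JB/JA need only CF and ZF, but JL/JG depend on whether SF and OF agree.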

EDIT
the sign and carry flags should be correct for my method
the zero flag isn't too hard to set
the overflow flag needs a little work   :bg
i don't think i would mess with parity - it isn't very useful - although, it wouldn't be hard

FORTRANS

Quote from: dedndave on May 26, 2011, 03:13:35 AM
i don't think i would mess with parity - it isn't very useful - although, it wouldn't be hard

Hi,

   Just AND the low byte with itself and store the parity flag
somewhere.

Regards,

Steve N.


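Checking parity flag settings (E = even, O = odd number of 1 bits; the AX, AH and AL columns give the parity of the whole word, the high byte and the low byte, and PF is the flag the CPU reported):

Value AX AH AL PF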
4FB4  O  O  E  E
B783  O  E  O  O
7706  E  E  E  E
62AD  E  O  O  O
E428  E  E  E  E
7967  E  O  O  O
279A  E  E  E  E
5231  E  O  O  O
A5DC  O  E  O  O
078B  O  O  E  E
B76E  O  E  O  O
17F5  E  E  E  E
C8D0  E  O  O  O
05EF  O  E  O  O
7A82  O  O  E  E
F7F9  O  O  E  E
C104  E  O  O  O
5893  O  O  E  E
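Here is a small Python check that reproduces the table. It assumes, as the table shows, that PF reflects only the low byte of the result (for example after AND AX,AX). The helper names are made up.

```python
def parity_letter(v):
    """'E' if v has an even number of 1 bits, otherwise 'O'."""
    return "E" if bin(v).count("1") % 2 == 0 else "O"

def x86_pf(result):
    """PF as x86 sets it: 1 when the LOW BYTE of the result has even parity."""
    return int(bin(result & 0xFF).count("1") % 2 == 0)

def parity_row(ax):
    """One table row: parity of AX, AH, AL, then PF after AND AX,AX (E means PF=1)."""
    ah, al = ax >> 8, ax & 0xFF
    return (parity_letter(ax), parity_letter(ah), parity_letter(al),
            "E" if x86_pf(ax) else "O")
```

The PF column always matches the AL column, never the AX column.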

dedndave

#21
i didn't think it was worth the clock cycles or code bytes, Steve   :P
that low-byte-only thing is for serial communications, of course
if we actually wanted the parity of bignum values, we could XOR all the bytes together
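Here is a Python sketch of that XOR idea for a 128-bit value. `bignum_parity` is a hypothetical helper. It relies on the fact that XOR-ing bytes together preserves the parity of the total bit count, so one byte-sized parity check at the end is enough.

```python
def bignum_parity(value, nbytes=16):
    """Return 0 if value has an even number of 1 bits, 1 if odd.

    Folds all nbytes bytes together with XOR, then counts the bits of the
    single remaining byte. (x86 PF would be the inverse: PF=1 means even.)
    """
    folded = 0
    for byte in value.to_bytes(nbytes, "little"):
        folded ^= byte
    return bin(folded).count("1") & 1
```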

anyways, here is my code and a test piece...
Pentium 4 Prescott (2005+), MMX, SSE3

NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC

58 57 57 58 57


i am sure it could be a few cycles faster without using SSE
and there is probably a way to do it with SSE, too
but, you can now use it for signed or unsigned comparisons

as it turned out, the only flag that requires modification after SUB, SBB, SBB, SBB is the zero flag
you can save the other flags using PUSHFD
POPFD is very slow, however
and - LAHF/SAHF do not load/store the overflow flag (i did not realize that until now - lol)
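Intel's documentation gives the AH layout for LAHF as SF:ZF:0:AF:0:PF:1:CF. A Python model of what LAHF and SAHF move, based on that layout, shows why OF (EFLAGS bit 11) cannot travel through them. The function names simply mirror the instructions. This is a sketch, not an emulator.

```python
# EFLAGS bit positions (Intel SDM, EFLAGS register)
CF_BIT, PF_BIT, AF_BIT, ZF_BIT, SF_BIT, OF_BIT = 0, 2, 4, 6, 7, 11
LAHF_MASK = 0xD5   # bits 7, 6, 4, 2, 0 -> SF, ZF, AF, PF, CF

def lahf(eflags):
    """Model LAHF: AH = SF:ZF:0:AF:0:PF:1:CF. Bit 11 (OF) has nowhere to go."""
    return (eflags & LAHF_MASK) | 0x02

def sahf(eflags, ah):
    """Model SAHF: copy SF, ZF, AF, PF, CF from AH; OF and every other flag are untouched."""
    return (eflags & ~LAHF_MASK) | (ah & LAHF_MASK)
```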

EDIT - see the next post for download...

dedndave

#22
slight improvement....
Pentium 4 Prescott (2005+), MMX, SSE3

NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC

52 52 52 52 53


updated code below

dedndave

my apologies
i left an instruction in there from a previous attempt that destroyed the contents of EBX on the stack
i hope it hasn't caused anyone trouble   :red

Pentium 4 Prescott (2005+), MMX, SSE3

NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC

53 52 52 52 52


updated code attached...

FORTRANS

Hi Dave,

Quote
i didn't think it was worth the clock cycles or code bytes, Steve

   Actually wrote it a few days ago to address a concern from
a while back.  "Educational."  Hm,  may add a tweak or two to it.

Quote
that low-byte-only thing is for serial communications, of course

   Yeah, a leftover from the 8085 I guess.  Makes you (me)
wonder how often it was/is used.

Quote
updated code attached...

   Nice.  An incentive to update my fixed point routines.

Thanks,

Steve N.


   This behaved a teensy bit oddly with the cursor.  Win98
full screen.  SSE0? <g>

P1 (1993+), MMX, SSE0

NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC

34 34 34 34 34

Press any key to continue ...

P3 (2000+), MMX, SSE1

NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC

32 32 32 32 32

Press any key to continue ...

dedndave

yah - that's an old version of Jochen's ShowCPU   :P
he has newer versions around - i just grabbed what was handy - lol
it looks like you have a Pentium MMX
i have one of those around - it is a 200 MHz CPU, but it seems to run well at 225   :bg
back before the year 2000, that was my main machine with win 95 or win 98 on it

i forgot to explain how the routine might be used in normal operation
it works like CMP, so, for example, if you wanted to use JLE...
        INVOKE  cmp128,offset FirstValue,offset SecondValue
        jle     SomeLabel


i managed to squeeze a couple more clock cycles out of it by changing the order of the zero-test instructions
got rid of a little dependency
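OPTION  PROLOGUE:NONE
OPTION  EPILOGUE:NONE

cmp128  proc    lpNum1:DWORD,lpNum2:DWORD

        push    ebx
        mov     edx,[esp+8]            ;lpNum1
        mov     ebx,[esp+12]           ;lpNum2
        mov     eax,[edx]
        mov     ecx,[edx+4]
        sub     eax,[ebx]              ;low dword: SUB starts the borrow chain
        sbb     ecx,[ebx+4]
        push    eax                    ;keep the two low result dwords
        push    ecx                    ;for the zero test
        mov     eax,[edx+8]
        mov     ecx,[edx+12]
        sbb     eax,[ebx+8]
        sbb     ecx,[ebx+12]           ;CF, SF, OF now valid for all 128 bits
        pop     edx                    ;EDX = result dword 1
        pop     ebx                    ;EBX = result dword 0
        pushfd                         ;save the flags from the last SBB
        or      edx,ecx        ;little change
        or      edx,eax        ;little change
        pop     ecx                    ;ECX = saved EFLAGS
        or      edx,ebx        ;little change, ZF=1 only if all 4 dwords are 0
        lahf                           ;AH bit 6 = the 128-bit ZF
        pop     ebx                    ;restore the caller's EBX
        and     ah,40h                 ;keep only ZF
        and     cx,8BFh                ;keep CF,PF,AF,SF,OF; drop the old ZF
        or      ah,cl                  ;merge CF,PF,AF,SF with the new ZF
        add     ch,78h                 ;CH=08h if OF was set: 08h+78h overflows
        sahf                           ;load SF,ZF,AF,PF,CF; OF from ADD survives
        ret     8

cmp128  endp

OPTION  PROLOGUE:PrologueDef
OPTION  EPILOGUE:EpilogueDef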
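The flag juggling at the end of this version is compact, so here is a Python model of just that tail. `merge_flags` is a hypothetical name. Its inputs stand for the EFLAGS value PUSHFD saved and the OR of the four result dwords. The interesting trick is ADD CH,78h. After AND CX,8BFh, CH holds only the old OF bit (08h or 00h), and adding 78h overflows a signed byte only in the 08h case. That recreates OF without a POPFD.

```python
def merge_flags(saved_eflags, or_of_dwords):
    """Model the tail of the final cmp128 (a sketch, not part of the posted code).

    saved_eflags: what PUSHFD captured right after SUB/SBB/SBB/SBB.
    or_of_dwords: EDX after OR-ing the four result dwords together.
    Returns the flags the routine hands back to the caller.
    """
    ah = 0x40 if or_of_dwords == 0 else 0x00   # LAHF + AND AH,40h: keep only the new ZF
    cx = saved_eflags & 0x8BF                  # AND CX,8BFh: keep bits 0-5, SF (7), OF (11); drop old ZF (6)
    ah |= cx & 0xFF                            # OR AH,CL: CF, PF, AF, SF join the new ZF
    ch = cx >> 8                               # 08h if OF was set, else 00h
    # ADD CH,78h: both operands are non-negative bytes, so signed overflow
    # happens exactly when the sum reaches 80h, i.e. only when CH was 08h.
    of = int(ch + 0x78 >= 0x80)
    # SAHF then loads SF, ZF, AF, PF, CF from AH and leaves the OF the ADD produced.
    return {"CF": ah & 1, "PF": (ah >> 2) & 1, "AF": (ah >> 4) & 1,
            "ZF": (ah >> 6) & 1, "SF": (ah >> 7) & 1, "OF": of}
```

SAHF overwrites SF, ZF, AF, PF and CF from AH, so OF is the only flag the ADD contributes to the final result.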