Hello everyone,
just a very simple question. I'd like to use the 128-bit XMM registers to do some VERY simple 128-bit integer math.
In particular, I need to:
set an XMM register to some 128-bit integer value
compare two 128-bit integers and jump somewhere if they are equal
multiply a 128-bit XMM register value by 3
divide a 128-bit XMM register value by 2
add 1 to a 128-bit XMM register value
Is that possible?
Thank you very much!
Matteo Monti
Msoft Programming
might be easier to divide by 2, add that to the original, then increment
Y = 1.5X + 1
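A rough model of that suggestion, written in C rather than MASM so the carry handling is easy to follow. It is only a sketch: it assumes the 128-bit value is kept as two 64-bit halves, and the names u128, u128_shr1, u128_add and u128_step are made up for illustration. With truncating division, x + x/2 + 1 gives the same result as "times 3, divide by 2, add 1" without ever forming the 3x intermediate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit unsigned value stored as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Logical shift right by one: bit 0 of hi moves into bit 63 of lo. */
static u128 u128_shr1(u128 x) {
    u128 r;
    r.lo = (x.lo >> 1) | (x.hi << 63);
    r.hi = x.hi >> 1;
    return r;
}

/* Add with carry propagation from the low half into the high half. */
static u128 u128_add(u128 a, u128 b) {
    u128 r;
    uint64_t carry;
    r.lo = a.lo + b.lo;
    carry = (r.lo < a.lo);          /* unsigned wrap-around means a carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* y = x + x/2 + 1 */
static u128 u128_step(u128 x) {
    const u128 one = { 1, 0 };
    assert((x.hi >> 63) == 0);      /* keeps this sketch clear of 128-bit overflow */
    return u128_add(u128_add(x, u128_shr1(x)), one);
}
```

In assembly the same shape would be a shift of the high half feeding the low half, followed by an ADD/ADC pair on general-purpose registers.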
Quote from: msoftprogramming on May 24, 2011, 11:29:57 AM
compare two 128-bit integers and jump somewhere if they are equal
Ciao Matteo,
Here is a snippet that compares two 128-bit memory vars:
include \masm32\include\masm32rt.inc
.686
.xmm
IsEqual128 MACRO arg1, arg2
movups xmm0, oword ptr arg1
movups xmm1, oword ptr arg2
psubd xmm0, xmm1
xorps xmm1, xmm1
pcmpeqb xmm0, xmm1
pmovmskb eax, xmm0
cwde
inc eax
EXITM <Zero?>
ENDM
.data
x128A REAL8 123456789.0, 123456789.1
x128B REAL8 123456789.0, 123456789.1
x128C REAL8 123456789.1, 123456789.1
.code
AppName db "Masm32:", 0
start:
.if IsEqual128(x128A, x128B)
MsgBox 0, "A=B", "Hi", MB_OK
.else
MsgBox 0, "A and B are different", "Hi", MB_OK
.endif
.if IsEqual128(x128A, x128C)
MsgBox 0, "A=C", "Hi", MB_OK
.else
MsgBox 0, "A and C are different", "Hi", MB_OK
.endif
exit
end start
This is for REAL8 vars, but it works for integers, too; try this:
x128A dq 1234567890, 1234567891
x128B dq 1234567890, 1234567891
x128C dq 1234567891, 1234567891
hi,
Quote from: msoftprogramming on May 24, 2011, 11:29:57 AM
I'd like to use the 128-bit XMM registers to do some VERY simple 128-bit integer math.
You have obviously misunderstood what SSEx is for: it is designed for processing vectorized data (SIMD). There is no nature support for 128Bit integers - 64Bit integers are the maximum.
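To make the SIMD point concrete, here is a small C model (a sketch, not real SSE code) of what a packed 64-bit add does to one XMM register. The type xmm_model and the functions packed_add64 and add128 are invented for this illustration. The packed add treats the two 64-bit lanes independently, so a carry out of the low lane is simply lost.

```c
#include <assert.h>
#include <stdint.h>

/* Two independent 64-bit lanes, as in one XMM register (lane[0] is the low half). */
typedef struct { uint64_t lane[2]; } xmm_model;

/* Model of a packed 64-bit add (PADDQ): each lane wraps on its own. */
static xmm_model packed_add64(xmm_model a, xmm_model b) {
    xmm_model r;
    r.lane[0] = a.lane[0] + b.lane[0];
    r.lane[1] = a.lane[1] + b.lane[1];
    return r;
}

/* What a true 128-bit add needs on top: the carry from lane 0 into lane 1. */
static xmm_model add128(xmm_model a, xmm_model b) {
    xmm_model r = packed_add64(a, b);
    r.lane[1] += (r.lane[0] < a.lane[0]);   /* carry out of the low lane */
    return r;
}
```

Adding 1 to 0FFFFFFFFFFFFFFFFh in the low lane gives zero in both lanes with packed_add64, while add128 puts the expected 1 in the high lane.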
... all right! I think I should find something to help me understand the language a bit better. Could you tell me where I can find a tutorial that explains everything from the beginning? Something about every register, how each one works, and all the operations I can perform with them? Thank you very much.
Anyway, jj2007, I tried to run your code, but it always says that the numbers are different, even if I change the values...
Thank you again!
Matteo
Quote from: msoftprogramming on May 24, 2011, 02:35:02 PM
Anyway, jj2007, I tried to run your code, but it always says that the numbers are different, even if I change the values...
That's odd - here it works, and it should work, as it simply tests two packed quadwords for equality (which is indeed a 128:128-bit comparison, although a very simple one).
Can anybody confirm Matteo's finding?
jj's code works for me, but it could be done a bit easier:
.data
align 16
data1 QWORD -123,-123
data2 QWORD -123,-123
.code
movdqa xmm0,OWORD ptr data1
pcmpeqb xmm0,OWORD ptr data2
pmovmskb eax,xmm0
.if eax == 0ffffh
MsgBox 0,"equal",0,0
.endif
Yes, that's right, I forgot the pcmpeqb compares all bytes to their counterparts...
Nonetheless I would stick with movups - assuming 16-byte alignment is kind of bug-prone :wink
Quote:
IsEqual128 MACRO arg1, arg2
movups xmm0, oword ptr arg1
movups xmm1, oword ptr arg2
pcmpeqb xmm0, xmm1
pmovmskb eax, xmm0
cwde
inc eax
EXITM <Zero?>
ENDM
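For anyone who wants to check the logic of the macro without an assembler at hand, here is a plain C model of the three key steps. It is only a sketch: the function names are made up, and the 128-bit operands are assumed to be 16-byte arrays. PCMPEQB produces 0FFh for every matching byte, PMOVMSKB packs the top bit of each byte into a 16-bit mask, and cwde / inc eax turns the all-ones mask 0FFFFh into zero.

```c
#include <assert.h>
#include <stdint.h>

/* Model of PCMPEQB: each result byte is 0xFF where the inputs match, else 0x00. */
static void pcmpeqb_model(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* Model of PMOVMSKB: collect the top bit of each byte into a 16-bit mask. */
static uint32_t pmovmskb_model(const uint8_t v[16]) {
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint32_t)(v[i] >> 7) << i;
    return mask;
}

/* Returns 1 when all 128 bits match, mirroring the cwde / inc eax / Zero? test. */
static int is_equal128(const uint8_t a[16], const uint8_t b[16]) {
    uint8_t eq[16];
    uint32_t mask;
    int32_t eax;
    assert(a && b);
    pcmpeqb_model(a, b, eq);
    mask = pmovmskb_model(eq);
    /* cwde: sign-extend the 16-bit mask in AX into EAX */
    eax = (mask & 0x8000u) ? (int32_t)mask - 0x10000 : (int32_t)mask;
    eax += 1;                       /* inc eax: 0FFFFh (-1 after cwde) becomes 0 */
    return eax == 0;
}
```

A single differing byte clears one bit of the mask, so the incremented value is non-zero and the test fails, which is the behaviour the macro relies on.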
Quote:
.if IsEqual128(x128A, x128B)
Looking at the IsEqual128 macro, it is obvious that you need memory locations as arguments. As coded, would the x128A and x128B arguments always be interpreted as offsets with ALL assemblers or could some assemblers interpret them as actual values?
"There is no nature support for 128Bit integers - 64Bit integers are the maximum."
Take a look at AVX: it adds new register state through the 256-bit wide YMM register file, so explicit operating system support is required to properly save and restore the new registers across context switches.
I have an Intel Sandy Bridge processor, Windows 7 64-bit SP1, and MASM from VS2010 SP1, and have no problem with the new instructions. See my reply #4 here (http://www.masm32.com/board/index.php?topic=15895.0) :U
I know about AVX,
but AFAIKS there is no nature support for arithmetic on 128Bit integers !(?)
Quote from: raymond on May 25, 2011, 01:20:16 AM
Looking at the IsEqual128 macro, it is obvious that you need memory locations as arguments.
Well, not really:
movups xmm3, oword ptr x128B
.if IsEqual128(x128A, xmm3)
Quote:
As coded, would the x128A and x128B arguments always be interpreted as offsets with ALL assemblers or could some assemblers interpret them as actual values?
Most of the code posted in the Forum can be interpreted correctly only by ml.exe and jwasm.exe ...
Quote from: lingo on May 25, 2011, 01:35:50 AM
"There is no nature support for 128Bit integers
Except for the supernatural IsEqual128 macro, of course :bg
This seems to run and the results appear to be correct, but no guarantees. :bg You know, Eenie, meanie (blue), miney and MOE.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
cmp128 PROTO :DWORD,:DWORD
.data
item1 oword 0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh
item2 oword 0
item3 oword 1
item4 oword 0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEh
item5 oword 0000000000000000FFFFFFFFFFFFFFFFh
item6 oword 0FFFFFFFFFFFFFFFF0000000000000000h
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
invoke cmp128,ADDR item2,ADDR item2
print str$(eax),13,10
invoke cmp128,ADDR item4,ADDR item1
print str$(eax),13,10
invoke cmp128,ADDR item3,ADDR item2
print str$(eax),13,10
invoke cmp128,ADDR item5,ADDR item6
print str$(eax),13,10
invoke cmp128,ADDR item6,ADDR item5
print str$(eax),13,10
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
cmp128 proc num1:DWORD,num2:DWORD
mov ecx, num1
mov edx, num2
mov eax, [ecx+12]
cmp eax, [edx+12]
jb lessthan
ja greater
mov eax, [ecx+8]
cmp eax, [edx+8]
jb lessthan
ja greater
mov eax, [ecx+4]
cmp eax, [edx+4]
jb lessthan
ja greater
mov eax, [ecx]
cmp eax, [edx]
jb lessthan
ja greater
equal:
xor eax, eax
ret
greater:
mov eax, 1
ret
lessthan:
or eax, -1
ret
cmp128 endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
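The same comparison modelled in C, in case it helps to see the idea outside of assembly. This is a sketch only: cmp128_model is an invented name, and each value is assumed to be four little-endian dwords, so index 3 corresponds to [ecx+12]. The point is that the most significant dword decides first, and lower dwords only matter when everything above them is equal.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned compare of two 128-bit values stored as four little-endian dwords
   (index 3 is the most significant). Returns 0, 1 or -1 like the cmp128 proc. */
static int cmp128_model(const uint32_t a[4], const uint32_t b[4]) {
    assert(a && b);
    for (int i = 3; i >= 0; i--) {      /* most significant dword first */
        if (a[i] < b[i]) return -1;     /* jb lessthan */
        if (a[i] > b[i]) return 1;      /* ja greater  */
    }
    return 0;                           /* all four dwords are equal */
}
```

Fed with the six test items from the data section in the same order as main, this model should give 0, -1, 1, -1, 1.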
Jumpin' Jiminy!
cmp128 proc num1:DWORD,num2:DWORD
mov ecx, num1
mov edx, num2
mov eax, [ecx+12]
sub eax, [edx+12]
jnz above_or_below
mov eax, [ecx+8]
sub eax, [edx+8]
jnz above_or_below
mov eax, [ecx+4]
sub eax, [edx+4]
jnz above_or_below
mov eax, [ecx]
sub eax, [edx]
jz done
above_or_below:
sbb eax,eax
jnz done
add eax,1
done:
ret
cmp128 endp
Unsigned I assume.
:bg
yikes !!!!!!
you guys don't have your thinking caps on - lol
why not use SUB once, then SBB
get rid of all those JMP's
when you're done, zero and carry are set for you
also, it might be a little faster to push EBX
then, use alternating registers
something like this
cmp128 proc lpNum1:DWORD,lpNum2:DWORD
push ebx
mov edx,lpNum1
mov ebx,lpNum2
mov eax,[edx]
mov ecx,[edx+4]
sub eax,[ebx]
sbb ecx,[ebx+4]
mov eax,[edx+8]
mov ecx,[edx+12]
sbb eax,[ebx+8]
sbb ecx,[ebx+12]
pop ebx
ret
cmp128 endp
hutch: 0,-1,1,-1,1
sinsi: 0,-1,1,-1,1
dave: 0,-1,0,1,-2
that's because mine does not return a value in EAX
it returns with the flags set
gosh - what happened in here ?
no one noticed that the flags returned by my proc are not correct :P
you guys are asleep at the wheel - lol
well, the original poster wanted to test for equality/inequality only
that greatly simplifies the requirements
however, it would be nice to have a compare function that sets the overflow, sign, and carry flags, as well
that way it could be used for signed/unsigned greater/less comparisons
EDIT
the sign and carry flags should be correct for my method
the zero flag isn't too hard to set
the overflow flag needs a little work :bg
i don't think i would mess with parity - it isn't very useful - although, it wouldn't be hard
Quote from: dedndave on May 26, 2011, 03:13:35 AM
i don't think i would mess with parity - it isn't very useful - although, it wouldn't be hard
Hi,
Just AND the low byte with itself and store the parity flag
somewhere.
Regards,
Steve N.
Checking parity flag settings (E = even, O = odd).
AX    AX  AH  AL  PF
4FB4  O   O   E   E
B783  O   E   O   O
7706  E   E   E   E
62AD  E   O   O   O
E428  E   E   E   E
7967  E   O   O   O
279A  E   E   E   E
5231  E   O   O   O
A5DC  O   E   O   O
078B  O   O   E   E
B76E  O   E   O   O
17F5  E   E   E   E
C8D0  E   O   O   O
05EF  O   E   O   O
7A82  O   O   E   E
F7F9  O   O   E   E
C104  E   O   O   O
5893  O   O   E   E
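The table can be reproduced with a few lines of C, shown here as a sketch with invented function names. It assumes the usual x86 rule that PF is set when the low byte of a result has an even number of 1 bits. For a wider value, XORing the bytes together first gives a single byte with the same parity as the whole value, which is why the AX column is always the combination of the AH and AL columns.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the byte has an even number of 1 bits, which is when x86 sets PF. */
static int parity_even8(uint8_t v) {
    v ^= (uint8_t)(v >> 4);   /* fold 8 bits down to 4 */
    v ^= (uint8_t)(v >> 2);   /* then down to 2 */
    v ^= (uint8_t)(v >> 1);   /* bit 0 is now the XOR of all eight bits */
    return (v & 1) == 0;
}

/* Parity of a 16-bit value: XOR the two bytes together, then test that one byte. */
static int parity_even16(uint16_t ax) {
    return parity_even8((uint8_t)((ax >> 8) ^ (ax & 0xFF)));
}
```

For the first row, 4FB4, AL = B4h has four bits set (even) and AH = 4Fh has five (odd), so the whole of AX comes out odd.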
i didn't think it was worth the clock cycles or code bytes, Steve :P
that low-byte-only thing is for serial communications, of course
if we actually wanted the parity of bignum values, we could XOR all the bytes together
anyways, here is my code and a test piece...
Pentium 4 Prescott (2005+), MMX, SSE3
NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC
58 57 57 58 57
i am sure it could be a few cycles faster without using SSE
and there is probably a way to do it with SSE, too
but, you can now use it for signed or unsigned comparisons
as it turned out, the only flag that requires modification after SUB, SBB, SBB, SBB is the zero flag
you can save the other flags using PUSHFD
POPFD is very slow, however
and - LAHF/SAHF do not load/store the overflow flag (i did not realize that until now - lol)
EDIT - see the next post for download...
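The flag behaviour described above can be checked with a C model of the SUB, SBB, SBB, SBB chain. This is a sketch under assumptions, not the attached code: flags128 and cmp128_flags are invented names, and the operands are four little-endian dwords each. Carry, sign and overflow fall out of the top dword of the chain, while the zero flag has to be rebuilt from all four partial results.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four flags printed in the test output. */
typedef struct { int of, sf, zf, cf; } flags128;

/* Model of SUB, SBB, SBB, SBB over four little-endian dwords (a - b). */
static flags128 cmp128_flags(const uint32_t a[4], const uint32_t b[4]) {
    flags128 f;
    uint32_t borrow = 0, r = 0, any = 0;
    assert(a && b);
    for (int i = 0; i < 4; i++) {
        uint64_t d = (uint64_t)a[i] - b[i] - borrow;  /* wraps in 64 bits on borrow */
        r = (uint32_t)d;                              /* this dword of the difference */
        borrow = (uint32_t)(d >> 32) & 1;             /* borrow out of this dword */
        any |= r;                                     /* collected for the real ZF */
    }
    f.cf = (int)borrow;                               /* final borrow: unsigned a < b */
    f.sf = (int)(r >> 31);                            /* sign bit of the top dword */
    f.of = (int)(((a[3] ^ b[3]) & (a[3] ^ r)) >> 31); /* signed overflow at the top */
    f.zf = (any == 0);                                /* all 128 bits, not just the top */
    return f;
}
```

Comparing 1 with 0 shows why ZF needs the extra work: the top dword of the difference is zero, but the 128-bit result is not.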
slight improvement....
Pentium 4 Prescott (2005+), MMX, SSE3
NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC
52 52 52 52 53
updated code below
my apologies
i left an instruction in there from a previous attempt that destroyed the contents of EBX on the stack
i hope it hasn't caused anyone trouble :red
Pentium 4 Prescott (2005+), MMX, SSE3
NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC
53 52 52 52 52
updated code attached...
Hi Dave,
Quote:
i didn't think it was worth the clock cycles or code bytes, Steve
Actually wrote it a few days ago to address a concern from
a while back. "Educational." Hm, may add a tweak or two to it.
Quote:
that low-byte-only thing is for serial communications, of course
Yeah, a left over from the 8085 I guess. Makes you (me)
wonder how often it was/is used.
Quote:
updated code attached...
Nice. An incentive to update my fixed point routines.
Thanks,
Steve N.
This behaved a teensie bit odd with the cursor. Win98
full screen. SSE0? <g>
P1 (1993+), MMX, SSE0
NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC
34 34 34 34 34
Press any key to continue ...
P3 (2000+), MMX, SSE1
NO NS ZF NC
NO SF NZ CF
NO NS NZ NC
NO NS NZ NC
NO SF NZ CF
NO NS NZ CF
NO SF NZ NC
NO NS NZ CF
OF SF NZ CF
NO SF NZ NC
OF NS NZ NC
32 32 32 32 32
Press any key to continue ...
yah - that's an old version of Jochen's ShowCPU :P
he has newer versions around - i just grabbed what was handy - lol
it looks like you have a Pentium MMX
i have one of those around - it is a 200 MHz CPU, but it seems to run well at 225 :bg
back before the year 2000, that was my main machine with win 95 or win 98 on it
i forgot to explain how the routine might be used in normal operation
it is like CMP so, for example, if you wanted to use JLE...
INVOKE cmp128,offset FirstValue,offset SecondValue
jle SomeLabel
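As a reminder of what the conditional jumps actually test once the flags are set, here is a short C sketch. The flag_state struct and the cond_* names are made up for illustration; the conditions themselves are the standard x86 ones. Unsigned jumps look at CF and ZF, signed jumps look at SF, OF and ZF.

```c
#include <assert.h>

/* Hypothetical flag snapshot, as left behind by a CMP-style routine. */
typedef struct { int of, sf, zf, cf; } flag_state;

/* Unsigned conditions use CF and ZF. */
static int cond_jb(flag_state f)  { return f.cf; }                  /* below          */
static int cond_jbe(flag_state f) { return f.cf || f.zf; }          /* below or equal */

/* Signed conditions use SF, OF and ZF. */
static int cond_jl(flag_state f)  { return f.sf != f.of; }          /* less           */
static int cond_jle(flag_state f) { return f.zf || f.sf != f.of; }  /* less or equal  */
```

So the same flags can answer both questions: with OF set and SF, ZF, CF clear, JLE is taken (signed less) while JB is not (unsigned above).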
i managed to squeeze a couple more clock cycles out of it by changing the order of the zero-test instructions
got rid of a little dependency
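push ebx
mov edx,[esp+8] ;lpNum1
mov ebx,[esp+12] ;lpNum2
mov eax,[edx]
mov ecx,[edx+4]
sub eax,[ebx] ;dword 0: starts the borrow chain
sbb ecx,[ebx+4] ;dword 1
push eax ;save the low results for the zero test
push ecx
mov eax,[edx+8]
mov ecx,[edx+12]
sbb eax,[ebx+8] ;dword 2
sbb ecx,[ebx+12] ;dword 3: OF, SF and CF are now final
pop edx ;dword 1 result
pop ebx ;dword 0 result
pushfd ;save the flags from the last SBB
or edx,ecx ;little change
or edx,eax ;little change
pop ecx ;ECX = saved flags
or edx,ebx ;little change - ZF is now set only if all 128 bits are zero
lahf ;AH = low flags byte, including the new ZF
pop ebx ;restore EBX
and ah,40h ;keep only ZF
and cx,8BFh ;keep OF and the low flags byte, minus the old ZF
or ah,cl ;merge SF, AF, PF, CF with the new ZF
add ch,78h ;08h+78h=80h overflows, so OF is set again only if it was saved
sahf ;load SF, ZF, AF, PF, CF - SAHF leaves OF alone
ret 8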