The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: HooKooDooKu on December 15, 2011, 06:17:52 PM

Title: Branch if an SSE register equals zero
Post by: HooKooDooKu on December 15, 2011, 06:17:52 PM
Can someone provide me with a better (quicker) way to branch based on an SSE register being zero?

So far the only thing I can come up with on my own is to 'or' the high qword and low qword of the register together, then use 'ucomisd' to compare the result with another register of all zeros to update EFLAGS, and then branch based on EFLAGS.

pxor    xmm0, xmm0    ;clear xmm0
movdqa  xmm1, xmm3    ;copy xmm3, then shift its upper qword
psrldq  xmm1, 8       ; down into the lower qword of xmm1
por     xmm1, xmm3    ;or the upper and lower qwords together
ucomisd xmm1, xmm0    ;compare the combined qwords to zero
jne Location02        ;branch based on EFLAGS
Title: Re: Branch if an SSE register equals zero
Post by: qWord on December 15, 2011, 06:24:50 PM
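; check xmm0
pxor xmm1,xmm1        ; xmm1 = 0
pcmpeqd xmm1,xmm0     ; each dword of xmm1 = all ones where xmm0's dword is 0
pmovmskb eax,xmm1     ; collect the top bit of each of the 16 bytes
cmp eax,0ffffh        ; all 16 bits set means every dword of xmm0 was 0
je @xmm0_is_zero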
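
For readers who work from C, the same sequence can be sketched with SSE2 intrinsics. This is only an illustration, assuming a compiler that ships <emmintrin.h>; the function name xmm_is_zero is made up for this example and does not come from the thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper (name not from the thread): returns 1 if all
   128 bits of v are zero, 0 otherwise. */
int xmm_is_zero(__m128i v)
{
    __m128i zero = _mm_setzero_si128();       /* pxor     xmm1,xmm1  */
    __m128i eq   = _mm_cmpeq_epi32(v, zero);  /* pcmpeqd  xmm1,xmm0  */
    int mask     = _mm_movemask_epi8(eq);     /* pmovmskb eax,xmm1   */
    return mask == 0xFFFF;                    /* cmp      eax,0ffffh */
}
```

The compare is a pure integer operation, so it does not depend on how the bits would be interpreted as floating-point values.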
Title: Re: Branch if an SSE register equals zero
Post by: HooKooDooKu on December 15, 2011, 07:04:28 PM
Well that is a solution... unfortunately it's just as many lines of code as I already have (once implemented).

My actual implementation is trying to test xmm3, but I've already got xmm7 permanently set to all zeros (because there are several other places where I'm doing comparisons to zero).  Additionally, I have to preserve the non-zero value of xmm3.  So my current code looks like this:


   movdqa  xmm1, xmm3      ;if xmm3 == 0
   psrldq  xmm1, 8         ;   ...
   por     xmm1, xmm3      ;   ...
   ucomisd xmm1, xmm7      ;   ...
   je      RLL10         ;   then goto next row


So I'm already at an implementation that takes 5 instructions. 
Title: Re: Branch if an SSE register equals zero
Post by: qWord on December 15, 2011, 07:13:59 PM
HooKooDooKu,

Fewer instructions does not imply that the code is faster; what each instruction does matters more. Also, your implementation cannot work correctly, because you may pass invalid floating-point values to ucomisd.
If you need the zero test more than once, write a function or a macro.
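
To make the ucomisd objection concrete: the instruction compares only the low 64 bits, interpreted as a double. A non-zero bit pattern such as 8000000000000000h (negative zero) therefore compares equal to 0.0, and NaN bit patterns compare as unordered, which also sets ZF, so a following je is taken for them as well. The C sketch below, using SSE2 intrinsics and made-up helper names, contrasts the floating-point compare with an integer compare on the negative-zero pattern. It is an illustration under those assumptions, not code from the thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: floating-point test in the style of ucomisd,
   looking only at the low 64 bits interpreted as a double. */
int low_is_zero_fp(__m128i v)
{
    return _mm_ucomieq_sd(_mm_castsi128_pd(v), _mm_setzero_pd());
}

/* Hypothetical helper: integer test of all 128 bits (pcmpeqd + pmovmskb). */
int all_bits_zero(__m128i v)
{
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Bit pattern of -0.0 in the low qword: only the sign bit (bit 63) is set,
   and the upper qword is zero. */
__m128i negative_zero_bits(void)
{
    return _mm_castpd_si128(_mm_set_sd(-0.0));
}
```

With these helpers, low_is_zero_fp(negative_zero_bits()) reports "zero" even though one bit is set, while all_bits_zero(negative_zero_bits()) does not.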