Can someone provide me with a better (quicker) way to branch based on an SSE register being zero?
So far the only thing I can come up with on my own is to OR the high qword and low qword of the register together, use 'ucomisd' to compare the result against another register of all zeros to update EFLAGS, and then branch based on EFLAGS:
pxor xmm0, xmm0 ;clear xmm0
movdqa xmm1, xmm3 ;copy xmm3, then move its upper qword
psrldq xmm1, 8 ; to the lower qword of xmm1
por xmm1, xmm3 ;or the upper and lower qwords together
ucomisd xmm1, xmm0 ;compare the combined qwords to zero
jne Location02 ;branch based on EFLAGS
; check xmm0
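pxor xmm1,xmm1 ;xmm1 = all zeros
pcmpeqd xmm1,xmm0 ;each dword of xmm1 = FFFFFFFFh where the matching dword of xmm0 is zero
pmovmskb eax,xmm1 ;collect the top bit of each of the 16 bytes into ax
cmp eax,0ffffh ;all 16 bits set means every dword was zero
je @xmm0_is_zero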
Well that is a solution... unfortunately it's just as many lines of code as I already have (once implemented).
My actual implementation is trying to test xmm3, but I've already got xmm7 permanently set to all zeros (because there are several other places where I'm doing comparisons to zero). Additionally, I have to preserve the non-zero value of xmm3. So my current code looks like this:
movdqa xmm1, xmm3 ;if xmm3 == 0
psrldq xmm1, 8 ; ...
por xmm1, xmm3 ; ...
ucomisd xmm1, xmm7 ; ...
je RLL10 ; then goto next row
So I'm already at an implementation that takes 5 instructions.
KooDooKu,
Fewer instructions do not imply that the code is faster; what each instruction does matters more. Also, your implementation can't work correctly, because you may pass invalid floating-point values to ucomisd.
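To make the ucomisd objection concrete, here is a minimal sketch, assuming xmm7 holds all zeros as in the code above. The bit pattern loaded into xmm3 is a hypothetical test value chosen for illustration, not something taken from the original posts.

```asm
; Hypothetical test value (assumption): only bit 63 of xmm3 is set.
mov eax, 80000000h
movd xmm3, eax ;lowest dword = 80000000h, everything above is zero
pslldq xmm3, 4 ;shift left 4 bytes: low qword = 8000000000000000h
; The zero test from the post above:
movdqa xmm1, xmm3
psrldq xmm1, 8
por xmm1, xmm3
ucomisd xmm1, xmm7 ;low qword reads as -0.0, which compares equal to +0.0: ZF=1
je RLL10 ;taken, even though xmm3 is not all zeros
```

A NaN pattern such as 7FF8000000000000h in the low qword goes wrong in the same way: ucomisd reports an unordered result by setting ZF, PF and CF, so je is taken as well. An integer compare such as pcmpeqd does not interpret the bits as a number, so it avoids both cases.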
If you need the zero test more than once, write a function or a macro.
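A minimal sketch of such a macro, assuming MASM-style macro syntax (the assembler is not named in this thread) and assuming xmm1 and eax are free to be overwritten. The macro name JmpIfXmmZero and its parameter names are made up for illustration. It uses the integer compare from the pcmpeqd reply rather than ucomisd.

```asm
; Hypothetical macro: jump to target if reg is all zeros.
; zeroreg must already hold all zeros; clobbers xmm1 and eax.
JmpIfXmmZero MACRO reg, zeroreg, target
    movdqa xmm1, reg ;work on a copy so reg keeps its value
    pcmpeqd xmm1, zeroreg ;dword = FFFFFFFFh where the dword of reg is zero
    pmovmskb eax, xmm1 ;one mask bit per byte, 16 bits in total
    cmp eax, 0FFFFh ;all 16 bits set: reg is all zeros
    je target
ENDM
```

With the registers from this thread, the call would be `JmpIfXmmZero xmm3, xmm7, RLL10`. It still expands to five instructions, but the test no longer depends on how the bits would read as a double.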