SSE Integer Comparisons

Started by johnsa, April 19, 2010, 09:36:09 AM

jj2007

My version, Celeron M, with source.

416     ticks for CmpArray1, Lingo      -1
518     ticks for CmpArray2, Lingo+ret  -1
418     ticks for CmpArray3, Clive      65535
404     ticks for CmpArray4, dioxin     65535

dioxin

I find it faster to do this on a Phenom II:
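!movdqu xmm0,[esi]       'get the first 32 bytes (unaligned loads)
!movdqu xmm1,[esi+16]

!pcmpeqd xmm0,[edi]      'compare with the second 32 bytes, straight from memory
!pcmpeqd xmm1,[edi+16]   '(as a memory operand, [edi] must be 16-byte aligned)

!pand xmm0,xmm1          'combine the two compare results

!pmovmskb eax,xmm0       'eax now contains &hffff if the 32 bytes matched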

The point is not to load the second block into registers before comparing; just compare against the data in memory.

Since the code is now so short, wouldn't it make sense to inline it to remove the CALL/RET overhead?

Paul.
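For anyone following along in C rather than inline assembly, the sequence above maps almost one-to-one onto SSE2 intrinsics. This is only a sketch under assumptions: the helper name equal32 is made up for illustration, an x86 compiler with SSE2 is assumed, and both blocks are read with unaligned loads so neither pointer needs 16-byte alignment (the compiler decides on its own whether to fold a load into the compare). Declaring it static inline plays the same role as the inlining suggested above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x86-64 compiler */

/* Hypothetical helper: returns 1 if the 32 bytes at a and b are identical. */
static inline int equal32(const void *a, const void *b)
{
    /* movdqu: two unaligned 16-byte loads per block */
    __m128i a0 = _mm_loadu_si128((const __m128i *)a);
    __m128i a1 = _mm_loadu_si128((const __m128i *)a + 1);
    __m128i b0 = _mm_loadu_si128((const __m128i *)b);
    __m128i b1 = _mm_loadu_si128((const __m128i *)b + 1);

    /* pcmpeqd: each matching dword becomes all ones, otherwise all zeros */
    __m128i eq0 = _mm_cmpeq_epi32(a0, b0);
    __m128i eq1 = _mm_cmpeq_epi32(a1, b1);

    /* pand: a dword stays all ones only if it matched in both halves */
    __m128i both = _mm_and_si128(eq0, eq1);

    /* pmovmskb: one bit per byte, so 0xFFFF means all 32 bytes matched */
    return _mm_movemask_epi8(both) == 0xFFFF;
}
```

A caller would simply write something like if (equal32(p, q)) { ... } inside its loop; with optimization enabled the compiler is free to inline the body, so no CALL/RET pair is needed.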

jj2007

Quote from: dioxin on April 21, 2010, 11:53:17 PM
I find it faster to do this on a Phenom II:...
Paul.


Paul,

That's what I implemented in the attachment as CmpArray4. It is marginally faster than the others, even though I used movdqa.

Regards,
Jochen

johnsa

This update, moving from a proc to a macro and ANDing the values in the xmm registers first, doubles the speed for me on my Core 2 Duo 2.0 GHz.

It's now doing 600,000,000 per second instead of 300,000,000. Huge boost.
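To make the "AND in the xmm registers first" idea concrete, here is a rough C sketch with SSE2 intrinsics. The earlier proc is not shown in this part of the thread, so the "two masks" variant is only a guess at what it may have looked like; all function names are hypothetical, and an x86 compiler with SSE2 is assumed. Both variants return the same answer; the second needs one PMOVMSKB instead of two.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x86-64 compiler */
#include <stddef.h>

/* Assumed "before" shape: extract two byte masks, AND them in general registers. */
static inline int equal32_two_masks(const unsigned char *a, const unsigned char *b)
{
    __m128i eq0 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)a),
                                  _mm_loadu_si128((const __m128i *)b));
    __m128i eq1 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)(a + 16)),
                                  _mm_loadu_si128((const __m128i *)(b + 16)));
    return (_mm_movemask_epi8(eq0) & _mm_movemask_epi8(eq1)) == 0xFFFF;
}

/* "After" shape: PAND the compare results first, then a single PMOVMSKB. */
static inline int equal32_pand_first(const unsigned char *a, const unsigned char *b)
{
    __m128i eq0 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)a),
                                  _mm_loadu_si128((const __m128i *)b));
    __m128i eq1 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)(a + 16)),
                                  _mm_loadu_si128((const __m128i *)(b + 16)));
    return _mm_movemask_epi8(_mm_and_si128(eq0, eq1)) == 0xFFFF;
}

/* Counts matching 32-byte records; the compare is inlined into the loop,
   which stands in for the proc-to-macro change. */
static size_t count_equal_records(const unsigned char *a, const unsigned char *b,
                                  size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)equal32_pand_first(a + 32 * i, b + 32 * i);
    return hits;
}
```

How much either change helps will depend on the CPU, as the timings in this thread suggest, so it is worth measuring both variants on the target machine.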

clive

On a Willamette P4, it's all pretty much a wash.

Empty
      1592 Ticks (min),  0.053 cycles

FOO (1) Lingo
     13016 Ticks (min), 13.016 cycles
     13607 Ticks (avg), 13.607 cycles
     20153 Ticks (rms), 20.154 cycles

FOO (2) Clive#1
     12520 Ticks (min), 12.520 cycles
     12568 Ticks (avg), 12.568 cycles
     12591 Ticks (rms), 12.592 cycles

FOO (3)
     12520 Ticks (min), 12.520 cycles
     12539 Ticks (avg), 12.539 cycles
     12539 Ticks (rms), 12.539 cycles

FOO (4)
     12520 Ticks (min), 12.520 cycles
     12542 Ticks (avg), 12.542 cycles
     12542 Ticks (rms), 12.543 cycles

FOO (5) Clive#2
     12520 Ticks (min), 12.520 cycles
     12960 Ticks (avg), 12.960 cycles
     18549 Ticks (rms), 18.550 cycles

FOO (6) JJ (lingo w/ret)
     12516 Ticks (min), 12.516 cycles
     12537 Ticks (avg), 12.537 cycles
     12537 Ticks (rms), 12.537 cycles

FOO (7) Clive#2 w/ret
     12500 Ticks (min), 12.500 cycles
     12532 Ticks (avg), 12.532 cycles
     12561 Ticks (rms), 12.561 cycles

FOO (8) johnsa#2
     25524 Ticks (min), 25.524 cycles
     27684 Ticks (avg), 27.684 cycles
     53286 Ticks (rms), 53.286 cycles

FOO (9) dioxin
     12496 Ticks (min), 12.496 cycles
     13210 Ticks (avg), 13.210 cycles
     24524 Ticks (rms), 24.525 cycles
It could be a random act of randomness. Those happen a lot as well.