
Title: How to get started with simd/sse ?
Post by: BlackVortex on March 01, 2010, 08:04:09 PM

I'm interested in optimizing long memory operations, like searching through gigabytes of memory, and I believe SIMD is the way to go. Am I right? :bg

Searching the forum topics is hard, because "sse" is contained in "assembly"...

Is there a good assembly + SSE (SSE2 at least) tutorial, or should I plunge into Intel's documentation? Any useful links would be appreciated.

EDIT: Oh, I found this thread after searching with better keywords:
http://www.masm32.com/board/index.php?topic=8498.0

Title: Re: How to get started with simd/sse ?
Post by: dedndave on March 01, 2010, 08:15:09 PM

From that link, I have been reading the Tommesani docs:

http://www.tommesani.com/Docs.html

Title: Re: How to get started with simd/sse ?
Post by: jj2007 on March 01, 2010, 08:40:54 PM

search the forum for pcmpeqb
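For anyone landing here from a search: the idea behind pcmpeqb for searching is that one instruction compares 16 bytes against a pattern, and pmovmskb turns the result into a 16-bit mask you can test with a single branch. Below is a minimal sketch of that technique in C with SSE2 intrinsics rather than MASM, so it can be compiled and tested anywhere; the function name `find_byte_sse2` is hypothetical and this is not code taken from the forum threads.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi8 (PCMPEQB), _mm_movemask_epi8 (PMOVMSKB) */
#include <stddef.h>

/* Hypothetical helper: index of the first byte equal to `needle` in buf[0..len), or -1. */
static ptrdiff_t find_byte_sse2(const unsigned char *buf, size_t len, unsigned char needle)
{
    assert(buf != NULL || len == 0);
    const __m128i pattern = _mm_set1_epi8((char)needle); /* needle copied into all 16 lanes */
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i)); /* MOVDQU: any alignment */
        __m128i eq    = _mm_cmpeq_epi8(chunk, pattern);  /* 0xFF in every lane that matches */
        int mask      = _mm_movemask_epi8(eq);           /* bit k set if byte k matched */
        if (mask != 0) {
            int bit = 0;                                 /* lowest set bit = first match */
            while (!(mask & 1)) {
                mask >>= 1;
                bit++;
            }
            return (ptrdiff_t)(i + (size_t)bit);
        }
    }
    for (; i < len; i++)   /* scalar tail for the last len % 16 bytes */
        if (buf[i] == needle)
            return (ptrdiff_t)i;
    return -1;
}
```

Unaligned loads keep the sketch simple; an assembly version would typically align the pointer first and then use movdqa.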

Title: Re: How to get started with simd/sse ?
Post by: BlackVortex on March 01, 2010, 09:34:52 PM

Quote from: jj2007 on March 01, 2010, 08:40:54 PM
search the forum for pcmpeqb

I found some more interesting threads, thanks.

Can you do me a little favour, please? I want to time-test some code. Could you write a quick, bare-bones, frameless procedure that fills memory with the dword 0BBBBBBBBh (or some other dword), with ESI = starting offset and the number of bytes given by the MEMCHUNK define?

As a test control, here is my function:


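@ZeroMemPlain:                  ; fill MEMCHUNK bytes at [esi] with 0BBBBBBBBh
mov eax,esi                     ; eax = destination pointer
mov ecx, MEMCHUNK/4             ; ecx = number of 4-byte stores
.again:
mov D[eax],0bbbbbbbbh           ; store one dword
add eax,4                       ; advance to the next dword
dec ecx
jnz < .again                    ; loop until ecx reaches 0
ret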

:bg
It takes about 140 ms to fill 256 MB on my PC.
I just want to be convinced that the speedup is worth it; I'm not interested in doing complex arithmetic with SSE.

EDIT: I fixed some stuff.
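A quick sanity check on the 140 ms figure, assuming "256mb" means 256 MiB (2^28 bytes): the fill is already writing on the order of 2 GB/s, which would be a plausible memory-bus rate for a PC of that era, so wider stores may have little left to gain.

```latex
\frac{2^{28}\ \text{bytes}}{0.140\ \text{s}}
  = \frac{268\,435\,456\ \text{bytes}}{0.140\ \text{s}}
  \approx 1.92 \times 10^{9}\ \text{bytes/s}
  \approx 1.79\ \text{GiB/s}
```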

Title: Re: How to get started with simd/sse ?
Post by: qWord on March 01, 2010, 09:46:23 PM


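    .data
        align 16
        values db 16 dup (0bh)        ; 16-byte fill pattern (every byte 0Bh, i.e. dword 0B0B0B0Bh)
    .code
    mov eax,esi                       ; eax = destination pointer (16-byte aligned for movdqa)
    movdqa xmm0,OWORD ptr values      ; load the 128-bit pattern once
    mov ecx,MEMCHUNK/16               ; number of 16-byte stores (MEMCHUNK must be a multiple of 16)
@@: movdqa OWORD ptr [eax],xmm0       ; use movdqu if ESI is unaligned (not recommended)
    lea eax,[eax+16]                  ; advance 16 bytes
    dec ecx
    jnz @B                            ; loop until ecx reaches 0
@@: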
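For comparison, here is a rough C rendering of the two loops above, using SSE2 intrinsics in place of the hand-written movdqa loop. The function names are hypothetical, and it assumes (as qWord's code does) a 16-byte-aligned destination and a byte count that is a multiple of 16. It fills with 0BBBBBBBBh in both versions so the outputs can be compared directly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi32, _mm_store_si128 (MOVDQA store) */
#include <stddef.h>
#include <stdint.h>

/* Scalar control, like @ZeroMemPlain: one 4-byte store per iteration. */
static void fill_dwords_plain(uint32_t *dst, size_t bytes, uint32_t value)
{
    for (size_t i = 0; i < bytes / 4; i++)
        dst[i] = value;
}

/* SSE2 version, like qWord's loop: one aligned 16-byte store per iteration. */
static void fill_dwords_sse2(void *dst, size_t bytes, uint32_t value)
{
    assert(((uintptr_t)dst & 15) == 0);            /* MOVDQA faults on unaligned addresses */
    assert(bytes % 16 == 0);                       /* same requirement as MEMCHUNK/16 */
    const __m128i v = _mm_set1_epi32((int)value);  /* four copies of the dword in one XMM */
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_store_si128(p + i, v);
}
```

Both write exactly the same bytes; the SSE2 loop just needs a quarter as many store instructions. On a buffer far larger than the caches, both would be expected to run at whatever rate the memory bus allows.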

Title: Re: How to get started with simd/sse ?
Post by: BlackVortex on March 01, 2010, 10:08:14 PM

Thanks, qWord. It works fine, but the timings are exactly the same as before. Exactly. I guess it depends on memory throughput, not execution cycles.

At least it's fun to step over that instruction and see 128 bits of data moved at once :green

Title: Re: How to get started with simd/sse ?
Post by: hutch-- on March 02, 2010, 01:21:47 AM

BlackVortex,

Memory speed is the final limitation. The real advantage of SSE is its capacity to process 128 bits of data in parallel. Fewer memory accesses and parallel processing of the variety of data types it can handle will speed up many algorithms by a long way, but in raw data transfer, memory will be the final limitation.
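To illustrate the parallel-processing side of this, here is a sketch (C with SSE2 intrinsics, hypothetical function name, not code from this thread) that counts how many bytes in a buffer equal a given value. PCMPEQB compares 16 bytes per instruction, and each match yields 0xFF, which is -1 as a signed byte, so subtracting the comparison result from an accumulator adds 1 per matching lane. PSADBW against zero then sums the 16 lane counters. The per-lane counters are flushed every 255 blocks so that no 8-bit lane can overflow.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: PCMPEQB, PSUBB, PSADBW via intrinsics */
#include <stddef.h>

/* Hypothetical helper: number of bytes in buf[0..len) equal to `needle`. */
static size_t count_byte_sse2(const unsigned char *buf, size_t len, unsigned char needle)
{
    assert(buf != NULL || len == 0);
    const __m128i pattern = _mm_set1_epi8((char)needle);
    const __m128i zero    = _mm_setzero_si128();
    size_t total = 0, i = 0;

    while (i + 16 <= len) {
        __m128i acc = _mm_setzero_si128();   /* 16 lane counters, 8 bits each */
        size_t blocks = (len - i) / 16;
        if (blocks > 255)
            blocks = 255;                    /* keep every lane counter <= 255 */
        for (size_t b = 0; b < blocks; b++, i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
            acc = _mm_sub_epi8(acc, _mm_cmpeq_epi8(chunk, pattern)); /* acc - (-1) = acc + 1 */
        }
        /* PSADBW vs. zero: sum of bytes 0..7 in the low qword, bytes 8..15 in the high qword. */
        __m128i sums = _mm_sad_epu8(acc, zero);
        total += (size_t)_mm_cvtsi128_si32(sums)
               + (size_t)_mm_cvtsi128_si32(_mm_srli_si128(sums, 8));
    }
    for (; i < len; i++)   /* scalar tail */
        total += (buf[i] == needle);
    return total;
}
```

Here each 16-byte load feeds several operations, so more of the time goes into arithmetic done 16 lanes at a time; a pure fill has no such work to parallelize.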

Title: Re: How to get started with simd/sse ?
Post by: oex on March 02, 2010, 01:25:11 AM

Am I right in thinking that

mov = 4 mem access = 2
mov 128 = 4 mem access = 8
movdqa = 1 mem access = 8

Kind of thing....
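One way to put numbers on the store-count question, for the 256 MB fill in this thread (taken as 2^28 bytes): the SSE2 loop issues a quarter as many store instructions, but the number of bytes written, and therefore the traffic the memory bus has to carry, is identical.

```latex
\begin{aligned}
\text{32-bit stores (mov):}\quad & 2^{28} / 4 = 67\,108\,864 \\
\text{128-bit stores (movdqa):}\quad & 2^{28} / 16 = 16\,777\,216 \\
\text{bytes written (either loop):}\quad & 2^{28} = 268\,435\,456
\end{aligned}
```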

I have an SSE2 copy that is faster, but I think it uses separate registers, so the speedup in my app might be due to that?
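oex's copy routine isn't shown, so this is only a guess at the pattern being described: an SSE2 copy that loads four XMM registers before storing any of them, so several independent loads can be in flight at once. The sketch below uses C with SSE2 intrinsics and a hypothetical function name. It assumes both pointers are 16-byte aligned and the size is a multiple of 64. The compiler chooses the actual registers.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128 / _mm_store_si128 (MOVDQA) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `bytes` bytes using four XMM registers per iteration. */
static void copy_sse2_x4(void *dst, const void *src, size_t bytes)
{
    assert((((uintptr_t)dst | (uintptr_t)src) & 15) == 0); /* MOVDQA needs 16-byte alignment */
    assert(bytes % 64 == 0);                                /* whole 64-byte iterations only */
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; i += 4) {
        /* Four independent loads into separate registers (xmm0..xmm3 in asm terms). */
        __m128i r0 = _mm_load_si128(s + i);
        __m128i r1 = _mm_load_si128(s + i + 1);
        __m128i r2 = _mm_load_si128(s + i + 2);
        __m128i r3 = _mm_load_si128(s + i + 3);
        _mm_store_si128(d + i,     r0);
        _mm_store_si128(d + i + 1, r1);
        _mm_store_si128(d + i + 2, r2);
        _mm_store_si128(d + i + 3, r3);
    }
}
```

Whether unrolling like this is what makes oex's copy faster can't be settled from the thread. If the copy is bound by memory bandwidth, as in the fill test above, it may make no measurable difference.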

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: BlackVortex on March 01, 2010, 08:04:09 PM