1. Getting started with SSE: mov instructions and data

I assume you have a Pentium III, an Athlon XP or a later CPU, and an OS that saves the XMM registers on task switches. Otherwise SSE instructions will not run: they raise an invalid-opcode exception instead.

Before anything else, set up your data and constants. They must be aligned on a 16-byte boundary for the aligned moves such as MOVAPS.

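If you use the 32-bit general registers as pointers, note that the largest index scale factor is 8 (the choices are 1, 2, 4 and 8). To step through 16-byte vectors you therefore have to pre-scale the index yourself.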
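Since the SIB scale cannot be 16, the two usual workarounds are to shift the index left by 4 or to double it and use scale 8. The sketch below only does the address arithmetic in C so both can be compared; the function names are hypothetical and the assembly in the comments is illustrative.

```c
#include <stdint.h>

/* Address of 16-byte element i using a pre-shifted index,
   like: shl ecx,4 / movaps xmm0,[esi+ecx] */
uintptr_t elem_addr_shift(uintptr_t base, uint32_t i) {
    return base + ((uintptr_t)i << 4);
}

/* Same address using the largest scale the SIB byte allows,
   like: add ecx,ecx / movaps xmm0,[esi+ecx*8] */
uintptr_t elem_addr_scale8(uintptr_t base, uint32_t i) {
    uintptr_t doubled = (uintptr_t)i * 2;   /* index pre-doubled */
    return base + doubled * 8;              /* 8 is the maximum scale */
}
```

Both give base + i*16; shifting once before a loop is usually simpler than doubling.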

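.data
ALIGN 16
var1 REAL4 0.0,0.0,0.0,0.0        ; 4 single-precision values = 16 bytes
; when using SSE2 (2 double-precision values, also 16 bytes):
; var1 REAL8 0.0,0.0
var2 REAL4 12.0,12.0,12.0,12.0    ; also 16-byte aligned, because var1 is exactly 16 bytes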
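For comparison, a rough C11 counterpart of the data section above, assuming any C11 compiler with `<stdalign.h>`. The name `var1d` for the SSE2 variant is hypothetical; the helper only checks the low 4 address bits.

```c
#include <stdalign.h>
#include <stdint.h>

/* Counterparts of ALIGN 16 + REAL4 / REAL8 declarations. */
alignas(16) float  var1[4]  = {0.0f, 0.0f, 0.0f, 0.0f};     /* REAL4 x4 */
alignas(16) double var1d[2] = {0.0, 0.0};                   /* REAL8 x2 (SSE2) */
alignas(16) float  var2[4]  = {12.0f, 12.0f, 12.0f, 12.0f};

/* Returns 1 if p sits on a 16-byte boundary (low 4 bits are zero). */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```

Unlike MASM, a C compiler may reorder or pad globals, so each array gets its own `alignas` instead of relying on var1 being exactly 16 bytes.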

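Now the SSE mnemonic syntax, using MOVAPS as the example:

MOVAPS ; the first letters name the operation, as in ordinary x86 code: MOV, MUL, ...

The suffix gives the data type: SS = scalar single precision (only the lowest 32 bits, 0-31, of the xmm register are used), PS = 4 packed single-precision values. SSE2 adds SD = scalar double precision (the lowest 64 bits) and PD = 2 packed double-precision values.

A after MOV means aligned, U = unaligned, NT = non-temporal: a hint telling the processor to bypass the cache and write directly to memory.

H = high pair, L = low pair (the upper or lower 64 bits of the xmm register).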
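To see the PS/SS suffix in action, here is a small sketch with the SSE compiler intrinsics, assuming an x86 compiler that provides `<xmmintrin.h>` (GCC, Clang, MSVC). `_mm_mul_ps` and `_mm_mul_ss` normally compile to MULPS and MULSS; the wrapper names are hypothetical.

```c
#include <xmmintrin.h>

/* MULPS: all four singles are multiplied. */
__m128 mul_packed(__m128 a, __m128 b) { return _mm_mul_ps(a, b); }

/* MULSS: only the low single (bits 0-31) is multiplied;
   the upper three elements are passed through from a. */
__m128 mul_scalar(__m128 a, __m128 b) { return _mm_mul_ss(a, b); }

/* Helper: copy a vector out to a float array (an unaligned MOVUPS store). */
void to_array(__m128 v, float out[4]) { _mm_storeu_ps(out, v); }
```

With a = {1, 2, 3, 4} and b = {10, 10, 10, 10}, the packed version gives {10, 20, 30, 40} while the scalar version gives {10, 2, 3, 4}.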

MOVAPS raises a general-protection fault if its memory operand is not aligned on a 16-byte boundary. Use MOVUPS for data that may be unaligned.
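A sketch of the aligned/unaligned split using intrinsics, assuming `<xmmintrin.h>` and C11 `alignas`. `_mm_load_ps` normally compiles to MOVAPS and `_mm_loadu_ps` to MOVUPS; `buf` and the function names are hypothetical. The code deliberately never passes a misaligned pointer to the aligned load.

```c
#include <xmmintrin.h>
#include <stdalign.h>

static alignas(16) float buf[8] = {0, 1, 2, 3, 4, 5, 6, 7};

/* MOVAPS: buf is 16-byte aligned, so this is safe. */
__m128 load_aligned(void) { return _mm_load_ps(buf); }

/* MOVUPS: buf + 1 is only 4-byte aligned; MOVAPS here would fault. */
__m128 load_unaligned(void) { return _mm_loadu_ps(buf + 1); }

/* Low element of a vector as a plain float. */
float low(__m128 v) { return _mm_cvtss_f32(v); }
```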

SSE move data instructions

128-bit packed singles:
MOVAPS xmm1,xmm2/m128     MOVAPS xmm1/m128,xmm2
MOVUPS xmm1,xmm2/m128     MOVUPS xmm1/m128,xmm2
MOVNTPS m128,xmm

64-bit low/high pair:
MOVLPS xmm,m64            MOVLPS m64,xmm
MOVHPS xmm,m64            MOVHPS m64,xmm

32-bit scalar single:
MOVSS xmm1,xmm2/m32       MOVSS xmm1/m32,xmm2
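The pair moves and the non-temporal store can be tried from C as well. This sketch assumes `<xmmintrin.h>`, where `_mm_loadl_pi`, `_mm_loadh_pi` and `_mm_stream_ps` normally map to MOVLPS, MOVHPS and MOVNTPS. The SFENCE after the streaming store is added here as an assumption of good practice, not something the table above requires; the function names are hypothetical.

```c
#include <xmmintrin.h>
#include <stdalign.h>

/* Build one vector from two separate 64-bit pairs:
   MOVLPS fills elements 0-1, MOVHPS fills elements 2-3. */
__m128 join_pairs(const float lo[2], const float hi[2]) {
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)lo);   /* MOVLPS xmm,m64 */
    v = _mm_loadh_pi(v, (const __m64 *)hi);   /* MOVHPS xmm,m64 */
    return v;
}

/* MOVNTPS m128,xmm: dst must be 16-byte aligned.
   SFENCE orders the weakly ordered streaming store before later stores. */
void stream_store(float *dst, __m128 v) {
    _mm_stream_ps(dst, v);
    _mm_sfence();
}
```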

Move the low pair of singles to the high pair:

MOVLHPS xmm1,xmm2 ; high pair of xmm1 = low pair of xmm2, low pair of xmm1 is unchanged
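The same operation through the `_mm_movelh_ps` intrinsic, which normally compiles to MOVLHPS (assuming `<xmmintrin.h>`; the wrapper name is hypothetical). The result is {a0, a1, b0, b1}.

```c
#include <xmmintrin.h>

/* MOVLHPS: low pair of a stays in place, low pair of b goes on top. */
__m128 low_to_high(__m128 a, __m128 b) { return _mm_movelh_ps(a, b); }
```

Passing the same vector twice, `low_to_high(v, v)`, duplicates its low pair into the high pair.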

 

You need at least a Pentium 4 (or an Athlon 64) to run SSE2.

SSE2 move data instructions

128-bit integer (double quadword):
MOVDQA xmm1,xmm2/m128     MOVDQA xmm1/m128,xmm2
MOVDQU xmm1,xmm2/m128     MOVDQU xmm1/m128,xmm2
MOVNTDQ m128,xmm
MASKMOVDQU xmm1,xmm2*

128-bit packed doubles:
MOVAPD xmm1,xmm2/m128     MOVAPD xmm1/m128,xmm2
MOVUPD xmm1,xmm2/m128     MOVUPD xmm1/m128,xmm2
MOVNTPD m128,xmm

64-bit doubles:
MOVLPD xmm,m64            MOVLPD m64,xmm
MOVSD xmm1,xmm2/m64       MOVSD xmm1/m64,xmm2

Moves to and from MMX and general registers:
MOVDQ2Q mm,xmm
MOVD xmm,r/m32            MOVD r/m32,xmm
MOVNTI m32,r32
     

*MASKMOVDQU xmm1,xmm2: masked move of a double quadword. It stores the bytes of xmm1 to the memory addressed by DS:EDI; each byte is written only if the most significant bit of the corresponding byte in xmm2 is set (1 = write, 0 = no write).
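Several of the SSE2 moves above can be exercised with the `<emmintrin.h>` intrinsics, assuming an x86 compiler with SSE2 enabled (the default on x86-64). The usual mappings are `_mm_loadu_si128`/`_mm_storeu_si128` to MOVDQU, `_mm_cvtsi32_si128`/`_mm_cvtsi128_si32` to MOVD, `_mm_load_sd` to MOVSD, `_mm_stream_si32` to MOVNTI and `_mm_maskmoveu_si128` to MASKMOVDQU. All function names are hypothetical, and the SFENCE calls are an assumed good practice after non-temporal stores.

```c
#include <emmintrin.h>

/* MOVDQU load + store: copy 16 bytes between buffers of any alignment. */
void copy16(void *dst, const void *src) {
    __m128i v = _mm_loadu_si128((const __m128i *)src);  /* MOVDQU xmm,m128 */
    _mm_storeu_si128((__m128i *)dst, v);               /* MOVDQU m128,xmm */
}

/* MOVD xmm,r32 then MOVD r32,xmm: a 32-bit round trip through an xmm
   register. MOVD into xmm zeroes bits 32-127. */
int movd_roundtrip(int x) {
    __m128i v = _mm_cvtsi32_si128(x);
    return _mm_cvtsi128_si32(v);
}

/* MOVSD xmm,m64: load one double into the low 64 bits (high 64 zeroed). */
double load_scalar_double(const double *p) {
    __m128d v = _mm_load_sd(p);
    return _mm_cvtsd_f64(v);
}

/* MOVNTI m32,r32: non-temporal store straight from a general register. */
void stream_int(int *dst, int x) {
    _mm_stream_si32(dst, x);
    _mm_sfence();   /* order the weakly ordered store before later stores */
}

/* MASKMOVDQU: byte i of data is written to dst[i] only when the most
   significant bit of byte i of mask is 1. The instruction's implicit
   address register (EDI/RDI) is loaded from dst by the compiler. */
void masked_store(char *dst, __m128i data, __m128i mask) {
    _mm_maskmoveu_si128(data, mask, dst);
    _mm_sfence();
}
```

With a mask whose even bytes are 0x80 and odd bytes are 0, `masked_store` overwrites only the even-numbered bytes of the destination.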