Hello, here are three questions from a fairly green assembly programmer:
1) How do I enable SSE in MASM? It doesn't seem to recognize the opcodes or registers. I have the ML that comes with the VC Processor Pack.
2) How do I get the Processor Pack to handle SSE3 opcodes? The Processor Pack is advertised to support the "new" SSE2 instructions.
3) How do I use SSE2 to do fast dot products? The motivation is fast matrix multiplies. I looked in my processor docs, and SSE3 offers horizontal adds that sum adjacent elements within a register, so the total can end up in the lowest float. This may help with fast matrix multiplies, but unless I can solve question 1 or 2, it is not an option.
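For question 3, a single-precision dot product needs nothing beyond SSE1 (SSE2 adds the double-precision counterparts such as mulpd and addpd), so SSE3 is not required. The sketch below is written in C with the intrinsics from xmmintrin.h rather than in MASM, because each intrinsic used corresponds to one SSE instruction (_mm_mul_ps to mulps, _mm_movehl_ps to movhlps, _mm_add_ps to addps, _mm_shuffle_ps to shufps, _mm_add_ss to addss), so the sequence can be transcribed into MASM by hand. It assumes an x86 compiler that provides xmmintrin.h; the names dot4 and matvec4_colmajor are hypothetical, not from this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE1 intrinsics (movaps, mulps, addps, shufps, ...) */

/* Hypothetical helper names; a sketch, not code from the original thread. */

/* Dot product of two 4-float vectors using SSE1 only.
   Both pointers must be 16-byte aligned because _mm_load_ps is movaps. */
static float dot4(const float *a, const float *b)
{
    assert(((uintptr_t)a & 15u) == 0 && ((uintptr_t)b & 15u) == 0);

    __m128 p     = _mm_mul_ps(_mm_load_ps(a), _mm_load_ps(b));   /* mulps: p3 p2 p1 p0 */
    __m128 hi    = _mm_movehl_ps(p, p);                          /* movhlps: low lanes = p2, p3 */
    __m128 s2    = _mm_add_ps(p, hi);                            /* lane0 = p0+p2, lane1 = p1+p3 */
    __m128 lane1 = _mm_shuffle_ps(s2, s2, _MM_SHUFFLE(1, 1, 1, 1)); /* shufps: copy lane 1 everywhere */
    __m128 s     = _mm_add_ss(s2, lane1);                        /* addss: lane0 = p0+p1+p2+p3 */
    return _mm_cvtss_f32(s);                                     /* read the lowest float */
}

/* y = M * x for a 4x4 matrix stored column-major in m[16]
   (column j occupies m[4*j] .. m[4*j+3]); m must be 16-byte aligned.
   Broadcasting x[j] and summing scaled columns vertically needs no horizontal add. */
static void matvec4_colmajor(const float *m, const float *x, float *y)
{
    assert(((uintptr_t)m & 15u) == 0);

    __m128 acc = _mm_mul_ps(_mm_load_ps(m), _mm_set1_ps(x[0]));
    for (int j = 1; j < 4; ++j)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(m + 4 * j), _mm_set1_ps(x[j])));
    _mm_storeu_ps(y, acc); /* movups: y may be unaligned */
}
```

For matrix multiplies, the column-major layout sidesteps horizontal adds entirely: each element of x is broadcast and multiplied into its column, and the products are summed with plain addps. Whether that beats four dot4 calls depends on how the matrices are laid out in memory; this is a design choice, not a measured result.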
Using MASM32 with ML.EXE version 6.15, the following works. The .686 and .xmm directives are critical, and placing .xmm after option casemap:none
allows the SSE/SSE2 instructions to be written in either upper or lower case. The code was already posted in a thread in the Campus:
http://www.masm32.com/board/index.php?PHPSESSID=2c553b61c58c539d95eb0e06b046b7a9&topic=6202.0
.686
.model flat,stdcall
option casemap:none
.xmm
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib
.data
align 16
fourSP real4 0.0, 1.0, 2.0, 3.0 ; real4 is a 32 bit single precision floating point value
.code
start:
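movss xmm0, fourSP ; xmm0 == 0.0 0.0 0.0 0.0 loads one real4 into the low element; movss from memory zeroes the upper three
movaps xmm1, fourSP ; xmm1 == 3.0 2.0 1.0 0.0 loads four real4; the address must be 16-byte aligned
movups xmm2, fourSP ; xmm2 == 3.0 2.0 1.0 0.0 loads four real4; no alignment required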
invoke ExitProcess, NULL
end start
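To check what each load in the listing leaves in the register, the same three loads can be reproduced in C with the intrinsics that compilers normally emit as movss, movaps and movups. This is a sketch assuming an x86 compiler with xmmintrin.h and C11 _Alignas; the helpers load_one, load_four_aligned and load_four_unaligned are hypothetical names. Loading the last element (3.0) instead of the first makes the zeroing of the upper three lanes by movss visible.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_load_ss ~ movss, _mm_load_ps ~ movaps, _mm_loadu_ps ~ movups */

/* Same values as fourSP in the MASM listing, 16-byte aligned like "align 16". */
static _Alignas(16) const float fourSP[4] = { 0.0f, 1.0f, 2.0f, 3.0f };

/* Hypothetical helpers: each loads into an XMM value, then stores all four lanes to out. */

/* movss xmm, m32: one float into lane 0; lanes 1..3 are set to zero. */
static void load_one(const float *p, float out[4])
{
    _mm_storeu_ps(out, _mm_load_ss(p));
}

/* movaps: four floats; the address must be 16-byte aligned or the CPU faults. */
static void load_four_aligned(const float *p, float out[4])
{
    assert(((uintptr_t)p & 15u) == 0);
    _mm_storeu_ps(out, _mm_load_ps(p));
}

/* movups: four floats from any address. */
static void load_four_unaligned(const float *p, float out[4])
{
    _mm_storeu_ps(out, _mm_loadu_ps(p));
}
```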
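ML 6.15 does not accept the SSE3 mnemonics, but you can still use SSE3 by emitting the opcode bytes directly with db, or by wrapping those db lines in macros.
For example, the encoding of haddpd starts with
db 66h, 0Fh, 7Ch followed by a ModR/M byte (plus SIB/displacement bytes for some memory operands) that selects the destination register and the source register or memory operand.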
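The bytes for the register-to-register forms can be worked out by hand. The C sketch below builds them for haddpd xmm, xmm (66 0F 7C /r) and haddps xmm, xmm (F2 0F 7C /r): the final ModR/M byte has 11b in its top two bits (register direct), the destination register number in bits 5..3 and the source register number in bits 2..0. Memory operands need a different ModR/M byte, and possibly SIB and displacement bytes, and are not covered. The names encode_hadd, modrm_reg_reg and the PREFIX_ constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch of the encoding, not an assembler.
   haddpd xmm, xmm/m128 = 66 0F 7C /r   (two doubles)
   haddps xmm, xmm/m128 = F2 0F 7C /r   (four floats) */
enum { PREFIX_HADDPD = 0x66, PREFIX_HADDPS = 0xF2 };

/* ModR/M for a register-to-register pair: mod = 11b, reg = destination, r/m = source. */
static unsigned char modrm_reg_reg(int dst, int src)
{
    assert(dst >= 0 && dst < 8 && src >= 0 && src < 8); /* xmm0..xmm7 in 32-bit code */
    return (unsigned char)(0xC0 | (dst << 3) | src);
}

/* Writes the 4 bytes of haddpd/haddps xmm<dst>, xmm<src> into out; returns the length. */
static size_t encode_hadd(unsigned char prefix, int dst, int src, unsigned char out[4])
{
    out[0] = prefix;   /* mandatory prefix selects the pd or ps form */
    out[1] = 0x0F;     /* two-byte opcode escape */
    out[2] = 0x7C;     /* HADDPD/HADDPS opcode */
    out[3] = modrm_reg_reg(dst, src);
    return 4;
}
```

For example, haddpd xmm0, xmm1 comes out as 66 0F 7C C1, which in the MASM source would be written db 66h, 0Fh, 7Ch, 0C1h; haddps xmm2, xmm3 comes out as F2 0F 7C D3.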