
3 SSE questions

Started by softwareguy256, November 29, 2006, 07:43:44 PM


softwareguy256

Hello, here are three questions from a fairly green assembly programmer:
1) How do I enable SSE in MASM?  It doesn't seem to recognize the opcodes or registers.  I have the ML that comes with the VC Processor Pack.
2) How do I get the Processor Pack to handle SSE3 opcodes?  The Processor Pack is advertised to support "New" SSE2.
3) How do I use SSE2 to do fast dot products?  The motivation is fast matrix multiplies.  I looked in my processor docs, and SSE3 offers horizontal adds that place the sum in the lower float, so this may help with fast matrix multiplies. However, unless I can solve questions 1 or 2 this is not an option.

dsouza123

Using MASM32 with ML.EXE version 6.15, the following works. The .686 and .xmm directives are critical; having .xmm after
option casemap:none allows either upper or lowercase for the SSE2 mnemonics and registers.  The code was already posted in a thread in the Campus.
http://www.masm32.com/board/index.php?PHPSESSID=2c553b61c58c539d95eb0e06b046b7a9&topic=6202.0


.686
.model flat,stdcall
option casemap:none
.xmm

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

.data
align 16
  fourSP real4 0.0, 1.0, 2.0, 3.0  ; real4 is a 32 bit single precision floating point value

.code
start:
   movss  xmm0, fourSP  ; xmm0 == 0.0 0.0 0.0 0.0              get one  real4 (a load from memory zeroes the upper three)
   movaps xmm1, fourSP  ; xmm1 == 3.0 2.0 1.0 0.0      aligned get four real4
   movups xmm2, fourSP  ; xmm2 == 3.0 2.0 1.0 0.0    unaligned get four real4
   invoke ExitProcess, NULL
end start
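
For question 3, a dot product of four floats does not strictly need SSE3. The following is only a sketch, assuming the same ML 6.15 setup as above; vecA, vecB and result are made-up names for illustration, and the horizontal sum is done with plain SSE shuffles instead of haddps.

```asm
.686
.model flat,stdcall
option casemap:none
.xmm

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
align 16
  vecA   real4 1.0, 2.0, 3.0, 4.0   ; hypothetical input, 16 byte aligned
  vecB   real4 5.0, 6.0, 7.0, 8.0   ; follows vecA, so also 16 byte aligned
  result real4 0.0

.code
start:
   movaps  xmm0, vecA        ; xmm0 == a3 a2 a1 a0
   mulps   xmm0, vecB        ; xmm0 == p3 p2 p1 p0   where pN = aN*bN
   movhlps xmm1, xmm0        ; low half of xmm1 == p3 p2 (high half not used)
   addps   xmm0, xmm1        ; low half of xmm0 == p1+p3  p0+p2
   movaps  xmm1, xmm0
   shufps  xmm1, xmm1, 1     ; element 0 of xmm1 == p1+p3
   addss   xmm0, xmm1        ; element 0 of xmm0 == p0+p1+p2+p3
   movss   result, xmm0      ; 1*5 + 2*6 + 3*7 + 4*8 = 70.0
   invoke ExitProcess, NULL
end start
```

For a matrix multiply it is usually better to keep the partial products in vertical form across several rows and only do the horizontal sum once at the end, but that depends on how the matrices are laid out in memory.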


Using db will allow using SSE3; it could also be done using macros.
The encoding of haddpd starts with
db 66h, 0Fh, 7Ch
followed by a ModRM byte (plus any SIB and displacement bytes) that selects the destination register and the source register or memory operand.
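
As a rough sketch of the macro idea, the register-to-register form can be emitted like this. The macro name haddpd_rr is made up for illustration, and it takes register numbers (0 to 7) rather than register names; memory operands are not handled.

```asm
; Hypothetical helper: haddpd for the xmm, xmm form only.
; dst and src are XMM register numbers 0..7, not names.
haddpd_rr MACRO dst:REQ, src:REQ
    ; ModRM byte: mod = 11 (register), reg = dst, r/m = src
    db 66h, 0Fh, 7Ch, 0C0h + (dst SHL 3) + src
ENDM

    haddpd_rr 0, 1    ; emits 66 0F 7C C1, the bytes for haddpd xmm0, xmm1
```

haddpd works on doubles. For single precision floats the instruction is haddps, which uses the prefix byte 0F2h in place of 66h with the same 0Fh, 7Ch opcode bytes and the same ModRM layout.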