
Getting SSE working

Started by stanhebben, May 19, 2006, 11:19:27 AM


stanhebben

I'm trying to use SSE/SSE2 instructions in my program, but the assembler doesn't recognize them. For example,

  movss mmx1, [esi+eax]

generates the following error:
  error A2006: undefined symbol : mmx1

I'm using MASM 6.15, and XMM is enabled (.686 .xmm).

Does anyone know what I've forgotten?

MichaelW

I don't know much about this, but I can tell you that the registers are mm0-mm7 and xmm0-xmm7.
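
For illustration, a corrected form of the line from the first post could look like the sketch below. It assumes the intent was to load one single-precision float into an SSE register; the choice of xmm1 and the explicit dword ptr are my assumptions, not something stated in the question. Depending on the case-mapping setup, the name may have to be written XMM1.

```asm
    ; movss only works with XMM registers (there is no register named mmx1)
    ; and moves a single 32-bit float, so the memory operand is a DWORD
    movss xmm1, dword ptr [esi+eax]
```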


eschew obfuscation

Ossa

For the sake of completeness, I'll repost the macros I put here before the "event"...

The MM0-MM7 and XMM0-XMM7 (or higher) registers need to be in upper case in MASM, so here are some macros that allow lower-case register names (if, like me, you prefer them that way). You can extend these to cover the XMM8-XMM15 registers (as I think they are called) that exist on the 64-bit processors.

IFDEF MM0
mm0 TEXTEQU MM0
mm1 TEXTEQU MM1
mm2 TEXTEQU MM2
mm3 TEXTEQU MM3
mm4 TEXTEQU MM4
mm5 TEXTEQU MM5
mm6 TEXTEQU MM6
mm7 TEXTEQU MM7
ENDIF

IFDEF XMM0
xmm0 TEXTEQU XMM0
xmm1 TEXTEQU XMM1
xmm2 TEXTEQU XMM2
xmm3 TEXTEQU XMM3
xmm4 TEXTEQU XMM4
xmm5 TEXTEQU XMM5
xmm6 TEXTEQU XMM6
xmm7 TEXTEQU XMM7
ENDIF


Ossa
Website (very old): ossa.the-wot.co.uk

MichaelW

If I recall correctly, the case sensitivity of the register names depends on the relative positions of option casemap:none and the .MMX or .XMM directives. This assembled and ran without error:


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    .mmx
    .xmm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        x dq 0,0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
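    movq mm1, x                 ; MMX registers are loaded with movq/movd, not movss
    movss xmm1, dword ptr x     ; movss reads 32 bits, x is declared as dq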
    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
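
A minimal sketch of the ordering described above, written without masm32rt.inc so that the order is visible. As far as I know masm32rt.inc itself sets .486, .model flat, stdcall and option casemap:none before anything else, which is why the listing above ends up in this order. The variable name fOne is made up for the example, and the ordering rule itself is a recollection, so treat this as something to try and not as documented behaviour.

```asm
    .486
    .model flat, stdcall
    option casemap:none     ; case sensitivity is set first...

    .686
    .mmx
    .xmm                    ; ...and the MMX/SSE register names are enabled afterwards

    .data
        fOne real4 1.0      ; hypothetical test value
    .code
start:
    movss xmm1, fOne        ; lower-case register name
    ret
end start
```

If the .xmm line is moved above option casemap:none and the assembler then rejects xmm1 but accepts XMM1, that would confirm the recollection for your version of ML.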


eschew obfuscation

Mark Jones

I have a question. What would be the best approach to using the most advanced FPU functions available on a given PC? Would you determine the FPU capabilities first with CPUID and set a global flag to branch to individual code sections? (If FPU=MMX jmp xxx, else if FPU=XMM jmp yyy, etc.) Or is the most efficient method to set up a series of error handlers and just "skip" over the unsupported instructions? (I.e. set up errorhandler1, try the XMM instruction; if it fails, set up errorhandler2, try the MMX instruction, etc.)
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Ossa

Personally (if the speedup was very important), I would write a separate version of every procedure that could make use of the more powerful features, one for each "level" that I wanted to support: one for FPU only, one for MMX, one for SSE, one for SSE2. When they are called, I would not call them directly. Instead I would have a jump table (or call table, whatever), load the address from it and jump to it. The jump table would be created at startup, so that no time is lost at run time checking which mode we're in. Maybe that deserves some code:


.data?

pMathsHeavyProc DWORD ?

...

.code
start:

invoke CallTableInit

...

mov eax, pMathsHeavyProc
call eax

...

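CallTableInit PROC uses ebx         ; cpuid overwrites ebx
    ; CPUID function 1 returns the feature flags in EDX
    ; (assumes the CPU supports the CPUID instruction)
    mov eax, 1
    cpuid

    ; if FPU only (default)
    mov pMathsHeavyProc, offset MathsHeavyProcFPU

    ; if MMX (EDX bit 23)
    test edx, 1 shl 23
    jz init_done
    mov pMathsHeavyProc, offset MathsHeavyProcMMX

    ; if SSE (EDX bit 25)
    test edx, 1 shl 25
    jz init_done
    mov pMathsHeavyProc, offset MathsHeavyProcSSE

    ...

init_done:
    ret
CallTableInit ENDP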

MathsHeavyProcFPU PROC
    ...
MathsHeavyProcFPU ENDP

MathsHeavyProcMMX PROC
    ...
MathsHeavyProcMMX ENDP

MathsHeavyProcSSE PROC
    ...
MathsHeavyProcSSE ENDP


It's a lot of work, but I think it's worth it if you need the speedup.

Ossa
Website (very old): ossa.the-wot.co.uk

stanhebben

I recently rewrote a drawing routine so it used SSE2 instructions. The result was about 3 times faster.

I'll put it on my (future) website. It's part of a larger project, but I'll split it off and make it available as an example.

daydreamer

So if I write MMX code with a db 66h prefix, the instructions become SSE2.
Will CPUs without SSE2 still execute them as MMX by ignoring the prefix?
Then a loop would only need a different step (add ecx, stepping) and a different loop count (mov ecx, loopcount), depending on whether it works on 16-byte or 8-byte data?

stanhebben

No, the SSE instruction set is different from the MMX instruction set. If you want code that is compatible with both, then you'll have to write two separate pieces of code and switch between them, depending on the processor.

daydreamer

Quote from: stanhebben on May 27, 2006, 04:49:14 PM
No, the SSE instruction set is different from the MMX instruction set. If you want code that is compatible with both, then you'll have to write two separate pieces of code and switch between them, depending on the processor.

Stan, read this thread, download the macros and examine them in your favourite text editor:
http://www.masm32.com/board/index.php?topic=973.0
I just ran a test, and my CPU ignores the prefix on my SSE2 integer opcodes and still executes them as MMX, because my CPU has no SSE2 capability.
SSE2 integer is just MMX with a db 66h prefix. You only need minor changes, such as the number of loop iterations and an address step of 8 or 16.
You only need to set these variables up during initialization, after checking CPUID.
http://nono40.developpez.com/sources/source0068/
That is the example I found; the x86 code in it is readable whatever language the page is written in.

I just need to run the test on several CPUs without SSE2 to confirm that it works in general and didn't just happen to work on mine.
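
To make the 66h relationship concrete, here is a small sketch. The instruction paddb is my choice of example, and dwStep is a hypothetical variable name; neither comes from the macros linked above. The byte encodings in the comments are the standard ones for these register operands. Whether a CPU without SSE2 really ignores the prefix is only what I observed on one machine, so the sketch still checks CPUID before choosing the step.

```asm
    .data?
        dwStep DWORD ?          ; hypothetical: 8 for the MMX path, 16 for SSE2

    .code
    ; the same operation in both forms; only the 66h prefix differs
    paddb mm0, mm1              ; 0F FC C1      (MMX, 8 bytes at a time)
    paddb xmm0, xmm1            ; 66 0F FC C1   (SSE2, 16 bytes at a time)

    ; hand-prefixed MMX instruction: same four bytes as the SSE2 line above
    db 66h
    paddb mm0, mm1

    ; initialization: pick the address step once (cpuid overwrites eax, ebx, ecx, edx)
    mov eax, 1
    cpuid
    mov dwStep, 8
    test edx, 1 shl 26          ; EDX bit 26 = SSE2
    jz @F
    mov dwStep, 16
@@:
```

The loop itself then only needs add esi, dwStep (or the equivalent for whichever register holds the address) and a loop count that matches the chosen step.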

stanhebben

Sorry, I wasn't aware of that.

Interesting behavior... I wonder how well it works.