I'm trying to use SSE/SSE2 instructions in my program, but the assembler doesn't recognize them. For example,
movss mmx1, [esi+eax]
generates the following error:
error A2006: undefined symbol : mmx1
I'm using MASM 6.15, and XMM is enabled (.686 .xmm).
Does anyone know what I've forgotten?
I don't know much about this, but I can tell you that the registers are mm0-mm7 and xmm0-xmm7.
For the sake of completeness, I'll repost the macros I put here before the "event"...
The MM0-MM7 and XMM0-XMM7 (or higher) registers need to be in upper case in MASM, so here are some macros that allow lower-case register names (if, like me, you prefer them this way). You can extend these to cover the XMM8-XMM15 registers (as I think they are called) that exist on 64-bit processors.
IFDEF MM0
mm0 TEXTEQU MM0
mm1 TEXTEQU MM1
mm2 TEXTEQU MM2
mm3 TEXTEQU MM3
mm4 TEXTEQU MM4
mm5 TEXTEQU MM5
mm6 TEXTEQU MM6
mm7 TEXTEQU MM7
ENDIF
IFDEF XMM0
xmm0 TEXTEQU XMM0
xmm1 TEXTEQU XMM1
xmm2 TEXTEQU XMM2
xmm3 TEXTEQU XMM3
xmm4 TEXTEQU XMM4
xmm5 TEXTEQU XMM5
xmm6 TEXTEQU XMM6
xmm7 TEXTEQU XMM7
ENDIF
Ossa
If I recall correctly, the case sensitivity of the register names depends on the relative positions of the option casemap:none and the .MMX or .XMM directives. This assembled and ran without error:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.686
.mmx
.xmm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
x dq 0,0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
movq mm1, x
movss xmm1, x
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
I have a question. What would be the best approach to using the most advanced FPU functions available on a given PC? Would you determine the FPU capabilities first with CPUID and set a global flag used to branch to individual code sections (if FPU=MMX jmp xxx, else if FPU=XMM jmp yyy, etc.)? Or is the most efficient method to set up a series of error handlers and just "skip" over the unsupported instructions (i.e. set up errorhandler1, try an xmm instruction; if it fails, set up errorhandler2, try an mmx instruction, etc.)?
Personally (if the speedup was very important), I would rewrite every procedure that could make use of more powerful features once for each "level" that I wanted to support, e.g. one for FPU only, one for MMX, one for SSE, one for SSE2. Then I would not call them directly, but rather through a jump table (or call table, whatever): load the address and jump to it. The table would be created at startup, so that no time is lost at run time checking which mode we're in. Maybe that deserves some code:
.data?
pMathsHeavyProc DWORD ?
...
.code
start:
invoke CallTableInit
...
mov eax, pMathsHeavyProc
call eax
...
CallTableInit PROC
; if FPU only
mov pMathsHeavyProc, offset MathsHeavyProcFPU
; if MMX
mov pMathsHeavyProc, offset MathsHeavyProcMMX
; if SSE
mov pMathsHeavyProc, offset MathsHeavyProcSSE
...
CallTableInit ENDP
MathsHeavyProcFPU PROC
...
MathsHeavyProcFPU ENDP
MathsHeavyProcMMX PROC
...
MathsHeavyProcMMX ENDP
MathsHeavyProcSSE PROC
...
MathsHeavyProcSSE ENDP
It's a lot of work, but I think it's worth it if you need the speedup.
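The detection step that the outline above leaves as comments could look roughly like the following. This is only a sketch, written in C so it is easy to try outside the assembler. It assumes the caller has already executed CPUID with EAX = 1 and passes in the resulting EDX value, where bit 23 reports MMX, bit 25 SSE and bit 26 SSE2. All function and type names here are hypothetical stand-ins, and the per-level routines do nothing except report which level they belong to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bits returned in EDX by CPUID leaf 1 (EAX = 1). */
#define CPUID_EDX_MMX  (1u << 23)
#define CPUID_EDX_SSE  (1u << 25)
#define CPUID_EDX_SSE2 (1u << 26)

typedef enum { LEVEL_FPU, LEVEL_MMX, LEVEL_SSE, LEVEL_SSE2 } simd_level;

/* Hypothetical stand-ins for the per-level routines.
   Each one only reports its own level. */
static simd_level maths_heavy_fpu(void)  { return LEVEL_FPU; }
static simd_level maths_heavy_mmx(void)  { return LEVEL_MMX; }
static simd_level maths_heavy_sse(void)  { return LEVEL_SSE; }
static simd_level maths_heavy_sse2(void) { return LEVEL_SSE2; }

typedef simd_level (*maths_heavy_fn)(void);

/* The "call table" entry, filled in once at startup. */
static maths_heavy_fn p_maths_heavy_proc = NULL;

/* Pick the highest level whose feature bit is set in EDX. */
static simd_level select_level(uint32_t edx)
{
    if (edx & CPUID_EDX_SSE2) return LEVEL_SSE2;
    if (edx & CPUID_EDX_SSE)  return LEVEL_SSE;
    if (edx & CPUID_EDX_MMX)  return LEVEL_MMX;
    return LEVEL_FPU;
}

/* Counterpart of CallTableInit: store the routine for the detected level. */
static void call_table_init(uint32_t edx)
{
    static const maths_heavy_fn table[] = {
        maths_heavy_fpu, maths_heavy_mmx, maths_heavy_sse, maths_heavy_sse2
    };
    p_maths_heavy_proc = table[select_level(edx)];
}

/* Counterpart of "mov eax, pMathsHeavyProc / call eax". */
static simd_level maths_heavy_proc(void)
{
    assert(p_maths_heavy_proc != NULL); /* call_table_init must run first */
    return p_maths_heavy_proc();
}
```

The point of the design is that the feature test runs exactly once; every later call costs one indirect call and no branching on the CPU type.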
Ossa
I recently rewrote a drawing routine so it used SSE2 instructions. The result was about 3 times faster.
I'll put it on my (future) website. It's part of a larger project, but I'll split it off and make it available as an example.
So if I write MMX code with a db 66h prefix, it becomes SSE2?
Will CPUs without SSE2 capability still execute it as MMX by ignoring the prefix?
So in a loop, only add ecx,stepping and mov ecx,loopcount need a different stepping and loopcount, depending on whether the code runs on 16-byte or 8-byte data?
No, the SSE instruction set is different from the MMX instruction set. If you want code that is compatible with both, then you'll have to write two separate pieces of code and switch between them, depending on the processor.
Quote from: stanhebben on May 27, 2006, 04:49:14 PM
No, the SSE instruction set is different from the MMX instruction set. If you want code that is compatible with both, then you'll have to write two separate pieces of code and switch between them, depending on the processor.
Stan, read this thread, download the macros and examine them in your favourite text editor:
http://www.masm32.com/board/index.php?topic=973.0
I just ran a test: my CPU has no SSE2 support, and it ignores the prefix on my SSE2 integer opcodes and still executes them as MMX.
SSE2 integer is just MMX with a db 66h prefix. You only need minor changes, such as the loop iteration count and an address stepping of 8 or 16.
You only need to set these variables up during initialization, after checking CPUID.
http://nono40.developpez.com/sources/source0068/
You can still follow the international language of x86 in this example, which is what I found.
I just need to run the test on several CPUs without SSE2 support to confirm that it works in general and did not just happen to work on mine.
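The stepping and loop-count setup described above could be sketched like this, in C for easy experimenting. It only illustrates the bookkeeping: it assumes the buffer length is a multiple of 16 bytes and that the caller already knows from CPUID whether SSE2 is present. The struct and function names are hypothetical, and the sketch says nothing about whether a given CPU really ignores the 66h prefix.

```c
#include <assert.h>
#include <stddef.h>

/* Values that would be loaded into the loop's stepping and loopcount. */
typedef struct {
    size_t step;   /* bytes consumed per iteration: 16 with SSE2, 8 with MMX */
    size_t count;  /* number of iterations needed to cover the buffer */
} loop_params;

/* Compute the loop parameters once at initialization.
   has_sse2 is assumed to come from a prior CPUID check. */
static loop_params loop_params_init(size_t total_bytes, int has_sse2)
{
    loop_params p;

    /* Assumption of this sketch: the buffer divides evenly by both 8 and 16. */
    assert(total_bytes % 16 == 0);

    p.step  = has_sse2 ? 16 : 8;
    p.count = total_bytes / p.step;
    return p;
}
```

For a 64-byte buffer this gives 8 iterations of 8 bytes on the MMX path and 4 iterations of 16 bytes on the SSE2 path.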
Sorry, I wasn't aware of that.
Interesting behavior... I wonder how well it works.