Pixel Extraction with SIMD

Started by johnsa, September 05, 2011, 08:54:04 AM

johnsa

Hey,

I was trawling through some old code and found this snippet, which I've started reusing in my current projects.
I store my color information as four floats A, R, G, B (REAL4, range 0.0f - 1.0f), and I needed a fast way to convert them back into a normal packed UINT for drawing.

The best code I could find or come up with is the following:

movaps    xmm0,[esi]                          ; Load the colour as 4 floats: | A | R | G | B | (range 0.0f - 1.0f)
mulps     xmm0,oword ptr COLOR3D_NORMALIZE    ; Scale the range back to 0.0 - 255.0
cvttps2dq xmm0,xmm0                           ; Convert the floats to 32-bit ints (truncating toward zero)
pshufb    xmm0,oword ptr pixelMask2           ; The range is limited to 0-255, so we can safely keep just the low byte of each int component (SSSE3)
movd      dword ptr [edi+edx],xmm0            ; Draw to screen

Is there a better way to do this? I can't think of anything myself.

qWord

Quote from: johnsa on September 05, 2011, 08:54:04 AM
Is there a better way to do this? I can't think of anything myself

You may want to reduce the instruction-set requirement to SSE2 by replacing pshufb (an SSSE3 instruction) with other shuffle/shift operations.
FPU in a trice: SmplMath
It's that simple!

qWord

It may be faster to convert 4 pixels at a time (SSE2):
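align 16                                ; mulps/punpck* with a memory operand fault unless it is 16-byte aligned
COLOR3D_NORMALIZE REAL4 4 dup (255.0)
...
lea esi,colors                          ; 4 pixels x 4 REAL4 (must also be 16-byte aligned)
movaps xmm0,OWORD ptr [esi+0*16]        ; pixel 0
movaps xmm1,xmm0
movaps xmm2,OWORD ptr [esi+2*16]        ; pixel 2
movaps xmm3,xmm2

; 4x4 transpose (lanes listed low to high, pN.cM = component M of pixel N)
punpckhdq xmm0,OWORD ptr [esi+1*16]     ; p0.c2 p1.c2 p0.c3 p1.c3
punpckldq xmm1,OWORD ptr [esi+1*16]     ; p0.c0 p1.c0 p0.c1 p1.c1
punpckhdq xmm2,OWORD ptr [esi+3*16]     ; p2.c2 p3.c2 p2.c3 p3.c3
punpckldq xmm3,OWORD ptr [esi+3*16]     ; p2.c0 p3.c0 p2.c1 p3.c1

movdqa xmm4,xmm0
movdqa xmm5,xmm1

punpckhqdq xmm0,xmm2                    ; c3 of pixels 0-3
punpcklqdq xmm4,xmm2                    ; c2 of pixels 0-3
punpckhqdq xmm1,xmm3                    ; c1 of pixels 0-3
punpcklqdq xmm5,xmm3                    ; c0 of pixels 0-3

mulps xmm0,COLOR3D_NORMALIZE            ; scale 0.0-1.0 to 0.0-255.0
mulps xmm4,COLOR3D_NORMALIZE
mulps xmm1,COLOR3D_NORMALIZE
mulps xmm5,COLOR3D_NORMALIZE

cvtps2dq xmm0,xmm0                      ; rounds per MXCSR (nearest by default), unlike cvttps2dq
cvtps2dq xmm4,xmm4
cvtps2dq xmm1,xmm1
cvtps2dq xmm5,xmm5

pslldq xmm0,3                           ; whole-register byte shift; since each dword is 0-255 this acts like pslld xmm0,24
pslldq xmm4,2                           ; acts like pslld xmm4,16
pslldq xmm1,1                           ; acts like pslld xmm1,8

por xmm0,xmm4
por xmm1,xmm5
por xmm0,xmm1                           ; dword N = c3<<24 | c2<<16 | c1<<8 | c0 of pixel N

movdqa OWORD ptr ...,xmm0               ; store 4 packed 32-bit pixels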

Farabi

Do you think it is faster than conventional instructions?
Those who had universe knowledges can control the world by a micro processor.
http://www.wix.com/farabio/firstpage

"Etos siperi elegi"

qWord