Hey,
I was trawling through some old code and found this snippet, which I've started re-using in my current projects.
I store my color information as four floats A,R,G,B (real4, range 0.0f - 1.0f) and I needed a fast way to pack those back into a normal UINT for drawing.
The best piece of code I could find / come up with is the following:
movaps xmm0,[esi] ; Load the colour as 4 floats: | A | R | G | B | (range 0.0f - 1.0f)
mulps xmm0,oword ptr COLOR3D_NORMALIZE ; Convert the range back to 0 - 255
cvttps2dq xmm0,xmm0 ; Convert the floats back to ints
pshufb xmm0,oword ptr pixelMask2 ; Because we know the range is limited to 0-255, we can safely extract the byte portions of each int component
movd dword ptr [edi+edx],xmm0 ; Draw to screen
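For checking the output of the SSE version, a plain scalar model can be handy. The following is only a sketch in C, not part of the original routine: the function name pack_color_trunc is made up for illustration, and it assumes that float number i in memory ends up in byte i of the result. The real byte order depends on pixelMask2, which is not shown here.

```c
#include <stdint.h>

/* Scalar model of the snippet above (hypothetical helper name).
   Each channel is scaled by 255 and truncated toward zero, which is
   what cvttps2dq does, then its low byte is placed in the result.
   ASSUMPTION: channel i goes to byte i; the actual order is decided
   by pixelMask2, which the post does not show. */
uint32_t pack_color_trunc(const float c[4])
{
    uint32_t pixel = 0;
    for (int i = 0; i < 4; ++i) {
        int32_t v = (int32_t)(c[i] * 255.0f);       /* C float-to-int casts truncate */
        pixel |= ((uint32_t)v & 0xFFu) << (8 * i);  /* keep the low byte only */
    }
    return pixel;
}
```

With all four channels at 1.0f this gives 0xFFFFFFFF, and with all four at 0.5f it gives 0x7F7F7F7F, because 127.5 truncates to 127.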
Is there a better way to do this? I can't think of anything myself
Quote from: johnsa on September 05, 2011, 08:54:04 AM
Is there a better way to do this? I can't think of anything myself
You may want to reduce the instruction set requirement from SSSE3 to SSE2 by replacing pshufb with other shuffle/shift operations.
It may be faster to convert 4 pixels at a time (SSE2):
COLOR3D_NORMALIZE REAL4 4 dup (255.0) ; must be 16-byte aligned to be used as a mulps memory operand
...
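lea esi,colors ; 4 pixels x 4 floats, 16-byte aligned
movaps xmm0,OWORD ptr [esi+0*16] ; pixel 0
movaps xmm1,xmm0
movaps xmm2,OWORD ptr [esi+2*16] ; pixel 2
movaps xmm3,xmm2
punpckhdq xmm0,OWORD ptr [esi+1*16] ; interleave floats 2,3 of pixels 0 and 1
punpckldq xmm1,OWORD ptr [esi+1*16] ; interleave floats 0,1 of pixels 0 and 1
punpckhdq xmm2,OWORD ptr [esi+3*16] ; interleave floats 2,3 of pixels 2 and 3
punpckldq xmm3,OWORD ptr [esi+3*16] ; interleave floats 0,1 of pixels 2 and 3
movdqa xmm4,xmm0
movdqa xmm5,xmm1
punpckhqdq xmm0,xmm2 ; xmm0 = float 3 of pixels 0..3
punpcklqdq xmm4,xmm2 ; xmm4 = float 2 of pixels 0..3
punpckhqdq xmm1,xmm3 ; xmm1 = float 1 of pixels 0..3
punpcklqdq xmm5,xmm3 ; xmm5 = float 0 of pixels 0..3
mulps xmm0,OWORD ptr COLOR3D_NORMALIZE ; Convert the range back to 0 - 255
mulps xmm4,OWORD ptr COLOR3D_NORMALIZE
mulps xmm1,OWORD ptr COLOR3D_NORMALIZE
mulps xmm5,OWORD ptr COLOR3D_NORMALIZE
cvtps2dq xmm0,xmm0 ; Convert to ints (rounds, where cvttps2dq truncates)
cvtps2dq xmm4,xmm4
cvtps2dq xmm1,xmm1
cvtps2dq xmm5,xmm5
pslldq xmm0,3 ; each dword is 0-255, so a byte shift moves the value to byte 3 of its dword
pslldq xmm4,2 ; byte 2
pslldq xmm1,1 ; byte 1
por xmm0,xmm4 ; merge the four channels
por xmm1,xmm5
por xmm0,xmm1
movdqa OWORD ptr ...,xmm0 ; store 4 packed pixels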
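To make the data flow of the 4-pixel version easier to follow, here is a scalar model of what it computes. It is a sketch in C for checking results, not a replacement for the SSE2 code. The names round_nearest_even and pack4_colors are made up for illustration, and the rounding helper assumes non-negative input and the default MXCSR rounding mode (round to nearest, ties to even), which is what cvtps2dq uses unless the mode has been changed.

```c
#include <stdint.h>

/* Round a non-negative float to the nearest integer, ties to even.
   ASSUMPTION: this mirrors cvtps2dq under the default MXCSR setting. */
uint32_t round_nearest_even(float v)
{
    uint32_t t = (uint32_t)v;            /* truncate first */
    float frac = v - (float)t;
    if (frac > 0.5f || (frac == 0.5f && (t & 1u)))
        ++t;                             /* round up, or break the tie toward even */
    return t;
}

/* Scalar model of the 4-pixel routine (hypothetical helper name).
   'in' holds 16 floats: 4 pixels of 4 channels each, as at [esi].
   Float number c of a pixel is scaled, rounded and placed in byte c
   of that pixel's dword, matching the pslldq/por sequence. */
void pack4_colors(const float *in, uint32_t *out)
{
    for (int p = 0; p < 4; ++p) {
        out[p] = 0;
        for (int c = 0; c < 4; ++c) {
            uint32_t v = round_nearest_even(in[4 * p + c] * 255.0f);
            out[p] |= (v & 0xFFu) << (8 * c);
        }
    }
}
```

Note the difference from the single-pixel version: a channel of 0.5f becomes 128 here (127.5 rounds to even), while the cvttps2dq version truncates it to 127.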
Do you think it is faster than conventional instructions?
Quote from: Farabi on September 24, 2011, 07:39:03 AM
Do you think it is faster than conventional instructions?
yes