I tried my hand at learning some MMX and this is the result.
I use the following formula:
dest = alpha * (source - dest) / 255 + dest;
(The code below approximates the divide by 255 with a shift right by 8, i.e. a divide by 256.)
Previously the best I could do was process each channel of the pixel separately, so I imagine doing all the channels at once is a bit quicker.
I'm using a DIB section for drawing, so I get a pointer to the bits (which is dst).
color is a dword, and serves as the source.
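For reference, here is a minimal scalar sketch in C of that formula applied one channel at a time, which can serve as a baseline to compare an MMX version against. The names blend_channel and blend_pixel_scalar are made up for illustration, and it assumes a 0xXXRRGGBB pixel layout with alpha in the range 0..255.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel: dest = alpha * (source - dest) / 255 + dest.
   The difference can be negative, so the arithmetic is done in signed int. */
uint8_t blend_channel(uint8_t src, uint8_t dst, unsigned alpha)
{
    assert(alpha <= 255);  /* assumption: alpha is a single byte value */
    int diff = (int)src - (int)dst;
    return (uint8_t)((int)alpha * diff / 255 + (int)dst);
}

/* Blend a 32-bit color into a 32-bit pixel, one byte lane at a time,
   and return the blended pixel. */
uint32_t blend_pixel_scalar(uint32_t dst, uint32_t color, unsigned alpha)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(color >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)blend_channel(s, d, alpha) << shift;
    }
    return out;
}
```

With alpha = 0 the pixel is left unchanged, and with alpha = 255 it becomes the color exactly, which is a quick way to spot-check any faster version.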
Basically this is how I tried to lay it out...
00 XX 00 RR 00 GG 00 BB
-
00 XX 00 RR 00 GG 00 BB
*
00 AA 00 AA 00 AA 00 AA
/
00 FF 00 FF 00 FF 00 FF
+
00 XX 00 RR 00 GG 00 BB
pxor mm3, mm3 //clear register
mov eax, dword ptr [dst] //get everything ready
movd mm0, dword ptr [eax]
movd mm1, dword ptr [color] //src
movd mm2, dword ptr [alpha]
punpcklbw mm0, mm3 //unpack dst to words
punpcklbw mm1, mm3 //unpack color
punpcklbw mm2, mm2 //unpack alpha
punpcklbw mm2, mm2
punpcklbw mm2, mm3
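psubw mm1, mm0 //(color - dest), wraps mod 2^16 when color < dest
pmullw mm1, mm2 //alpha * (color - dest), low 16 bits
psllw mm0, 8 //dest * 256
paddw mm1, mm0 //alpha * (color - dest) + dest * 256
psrlw mm1, 8 //(alpha * (color - dest) + dest * 256) / 256
packuswb mm1, mm3 //pack the words back to bytes
movd dword ptr [eax], mm1 //store the blended pixel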
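To sanity-check the packed-word arithmetic, a small C model of the four 16-bit lanes can help. This is only a sketch under stated assumptions: it models wrapping word arithmetic (psubw / pmullw / paddw style) and adds dest * 256 before the shift, so the true value of every lane stays within 0..65280 and the case color < dest works. The name blend_pixel_lanes is hypothetical, and alpha is assumed to be 0..255.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a packed-word blend: each channel lives in its own 16-bit lane
   and all arithmetic wraps modulo 2^16. */
uint32_t blend_pixel_lanes(uint32_t dst, uint32_t color, unsigned alpha)
{
    assert(alpha <= 255);  /* assumption: alpha is a single byte value */
    uint32_t out = 0;
    for (int lane = 0; lane < 4; ++lane) {
        /* Widen one byte of each operand to a word (punpcklbw with zero). */
        uint16_t d = (uint16_t)((dst   >> (8 * lane)) & 0xFF);
        uint16_t s = (uint16_t)((color >> (8 * lane)) & 0xFF);
        uint16_t diff = (uint16_t)(s - d);            /* wraps when s < d */
        uint16_t prod = (uint16_t)(diff * alpha);     /* low 16 bits only */
        uint16_t sum  = (uint16_t)(prod + (d << 8));  /* add dest * 256 */
        out |= (uint32_t)(sum >> 8) << (8 * lane);    /* divide by 256, repack */
    }
    return out;
}
```

Because of the divide by 256, full alpha does not quite reach the source color when blending upward: 255 over 0 gives 254 rather than 255.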
Any ideas on improvements? It is very straightforward, but if anyone is aware of any neat tricks or improvements, please let me know.
http://www.masm32.com/board/index.php?topic=8783.msg63805#msg63805
Quote from: NightWare on April 05, 2009, 09:40:28 PM
http://www.masm32.com/board/index.php?topic=8783.msg63805#msg63805
Thanks, this and some other reading led me to discover how awesome pshufw is for this.
http://www.tommesani.com/SSEPrimer.html
http://avisynth.org/mediawiki/Filter_SDK/Simple_MMX_optimization
pshufw mm2, mm2, 0
can be used instead of those 3 unpacks for alpha (note that pshufw needs SSE or AMD's extended MMX, not plain MMX).
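As a rough check of that equivalence, here is a C sketch that models both instructions on plain 64-bit integers and broadcasts alpha each way. All function names are invented for illustration, and the two results are only assumed to match when alpha fits in one byte (0..255), since the unpack route replicates just the low byte while the shuffle replicates the whole low word.

```c
#include <assert.h>
#include <stdint.h>

/* Model of punpcklbw a, b: interleave the low four bytes of a (even byte
   positions) with the low four bytes of b (odd byte positions). */
uint64_t model_punpcklbw(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        r |= ((a >> (8 * i)) & 0xFF) << (16 * i);
        r |= ((b >> (8 * i)) & 0xFF) << (16 * i + 8);
    }
    return r;
}

/* Model of pshufw: result word i is source word (imm8 >> 2*i) & 3. */
uint64_t model_pshufw(uint64_t src, unsigned imm8)
{
    assert(imm8 <= 255);
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned sel = (imm8 >> (2 * i)) & 3;
        r |= ((src >> (16 * sel)) & 0xFFFF) << (16 * i);
    }
    return r;
}

/* Alpha broadcast the long way: the three unpacks. */
uint64_t broadcast_unpack(uint32_t alpha)
{
    uint64_t m = alpha;            /* movd mm2, alpha */
    m = model_punpcklbw(m, m);     /* bytes: AA AA 00 00 ... */
    m = model_punpcklbw(m, m);     /* bytes: AA AA AA AA 00 ... */
    return model_punpcklbw(m, 0);  /* words: 00AA 00AA 00AA 00AA */
}

/* Alpha broadcast with a single shuffle: copy word 0 into every word. */
uint64_t broadcast_pshufw(uint32_t alpha)
{
    return model_pshufw((uint64_t)alpha, 0);
}
```

An immediate of 0 selects word 0 for all four result words, which is why it works as a broadcast; other immediates pick other arrangements, for example 0x1B reverses the word order.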
http://www.madwizard.org/programming/snippets