RGBA to BGRA (and back again)

Started by oex, May 24, 2010, 01:31:00 AM


FORTRANS

Quote from: hutch-- on May 26, 2010, 04:53:10 PM
On old hardware I wonder if this approach is of any use.

Hi,

   Updated my data in Reply #8.

Cheers,

Steve N.

jj2007

Possible variants. I feel handicapped because neither of my CPUs has pshufb...

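; Msk is not shown in this post; for RGBA<->BGRA it would hold the pshufb control bytes 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15
RgbSwapLingo2 proc ; lpSrc (16-byte aligned), pixel count (multiple of 4)
pop edx                          ; return address
pop eax                          ; lpSrc
pop ecx                          ; N*4 pixels
lea ecx, [eax+4*ecx]             ; create limit (end of buffer)
pshufd xmm0, oword ptr Msk, 0E4h ; 0E4h is the identity order, so this only loads the mask
@@:
movdqa xmm1, [eax]
add eax, 16
pshufb xmm1, xmm0                ; swap bytes 0 and 2 of every dword
cmp eax, ecx
movdqa [eax-16], xmm1            ; movdqa does not touch the flags
jb @b                            ; unsigned compare for addresses
jmp edx                          ; return; both arguments are already off the stack
RgbSwapLingo2 endp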

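RgbSwapLingo3 proc ; lpSrc (16-byte aligned), pixel count (multiple of 8: 32 bytes per pass)
pop edx                          ; return address
pop eax                          ; lpSrc
pop ecx                          ; N*8 pixels
lea ecx, [eax+4*ecx]             ; create limit (end of buffer)
pshufd xmm0, oword ptr Msk, 0E4h ; identity shuffle, i.e. load the pshufb mask
@@:
movdqa xmm1, [eax]               ; unrolled once: first 16 bytes
pshufb xmm1, xmm0
movdqa [eax], xmm1
movdqa xmm1, [eax+16]            ; second 16 bytes
pshufb xmm1, xmm0
movdqa [eax+16], xmm1
add eax, 32
cmp eax, ecx
jb @b                            ; unsigned compare for addresses
jmp edx                          ; return; both arguments are already off the stack
RgbSwapLingo3 endp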

lingo

"What program are you using?
I've modified the test bed a bit and added lingo's test:"


I don't think the unrolled variants are safe... anyway:

Test for correctness:
BGRAbgraBGR1bgr1
RGBArgbaRGB1rgb1
BGRAbgraBGR1bgr1
RGBArgbaRGB1rgb1
BGRAbgraBGR1bgr1
RGBArgbaRGB1rgb1
BGRAbgraBGR1bgr1
BGRAbgraBGR1bgr1

12421   cycles for RgbSwap MichaelW
8217    cycles for RgbSwap Hutch
3256    cycles for RgbSwapSSE2 qWord
2068    cycles for RgbSwapLingo (SSSE3)
1556    cycles for RgbSwapLingoUnrolled
8215    cycles for RgbSwap Hutch2
14371   cycles for RgbSwapSSE
0       cycles

Press any key to exit...

hutch--


i7 quad, 2.8 GHz

10131   cycles for RgbSwap MichaelW
6638    cycles for RgbSwap Hutch
2681    cycles for RgbSwapSSE2 qWord
2667    cycles for RgbSwapLingo (SSSE3)
1259    cycles for RgbSwapLingoUnrolled
6645    cycles for RgbSwap Hutch2
8405    cycles for RgbSwapSSE
0       cycles

Press any key to exit...

Core 2 Quad, 3.0 GHz

12433   cycles for RgbSwap MichaelW
8237    cycles for RgbSwap Hutch
3260    cycles for RgbSwapSSE2 qWord
2068    cycles for RgbSwapLingo (SSSE3)
1556    cycles for RgbSwapLingoUnrolled
8224    cycles for RgbSwap Hutch2
14363   cycles for RgbSwapSSE
0       cycles

Press any key to exit...
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

 :P

Coming to you from my 2002 SHITBOX, this is the fastest of the legacy algos. I tweaked Michael's bswap algo and got it about 25% faster, but a bswap and a ror per pixel are enough to make it slower than 2 memory reads and 2 memory writes per pixel.



RgbSwapH2 proc rsSrc, rsBytes

    push ebx
    push esi
    push edi

    mov esi, rsBytes
    mov edi, rsSrc
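    shr esi, 2          ; pass count: each pass below handles 4 pixels (16 bytes)
                        ; shr 2 covers the buffer only if rsBytes is really a pixel count
                        ; (as in the Lingo variants); for a true byte count use shr esi, 4
                        ; the count must be a nonzero multiple of 4 pixels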
@@:
    mov al, BYTE PTR [edi]
    mov bl, BYTE PTR [edi+2]
    mov [edi], bl
    mov [edi+2], al
    mov cl, BYTE PTR [edi+4]
    mov dl, BYTE PTR [edi+2+4]
    mov [edi+4], dl
    mov [edi+2+4], cl
    mov al, BYTE PTR [edi+8]
    mov bl, BYTE PTR [edi+2+8]
    mov [edi+8], bl
    mov [edi+2+8], al
    mov cl, BYTE PTR [edi+12]
    mov dl, BYTE PTR [edi+2+12]
    mov [edi+12], dl
    mov [edi+2+12], cl
    add edi, 16
    dec esi
    jne @B
   
    pop edi
    pop esi
    pop ebx

    ret

RgbSwapH2 endp

Neo

Quote from: qWord on May 28, 2010, 01:19:59 PM
Quote from: Neo on May 28, 2010, 05:14:03 AM
That's what I did for the data labelled "PSHUFB" on the plot above, also on a Core 2 Duo.
What program are you using?
I'm using Inventor IDE with the not-yet-released performance testing add-in, which I can't seem to get completely separated from the main app, so I keep delaying its release.  Maybe I should just have it in the main app.  It's really handy, but still has a few kinks to be worked out (e.g. the performance test settings don't get saved yet).