The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: donkey on January 09, 2005, 04:02:45 AM

Title: Graphics a la FPU
Post by: donkey on January 09, 2005, 04:02:45 AM
Hi All,

I have been playing with the idea of doing some YUV <> RGB conversions using the FPU, mostly for fun but maybe also to include in the graphics library. I am not well versed in the use of the FPU, so I am posting the code to see if anyone can A: fix it up and make it look pretty, B: convert it to integer math with approximately the same level of accuracy, or C: suggest a completely different way to do this...

Note that this is probably rife with bad assumptions about the FPU...

/*
Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y)
V = 0.877 (R - Y)
*/

YUV STRUCT
Y DD ?
U DD ?
V DD ?
ENDS

DATA SECTION
n877 DQ 0.877
n492 DQ 0.492
n114 DQ 0.114
n299 DQ 0.299
n587 DQ 0.587
n236 DQ 236.1
CODE SECTION

YUV2RGB FRAME pYUV
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D

mov esi,[pYUV]

; finit
; ########## RED (R = (V+0.877Y)/0.877)
fild D[esi+YUV.Y]
fld Q[n877]
fmulp ;ST0, ST1
fiadd D[esi+YUV.V]
fdiv Q[n877]
fistp D[RED]
mov eax,[RED]
or eax,eax
jns >
; add one for the zero crossing if necessary
inc D[RED]
:
and D[RED],0FFh

; ########## BLUE (B = (U+0.492Y)/0.492)
fild D[esi+YUV.Y]
fld Q[n492]
fmulp
fiadd D[esi+YUV.U]
fdiv Q[n492]
fist D[BLUE]
mov eax,[BLUE]
or eax,eax
jns >
; add one for the zero crossing if necessary
inc D[BLUE]
:
and D[BLUE],0FFh

; ########## GREEN (G = (Y-(0.299R + 0.114B))/0.587)
fild D[BLUE]
fld Q[n114]
fmul
fild D[RED]
fld Q[n299]
fmul
fadd st0,st1
fild D[esi+YUV.Y]
fsub ST0, ST1
fld Q[n587]
fdiv
fistp D[GREEN]
mov eax,[GREEN]
or eax,eax
jns >
; add one for the zero crossing if necessary
inc D[GREEN]
:
and D[GREEN],0FFh

mov eax,[BLUE]
shl eax,8
mov al,[GREEN]
shl eax,8
mov al,[RED]
RET
ENDF

RGB2YUV FRAME clrRGB, pYUV
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D
LOCAL Y :Q

mov esi, [pYUV]
mov eax,[clrRGB]
and eax,0FFh
mov [RED],eax

mov eax,[clrRGB]
shr eax,8
and eax,0FFh
mov [GREEN],eax

mov eax,[clrRGB]
shr eax,16
and eax,0FFh
mov [BLUE],eax

; finit
; ######### Y
fild D[RED]
fld Q[n299]
fmul
fild D[BLUE]
fld Q[n114]
fmul
fild D[GREEN]
fld Q[n587]
fmul
fadd ST0,ST1
fadd ST0,ST2
fist D[esi+YUV.Y]

; ######### U
fild D[BLUE]
fsub ST0,ST1
fld Q[n492]
fmul
fistp D[esi+YUV.U]

; ######### V
fild D[RED]
fsub ST0,ST1
fld Q[n877]
fmul
fistp D[esi+YUV.V]

RET
ENDF
Title: Re: Graphics a la FPU
Post by: daydreamer on January 09, 2005, 01:09:50 PM
Well, you only need the upper 8 bits for each channel in the usual 0-255 range per channel.
First convert your constants to 8:24 fixed-point dword constants, then, in pseudocode:
ALU mul for 0.114 x B
packed dword mul for 0.299*R at the same time 0.587 G (ALU and mmx different pipelines)
two add to get Y and store two copies of Y for later packed operation
packed sub B-Y and R-Y at the same time
packed mul with 0.492 and 0.877


btw I thought this subject on a first glance was fpu graphics ala rendering to a A32R32G32B32 texture
on newer gpu's which support floating point textures

Title: Re: Graphics a la FPU
Post by: dioxin on January 09, 2005, 01:37:41 PM
Donkey,
   why would you want to use the FPU to do this? (unless it's just an exercise in familiarising yourself with the FPU).


I'd do it with normal integer maths although MMX may be better, I assume you want to avoid that for compatibility with older CPUs.

Things to look at:

MUL does EAX*r/m32 and puts the result in EDX:EAX pair.

Suppose EAX contains the COLOUR (in this case R, G or B) and r/m32 contains 2^32 (I know it doesn't quite fit, but stick with it for now!)
The result in EDX:EAX will then be:
EDX= COLOUR, EAX=0

If instead I put 2^32*0.5 i.e. 2^31 in r/m32 and do the same then the result will be:
EDX=COLOUR/2, EAX= possibly a leftover bit in the msb.

i.e. I halve the value since I loaded r/m32 with half of 2^32.


Now, if instead I'd used r/m32= 2^32*0.299 = 4C8B4395h then the result would be:
EDX= COLOUR*0.299 and EAX= remainder.

EDX is the result you want!

So, you can do 32 bit accuracy multiplies by your "FP" constants using just integer maths and you'll get accuracy way beyond what you need.
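
The multiply-high trick described above can be checked with a minimal Python sketch (not part of the thread's code). `mul_high` is a hypothetical helper that models what an unsigned `MUL` leaves in EDX:EAX; the constant is the 4C8B4395h quoted above.

```python
# Model of the 32x32 -> 64 bit MUL: EDX receives the high 32 bits of
# the product, EAX the low 32 bits.
K_0299 = 0x4C8B4395  # 2**32 * 0.299 with the fraction dropped


def mul_high(colour, k):
    """Return (EDX, EAX) for an unsigned 32-bit multiply of colour by k."""
    product = colour * k
    return product >> 32, product & 0xFFFFFFFF


# Compare EDX with the integer part of colour * 0.299 for every
# possible 8-bit colour value.
all_match = all(mul_high(c, K_0299)[0] == c * 299 // 1000 for c in range(256))
print(all_match)
```

EDX holds the truncated (not rounded) product, so any rounding has to be done separately from the bits left in EAX.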


Your code for RGB to YUV would become something like,(completely untested)

mov eax,  2^32*0.299
mul [red]
mov ebx,edx                        'ebx=0.299R

mov eax,  2^32*0.114
mul [blue]
add ebx,edx                    'ebx=0.299R+0.114B

mov eax,  2^32*0.587
mul [green]
add ebx,edx                     'ebx=0.299R+0.114B+0.587G = Y

mov Y,ebx                       'save Y

mov eax,[blue]
sub eax,ebx                     'eax=B-Y
imul 2^32*0.492              'edx=U
mov U,edx                       'save U

mov eax,[red]
sub eax,ebx                     'eax=R-Y
imul 2^32*0.877              'edx=V
mov V,edx


I'm sure the same sort of thing could be done with YUV to RGB

Paul.
Title: Re: Graphics a la FPU
Post by: donkey on January 09, 2005, 01:53:39 PM
Hi Paul,

I found that the level of inaccuracy was too high if I was not dealing in floating point, actually the routines I posted are too inaccurate for any real usage even with the FPU. These ones are the modified ones to get rid of spurious FFh in place of 0 when some combinations are used. The problem is that whenever you convert the YUV to an integer you get a whole mess of problems. Yes, it was just an exercise for fun but I would also very much like to find a 100% accurate way of doing it, so far only the FPU has offered me that.

I also did an approximation in integer form to get the YUV but my output values were trashed for example I would get the following

in                         out
RGB = 000000CEh  RGB = 00FF01CEh

Only off by one in Blue and Green, not bad for an approx. but useless for color.

I had used the YUV approximation suggested by Microsoft...

Y = ( (  66 * R + 129 * G +  25 * B + 128) >> 8) +  16
U = ( ( -38 * R -  74 * G + 112 * B + 128) >> 8) + 128
V = ( ( 112 * R -  94 * G -  18 * B + 128) >> 8) + 128
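
For reference, the three formulas above can be written as a minimal Python sketch (the function name is illustrative, not from the thread). Note that this approximation produces offset values (Y starts at 16, U and V are centred on 128), so it is a different YUV variant from the Y/U/V equations at the top of the thread.

```python
def rgb_to_yuv_ms(r, g, b):
    """Integer RGB -> YUV using the 8-bit approximation quoted above.

    Python's >> on a negative int is an arithmetic shift (it rounds
    toward minus infinity), which corresponds to SAR rather than SHR.
    """
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return y, u, v


print(rgb_to_yuv_ms(0, 0, 0))        # black
print(rgb_to_yuv_ms(255, 255, 255))  # white
print(rgb_to_yuv_ms(255, 0, 0))      # pure red
```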

I also used a formula similar to the one you have used, where I did the math then shifted out the lower contents. But in all tests it failed to consistently yield the same RGB pattern as I input. This has led me to believe that to be accurate I must pass the YUV as a float and deal with the speed on that basis. I am hoping that someone can come up with a way to convert to integer math and still maintain at least 99% accuracy with no roll-overs (i.e. 00 becomes FF or 01).

The following yields a perfect RGB>YUV>RGB conversion, which is what I need:

YUV STRUCT
Y DQ ?
U DQ ?
V DQ ?
ENDS

DATA SECTION
n877 DQ 0.877
n492 DQ 0.492
n114 DQ 0.114
n299 DQ 0.299
n587 DQ 0.587

CODE SECTION

YUV2RGB FRAME pYUV
uses esi
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D

mov esi,[pYUV]

finit

; ########## RED (R = (V+0.877Y)/0.877)
fld Q[esi+YUV.Y]
fld Q[n877]
fmulp ;ST0, ST1
fadd Q[esi+YUV.V]
fdiv Q[n877]
fistp D[RED]
and D[RED],0FFh

; ########## BLUE (B = (U+0.492Y)/0.492)
fld Q[esi+YUV.Y]
fld Q[n492]
fmulp
fadd Q[esi+YUV.U]
fdiv Q[n492]
fist D[BLUE]
and D[BLUE],0FFh

fild D[BLUE]
fld Q[n114]
fmul
fild D[RED]
fld Q[n299]
fmul
fadd st0,st1
fld Q[esi+YUV.Y]
fsub ST0, ST1
fld Q[n587]
fdiv
fistp D[GREEN]
and D[GREEN],0FFh

mov eax,[BLUE]
shl eax,8
mov al,[GREEN]
shl eax,8
mov al,[RED]
RET
ENDF

RGB2YUV FRAME clrRGB, pYUV
uses esi
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D
LOCAL Y :Q

/*
Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y)
V = 0.877 (R - Y)
*/
finit

mov esi, [pYUV]
mov eax,[clrRGB]
and eax,0FFh
mov [RED],eax

mov eax,[clrRGB]
shr eax,8
and eax,0FFh
mov [GREEN],eax

mov eax,[clrRGB]
shr eax,16
and eax,0FFh
mov [BLUE],eax

; finit
; ######### Y
fild D[RED]
fld Q[n299]
fmul
fild D[BLUE]
fld Q[n114]
fmul
fild D[GREEN]
fld Q[n587]
fmul
fadd ST0,ST1
fadd ST0,ST2
fst Q[esi+YUV.Y]

; ######### U
fild D[BLUE]
fsub ST0,ST1
fld Q[n492]
fmul
fstp Q[esi+YUV.U]

; ######### V
fild D[RED]
fsub ST0,ST1
fld Q[n877]
fmul
fstp Q[esi+YUV.V]

RET
ENDF
Title: Re: Graphics a la FPU
Post by: dioxin on January 09, 2005, 02:07:12 PM
Donkey,
   Things to look at, part 2

Your equations for Y, U and V should be rearranged to avoid the DIVs. Instead, multiply by the reciprocal. Since it's only constant values that you divide by then this will be no extra programming but will run a lot quicker.


e.g. you have R = (V+0.877Y)/0.877

i.e. R = V/0.877 + Y
i.e. R = V*1.140 + Y

We now have 1 MUL and 1 ADD instead of 1 ADD, 1 MUL and 1 DIV.

The same can be done with your other equations to give

R = V*1.140 + Y
B = U*2.032 + Y
G = Y*1.703 - R*0.509 - B*0.194
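
As a rough check of the reciprocal form, here is a minimal Python sketch (not from the thread). It assumes Y, U and V are kept as floats and uses the three-digit factors exactly as written above; the function names are illustrative.

```python
def rgb_to_yuv(r, g, b):
    """Forward conversion with the factors used throughout the thread."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)


def yuv_to_rgb(y, u, v):
    """Reverse conversion using multiplications only."""
    r = v * 1.140 + y                      # replaces (V + 0.877*Y) / 0.877
    b = u * 2.032 + y                      # replaces (U + 0.492*Y) / 0.492
    g = y * 1.703 - r * 0.509 - b * 0.194  # replaces (Y - 0.299R - 0.114B) / 0.587
    return round(r), round(g), round(b)


# Round trip over a coarse grid of colours (every 17th level per channel).
grid = range(0, 256, 17)
mismatches = [
    (r, g, b)
    for r in grid for g in grid for b in grid
    if yuv_to_rgb(*rgb_to_yuv(r, g, b)) != (r, g, b)
]
print(len(mismatches))
```

The three-digit factors are slightly off from the true reciprocals, but the resulting error stays well below 0.5 per channel, so rounding to the nearest integer recovers the original values on this grid.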


   I might look into the rounding problems later..

Paul.
Title: Re: Graphics a la FPU
Post by: donkey on January 09, 2005, 02:44:05 PM
Hi dioxin,

Thanks, that works very well and is 100% accurate for the range 0..0FFFFFFh  :U

YUV2RGB_NEW FRAME pYUV
uses esi
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D
LOCAL garbage :Q

DATA SECTION
n2p032 DQ 2.032
n1p703 DQ 1.703
n1p14 DQ 1.14
np509 DQ 0.509
np194 DQ 0.194

CODE SECTION

mov esi,[pYUV]

finit

fld Q[esi+YUV.V]
fld Q[n1p14]
fmul
fld Q[esi+YUV.Y]
fadd ST0, ST1
fist D[RED]
fxch ST0,ST1
fstp Q[garbage]

fld Q[esi+YUV.U]
fld Q[n2p032]
fmul
fld Q[esi+YUV.Y]
fadd ST0, ST1
fist D[BLUE]
fxch ST0,ST1
fstp Q[garbage]

fld Q[esi+YUV.Y]
fld Q[n1p703]
fmul

; Bring RED to the batters box...
fxch ST0,ST2
fld Q[np509]
fmul

; Bring BLUE to the batters box...
fxch ST0,ST1
fld Q[np194]
fmul

; Bring Y to the batters box
fxch ST0,ST2
fsub ST0,ST1
fsub ST0,ST2
fistp D[GREEN]

mov eax,[BLUE]
shl eax,8
mov al,[GREEN]
shl eax,8
mov al,[RED]

RET
ENDF


I'm not completely sure if using fxch is optimal, but I assumed that it had to be better than anything that moved data out of or back into the FPU.
Title: Re: Graphics a la FPU
Post by: MichaelW on January 09, 2005, 08:57:38 PM
Hi Donkey,

I don't know what you intend to do with the YUV values, but assuming they will ultimately end up as integers, it seems to me that an RGB-YUV-RGB conversion followed by a test for matching input and output values is not a valid measure of conversion accuracy. It works when the YUV values are stored as real numbers, but it should fail if the YUV values are stored as integers. I think judging the accuracy of integer-based conversion routines will require something more sophisticated.
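
The effect of integer storage can be illustrated with a minimal Python sketch (not from the thread). It assumes Y, U and V are each rounded to the nearest integer before the reverse conversion, and that the reverse conversion clamps to 0..255; the function names are illustrative.

```python
def rgb_to_yuv_int(r, g, b):
    """Forward conversion with Y, U, V rounded to integers for storage."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return round(y), round(0.492 * (b - y)), round(0.877 * (r - y))


def clamp(x):
    """Round to the nearest integer and limit to the 0..255 range."""
    return max(0, min(255, round(x)))


def yuv_to_rgb(y, u, v):
    """Reverse conversion from the stored integer Y, U, V."""
    r = v / 0.877 + y
    b = u / 0.492 + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return clamp(r), clamp(g), clamp(b)


def max_channel_error(step=15):
    """Largest per-channel round-trip error over a coarse colour grid."""
    worst = 0
    for r in range(0, 256, step):
        for g in range(0, 256, step):
            for b in range(0, 256, step):
                out = yuv_to_rgb(*rgb_to_yuv_int(r, g, b))
                worst = max(worst, *(abs(p - q) for p, q in zip(out, (r, g, b))))
    return worst


print(yuv_to_rgb(*rgb_to_yuv_int(0, 0, 1)))
print(max_channel_error())
```

The colour (0, 0, 1) already fails an exact-match test: Y, U and V all round to zero, so the single unit of blue is lost. An exact-match test therefore says more about the storage format than about the arithmetic used for the conversion.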
Title: Re: Graphics a la FPU
Post by: dioxin on January 09, 2005, 09:13:09 PM
Donkey,
it can certainly be done without the FPU and in a similar number of instructions to your FPU solution but the solution I have is a bit fiddly and gives the YUV values as fixed point integers.
I have a working version (PowerBASIC syntax unfortunately!) which I'll try to tidy up a bit before I post it.

Paul.
Title: Re: Graphics a la FPU
Post by: raymond on January 09, 2005, 09:23:36 PM
Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y)
V = 0.877 (R - Y)

The values of R, G and B are always positive integers ranging from 0 to 255.

According to the above equations,
- the value of Y would always be positive and also range from 0 to 255,
- the value of U can range from -111 for pure yellow (R=255,G=255,B=0) to +111 for pure blue (R=0,G=0,B=255),
- the value of V can range from -157 for pure cyan (R=0,G=255,B=255) to +157 for pure red (R=255,G=0,B=0)
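
These extremes can be reproduced with a few lines of Python (a sketch, not from the thread; `rgb_to_yuv` is an illustrative name):

```python
def rgb_to_yuv(r, g, b):
    """Float conversion using the equations quoted above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)


corners = {
    "yellow": (255, 255, 0),
    "blue": (0, 0, 255),
    "cyan": (0, 255, 255),
    "red": (255, 0, 0),
    "white": (255, 255, 255),
}
# Rounded (Y, U, V) for each corner colour.
extremes = {name: tuple(round(c) for c in rgb_to_yuv(*rgb))
            for name, rgb in corners.items()}
for name, yuv in extremes.items():
    print(name, yuv)
```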

This is a typical application for the use of fixed point math.
- Using the lower 16 bits of a DWORD for the fractional part provides an accuracy equivalent to 5 decimal places; ALL the numbers used above only have an accuracy of 3 digits.
- This leaves the upper 16 bits of the DWORD for the integer part with a range of -32768 to +32767, ALL the numbers used above being well within that range.
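
The 16.16 layout can be sketched in Python as follows (a minimal illustration, not the thread's code; `to_fixed`, `round_fixed` and `luma` are hypothetical names). The rounding step models a shift right by 16 followed by adding back the last bit shifted out, which is the bit with weight 0.5.

```python
FRAC_BITS = 16


def to_fixed(thousandths):
    """Convert a 3-digit factor, given in thousandths, to 16.16 (truncated)."""
    return thousandths * (1 << FRAC_BITS) // 1000


def round_fixed(x):
    """Round a non-negative 16.16 value to the nearest integer.

    Equivalent to 'shr reg,16' followed by 'adc reg,0': the carry flag
    holds the last bit shifted out, i.e. the 0.5 bit.
    """
    return (x >> FRAC_BITS) + ((x >> (FRAC_BITS - 1)) & 1)


N299, N587, N114 = to_fixed(299), to_fixed(587), to_fixed(114)


def luma(r, g, b):
    """Y = 0.299 R + 0.587 G + 0.114 B, accumulated in 16.16 then rounded."""
    return round_fixed(r * N299 + g * N587 + b * N114)


print(N299, N587, N114)
print(luma(255, 255, 255), luma(255, 0, 0), luma(0, 255, 0), luma(0, 0, 255))
```

Because the three truncated factors sum to 65535 rather than 65536, white accumulates to slightly less than 255.0 and relies on the rounding step to reach 255.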

(MASM syntax is more familiar for most readers and is used for the following code. The YUV struct being a simple one, offsets within the struct were used instead of the more complex struct type addressing.)

YUV STRUCT
Y DD ?
U DD ?
V DD ?
YUV ENDS

.data
; the various factors are initialized with the
; decimals converted to binary fractions

   n114  dd  114*65536/1000  ;0.114
   n299  dd  299*65536/1000  ;0.299
   n492  dd  492*65536/1000  ;0.492
   n587  dd  587*65536/1000  ;0.587
   n877  dd  877*65536/1000  ;0.877
   n236  dd  2361*65536/10   ;236.1

; although that last variable n236 is not used, it has
; been included as an additional example of initialization.

;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

RGB2YUV proc USES esi clrRGB:DWORD, pYUV:DWORD

LOCAL  RED    :DWORD
LOCAL  BLUE   :DWORD

   mov  esi,pYUV

;compute Y = 0.299 R + 0.587 G + 0.114 B

   mov  eax,clrRGB
   and  eax,0FFh   ;isolate the RED
   mov  RED,eax
   imul n299
   mov  ecx,eax    ;use ECX as the accumulator

   mov  eax,clrRGB
   shr  eax,8
   and  eax,0FFh   ;isolate the GREEN (will not be reused)
   imul n587
   add  ecx,eax

   mov  eax,clrRGB
   shr  eax,16
   and  eax,0FFh   ;isolate the BLUE
   mov  BLUE,eax
   imul n114
   add  ecx,eax    ;ECX = Y = 0.299 R + 0.587 G + 0.114 B

   shr  ecx,16     ;shift out the fractional portion
                   ;The last bit shifted out is copied to
                   ;the CARRY flag and is the one equivalent
                   ;to a decimal fraction of 0.5
   adc  ecx,0      ;ECX = Y rounded to the nearest integer
   mov  [esi],ecx  ;store it

;compute U = 0.492 (B - Y)

   mov  eax,BLUE
   sub  eax,ecx    ;(B - Y)
   imul n492
   or   eax,eax    ;test for negative sign
   pushf           ;keep flags
   jns  @F
   neg  eax        ;make it positive for rounding
@@:
   shr  eax,16     ;shift out the fractional portion
   adc  eax,0
   popf            ;retrieve sign flag
   jns  @F
   neg  eax
@@:
   mov  [esi+4],eax ;store the U value

;compute V = 0.877 (R - Y)

   mov  eax,RED
   sub  eax,ecx    ;(R - Y)
   imul n877
   or   eax,eax    ;test for negative sign
   pushf           ;keep flags
   jns  @F
   neg  eax        ;make it positive for rounding
@@:
   shr  eax,16     ;shift out the fractional portion
   adc  eax,0
   popf            ;retrieve sign flag
   jns  @F
   neg  eax
@@:
   mov  [esi+8],eax ;store the V value
   ret
RGB2YUV endp

;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

YUV2RGB proc USES esi pYUV:DWORD

LOCAL RED     :DWORD
LOCAL GREEN   :DWORD
LOCAL BLUE    :DWORD

   mov esi,pYUV

; RED (R = (V/0.877+Y))
   mov  eax,[esi+8]
   shl  eax,16      ;shift it to the integer bits
   cdq              ;extend sign to EDX
   idiv n877        ;will give the integer part of V/0.877
                    ;in the lower 16 bits of EAX
                    ;and the fractional part in
                    ;the lower 16 bits of EDX
   add  eax,[esi]   ;+Y = R
   or   eax,eax     ;test for sign
   jns  @F
   xor  eax,eax     ;negative values can only be due
                    ;to rounding errors
   jmp storeRED
@@:
   shl  dx,1        ;the most significant bit of the
                    ;fractional part is equivalent to 0.5
                    ;and gets transferred to the CARRY flag
   adc  eax,0       ;increment result if fraction > 0.5
storeRED:
   cmp  eax,255
   jbe  @F
   mov  eax,255
@@:
   mov  RED,eax

; BLUE (B = (U/0.492+Y))
   mov  eax,[esi+4]
   shl  eax,16      ;shift it to the integer bits
   cdq              ;extend sign to EDX
   idiv n492        ;will give the integer part of U/0.492
                    ;in the lower 16 bits of EAX
                    ;and the fractional part in
                    ;the lower 16 bits of EDX
   add  eax,[esi]   ;+Y = B
   or   eax,eax     ;test for sign
   jns  @F
   xor  eax,eax     ;negative values can only be due
                    ;to rounding errors
   jmp storeBLUE
@@:
   shl  dx,1        ;the most significant bit of the
                    ;fractional part is equivalent to 0.5
                    ;and gets transferred to the CARRY flag
   adc  eax,0       ;increment result if fraction > 0.5
storeBLUE:
   cmp  eax,255
   jbe  @F
   mov  eax,255
@@:
   mov  BLUE,eax

; GREEN (G = (Y-(0.299R + 0.114B))/0.587)
   mov  ecx,[esi]   ;use ECX as accumulator and
                    ;initialize it with Y
   shl  ecx,16      ;shift it to the integer bits
   mov  eax,RED
   mul  n299
   sub  ecx,eax
   mov  eax,BLUE
   mul  n114
   sub  ecx,eax     ;ECX = Y-(0.299R + 0.114B)
   mov  eax,ecx
   cdq              ;extend sign to EDX
   idiv n587
   or   eax,eax     ;test for sign
   jns  @F
   xor  eax,eax     ;negative values can only be due
                    ;to rounding errors
   jmp storeGREEN
@@:
   shl  dx,1        ;the most significant bit of the
                    ;fractional part is equivalent to 0.5
                    ;and gets transferred to the CARRY flag
   adc  eax,0       ;increment result if fraction > 0.5
storeGREEN:
   cmp  eax,255
   jbe  @F
   mov  eax,255
@@:
   mov  GREEN,eax

   mov  eax,BLUE
   shl  eax,8
   add  eax,GREEN
   shl  eax,8
   add  eax,RED
   ret
YUV2RGB endp


This was based on your original post. It could be modified easily to yield an accuracy equivalent to 5 decimal places if those YUV values are to be used only internally.

If you would rather use the FPU, let me know and I will prepare my comments on your code.

Raymond

EDIT: Had forgotten to add code to check for overflow in YUV2RGB proc. Now added

Title: Re: Graphics a la FPU
Post by: dioxin on January 09, 2005, 10:48:02 PM
It works but it's still a bit untidy. PowerBASIC syntax but it's close enough to MASM to see how it's done.
No FPU used.
Runs an RGB-> YUV ->RGB conversion in under 60clks, about 20 for RGB-> YUV and about 40 for YUV->RGB



'R, G, B are integers from 0-255
'Y, U, V are fixed point integers, 9bits before and 23 bits after the point

'define constants
v0299&=2^23*0.299
v0587&=2^23*0.587
v0114&=2^23*0.114

v0492&=2^32*0.492
v0877&=2^31*0.877  'not 2^32 otherwise it overflows into sign bit. Correct after using it

v1140&=2^24/0.877    '1/0.877   
v2032&=2^24/0.492    '1/0.492   
v1703&=2^24/0.587    '1/0.587   



'RGB -> YUV
!movzx  eax,byte ptr col[1]    'eax=green
!movzx  edi,byte ptr col[2]    'edi=blue
!movzx  esi,byte ptr col       'esi=red

!mul v0587&      ;G*0.587       ;green
!mov ebx,eax     ;accumulate result in ebx

!mov eax,edi     ;blue
!mul v0114&      ;B*0.114
!add ebx,eax

!mov eax,esi     ;red
!mul v0299&      ;R*0.299
!add ebx,eax

!mov y,ebx      ;store Y        'Y done


!mov eax,edi    ;blue
!shl eax,23     ;line up with Y
!sub eax,ebx    ;(B-Y)
!imul v0492&    ;0.492*(B-Y)
!mov U,edx      ;U done


!mov eax,esi    ;red
!shl eax,23     ;line up with Y
!sub eax,ebx    ;(R-Y)
!imul v0877&    ;0.877*(R-Y)

!shl eax,1      ;double result to correct for v0877& being half size to prevent overflow
!rcl edx,1

!mov V,edx      ;V done



'YUV -> RGB
!mov eax,U
!imul v2032&        ;U*1/0.492

!mov ebx,Y          ;line up Y
!shr ebx,8

!adc edx,ebx        ;Y+U/0.492

!add edx,&h4000     ;round if needed
!shr edx,15         ;scale
!mov blue,edx       ;blue done

   

!mov eax,V
!imul v1140&        ;V * 1/0.877

!mov ebx,Y          ;line up Y
!shr ebx,8

!adc edx,ebx        ;Y+V/0.877

!add edx,&h4000     ;round if needed
!shr edx,15         ;scale
!mov red,edx        ;red done



!mov eax,blue
!mul v0114&         ;B*0.114
!mov ebx,eax        ;accumulate in ebx

!mov eax,red
!mul v0299&         ;red*0.299
!add ebx,eax        ;accumulate in ebx

!mov eax,Y

!sub eax,ebx        ;Y-(B*0.114 + red*0.299)
!imul v1703&        ;(Y-(B*0.114 + red*0.299))/0.587


!add edx,&h4000     ;round if needed
!shr edx,15         ;scale

!mov green,edx      ;green done


!mov eax,blue       ;merge RGB into a single value, col2
!shl eax,8
!or eax,green
!shl eax,8
!or eax,red
!mov col2,eax
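
The RGB -> YUV half of the listing above can be modelled in Python to see the 9.23 format at work. This is a sketch under stated assumptions, not a translation verified against PowerBASIC: the constants are truncated with `int()` (the compiler may round instead), and `rgb_to_yuv_fixed` and `to_float` are illustrative names.

```python
SCALE = 1 << 23  # 23 fractional bits

V0299 = int(SCALE * 0.299)
V0587 = int(SCALE * 0.587)
V0114 = int(SCALE * 0.114)
V0492 = int(2**32 * 0.492)
V0877 = int(2**31 * 0.877)  # half scale so that it fits a signed dword


def rgb_to_yuv_fixed(r, g, b):
    """Return Y, U, V as signed fixed-point integers with 23 fractional bits."""
    y = g * V0587 + b * V0114 + r * V0299
    # Signed IMUL by a 2^32-scaled factor: EDX is the product >> 32.
    u = (((b << 23) - y) * V0492) >> 32
    # The factor is scaled by 2^31, so the shl/rcl doubling amounts to >> 31.
    v = (((r << 23) - y) * V0877) >> 31
    return y, u, v


def to_float(x):
    """Convert a 9.23 fixed-point value to a float for inspection."""
    return x / SCALE


y, u, v = rgb_to_yuv_fixed(0xCE, 0x10, 0x80)
print(to_float(y), to_float(u), to_float(v))
```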
Title: Re: Graphics a la FPU
Post by: donkey on January 10, 2005, 12:02:42 AM
Hi Raymond and Dioxin,

Thanks, I will take a look at them as soon as I thaw out, darn Calgary winter.

MichaelW,

Not really planning on anything right now, I have a grayscale routine and am testing an MMX based contrast so outside of those 2 functions I can't even imagine an application. However, I came across the formulae when I did the grayscale some time ago and decided to have another look. I am sure there is some practical application, perhaps a unique kind of processing or something. For now though I took it on as an exercise in graphics and the FPU. However, an integer or better still MMX version would be nice to add to the library of graphics functions.
Title: Re: Graphics a la FPU
Post by: raymond on January 10, 2005, 01:10:55 AM
donkey

On further analysis of your requirements relative to the accuracy of the conversion to/from RGB/YUV, and running some actual code, here are my conclusions:

RGBs are normally converted to YUVs to perform some transformation of the overall color (such as brightness) and then converted back to RGBs for proper display. Conversions are not done simply for the sake of conversion.

Using 3 significant digits for the conversion factors and the YUV values is generally sufficient to return RGBs within +/-1 of the original RGB values, which is almost impossible to detect visually.

Running the conversion back and forth 10 consecutive times would definitely produce variations larger than the +/-1. However, even if the computations are performed with an accuracy of 19 digits (such as with the FPU), exactly the same variations would still be observed if the results of the computations are simply stored as 3-digit integers. The only way to obtain the maximum accuracy would be to store the YUV values as 80-bit floats, which may make it difficult to handle when performing the overall color adjustments.

Raymond

P.S. In case you copied my code posted originally, I edited it to add the necessary code to prevent overflow of the RGB values.
Title: Re: Graphics a la FPU
Post by: donkey on January 10, 2005, 01:24:10 AM
Hi Raymond,

Yes, I tested my integer algo and came up with +/- 1 as well. However, as an example, a value of black -1 yields FF, and that is not really acceptable in graphics applications. The algorithm seemed to fail when a color was in the middle-top of the range (0CEh+). Of course clipping it is the solution, however I did not find a reliable way to clip the rollover based on direction (i.e. whether it rolls over 0 going down or up).

Quote: The only way to obtain the maximum accuracy would be to store the YUV values as 80-bit floats

Which is why I finally ended up at the FPU and stored the YUV as 3 floats. Using the FPU but storing as integers required too many fixes to the code, hence my second attempt with a modified FPU algo. Brightness is a good application for this. I guess it would involve only adjusting the LUMA while leaving the color difference relatively untouched. I will have to investigate the possibilities; though I already have a pretty decent adjust LUMA function that I wrote for TBPaint, there is always room for improvement.
Title: Re: Graphics a la FPU
Post by: donkey on January 10, 2005, 01:42:55 AM
BTW,

I agree that there is little need to convert to YUV just for the sake of converting, which is why I have no problem with using floats in the YUV structure. But accuracy is very important when dealing with images, which is why I am looking for 99% or so. I have 100% but at the cost of speed: every number in the range works perfectly. I admit that I have not tested the speed of the FPU one yet, though to convert RGB>YUV>RGB without error takes about 1.5 seconds on my P3-700 for all 16.7 million colors. Still too slow for any kind of image processing but OK for small bitmaps. I hope to get the time to run a speed vs accuracy test this week sometime on all the algos.
Title: Re: Graphics a la FPU
Post by: raymond on January 10, 2005, 05:39:28 AM
This is getting interesting.

I would suspect that graphics editors mostly use 32-bit floats to compute YUVs, perform their modifications on those floats and then convert from the modified floats back to RGBs.

I thus ran some tests for that assumption and found that the RGBs seem to be reconstructed with 100% accuracy based on several different samples I tested throughout the RGB ranges.

I then modified the "CPU" code to store the YUV values with the 16 bits of the fraction and, as expected, the RGBs seem to be reconstructed with 100% accuracy based on several similar samples throughout the RGB ranges.

Attached are the two files of the source code for doing the testing (one RGB at a time), each containing the two necessary procedures.

If your tests should prove that the CPU route may be the fastest, I can provide more help for the computations you would need to do on that type of fixed point data.

Raymond

Edit: Had changed the location of some labels before zipping the source files and had forgotten to remove one of the redundant labels. New zip file is corrected.


[attachment deleted by admin]
Title: Re: Graphics a la FPU
Post by: raymond on January 10, 2005, 09:11:51 PM
Here are the results and observations of more tests to verify the accuracy and relative speeds of the FPU 32-bit floats and CPU 32-bit fixed-point math algos while retaining the full precision of each for the YUV values.

The entire range of RGB values from 000000 to FFFFFF was verified by first converting the RGB to YUV, converting back from the stored YUV to RGB, and comparing the returned RGB to its original one. Both algos were 100% accurate.

The time required on my P3-550 to perform those 16,777,216 iterations was:
7.3 seconds for the CPU-based algo
8.5 seconds for the FPU-based algo

Removing the "finit" instruction from each of the two procedures in the FPU algo (and using it only once before starting the timing) resulted in reducing the measured time to: 6.9 seconds.
(This would necessitate that the programmer can manage the content of the FPU registers to prevent "FPU stack overflow")

The above time was further reduced to: 5.3 seconds
by changing the "Precision Control" bits of the FPU Control Word for REAL4 precision instead of the default REAL10 precision when the FPU is initialized. (Doing this after the finit instruction when it is retained in each procedure had no significant effect on the time required.)

Raymond
Title: Re: Graphics a la FPU
Post by: dioxin on January 10, 2005, 09:29:48 PM
Raymond,
the non-FPU version I posted earlier is also 100% accurate and for the whole range 0-FFFFFF it runs in 0.65s on an Athlon XP2600+ (1900MHz).

A closer comparison with yours: it runs in 3.03 secs on a K6-III/400MHz.

Paul.
Title: Re: Graphics a la FPU
Post by: raymond on January 11, 2005, 02:05:29 AM
dioxin,

I agree that your code would be 100% accurate for converting RGBs to YUVs and back to RGBs. The main speed improvement I can see is performing a multiplication with a declared reciprocal instead of a division. I will test that variation in my algo. I don't think that the other major differences should have that much of an effect on timing, most of the cycles being taken by multiplications.

The only problem I would have with your version is how well it would perform when you start modifying the stored YUV values with only 9 bits available for the integer portion, one of those bits being required for the sign. Any increase of the absolute value of the integer above 255 would change the sign and then be disastrous on the resulting RGB.

Raymond
Title: Re: Graphics a la FPU
Post by: raymond on January 11, 2005, 05:41:51 PM
Replacing the divisions with multiplications (of reciprocals) yielded the following relative improvements over previously reported results (still with 100% accuracy).

CPU algo: from 7.3 down to 4.1 seconds
FPU algo: from 5.3 down to 3.8 seconds

Adapting dioxin's code provided only a marginal improvement over my CPU algo but does not perform any checking for underflow nor overflow.

It would seem that the FPU would be the route to take. It should be simpler and less prone to errors (and more portable) to work with floats when modifying the YUVs.

Raymond
Title: Re: Graphics a la FPU
Post by: dioxin on January 11, 2005, 05:46:26 PM
Raymond,
Quote: but does not perform any checking for underflow nor overflow.
No checking needs to be done since no overflow is possible when starting with RGB values.

Paul.
Title: Re: Graphics a la FPU
Post by: raymond on January 12, 2005, 01:20:48 AM
Quote from: dioxin on January 11, 2005, 05:46:26 PM
Raymond,
Quote: but does not perform any checking for underflow nor overflow.
No checking needs to be done since no overflow is possible when starting with RGB values.

Paul.

I totally agree with you on that point. However, RGBs are converted to YUV to perform some operations on those values before converting the modified YUVs back to RGBs. Those modifications could result in an overflow/underflow situation. The YUV2RGB procedure MUST check that possibility to avoid erroneous results.

As an example, increasing the brightness of a color where the RED is already at its maximum would increase the value of each of the color components, including that of the RED, which would then overflow. If the increase is small, the level of the RED could go from FF to 02 which is certainly not what you should expect from an increase of brightness.
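
The difference between masking and clamping can be shown with a short Python sketch (not from the thread; both function names are illustrative). It assumes the 00BBGGRR packing used in the listings, and that a channel has drifted slightly out of range after the YUV values were modified.

```python
def pack_wrapping(r, g, b):
    """Pack channels by masking each with 0FFh; out-of-range values wrap."""
    return ((b & 0xFF) << 16) | ((g & 0xFF) << 8) | (r & 0xFF)


def pack_saturating(r, g, b):
    """Pack channels after clamping each one to the 0..255 range."""
    def clamp(x):
        return max(0, min(255, x))
    return (clamp(b) << 16) | (clamp(g) << 8) | clamp(r)


# RED has overflowed to 258 after a small brightness increase.
print(hex(pack_wrapping(258, 100, 50)))    # red wraps round to 02
print(hex(pack_saturating(258, 100, 50)))  # red stays at FF
```

The same applies in the other direction: a channel that underflows to -1 becomes FF when masked and 00 when clamped.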

Raymond

Title: Re: Graphics a la FPU
Post by: dioxin on January 12, 2005, 11:53:25 PM
Raymond,
   I see what you're saying. In the past I've only ever used YUV as a broadcast standard, never to process data. Video processing was always done in RGB, and then, at the end, converted to YUV ready for transmission.

   I can see that my method doesn't check for over/underflow when converting (presumably invalid) data from YUV to RGB but shouldn't it be the job of the YUV processing to make sure the output is valid?


   Having said that, I think the main problem here is that there has been no standard specified for how YUV data should be stored.
   RGB appears to be in the form 00BBGGRR but everyone here has chosen their own way to represent YUV. Perhaps Donkey has a "standard" in mind for how YUV should be stored? Perhaps there is a real standard way to store YUV that everyone (except me) knows? If there is to be signal processing of the YUV data then we need a standard way to store it.


   On the business of CPU vs FPU.
   I reckon there is scope to use both to give the best result.
   The RGB->YUV conversion is probably too simple to use both but the YUV-> RGB conversion is more complex and may allow the FPU to do useful stuff while the CPU is also doing useful stuff.

   I've already modified my code to speed it up a bit. The RGB to YUV conversion can be done as follows:


mov ebx,col        ;get the RGB colour
movzx edi,bh       ;edi=green
movzx esi,bl       ;esi=red
shr ebx,16         ;ebx=blue

imul edi,&h4B22D0      ;GREEN*0.587
imul edx,ebx,&hE978D   ;BLUE*0.114
imul ecx,esi,&h2645A1  ;RED*0.299

add edi,edx            ;accumulate in edi
add edi,ecx            ;accumulate in edi. edi now contains Y

mov y,edi      ;store Y


mov eax,esi    ;red
shl eax,23     ;line up with Y
sub eax,edi    ;(R-Y)
imul v0877&    ;0.877*(R-Y)

shl eax,1      ;double result to correct for v0877& being half size to prevent overflow
rcl edx,1

mov V,edx      ;V done


mov eax,ebx    ;blue
shl eax,23     ;line up with Y
sub eax,edi    ;(B-Y)
imul v0492&    ;0.492*(B-Y)
mov U,edx      ;U done



   I'm reluctant to do any further work if the format of the YUV values is in question since a lot of the code is very specific to the way the data is stored.


Paul.
Title: Re: Graphics a la FPU
Post by: donkey on January 12, 2005, 11:59:40 PM
Hi Guys,

I'm really sorry that I haven't gotten back to this. My work week sucks up much of my time, and I plan to tackle the problem again this weekend with some approximations that I have been trying to work out. My preferred format is unimportant, though I have chosen in my test to use 3 QWORD floats (REAL8). Because this is an internal-only type of routine it is open for change.
Title: Re: Graphics a la FPU
Post by: dioxin on January 13, 2005, 12:23:52 AM
Donkey,
Quote
3 QWORD floats
   Aww, bugger. That messes up my method!


Paul
Title: Re: Graphics a la FPU
Post by: raymond on January 13, 2005, 04:40:52 AM
My tests seemed to indicate that 32-bit floats may be accurate enough if the values are stored in that format (vs storing only their rounded integer value). Using 64-bit floats may be overkill.

Raymond
Title: Re: Graphics a la FPU
Post by: donkey on January 13, 2005, 08:05:24 AM
Hi Raymond,

Yes, I tested my routine with three 32-bit floats and it works, so I think 3 x 32-bit floats is the structure I will be using. I am not sure that it is possible to reliably convert back and forth across the full range without the FPU so for now I am looking at optimizing the FPU routine. As I have little experience with the FPU it is a bit difficult for me to know where the bottlenecks are and what can be done more effectively. Certainly the finit is a problem, but the routine is unstable without it.

YUV STRUCT
Y DD ?
U DD ?
V DD ?
ENDS

YUV2RGB FRAME pYUV
uses esi
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D
LOCAL garbage :D

CONST SECTION
n2p032 DD 2.032
n1p703 DD 1.703
n1p14 DD 1.14
np509 DD 0.509
np194 DD 0.194

CODE SECTION

mov esi,[pYUV]

finit

fld D[esi+YUV.V]
fld D[n1p14]
fmul
fld D[esi+YUV.Y]
fadd ST0,ST1
fist D[RED]
fxch ST0,ST1
fstp D[garbage]

fld D[esi+YUV.U]
fld D[n2p032]
fmul
fld D[esi+YUV.Y]
fadd ST0,ST1
fist D[BLUE]
fxch ST0,ST1
fstp D[garbage]

fld D[esi+YUV.Y]
fld D[n1p703]
fmul

; Bring RED to the batters box...
fxch ST0,ST2
fld D[np509]
fmul

; Bring BLUE to the batters box...
fxch ST0,ST1
fld D[np194]
fmul

; Bring Y to the batters box
fxch ST0,ST2
fsub ST0,ST1
fsub ST0,ST2
fistp D[GREEN]

and D[GREEN],0FFh
and D[RED],0FFh
and D[BLUE],0FFh

mov eax,[BLUE]
shl eax,8
or eax,[GREEN]
shl eax,8
or eax,[RED]

RET
ENDF

RGB2YUV FRAME clrRGB, pYUV
uses esi
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D

CONST SECTION
n877 DD 0.877
n492 DD 0.492
n114 DD 0.114
n299 DD 0.299
n587 DD 0.587

CODE SECTION

/*
Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y)
V = 0.877 (R - Y)
*/
finit

mov esi, [pYUV]
mov eax,[clrRGB]
and eax,0FFh
mov [RED],eax

mov eax,[clrRGB]
shr eax,8
and eax,0FFh
mov [GREEN],eax

mov eax,[clrRGB]
shr eax,16
and eax,0FFh
mov [BLUE],eax

; ######### Y
fild D[RED]
fld D[n299]
fmul
fild D[BLUE]
fld D[n114]
fmul
fild D[GREEN]
fld D[n587]
fmul
fadd ST0,ST1
fadd ST0,ST2
fst D[esi+YUV.Y]

; ######### U
fild D[BLUE]
fsub ST0,ST1
fld D[n492]
fmul
fstp D[esi+YUV.U]

; ######### V
fild D[RED]
fsub ST0,ST1
fld D[n877]
fmul
fstp D[esi+YUV.V]

RET
ENDF
Title: Re: Graphics a la FPU
Post by: dioxin on January 13, 2005, 02:57:34 PM
Donkey,
  you can speed up the splitting of 00BBGGRR into the colours using something like:

mov ebx,col        ;get the RGB colour
movzx edi,bh       ;edi=green
movzx esi,bl       ;esi=red
shr ebx,16         ;ebx=blue
mov red,esi        ;store in memory where the FPU can get at them
mov green,edi
mov blue,ebx


Quote
Certainly the finit is a problem, but the routine is unstable without it.
It looks like you're overflowing the FPU stack. You load lots onto it but rarely seem to pop anything off it.
Don't forget, an instruction like FMUL leaves one operand on the stack and overwrites the other with the result. It doesn't pop the stack unless you explicitly tell it to using FMULP.
If you make sure the stack is sorted then the FINIT problem will be solved.

Paul.

edit: It looks like FMUL with no parameters might compile as FMULP, which does pop the stack, but the gist of my comment is still valid: you need to explicitly pop the stack using the P version of the instructions (e.g. FADDP not FADD, FSUBP not FSUB) unless you want to keep the old value on the stack for later.
Title: Re: Graphics a la FPU
Post by: dioxin on January 13, 2005, 04:49:11 PM
Donkey,
   to try to show the problem more clearly, the FPU part of your RGB->YUV routine does this:
   

; ######### Y           st0             st1             st2             st3             st4
fild D[RED]             R
fld D[n299]             n299            R
fmul                    .299R
fild D[BLUE]            B               .299R
fld D[n114]             n114            B               .299R
fmul                    .114B           .299R
fild D[GREEN]           G               .114B           .299R
fld D[n587]             n587            G               .114B           .299R
fmul                    .587G           .114B           .299R
(1)     fadd ST0,ST1            .587G+.114B     .114B           .299R
(2)     fadd ST0,ST2            Y               .114B           .299R
fst D[esi+YUV.Y]        Y               .114B           .299R

; ######### U
fild D[BLUE]            B               Y               .114B           .299R
fsub ST0,ST1            B-Y             Y               .114B           .299R
fld D[n492]             n492            B-Y             Y               .114B           .299R
fmul                    U               Y               .114B           .299R
fstp D[esi+YUV.U]       Y               .114B           .299R

; ######### V
fild D[RED]             R               Y               .114B           .299R
(3)     fsub ST0,ST1            R-Y             Y               .114B           .299R
fld D[n877]             n877            R-Y             Y               .114B           .299R
fmul                    V               Y               .114B           .299R
fstp D[esi+YUV.V]       Y               .114B           .299R


   
   It looks like there are 3 remaining items on the stack when there should be none.

   at (1) you should have used faddp st1,st0 to remove the .114B from the stack
   this would add ST0 to ST1, but then pops the stack leaving the result in ST0

   at (2) you should have used faddp st1,st0 to remove the .299R from the stack
   this would add ST0 to ST1, but then pops the stack leaving the result in ST0

   at (3) you should have used fsubrp  to remove Y from the stack.
   this would sub st1 from st0, pop the stack leaving result in st0

   You have similar problems in the YUV->RGB code.

   If you sort out these then it shouldn't be necessary to finit before each conversion.


Quote
I am not sure that it is possible to reliably convert back and forth across the full range without the FPU

   That's not right. I did it in my earlier posting.
   Keep in mind, the ALU is 32 bit, I used 9.23 as the fixed point integer format, although only 31 bits were often used to prevent overflows.
   The FPU using SINGLEs (32-bit FPU values) only has 24 bit precision, the other bits are exponent.
   So, it's LESS accurate to use FP SINGLEs than it is to use fixed point 32 bit integers.
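
The 24-bit limit is easy to demonstrate in C. The helper below is a hypothetical sketch (its name is not from the thread): it pushes a 32-bit integer through a 32-bit float and reports whether the value comes back unchanged. Note that this only shows the loss near the top of the range, which is where a 9.23 value for a bright pixel sits; for small magnitudes a float actually resolves finer steps than 2^-23.

```c
#include <float.h>
#include <stdint.h>

/* Returns 1 if n survives a round trip through a 32-bit float, else 0. */
static int survives_float(int32_t n)
{
    volatile float f = (float)n;   /* volatile forces a real 32-bit store */
    return (int32_t)f == n;
}
```

For example, 16777216 (2^24) survives the round trip, while 16777217 and (255 << 23) + 1 do not.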


Paul.
Title: Re: Graphics a la FPU
Post by: raymond on January 13, 2005, 09:39:36 PM
donkey

Working with values already on the FPU is always faster than loading them from memory every time you need them. It only requires good management of the FPU registers. Good practice is to keep track of the content of each FPU register after each FPU instruction (and to know what effect those instructions will have on the registers).

Unless you are using REAL10 (80-bit) floats or QWORD integers from memory, it is generally not necessary to load a memory variable before using it with the content of ST0 (such as adding, multiplying, etc.).

In the modified following code:
- instructions which have not been changed are left with the original indent.
- instructions which have been modified have an extra 3 space indent
- new instructions have only 3 spaces less indent
- instructions which have been deleted are preceded with  ;;;
- comments have been added for the content of FPU registers

Your reciprocal constants have been declared with maximum precision for REAL4 floats. Otherwise, your conversion would not be accurate.

Your logic for computing the GREEN has been corrected.

Code has been added to correct the computed RGBs for underflow/overflow. Simply ANDing with FF is wrong. If you slightly decrease a value of 0, it would become negative, such as FFFFFFFE for -2. If you only AND it with FF, it would result in an intensity of FE, which is certainly not what you would expect. Similarly, slightly increasing a maximum value of FF could give 102h, which would result in only 02 when ANDed with FF; that color component is all but eliminated, again not what you want.
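
The difference between masking and clamping can be seen in a few lines of C. This is just a sketch of the idea with made-up function names, not the assembly used below:

```c
#include <stdint.h>

/* What "and D[RED],0FFh" does: keep the low byte, whatever the value was. */
static uint32_t mask_channel(int32_t v)
{
    return (uint32_t)v & 0xFFu;
}

/* What the added cleanup code does: saturate to the 0..255 range. */
static uint32_t clamp_channel(int32_t v)
{
    if (v < 0)   return 0;     /* underflow becomes black */
    if (v > 255) return 255;   /* overflow becomes full intensity */
    return (uint32_t)v;
}
```

With -2 as input, mask_channel gives FEh where clamp_channel gives 0; with 102h as input, mask_channel gives 02h where clamp_channel gives FFh.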

I tried to continue with your syntax except for the use of the @@: label which I don't know if you can use.

Raymond
YUV STRUCT
      Y     DD    ?
      U     DD    ?
      V     DD    ?
ENDS

YUV2RGB FRAME pYUV
      uses esi
      LOCAL RED   :D
      LOCAL GREEN :D
      LOCAL BLUE  :D
;;;      LOCAL garbage :D

CONST SECTION
n2p032 DD 2.0325203    ;1/0.492
n1p703 DD 1.7035775    ;1/0.587
n1p14       DD 1.1402509    ;1/0.877
   n114 DD 0.114
   n299 DD 0.299
;;; np509       DD 0.509
;;; np194       DD 0.194

CODE SECTION

mov esi,[pYUV]

;;; finit
   fld D[esi+YUV.Y]     ;load first, used several times
fld D[esi+YUV.V]  ;V   Y
;;; fld D[n1p14]
  fmul D[n1p14]   ;(V/.877)   Y
;;; fld D[esi+YUV.Y]
fadd ST0,ST1      ;(V/.877+Y)   Y

fist D[RED]       ;store rounded integer, keep value on FPU for reuse
;;; fxch ST0,ST1
;;; fstp D[garbage]

fld D[esi+YUV.U]  ;U   (RED)   Y
;;; fld D[n2p032]
   fmul D[n2p032] ;(U/.492)   (RED)   Y

;----------------------
;cleanup RED while FPU busy doing multiplication

   mov  eax, D[RED]
   or   eax,eax         ;test for negative
   jns  @F
   xor  eax,eax         ;replace with 0 if negative (underflow)
@@:
   cmp  eax,255
   jbe  @F
   mov  eax,255         ;replace with maximum if overflow
@@:
   mov  D[RED],eax
;----------------------

;;; fld D[esi+YUV.Y]
   fadd ST0,ST2   ;(U/.492+Y)   (RED)   Y

fist D[BLUE]      ;store rounded integer, keep value on FPU for reuse
;;; fxch ST0,ST1
;;; fstp D[garbage]

;;; fld D[esi+YUV.Y]
;;; fld D[n1p703]
;;; fmul

;;; ; Bring RED to the batters box...
   ;BLUE is currently in ST0
;;; fxch ST0,ST2
;;; fld D[np509]
   fmul D[n114]   ;(BLUE*0.114)   (RED)   Y

;----------------------
;cleanup BLUE while FPU busy doing multiplication

   mov  eax, D[BLUE]
   or   eax,eax
   jns  @F
   xor  eax,eax
@@:
   cmp  eax,255
   jbe  @F
   mov  eax,255
@@:
   mov  D[BLUE],eax
;---------------------

   fsubp ST2,ST0        ;(RED)   (Y-BLUE*0.114)

;;; ; Bring BLUE to the batters box...
   ;RED is now currently in ST0
;;; fxch ST0,ST1
;;; fld D[np194]
   fmul D[n299]   ;(RED*0.299)   (Y-BLUE*0.114)
   fsubp ST1,ST0        ;(Y-BLUE*0.114-RED*0.299)
   fmul D[n1p703]       ;(Y-BLUE*0.114-RED*0.299)/0.587 = GREEN

;;; ; Bring Y to the batters box
;;; fxch ST0,ST2
;;; fsub ST0,ST1
;;; fsub ST0,ST2
fistp D[GREEN]    ;ALL registers on the FPU are now free

;----------------------
;cleanup GREEN

   mov  eax, D[GREEN]
   or   eax,eax
   jns  @F
   xor  eax,eax
@@:
   cmp  eax,255
   jbe  @F
   mov  eax,255
@@:
   mov  D[GREEN],eax
;---------------------

;;; and D[GREEN],0FFh
;;; and D[RED],0FFh
;;; and D[BLUE],0FFh

mov eax,[BLUE]
shl eax,8
or eax,[GREEN]
shl eax,8
or eax,[RED]

RET
ENDF

RGB2YUV FRAME clrRGB, pYUV
uses esi
LOCAL RED :D
LOCAL GREEN :D
LOCAL BLUE :D

CONST SECTION
n877 DD 0.877
n492 DD 0.492
n114 DD 0.114
n299 DD 0.299
n587 DD 0.587

CODE SECTION

/*
Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y)
V = 0.877 (R - Y)
*/
;;; finit

mov esi, [pYUV]
;;; mov eax,[clrRGB]
;;; and eax,0FFh
;;; mov [RED],eax

;;; mov eax,[clrRGB]
;;; shr eax,8
;;; and eax,0FFh
;;; mov [GREEN],eax

mov eax,[clrRGB]

;per dioxin suggestion
   movzx ecx,al
   mov [RED],ecx
   movzx ecx,ah
   mov [GREEN],ecx

shr eax,16
and eax,0FFh
mov [BLUE],eax

; ######### Y
fild D[RED]       ;RED
   fild D[BLUE]           ;BLUE   RED
   fld  ST1               ;RED   BLUE   RED
;;; fld D[n299]
   fmul D[n299]   ;(RED*.299)   BLUE   RED
fld  ST1          ;BLUE   (RED*.299)   BLUE   RED
;;; fld D[n114]
   fmul D[n114]   ;(BLUE*.114)   (RED*.299)   BLUE   RED
fild D[GREEN]     ;GREEN   (BLUE*.114)   (RED*.299)   BLUE   RED
;;; fld D[n587]
   fmul D[n587]   ;(GREEN*.587)   (BLUE*.114)   (RED*.299)   BLUE   RED
   faddp ST1,ST0  ;(GREEN*.587+BLUE*.114)   (RED*.299)   BLUE   RED
   faddp ST1,ST0  ;Y   BLUE   RED
fst D[esi+YUV.Y]  ;Y   BLUE   RED

; ######### U
;;; fild D[BLUE]
   fsub ST2,ST0   ;Y   BLUE   (R-Y)
   fsubp ST1,ST0          ;(B-Y)   (R-Y)
;;; fld D[n492]
   fmul D[n492]   ;((B-Y)*.492)   (R-Y)
fstp D[esi+YUV.U] ;(R-Y)

; ######### V
;;; fild D[RED]
;;; fsub D[RED]   ;(R-Y)
;;; fld D[n877]
   fmul D[n877]   ;((R-Y)*.877)
fstp D[esi+YUV.V] ;ALL registers on the FPU are now free

RET
ENDF
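
As a cross-check for the two routines above, the same formulas can be written in plain C with 32-bit floats. This is a reference sketch under stated assumptions, not a translation: the names yuv_f32, rgb2yuv_ref, yuv2rgb_ref and round_clamp are hypothetical, and rounding is done half away from zero, whereas fist/fistp use the FPU's current rounding mode (round to nearest even by default).

```c
#include <stdint.h>

/* Mirrors the YUV STRUCT of three 32-bit floats. */
typedef struct { float y, u, v; } yuv_f32;

/* Round to nearest integer, then saturate to 0..255. */
static int round_clamp(float x)
{
    int i = (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
    if (i < 0)   return 0;
    if (i > 255) return 255;
    return i;
}

/* col is 00BBGGRR. */
static yuv_f32 rgb2yuv_ref(uint32_t col)
{
    float r = (float)(col & 0xFF);
    float g = (float)((col >> 8) & 0xFF);
    float b = (float)((col >> 16) & 0xFF);
    yuv_f32 o;
    o.y = 0.299f * r + 0.587f * g + 0.114f * b;
    o.u = 0.492f * (b - o.y);
    o.v = 0.877f * (r - o.y);
    return o;
}

static uint32_t yuv2rgb_ref(yuv_f32 in)
{
    /* Reciprocal constants as declared in the CONST SECTION above. */
    float r = in.v * 1.1402509f + in.y;
    float b = in.u * 2.0325203f + in.y;
    /* GREEN uses the unrounded RED and BLUE, as the FPU version does. */
    float g = (in.y - 0.114f * b - 0.299f * r) * 1.7035775f;
    return ((uint32_t)round_clamp(b) << 16)
         | ((uint32_t)round_clamp(g) << 8)
         |  (uint32_t)round_clamp(r);
}
```

Feeding yuv2rgb_ref the output of rgb2yuv_ref for a handful of colours is a quick way to see whether 32-bit floats round-trip for the values you care about.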

Title: Re: Graphics a la FPU
Post by: daydreamer on January 14, 2005, 01:52:17 AM
just for the fun, I am coding a SSE version
but I have to get my shuffles right first before I post it

ARGB    dw 00FFh,08040h ;ARGB format
ALIGN 16
CNSTRGB    REAL4 0.0,0.299,0.587,0.114
CBYRY   REAL4 0.492,0.877,0.492,0.877

msk    dq 000000FF000000FFh
.CODE
RGBtoYUV  PROC
    ;load constants before loop
    MOVAPS XMM7,[CNSTRGB]
    MOVAPS XMM6,[CBYRY]
    MOVQ MM2,[msk]
    PXOR MM1,MM1
    PINSRW MM0,[ARGB],0 ;load 16 bits at a time from ARGB
    PINSRW MM0,[ARGB+1],2
    PINSRW MM1,[ARGB+2],0
    PAND MM0,MM2
    PAND MM1,MM2 ;mask out upper half of 16bit numbers retrieved with PINSRW
   
    CVTPI2PS XMM0,MM0 ;mov and converts two dw->2floats A and R
    MOVLHPS XMM1,XMM0 ;movs to upper half two floats
    CVTPI2PS XMM1,MM1 ;mov and convert second two G and B
    MOVAPS XMM0,XMM1 ;copy in order to save B and R
       MULPS XMM0,XMM7
       MOVHLPS XMM2,XMM0
       ADDPS XMM0,XMM2 ;A + xG, xR+xB
               
       ;shufps ??? shuffle to line up for next add
       ADDSS XMM0,XMM2 ;final add
       MOVLHPS XMM0,XMM2 ;make two copies of Y
       ;pshufd instr to line up BY,RY
       SUBPS XMM0,XMM2
       MULPS XMM0,XMM6 ;multiply with last two constants
       ;xmm2 contains Y, xmm0 contains U and V     

    EMMS ;after looping thru all pixels in a picture
    ret
RGBtoYUV ENDP






Title: Re: Graphics a la FPU
Post by: daydreamer on January 17, 2005, 09:15:35 PM
I would appreciate help with getting the right values into SHUFPS; otherwise I will have to solve it with a crappy intermediate in memory using MOVUPS
Title: Re: Graphics a la FPU
Post by: oex on April 05, 2010, 03:49:58 PM
What am I not understanding? I can find a ton of RGB->YUV converters but when it comes to converting YUV to RGB there are no sample versions on google :P I'm starting to understand what the various YUV formats mean; my video camera records in I420/IYUV formats so I want to convert that back to the RGB/A values I'm more familiar with for editing.... (Yes I have gathered that to convert back again RGB->IYUV I will lose quality)

PS sorry if is silly question, I am still trying to construct a picture of this after a long day/night coding :lol

EDIT: I guess it is a silly question.... just read first post and it has conversion both ways ty Edgar :bg.... but heh I've bumped a topic that may have some life still in it :)

* Further Edit: I'm rather confused over the many YUV formats. My video camera software gives I420/IYUV formats and lists I420 (default) as 12 bit, which M$ doesn't list, so I'm assuming the above procs don't help.... Any thoughts appreciated

320                 40 01 00 00        @☺      biWidth
240                 F0 00 00 00        ≡      biHeight
12                  0C 00 00 00        ♀      biBits
808596553           49 34 32 30        I420      biCompression

115200              00 C2 01 00         biSize
0                   00 00 00 00         biXPelsPerMeter
0                   00 00 00 00         biYPelsPerMeter
808583169           01 00 32 30        ☺      biPlanes
Title: Re: Graphics a la FPU
Post by: oex on April 05, 2010, 07:01:47 PM
So.... If I understand correctly :/? Your methods were working on 4:4:4 YUV data....

YUV STRUCT
   Y   DD   ?
   U   DD   ?
   V   DD   ?
ENDS

Whilst I am working on 'downsampled' 4:2:2 data? Further I am working on 12 bit data rather than 96 bit data? So for me Y=8bit and U and V are 2 bits each?
Title: Re: Graphics a la FPU
Post by: FORTRANS on April 05, 2010, 09:52:04 PM
Quote from: oex on April 05, 2010, 07:01:47 PM
So for me Y=8bit and U and V are 2 bits each?

   Very unlikely.  It probably means a block of four Y samples,
two U samples, and two V samples.  Usually they are 8 bits
per sample.  But 12 is not out of the question.  Some of this
is mentioned in the JPEG specification for their usage.

Regards,

Steve
Title: Re: Graphics a la FPU
Post by: oex on April 05, 2010, 09:55:24 PM
I'm working off this: http://en.wikipedia.org/wiki/YUV/RGB_conversion_formulas#Y.27UV420p_.28and_Y.27V12.29

So far R value seems about right (10 off or so and I have the bits muddled) will soon know I think :lol....

115200              00 C2 01 00         biSize

on a 320x240 image: 115200 bytes x 8 / 12 bits per pixel = 76800 = 320x240 pixels, so at least that seems to add up
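
The 12 bits-per-pixel figure is consistent with a planar 4:2:0 layout, which is how I420 is generally documented: a full-resolution plane of 8-bit Y samples followed by U and V planes at half resolution in each direction. A small C sketch of the size calculation (the function name is made up for illustration):

```c
#include <stddef.h>

/* Bytes in one I420 frame, assuming 8-bit samples and planar 4:2:0:
   one Y per pixel, plus one U and one V per 2x2 block of pixels. */
static size_t i420_frame_bytes(size_t width, size_t height)
{
    size_t chroma_w = (width + 1) / 2;    /* round up for odd sizes */
    size_t chroma_h = (height + 1) / 2;
    return width * height + 2 * chroma_w * chroma_h;
}
```

For 320x240 this gives 76800 + 2 x 19200 = 115200 bytes, matching the size reported in the header dump.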