array of unions

thomas_remkus · May 25, 2007, 04:53:05 PM

I am attempting to create an array of unions and no odd syntax I try seems to work out.

WINDOW_BITS     EQU (WINDOW_SIZE_X * WINDOW_SIZE_Y)

;easy union struct so i can get the RGB values
PixRGB UNION
    rgb DWORD ?
   STRUCT colors
        red BYTE ?
        grn BYTE ?
        blu BYTE ?
        alf BYTE ?
   ENDS
PixRGB ENDS

.data?
    pix PixRGB WINDOW_BITS

I'm not sure how to get this working. Can someone help a little?

evlncrn8 · May 25, 2007, 05:43:52 PM

PixRGB STRUCT
    UNION                ; rgb and the four bytes share the same storage
        rgb DWORD ?
        STRUCT
            red BYTE ?
            grn BYTE ?
            blu BYTE ?
            alf BYTE ?
        ENDS
    ENDS
PixRGB ENDS

should do the trick

If in doubt, check masm32\include\windows.inc; there is some union stuff in there.

And to use it:

.data?

pix PixRGB <>

dsouza123 · May 25, 2007, 06:22:00 PM

Code Select


WINDOW_SIZE_X   EQU 6
WINDOW_SIZE_Y   EQU 4

WINDOW_BITS     EQU (WINDOW_SIZE_X * WINDOW_SIZE_Y)

.data?
  PixRGB UNION           ; easy union struct so i can get the RGB values
     rgb DWORD ?
     STRUCT              ; colors not needed
          red BYTE ?
          grn BYTE ?
          blu BYTE ?
          alf BYTE ?
     ENDS
  PixRGB ENDS

  pix PixRGB WINDOW_BITS dup (<?>)   ; 0..23 of PixRGB

  xip dd WINDOW_BITS dup (?)   ; equivalent using dwords

.code
start:
  mov [pix.grn + 9*4], 7   ; put 7 in grn, byte 37,  bytes go from 0 to 95

If you left colors in then the mov would be

Code Select


  mov [pix.colors.grn + 9*4], 7   ; put 7 in grn, byte 37

adding an extra level compared to rgb

Code Select


  mov eax, [pix.rgb + 9*4]
  mov ah, 7
  mov [pix.rgb + 9*4], eax

equivalent rgb code

Code Select


  and [pix.rgb + 9*4], 0FFFF00FFh
  or  [pix.rgb + 9*4], 700h
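
For anyone coming from the C/C++ side of this thread, here is a rough C sketch of the same layout and the same two ways of storing the green byte. It is only an illustration: the names Colors, grn_byte_offset, set_grn_by_field and set_grn_by_mask are made up for this example, and the mask version assumes a little-endian target such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SIZE_X 6
#define WINDOW_SIZE_Y 4
#define WINDOW_BITS   (WINDOW_SIZE_X * WINDOW_SIZE_Y)

/* The four component bytes, in the same order as the MASM fields. */
typedef struct {
    uint8_t red;
    uint8_t grn;
    uint8_t blu;
    uint8_t alf;
} Colors;

/* Same idea as the MASM PixRGB union: one DWORD overlaid on four bytes. */
typedef union {
    uint32_t rgb;
    Colors   c;      /* "c" plays the role of the optional "colors" name */
} PixRGB;

static PixRGB pix[WINDOW_BITS];   /* zero-initialised, like a .data? array */

/* Byte offset of element `index`'s grn field from the start of the array. */
static size_t grn_byte_offset(size_t index)
{
    assert(index < WINDOW_BITS);
    return index * sizeof(PixRGB) + offsetof(PixRGB, c) + offsetof(Colors, grn);
}

/* Field-style store: the C analogue of  mov [pix.grn + 9*4], 7 */
static void set_grn_by_field(PixRGB *p, uint8_t value)
{
    p->c.grn = value;
}

/* Mask-style store: the C analogue of the and/or pair on pix.rgb.
   Assumes little-endian, where grn occupies bits 8..15 of rgb. */
static void set_grn_by_mask(PixRGB *p, uint8_t value)
{
    p->rgb = (p->rgb & 0xFFFF00FFu) | ((uint32_t)value << 8);
}
```

Calling grn_byte_offset(9) gives the byte position that the "9*4" displacement plus the field offset reaches in the MASM version, and both setters leave the other three bytes of the pixel untouched.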

thomas_remkus · May 30, 2007, 09:46:33 PM

OK. I have tried for 5 days (putting in about 4 hours each day) to solve this and I'm just coming out flat. Now, frustrated, I have to post some more code here because I have been successful in getting MUCH of what I want but I can't seem to translate the rest.

Here's what I have ... I have recently been wowed by a certain person's fireworks display that was written in ASM. Yes, "ronybc". It was fantastic! So I started to decode what he had done because it was very impressive and motivating. What I started with was a simple concept of just getting some blur/bloom/tracers/fade on the screen. I chose the mouse and a circle as a natural shape. This was all because I couldn't really understand ronybc's work.

Next, I needed a timer so I used the simple WM_TIMER with windows because it was easy enough. From there I created a DIB section and started to play. The results were really cool and better than I expected. So, the trick has been to convert this to MASM because it has so many normal elements I need in a normal day. It adds, multiplies, uses the Win32 API, uses a struct, blah, blah, blah.

The idea is really easy ... just take the current pixel location and find the other surrounding pixel locations. You average the RGB together (Rs together, Gs together, Bs together) and make your current pixel that color. Everything defaults to black so you get like this cool tracer effect. The math runs very fast and I presume I would be able to get even better performance from MASM.

Here's where everything is stopped. I can't do a proper loop for the life of me and get information to/from my array. From my code:

Code Select

WINDOW_SIZE_X   EQU 400
WINDOW_SIZE_Y   EQU 400
WINDOW_BITS     EQU (WINDOW_SIZE_X * WINDOW_SIZE_Y)
BLUR_GRID       EQU 9
BLUR_COUNT      EQU 1

PixRGB UNION
    rgb DWORD ?
   STRUCT colors
        red BYTE ?
        grn BYTE ?
        blu BYTE ?
        alf BYTE ?
   ENDS
PixRGB ENDS

.data
    ActualSize  DWORD   -1   
    ActualBits  DWORD   -1

.data?
    diPix       PixRGB WINDOW_BITS dup(<0>)
    pt          PixRGB BLUR_GRID dup(<0>)

You can see that I have an array of "dd"/"PixRGB" values for the entire graphic and another array with just 9 elements. The idea is to loop through all the standard pixels and populate the array of 9 with the pixels to blur. In C/C++ this was really easy. In MASM I am just lost trying to get values in and out of registers while attempting to keep some values only in registers for performance.

Code Select

posIndex = dibIndex + bmi.bmiHeader.biWidth + 1;
					
if (posIndex <= maxIndex)
	pt[0].rgb = diPix[posIndex];

posIndex = dibIndex + bmi.bmiHeader.biWidth;

if (posIndex <= maxIndex)
	pt[1].rgb = diPix[posIndex];

posIndex = dibIndex + bmi.bmiHeader.biWidth - 1;

if (posIndex <= maxIndex)
	pt[2].rgb = diPix[posIndex];

posIndex = dibIndex + 1;

if (posIndex <= maxIndex)
	pt[3].rgb = diPix[posIndex];

pt[4].rgb = diPix[dibIndex];

posIndex = dibIndex - 1;

if (posIndex >= 0)
	pt[5].rgb = diPix[posIndex];

posIndex = dibIndex - bmi.bmiHeader.biWidth + 1;

if (posIndex >= 0)
	pt[6].rgb = diPix[posIndex];

posIndex = dibIndex - bmi.bmiHeader.biWidth;

if (posIndex >= 0)
	pt[7].rgb = diPix[posIndex];

posIndex = dibIndex - bmi.bmiHeader.biWidth - 1;

if (posIndex >= 0)
	pt[8].rgb = diPix[posIndex];

diPix[dibIndex] = RGB( \
	((pt[0].c.red + pt[1].c.red + pt[2].c.red + pt[3].c.red + pt[4].c.red + pt[5].c.red + \
	pt[6].c.red + pt[7].c.red + pt[8].c.red) / 9), \
	((pt[0].c.grn + pt[1].c.grn + pt[2].c.grn + pt[3].c.grn + pt[4].c.grn + pt[5].c.grn + \
	pt[6].c.grn + pt[7].c.grn + pt[8].c.grn) / 9), \
	((pt[0].c.blu + pt[1].c.blu + pt[2].c.blu + pt[3].c.blu + pt[4].c.blu + pt[5].c.blu + \
	pt[6].c.blu + pt[7].c.blu + pt[8].c.blu) / 9));

The above just becomes some sort of soup of attempt after attempt to just get the syntax correct. I can't seem to make my syntax work at all. My loop at least seems to work:

Code Select

            ;ecx - used as the actual byte offset for the rgb bytes
            mov ecx, 0
            
            mov eax, BLUR_COUNT
            .if eax > 0
                mov blurCount, 0
                blur_loop:
                    mov dibIndex, 0
                    pix_loop:
                        ;set edx to the current index location offset
                        mov eax, dibIndex
                        mul ecx
                        mov edx, eax
                        
                        ;need blur code here.
                
                        ;increase the byte offset
                        add ecx, 4
                        
                        ;increase the index count
                        mov eax, dibIndex
                        inc eax
                        mov dibIndex, eax
                        
                        ;check to see if we are at the end of the byte array
                        cmp eax, ActualSize
                    jnz pix_loop
                    
                    ;increase the blur count so we know how many times we need to blur the image
                    add blurCount, 1
                    mov eax, blurCount
                    
                    ;see if we are at the end of our blur count
                    cmp eax, BLUR_COUNT
                jnz blur_loop
            .endif

... But I honestly have no idea because I can't get output to see if it's doing something. Hey, it's not flashing any more errors asking to send my information to Microsoft (shoot, I wish I could turn that off!) but I can't get what I want either.

Can someone help me to UNDERSTAND what I need to do to make this work? I'm hoping that by providing enough of my own code (imperfect as it is) someone will be able to explain this to me in terms of my project so I might relate better. I do NOT want to have my code completed for me so please don't post a complete solution. Some guidance from you mighty programmers is all that I ask. Once complete, I would then like to show this off and learn how I could have made it better at that time.

Thanks for any/all help!!

Humbly,
thomas :eek

Tedd · May 31, 2007, 11:12:41 AM

Well your loops look okay, but since you're not using the actual value of blurCount I would probably start it at BLUR_COUNT and then decrease blurCount until zero - but, yes I know, no optimisation yet :P

I'm guessing it's probably the edge-checking that's messing you around, so I would ignore it for now - in fact, as it will also be causing a major slow-down in the main loop, I'd remove it and do the edge pixels as a separate loop (since there's really no need to check whether the middle pixels are on the edges).
Soo.. Start on the second row, second column (1,1), and then do that whole row to lastColumn-1, and then each row until lastRow-1 -- then you don't need to check if you're on an edge, and it simplifies that code a bit.
Secondly, instead of copying the pixels to another array and then adding up the components, have 3 counters (red_sum, grn_sum, blu_sum) and add on each of the components as you go through each pixel; then you just divide at the end to get the average -- wow, simplification and speed-up :bg

Re-arranging your code a bit for summing the (middle; no edge-checking) pixels -- should make it easier to implement..

Code Select

//** TOP **
posIndex = dibIndex + bmi.bmiHeader.biWidth - 1;
pt[0].rgb = diPix[posIndex];

++posIndex;
pt[1].rgb = diPix[posIndex];

++posIndex;
pt[2].rgb = diPix[posIndex];

//** MIDDLE **
posIndex -= bmi.bmiHeader.biWidth + 2;
pt[3].rgb = diPix[posIndex];

pt[4].rgb = diPix[dibIndex];

posIndex = dibIndex + 1;
pt[5].rgb = diPix[posIndex];

//** BOTTOM **
posIndex -= bmi.bmiHeader.biWidth + 2;
pt[6].rgb = diPix[posIndex];

++posIndex;
pt[7].rgb = diPix[posIndex];

++posIndex;
pt[8].rgb = diPix[posIndex];

(You had this doing the pixels right-to-left, I changed it left-to-right - no big difference really, as they're only averaged anyway, but I think it's easier to understand this way.)
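
The running-sum idea can be sketched in C roughly like this. It is an illustration under a few assumptions, not the code from the prototype: pack_rgb, blur_pixel and blur_interior are made-up names, pixels are plain 32-bit values with red in the low byte (the same order as the RGB macro), and the result goes into a separate dst buffer so that already-blurred pixels are not read back in, which differs from the in-place version earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Pack three 8-bit components: red in the low byte, then grn, then blu. */
static uint32_t pack_rgb(uint32_t red, uint32_t grn, uint32_t blu)
{
    return (red & 0xFFu) | ((grn & 0xFFu) << 8) | ((blu & 0xFFu) << 16);
}

/* Average the 3x3 neighbourhood around interior pixel (x, y) of src,
   keeping one running sum per component instead of a 9-element array. */
static uint32_t blur_pixel(const uint32_t *src, int width, int x, int y)
{
    uint32_t red_sum = 0, grn_sum = 0, blu_sum = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            uint32_t p = src[(y + dy) * width + (x + dx)];
            red_sum += p & 0xFFu;
            grn_sum += (p >> 8) & 0xFFu;
            blu_sum += (p >> 16) & 0xFFu;
        }
    }
    return pack_rgb(red_sum / 9, grn_sum / 9, blu_sum / 9);
}

/* Blur every interior pixel of src into dst. Edge pixels are copied
   unchanged here; they would get their own loop with edge checks. */
static void blur_interior(const uint32_t *src, uint32_t *dst,
                          int width, int height)
{
    assert(width >= 3 && height >= 3);
    for (int i = 0; i < width * height; ++i)
        dst[i] = src[i];
    for (int y = 1; y < height - 1; ++y)
        for (int x = 1; x < width - 1; ++x)
            dst[y * width + x] = blur_pixel(src, width, x, y);
}
```

Because the loops start at (1,1) and stop one short of the last row and column, none of the nine reads can fall outside the buffer, so the inner code needs no index checks at all.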

thomas_remkus · May 31, 2007, 01:18:19 PM

Thank you for the suggestions! Those logic optimizations are really going to help performance, it's true, but I'm having issues getting my values with MASM. I like the idea about just adding each R,G,B to a value and then getting that average, but I can't seem to get the values in MASM. Things I have tried are:

Code Select

;get a corner
mov eax, ecx
add eax, bmi.bmiHeader.biWidth
inc eax

.if eax <= maxIndex
   mov ecx, biPix[eax].rgb
   mov pt[0].rgb, eax
.endif

I will update the proto project that's in C/C++ with the enhancements you suggested. They are very good and logical and are sure to increase my performance. For now, I would still like to get this working as such because that's what's embedded in my head. What's up with my syntax that I can't seem to get this RGB value? And WOW, I feel constrained by registers so much. Does it have something to do with LEA?

Tedd · May 31, 2007, 05:34:17 PM

"biPix[eax]" is taken to mean "[OFFSET biPix + eax]" - which is what you want, except that the array 'index' (eax) isn't automatically sized for you, so array elements better be a byte each otherwise it doesn't do what you want. You'll have to multiply the array index by the size of each element (luckily for us, that's only 4 and it can be done in the same instruction = "mov ecx, biPix[4*eax].rgb"; equally for accessing "pt")

ecx = biPix[eax]
pt[0] = eax
..notice the unfortunate mistake? :lol
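
To make the scaling point concrete, here is a small C sketch that does by hand what the assembler addressing does. The function names load_scaled and load_unscaled are invented for this example; C normally hides the multiplication inside the [] operator, so the code goes through a byte pointer to expose it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load element `index` the way  mov ecx, [base + 4*eax]  does:
   the index is scaled by the element size to get a byte offset. */
static uint32_t load_scaled(const uint32_t *base, size_t count, size_t index)
{
    const unsigned char *bytes = (const unsigned char *)base;
    uint32_t value;
    assert(index < count);
    memcpy(&value, bytes + index * sizeof(uint32_t), sizeof value);
    return value;
}

/* Load from a raw byte offset, the way  mov ecx, [base + eax]  does.
   With an offset of 1 the four bytes read straddle elements 0 and 1. */
static uint32_t load_unscaled(const uint32_t *base, size_t count,
                              size_t byte_offset)
{
    const unsigned char *bytes = (const unsigned char *)base;
    uint32_t value;
    assert(byte_offset + sizeof value <= count * sizeof(uint32_t));
    memcpy(&value, bytes + byte_offset, sizeof value);
    return value;
}
```

With a three-element array, load_scaled(a, 3, 1) returns a[1], while load_unscaled(a, 3, 1) returns a mix of bytes from a[0] and a[1], which is the kind of garbage the unscaled MASM access produces.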

One extra thing I would suggest is that you don't allocate large arrays (i.e. diPix) in the data/data? section, anything large-ish should really be allocated using the memory allocation functions (if you notice how long it's taking to assemble your code - that's why.)

thomas_remkus · May 31, 2007, 09:03:22 PM

Very cool information. I'll try that when I get home (working right now).

As far as the array. I notice if I put ...

Code Select

diPix PixRGB WINDOW_BITS dup(<0>)

... in the ".data" section the compile takes longer and the EXE is larger. If I place this in the ".data?" section it's fast and small. The code to get to this is the same. Is there a performance difference? I would imagine that anything that does not need to use dynamic memory would be much faster and that's what I'm attempting to go for.

Tedd · June 01, 2007, 11:24:14 AM

Yes, the .data section is stored in the exe itself, and then copied into memory at load-time, whereas the .data? section is taken to be inherently 'empty' so it doesn't need to be stored - its size is given in the exe and then it's allocated at load-time (and automatically zeroed.)

As for allocating memory at run-time Vs at load-time, there's no difference once the program is actually running (unless you start taking memory alignment and caching effects into account - then I'd say run-time can do slightly better). At load-time, the .data? section is allocated for you and zeroed - it's just a block of memory with enough space for your declared variables. If you allocate at run-time (using one of HeapAlloc, GlobalAlloc, VirtualAlloc) then you get back a block of memory to do with as you wish. In the case of large buffers, I would generally recommend the latter (this may largely be personal preference, but it feels 'dirty' allocating large amounts in the .data? section; and too much is definitely a bad thing - sometimes it will cause the program to refuse to load, or crash randomly.)

Constantly allocating memory dynamically is obviously going to be slower simply due to repeatedly allocating, freeing, allocating, allocating.. etc, but if you just allocate one big block once then there isn't any performance hit. VirtualAlloc is the best (though slightly harder to use) for this kind of thing - it just allocates 'raw' blocks of memory with no decoration (the others have a little overhead for keeping track of variable size allocations and block chains; they actually end up using VirtualAlloc anyway, and allocate chunks from that 'pool.')
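
The "allocate one big block once" pattern looks roughly like this in C. This is a portable sketch only: it uses calloc as a stand-in for the Win32 allocation functions named above, and alloc_pixels and fade_frame are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define WINDOW_SIZE_X 400
#define WINDOW_SIZE_Y 400
#define WINDOW_BITS   (WINDOW_SIZE_X * WINDOW_SIZE_Y)

/* Allocate the whole pixel buffer once, at start-up. calloc returns
   zeroed memory, which matches what a .data? array gives at load time. */
static uint32_t *alloc_pixels(size_t count)
{
    return calloc(count, sizeof(uint32_t));
}

/* Per-frame work takes the buffer by pointer; nothing is allocated or
   freed here, so the allocator is never touched inside the hot loop. */
static void fade_frame(uint32_t *pixels, size_t count)
{
    assert(pixels != NULL);
    for (size_t i = 0; i < count; ++i)
        pixels[i] = (pixels[i] >> 1) & 0x7F7F7F7Fu;  /* halve each byte */
}
```

The buffer would be created once with alloc_pixels(WINDOW_BITS), passed to fade_frame on every timer tick, and released with free when the window closes. Inside the loop the access pattern is the same as for a load-time array.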

thomas_remkus · June 01, 2007, 12:44:55 PM

.data and .data? ... that's very good information. I think I read about this but now it seems to hit home. Thanks for the explanation.

When working with some generic testing in C++ I wanted to see for myself the effects of brute force password creation. I created a small array of char of 21 elements. Then I just looped through and tried to make my name from all the standard characters. I then did the test with allocated memory (malloc) and the same test yielded such different results I might as well have given up.

From this I learned ... brute force password creation would have been longer than my lifetime, and that allocated memory was the worst thing in the world. Since then, only for high performance needs I try to keep significant arrays as load-time arrays. When running the prototype of this application I tried both methods and found this to be true and increased performance by a factor of 3. That's why in my translation of proto (c/c++) to "production" (masm) it's in there. After I get this working, I will try VirtualAlloc.

Thanks for the info!

Tedd · June 01, 2007, 07:10:57 PM

Quote from: thomas_remkus on June 01, 2007, 12:44:55 PM
When working with some generic testing in C++ I wanted to see for myself the effects of brute force password creation. I created a small array of char of 21 elements. Then I just looped through and tried to make my name from all the standard characters. I then did the test with allocated memory (malloc) and the same test yielded such different results I might as well have given up.

Without seeing the C++ code, I can only guess, but it sounds like there was a lot of object creation/destruction going on, and that will have contributed to the slow down; or if the array was passed into a function, or any of the other numerous situations where a copy is secretly created for you. If you simply allocate a static block and then re-use that repeatedly (pass by pointer, or reference), I see no reason for it to cause a performance difference.
