I have an two-dimensional array of 1-byte values stored in row-major order.
The base address of the array is in EBX.
The row is in ECX
the column is in EAX
The address of any particular byte in the 2-dimensional array is:
(row * COLUMN_COUNT) + col
What's the ideal way for accessing an element of a two-dimensional array
where the element size is a single byte?
I suppose there's no way to get around that multiplication (except in special circumstances).
If I'm accessing every element of the array, adding column count successively is the same as multiplication.
But I'm trying to find the ideal way to just access a single element. Am I forced to multiply?
For the general case I can see no way to avoid the multiply. But if you could arrange for the number of columns to be a power of two, then you could replace the multiply with a scale factor (2, 4, 8), or a left shift (larger powers of two).
Then again, for accessing the array randomly the access time might be more a function of caching effects than of the method used to generate the address.
yeah multiplying is the way to go, it's fast though so there's no worries. Divide is what's slow, but using this tool http://www.masm32.com/board/index.php?topic=9937.0, you can divide instead by multiplying with constants.
also if you want to get real optimized you can do a cache like system, where the most used/accessed array elements are pushed to the front of that or another array so you can get to em faster.
There is a way, but I don't know wheter is faster or slower. The advantage is it can expand or shrinks dynamicly.
1)Allocate a 1-D column array which contains pointers of row.
2)Allocate all 1-D rows and put theirs offsets into the column.
So you can simply access any element without multipication.
I know I'm not good with words so please look the image in the attachment.
If two cycles per load is not too slow for you...
214 cycles for 100*load element
yeah.. array-of-arrays are not a performance win
The big performance drain is when iterating over the entire set, which is simple with a standard sequential setup (dont even need to multiply...) but not so simple with an array-of-arrays setup... and allocating an array-of-arrays is also an annoying step, especially when taken to 3D or higher .. allocating length*width*height*elementsize is so much nicer.
If you know your row or column count, what wrong with using the array count for the first dimension and simply adding the count to get the second ?