New image library

raymond · March 21, 2006, 04:17:08 AM

Several months ago, I attempted to write a program to display image files (mainly JPG) as a "slide show". The only assembly algo available at the time seemed to be Ernest Murphy's. However, I had numerous problems with it and decided to write my own.

After some two months of "intensive labor", with barely any description of all the details anywhere on the web for this complicated encoding format, I finally managed to render most JPG images using strictly assembly instructions. Being quite familiar with the BMP format, I also quickly wrote a procedure for loading such files. I then attacked GIF files, which proved to be relatively simple to decode (compared to JPG files).

I have now prepared individual modules of those procedures and made them part of a new library for rendering images. I have also included a procedure to identify the supported image files according to their header (regardless of the file extension), and then call the appropriate procedure to render the image if so indicated.
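For readers wondering how identification by header can work, here is a minimal sketch in Python rather than assembly. It is not the library's procedure; the function name `detect_image_format` and the table layout are made up for illustration. It only relies on the well-known leading bytes of the three supported formats.

```python
# Illustrative only: detect_image_format is a hypothetical helper,
# not a procedure from the library discussed in this thread.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPG"),   # JPEG files start with the SOI marker FF D8, then another marker
    (b"BM", "BMP"),             # Windows bitmap file header
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]

def detect_image_format(header: bytes):
    """Return 'JPG', 'BMP', 'GIF', or None, looking only at the leading bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None
```

Reading the first few bytes of the file and passing them in is enough; the file extension is never consulted.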

I'm currently still trying to understand the PNG format which seems to be a cross between JPG and GIF formats. When I finally succeed (failure is not an option :snooty:), it will also be added to the library.

Although I have tested the current procedures extensively on my system (P3-550 with Win98SE), I still need feedback from other potential users before I officially issue the library. The attached zip file only contains the INC and LIB files, in addition to a preliminary Help file. The source code will be made available later.

The Help file does contain an example of code to render a valid JPG, BMP or GIF image in a window.

If feedback is positive, this library could eventually be considered as an alternative to Ernest Murphy's algo.

Raymond

[attachment deleted by admin]

hutch-- · March 21, 2006, 04:40:02 AM

Ray,

I just had a read through the help file and it is looking like a very useful range of functionality to have available. Ernie's work has been fixed to some extent by a number of members of both this and the win32asm forum, but having newer, up-to-date versions is a far better prospect, and they can be tweaked here and there if any problems are discovered.

These will be appreciated by many when you are satisfied with them.

Darrel · June 30, 2006, 09:41:47 AM

Hi Raymond,

A suggestion for your library: change the second argument of loadJPG to a pointer to a DIBINFO structure and place the following code in your procedure.

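Code:

; Layout assumed from the offsets used below: bitmap handle at +0,
; pointer to the pixel bits at +4, bitmap header at +8
INVOKE	GetDC,NULL			; screen DC, used only as a reference DC
mov	hdcdesktop,eax

mov	edx,lpDIBINFO
mov	ecx,edx
add	edx,4				; edx -> field that receives the pointer to the bits
add	ecx,8				; ecx -> header passed as the BITMAPINFO
INVOKE	CreateDIBSection,hdcdesktop,ecx,DIB_RGB_COLORS,edx,NULL,NULL

mov	edx,lpDIBINFO			; edx is not preserved across the call, reload it
mov	DWORD PTR[edx],eax		; store the returned HBITMAP in the first field
INVOKE	ReleaseDC,NULL,hdcdesktop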

My concern is: if you're returning a pointer to a DIBINFO structure, how do I free the memory when I am done with it?

I am writing a bitmap editor myself and believe I have a grasp on everything needed to read a JPG except for the Inverse Discrete Cosine Transform step. Will you help me with this algorithm?

I haven't looked it over yet, but puff.c in the folder contrib\puff of the zlib C source code is supposed to provide more detail on the deflate algorithm used for PNG compression. Have you found a better explanation of the deflate algorithm than RFC 1951?

Attached is a copy of the bitmap editor I am working on.

Regards,

Darrel

Edit: Go here for a copy of the bitmap editor.

Mark_Larson · June 30, 2006, 01:24:20 PM

Great stuff Raymond! :) I'm gonna look at it at work :)

raymond · July 01, 2006, 04:32:23 AM

Quote: My concern is if you're returning a pointer to a DIBINFO structure, how do I free the memory when I am done with it?

The DIBINFO structure is a library specific structure. It is described and included in the library's .inc file.

I have to assume that you downloaded the RFimageLibV09.zip which contains a Help file. The "Example" section of that Help file contains information on how to free the memory correctly. Let me know if it needs more clarification and/or if it answers your question.

Raymond

Darrel · July 01, 2006, 02:15:20 PM

I read the "Example" section of the Help file. I still don't know how to free the memory (48 Bytes) used by the DIBINFO structure.

It's good to see someone else working with image file formats.

btw: Thanks for making the FPU tutorials available, they have been very beneficial.

Regards,

Darrel

stanhebben · July 01, 2006, 02:39:01 PM

Some time ago I wrote a png reader - in assembly.

You will have to decompress zlib data, for which I used the default zlib library.

The main difficulty with png is that it supports many bit depths and color formats, so you'll have quite some work. (the reader I wrote only understands some of the formats)

I don't have the code on this computer, but I could send it if you're interested.

Stan

Darrel · July 01, 2006, 06:32:57 PM

Stan,

I am definitely interested.

Thanks,

Darrel

raymond · July 02, 2006, 02:24:48 AM

Quote: I read the "Example" section of the Help file. I still don't know how to free the memory (48 Bytes) used by the DIBINFO structure.

Whenever you assume something.....
I had assumed you were talking about the created DIB. As for the DIBINFO structure itself, you cannot free the memory taken up by it since it is part of your global variables in your uninitialized .data? section. (The JPG module actually adds over 8 KB of global variables to that .data? section and a little over 100 bytes to your initialized .data section.)

I also started last winter to work on a module to read PNG files. I didn't think that anything could be as bad as the JPG format (which took me about 2 months to decipher). Whoever designed that PNG format must have been ????. For example, Huffman encoding is used for the data and for the Huffman tables themselves; however, the coded bits are packed from left-to-right in one case but right-to-left in the other case!!!

And, as Stan pointed out, the range of bit depths and color formats supported, coupled with 5 different filters, makes it a nightmare to cover all the possibilities.
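On the bit-packing point: RFC 1951 (the deflate spec used inside PNG) does define two orders. Bits are always taken from each byte starting at the least-significant bit, but plain fields are assembled LSB-first while Huffman codes are assembled MSB-first. A rough Python sketch of a reader that handles both follows. The class and method names are invented for illustration, and a real decoder would walk the Huffman table bit by bit instead of reading a fixed code length.

```python
class BitReader:
    """Illustrative deflate-style bit reader (names are hypothetical)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read_bit(self) -> int:
        # Bits are consumed from each byte starting at bit 0 (the LSB).
        byte = self.data[self.pos >> 3]
        bit = (byte >> (self.pos & 7)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        # Plain field (lengths, extra bits): first bit read is the LSB.
        value = 0
        for i in range(n):
            value |= self.read_bit() << i
        return value

    def read_code_bits(self, n: int) -> int:
        # Huffman code: first bit read is the MSB of the code.
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value
```

The same three stream bits 1, 1, 0 therefore yield 3 as a plain field but 6 as a Huffman code.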

Have fun

Raymond

hutch-- · July 02, 2006, 02:30:09 AM

Ray,

It sounds like the "experts" in the design group for JP "experts" G have struck again. :P

stanhebben · July 02, 2006, 09:58:18 AM

Here is the png reader I wrote some time ago. It decodes the file to a simple structure in ARGB format.

So far it only decodes two formats, 24-bit RGB and 32-bit ARGB, but it can easily be extended to support all other formats. All filter types are supported (up, sub, average, paeth); interlacing is not supported.

It's part of a larger project, and will need some modifications before you can use it in your own program (I haven't had time for that yet). I plan on rewriting the reader so it supports any png format, and releasing it as an open-source library. But I'm currently busy with another project, so that will be for later.
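For anyone who has not met the PNG filters yet, here is a small Python sketch of undoing them on one scanline, following the filter definitions in the PNG specification. It is not taken from the attached reader; the names `paeth_predictor` and `unfilter_scanline` are made up for this example.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """a = left, b = above, c = upper-left; pick the one closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def unfilter_scanline(filter_type: int, filtered: bytes, prior: bytes, bpp: int) -> bytes:
    """Reverse one PNG filter. prior is the previous unfiltered scanline
    (all zeros for the first line); bpp is bytes per complete pixel."""
    out = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = out[i - bpp] if i >= bpp else 0      # byte to the left
        b = prior[i]                             # byte above
        c = prior[i - bpp] if i >= bpp else 0    # byte above-left
        if filter_type == 0:      # None
            pred = 0
        elif filter_type == 1:    # Sub
            pred = a
        elif filter_type == 2:    # Up
            pred = b
        elif filter_type == 3:    # Average
            pred = (a + b) // 2
        elif filter_type == 4:    # Paeth
            pred = paeth_predictor(a, b, c)
        else:
            raise ValueError("unknown PNG filter type")
        out[i] = (x + pred) & 0xFF               # arithmetic is modulo 256
    return bytes(out)
```

Each scanline in the decompressed data starts with its own filter-type byte, so the type can change from line to line.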

The zlib library is supplied too. More info about zlib at www.zlib.org.

Stan

[attachment deleted by admin]

raymond · July 04, 2006, 02:26:06 AM

Quote: I am writing a bitmap editor myself and believe I have a grasp on everything needed to read a JPG except for the Inverse Discrete Cosine Transform step. Will you help me with this algorithm?

Sorry for the delay in answering that question. I did that work more than 6 months ago and I had to "recycle" myself on my algo.

The JPG format is usually based on processing 8x8 blocks of pixels. The Inverse Discrete Cosine Transform (IDCT) multiplies each of the 8 components of a line by a specific cosine value and adds the products up, once for each component of that line in an intermediate block. Then each of the 8 components of each column of that intermediate block is multiplied by a specific cosine value and the products are added up for each component of the transformed block (the cosine values used for the lines can be reused for the columns). For 24-bit color, you need to process 3 separate blocks. The IDCT is the most computationally intensive part of JPG conversion because of all those multiplications.

My approach was to precalculate the required 64 cosine values in an array (otherwise, you wouldn't render a large JPG until doomsday). After I got the algo working correctly using FPU instructions, I converted it using MMX instructions and single precision floats. The speed increase was very significant.
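The row-then-column scheme with a precomputed table of 64 values can be sketched in a few lines of Python. This is a plain floating-point illustration of the standard JPEG IDCT formula, not the attached MMX procedure; `COS_TABLE`, `idct_1d` and `idct_8x8` are names chosen for the example.

```python
import math

N = 8

# 64 precomputed values: COS_TABLE[x][u] = 0.5 * C(u) * cos((2x+1) * u * pi / 16),
# with C(0) = 1/sqrt(2) and C(u) = 1 otherwise (standard JPEG scaling).
COS_TABLE = [
    [(math.sqrt(0.5) if u == 0 else 1.0) * 0.5 * math.cos((2 * x + 1) * u * math.pi / 16)
     for u in range(N)]
    for x in range(N)
]

def idct_1d(coeffs):
    """8-point inverse DCT: each output is 8 multiplies and 7 adds."""
    return [sum(COS_TABLE[x][u] * coeffs[u] for u in range(N)) for x in range(N)]

def idct_8x8(block):
    """block is an 8x8 list of dequantized coefficients; returns 8x8 samples."""
    # Pass 1: transform every line into an intermediate block.
    tmp = [idct_1d(row) for row in block]
    # Pass 2: transform every column of the intermediate block,
    # reusing the same cosine table.
    out = [[0.0] * N for _ in range(N)]
    for col in range(N):
        column = idct_1d([tmp[r][col] for r in range(N)])
        for r in range(N):
            out[r][col] = column[r]
    return out
```

For 8-bit baseline JPG the decoder then adds 128 to each sample and clamps it to 0..255. A block with only a DC coefficient comes out flat, which makes a handy first test.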

Attached is my algo for computing the table of required cosines and my procedure using MMX instructions for computing the IDCT of an 8x8 block. Hope you can understand it.

Raymond

[attachment deleted by admin]

Ossa · July 04, 2006, 09:04:06 AM

Raymond, I don't know if you are aware, but there is a much faster algorithm for this. If you look at the values in the IDCT matrix, you will see a lot of repeated and symmetric values. Therefore the algorithm can split the 8x8 IDCT matrix (and the DCT matrix; they are just transposes of each other, as they are orthonormal) into one 4x4 and two 2x2 submatrices. I forget the name, but I do remember that the reduction is as follows:

Multiplies: 64 -> 22
Additions: 56 -> 28

which is significant.

OK, I just googled and got this page (which makes me laugh - it's written by the guy who gave the lectures on this subject): http://cnx.org/content/m11093/latest/
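The two properties mentioned above (transpose equals inverse, and lots of symmetry) are easy to check numerically. The Python sketch below only verifies them for the orthonormal 8-point DCT matrix; it does not implement the fast algorithm itself, and the helper names are invented for the example.

```python
import math

N = 8

def dct_matrix():
    """Orthonormal 8-point DCT-II matrix: row u, column x."""
    return [
        [(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
         * math.cos((2 * x + 1) * u * math.pi / (2 * N))
         for x in range(N)]
        for u in range(N)
    ]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

C = dct_matrix()

# Orthonormal: C times its transpose is the identity, so the IDCT matrix is C transposed.
product = matmul(C, transpose(C))
is_orthonormal = all(
    abs(product[i][j] - (1.0 if i == j else 0.0)) < 1e-12
    for i in range(N) for j in range(N)
)

# Even rows read the same backwards; odd rows flip sign when reversed.
even_rows_symmetric = all(
    abs(C[u][x] - C[u][N - 1 - x]) < 1e-12 for u in range(0, N, 2) for x in range(N)
)
odd_rows_antisymmetric = all(
    abs(C[u][x] + C[u][N - 1 - x]) < 1e-12 for u in range(1, N, 2) for x in range(N)
)
```

Because of that mirror symmetry, the even and odd coefficients can be handled as two independent 4-input problems, which is presumably where the smaller submatrices come from.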

Anyway, hope it helps,
Ossa

raymond · July 04, 2006, 08:44:51 PM

Thanks Ossa.

The algo which I posted was my very first attempt to use MMX instructions. I am thus a newbie with that set. Maybe someone more familiar with it could reply if that "faster" algo could make use of MMX instructions.

Otherwise, with MMX, the 64 mults become 16 mults and a lot of additions are part of the pmaddwd instruction. I am assuming that the 16 parallel MMX multiplications are close to being as fast as 16 regular multiplications. If so, the 16 mults with MMX may be faster than the 22 regular multiplications of the "faster" algo.
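For reference, this is what a single pmaddwd does, emulated in Python: four signed 16-bit multiplies, with adjacent products summed into two signed 32-bit results. That is how 64 multiplies can be counted as 16 instructions. The function name mirrors the instruction, but the code is only an emulation for illustration; how the 16-bit operands are scaled is left out.

```python
def to_int16(v: int) -> int:
    """Interpret the low 16 bits of v as a signed word."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def pmaddwd(a, b):
    """Emulate PMADDWD on two 64-bit operands given as four signed words each.

    Returns two signed 32-bit values:
      [a0*b0 + a1*b1, a2*b2 + a3*b3]
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each operand must hold four 16-bit words")
    out = []
    for i in (0, 2):
        s = to_int16(a[i]) * to_int16(b[i]) + to_int16(a[i + 1]) * to_int16(b[i + 1])
        s &= 0xFFFFFFFF                      # the sum wraps to 32 bits
        out.append(s - 0x100000000 if s & 0x80000000 else s)
    return out
```

An 8-term dot product for one IDCT output thus takes two such instructions plus a few adds to combine the partial sums.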

Please anyone correct me if my assumptions are wrong.

Raymond

Ossa · July 04, 2006, 10:41:22 PM

Nearly. The sped-up algo still uses a 4x4 matrix, which means that those 16 mults can be reduced to 4 SSE mults. This leaves us with a maximum of 10 equivalent mults. Also, the algorithm can use the general purpose registers to do the data manipulation overheads whilst the SSE instructions are executing on the other execution ports (IIRC, that is). If I have time (unlikely for the next month or so unless I get bored) I will code a drop-in replacement for your code.

[edit] Just noticed that your code uses word sized values (can't remember the required bit-depth for JPEG etc) so I'm unsure if I can get the speed-up noted above [/edit]

Ossa
