The zip contains a program with source that I have worked on for a couple of days, involving RLE byte compression/decompression.
It is buggy and in the alpha stage, but feel free to test and comment away. It works best on files smaller than 2 MB, as it
generates a two-byte key which would probably already exist in a larger file and mess up the decompression algo.
Click the 'Load File' button to set the input filename and 'Save File' to set the output (restored) filename. A temporary file
called 'rlecoded.tmp' is created on your hard disk.
Click the 'encode' button: it will display the actual and compressed file sizes, and write the temp file.
Click the 'decode' button: it will reverse the numbers (the 'actual' label meaning compressed size and the 'compressed' label meaning actual restored size)
and write to the save filename.
That's pretty much it for now; I'm sick as a dog and it sucks. Some of the restored files, such as .exe's, come back larger than the original. I'm not
sure why yet, but for the most part they execute and seem to function normally.
http://drarem.ms11.net/rlepkr.zip - approx 9Kb download
drarem,
I'm interested in this topic and will look at your source.
Can you explain the algorithm you are using ?
Rgs, striker
Yes, it looks at the data byte by byte, for anything repeating.
It is put into a generic string such as this:
[key][occurrences][character]
For the key I generate a 2-byte key; the default is AB. It is not complete, but it would add 1 to A and search through the file looking for that key combination, in order to create a unique key that won't confuse the decompressor.
[A][#occurrences][byte]
At the beginning of the compressed file is the key header, the key which the decompressor uses to decompress the file. Since the key sequence takes up 4 bytes, there's no point in counting anything repeating less than 4 as the key sequence would add to the file.
So if you had a data file:
XXOOOOOOXXXXABABCCC
Compressed would look like:
BBXXBB*XXXXABABCCC
* = funny symbol which is value of 6.
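A minimal sketch of this scheme (in Python rather than asm; the two-byte key search is omitted, the key 'BB' is simply assumed and written once as a header, and a run is only encoded when it is longer than the 4-byte key sequence, matching the example):

```python
def rle_encode(data: bytes, key: bytes = b"BB") -> bytes:
    out = bytearray(key)                 # key header, so the decompressor knows the key
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        if run > 4:                      # key(2)+count(1)+byte(1) = 4 bytes: only longer runs gain
            out += key + bytes([run, data[i]])
        else:
            out += data[i:i + run]       # short runs stay literal
        i += run
    return bytes(out)
```

Note that real files may contain the key bytes in the literal data; the key search described above (omitted here) is what avoids that.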
Getting the algo to work took lots of effort and banging my head into the wall, and it is still off it seems on some files.
It doesn't save much atm - disk storage apparently rounds up to the nearest 4096-byte cluster, so it isn't really an advantage unless you have a file full of repeating characters that can take advantage of it.
Very interesting.
Have you looked at other schemes or are you doing this to try and teach yourself ?
I look forward to seeing some more.
Yes and yes. I have a small data compression book written in the 90's by Gilbert Held. He explains the theories; I had to come up with a workable algo.
I will attempt other methods such as Huffman - he says it's stored as a tree.
Nice work :U.
Quote from: drarem
XXOOOOOOXXXXABABCCC
Compressed would look like:
BBXXBB*XXXXABABCCC
Isn't there an O missing? .. something like BBXXBB*OXXXXABABCCC
Quote from: drarem
For the key I generate a 2-byte key, default is AB - is not complete but would add 1 to A and search through the file looking for the key combination - in order to create a unique key that wont confuse the decompressor.
You could use an 'escape from the escape' to handle this situation. E.g. let a repeat count of 0 mean insert only a single symbol, then, if e.g. AB is the digraph that occurs the fewest times, you can do:
XXXXXXABXXXXXX...
compresses to
AB*XAB+ABAB*X...
where * is 6 and + is 0, and the AB+AB part encodes the occurrence of AB.
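The 'escape from the escape' can be sketched like this (a Python sketch with assumed details: escape pair 'AB', runs of 4 or more encoded, and a count of 0 meaning a single literal symbol follows):

```python
def esc_encode(data: bytes, esc: bytes = b"AB") -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        if run >= 4:                                 # long run: esc + count + byte
            out += esc + bytes([run, data[i]])
            i += run
        elif data[i:i + 2] == esc:                   # data happens to contain the escape pair
            out += esc + bytes([0]) + data[i:i + 1]  # count 0 = emit a single symbol
            i += 1
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def esc_decode(packed: bytes, esc: bytes = b"AB") -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i:i + 2] == esc:
            count, sym = packed[i + 2], packed[i + 3]
            out += bytes([sym]) * (count if count else 1)
            i += 4
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)
```

On the example above, `esc_encode(b"XXXXXXABXXXXXX")` yields the byte string AB,6,X, AB,0,A, B, AB,6,X.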
Quote from: drarem
At the beginning of the compressed file is the key header, the key which the decompressor uses to decompress the file. Since the key sequence takes up 4 bytes, there's no point in counting anything repeating less than 4 as the key sequence would add to the file.
This means you could subtract 4 from the repeat count -- this way you are able to encode repeats in the range 4..259 instead of 0..255. If you add the 'escape from escape' you would subtract 3 instead.
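The biased count can be sketched with a pair of hypothetical helpers (the key + count + byte layout is assumed from the posts above):

```python
def encode_run(key: bytes, run: int, byte: int) -> bytes:
    assert 4 <= run <= 259              # stored byte 0..255 maps to runs of 4..259
    return key + bytes([run - 4, byte])

def decode_run(chunk: bytes) -> bytes:
    count, byte = chunk[2] + 4, chunk[3]  # chunk = key(2) + count(1) + byte(1)
    return bytes([byte]) * count
```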
Quote from: drarem
Getting the algo to work took lots of effort and banging my head into the wall, and it is still off it seems on some files.
Playing with compression can sometimes drive you mad .. but it's lots of fun :U.
Huffman is definitely worth a look .. Lempel-Ziv compression is also fairly easy and fun.
Quote
XXOOOOOOXXXXABABCCC
Compressed would look like:
BBXXBB*XXXXABABCCC
Isn't there an O missing? .. something like BBXXBB*OXXXXABABCCC
Sorry, I was in a rush (and still am) - now I'm working on PHP, ODBC and a couple of Access databases - this, like so many other projects of mine, is taking a back burner.
I'll come back to the compression eventually.
Could you explain the 'escape from an escape'? It almost sounds like indexed dictionary compression - maybe I'm reading too much into it. It is late :/
Thanks.
Hi! It's funny, but these days I've been thinking about an RLE algo too. :bg It is a bit different and comes from an old picture format used by Deluxe Paint, but as far as I know RLE was used in PCX as well.
Maybe someone doesn't know the meaning of RLE: it is Run Length Encoding, since it encodes the length of a pattern's repetition (how long a pattern runs :bg). This algo does not use keys, but the MSB of the byte containing the repeat count. So it can only encode a count of up to 127 bytes. But this can be modified...
The count byte and the data byte make two bytes, so only sequences of 3 or more bytes should be compressed. This can be used as written in an earlier post: the count for compressed data is between 3 and 130, but remains between 0-127 for uncompressed data.
Here is an example:
Index: | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 0A | 0B | 0C | 0D | 0E | 0F |
Data: | 12 | 12 | 12 | 38 | 6B | E0 | E0 | 00 | 0F | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
Compressed data would look like this:
80 12 06 38 6B E0 E0 00 0F 84 00
- at value 80 the MSB is set, indicating that compressed data follows; the count is 00+3=3, so 12 is written 3 times.
- at value 06 the MSB is not set, indicating that uncompressed data follows; the count is 6, so the next 6 bytes are copied just as they are.
- finally, 84 indicates that 7 bytes of 00 were compressed, so 00 is written 7 times.
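A sketch of this encoder in Python (the function name is mine; it assumes the 3..130 biased run count and literal chunks of up to 127 bytes described above):

```python
def msb_rle_encode(data: bytes) -> bytes:
    out = bytearray()
    lit = bytearray()                    # pending uncompressed bytes

    def flush_literals():
        while lit:
            chunk = lit[:127]
            out.append(len(chunk))       # MSB clear: 1..127 literal bytes follow
            out.extend(chunk)
            del lit[:127]

    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 130:
            run += 1
        if run >= 3:
            flush_literals()
            out.append(0x80 | (run - 3)) # MSB set: count stored biased by 3 (3..130)
            out.append(data[i])
        else:
            lit.extend(data[i:i + run])
        i += run
    flush_literals()
    return bytes(out)
```

On the table above it reproduces exactly the stream 80 12 06 38 6B E0 E0 00 0F 84 00.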
I am coding this algo and hope to be ready after the weekend... but I have tons of other things to do, that's the brutal reality :'(
Please feel free to ask or comment, I like discussing asm coding topics!
BTW: I could write a few words about Huffman and LZW too, if anyone is interested.
Greets to all and nice coding time for everyone!
Quote
This algo does not use keys but the MSB of the byte containing the repeat count. So it makes possible the compression of 127 bytes only. But this can be modified...
The count and the data byte makes two bytes, so only 3 byte sequences should be compressed this can be used as written in one post earlier: The count for compressed data is between 3 and 130, but remains between 0-127 for uncompressed data.
Here is an example:
Index: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
Data: 12 12 12 38 6B E0 E0 00 0F 00 00 00 00 00 00 00
Isn't it possible for the MSB to be set elsewhere in a larger binary file? How would the decoder differentiate between a valid MSB flag and an ordinary piece of binary data? Are you talking about RLE on ASCII files?
What I want to know about is building an indexed dictionary - putting different repeating sequences into a table and using predefined symbols to encode them.
Gabor,
Very cool! :U
Drarem,
Read his post again. :wink
Quote
Compressed data would look like this:
80 12 06 38 6B E0 E0 00 0F 84 00
80 12 specifies 3 12's
06 specifies skip over (don't decompress) the next six bytes: 38 6B E0 E0 00 0F
84 00 specifies 7 00's
I'd be interested to see compression results!
Quote
Isn't it possible for the MSB to be elsewhere in a larger binary file? How would the decoder differentiate between a valid MSB and a binary piece of data? Are you talking about RLE on ascii files?
I'm sorry, I'm afraid I didn't really get the point. Since the MSB is the
most significant bit (highest bit of a byte), it is everywhere in the file. In this algo the MSB of the count is used as a flag. According to this flag the decoder can decide whether the data is compressed or not.
Say, the compressed data looks like this:
80 12 06 38 6B E0 E0 00 0F 84 00
Now, the decoder takes 80. The MSB is set, so the next data byte must be written out more than once. Precisely 00+3 times: 00 comes from 80 after the MSB is cleared, and I explained the +3 earlier. So the result is 3 times the next data byte: 12 12 12
The decoder now reads 06. The MSB is not set, so the next 6 bytes are uncompressed. So 38 6B E0 E0 00 0F is written out.
Finally it reads 84. This ends up writing out 00 7 times: 00 00 00 00 00 00 00
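The decoding steps above can be sketched like this (a Python sketch, not the actual asm; the function name is mine):

```python
def msb_rle_decode(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        ctrl = packed[i]
        if ctrl & 0x80:                        # MSB set: a run of (count+3) copies follows
            out += bytes([packed[i + 1]]) * ((ctrl & 0x7F) + 3)
            i += 2
        else:                                  # MSB clear: ctrl literal bytes follow
            out += packed[i + 1:i + 1 + ctrl]
            i += 1 + ctrl
    return bytes(out)
```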
This algo, just like the others, can be used on binary and ASCII files as well. But the efficiency is very different, and generally not very nice... :'(
Quote
What I want to know about is building an indexed dictionary - putting different repeating sequences into a table and using predefined symbols to encode them.
OK, RLE is not a dictionary-based algo, but Huffman and LZW indeed are.
I'll dig out my university books and post a pseudo algo for them if there's interest.
For now this:
Huffman: gathers info about the frequency of each token (character/byte or word). This is done by reading the whole file and counting the occurrence of every character. After this the algo builds up a binary tree and assigns a code to every token: the more frequent tokens get shorter codes. Importantly, the code is not "byte based" but "bit based", so the result is a bit stream. After creating the codewords the input file is read once again and encoded. However, there are modified algos which read the input only once (Adaptive Huffman).
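As a sketch of the tree/code-building step (Python, using a heap; the bit-stream output and the second pass over the file are omitted, and the function name is mine):

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Build a prefix code: more frequent bytes get shorter codes."""
    freq = Counter(data)
    # each heap item: [total_count, [symbol, code], [symbol, code], ...]
    heap = [[n, [sym, ""]] for sym, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: only one distinct byte
        return {heap[0][1][0]: "0"}
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # two least frequent subtrees...
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]          # ...become the left branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]          # ...and the right branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}
```

For example, in b"aaabbc" the frequent 'a' gets a 1-bit code while 'b' and 'c' get 2-bit codes.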
LZW: This smart algo reads the input, builds the dictionary and encodes at the same time. I don't remember this algo that well, so I won't get into the details...
I post my RLE algo tomorrow.
Good work, bye then!
PS: The Dude of Dudes posted a shorter and yet clearer explanation of the first question while I was writing this. Sorry for my laziness in not rewriting my post.
So you're saying I can only encode the data if the count byte is between 3 and 127, or something like that? I am encoding 0 to 255 for EACH BYTE. That is why a unique key is required in my 'hack' method. So if I have FD FD FD FD FD it becomes 41 42 * FD. If I use the MSB of the byte, wouldn't that confuse the decoder?
Did I say RLE was a dictionary? No I didn't - I implied the LZW Lempel-whatever method, building an indexed dictionary.
I think this thread is very interesting.
Another thing I find interesting is that the encoding "appears" to me to be byte based and not bit based.
Does any of the above encode into a stream of bits?
Got excited about RLE this afternoon and came up with this as an encoder: :bg
.data?
TempBuffer db 4096 dup (?)
pSingleStart dd ? ;pointer to start of single bytes
pSequenceStart dd ? ;pointer to start of repeating bytes
bSkipSingleInstance dd ?
.code
EndSingletonChain proto
EndRepeatingChain proto
SingleInstance proto
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PackIt proc, Source: dword, Destination: dword, cbSource: dword
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mov esi, Source
mov edi, Destination
xor ecx, ecx
mov pSequenceStart, 0
mov pSingleStart, 0
.repeat ;Main Loop
mov al, [esi] ;current byte
;=== byte is repeating ===
.if [esi+1] == al
;--- Start of Series ---
.if !pSequenceStart ;it's the first byte of a series
.if [esi+2] == al ;ensure min 3-byte sequence
mov pSequenceStart, esi ;save pointer to start of series
.else ;<3 same bytes, treat as singletons
jmp Singletons
.endif
.endif
invoke EndSingletonChain
.else
;=== Byte is not repeating ===
Singletons:
mov bSkipSingleInstance, FALSE
invoke EndRepeatingChain
.if bSkipSingleInstance == FALSE
invoke SingleInstance
.endif
.endif
inc esi
inc ecx
.until ecx == cbSource
;=== End Current Chain ===
invoke EndRepeatingChain
invoke EndSingletonChain
sub edi, Destination
mov eax, edi ;return cb of compressed file
ret
PackIt endp
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EndSingletonChain proc
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
;--- End chain of singletons ---
.if pSingleStart
push esi
push ecx
mov ecx, esi
sub ecx, pSingleStart ;cb of singletons
mov esi, offset TempBuffer ;ptr to saved singletons
.while ecx > 127
mov byte ptr[edi], 01111111b ;max single chars - 127 (MSB clear)
inc edi
push ecx
mov ecx, 127
cld
rep movsb ;copy 127 singletons
pop ecx
sub ecx, 127
.endw
mov byte ptr[edi], cl ;cb of singletons (MSB clear)
inc edi
cld
rep movsb
pop ecx
pop esi
mov pSingleStart, 0 ;clear pointer (end of singletons)
.endif
ret
EndSingletonChain endp
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EndRepeatingChain proc
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
;--- End of a repeating series---
.if pSequenceStart
mov edx, esi
sub edx, pSequenceStart ;edx = number of repeating bytes
add edx, 1
;--- we can only represent a max of 130 with one entry ---
.while edx >= 130
mov byte ptr[edi], 0ffh ;130 repetitions
mov byte ptr[edi+1], al ;of <byte>
add edi, 2
sub edx, 130
.endw
;--- now edx < 130 ---
.if dl >2
sub dl, 3
or dl, 10000000b ;set MSB (aka escape for sequence)
mov byte ptr[edi], dl ;number of repetitions
mov byte ptr[edi+1], al ;of <byte>
add edi, 2
.else
.if dl == 2
mov byte ptr[edi], 2
mov byte ptr[edi+1], al
mov byte ptr[edi+2], al
add edi, 3
.elseif dl == 1
mov byte ptr[edi], 1
mov byte ptr [edi+1], al
add edi, 2
.endif
.endif
mov pSequenceStart, 0 ;clear pointer (end of series)
mov bSkipSingleInstance, TRUE
.endif
ret
EndRepeatingChain endp
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SingleInstance proc
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
;--- Single Instance of Byte ---
.if !pSingleStart ;first singleton in series
mov pSingleStart, esi ;save pointer to start of singletons
.endif
mov edx, esi
sub edx, pSingleStart ;index into temp buffer
add edx, offset TempBuffer
mov [edx], al ;save singleton in tempbuffer
ret
SingleInstance endp
My tempbuffer could probably be larger (for streams of singles over 4096)
Here's the decoder:
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UnPackIt proc, Source: dword, Destination: dword, cbSource: dword
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
inc cbSource
mov ecx, 0
mov esi, Source
mov edi, Destination
.repeat
mov al, [esi]
.if (al & 10000000b) ;unpack
and al, 01111111b ;get byte count
push ecx
xor ecx, ecx
mov cl, al
add cl, 3
mov al, [esi+1] ;byte to copy
cld
rep stosb
pop ecx
add esi, 2
add ecx, 2
.else ;string of single bytes
and eax, 0ffh
add ecx, eax
inc ecx
push ecx
xor ecx, ecx
mov cl, al ;byte count
inc esi ;ptr to start of bytes to copy
cld
rep movsb
pop ecx
.endif
.until ecx == cbSource
sub edi, Destination
mov eax, edi ;return cb of uncompressed file
ret
UnPackIt endp
Here is your test, Dude of Dudes - original size=351k, compressed=0 bytes, uncompressed=0 bytes.
That would be an awesome algo, if it worked...
Look at the functions RLE_ENCODE_2 and RLE_DECODE_2; I cut and pasted with very slight mods.
[attachment deleted by admin]
Hi again!
A bit delayed, but finally here is my RLE algo. rle.zip (http://members.chello.hu/nemeth.gabor1/rle.zip)
It is wrapped very simply in a Win32 process, since the goal was to test the encoder/decoder. Please test it!
I have some ideas on how to optimize it, but I think RLE is a bit of an old method, and it's simply not worth working on it for that much time.
The proggy works like this:
1. throws an OpenFileName Dialog to get the file to be compressed.
2. throws a SaveFileName Dialog to get the output file.
3. throws an OpenFileName Dialog to get the file to be decompressed.
4. throws a SaveFileName Dialog to get the output file.
My results show what I expected: the compression ratio is not very good, and for text files it is a disaster. Once again, RLE can be used for images having large areas of the same color, or for sound samples with long quiet passages...
Note:
- it uses memory buffers to read from and write to. The buffers are 64k long, so longer files are truncated.
- rle.asm is full of debug-info generating code, but it is commented out. If no debugging is needed, all this stuff with all the includes can be deleted.
I hope it works for every case.
Furthermore, I would like to discuss the other packing algos, Huffman and LZW. Does anyone agree?
Bye for now!
Gábor
Nice work, i was able to encode/decode a small .zip file and a ms paint file.
My packed files were 5 to 200 bytes larger than yours, I see how the key system works now.
Thanks for the post.
Fixed my previous code, bundled it into (yet another!) example. Just drag and drop the file to pack/unpack onto Packer.exe's icon. It also retains the file extension when unpacking. Will take a look at the other examples when I have some time! :bg
[attachment deleted by admin]
Good Job Dude :U!
I didn't check your packer completely, but I found one bug (we could live with it :bg). When running the program directly, not dropping a file on it, it gives a memory access exception.
The results: as expected, your packer gives the same compression as mine, which means we both implemented the algo correctly :toothy
Interested in LZW?
Greets, Gábor
Yes, it does throw an exception. If you check the source, you'll see the command-line handling is quite simplistic, which is why it causes an exception! :red It could actually be reworked quite a bit; I threw it together in quite a hurry.
I'd be very interested in other compression methods!
wwooooopps... I hate to beat dead horses, but apparently with byte values of 128 or greater your decompressor gets confused - try the bitmap below; the picture shifts a few bytes to the right. How can the decompressor know which is an actual MSB flag and which is ordinary data, unless you know for certain that the uncompressed binary file won't contain a byte with the MSB set?
Like I said, I wonder what it would do with larger binary files.
[attachment deleted by admin]
Hello!
I thought I'd better not bore all of you with packing algos, so I've put some explanation on http://members.chello.hu/nemeth.gabor1. At the moment it mainly consists of the stuff already discussed here. But I've also put there an unfinished description of LZW. (I will finish it real soon!)
Greets, Gábor
Hmm... I'm not sure what you mean, Drarem. I have no trouble with the bmp. I verified my decompression by examining hex dumps of the original and decompressed files. I check for matching length and randomly select offsets and compare their values. If the image were shifted, the offsets would not contain the same values; they would be shifted.
The decompressor processes the file in order like this:
The first byte is read and it determines what the following byte(s) mean. If the MSB is set, the following byte is repeated (value of bits 0-6, plus 3) times. If the MSB is NOT set, it's a literal count of how many single bytes follow. In either case, it can be determined where the offset of the next 'control character' is. In the case of a repeating sequence, the next control character is 2 bytes past the current control character. In the case of a string of single bytes, the next control character is (control char value) + 1 bytes past the current one. So as you can see, whether the MSB is set in the bytes of the source file has no effect on the decompression. The key is getting the offset of the next control character, and the next, and so on. Hope this makes sense!
Dude, I'd say that's a clean 'n tidy explanation!
And I'd say: could we jmp to LZW? I would like anyone else (but me) to come up with encoder/decoder code. As I've posted before, I put the info about the LZW algo on a webpage.
Greets, Gábor
Maybe I will code something when I have time..
I have started to code the LZW compressor/decompressor. Just give me some time and you guys will get to see it.
I've been working on a version as well. Still buggy. The decompression is by far more work! I think my understanding of the algo may be wrong, specifically during decompression. This is what I am using:
read a character k
output k
w = k
loop
read a character k
entry = dictionary entry for k
output entry (fully translate all codes to final characters)
add w + first char of entry to the dictionary
w = entry
endloop
I think my problem is when k is a yet-undefined dictionary entry. This is how I am handling it:
If k = the next uncreated dictionary entry, the last k output is appended to w and creates the next dictionary entry
In a small test file I used a repeating series of numbers (1-5). At the 12th series it decodes to 1,2,3,4,5,5,2,3,4,5,1,2,3,4,5,1,1,3,4,5
Any ideas?
:dance: Solution: :dance:
If k = the next uncreated dictionary entry, the FIRST CHAR of the last code output is appended to w and creates the next dictionary entry
(big sigh!)
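For reference, here is how that corrected rule looks in a Python sketch of LZW (the asm version packs codes into 12/16-bit words; this sketch keeps them as plain integers, and the function names are mine):

```python
def lzw_encode(data: bytes) -> list:
    table = {bytes([i]): i for i in range(256)}
    nxt, w, out = 256, b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                       # keep growing the current string
        else:
            out.append(table[w])
            table[wc] = nxt              # new dictionary entry
            nxt += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes: list) -> bytes:
    table = {i: bytes([i]) for i in range(256)}
    nxt = 256
    it = iter(codes)
    w = table[next(it)]
    out = bytearray(w)
    for k in it:
        if k in table:
            entry = table[k]
        elif k == nxt:                   # the special case: code not yet defined
            entry = w + w[:1]            # first char of w == first char of entry
        else:
            raise ValueError("bad code")
        out += entry
        table[nxt] = w + entry[:1]       # new entry: w + first char of entry
        nxt += 1
        w = entry
    return bytes(out)
```

The `k == nxt` branch is exactly the fix: when the decoder sees a code it hasn't built yet, the entry must be w plus the first character of w.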
Same as my RLE Packer. Drag and drop files onto Packer icon to compress/decompress.
Works GREAT with most text files (up to 50% compaction), not as good with exe's (30% maybe)
[attachment deleted by admin]
Hi Dude!
Nice work. Are your entries in the dictionary only 2 characters long? I would say you could try a bigger value, and you could use a different code length, not just 12 bits (the number of entries, and so the size of the dictionary, would change as well).
Greets, Gábor
Here's a 16-bit version. A little better compression. Here are the results for Windows.inc from the MASM package.
Windows.inc - 995kb
12bit lzw(2 byte entries) - 503kb
16bit lzw (2 byte entries) - 377kb :cheekygreen:
I am still using 2-byte entries. I believe increasing the dictionary size or the dictionary entry size achieves the same result, does it not (please tell me if I am wrong)?
[attachment deleted by admin]
Hi!
I'm still having problems with my encoder/decoder. I want to make it possible to adjust the code word length and the entry string length...
As soon as I'm finished, I'll post my code and test results.
Greets, Gábor
The problem is that LZW is supposed to be streamed, so you have to decide on your codeword size beforehand because once you start writing the compressed data you can't go back and change it. If your dictionary gets full, then that's it, you just have to keep on munching and spit out 'badly' compressed data.
Of course, you could be unconventional and compress the whole file once - find out how many entries are required - and then recompress it with the appropriate 'optimal' codeword size.
Hi Dude!
Huhh, it took me a lot of time until I understood what you have done there. Maybe I wasn't reading your post correctly, but I can't remember that solution you applied. You use 2-byte long strings - that's almost true: more precisely you use a hybrid, a dictionary index for the first characters and another for the last character. Why didn't this come to MY mind! :)) I must bow before this thinking. And yes, when handling the entries this way, longer entries don't seem to have any effect...
After this I feel a bit ashamed of my solution; it is very slow, wastes memory and seems to be buggy as well...
What really knocked me off and turned me away from packing things was when I tried to compress windows.inc with rar. The result was only 183kB and it took only a few seconds. Well, there is still much work...
Greets, Gábor
Since I am still learning ASM, when I came to this thread I tried to write an implementation of the RLE algo based on a document on the net.
When we encounter two symbols that are the same, we write the pair, then count how many more of the same follow and write that count. So if we have a string like this:
aaaabcdeee
the output string will be
aa2bcdee1
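A sketch of this variant in Python (the function names are mine; the count is written as a raw byte, whereas the example above shows it as a readable digit):

```python
def pair_rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if i + 1 < len(data) and data[i + 1] == data[i]:
            out += data[i:i + 2]            # write the repeated pair
            i += 2
            extra = 0
            while i < len(data) and data[i] == data[i - 1] and extra < 255:
                extra += 1
                i += 1
            out.append(extra)               # how many MORE of the same followed
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def pair_rle_decode(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        if i + 1 < len(packed) and packed[i + 1] == packed[i]:
            out += bytes([packed[i]]) * (2 + packed[i + 2])  # pair + extra count
            i += 3
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)
```

So b"aaaabcdeee" packs to the pair 'aa', the count 2, the literals 'bcd', the pair 'ee' and the count 1, matching the example.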
The program actually still reads from a static buffer (I am going to make it read its input from a file) and still has a problem (or maybe more than one :lol - I am still trying to get into an asm mindset): since I write the occurrence count in a byte, if a symbol appears more than 256 times this algo fails.
include \masm32\include\masm32rt.inc
.data
szInput db "aaaaaaabcdfffffghilmooooppqr",0
OutBuffer db 128 dup(0ffh)
.code
start:
call main
exit
main PROC
lea esi, szInput
lea edi, OutBuffer
lodsw
the_loop:
test ah,ah
jz SHORT eos
test al,al
jz SHORT eos
cmp ah,al
je SHORT the_same
not_the_same:
stosb
dec esi
lodsw
jmp SHORT the_loop
the_same:
stosw
call ContaBytes
mov eax, ecx
stosb ;write the count result from the procedure into the buffer
dec esi
lodsw
jmp SHORT the_loop
eos:
ret
main endp
ContaBytes PROC
xor ecx,ecx
the_loop:
lodsb
cmp ah,al
jnz the_end
inc ecx
jmp SHORT the_loop
the_end:
ret
ContaBytes endp
end start
I am writing the decoder; as soon as I finish it I will post it up.
Any suggestions are welcome.
Marko
PS: I don't know if anyone has done this type of implementation before, because I can't access the code in this thread; if anyone did, forgive me.