News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Alternative way of compressing data: CHALLENGE # 1

Started by frktons, September 01, 2010, 10:31:34 PM

Previous topic - Next topic

frktons

What about a CODE instead of DATA to compress data?

Here it is a very feasible task:

- a program that reads an external file and displays its content on the screen

The external file is 8000 bytes, but many of them are repetitive sequences of bytes,
so it should be easy to replace them with assembly instructions.

In my forecasting an average result could be to reproduce the file with 500 bytes
of code more or less.

The most advanced among you can do it in less than 250.
I include the actual program with the file, so you can see what it is like and think
your own way of transforming it into code.

The challenge is open for everybody. Let's see the best compression DATA2CODE
algorithm  around :U
Mind is like a parachute. You know what to do in order to use it :-)

ecube

sounds like a simple run len compressor like huff :P

frktons

Quote from: E^cube on September 02, 2010, 06:32:54 AM
sounds like a simple run len compressor like huff :P

Any algorithm can do it, but at the end no external file at all, this is the main objective:
Quote
The external file is 8000 bytes, but many of them are repetitive sequences of bytes,
so it should be easy to replace them with assembly instructions.
One or more PROC to replace the external file, with all the data to reproduce it
inside the PROCs.

Enjoy the task.

Frank
Mind is like a parachute. You know what to do in order to use it :-)

frktons

I'm actually working at a prototype in BASIC, that is easier to manage,
given my high level of Assembly n00b-iness.  :P
In a couple of days, hopefully, it'll be ready. After that I'll translate it
into MASM, test it and post the first working draft to optimize.

I'm quite slow and the available time is short, but during next week
my first solution should be ready.  :wink

What about you?

Frank
Mind is like a parachute. You know what to do in order to use it :-)

daydreamer

I think it is possible todo it with less than 100 bytes code,but I wish I had the time for it

frktons

Quote from: daydreamer on September 02, 2010, 08:26:21 PM
I think it is possible todo it with less than 100 bytes code,but I wish I had the time for it
daydreamer... =  less than 100 bytes

daydreamer... =  I wish I had the time for it

daydreamer... No problem, you can carry on dreaming...  :lol

I hoped we could do it in 80 bytes   :P , showing that a 100:1 compression ratio is possible,
but for the time being I'll be happy with 100 bytes, if someone shows up with a working solution.

The one I'm working on will take some 800 bytes more or less, but It's still working in progress.

Enjoy the task

Frank


Mind is like a parachute. You know what to do in order to use it :-)

MichaelW

The data includes 256 unique characters, so how do you propose to encode these in 80, 100, or even 250 bytes of code + data? Would pulling the characters out of an available font, possibly not an exact match for the original be acceptable, or is the "decompressed" data supposed to be an exact match for the original?

At the BYTE and WORD levels there are no repeated sequences. At the DWORD level there are 18 repeated sequences with a length greater than 2 repeats:
78
9
7
7
78
78
3
3
3
3
3
3
3
3
3
3
31
78

For a total of 396*4=1584 bytes out of 8000, so RLE alone could not begin to approach the expected compression level. The file compresses ~6.1:1 when zipped.
eschew obfuscation

frktons

Quote from: MichaelW on September 03, 2010, 01:45:34 AM
The data includes 256 unique characters, so how do you propose to encode these in 80, 100, or even 250 bytes of code + data? Would pulling the characters out of an available font, possibly not an exact match for the original be acceptable, or is the "decompressed" data supposed to be an exact match for the original?

At the BYTE and WORD levels there are no repeated sequences. At the DWORD level there are 18 repeated sequences with a length greater than 2 repeats:
78
9
7
7
78
78
3
3
3
3
3
3
3
3
3
3
31
78

For a total of 396*4=1584 bytes out of 8000, so RLE alone could not begin to approach the expected compression level. The file compresses ~6.1:1 when zipped.


Well MichaelW,
what about a LOOP from 0 to 255 that generates 256 chars without the need
to store them somewhere? You need 256 ASCII? Generate them on the fly.
This is what I mean DATA2CODE, instead of DATA you have instructions
building the DATA you need.
How many bytes a LOOP can take? 20-30? You know better than me.

We are not trying to zip, we don't need algorithms for compression, we
only need to think how can we reproduce the file using LOOP, MOV,
REGISTERS, and so on.

The task is to replace these routine/instructions:


GetConsole PROTO :DWORD
.....
INVOKE GetConsole, offset FmtName
.....
GetConsole PROC szFmtName:DWORD

    mov hFile, fopen(szFmtName)

    mov ebx, fread(hFile, ADDR ConsoleScreen, BufLen)
   
    fclose hFile

    ret
   
GetConsole ENDP


With others routines/instructions that create the data on the fly,
without using external files. We have a buffer to fill, so instead of
using data from a file, we generate them inside the program.

We have to reproduce exactly the same file inside the buffer.

The size of the final program, compared with the original I posted,
compiled and linked with ML/LINK as I did, will give us the difference
in bytes that we used to reproduce the external file, that the new
program will not use.


Maybe I didn't express myself in a clear way, but I'm doing my best
with a foreign language.

By the way, if something is not clear, just ask and I'll try to explain
the CHALLENGE in a better way.  :wink

Regarding the 80-100-250 bytes as a result of CODE to reproduce the
8,000 bytes file, I was jocking with daydreamer, who told he could do
it in 80 bytes more or less.
My broad estimate is I'll take 800 bytes or more to do it, but I am a
total n00b. Somebody here can do far more better than me.

Frank

Mind is like a parachute. You know what to do in order to use it :-)

mineiro

I'm without time for a while, I start develop it, but, I simply cannot end it because personal problems.
After think a while, I start developing it using bios int's and do a .com dos file(to reach these number said by frktons), so I don't have sure if this can be valid (well I'm at least following the rules).
I'm posting my started work(maybe can be usefull to other people). It works fine in XPSP3, I unrolled it to if someone cannot understand bios int's. I commented some lines. And I have attached some usefull functions to your (maybe) future continuation.
Regards, now I'm resolving other problems.


frktons

Quote from: mineiro on September 03, 2010, 04:48:30 AM
I'm without time for a while, I start develop it, but, I simply cannot end it because personal problems.
After think a while, I start developing it using bios int's and do a .com dos file(to reach these number said by frktons), so I don't have sure if this can be valid (well I'm at least following the rules).
I'm posting my started work(maybe can be usefull to other people). It works fine in XPSP3, I unrolled it to if someone cannot understand bios int's. I commented some lines. And I have attached some usefull functions to your (maybe) future continuation.
Regards, now I'm resolving other problems.

Thanks mineiro.  :U

Oh! Oh! INT - COM files, BIOS, it looks like 16 bit Assembly.
You are probably trying to directly print on the screen through the
INT and using COM to spare some space.  ::)

Somebody will get inspiration from your attempt, I'm sure about this.

But here we have to use a preexisting program, that uses Windows API,
and has to run on 32/64 bit machine. I am not able to run your example on my
windows 7/64 bit. I'll have a look at it anyway, if can I understant it.  ::)

The only PROC we can change, in order to test the size question, is the
GetConsole PROC that has to be replaced with one/more proc that fill
the buffer without using an external file.


All the remaining of the program has to be the same.

My prototype is at 85% more or less, and it is in BASIC.
I'll post it before tomorrow night, I hope.

Frank


Mind is like a parachute. You know what to do in order to use it :-)

frktons

Here it is, 85% complete, BASIC/Assembly PBCC compiled.

NOT OPTIZIMED to any degree, just a quick draft to put an idea
into work. There is a lot of room for optimizing both size and speed,
but it'll be the guideline for my next MASM version.

If anybody would like to have a look, just download it, run it, and let me
know if you get some inspiration from it.

The full version is coming tonight/tomorrow.

Frank
Mind is like a parachute. You know what to do in order to use it :-)

daydreamer

Quote from: frktons on September 03, 2010, 09:42:39 AM
If anybody would like to have a look, just download it, run it, and let me
know if you get some inspiration from it.

The full version is coming tonight/tomorrow.

Frank
I tested it and all it does is what you stated earlier, it creates all 256 characters on the fly nothing more, no dword filling, no handle the ascii number counter, no 8k full file
I am running XP

frktons

Quote from: daydreamer on September 03, 2010, 10:04:20 AM
Quote from: frktons on September 03, 2010, 09:42:39 AM
If anybody would like to have a look, just download it, run it, and let me
know if you get some inspiration from it.

The full version is coming tonight/tomorrow.

Frank
I tested it and all it does is what you stated earlier, it creates all 256 characters on the fly nothing more, no dword filling, no handle the ascii number counter, no 8k full file
I am running XP


Thanks for explaining me what I did. Now I know  :U

What do you mean with no dword filling? Were I supposed to do it?

I Draw the two boxes, I print the caption, I have the LOOP that generates and fills
the buffer, I have the attributes already into the buffer, I have only to add the code
for the 255 numbers inside the LOOP.

no 8k full file? Are you sure? The buffer I'm filling is 8k, and the external file has
not to be there anymore. In fact it is not there anymore, I'm not using it.

85% done.  And it is an UNOPTIMIZED prototype in BASIC.

And yes, it does exactly what I said, I'm not cheating  :lol

Waiting for your 80 bytes solution. I'll really enjoy iy, IF I'll ever see it.  :clap:

Frank
Mind is like a parachute. You know what to do in order to use it :-)

mineiro

Now you need only print in decimal the value of hex(characters) in a field of 3+1 spaces. Nice.
regards.

frktons

#14
Quote from: mineiro on September 03, 2010, 10:25:33 AM
Now you need only print in decimal the value of hex(characters) in a field of 3+1 spaces. Nice.
regards.


Yes mineiro. Thanks. I'm filling the 8k buffer, that is a little bit more difficult than writing
directly to the screen, but this is the CHALLENGE, I have to reproduce the 8k file inside the buffer
and display it with WriteConsoleOutput, that is already in the original program.

Working on it...

Frank

Mind is like a parachute. You know what to do in order to use it :-)