The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: frktons on September 01, 2010, 10:31:34 PM

Title: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 01, 2010, 10:31:34 PM
What about a CODE instead of DATA to compress data?

Here it is a very feasible task:

- a program that reads an external file and displays its content on the screen

The external file is 8000 bytes, but many of them are repetitive sequences of bytes,
so it should be easy to replace them with assembly instructions.

In my forecasting an average result could be to reproduce the file with 500 bytes
of code more or less.

The most advanced among you can do it in less than 250.
I include the actual program with the file, so you can see what it is like and think
your own way of transforming it into code.

The challenge is open for everybody. Let's see the best compression DATA2CODE
algorithm  around :U
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: ecube on September 02, 2010, 06:32:54 AM
sounds like a simple run len compressor like huff :P
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 02, 2010, 07:13:46 AM
Quote from: E^cube on September 02, 2010, 06:32:54 AM
sounds like a simple run len compressor like huff :P

Any algorithm can do it, but at the end no external file at all, this is the main objective:
Quote
The external file is 8000 bytes, but many of them are repetitive sequences of bytes,
so it should be easy to replace them with assembly instructions.
One or more PROC to replace the external file, with all the data to reproduce it
inside the PROCs.

Enjoy the task.

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 02, 2010, 07:52:09 PM
I'm actually working at a prototype in BASIC, that is easier to manage,
given my high level of Assembly n00b-iness.  :P
In a couple of days, hopefully, it'll be ready. After that I'll translate it
into MASM, test it and post the first working draft to optimize.

I'm quite slow and the available time is short, but during next week
my first solution should be ready.  :wink

What about you?

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: daydreamer on September 02, 2010, 08:26:21 PM
I think it is possible todo it with less than 100 bytes code,but I wish I had the time for it
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 02, 2010, 09:10:29 PM
Quote from: daydreamer on September 02, 2010, 08:26:21 PM
I think it is possible todo it with less than 100 bytes code,but I wish I had the time for it
daydreamer... =  less than 100 bytes

daydreamer... =  I wish I had the time for it

daydreamer... No problem, you can carry on dreaming...  :lol

I hoped we could do it in 80 bytes   :P , showing that a 100:1 compression ratio is possible,
but for the time being I'll be happy with 100 bytes, if someone shows up with a working solution.

The one I'm working on will take some 800 bytes more or less, but It's still working in progress.

Enjoy the task

Frank


Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: MichaelW on September 03, 2010, 01:45:34 AM
The data includes 256 unique characters, so how do you propose to encode these in 80, 100, or even 250 bytes of code + data? Would pulling the characters out of an available font, possibly not an exact match for the original be acceptable, or is the "decompressed" data supposed to be an exact match for the original?

At the BYTE and WORD levels there are no repeated sequences. At the DWORD level there are 18 repeated sequences with a length greater than 2 repeats:
78
9
7
7
78
78
3
3
3
3
3
3
3
3
3
3
31
78

For a total of 396*4=1584 bytes out of 8000, so RLE alone could not begin to approach the expected compression level. The file compresses ~6.1:1 when zipped.
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 03, 2010, 03:09:08 AM
Quote from: MichaelW on September 03, 2010, 01:45:34 AM
The data includes 256 unique characters, so how do you propose to encode these in 80, 100, or even 250 bytes of code + data? Would pulling the characters out of an available font, possibly not an exact match for the original be acceptable, or is the "decompressed" data supposed to be an exact match for the original?

At the BYTE and WORD levels there are no repeated sequences. At the DWORD level there are 18 repeated sequences with a length greater than 2 repeats:
78
9
7
7
78
78
3
3
3
3
3
3
3
3
3
3
31
78

For a total of 396*4=1584 bytes out of 8000, so RLE alone could not begin to approach the expected compression level. The file compresses ~6.1:1 when zipped.


Well MichaelW,
what about a LOOP from 0 to 255 that generates 256 chars without the need
to store them somewhere? You need 256 ASCII? Generate them on the fly.
This is what I mean DATA2CODE, instead of DATA you have instructions
building the DATA you need.
How many bytes a LOOP can take? 20-30? You know better than me.

We are not trying to zip, we don't need algorithms for compression, we
only need to think how can we reproduce the file using LOOP, MOV,
REGISTERS, and so on.

The task is to replace these routine/instructions:


GetConsole PROTO :DWORD
.....
INVOKE GetConsole, offset FmtName
.....
GetConsole PROC szFmtName:DWORD

    mov hFile, fopen(szFmtName)

    mov ebx, fread(hFile, ADDR ConsoleScreen, BufLen)
   
    fclose hFile

    ret
   
GetConsole ENDP


With others routines/instructions that create the data on the fly,
without using external files. We have a buffer to fill, so instead of
using data from a file, we generate them inside the program.

We have to reproduce exactly the same file inside the buffer.

The size of the final program, compared with the original I posted,
compiled and linked with ML/LINK as I did, will give us the difference
in bytes that we used to reproduce the external file, that the new
program will not use.


Maybe I didn't express myself in a clear way, but I'm doing my best
with a foreign language.

By the way, if something is not clear, just ask and I'll try to explain
the CHALLENGE in a better way.  :wink

Regarding the 80-100-250 bytes as a result of CODE to reproduce the
8,000 bytes file, I was jocking with daydreamer, who told he could do
it in 80 bytes more or less.
My broad estimate is I'll take 800 bytes or more to do it, but I am a
total n00b. Somebody here can do far more better than me.

Frank

Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: mineiro on September 03, 2010, 04:48:30 AM
I'm without time for a while, I start develop it, but, I simply cannot end it because personal problems.
After think a while, I start developing it using bios int's and do a .com dos file(to reach these number said by frktons), so I don't have sure if this can be valid (well I'm at least following the rules).
I'm posting my started work(maybe can be usefull to other people). It works fine in XPSP3, I unrolled it to if someone cannot understand bios int's. I commented some lines. And I have attached some usefull functions to your (maybe) future continuation.
Regards, now I'm resolving other problems.

Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 03, 2010, 08:48:13 AM
Quote from: mineiro on September 03, 2010, 04:48:30 AM
I'm without time for a while, I start develop it, but, I simply cannot end it because personal problems.
After think a while, I start developing it using bios int's and do a .com dos file(to reach these number said by frktons), so I don't have sure if this can be valid (well I'm at least following the rules).
I'm posting my started work(maybe can be usefull to other people). It works fine in XPSP3, I unrolled it to if someone cannot understand bios int's. I commented some lines. And I have attached some usefull functions to your (maybe) future continuation.
Regards, now I'm resolving other problems.

Thanks mineiro.  :U

Oh! Oh! INT - COM files, BIOS, it looks like 16 bit Assembly.
You are probably trying to directly print on the screen through the
INT and using COM to spare some space.  ::)

Somebody will get inspiration from your attempt, I'm sure about this.

But here we have to use a preexisting program, that uses Windows API,
and has to run on 32/64 bit machine. I am not able to run your example on my
windows 7/64 bit. I'll have a look at it anyway, if can I understant it.  ::)

The only PROC we can change, in order to test the size question, is the
GetConsole PROC that has to be replaced with one/more proc that fill
the buffer without using an external file.

All the remaining of the program has to be the same.

My prototype is at 85% more or less, and it is in BASIC.
I'll post it before tomorrow night, I hope.

Frank


Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 03, 2010, 09:42:39 AM
Here it is, 85% complete, BASIC/Assembly PBCC compiled.

NOT OPTIZIMED to any degree, just a quick draft to put an idea
into work. There is a lot of room for optimizing both size and speed,
but it'll be the guideline for my next MASM version.

If anybody would like to have a look, just download it, run it, and let me
know if you get some inspiration from it.

The full version is coming tonight/tomorrow.

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: daydreamer on September 03, 2010, 10:04:20 AM
Quote from: frktons on September 03, 2010, 09:42:39 AM
If anybody would like to have a look, just download it, run it, and let me
know if you get some inspiration from it.

The full version is coming tonight/tomorrow.

Frank
I tested it and all it does is what you stated earlier, it creates all 256 characters on the fly nothing more, no dword filling, no handle the ascii number counter, no 8k full file
I am running XP
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 03, 2010, 10:23:19 AM
Quote from: daydreamer on September 03, 2010, 10:04:20 AM
Quote from: frktons on September 03, 2010, 09:42:39 AM
If anybody would like to have a look, just download it, run it, and let me
know if you get some inspiration from it.

The full version is coming tonight/tomorrow.

Frank
I tested it and all it does is what you stated earlier, it creates all 256 characters on the fly nothing more, no dword filling, no handle the ascii number counter, no 8k full file
I am running XP


Thanks for explaining me what I did. Now I know  :U

What do you mean with no dword filling? Were I supposed to do it?

I Draw the two boxes, I print the caption, I have the LOOP that generates and fills
the buffer, I have the attributes already into the buffer, I have only to add the code
for the 255 numbers inside the LOOP.

no 8k full file? Are you sure? The buffer I'm filling is 8k, and the external file has
not to be there anymore. In fact it is not there anymore, I'm not using it.

85% done.  And it is an UNOPTIMIZED prototype in BASIC.

And yes, it does exactly what I said, I'm not cheating  :lol

Waiting for your 80 bytes solution. I'll really enjoy iy, IF I'll ever see it.  :clap:

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: mineiro on September 03, 2010, 10:25:33 AM
Now you need only print in decimal the value of hex(characters) in a field of 3+1 spaces. Nice.
regards.
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 03, 2010, 10:30:08 AM
Quote from: mineiro on September 03, 2010, 10:25:33 AM
Now you need only print in decimal the value of hex(characters) in a field of 3+1 spaces. Nice.
regards.


Yes mineiro. Thanks. I'm filling the 8k buffer, that is a little bit more difficult than writing
directly to the screen, but this is the CHALLENGE, I have to reproduce the 8k file inside the buffer
and display it with WriteConsoleOutput, that is already in the original program.

Working on it...

Frank

Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: mineiro on September 03, 2010, 10:45:43 AM
I think while I'm trying this one to write direct in video memory, so, can be a buffer to, but in XP don't worked fine. I'm reading your code now, and I see a diferent idea of yours and mine.
I know you say it is not optimized; in your "TextToDisplay" I removed the spaces and fill it with code, the rest you know, high bit of strings are zeroed. Nice this challenge Sr, I enjoyed much try do it, but console in win32 to me is a new concept.
regards.
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 03, 2010, 12:24:42 PM
Quote from: mineiro on September 03, 2010, 10:45:43 AM
I think while I'm trying this one to write direct in video memory, so, can be a buffer to, but in XP don't worked fine. I'm reading your code now, and I see a diferent idea of yours and mine.
I know you say it is not optimized; in your "TextToDisplay" I removed the spaces and fill it with code, the rest you know, high bit of strings are zeroed. Nice this challenge Sr, I enjoyed much try do it, but console in win32 to me is a new concept.
regards.

Take your time mineiro, this challenge is just started, I haven't seen jet any working code that does
what the challenge requires. I suppose it'll take quite a while. I'am going  myself to take quite a while to
accomplish the task. It is just a spare time hobby. During the next weeks we'll see what shows up.  :U

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 03, 2010, 05:24:42 PM
The prototype is now complete.  :U

Time to start to clean up the CODE, eliminate the redundant instructions
and translate it into MASM32.

Anyone working on the CHALLENGE?
Let me know.  :P

Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 05, 2010, 12:20:55 AM
A couple of routines are ready, the one that sets the color attribute and
the one that Draws the Boxes inside the buffer.

There is already a lot of optimizing to do for size reduction.
These are the new routines:

; -------------------------------------------------------------------------
; The buffer is filled before displaying it
;--------------------------------------------------------------------------

BuildConsole PROC 

    CALL SetBufferColor

    INVOKE BuildBox, 1, 3
    INVOKE BuildBox, 4, 25

    ret
   
BuildConsole ENDP

; -------------------------------------------------------------------------
; Fill the buffer with a box of 80 characters starting at line requested
; till the number of line passed as II parameter
;--------------------------------------------------------------------------

BuildBox PROC Row1:DWORD, Row2:DWORD


    mov  eax,  Row1
    sub  eax,  1
    imul eax,  320
    mov  edx, eax

    lea  eax, ConsoleScreen
    add  eax, edx


    movzx ebx, byte ptr [TopLeft]
    mov   byte ptr [eax],bl

    add  eax, 4

    mov  ecx, 78

    movzx ebx, byte ptr [HorizLine]

    mov  bl,  HorizLine
   
FillBoxLine1:

    mov  byte ptr [eax], bl
    add  eax,   4
    dec  ecx
    jnz  FillBoxLine1

    movzx ebx, byte ptr [TopRight]

    mov  byte ptr [eax], bl
    add  eax,   4

    mov  ecx, Row2
    sub  ecx, Row1
    dec  ecx
    movzx ebx, byte ptr [VerticLine]

FillBorder:


    mov   byte ptr [eax], bl

    add  eax,   4
    mov  edx,   4
    imul edx,  78

    add  eax, edx
    mov  byte ptr [eax], bl
    add  eax,   4   
    dec  ecx
    jnz  FillBorder

    movzx ebx,byte ptr [BottomLeft]
    mov   byte ptr [eax], bl

    add  eax,   4

    mov  ecx, 78

    movzx ebx, byte ptr [HorizLine]

FillBoxLine2:

    mov  byte ptr [eax], bl
    add  eax,   4
    dec  ecx
    jnz  FillBoxLine2

    movzx ebx, byte ptr [BottomRight]
    mov   byte ptr [eax], bl


    ret


BuildBox ENDP

; -------------------------------------------------------------------------
; The buffer is initialized with main color attribute Bright Yellow on Blue
;--------------------------------------------------------------------------

SetBufferColor PROC 

    push edi

    mov edi, offset ConsoleScreen
    mov eax, 201E2020H
    mov ecx, 2000
    rep stosd
   
    pop edi

    ret
   
SetBufferColor ENDP

; -------------------------------------------------------------------------


and attached the stage 0.1 of the program that displays the two boxes
with the chosen color [Bright Yellow on Dark Blue].

Any suggestion to reduce the size?

At the moment the program takes 3,584 bytes versus the 3,584 bytes of
the original one. Replacing the proc that read the external file with these
has taken 0 bytes of size increment.  :P

Because I don't see many alternative posted by more expert asm programmers
let's use my beginner version in order to optimize it for size reduction.

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 12, 2010, 05:14:18 PM
Just a quick update of the 100:1 compression task.

I added the routine to print the Screen Title with enhanced colors.

The EXE is still the same size as the version with the external file.
No trick whatsoever. Just MS ML/LINK version 10 as always.

At the moment there are serious reasons to think I'm going to demonstrate
that Fourieru [If I correctly remember his nick-name] was right at least
for some specific cases.

Attached the slightly updated version, sorry I was away from my pc or any
pc whatsoever.  :P

And moreover, considering nobody dared to show up with an original idea
for compressing data through code, I'm going to win the challenge as well  :dance:

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 12, 2010, 11:34:36 PM
And here we are, the complete first draft of the ASCII program, version 0.3,
that has the same size 3.584 bytes than the original one, and displays the
extended ASCII chart without the use of the 8.000 bytes external file.

So apparentely we have reduced the size of the file to zero byte  :P

Of course it is not true, but it is impressive nevertheless.  :U

This is not optimized for reducing the size to the limits, but it gives a first
idea that sometime code can replace data in a very effective way.

Frank
Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: Magnum on September 13, 2010, 02:17:01 PM
Nice work Frank.

I been looking for a popup ASCII chart as well.

Title: Re: Alternative way of compressing data: CHALLENGE # 1
Post by: frktons on September 13, 2010, 06:11:48 PM
Quote from: Magnum on September 13, 2010, 02:17:01 PM
Nice work Frank.

I been looking for a popup ASCII chart as well.

You can use the routines that build the buffer and after display it at will
with WriteConsoleOutput.
Just allocate a buffer of 2000 CHAR_INFO, fill it with the routines, and it is
there whenever you need it, ready to pop.

Frank