How can we calculate the entropy without using float, double, or other floating-point types? Missing functions log and others from math.h. Please help with an example. Thank you!
Quote: "Missing functions log and others from math.h."
this is not a complete sentence - hard to understand what you mean
as for entropy - entropy of what system?
@@: INC entropy
JMP @B
:bg
But seriously, the FPU has everything you should need: FYL2X will do your base 2 log (lg), and you can use the change of base to do ln or log: lg(x)/lg(b), where x is your argument and b is your desired base. Specifically, what are you missing?
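(For instance, ln(x) = ln(2) * log2(x), so loading ln(2) with FLDLN2, pushing x, and executing FYL2X gives the natural log; FLDLG2 does the same for log10.)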
Also, fpu.lib should have the functions as well, along with the tutorial (see the qeditor help menu)
The task is to calculate the entropy of a compressed file without using float, double, or other floating-point types, and without the log function or anything else from math.h :wink Also without using the FPU.
well - you have to know and understand the compression methodology being used
your prof is probably not looking for a math formula
he is likely expecting an essay type answer that describes how the entropy for that compression may be defined
i.e. what factors in the compression methodology might contribute to errors
The problem is not the theory, but how to implement it in software - for instance in plain assembly, without any additional libraries and without a coprocessor.
You mean compute the differences between the compressed and uncompressed versions of two things.
Compress, decompress, compare to the original.
That's pretty much the basics of it - how you then report that is up to you. In graphics, displaying the XOR is a common method - you should have a black image if the compression is lossless.
For sound - I guess you would need some other representation - but the principle is the same.
Mirno
more like - what are the chances of two non-identical files compressing to the same form
or - what is the likelihood of an error due to compression
what characteristics of the particular compression methodology might contribute to errors
in some cases, there is no chance of a compression error
then, you might describe how that is achieved
here is a good definition that may be applied to data compression (from Wikipedia):
Quote: "In words, entropy is just the logarithm of the number of ways of arranging things in the system (times the Boltzmann's constant)."
since you cannot use logarithms, you may start by describing without logarithms how many possible files of a particular size may exist - simple enough
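For example, an n-byte file can be arranged in 256^n = 2^(8n) possible ways, so the "number of ways" is already a power of two and its base-2 logarithm can be read straight off the exponent: 8n bits, with no log function involved.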
To calculate an entropy value in bits per byte, John Walker's ENT program would use:
for (i = 0; i < 256; i++) {
    if (prob[i] > 0.0) {
        ent += prob[i] * log2(1 / prob[i]);
    }
}
Where prob[] is a 256-element array of doubles in which each element contains (the number of bytes in the file that have the value of the element's index) / (the total number of bytes).
And log2 is defined as:
#define log2of10 3.32192809488736234787
/* LOG2 -- Calculate log to the base 2 */
static double log2(double x)
{
    return log2of10 * log10(x);
}
The problem is in doing the log2(1 / prob) without floating-point calculations. And then there is the question of how accurate the entropy value should be.
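For calibration, a complete floating-point version along those lines might look like this in C - just a sketch, not a solution to the constraint; the variable names and file handling here are only illustrative:
#include <stdio.h>
#include <math.h>

#define log2of10 3.32192809488736234787    /* same change-of-base constant as above */

int main(int argc, char *argv[])
{
    unsigned long count[256] = {0};
    unsigned long total = 0;
    double ent = 0.0;
    int c, i;
    FILE *f;

    if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL)
        return 1;
    while ((c = getc(f)) != EOF) {          /* count occurrences of each byte value */
        count[c]++;
        total++;
    }
    fclose(f);

    for (i = 0; i < 256; i++) {
        if (count[i]) {
            double prob = (double)count[i] / (double)total;
            ent += prob * (log2of10 * log10(1.0 / prob));   /* prob * log2(1/prob) */
        }
    }
    printf("%lu bytes, entropy = %f bits per byte\n", total, ent);
    return 0;
}
Any integer version can then be checked against this program's output on the same file.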
Hi,
To avoid using the FPU, use a lookup table? The number of entries would determine the accuracy.
Sounds interesting anyway. Will have to add the entropy to one of my diagnostic tools.
Regards,
Steve N.
i got out some of my old text books
of course, my major was EE - not CS - so i never had to approach entropy for data compression
i remember fighting with it in physics - lol
specifically, in thermodynamics (probably my least favorite science)
at any rate, Michael may have the right idea - you just need to manipulate that equation to eliminate the logs
logarithms may be re-written as exponential expressions
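for example, log2(8) = 3 just restates 2^3 = 8, so the integer part of a base-2 log is simply how many times the value can be halved before it reaches 1 (which is exactly what the BSR instruction finds) - no log function needed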
Hi,
Well, I went to Wikipedia's Entropy (information theory) article and pondered a bit. Got an old tool program that had the basic parts working, and added the entropy calculation. It is interesting I guess, but as a metric for day to day usage, I'll have to say it's a bit unintuitive.
To the OP, it does not seem that much accuracy is needed in the final answer, but the intermediate results can cover a fairly large range of values. If I were told not to use an FPU, then I would use fixed point arithmetic with a lookup table for doing the logarithm. I may end up doing it later as the program I modified runs on a machine without an FPU. I don't see enough of a gain in usefulness to do it soon.
If the problem can be bounded to a set of test cases, so that the range of values is limited, the math should not be too bad.
It would be nice to see some test cases worked out to calibrate against. Or a practical application.
Regards,
Steve N.
ent += prob * log2(1 / prob);
==
ent += prob * ( log2(1) - log2(prob) );
==
ent += prob * ( 0 - log2(prob) );
==
ent += prob * ( - BSR(prob) );
Where prob[] is a 256-element array of DWORDs in which each element contains the number of bytes in the file that have the value of the element's index (without division by total bytes).
FURTHER,
X= BSR(totalbytes)
ent += (prob/totalbytes) * ( X - BSR(prob) );
==
ent += prob * (X - BSR(prob));
...after the loop:
ent /= totalbytes
SUM(Xi/Y) == SUM(Xi)/Y
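In C that works out to something like the sketch below - just an illustration, where the shift loop stands in for the BSR instruction and the helper names are made up:
/* index of the highest set bit, i.e. floor(log2(v)) - stands in for BSR */
static int bsr(unsigned long v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* returns entropy * totalbytes as an integer; caller divides by totalbytes */
unsigned long long approx_ent(const unsigned long count[256], unsigned long totalbytes)
{
    unsigned long long ent = 0;
    int X = bsr(totalbytes);
    int i;

    for (i = 0; i < 256; i++)
        if (count[i])
            ent += (unsigned long long)count[i] * (unsigned long)(X - bsr(count[i]));
    return ent;   /* ent / totalbytes is approximate bits per byte */
}
It is coarse (each log is truncated to a whole number of bits), but it needs nothing beyond integer adds, shifts and one divide.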
yah - what he said ^
An example using real numbers in the calculations
If instead of 256 integer values with a range of 0 to 255 there were only 2 values, 0 and 1, and the number of samples were 4:
samples db 0, 1, 1, 1
for each of the values, the following calculations:
value        0        1
count        1        3
fraction     0.2500   0.7500    (1/4 and 3/4, aka prob or percentage)
reciprocal   4.0000   1.3333    (reciprocal of the fraction, 1 / fraction)
log2 recip   2.0000   0.4150    (for reference: log2 1 = 0, log2 2 = 1, log2 4 = 2, log2 8 = 3, log2 16 = 4)
ent          0.5000   0.3112    (fraction * log2 recip)
total ent    0.8112             (sum of the ent of each value)
What is needed is an alternative to floating point; as already mentioned, fixed point or maybe some fraction representation could work.
For the base-2 log, the integer part works with BSR; calculating the fractional part needs a different algorithm.
A test file test1.exe, 2560 bytes long, was used to evaluate entropy.
A table with the count of occurrences of each byte from test1.exe was constructed.
Fixed point with a 32.32 layout (dword integer part and dword fractional part) was used to store the numbers.
The log2 function was only exact if the number was a power of 2 with no fraction, for example 128.
A number such as 160 was broken down into log2 of 128 plus log2(1.25) from a 16-entry lookup table, so the values are approximate. 1.25 is 160/128:
log2(128 * 1.25) -> log2(2^7 * 1.25) -> log2(2^7) + log2(1.25) -> 7*log2(2) + log2(1.25) -> 7 + log2(1.25)
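A C sketch of that idea, scaled down to 16.16 fixed point instead of the 32.32 used above - the table values are log2(1 + k/16) * 65536, precomputed offline, and the names here are only illustrative, not the code from the post:
#include <stdint.h>

/* frac_log2[k] = log2(1 + k/16) * 65536, rounded - a 16-entry lookup table */
static const uint32_t frac_log2[16] = {
        0,  5732, 11136, 16248, 21098, 25711, 30109, 34312,
    38336, 42196, 45904, 49472, 52911, 56229, 59434, 62534
};

/* approximate log2(x) for integer x >= 1, as a 16.16 fixed-point value */
static uint32_t fixed_log2(uint32_t x)
{
    uint32_t n = 0, t = x, m;

    while (t >>= 1)              /* n = floor(log2(x)) - the BSR part */
        n++;
    /* the 4 bits just below the leading 1 pick the table entry,
       e.g. 160 = 2^7 * 1.25 -> n = 7, m = 4, frac_log2[4] = log2(1.25) */
    m = (n >= 4) ? (x >> (n - 4)) & 15 : (x << (4 - n)) & 15;
    return (n << 16) + frac_log2[m];
}
The earlier loop then becomes ent += count[i] * (fixed_log2(totalbytes) - fixed_log2(count[i])), with a divide by totalbytes at the end, giving a 16.16 result; a 64-bit accumulator keeps the sum from overflowing on larger files.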
There is a formula using continued fractions that will directly calculate the ln ( log e, natural logarithm ) of a number.
First the value's range is brought down below e using something similar to the preceding transformation.
Then ln(value) / ln(2) will transform it to a base 2 logarithm. This formula wasn't used.
The proper base for an 8-bit item is base 256, so using base 2 gives a range of 0.0 to 8.0; scaling by 1/8 corrects it to 0.0 to 1.0.
If the items were 1 bit, then base 2 would be appropriate.
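(That scaling is just the change of base again: log256(x) = log2(x) / log2(256) = log2(x) / 8.)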
Wow, my joy knows no limits - thanks! :U