Entropy

Started by 0x401000, October 19, 2009, 11:55:37 AM


0x401000

How can we calculate the entropy, not using float, double, and other types of floating point. Missing functions log and others from math.h. Help please with an example. Thank you!

dedndave

Quote: Missing functions log and others from math.h.
this is not a complete sentence - hard to understand what you mean

as for entropy - entropy of what system?

redskull

@@: INC entropy
JMP @B

:bg

But seriously, the FPU has everything you should need: FYL2X will do your base 2 log (lg), and you can use the change of base to do ln or log: lg(x)/lg(b), where x is your argument and b is your desired base.  Specifically, what are you missing?
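
In C terms, just for reference (the helper names here are mine, not from any library), the change of base looks like this:


#include <math.h>

/* FYL2X computes y * log2(x); with y = 1 that is just lg(x).
   Change of base: log_b(x) = lg(x) / lg(b). */
double log2_(double x)
{
    return log(x) / log(2.0);      /* lg(x) */
}

double log_base(double x, double b)
{
    return log2_(x) / log2_(b);    /* e.g. log_base(8.0, 2.0) == 3.0 */
}
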

Also, fpu.lib should have the functions as well, along with the tutorial (see the qeditor help menu)
Strange women, lying in ponds, distributing swords, is no basis for a system of government

0x401000

The task is to calculate the entropy of a compressed file without using float, double, or other floating-point types, and without the log functions and others from math.h   :wink  Also without using the FPU

dedndave

well - you have to know and understand the compression methodology being used
your prof is probably not looking for a math formula
he is likely expecting an essay type answer that describes how the entropy for that compression may be defined
i.e. what factors in the compression methodology might contribute to errors

0x401000

The problem is not the theory, but how it is implemented in software, for instance in plain assembly, without any additional libraries and without the coprocessor

Mirno

You mean compute the differences between the compressed and uncompressed versions of two things.

Compress, decompress, compare to the original.

That's pretty much the basics of it - how you then report that is up to you. In graphics, displaying the XOR is a common method - you should have a black image if the compression is lossless.

For sound - I guess you would need some other representation - but the principle is the same.

Mirno

dedndave

more like - what are the chances of two non-identical files compressing to the same form
or - what is the likelihood of an error due to compression
what characteristics of the particular compression methodology might contribute to errors
in some cases, there is no chance of a compression error
then, you might describe how that is achieved

here is a good definition that may be applied to data compression: (from wikipedia)
Quote"In words, entropy is just the logarithm of the number of ways of arranging things in the system (times the Boltzmann's constant)."


since you cannot use logarithms, you may start by describing, without logarithms, how many possible files of a particular size may exist (an N-byte file has 256^N possible contents) - simple enough

MichaelW

To calculate an entropy value in bits per byte, John Walker's ENT program would use:

for (i = 0; i < 256; i++) {
  if (prob[i] > 0.0) {
    ent += prob[i] * log2(1 / prob[i]);
  }
}


Where prob[] is a 256-element array of doubles in which each element contains (the number of bytes in the file that have the value of the element's index) / (the total number of bytes).

And log2 is defined as:

#define log2of10 3.32192809488736234787

/*  LOG2  --  Calculate log to the base 2  */

static double log2(double x)
{
    return log2of10 * log10(x);
}


The problem is in doing the log2(1 / prob) without floating-point calculations. And then there is the question of how accurate the entropy value should be.
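
For illustration, here is a minimal sketch of the whole floating-point calculation in one place - not ENT's actual source, the byte counting and names are mine:


#include <stdio.h>
#include <math.h>

#define log2of10 3.32192809488736234787

/* entropy in bits per byte: sum over byte values of
   prob[i] * log2(1 / prob[i]), with prob[i] = count[i] / total */
double entropy_bits(const unsigned char *data, unsigned long total)
{
    unsigned long count[256] = {0};
    double ent = 0.0;
    unsigned long i;

    for (i = 0; i < total; i++)
        count[data[i]]++;

    for (i = 0; i < 256; i++) {
        if (count[i] != 0) {
            double prob = (double)count[i] / (double)total;
            ent += prob * log2of10 * log10(1.0 / prob);
        }
    }
    return ent;
}

int main(void)
{
    unsigned char sample[4] = { 0, 1, 1, 1 };
    printf("%f bits per byte\n", entropy_bits(sample, 4));  /* 0.811278 */
    return 0;
}
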
eschew obfuscation

FORTRANS

Hi,

   To avoid using the FPU, use a lookup table?  The number of
entries would determine the accuracy.
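
   A rough C sketch of that idea (the 8.8 fixed-point format and the
256-entry table are my own choices; the table is filled with the FPU
here only for brevity - in a no-FPU build those 256 constants would be
precomputed and assembled in):


#include <math.h>

static unsigned short frac_tab[256];   /* frac_tab[j] = 256 * log2(1 + j/256) */

void init_frac_tab(void)               /* call once before table_log2 */
{
    int j;
    for (j = 0; j < 256; j++)
        frac_tab[j] = (unsigned short)(256.0 * log(1.0 + j / 256.0) / log(2.0) + 0.5);
}

/* log2(n) for n > 0 in 8.8 fixed point: normalize n to m * 2^k with
   256 <= m < 512, take the integer part from k and the fractional
   part from the table; more table entries = more accuracy */
long table_log2(unsigned long n)
{
    int k = 0;

    while (n >= 512) { n >>= 1; k++; }
    while (n < 256)  { n <<= 1; k--; }

    return ((long)(k + 8) << 8) + frac_tab[n - 256];
}


   The entropy sum can then be formed as count[i] * (table_log2(total)
- table_log2(count[i])), all in integers, with one division by the
total at the end.
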

   Sounds interesting anyway.  Will have to add the entropy to
one of my diagnostic tools.

Regards,

Steve N.

dedndave

i got out some of my old text books
of course, my major was EE - not CS - so i never had to approach entropy for data compression
i remember fighting with it in physics - lol
specifically, in thermodynamics (probably my least favorite science)
at any rate, Michael may have the right idea - you just need to manipulate that equation to eliminate the logs
logarithms may be re-written as exponential expressions

FORTRANS

Hi,

   Well, I went to the Wikipedia article Entropy (information theory),
and pondered a bit.  Got an old tool program that had the
basic parts working, and added the entropy calculation.  It
is interesting I guess, but as a metric for day to day usage,
I'll have to say it's a bit unintuitive.

   To the OP, it does not seem that much accuracy is
needed in the final answer, but the intermediate results
can cover a fairly large range of values.  If I were told
not to use an FPU, then I would use fixed point arithmetic
with a lookup table for doing the logarithm.  I may end
up doing it later as the program I modified runs on a
machine without an FPU.  I don't  see enough of a gain
in usefulness to do it soon.

   If the problem can be bounded to a set of test cases,
so that the range of values is limited, the math should
not be too bad.

   It would be nice to see some test cases worked out
to calibrate against.  Or a practical application.

Regards,

Steve N.

drizz

  ent += prob * log2(1 / prob);

==

  ent += prob * ( log2(1) - log2(prob) );

==

  ent += prob * ( 0 - log2(prob) );

==

  ent += prob * ( - BSR(prob) );


Where prob[] is a 256-element array of DWORDs in which each element contains the number of bytes in the file that have the value of the element's index (without division by total bytes)

FURTHER,

X = BSR(totalbytes)

  ent += (prob/totalbytes) * ( X - BSR(prob) );

==

  ent += prob * ( X - BSR(prob) );
...after loop
  ent /= totalbytes

SUM(Xi/Y) == SUM(Xi)/Y
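
a quick C rendering of that rearrangement (the bit-scan loop below stands in for the BSR instruction, and the names are mine):


/* position of the highest set bit, i.e. what BSR returns (n > 0) */
static unsigned int bsr(unsigned long n)
{
    unsigned int pos = 0;
    while (n >>= 1)
        pos++;
    return pos;
}

/* whole-bit approximation of the entropy in bits per byte:
   sum of count[i] * (BSR(total) - BSR(count[i])), divided once at the end */
unsigned long entropy_whole_bits(const unsigned long count[256], unsigned long total)
{
    unsigned long ent = 0;
    unsigned int  x = bsr(total);
    int i;

    for (i = 0; i < 256; i++)
        if (count[i] != 0)
            ent += count[i] * (unsigned long)(x - bsr(count[i]));

    return ent / total;   /* SUM(Xi)/Y - accurate only to whole bits */
}
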
The truth cannot be learned ... it can only be recognized.

dedndave

yah - what he said ^

dsouza123

An example using real numbers in the calculations

If, instead of 256 integer values with a range of 0 to 255,
there were only 2 values, 0 and 1,
and the number of samples was 4.


samples db 0, 1, 1, 1

for each of the values, the following calculations:
value      0,      1
count      1,      3
fraction   0.2500, 0.7500  for 1/4 and 3/4  aka prob  or  percentage
reciprocal 4.0000, 1.3333  reciprocal of the fraction  aka 1 / fraction
log2 recip 2.0000, 0.4150  log2 8 = 3  log2 16 = 4  log2 2 = 1  log2 1 = 0  log2 4 = 2
ent        0.5000, 0.3112  fraction * log2 recip 

total ent          0.8112  sum of the ent of each value


What is needed is an alternative to floating point -
as already mentioned, fixed point or maybe some fraction representation could work -
for the base 2 log:
the integer part works with bsr;
calculating the fractional part needs a different algorithm
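
For what it's worth, here is one integer-only way to get those fractional bits,
using the classic squaring trick (the 16.16 fixed-point format and the names are
my own choices, so treat it as a sketch rather than the way to do it):


/* log2(n) for integer n > 0, in 16.16 fixed point, integer math only.
   Integer part = position of the highest set bit (what BSR gives).
   Fractional part by squaring: if x is in [1,2), squaring it repeatedly
   and noting each time the square lands in [2,4) yields the bits of log2(x). */
unsigned long fixed_log2(unsigned long n)
{
    unsigned long result, x;
    int int_part = 0, bit;

    x = n;
    while (x >>= 1)                  /* integer part: highest set bit */
        int_part++;

    result = (unsigned long)int_part << 16;

    /* x = n / 2^int_part, held in 16.16 fixed point, so 1.0 <= x < 2.0 */
    x = (unsigned long)(((unsigned long long)n << 16) >> int_part);

    for (bit = 15; bit >= 0; bit--) {
        x = (unsigned long)(((unsigned long long)x * x) >> 16);  /* x = x*x */
        if (x >= (2UL << 16)) {      /* square reached [2,4): this bit is 1 */
            x >>= 1;
            result |= 1UL << bit;
        }
    }
    return result;   /* e.g. fixed_log2(3) comes out close to 1.585 * 65536 */
}


The entropy sum from the earlier posts then becomes
count[i] * (fixed_log2(total) - fixed_log2(count[i])), all in 16.16 fixed point,
with one division by the total at the end (in a wide enough accumulator).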