How can we calculate the entropy without using float, double, or other floating-point types? Missing functions log and others from math.h. Please help with an example. Thank you!
Quote: "Missing functions log and others from math.h."
this is not a complete sentence - hard to understand what you mean
as for entropy - entropy of what system?
@@: INC entropy
JMP @B
:bg
But seriously, the FPU has everything you should need: FYL2X will do your base 2 log (lg), and you can use the change of base to do ln or log: lg(x)/lg(b), where x is your argument and b is your desired base. Specifically, what are you missing?
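(For instance, ln(x) = ln(2) * log2(x), so loading ln(2) with FLDLN2, pushing x, and executing FYL2X gives the natural log; FLDLG2 does the same for log10.)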
Also, fpu.lib should have the functions as well, along with the tutorial (see the qeditor help menu)
The task is to calculate the entropy of a compressed file without using float, double, or other floating-point types, and without the log function or anything else from math.h :wink Also without using the FPU.
well - you have to know and understand the compression methodology being used
your prof is probably not looking for a math formula
he is likely expecting an essay type answer that describes how the entropy for that compression may be defined
i.e. what factors in the compression methodology might contribute to errors
The problem is not the theory, but how to implement it in software - for instance in plain assembly, without any additional libraries and without a coprocessor.
You mean compute the differences between the compressed and uncompressed versions of two things.
Compress, decompress, compare to the original.
That's pretty much the basics of it - how you then report that is up to you. In graphics, displaying the XOR is a common method - you should have a black image if the compression is lossless.
For sound - I guess you would need some other representation - but the principle is the same.
Mirno
more like - what are the chances of two non-identical files compressing to the same form
or - what is the likelihood of an error due to compression
what characteristics of the particular compression methodology might contribute to errors
in some cases, there is no chance of a compression error
then, you might describe how that is achieved
here is a good definition that may be applied to data compression (from Wikipedia):
Quote: "In words, entropy is just the logarithm of the number of ways of arranging things in the system (times the Boltzmann's constant)."
since you cannot use logarithms, you may start by describing without logarithms how many possible files of a particular size may exist - simple enough
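For example, an n-byte file can be arranged in 256^n = 2^(8n) possible ways, so the "number of ways" is already a power of two and its base-2 logarithm can be read straight off the exponent: 8n bits, with no log function involved.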
To calculate an entropy value in bits per byte, John Walker's ENT program would use:
for (i = 0; i < 256; i++) {
    if (prob[i] > 0.0) {
        ent += prob[i] * log2(1 / prob[i]);
    }
}
Where prob[] is a 256-element array of doubles in which each element contains (the number of bytes in the file that have the value of the element's index) / (the total number of bytes).
And log2 is defined as:
#define log2of10 3.32192809488736234787
/* LOG2 -- Calculate log to the base 2 */
static double log2(double x)
{
    return log2of10 * log10(x);
}
The problem is in doing the log2(1 / prob) without floating-point calculations. And then there is the question of how accurate the entropy value should be.
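For calibration, a complete floating-point version along those lines might look like this in C - just a sketch, not a solution to the constraint; the variable names and file handling here are only illustrative:
#include <stdio.h>
#include <math.h>

#define log2of10 3.32192809488736234787    /* same change-of-base constant as above */

int main(int argc, char *argv[])
{
    unsigned long count[256] = {0};
    unsigned long total = 0;
    double ent = 0.0;
    int c, i;
    FILE *f;

    if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL)
        return 1;
    while ((c = getc(f)) != EOF) {          /* count occurrences of each byte value */
        count[c]++;
        total++;
    }
    fclose(f);

    for (i = 0; i < 256; i++) {
        if (count[i]) {
            double prob = (double)count[i] / (double)total;
            ent += prob * (log2of10 * log10(1.0 / prob));   /* prob * log2(1/prob) */
        }
    }
    printf("%lu bytes, entropy = %f bits per byte\n", total, ent);
    return 0;
}
Any integer version can then be checked against this program's output on the same file.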
Hi,
To avoid using the FPU, use a lookup table? The number of entries would determine the accuracy.
Sounds interesting anyway. Will have to add the entropy to one of my diagnostic tools.
Regards,
Steve N.
i got out some of my old text books
of course, my major was EE - not CS - so i never had to approach entropy for data compression
i remember fighting with it in physics - lol
specifically, in thermodynamics (probably my least favorite science)
at any rate, Michael may have the right idea - you just need to manipulate that equation to eliminate the logs
logarithms may be re-written as exponential expressions
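for example, log2(8) = 3 just restates 2^3 = 8, so the integer part of a base-2 log is simply how many times the value can be halved before it reaches 1 (which is exactly what the BSR instruction finds) - no log function needed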
Hi,
Well, I went to Wikipedia's Entropy (information theory) article and pondered a bit. Got an old tool program that had the basic parts working, and added the entropy calculation. It is interesting I guess, but as a metric for day to day usage, I'll have to say it's a bit unintuitive.
To the OP, it does not seem that much accuracy is needed in the final answer, but the intermediate results can cover a fairly large range of values. If I were told not to use an FPU, then I would use fixed point arithmetic with a lookup table for doing the logarithm. I may end up doing it later as the program I modified runs on a machine without an FPU. I don't see enough of a gain in usefulness to do it soon.
If the problem can be bounded to a set of test cases, so that the range of values is limited, the math should not be too bad.
It would be nice to see some test cases worked out to calibrate against. Or a practical application.
Regards,
Steve N.
ent += prob * log2(1 / prob);
==
ent += prob * ( log2(1) - log2(prob) );
==
ent += prob * ( 0 - log2(prob) );
==
ent += prob * ( - BSR(prob) );
Where prob[] is a 256-element array of DWORDs in which each element contains the number of bytes in the file that have the value of the element's index (without division by total bytes).
FURTHER,
X= BSR(totalbytes)
ent += (prob/totalbytes) * ( X - BSR(prob) );
==
ent += prob * (X - BSR(prob));
...after the loop:
ent /= totalbytes
SUM(Xi/Y) == SUM(Xi)/Y
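In C that works out to something like the sketch below - just an illustration, where the shift loop stands in for the BSR instruction and the helper names are made up:
/* index of the highest set bit, i.e. floor(log2(v)) - stands in for BSR */
static int bsr(unsigned long v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* returns entropy * totalbytes as an integer; caller divides by totalbytes */
unsigned long long approx_ent(const unsigned long count[256], unsigned long totalbytes)
{
    unsigned long long ent = 0;
    int X = bsr(totalbytes);
    int i;

    for (i = 0; i < 256; i++)
        if (count[i])
            ent += (unsigned long long)count[i] * (unsigned long)(X - bsr(count[i]));
    return ent;   /* ent / totalbytes is approximate bits per byte */
}
It is coarse (each log is truncated to a whole number of bits), but it needs nothing beyond integer adds, shifts and one divide.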
yah - what he said ^
An example using real numbers in the calculations
If instead of 256 integer values with a range of 0 to 255 there were only 2 values, 0 and 1, and the number of samples were 4:
samples db 0, 1, 1, 1
for each of the values, the following calculations:
value        0        1
count        1        3
fraction     0.2500   0.7500    (1/4 and 3/4, aka prob or percentage)
reciprocal   4.0000   1.3333    (reciprocal of the fraction, 1 / fraction)
log2 recip   2.0000   0.4150    (for reference: log2 1 = 0, log2 2 = 1, log2 4 = 2, log2 8 = 3, log2 16 = 4)
ent          0.5000   0.3112    (fraction * log2 recip)
total ent    0.8112             (sum of the ent of each value)
What is needed is an alternative to floating point; as already mentioned, fixed point or maybe some fraction representation could work.
For the base-2 log, the integer part works with BSR; calculating the fractional part needs a different algorithm.
A test file test1.exe, 2560 bytes long, was used to evaluate entropy.
A table with the count of occurrences of each byte from test1.exe was constructed.
Fixed point with a 32.32 layout (dword integer part and dword fractional part) was used to store the numbers.
The log2 function was only exact if the number was a power of 2 with no fraction, for example 128.
A number such as 160 was broken down into log2 of 128 plus log2(1.25) from a 16-entry lookup table, so the values are approximate. 1.25 is 160/128:
log2(128 * 1.25) -> log2(2^7 * 1.25) -> log2(2^7) + log2(1.25) -> 7*log2(2) + log2(1.25) -> 7 + log2(1.25)
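A C sketch of that idea, scaled down to 16.16 fixed point instead of the 32.32 used above - the table values are log2(1 + k/16) * 65536, precomputed offline, and the names here are only illustrative, not the code from the post:
#include <stdint.h>

/* frac_log2[k] = log2(1 + k/16) * 65536, rounded - a 16-entry lookup table */
static const uint32_t frac_log2[16] = {
        0,  5732, 11136, 16248, 21098, 25711, 30109, 34312,
    38336, 42196, 45904, 49472, 52911, 56229, 59434, 62534
};

/* approximate log2(x) for integer x >= 1, as a 16.16 fixed-point value */
static uint32_t fixed_log2(uint32_t x)
{
    uint32_t n = 0, t = x, m;

    while (t >>= 1)              /* n = floor(log2(x)) - the BSR part */
        n++;
    /* the 4 bits just below the leading 1 pick the table entry,
       e.g. 160 = 2^7 * 1.25 -> n = 7, m = 4, frac_log2[4] = log2(1.25) */
    m = (n >= 4) ? (x >> (n - 4)) & 15 : (x << (4 - n)) & 15;
    return (n << 16) + frac_log2[m];
}
The earlier loop then becomes ent += count[i] * (fixed_log2(totalbytes) - fixed_log2(count[i])), with a divide by totalbytes at the end, giving a 16.16 result; a 64-bit accumulator keeps the sum from overflowing on larger files.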
There is a formula using continued fractions that will directly calculate the ln ( log e, natural logarithm ) of a number.
First the value's range is brought down below e using something similar to the preceding transformation.
Then ln(value) / ln(2) will transform it to a base 2 logarithm. This formula wasn't used.
The proper base for an 8-bit item is base 256, so using base 2 gives a range of 0.0 to 8.0; scaling by 1/8 corrects it to 0.0 to 1.0.
If the items were 1 bit, then base 2 would be appropriate.
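(That scaling is just the change of base again: log256(x) = log2(x) / log2(256) = log2(x) / 8.)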
Wow, my joy knows no limits - thanks! :U