Statistical Functions - Chi-Square tests

Started by Abel, February 11, 2005, 10:36:03 AM

Abel

Recent discussion herein with regard to randomness prompted a closer look at the subject, in particular John Walker's ENT program and chi-square analysis.  His program is byte-oriented: byte occurrences are tallied in 256 bins, so that bin 33, for example, holds the count of bytes with value 33 in the data stream.  A chi-square value is obtained by summing (n(i)-m(i))^2/m(i) over the bins, where n(i) is the observed count in bin i and m(i) the expected count.  (For an assumed uniform random distribution, m, the expected value, is a constant.)  A chi-square probability is then calculated by an approximation to the continuous normal distribution, and a probability lookup table sets broad range limits of 50%, 25%, 10%,...  Note that the same bin distribution would be obtained were the data sorted beforehand.
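To make the bin tally and the summation concrete, here is a minimal Python sketch (not Walker's code and not the MASM source linked below). The function name chi_square_bytes, the seed, and the sample size are my own assumptions for illustration. It also checks the remark about sorting: the statistic depends only on the bin counts.

```python
import random
from collections import Counter

def chi_square_bytes(data: bytes) -> float:
    """Chi-square statistic of byte counts against a uniform expectation."""
    counts = Counter(data)               # n(i): occurrences of each byte value
    expected = len(data) / 256           # m: the same for every bin
    return sum((counts[v] - expected) ** 2 / expected for v in range(256))

# Hypothetical test data: Python's own generator with a fixed seed.
rng = random.Random(12345)
sample = bytes(rng.randrange(256) for _ in range(100_000))

print(chi_square_bytes(sample))
# Sorting changes the order but not the bin counts, so the value is identical.
print(chi_square_bytes(bytes(sorted(sample))))
```

The second line printed matches the first exactly, which is the point made above about sorted data.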

That approach is quite appropriate for the program's intended purpose, but not all random number sequences are generated modulo 256, and critical testing requires accurate probabilities in order to show that the probabilities from repeated tests are themselves uniformly distributed.  Although chi-square values for 256 bins cluster about 255 (the number of degrees of freedom), their associated probabilities should give as many results in the 0-10% range as in the 45-55% range if the data fit the assumed distribution.

Finding chi-square probabilities for the general case brings up the "Incomplete Gamma Function", whose calculation is a nontrivial exercise.  It is an important tool in many areas of mathematics, and an accurate implementation deserves some effort.  I've checked results against the tables in Abramowitz and Stegun (NBS 55) and they seem to match to the accuracy of the latter.
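For readers who want to see the connection, the upper-tail chi-square probability for k degrees of freedom is the regularized upper incomplete gamma function Q(k/2, chi2/2). The Python sketch below uses one common textbook approach, a power series for small x and a continued fraction (modified Lentz) for large x. It is an assumption on my part that this is a fair stand-in; it is not necessarily the method used in the MASM source, and the function names and tolerances are hypothetical.

```python
import math

def gamma_p_series(a, x, eps=1e-15, max_iter=10_000):
    """Regularized lower incomplete gamma P(a, x) by power series (use for x < a + 1)."""
    term = 1.0 / a
    total = term
    n = a
    for _ in range(max_iter):
        n += 1.0
        term *= x / n                    # next term: x^k / (a (a+1) ... (a+k))
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def gamma_q_cfrac(a, x, eps=1e-14, max_iter=10_000):
    """Regularized upper incomplete gamma Q(a, x) by continued fraction (use for x >= a + 1)."""
    tiny = 1e-300                        # guards against division by zero
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi_square_q(chi2, df):
    """Probability that a chi-square variate with df degrees of freedom exceeds chi2."""
    if chi2 <= 0.0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    if x < a + 1.0:
        return 1.0 - gamma_p_series(a, x)
    return gamma_q_cfrac(a, x)

# For 1 degree of freedom the result should agree with erfc(sqrt(chi2 / 2)).
print(chi_square_q(3.0, 1), math.erfc(math.sqrt(1.5)))
```

Two exact identities make convenient sanity checks: Q = exp(-chi2/2) for 2 degrees of freedom, and Q = erfc(sqrt(chi2/2)) for 1 degree of freedom.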

The link below contains MASM source code for a number of functions: the gamma function, the incomplete gamma function, the Gaussian or normal probability function, and the chi-square probability function.  It also includes some lighter stuff for generating arrays of random integers from a random number generator (a default is included) and performing chi-square analyses on them.  Typically 10^5 numbers of arbitrary modulus can be generated and tested in a fraction of a second, and repetitions can be used to generate an array of probability values for further analysis.  For documentation, see the source code comments.  Feel free to fold, spindle, mutilate or redistribute as freeware.
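The repeated-test idea can be sketched in a few lines of Python. This is only an illustration, not a port of Statpack: the names, seed, modulus and sample sizes are all assumptions of mine, and the generator is Python's built-in one. To keep it short it uses the exact closed form for the chi-square probability that holds only for an even number of degrees of freedom, hence the odd modulus of 11 (10 degrees of freedom).

```python
import math
import random

def chi_square_stat(values, modulus):
    """Chi-square statistic for integers in range(modulus) against a uniform expectation."""
    counts = [0] * modulus
    for v in values:
        counts[v] += 1
    expected = len(values) / modulus
    return sum((c - expected) ** 2 / expected for c in counts)

def chi_square_q_even(chi2, df):
    """Upper-tail chi-square probability; exact closed form, EVEN df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires even, positive df")
    x = chi2 / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= x / j                    # x^j / j!
        total += term
    return math.exp(-x) * total

def p_value_deciles(modulus=11, n=2_000, repeats=300, seed=2005):
    """Run the test `repeats` times and tally the probabilities into ten 10% bins."""
    rng = random.Random(seed)
    deciles = [0] * 10
    for _ in range(repeats):
        values = [rng.randrange(modulus) for _ in range(n)]
        p = chi_square_q_even(chi_square_stat(values, modulus), modulus - 1)
        deciles[min(int(p * 10), 9)] += 1
    return deciles

print(p_value_deciles())
```

If the generator fits the assumed distribution, the ten tallies should be roughly equal, and that array of tallies can itself be subjected to a chi-square test.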

Note that chi-square testing can only demonstrate that the distribution of numbers is compatible with a random distribution.  It is totally insensitive to serial randomness or the lack thereof (sorting).  Walker includes a test for consecutive bytes, but correlations over longer separations (FFT autocorrelation function?) should also show white noise.
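A quick way to see the difference is a plain lag-k autocorrelation coefficient, sketched below in Python. It is computed directly rather than by FFT, which is an O(n) loop per lag and fine for a handful of lags; the function name, seed and sample size are my own assumptions. Random and sorted versions of the same data have identical bin counts, yet the sorted data shows a serial correlation near 1.

```python
import random

def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation coefficient of a sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        raise ValueError("constant sequence has no defined autocorrelation")
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var

# Hypothetical test data: Python's own generator with a fixed seed.
rng = random.Random(42)
data = [rng.randrange(256) for _ in range(20_000)]
ordered = sorted(data)                   # same bin counts, no serial randomness

for lag in (1, 2, 16):
    print(lag, lag_autocorrelation(data, lag), lag_autocorrelation(ordered, lag))
```

For white noise the coefficient at every nonzero lag should be near zero, on the order of 1/sqrt(n).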

Abel

http://prove66.home.att.net/files/asm/Statpack.zip