_Perfect_ simulation of a lotto-machine

cobold · November 10, 2007, 11:08:35 PM

Dear fellow programmers,

I have written an algorithm to simulate an Austrian lotto machine (it draws 6 numbers from a funnel containing 45 numbers, plus a so-called "additional number").
To my surprise the program works :toothy but the results differ slightly from what probability theory predicts. :(

That's why I would appreciate any feedback from you experts, especially feedback concerning
a) the design of the algorithm itself (as a newbie to MASM I'm curious about how experts judge my code, and how I could possibly enhance it)
b) my ideas about this simulation, and if and where they are possibly wrong.

Should you think that this is off-topic here, please tell me and forgive me :U - I won't bother you then any more

In medias res:
When one takes a look at the numbers that the lotto machine has produced since 1986, when lotto was introduced in good old Austria, one finds that these numbers are distributed fairly evenly. By the way, :bg I'm mentioning Austrian lotto not because I want to advertise for it, but to point out that I'm referring to real-world data.
--> First requirement of algorithm: Produce random numbers between 1 ... 45 that are evenly distributed.
E.g. if I let the algo generate 0ffffffffh drawings, I expect every number to occur _more or less_ 572 662 306 times: (2^32-1)*6/45
--> Second requirement - so I thought: decrement the range by one for every number drawn, because with each number drawn, one number less is in the funnel.
AND HERE'S THE "FUNNY" THING:
If I call create_draw WITHOUT decrementing range, the numbers are evenly distributed, but there are too few "wins" according to theory.
If I call create_draw WITH decrementing range, numbers 1..40 occur more often than expected, with the counts dropping more and more towards 45, but I get too many "wins" according to theory.
If someone could enlighten me on that, I'd be more than happy, and should this someone ever be in Vienna, just call me and I'll buy him/her as many beers, or whatever drink he/she prefers.
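
For reference, the "theory" figures can be computed directly. This is only a sketch: it assumes that a "win" means a fixed tip of 6 numbers matching exactly k of the 6 drawn numbers, and it ignores the additional number. The function names are made up for illustration.

```python
from fractions import Fraction
from math import comb

BALLS = 45   # numbers in the funnel
DRAWN = 6    # numbers per drawing (additional number ignored: assumption)

def expected_hits_per_number(drawings: int) -> Fraction:
    """Expected count of one particular number over `drawings` drawings."""
    return Fraction(drawings * DRAWN, BALLS)

def match_probability(k: int) -> Fraction:
    """Chance that a fixed tip of 6 numbers matches exactly k drawn numbers
    (hypergeometric distribution)."""
    return Fraction(comb(DRAWN, k) * comb(BALLS - DRAWN, DRAWN - k),
                    comb(BALLS, DRAWN))

print("expected hits per number:", int(expected_hits_per_number(0xFFFFFFFF)))
for k in range(DRAWN + 1):
    print(k, "correct:", float(match_probability(k)))
```

Comparing the simulated "win" counts against these probabilities, multiplied by the number of drawings, shows how far off a given variant of create_draw is.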

[attachment deleted by admin]

hutch-- · November 10, 2007, 11:25:04 PM

Hi,

Quote
--> First requirement of algorithm: Produce random numbers between 1 ... 45 that are evenly distributed.

This will skew your results as the notion of "random" and evenly distributed across a numeric range are contradictory requirements. The closer you come to a random method the more likely the result that the distribution will not be uniform over the numeric range.

cobold · November 10, 2007, 11:41:19 PM

Hello Hutch

1st of all, thanks for your reply. There's just one thing I don't understand:

f.ex. a die: if you throw a die, say, 2^32 times, shouldn't the numbers be _more or less_ equally distributed and random all the same?
Perhaps I'm too silly to understand, but please explain to me why "random" and "equally distributed" are a contradiction.
I'm asking because I'm stuck.
I mean, isn't it funny that we can fly to the moon, produce atomic bombs and at the same time we are not able to predict which 6 little balls out of 45 will come out of a machine?
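
The die question can be tried out directly. The following is a minimal sketch that uses Python's built-in generator as a stand-in for a real die (that substitution is an assumption, as are the function name and the fixed seed); it counts how often each face comes up and prints the deviation from the ideal count.

```python
import random
from collections import Counter

def roll_counts(throws: int, seed: int = 1) -> Counter:
    """Throw a simulated fair die `throws` times and count each face."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    return Counter(rng.randint(1, 6) for _ in range(throws))

THROWS = 600_000
counts = roll_counts(THROWS)
for face in range(1, 7):
    # deviation from the ideal count of THROWS / 6 per face
    print(face, counts[face], counts[face] - THROWS // 6)
```

Individual throws are unpredictable, yet the totals per face end up close to each other, and relatively closer the more throws are made.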

kindest regards
Klaus- the cobold, because he was a COBOL-Programmer once

hutch-- · November 11, 2007, 12:15:29 AM

Klaus,

The notion of "random" is literally "without order" where equal distribution is to some extent "ordered". Vaguely I remember a lotto analysis here in OZ where I live that produced a "Lotto" distribution curve based on every lotto draw since it started and it was by no means ordered or evenly distributed.

Basically the more "unordered" your random distribution is the more powerful it is.

cobold · November 11, 2007, 12:52:17 AM

hutch,

thks again for replying,

a question, what is "OZ" New Zealand, Australia?

about "evenly distributed": if you take a look at www.win2day.at/gaming/index.html you'll see that each number has come up between 197 and 259 times.
At the beginning they made a draw every Sunday; for a few years now (I don't know exactly since when) they have made draws every Sunday and Wednesday.
I'll find that out, but taking 1600 rounds, the expected count for each number is:
(1600 rounds * 6 numbers per round) / 45 numbers = 213 per number
But okay, I have to do my homework and find out the exact number of rounds. Anyway, my idea behind all of this is: simulate the machine as well as possible and thereby enhance your chance of winning.
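
As a rough check of that arithmetic, here is a small sketch that also estimates how much scatter around the expected count is normal. It assumes independent draws in which every number has the same 6/45 chance of being among the six drawn, so the count of one number follows a binomial distribution. The 1600 rounds are only the rough estimate from this post.

```python
from math import sqrt

rounds = 1600        # rough estimate from the post, not an official figure
p = 6 / 45           # chance that a given number is among the 6 drawn

mean = rounds * p                       # expected count per number
std_dev = sqrt(rounds * p * (1 - p))    # binomial standard deviation

print(f"expected count per number: {mean:.1f}")
print(f"standard deviation:        {std_dev:.1f}")
```

A spread of a few standard deviations between the rarest and the most frequent number is what one would expect from 45 numbers, so the exact number of rounds matters before reading anything into the observed counts.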

Comments? Welcome!

rgds
Klaus

zooba · November 11, 2007, 01:43:28 AM

If you're intending to use this to determine the most likely numbers to be drawn, biasing the likelihood of each number being drawn seems to defeat the purpose.

I fully believe that a lotto machine is uniform (that is, each ball has an equal probability of being selected) but the number of possible combinations far exceeds the number of draws that have ever been made. Any distribution that is based on less than a few billion draws is going to be flawed.
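
To put a number on "far exceeds", here is a short sketch. The 1600 draws are the rough figure mentioned earlier in the thread and are an assumption, not an official count.

```python
from math import comb

combinations = comb(45, 6)   # unordered sets of 6 numbers out of 45
draws_so_far = 1600          # rough figure from earlier in the thread (assumption)

print("possible combinations:", combinations)
print(f"share covered by {draws_so_far} draws: at most {draws_so_far / combinations:.4%}")
```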

Searching Google Scholar or something similar (to find academic papers) will likely find some very thorough analysis of the subject.

Cheers,

Zooba :U

MichaelW · November 11, 2007, 04:14:35 AM

I also believe that the numbers produced by most lotto machines come close to a uniform distribution; it's just not apparent because the available samples are too small. AFAIK most PRNGs are designed to produce numbers with a uniform distribution, and the fixed version of nrandom does a reasonable job of this. This code compares the distribution produced by nrandom to the distribution produced by the Windows CSP RNG.

Code Select


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    include \masm32\include\advapi32.inc
    includelib \masm32\lib\advapi32.lib
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      cnts dd 10 dup(0)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; This proc collects random bytes generated by the cryptographic
; service provider (CSP), and returns a 32-bit integer in the
; interval 0 to base-1.
;
; Everything is in a single procedure to make automatic release
; of the CSP handle simple. The large buffer is a quick and dirty
; correction for the slowness of the CSP generator.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
  hProv       dd 0
  rand_buffer dd 10000 dup(0)
  pNext       dd 0
.code

PROV_RSA_FULL equ 1

; ------------------------------------------------------------
; This value for the dwFlags parameter of CryptAcquireContext
; is necessary for Windows 2000. For Windows 98 the parameter
; can be zero.
; ------------------------------------------------------------

CRYPT_VERIFYCONTEXT equ 0F0000000h

align 4
csp_rand proc uses ebx base:DWORD
    mov ebx,pNext
    .IF ebx == 0
        invoke CryptAcquireContext, ADDR hProv,
                                    NULL,
                                    NULL,
                                    PROV_RSA_FULL,
                                    CRYPT_VERIFYCONTEXT
        .IF eax == 0
            print "CryptAcquireContext failed",13,10
            ret
        .ENDIF
        invoke CryptGenRandom, hProv, 10000*4, ADDR rand_buffer
        .IF eax == 0
            print "CryptGenRandom failed",13,10
        .ENDIF
        invoke CryptReleaseContext, hProv, 0
    .ENDIF
    mov eax, [rand_buffer+ebx]
    add ebx, 4
    .IF ebx > 9999*4
      xor ebx, ebx
    .ENDIF
    mov pNext, ebx
    xor edx, edx
    div base
    mov eax, edx
    ret
csp_rand endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    SAMPLES equ 10000000

    xor ebx, ebx
    .WHILE ebx < SAMPLES
      invoke nrandom, 10
      inc [cnts+eax*4]
      inc ebx
    .ENDW
    xor ebx, ebx
    .WHILE ebx < 10
      print ustr$(ebx),9
      print ustr$([cnts+ebx*4]),9
      sub [cnts+ebx*4], SAMPLES / 10
      js @F
      print chr$("+")
    @@:
      add edi, [cnts+ebx*4]
      print str$([cnts+ebx*4]),13,10
      mov [cnts+ebx*4], 0
      inc ebx
    .ENDW

    print chr$(13,10)

    xor ebx, ebx
    .WHILE ebx < SAMPLES
      invoke csp_rand, 10
      inc [cnts+eax*4]
      inc ebx
    .ENDW
    xor ebx, ebx
    .WHILE ebx < 10
      print ustr$(ebx),9
      print ustr$([cnts+ebx*4]),9
      sub [cnts+ebx*4], SAMPLES / 10
      js @F
      print chr$("+")
    @@:
      print str$([cnts+ebx*4]),13,10
      mov [cnts+ebx*4], 0
      inc ebx
    .ENDW

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Code Select


0       1001901 +1901
1       999198  -802
2       1001659 +1659
3       1000060 +60
4       999904  -96
5       1000079 +79
6       999780  -220
7       998903  -1097
8       1000925 +925
9       997591  -2409

0       1000055 +55
1       999199  -801
2       1001228 +1228
3       999010  -990
4       999554  -446
5       1001137 +1137
6       1000259 +259
7       999340  -660
8       999389  -611
9       1000829 +829
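
One way to judge whether deviations of this size are compatible with a uniform distribution is a chi-square test. The sketch below feeds the two result tables above into the standard formula. The 16.919 threshold is the usual 5% critical value for 9 degrees of freedom; the function and variable names are made up for illustration.

```python
# Counts copied from the two result tables above (nrandom first, CSP second).
nrandom_counts = [1001901, 999198, 1001659, 1000060, 999904,
                  1000079, 999780, 998903, 1000925, 997591]
csp_counts = [1000055, 999199, 1001228, 999010, 999554,
              1001137, 1000259, 999340, 999389, 1000829]

def chi_square_uniform(counts):
    """Chi-square statistic against the hypothesis that all bins are equally likely."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

CRITICAL_5_PERCENT_DF9 = 16.919   # 10 bins -> 9 degrees of freedom

for name, counts in (("nrandom", nrandom_counts), ("CSP", csp_counts)):
    stat = chi_square_uniform(counts)
    verdict = "consistent with uniform" if stat < CRITICAL_5_PERCENT_DF9 else "suspicious"
    print(f"{name}: chi-square = {stat:.3f} ({verdict} at the 5% level)")
```

Both statistics stay below the threshold, so neither run gives grounds to reject uniformity, although a single run of each generator is not much evidence either way.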

u · November 11, 2007, 07:19:07 AM

The better your nrandom() function is, the worse distribution you'd get with that current approach:

Code Select


        inc [arr+eax*4]             ; remember how often the number has come up
        .if [arr+eax*4] > 1         ; if the number has already been drawn
            dec [arr+eax*4]
            jmp @B ; !!!!!! this. You're forcing another nrandom() call, instead of using the data
        .endif

If nrandom() returns "0", take the first empty slot. If nrandom() returns "33", take the 34th empty slot.
It's not striking that ball 45 is rarely taken - you give a chance for it to be taken _only_ on the first draw! And 44 - only on the first two draws. That's why balls "1" to "40" are getting good even results currently.
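
The effect is easy to reproduce. The following sketch is a reconstruction, not the original code (the attachment is gone): it assumes that draw number k asks the generator for a value in 1..45-k and simply asks again when that value was already drawn. The names and the fixed seed are made up.

```python
import random
from collections import Counter

def shrinking_range_draw(rng, balls=45, picks=6):
    """Assumed reconstruction of the biased method: the range shrinks with
    every pick, and a duplicate only triggers another call to the generator."""
    drawn = []
    for k in range(picks):
        while True:
            n = rng.randint(1, balls - k)   # range is 45, 44, ... 40
            if n not in drawn:              # duplicate: call the generator again
                drawn.append(n)
                break
    return drawn

rng = random.Random(2007)                   # fixed seed so the run is repeatable
counts = Counter()
for _ in range(20_000):
    counts.update(shrinking_range_draw(rng))

for ball in (1, 20, 40, 41, 42, 43, 44, 45):
    print(ball, counts[ball])
```

Balls 1 to 40 are eligible in all six picks and come out about equally often, while the counts fall off steadily from 41 to 45.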

Btw,

Code Select


(line 120)    .until ebx == range

This is a future bug waiting to happen. Use "Max" instead of "range" :). Same for line 139, but you have to restore the initial value of Max somehow.

u · November 11, 2007, 07:41:11 AM

Here's an outline of the above idea:

Code Select


create_draw proc Max,Num
	local Taken[range]:BYTE
	
	inc Num ; so that we fit the "bonus number"
	
	;-----[ clear Taken ]-------[
	mov ebx,Max
	@@:
		dec ebx
		mov Taken[ebx],0
		jnz @B
	;---------------------------/
	
	
	xor ebx,ebx
	.repeat
		invoke nrandom,Max
		;---[ find eax-th empty slot ]------[
		mov ecx,-1
		inc eax
		
		@@:
		inc ecx
		
		cmp Taken[ecx],0
		jne @B
		
		dec eax
		jnz @B
		mov Taken[ecx],1
		; ecx is the index 0...44 now
		;-----------------------------------/
		
		inc ecx
		mov [draw+ebx*4], ecx

		inc ebx
		dec Max
	.until ebx==Num


	ret
create_draw endp
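
For readers who find the idea easier to follow in a high-level language, here is a sketch of the same "k-th empty slot" approach in Python. It is an illustration of the idea, not a translation of the outline above; the function name is made up, and `rng` is assumed to be any object with a `randrange(n)` method, such as `random.Random`.

```python
import random

def draw_kth_free(rng, balls=45, picks=6):
    """Draw `picks` distinct numbers from 1..balls: each pick chooses k
    uniformly among the free slots, then walks to the k-th free slot."""
    taken = [False] * balls
    result = []
    for already_drawn in range(picks):
        k = rng.randrange(balls - already_drawn)   # 0 .. free slots - 1
        for index, used in enumerate(taken):
            if used:
                continue          # skip slots that are already taken
            if k == 0:
                break             # index is now the k-th free slot
            k -= 1
        taken[index] = True
        result.append(index + 1)  # slots are 0-based, balls are 1-based
    return result

print(draw_kth_free(random.Random(7)))
```

Every call to the generator is used, and every remaining ball has the same chance in every pick.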

jj2007 · November 11, 2007, 12:39:09 PM

Quote from: Ultrano on November 11, 2007, 07:19:07 AM
If nrandom() returns "0", take the first empty slot. If nrandom() returns "33", take the 34th empty slot.
It's not striking that ball 45 is rarely taken - you give a chance for it to be taken _only_ on the first draw!

Indeed. What you might do instead is to offer the full 45 in each loop, check if the drawn number has been used already, and discard if it has been used.

MichaelW · November 11, 2007, 03:36:26 PM

This code displays the distribution for 1,000,000,000 groups of 6 unique numbers from the range 0 to 44.

Code Select


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      cnts  dd 45 dup(0)
      flags dd 45 dup(0)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    SAMPLES equ 1000000000

    xor ebx, ebx
    .WHILE ebx < SAMPLES
      invoke RtlZeroMemory, ADDR flags, 45*4
      xor esi, esi
      .WHILE esi < 6
        invoke nrandom, 45
        .IF DWORD PTR[flags+eax*4] == 0
          inc DWORD PTR[flags+eax*4]
          inc [cnts+eax*4]
          inc esi
        .ENDIF
      .ENDW
      inc ebx
    .ENDW

    xor ebx, ebx
    .WHILE ebx < 45
      print ustr$([cnts+ebx*4]),13,10
      inc ebx
    .ENDW

    print chr$(13,10)

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

jj2007 · November 11, 2007, 04:14:39 PM

Quote from: jj2007 on November 11, 2007, 12:39:09 PM
Quote from: Ultrano on November 11, 2007, 07:19:07 AM
If nrandom() returns "0", take the first empty slot. If nrandom() returns "33", take the 34th empty slot.
It's not striking that ball 45 is rarely taken - you give a chance for it to be taken _only_ on the first draw!
Indeed. What you might do instead is to offer the full 45 in each loop, check if the drawn number has been used already, and discard if it has been used.

Other option, maybe more precise: Instead of discarding and repeating, provide an initial table with 45 entries
balls db 1,2,3,4,5,6, ... 44,45
.. and apply the random generator with equal chances for all 45 entries.
Now assume that 5 has been drawn in the first round:
- copy 6...45 into the initial 5..44 slots
- which results in a modified table 1,2,3,4,6,...45
- apply the random generator with equal chances for all remaining 44 entries

This procedure mimics exactly what happens in reality.
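
The table idea can be sketched as follows, with Python standing in for the assembler. The function name is made up, and `rng` is assumed to be any object with a `randrange(n)` method. `list.pop(slot)` removes the drawn entry and moves everything behind it one position forward, which is the copy step described above.

```python
import random

def draw_from_table(rng, balls=45, picks=6):
    """Draw `picks` distinct numbers by removing each drawn ball from a table."""
    table = list(range(1, balls + 1))      # 1, 2, 3, ... 45
    drawn = []
    for _ in range(picks):
        slot = rng.randrange(len(table))   # equal chance for every remaining entry
        drawn.append(table.pop(slot))      # pop() closes the gap in the table
    return drawn

# The example from the post: 5 is drawn first, so slot 4 is removed.
table = list(range(1, 46))
assert table.pop(4) == 5
print(table[:6])                           # the table now starts 1, 2, 3, 4, 6, 7

print(draw_from_table(random.Random(11)))
```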

cobold · November 11, 2007, 08:35:32 PM

Thanks to everyone for your feedback. For now I'm trying to adapt all the ideas.

Mark Jones · November 11, 2007, 09:07:40 PM

A 64kb serial result of Agner Fog's prng routine (analyzed with ent.exe) yields:

Code Select

Entropy = 7.997172 bits per byte.

Optimum compression would reduce the size
of this 65536 byte file by 0 percent.

Chi square distribution for 65536 samples is 256.51, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.9213 (127.5 = random).
Monte Carlo value for Pi is 3.130562168 (error 0.35 percent).
Serial correlation coefficient is 0.003217 (totally uncorrelated = 0.0).

That's the best PRNG I've seen. For larger sample sets (1MB, 10MB) the numbers approach the ideals.
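
For anyone wondering what the first figure means, here is a sketch that computes bits of entropy per byte, assuming the usual Shannon definition over byte frequencies. It uses Python's built-in generator for the sample, not Agner Fog's routine, and the function name is made up.

```python
import math
import random
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte frequencies; 8.0 is the maximum."""
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

rng = random.Random(0)                                  # stand-in generator (assumption)
sample = bytes(rng.getrandbits(8) for _ in range(65536))
print(f"Entropy = {entropy_bits_per_byte(sample):.6f} bits per byte.")
```

A finite sample practically never reaches 8.0 exactly, because the 256 byte values are never represented perfectly equally; the value creeps closer as the sample grows.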

Tedd · November 11, 2007, 09:17:56 PM

Quote from: jj2007 on November 11, 2007, 04:14:39 PM
Other option, maybe more precise: Instead of discarding and repeating, provide an initial table with 45 entries
balls db 1,2,3,4,5,6, ... 44,45
.. and apply the random generator with equal chances for all 45 entries.
Now assume that 5 has been drawn in the first round:
- copy 6...45 into the initial 5..44 slots
- which results in a modified table 1,2,3,4,6,...45
- apply the random generator with equal chances for all remaining 44 entries

This procedure mimics exactly what happens in reality.

To avoid the copying (which is inherently slow), you can count through the list until you find the Nth undrawn number.
Or, if you have complete faith in randomness (you shouldn't :lol) then you can swap the Nth number with the last element of the list -- a random random number is still random.
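
The swap variant could look like this; it is in effect a partial Fisher-Yates shuffle. This is a sketch with made-up names, and `rng` is assumed to be any object with a `randrange(n)` method.

```python
import random

def draw_by_swapping(rng, balls=45, picks=6):
    """Draw `picks` distinct numbers; each drawn ball is swapped with the last
    undrawn entry, so nothing has to be copied or counted through."""
    table = list(range(1, balls + 1))
    remaining = balls                       # table[0:remaining] is still in the funnel
    drawn = []
    for _ in range(picks):
        slot = rng.randrange(remaining)     # equal chance for every remaining ball
        drawn.append(table[slot])
        remaining -= 1
        # swap the drawn ball with the last undrawn one
        table[slot], table[remaining] = table[remaining], table[slot]
    return drawn

print(draw_by_swapping(random.Random(3)))
```

The order of the undrawn balls in the table changes with every pick, but since each pick is uniform over the remaining entries, that does not affect the odds.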
