News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Fast random generator

Started by stanhebben, May 27, 2006, 10:05:56 AM

Previous topic - Next topic

stanhebben

A fair while ago I discovered a fast random generator. It uses an algorithm explained here:http://www.agner.org/random/.

I rewrote the code a bit, and applied Mark Larsen's ideas to speed up the code even more.

These are the macro's:
- WBRandom, generates a 64-bit number in EDX:EAX
- WBRandom2, generates a 32-bit number in EAX
- WRandom, generates a floating-point number
- WIRandom, generates a number in a specified range
- WDRandom, generates a double-precision floating-point number
- WSRandom, generates a single-precision floating-point number

Several algoritms are benchmarked in a sample program.

Here are the results on a 2.66GHz celeron:

11 cycles, WBRandom
9 cycles, WBRandom - lookup
7 cycles, WBRandom - 32 bit
5 cycles, WBRandom - 32 bit & lookup
2073 cycles, WRandom
23 cycles, WIRandom
74 cycles, WDRandom
10 cycles, WSRandom
Press any key to exit...


Note that WRandom uses FPU instructions and is very slow on an intel CPU, so use the WDRandom (double) or WSRandom (single) macros instead.

These macros also seem to run *way* faster on an AMD cpu, I'll post the results soon. I remember the last WBRandom variant to run in a single cycle.

[attachment deleted by admin]

Ossa

Athlon 64 Mobile 3400+

5 cycles, WBRandom
5 cycles, WBRandom - lookup
4 cycles, WBRandom - 32 bit
1 cycles, WBRandom - 32 bit & lookup
37 cycles, WRandom
7 cycles, WIRandom
25 cycles, WDRandom
6 cycles, WSRandom


Ossa
Website (very old): ossa.the-wot.co.uk

hutch--

Stan,

These look great, a fast random is useful in many places. have you had the time to test it using Johnny Walker's ENT ? I keep it on the forum web site for this purpose.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

stanhebben

I didn't know about ENT, but I'll certainly do some tests [edit]today[/edit].

stanhebben

I wrote a little program which writes some  bytes to a file, to perform tests with. The fastest version of the ranrot macro's is used (that is, M_WRandom2L). Adjust the DATA_SIZE constant to obtain different file sizes.

The ENT tests gave the following results: (64KB file)

Entropy = 7.997461 bits per byte.

Optimum compression would reduce the size
of this 65536 byte file by 0 percent.

Chi square distribution for 65536 samples is 229.18, and randomly
would exceed this value 75.00 percent of the times.

Arithmetic mean value of data bytes is 127.7784 (127.5 = random).
Monte Carlo value for Pi is 3.138619300 (error 0.09 percent).
Serial correlation coefficient is -0.002759 (totally uncorrelated = 0.0).


Nice results.

[attachment deleted by admin]

hutch--

Stan,

They are good results, it would be interesting to see what the results are like on a much bigger sample, say a meg or even bigger.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

stanhebben

Ok, here with a 16MB file:


Entropy = 7.999990 bits per byte.

Optimum compression would reduce the size
of this 16777216 byte file by 0 percent.

Chi square distribution for 16777216 samples is 238.79, and randomly
would exceed this value 75.00 percent of the times.

Arithmetic mean value of data bytes is 127.4967 (127.5 = random).
Monte Carlo value for Pi is 3.143805777 (error 0.07 percent).
Serial correlation coefficient is -0.000293 (totally uncorrelated = 0.0).

hutch--

Yes,

This is a good result and the random distribution appears to be consistent from small samples to large ones so I think you have a winner here. This type of speed and precision will be very useful in a number of places.  :U
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

MichaelW

The generated numbers also appear to be random in a visual (scatter plot) test. In the attachment, I modified randrottest2.asm so I could use it as an include (commented out the end directive and the start label), and eliminated the calls to the M_WDRandom and M_WSRandom macros so I could assemble with MASM6.14. The large speed difference is not apparent because the update code is in the message loop.






[attachment deleted by admin]
eschew obfuscation

GregL

Note: M_WDRandom gave me a GPF (illegal instruction) on a Pentium III, movlpd is SSE2. MichaelW alluded to this.

I know, I'm overdue for a new PC.


stanhebben

I wrote an SSE version of the random generators.

The macro M_WBRandomSSE fills two xmm registers with random numbers, most implementations may prefer to use M_WBRandomSSE2, which fills only xmm0 and saves the other half for later.


M_WBRandomSSE MACRO

mov ebx, psse1
mov ecx, psse2
movapd xmm0, [randbufsse][ebx]
movapd xmm1, [randbufsse][ebx+16]
movapd xmm2, xmm0
movapd xmm3, xmm1
pslld xmm0, R1SSE
pslld xmm1, R2SSE
psrld xmm2, 32-R1SSE
psrld xmm3, 32-R2SSE
por xmm0, xmm2
por xmm1, xmm3
paddd xmm0, [randbufsse][ecx]
paddd xmm1, [randbufsse][ecx+16]
movapd [randbufsse][ebx], xmm1
movapd [randbufsse][ebx+16], xmm0
sub ebx, 16
jnc @F
mov ebx, (KKSSE-1)*32
@@:
sub ecx, 16
jnc @F
mov ecx, (KKSSE-1)*32
@@:
mov psse1, ebx
mov psse2, ebx

ENDM


I also tried to use lookup tables to provide an extra speed gain, but this didn't have any positive effect.

There are - of course - also two macros that generate floating-point values. They generate 2 doubles or 4 singles and are very fast.

I benchmarked the new macros too: (Amd athlon 64)

11 cycles, WBRandom SSE
13 cycles, WBRandom SSE, lookup
5 cycles, WBRandom SSE, 128 bit
5 cycles, WBRandom SSE, 128 bit, lookup
8 cycles, WDRandom SSE, 2 doubles
8 cycles, WSRandom SSE, 4 singles
Press any key to exit...


Of course, I also wanted to check the randomness of the new macros, to be sure I didn't make any mistakes:

Entropy = 7.999987 bits per byte.

Optimum compression would reduce the size
of this 16777216 byte file by 0 percent.

Chi square distribution for 16777216 samples is 313.67, and randomly
would exceed this value 1.00 percent of the times.

Arithmetic mean value of data bytes is 127.4900 (127.5 = random).
Monte Carlo value for Pi is 3.141411100 (error 0.01 percent).
Serial correlation coefficient is 0.000556 (totally uncorrelated = 0.0).


Which looks good.

Soon I'll also pack all macros (both sse and general-purpose) into include files, then everybody can use them.

[attachment deleted by admin]

stanhebben

Here are all of the random generation functions, implemented as macros.

Feel free to use them in your own applications.

I also improved them a little, here are the last benchmarks: (Amd athlon 64)

-- general-purpose --

2 cycles (WBRandom)
0 cycles (WBRandom2)
20 cycles (WDRandom)
5 cycles (WSRandom)
6 cycles (WIRandomC)
6 cycles (WIRandom)
Press any key to exit...

WBRandom2 seems to run so fast in that you can't measure it anymore  :bg

-- sse --

12 cycles (WBRandom)
4 cycles (WBRandom2)
8 cycles (WDRandom)
8 cycles (WSRandom)
Press any key to exit...


I recommend using the SSE version if you need a lot of floating-point numbers. I didn't write an SSE macro for integers within a certain range, so use the general-purpose version instead.

Could somebody with a pentium 4 do the benchmark too?

[attachment deleted by admin]

MichaelW

From the ENT documentation for the Chi-square Test:
Quote
We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect. Percentages between 90% and 95% and 5% and 10% indicate the sequence is "almost suspect".

I added the M_WDRandomSSE and M_WDRandomSSEL macros to the visual test, but I cannot test the result because my P3 cannot execute the movapd instructions. Hopefully I did everything correctly. I assumed that calling WRandomInitSSE more than once would not cause problems. If it does, then remove the calls from lines 247 and 265, and add a call ahead of the call to main on line 42.



[attachment deleted by admin]
eschew obfuscation

stanhebben

You forgot that the sse macros don't place their result in edx:eax, but in xmm0 and xmm1.

I corrected it, but can't rebuild the program (it compiles/links correctly, but doesn't open a window)

[edit] and initializing more than once must not be a problem at all [/edit]

[edit] the chi test resulting more than 256 makes me suspect something too... I have to check that [/edit]

MichaelW

This is the makeit.bat that I used to build it.

@echo off

if not exist rsrc.rc goto over1
\masm32\bin\rc /v rsrc.rc
\masm32\bin\cvtres /machine:ix86 rsrc.res
:over1

if exist "vrandrottest.obj" del "vrandrottest.obj"
if exist "vrandrottest.exe" del "vrandrottest.exe"

\masm32\bin\ml615\ml /c /coff "vrandrottest.asm"
if errorlevel 1 goto errasm

if not exist rsrc.obj goto nores

\masm32\bin\Link /SUBSYSTEM:WINDOWS "vrandrottest.obj" rsrc.res
if errorlevel 1 goto errlink

dir "vrandrottest.*"
goto TheEnd

:nores
\masm32\bin\Link /SUBSYSTEM:WINDOWS "vrandrottest.obj"
if errorlevel 1 goto errlink
dir "vrandrottest.*"
goto TheEnd

:errlink
echo _
echo Link error
goto TheEnd

:errasm
echo _
echo Assembly Error
goto TheEnd

:TheEnd

pause

eschew obfuscation