After generating the sequence of random numbers covered in the previous SUB,
it's now time to "see" what we have produced. It's always better to check that everything
is in place before moving on to the difficult STEP #6. :P
Working on #5 right now.
Enjoy
Here we are with the generating package and the displaying one.
In the attached file:
1) RandGen3.bas -- Source code
2) RandGen3.exe -- Executable: generates the numbers and creates the file RandGen.dat (run this first)
3) RandGen.scn -- Screen format to display the CPU cycles and the milliseconds elapsed while generating the numbers
4) RandGenView.bas -- Source code
5) RandGenView.exe -- Executable to "see" what we produced in the first step
6) RandGenView.scn -- Screen format to display the groups of numbers, with navigation keys
In RandGenView.exe, PgUp takes you back 100 datarec and PgDown takes you forward 100 datarec.
Now we go back to SUB #4 for the optimization task.
If there is any inconsistency, please let me know.
Cheers
Frank
i come up with 1736 clock cycles per 4-value group on my prescott
i am guessing we should be able to get that down to something like 100 clock cycles
dang - i ran it several more times and can't get under 2175 with it for some reason
Quote from: dedndave on April 25, 2010, 02:26:21 PM
If we get to 100 cycles per 4-value group, that will be great. I'll think about it myself
as algorithm ideas come to mind ::)
that's only an "order of magnitude" estimation
it may be as high as 200
Quote from: dedndave on April 25, 2010, 04:27:43 PM
While I was looking for a good idea, I tried to isolate just the RND function and called it
320,000 times to see how it performs. It turned out to be very slow, as you
already knew.
The RND function alone takes about 600 CPU cycles per 4-value group on my Core Duo processor.
That is quite a lot. ::)
it probably generates some very "random" data, too (i.e., it's a good generator)
the one we use may not be as good
but, we only need to generate 320,000 random numbers
these generators repeat a pattern after so many pulls
that's not the only measure of a random number generator, though
for what we want, we can sacrifice some randomness for some speed
EDIT - that brings up another point
random numbers may not be the best approach for testing what you want to do
because, so much of the performance measurement is based on the generator
in the real-world application, these values probably come from a file or something
it would be more realistic to test file reading speed than random number generation :bg
you could remove the random number generation from your test completely
they might be pre-generated or even set to 0
then, you would be spending time optimizing the part of the code that will actually be used
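A minimal C sketch of that idea, under stated assumptions: the values are pre-generated into a buffer once, and only the processing step is timed. The names `fill_values`, `process_values`, and `time_processing` are hypothetical, the small LCG is just a stand-in source of data, and the summing loop is a placeholder for whatever code is actually being optimized. The count of 320,000 comes from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define VALUE_COUNT 320000u  /* number of values mentioned in the thread */

/* Hypothetical cheap generator used only to fill the buffer; values read
   from a file, constants, or zeros would serve the same purpose. */
static uint32_t next_lcg(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;  /* common 32-bit LCG constants */
    return *state >> 16;                       /* use the higher bits */
}

/* Fill buf with n values in the range 1..50, outside any timed region. */
void fill_values(uint8_t *buf, size_t n, uint32_t seed)
{
    assert(buf != NULL);
    uint32_t state = seed;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(next_lcg(&state) % 50u + 1u);
}

/* Placeholder for the real work being optimized: here, just a sum. */
uint64_t process_values(const uint8_t *buf, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Time only the processing step; returns elapsed CPU time in seconds. */
double time_processing(const uint8_t *buf, size_t n, uint64_t *result)
{
    clock_t start = clock();
    *result = process_values(buf, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With this split, a slow generator only affects setup time, so the measurement reflects the code under test rather than the random number source.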
Quote from: dedndave on April 25, 2010, 05:13:44 PM
Agreed. I think we can move to STEP #6, where we can probably apply some ASM tricks
to what I have in mind. :P
This one should be much faster, 18 cycles on my P3:
rand32 proc
    mov eax, rand_seed
    mov ecx, 16807        ; a = 7^5
    mul ecx               ; edx:eax == a*seed == D:A
    mov ecx, 7fffffffh    ; ecx = m
    add edx, edx          ; edx = 2*D
    cmp eax, ecx          ; eax = A
    jna @F
    sub eax, ecx          ; if A > m, A = A - m
  @@:
    add eax, edx          ; eax = A + 2*D
    jns @F
    sub eax, ecx          ; if (A + 2*D) > m, subtract m
  @@:
    mov rand_seed, eax    ; save new seed
    ret
rand32 endp
It is the Rand32 code posted by Abel here (http://www.masm32.com/board/index.php?topic=6558.0), without the scaling code.
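For readers who do not follow the assembly, here is a C sketch of what the routine appears to compute: one step of the multiplicative generator seed = (16807 * seed) mod (2^31 - 1), with the modulo done by folding the high half of the 64-bit product back in (2^32 mod m is 2, so D*2^32 is congruent to 2*D). The function name `rand32_step` and the macro names are made up for this sketch; it assumes the seed starts in the range 1..m-1.

```c
#include <assert.h>
#include <stdint.h>

#define PM_A 16807u        /* multiplier a = 7^5 */
#define PM_M 0x7fffffffu   /* modulus m = 2^31 - 1 */

/* Advance the seed by one step: seed = (a * seed) mod m.
   Assumes the seed is in 1..m-1 (hypothetical helper, not the posted code). */
uint32_t rand32_step(uint32_t *seed)
{
    assert(*seed >= 1 && *seed < PM_M);
    uint64_t product = (uint64_t)*seed * PM_A;  /* D:A, like edx:eax after MUL */
    uint32_t lo = (uint32_t)product;            /* A, the low 32 bits */
    uint32_t hi = (uint32_t)(product >> 32);    /* D, the high 32 bits */
    if (lo > PM_M)
        lo -= PM_M;                             /* if A > m, A = A - m */
    lo += 2u * hi;                              /* add 2*D, since 2^32 mod m == 2 */
    if (lo & 0x80000000u)
        lo -= PM_M;                             /* sign bit set: subtract m once more */
    *seed = lo;
    return lo;
}
```

Starting from a seed of 1, the first two values are 16807 and 282475249, and the seed after 10,000 steps is 1043618065, which is the usual check value for this multiplier and modulus.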
Quote from: MichaelW on April 25, 2010, 11:41:00 PM
Nice one, I'll have a look at that. I'm a slow learner, so
it could take a while.
Only one question: it takes 18 cycles for each generated number?
That is much faster than the RND function in PB :U
Quote: Only one question: it takes 18 cycles for each generated number?
Yes, and scaling the value with DIV would more than double the cycles. I eliminated the scaling because I was experimenting with extracting 4 numbers from each 32-bit value, with each of the numbers scaled to a limited range (1 to 50). The problem is with doing this efficiently and in a way that will result in a uniform distribution of the values. I was hoping to somehow combine the extraction operation with the scaling operation, but I'm nowhere near anything workable.
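One possible direction, sketched in C and not taken from the thread: treat the 31-bit value as a fixed-point fraction x / 2^31, multiply by 50, take the integer part as one number, and keep the fractional part for the next multiply. This uses only multiplies, shifts, and masks, with no division. The function name `split_four` is hypothetical, and the input is assumed to be uniform over 0..2^31-1, which a real generator only approximates.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 31-bit value x (0 <= x < 2^31) into four numbers in 1..50.
   Hypothetical sketch: repeated multiply-by-50 on a fixed-point fraction. */
void split_four(uint32_t x, uint8_t out[4])
{
    assert(x < 0x80000000u);
    for (int i = 0; i < 4; i++) {
        uint64_t scaled = (uint64_t)x * 50u;       /* x/2^31 scaled to [0, 50) */
        out[i] = (uint8_t)((scaled >> 31) + 1u);   /* integer part 0..49, shifted to 1..50 */
        x = (uint32_t)(scaled & 0x7fffffffu);      /* fractional part feeds the next number */
    }
}
```

The four results are the base-50 digits of floor(x * 50^4 / 2^31). Since 2^31 / 6,250,000 is about 343.6, each 4-number combination is reached by either 343 or 344 inputs, so the distribution is close to uniform but not exactly uniform. Whether that is acceptable, and whether it beats a DIV-based approach in cycles, would need to be measured.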
Quote from: MichaelW on April 26, 2010, 01:40:56 AM
The idea is interesting. Let me know if you find something doable and fast enough. :U