The MASM Forum Archive 2004 to 2012

Project Support Forums => PowerBASIC Low Level Code => Topic started by: frktons on April 25, 2010, 07:10:21 PM

Title: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 25, 2010, 07:10:21 PM
Probably in this part of the FUN ASM CODING, I can learn some trick and
some mnemonics as well.
The idea I have in my mind is about searching inside the combinations we generated
with many kind of statistical patterns.

Bacause all the job can be done using memory, registers and stack, and simple arithmetic operations,
and we are not I/O bound, or anything else, all the full speed and potential of ASM CODING
can be used in this SUB. At least this is what I hope.

Let's start the easy way:

- we have a file of 16 groups of 4-value integer numbers, for 5,000 data records.
- we can read all of them inside the memory, into an array structure or the like
- we start searching, among all the combinations of 4 numbers, which one is the "lucky one",
   so to say, adding 1 to a counter each time one of the four digit we are searching is found
   inside one of the 16 groups of 4-value numbers
- at the end "The winner is..." n1,n2,n3,n4 totaling xxx,xxx count inside the 80,000 groups
   of 4-digit combinations.

I'm going to realize a PB version, in this way I can learn some PB as well.
Afterwhile I think we can find many points to optimize with ASM.
We only have integer values, integer registers, integer operations. It should be a good
occasion to speed up things with ASM.

Let's see what we get.

Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: dedndave on April 25, 2010, 08:02:59 PM
well - "search" probably isn't the right word

let me see if i understand you...
you want to see which of the values from 1 to 50 occurs most often ?
        .data

Totals  dd 50 dup(0)

        .code

Total0: push    esi
        mov     ecx,80000
        push    edi
        mov     esi,offset FileArray

;tally the totals

Total1: mov     edi,[esi]
        or      edi,edi              ;safety test   - may be omitted
        jz      Total2               ;safety branch - may be omitted

        cmp     edi,50               ;safety test   - may be omitted
        ja      Total2               ;safety branch - may be omitted

        shl     edi,2
        inc dword ptr Totals[edi-4]

Total2: add     esi,4
        dec     ecx
        jnz     Total1

;find the largest value in Totals array
;in this case, we return the lower index if more than one are max total

        mov     esi,offset Totals
        mov     ecx,50

Total3: mov     eax,[esi]
        mov     edx,esi
        jmp short Total5

Total4: cmp     eax,[esi]
        jb      Total3

Total5: add     esi,4
        dec     ecx
        jnz     Total4

        sub     edx,offset Totals-4
        pop     edi
        shr     edx,2
        pop     esi

;EAX = frequency
;EDX = number


EDIT - corrected a mistake   :P
oops - another one
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 25, 2010, 10:16:09 PM
Great!

This could be one of the millions of possible statistic. I'll save and study it
for exercising some ASM concept.  :P

What I had in mind, for the first "counting", is a little more
complicated:

- We have to generate all the possible combinations of 4 digit ranging 1-50
like I did in the small routine I posted earlier: perm4.zip, that means:
230,300 combinations.
- For each of them we have to check if one/more of the four digit is inside the
80,000 groups of 4 digit created previosly with ramdom generating routine.
- This means that we can find more than one digit at a time and have to count
them all.
- When we finish with the first group [1 2 3 4], we save the result and the
combination somewhere, like numeric variables.  :lol
- We start with the next combination [1 2 3 5] and do the same.
- We confront the results of the first and the second, and save the bigger counter
and the relative 4 digit group, deleting the other
- we go on until the last combination [47 48 49 50]
- when the last counting and compare is done we remain with the biggest
counter found and the relative four digit combination
- we display the result:
"The combination: " n1 n2 n3 n4 " has totalised " count_hits " the biggest found so far"

This will take a little bit more to compute and we have to find ways of optimizing it
to have the job done in less than 1 minute.

It is somewhat more challenging.  :dazzled:
I think in optimized PB I can have it done in about 4 minutes, but I've to check
before giving the correct amount of elapsed time and CPU cycles.

It's past midnight over here, and I don't know if my mind is ready. I try anyway
right now. If I fall asleep, I'll continue in few hours.  :8)


Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: dedndave on April 25, 2010, 10:27:28 PM
well - that is a LOT of comparing
i seriously doubt you can do it exactly that way in less than a minute on any computer
one way to speed things up would be......
only test combinations that are present
for example - it is unlikely that 1,2,3,4 will exist in the randomly generated groups
no need to test for it, then
better to pick up a group that you do have, and totalize matching groups
mark them - you have plenty of extra bits to use as markers
then, on the next pass, the ones that are marked may be skipped

you will want a little sorter so, when comparing groups, they are in sorted order

when you pick up the first "master" group, sort it
then, look for a match against the lowest value only - not all 4
if a group does not have that value in it - no need wasting time looking at the other 3 values
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 25, 2010, 10:54:57 PM
Quote from: dedndave on April 25, 2010, 10:27:28 PM
well - that is a LOT of comparing
i seriously doubt you can do it exactly that way in less than a minute on any computer
one way to speed things up would be......
only test combinations that are present
for example - it is unlikely that 1,2,3,4 will exist in the randomly generated groups
no need to test for it, then
better to pick up a group that you do have, and totalize matching groups
mark them - you have plenty of extra bits to use as markers
then, on the next pass, the ones that are marked may be skipped

you will want a little sorter so, when comparing groups, they are in sorted order

Well, if the group [1 2 3 4] is not among the randomly generated groups, it doesn't
matter, because we have to test for the single digits, so if we test against [43 12 2 38]
we find the digit "2" and we count  +1, if we test against [23 4 1 18] we find "4" and "1"
and we count +2, and so on.
Sorry for not being very clear, I'm trying to express the idea in the best way I can.

That is a lot of comparing, you're right, but the algo I have in mind doesn't really need
a lot of them. I'm thinking about some tricks to get the job done in a faster way.

Give me some time to CODE and test, and I'll show you what I designed.

One of the trick could be:

Instead of comparing each single digit of [1 2 3 4] with all the four single digits of [43 12 2 38]
that is 16 compare, we do:

1) an array of 50 integer initialized to 0 value outside of the cycle with index ranging 1 to 50
2) we use the 4 digits of the random group [43 12 2 38] like index to switch to 1 the relative
    array elements, "ADD 1" I mean, or better "MOV 1" to the array element.
3) we use the 4 digits of the testing group [1 2 3 4] like index and ADD the four array elements
    this way, if "1" is not in the [43 12 2 38] group we ADD 0 to the counter, because the array element
    has 0 value, and if "2" is switched to "1" we ADD 1 to the counter.
4) in total we do 4 MOV, 4 ADD, 4 MOV to reset to "0" the four array index [43 12 2 38]
5) I think this should be faster than 16 COMPARE and some ADDs to the counter.

With tricks like this in ASM we can do a very fast routine, I think.  ::)


Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: dedndave on April 25, 2010, 11:01:25 PM
i am not sure what you mean - lol
when you write the code, i will see if i can read it and "gleen" an answer like Carnac the Magnificent

(http://blogs.dallasobserver.com/sportatorium/Carnac.jpg)
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 25, 2010, 11:14:48 PM
Trying to explain in PB-like CODE:

file randomly generated:  I group [12- 5- 34- 1]

First group to test: [1 - 2 - 3 - 4]

the randomly generated numbers switch the relative array elements to 1:

num(12) = 1
num(5) = 1
num(34) = 1
num(1) = 1

we count the hits found:

count = count + num(1) + num(2) + num(3) + num(4) ' in this case num(1) has got "1" the others have "0"

we reset the switched array elements for the next group to count:

num(12) = 0
num(5) = 0
num(34) = 0
num(1) = 0

so we did no compare at all, only moved values between integers and added integers [very fast operations]

Is it a little bit clearer?  :8)



Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 25, 2010, 11:23:59 PM
Yes, basically a score board. And cheaper to clear the four known locations than to clear the whole array. Personally I'd thought of using a bit vector to the same effect.

Any reason for 50 (lottery), verses 64? Generating a power of two would be cheaper than MOD 50.

-Clive
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: dedndave on April 25, 2010, 11:25:20 PM
no   :P

i am not sure what you expect "count" to be at the end
i think it will be 80000

maybe - instead of explaining the algorithm, explain the real-world use for it
i am an engineer, but i understand business, too   :bg

or - just write it in PB and i'll see if i get it, then   :P
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 25, 2010, 11:43:32 PM
Quote from: clive on April 25, 2010, 11:23:59 PM
Yes, basically a score board. And cheaper to clear the four known locations than to clear the whole array. Personally I'd thought of using a bit vector to the same effect.

Any reason for 50 (lottery), verses 64? Generating a power of two would be cheaper than MOD 50.

-Clive

Do you think a bit vector would be faster than integers? I don't know very much about ASM but
anything I read so far tells me integer operations are faster. Of course it could be the reverse.
50 or 64 or 100 doesn't really matter, I just used a manageable amount of combinations.
100 numbers in groups of 6-10 would be almost impossible to check in a actual machine.
In order to learn and to test the SUBs, the amounts have to be doable in a reasonable time,
and you cannot always choose the fastest way in real life.  ::)

Quote from: dedndave on April 25, 2010, 11:25:20 PM
no   :P

i am not sure what you expect "count" to be at the end
i think it will be 80000

maybe - instead of explaining the algorithm, explain the real-world use for it
i am an engineer, but i understand business, too   :bg

or - just write it in PB and i'll see if i get it, then   :P

Probably it will be near 80,000 more or less, if the RND function in PB is a good one.
The point is: it doesn't really matter, otherwise it would be "ASM for JOB", or the like.
Mine if "ASM for FUN", so if I can't use it in real life I'll enjoy more.

It is very satisfying to reach a goal, and mine is to learn ASM playing with it.
It was somewhat boring to CODE for BUSINESS in PLI, I could not choose
very much.

Data structure was already there, the flow of procedure was already established,
so I had just a small portion of the whole to make working.
When I really enjoyed was when I could "invent" some algo or find some trick
to have a better performance.

Now I think I found one of those trick, not that I am the first, maybe it is a well
established concept, but for me it is a new one.

As done before for all the SUBs, I'll CODE it in PB [PB for FUN  :lol ] and afterwhile
with the help of the more experienced ones I'll try to optimize it.

Optimizing is really a good exercise for the mind, isn't it?  :lol


Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 26, 2010, 12:47:06 AM
A bit vector would be faster than integers especially if you could keep it in a couple of registers, your [1, 2, 3, 4] example could be confirmed with a cmp eax,0Fh. Not that I think that [1,2,3,4] or [4,3,2,1], etc is an unreasonable sequence.

Using 64 combinations, I'd be tempted to use a Maximal LFSR to generate random sequences. I'd have to do some tests on the number distributions, but they'd be highly efficient.

-Clive
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 26, 2010, 01:21:44 AM
Quote from: clive on April 26, 2010, 12:47:06 AM
A bit vector would be faster than integers especially if you could keep it in a couple of registers, your [1, 2, 3, 4] example could be confirmed with a cmp eax,0Fh. Not that I think that [1,2,3,4] or [4,3,2,1], etc is an unreasonable sequence.

-Clive

Let me see if I've got your point. If I can use EAX and EBX I have 64 bits available, how do I
test the sequence [1, 2, 3, 4] against for example [23, 11, 7, 3] ?
How do I count the results?
Do I start with all the bits "0" and after I switch the bits [23, 11, 7, 3] to "1" before the cmp?
Or what? Please I'm a beginner, I don't actually know very much about ASM.
Bear some patience with me, try not to skip any step. It could be obvious for you but I easily get lost
at the moment.  ::)
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 26, 2010, 03:51:45 AM
Here are some thoughts on how to do this in assembler. Hopefully the fragments will spark some ideas/questions.

I'm going to pack your numbers into one 32-bit word, as 1..50 easily fits in 8-bits. You could read it out of DWORDs in memory if your wanted instead of rotating the packed version.


    xor  eax,eax ; clear bit vector
    xor  ebx,ebx

    mov  ecx,004030201h ; [1, 2, 3, 4] packed in a DWORD

    ; rotate through packed format converting to bit vector ebx:eax

    call setbit ; bit convert entry in CL
    rol  ecx,8
    call setbit
    rol  ecx,8
    call setbit
    rol  ecx,8
    call setbit
    rol  ecx,8



setbit PROC

    push  ecx ; save the original
    movzx ecx,cl ; convert low order byte into dword

    dec ecx     ; convert 1..50 to 0..49 range
    test ecx,20h ; test bit 5 (ie >= 32)
    jnz highdword ; bit 5 is high

        ; 1-32 (0..31)

    bts eax,ecx ; set bit ecx (0..31), in eax
    jmp done

highdword:  ; 33-64 (32..63)

    and ecx,1Fh ; mask lower 5 bits, eqv -32
    bts ebx,ecx ; set bit ecx (0..31), in ebx

done:

    pop  ecx
    ret

bitset ENDP


For [1, 2, 3, 4]

eax=00000000 ebx=00000000 Cleared

cl=01
eax=00000001 ebx=00000000

cl=02
eax=00000003 ebx=00000000

cl=03
eax=00000007 ebx=00000000

cl=04
eax=0000000F ebx=00000000


For [23, 11, 7, 3]

eax=00000000 ebx=00000000 Cleared

cl=17
eax=00400000 ebx=00000000

cl=0B
eax=00400400 ebx=00000000

cl=07
eax=00400440 ebx=00000000

cl=03
eax=00400444 ebx=00000000
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 26, 2010, 04:01:43 AM
Ok, so if you think [1, 2, 3, 4] is equivalent to [4, 3, 2, 1], [3, 2, 1, 4], etc

You convert all your number sets to bit vectors, then sort the bit vectors, and finally you could enumerate through the list of sorted bit vectors counting the ones which are the same.

The point of sorting would  be to minimize the amount of searching required to find matching bit vector patterns. Sorting would naturally group them together making the counting phase simpler.
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 26, 2010, 07:58:00 AM
Quote from: clive on April 26, 2010, 04:01:43 AM
Ok, so if you think [1, 2, 3, 4] is equivalent to [4, 3, 2, 1], [3, 2, 1, 4], etc

You convert all your number sets to bit vectors, then sort the bit vectors, and finally you could enumerate through the list of sorted bit vectors counting the ones which are the same.

The point of sorting would  be to minimize the amount of searching required to find matching bit vector patterns. Sorting would naturally group them together making the counting phase simpler.

I can imagine this can be done, but what about speed? It looks like it'll take longer doing all these bit operations than
doing the simple MOVs and ADDs I had in mind.
All depends of course on how fast these bit operations are.
As soon as I'll have the PB code and test it, we can see the difference.

Thanks anyway for your help
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 26, 2010, 03:28:42 PM
Here we have the first counting excercise  :lol

In the attached zip file:
- CombCheck4.bas
- CombCheck4.exe
- CombCheck4.scn

to run the program you have to use the program RandGen3.exe/scn attached in SUB #5
to generate the file with random numbers and after the CombCheck4 to count/display
the repetitions of the single numbers.

The RND function of PB has done a pretty good job as you can see.

This is a simplified counting, and the performance misures all the steps togheter
not the single counting routine.

All the job is done with a simple loop:

SUB CountNumbers
     
    FOR x = 1 TO 320000
        INCR numbers&(n&(x))
    NEXT x
   
END SUB


It is just a short sample of what we can count on the file.
The next SUBs are going to do more complex counting, like the one I described earlier.


Enjoy
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 26, 2010, 05:11:35 PM
Ok, so creating 80000, 4 groups of 4, followed by identifying [1, 2, 3, 4] in any order, as well as any other 4 group pairings.

3 GHz P4 Prescott, takes 220 ms in C, without much effort to optimize.

Here we identified FOUR occurrences of [1,2,3,4],[1,2,4,3],[4,3,2,1],etc from the 320000 groups generated

Buffer    5120000 bytes

Generating..

        16 Ticks,   76402283 Cycles

Saving.. 'rndtest.out'


Evaluating Distribution..

   25600 Expected Average per bin

1 :      25616     16
2 :      25528    -72
3 :      25565    -35
4 :      25677     77
5 :      25915    315
6 :      25689     89
7 :      25880    280
8 :      25484   -116
9 :      25690     90
10 :      25340   -260
11 :      25548    -52
12 :      25686     86
13 :      25382   -218
14 :      25813    213
15 :      25240   -360
16 :      25729    129
17 :      25560    -40
18 :      25385   -215
19 :      25507    -93
20 :      25676     76
21 :      25339   -261
22 :      25727    127
23 :      25488   -112
24 :      25821    221
25 :      25463   -137
26 :      25583    -17
27 :      25844    244
28 :      25361   -239
29 :      25504    -96
30 :      25660     60
31 :      25542    -58
32 :      25476   -124
33 :      25628     28
34 :      25629     29
35 :      25512    -88
36 :      25493   -107
37 :      25708    108
38 :      25533    -67
39 :      25874    274
40 :      25475   -125
41 :      25723    123
42 :      25612     12
43 :      25477   -123
44 :      25528    -72
45 :      25925    325
46 :      25596     -4
47 :      25648     48
48 :      25706    106
49 :      25543    -57
50 :      25672     72

Converting to bit vectors..

        16 Ticks,   71265015 Cycles

Sorting bit vectors..

       187 Ticks,  571231005 Cycles

  4 000000000000000F [1,2,3,4]
  2 0000000000000017 [1,2,3,5]
  3 000000000000001D [1,3,4,5]
  4 000000000000002B [1,2,4,6]
  2 000000000000002D [1,3,4,6]
  2 000000000000002E [2,3,4,6]
... ad tedium
  2 0003804000000000 [39,48,49,50]
  3 0003840000000000 [43,48,49,50]
  2 0003880000000000 [44,48,49,50]
  2 0003900000000000 [45,48,49,50]


The most frequent in this run

  9 0000001000020820 [6,12,18,37]
  9 0000001100480000 [20,23,33,37]
  9 0000044000088000 [16,20,39,43]
  9 0000100420000020 [6,30,35,45]
  9 0000200001001100 [9,13,25,46]
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 26, 2010, 07:54:31 PM
For the generation of random numbers did you use a C ANSI function?

the performance you reported:

Generating..

        16 Ticks,   76,402,283 Cycles


is much the same of the PB function RND, that in my core duo 2.6 Ghz takes:

        16 ticks,     60,298,705 Cycles


For identifing the sequences of 4 numbers I should have a look at the C CODE to
understand what you did, hoping I can grasp it's meaning.  :lol

As I get time enough to code the 4 digit search inside the random numbers file
I'll post it and test the performance of the routine to have a performance match.

Regarding this part of the report:


Sorting bit vectors..

       187 Ticks,  571231005 Cycles


That's what I suspected, a lot of time spent doing something that the
algo I've in mind skips altogheter, no need to sort anything.

If I don't fall asleep, in few hours I'll post the new version for the 4 digit searching
and counting.

Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 26, 2010, 08:25:34 PM
I used a 32-bit MLS (maximal length sequence) pseudo random binary sequence pulling out 6-bits at a time. Rolling the dice again if the number is outside 0..49, or if I hit a previously selected number. It's derived from some efficient ARM assembler (5 cycles), and probably doesn't translate well in C, or x86. The distribution looked better than rand().

The vectorization takes your groups of 4, and scoreboards the values into a 64-bit representation. Once I've filled this table, I then QuickSort it.

ELEMENTS = (80000 * 4)

  printf("Converting to bit vectors..\n\n");

  TickStart = GetTickCount();
  TSCStart = ReadTSC();

  p = Buffer; // DWORD *, where group of 4 elements [ p[0],p[1],p[2],p[3] ]
  q = BitVector; // __int64 *

  for(i=0; i<ELEMENTS; i++)
  {
    *q = ((__int64)1 << (p[0]-1)) | ((__int64)1 << (p[1]-1)) | ((__int64)1 << (p[2]-1)) | ((__int64)1 << (p[3]-1));

    p += 4; // Advance by 4 entries
    q++; // Advance by single 64-bit entry
  }

  TSCEnd = ReadTSC();
  TickEnd = GetTickCount();

  printf("%10d Ticks, %10d Cycles\n",TickEnd - TickStart, TSCEnd - TSCStart);
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 26, 2010, 09:27:49 PM
Quote from: clive on April 26, 2010, 05:11:35 PM
Ok, so creating 80000, 4 groups of 4, followed by identifying [1, 2, 3, 4] in any order, as well as any other 4 group pairings.


  4 000000000000000F [1,2,3,4]
  2 0000000000000017 [1,2,3,5]
  3 000000000000001D [1,3,4,5]
  4 000000000000002B [1,2,4,6]
  2 000000000000002D [1,3,4,6]
  2 000000000000002E [2,3,4,6]
... ad tedium
  2 0003804000000000 [39,48,49,50]
  3 0003840000000000 [43,48,49,50]
  2 0003880000000000 [44,48,49,50]
  2 0003900000000000 [45,48,49,50]


The most frequent in this run

  9 0000001000020820 [6,12,18,37]
  9 0000001100480000 [20,23,33,37]
  9 0000044000088000 [16,20,39,43]
  9 0000100420000020 [6,30,35,45]
  9 0000200001001100 [9,13,25,46]


I can't really understand these results.
Let's see what they mean:


The most frequent in this run

9 0000001000020820 [6,12,18,37]

9 = how many times you found it?
0000001000020820 = the vectorization of a 4 digit group?
[6,12,18,37] = the corresponding 4 digit group?

Please help me to understand the output of your routine.  ::)


Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 26, 2010, 09:38:32 PM
Yes, the bit patterns at the end were each identified 9 times in the 320000 entries. I ignored entries found only once.

The bit patterns are 64-bit HEX, where the bits represent 2**(N-1). So bit 0 is 1, bit 49 is 50

N=1, 0001
N=2, 0002
N=3, 0004
N=4, 0008

[1, 2, 3, 4] = 1 or 2 or 4 or 8 = 15 or 0x0F
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 26, 2010, 10:02:39 PM
Quote from: clive on April 26, 2010, 09:38:32 PM
Yes, the bit patterns at the end were each identified 9 times in the 320000 entries. I ignored entries found only once.

The bit patterns are 64-bit HEX, where the bits represent 2**(N-1). So bit 0 is 1, bit 49 is 50

N=1, 0001
N=2, 0002
N=3, 0004
N=4, 0008

[1, 2, 3, 4] = 1 or 2 or 4 or 8 = 15 or 0x0F

This puzzles me even more.

We should have a total of 230,300 possible combinations of 4 numbers,
with no repetition and ignoring the order inside the single combinations,
[1,3,4,1] = not allowed because of repetition

[1,2,3,4] = any combination of the 4 digit = we consider 1 combination total

if we have a total of 230,300, and we find some of them 9 times, it means we should
find others 0 times to balance the total.
Did you find many combinations with 0 score?

We are matching 230,300 combinations [all the possible ones] against
80.000 [the random generated ones], or at least this is what I have imagined.
Are you using this pattern?
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: dedndave on April 26, 2010, 10:36:29 PM
if there are 230,300 combinations and 80,000 groups, most of them should have a 0 count no matter what   :red
it is somewhat unlikely that a group will occur twice and more unlikely it will occur three times if you have a decent RNG
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 27, 2010, 01:32:51 AM
There are 80000 x 4 entries (320000) each with a group of 4 numbers

With potentially 230300 combinations, it looks to cover about 172000 of them. I'd have to say the routine I was using, and rand() both has "sequence" patterns.

I haven't counted the number of 0 counts, because it doesn't generated any.
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: dedndave on April 27, 2010, 01:43:32 AM
i thought there were only supposed to be 80,000 four-value groups
no wonder you have the bigger numbers, Clive    :red
it should be easy to figure out how many 0-groups there are
take the number of possible groups minus the number of different groups found
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 27, 2010, 01:57:25 AM
With 80000, I'm getting around ~67650 unique combinations.
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 27, 2010, 08:13:42 AM
Quote from: clive on April 27, 2010, 01:57:25 AM
With 80000, I'm getting around ~67650 unique combinations.

That's quite probable.

What about the performance?
My PB algo is taking something around 10-11 minutes, based on early tests,
to match 230,300 combinations against the 80,000 random ones.

What performance did you get? How long does it take to run the whole loop?



             groups occurring 0 times: 162,887
             groups occurring 1 times: 56,191
             groups occurring 2 times: 9,970
             groups occurring 3 times: 1,143
             groups occurring 4 times: 105
             groups occurring 5 times: 4
             groups occurring 6 times: 0
             groups occurring 7 times: 0
             groups occurring 8 times: 0
             groups occurring 9 times: 0
             groups occurring 10 times: 0

             Total groups scanned: 230,300
                     Elapsed MS: 646,936
                     Elapsed CPU Cycles: 1,548,762,482,493



The core routine for matching the combinations:



'---------------------------------------------------------------------------------------
' Counts the occurrences of the 4 digit groups that match all the digits.
' Test all the groups of 4 digits starting from [1,2,3,4] to [47,48,49,50]
' against the 80,000 randomly generated ones.
'---------------------------------------------------------------------------------------

SUB CountNumbers

    groups = 0
    n_idx = 1
    n_target = 4

    FOR x1 = 1 TO 47
        FOR x2 = x1 + 1 TO 48
            FOR x3 = x2 + 1 TO 49
                FOR x4 = x3 + 1 TO 50
                    GOSUB Check
                NEXT x4
            NEXT x3
        NEXT x2
    NEXT x1

    GOTO exit_count

Check:


    FOR x = 1 TO 80000


        numbers&(n&(n_idx)) = 1
        INCR n_idx
        numbers&(n&(n_idx)) = 1
        INCR n_idx
        numbers&(n&(n_idx)) = 1
        INCR n_idx
        numbers&(n&(n_idx)) = 1


        total_group = numbers&(x1) + numbers&(x2) + numbers&(x3) + numbers&(x4)
        IF total_group = n_target THEN
           INCR match
        END IF

        total_group = 0

        numbers&(n&(n_idx)) = 0
        numbers&(n&(n_idx -1)) = 0
        numbers&(n&(n_idx -2)) = 0
        numbers&(n&(n_idx -3)) = 0

        INCR n_idx

    NEXT x

    INCR occurr&(match)

    match = 0
    n_idx = 1


RETURN

exit_count:

END SUB


I didn't do anything to optimize, yet. This is the raw version.  :P
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 27, 2010, 11:14:27 AM
With the correct use of a couple of registers it runs in less than 8 minutes now  :8)


    #REGISTER NONE
    REGISTER n_idx AS LONG
    REGISTER total_group AS LONG   
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: dedndave on April 27, 2010, 11:20:38 AM
 :8)  <--- not so fast, Frank
8 minutes is too long, i should think
that is because you are testing all 230,300 combinations

sorting the groups was mentioned earlier (sorting the 4 values in each group - not the groups)
i think i would sort them before they are stored - as they are generated, in fact
that will speed up anything you want to do with them afterward

if they are ordered groups, you could also convert them to ordinals, as well
it depends a lot on what you want to do with the values when you are done
but for each group, you could store the index (dword), the ordinal (dword), and the 4 values (bytes)
now you can even sort the groups - you know what order they were chose in by the index value
tallying the totals like you are would be very fast
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 27, 2010, 01:13:44 PM
Quote from: dedndave on April 27, 2010, 11:20:38 AM
:8)  <--- not so fast, Frank
8 minutes is too long, i should think
that is because you are testing all 230,300 combinations


You are right, of course. I was enjoying the fact that without
changing the algo, only defining 2 variables in a different way, the
algo got a 30% optimization.  :P <== probably this is more appropriate  :lol

Quote from: dedndave on April 27, 2010, 11:20:38 AM

sorting the groups was mentioned earlier (sorting the 4 values in each group - not the groups)
i think i would sort them before they are stored - as they are generated, in fact
that will speed up anything you want to do with them afterward

if they are ordered groups, you could also convert them to ordinals, as well
it depends a lot on what you want to do with the values when you are done
but for each group, you could store the index (dword), the ordinal (dword), and the 4 values (bytes)
now you can even sort the groups - you know what order they were chose in by the index value
tallying the totals like you are would be very fast

You are right again, in this particular situation, given a file with numbers
that are not changing, an index would speed up the algo a lot.

I think because it could help skipping the combinations [most of them]
that are out of the index range we are searching.

Anyway I've to think about it. I haven't decided yet what kind of statistics
estrapolate from the randomly generated numbers, but it should be
something like: any.  :dazzled:

For example in this case we have searched for 4 digit match, it could be
for 1 digit match or 2 or 3 as well, or it could be: when a specific number appears, what are
the numbers that are more likely to appear in the next 10-20 groups, if any,
or it could be how long [or many groups] between a number appears and the next
occurrence, any combination of the 1 digit - 2 digit - 3 digit inside the groups and
the above, and so on...

Like in a lottery they say: this number doesn't show since 12 months, or in
a web shop: this client doesn't place an order since feb 2009, or the like
in any flavour.
In which season of the year this particular customer places a lot of orders,
or in numbers: when this particular number shows more frequently...

Probably for some task a sorted group [1,2,3,4] is better than [2,1,4,3] and
for some others it is not, or it doesn't affect the performance. I'm not sure. :red

From your comments and suggestions I have to think that it is always better
to have some kind of index.
I don't have an engineer background, so I'm just making suppositions.
Recalling my own past experience, mostly of the times it is better to have
an index than not.

I have to thank you and all the guys in the forum for the good suggestions.
I need some time to digest, I am quite slow at the moment.

Because it is just an exercise, with no particular objective to reach, anything
that shows up is just an occasion to learn something new.

So bear some patience, and let's have some random fun  :P

P.S. When I get time enough I'll try using an index with this particular task
and see how does it work. No algo in my mind at the moment.  :lol


Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 27, 2010, 05:37:38 PM
I added some code to provide stats and bin the groups. Even with printing and saving we are at about 80ms.
-Clive

Buffer    1280000 bytes

Generating..

        16 Ticks,   18381900 Cycles

Saving.. 'rndtest.out'

Converting to bit vectors..

         0 Ticks,   17054647 Cycles

Sorting bit vectors..

        47 Ticks,  136977937 Cycles

Scanning..

         0 Ticks,    2399887 Cycles

   67515 Unique
   80000 Elements

Groups occuring 1 times :    56370
Groups occuring 2 times :     9949
Groups occuring 3 times :     1060
Groups occuring 4 times :      128
Groups occuring 5 times :        8
                             67515

Entire Application Totals

        79 Ticks,  229950982 Cycles
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 27, 2010, 08:32:11 PM
Quote from: clive on April 27, 2010, 05:37:38 PM
I added some code to provide stats and bin the groups. Even with printing and saving we are at about 80ms.
-Clive

Buffer    1280000 bytes

Generating..

        16 Ticks,   18381900 Cycles

Saving.. 'rndtest.out'

Converting to bit vectors..

         0 Ticks,   17054647 Cycles

Sorting bit vectors..

        47 Ticks,  136977937 Cycles

Scanning..

         0 Ticks,    2399887 Cycles

   67515 Unique
   80000 Elements

Groups occuring 1 times :    56370
Groups occuring 2 times :     9949
Groups occuring 3 times :     1060
Groups occuring 4 times :      128
Groups occuring 5 times :        8
                             67515

Entire Application Totals

        79 Ticks,  229950982 Cycles


It looks like this is a far better algorithm  :U

Did you write it in C or ASM?

I imagine that all the job is:

1) generating 80,000 random numbers and storing them into memory
2) writing them to a file
3) vectorizing the whole
4) sorting it in groups of four integer
5) counting the occurrences of matching pair of groups
    without generating all the possibile 230,300 combinations
    and without having to match 230,300 against 80,000
6) displaying the results

Well, skipping more than half unnecessary work is in itself
a good optimization  :lol
And you have also used some methods I am not aware of.  :red

Would you mind posting the complete source code?
With some study and effort I could understand pivotal methods of
doing things that I have never used before.

Thanks for your time


Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 27, 2010, 09:09:18 PM
Let me clean it up a little, it is some what devoid of comments, it was a quick prototype and was done in C. There are probably bits which could be done more efficiently in ASM. The comparison used by qsort() is one that's called a lot. And the whole QuickSort could probably be customized/optimized for 64-bit integers.

The real trick is the sort, because it gets everything in order and you don't need to search for matches. You just check the immediate neighbour(s) in a single linear pass over the 80000 groups.

The bit vectoring flattens your 24 (4!) combinations of [1,2,3,4] in to a single comparison.

The whole taking a few minutes thing is what initially bugged me. But I think that is because you were dealing with 5.5M iterations.

-Clive
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 28, 2010, 12:06:52 AM
Quote from: clive on April 27, 2010, 09:09:18 PM
Let me clean it up a little, it is some what devoid of comments, it was a quick prototype and was done in C. There are probably bits which could be done more efficiently in ASM. The comparison used by qsort() is one that's called a lot. And the whole QuickSort could probably be customized/optimized for 64-bit integers.

The real trick is the sort, because it gets everything in order and you don't need to search for matches. You just check the immediate neighbour(s) in a single linear pass over the 80000 groups.

The bit vectoring flattens your 24 (4!) combinations of [1,2,3,4] in to a single comparison.

The whole taking a few minutes thing is what initially bugged me. But I think that is because you were dealing with 5.5M iterations.

-Clive

The original plan was about searching a single digit match inside the 4 digit group,
dealing with all the possible combinations of 4 digits for statistical reasons,
but at this point I am more interested in understanding the algo you have used, so
I'd like to deal first with vectorization that is quite new for me, and return to the
previous plan later on.  :U

 
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: clive on April 29, 2010, 04:06:02 PM
App with source.
-Clive
Title: Re: ASM for FUN - #6 SUB [Searching inside random numbers]
Post by: frktons on April 29, 2010, 11:26:42 PM
Quote from: clive on April 29, 2010, 04:06:02 PM
App with source.
-Clive

Thanks Clive, very kind of you.

Being back from week-end, in few days I'll start studying it
and probably bothering you with some questions as well.  :P