The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: RuiLoureiro on August 09, 2010, 04:25:24 PM

Title: Ordering an array
Post by: RuiLoureiro on August 09, 2010, 04:25:24 PM
Anyone knows a procedure
to order an array of 1 200 000 positive integers (dwords) ?
It should do it in some milliseconds
Thanks
Title: Re: Ordering an array
Post by: dedndave on August 09, 2010, 05:01:00 PM
i have seen a few examples Rui
try searching the forum for "quick sort", "quicksort", "bubble sort"
Title: Re: Ordering an array
Post by: cork on August 09, 2010, 05:59:36 PM
Try a quicksort, merge sort or radix sort.

Don't bother with a bubble sort on such a large array.
Title: Re: Ordering an array
Post by: RuiLoureiro on August 09, 2010, 06:19:55 PM
Hi dedndave,
            Thanks i got one from this forum
            MergeSort: a procedure written by Jing that is elsewhere here

Hi cork,
            The same, thanks.
            I used that MergeSort in my algo 2 to solve that problem you
            know and i got 3153 cases in 140 ms (very vary fast !).
            So your algo is not correct.
Jing,
             Thanks
Title: Re: Ordering an array
Post by: MichaelW on August 09, 2010, 08:11:09 PM
A bubble sort should be OK for this, it's just a matter of scheduling. With a good implementation running on a fast system, if you started it in the evening, by the next morning it would be done.
Title: Re: Ordering an array
Post by: ecube on August 09, 2010, 08:24:14 PM
Quote from: MichaelW on August 09, 2010, 08:11:09 PM
A bubble sort should be OK for this, it's just a matter of scheduling. With a good implementation running on a fast system, if you started it in the evening, by the next morning it would be done.


you being serious?  :bg
Title: Re: Ordering an array
Post by: cork on August 10, 2010, 12:45:58 AM
Actually, a merge sort is preposterously slow when you could just allocate 512 MB of data and flag the bits using a count sort. One pass and your set of 32-bit numbers is sorted. :dance:
Title: Re: Ordering an array
Post by: RuiLoureiro on August 10, 2010, 03:14:45 PM
MichaelW,
            If you say !!!
            So "bubble sorts" is good to eat with cheese, no ?  :green
Title: Re: Ordering an array
Post by: Rockoon on August 10, 2010, 04:19:27 PM
Quote from: cork on August 10, 2010, 12:45:58 AM
Actually, a merge sort is preposterously slow when you could just allocate 512 MB of data and flag the bits using a count sort. One pass and your set of 32-bit numbers is sorted. :dance:


Thats assuming counts of 0 or 1 only.

Such a counting sort isn't quite so simple on arbitrary 32-bit data and just can't be done in O(n) time without at least (log2(n) * 2^32) bits of memory. In this case he needs at least 21 bits per counter (n = 1.2 million, log2(n) = ~20.19) to handle the case where the entire array is the same value.

You are right that these sorts are incredibly fast when you can use them, but in practice they just dont apply to arbitrary 32-bit data yet (very few people are sporting 11 or more gigs of memory)

With that in mind tho, its very quick to test if the sort can be done with this linear time method (a single pass collecting minval and maxval) could easily reveal that more than enough memory is available.
Title: Re: Ordering an array
Post by: cork on August 10, 2010, 06:38:10 PM
Quote from: Rockoon on August 10, 2010, 04:19:27 PM
Quote from: cork on August 10, 2010, 12:45:58 AM
Actually, a merge sort is preposterously slow when you could just allocate 512 MB of data and flag the bits using a count sort. One pass and your set of 32-bit numbers is sorted. :dance:

Thats assuming counts of 0 or 1 only.

Such a counting sort isn't quite so simple on arbitrary 32-bit data and just can't be done in O(n) time without at least (log2(n) * 2^32) bits of memory. In this case he needs at least 21 bits per counter (n = 1.2 million, log2(n) = ~20.19) to handle the case where the entire array is the same value.

Good point about assuming counts of 0 or 1. If the numer of duplicates is limited to a known range, you can optimize based on that. The general case, as you pointed out, isn't that attractive for 32-bit numbers.

Quote from: Rockoon on August 10, 2010, 04:19:27 PM
Quote from: cork on August 10, 2010, 12:45:58 AM
Actually, a merge sort is preposterously slow when you could just allocate 512 MB of data and flag the bits using a count sort. One pass and your set of 32-bit numbers is sorted. :dance:
You are right that these sorts are incredibly fast when you can use them, but in practice they just dont apply to arbitrary 32-bit data yet (very few people are sporting 11 or more gigs of memory)

With that in mind tho, its very quick to test if the sort can be done with this linear time method (a single pass collecting minval and maxval) could easily reveal that more than enough memory is available.

Here's another idea.

If we are sorting 32-bit numbers and you were to do a standard count sort on that data, you'd require 512 MB if you assume counts of 0 or 1 (1 bit per). You'd require 1 GB if you assume counts of 0-3 (2 bits per). You'd require 1 1/2 GB if you assume counts of 0-7 (3 bits per), you're require 2 GB if you assume counts of 0-15 (4 bits per), etc. If you assume counts between 0 and 255 you'd require the full 4 GB (8 bits per).

Keeping the above in mind...

A 32-bit number can be from 0 to 2^32 - 1.

Let's do a count sort on the numbers from 0 to 2147483647. Write those results out to the final array.
Then do a count sort on the numbers from 2147483648 - 4294967295. Wrote those results out to the final array.

Doing so requires 1/2 the memory of a standard count sort, but the 2 times the number of passes of a count sort.

If we were to do a count sort on the first quarter of the range, then the second quarter of the range, then the third quarter of the range, we'd require 1/4 the memory that a standard count sort would take, but it would make 4 times the number of passes that a count sort would make.

Assuming you have a list of 32-bit numbers having no duplicates, you would require 512 MB for a single pass count sort. If you wanted to use less memory you could use two pass method, cutting your memory requirements down to 256 MB. If you want to use even less memory you could use the four pass method, cutting your memory down to a quite reasonable 128 MB.

If you require X bits to hold your count (instead of just 1 bit as in my example above), then you'd multiply X by 256 MB for the two-pass method. You'd multiply X by 128 MB for the 4 pass method.

Two pass method memory requirements:
  256 MB for counts from 0-1 (1 bit each)
  512 MB for counts from 0-3 (2 bits each)
  768 MB for counts from 0-7 (3 bits each)
  etc.

Four pass method memory requirements:
  128 MB for counts from 0-1 (1 bit each)
  256 MB for counts from 0-3 (2 bits each)
  384 MB for counts from 0-7 (3-bits each)
  etc.

Title: Re: Ordering an array
Post by: cork on August 10, 2010, 07:18:25 PM
Another idea...

A 32 bit number has the range from 0 to 4294967295.

Given an array of 32-bit numbers to sort, split the range in half, then write numbers in the range 0 - 2147483647 to the left side of the array, putting numbers in the range 2147483647 - 4294967295, on the right side of the array. Sort of like partitioning with quicksort, except you use 1/2 the range as your median instead an arbitrary 3-way median that is unsually utilized by quicksort.

You could just look at the left bit. If 0 put on the left side of the array. If 1 put the number on right side of array.

This "dividing the range in half" can be done in-place, in one pass, using the same method that quicksort does (working inward from each side).

Now take the part of the array corresponding to the first half of the range and do the same thing, splitting into 2 subarrays. Do the same for the second half of the range. Now you have 4 sub-arrays, accomplished in 2 passes, and partitioned, in-place requiring no extra memory.

Arrays:
A = first 1/4 of the 32-bit range
B = second 1/4 of the 32-bit range
C = third 1/4 of the 32-bit range
D = fourth 1/4 of the 32-bit range

Now do a count sort on each 1/4 of the range, requiring just 1/4 of the memory that a regular count sort would take.
  Count sort on A, then write to final array
  Count sort on B, then write to final array
  Count sort on C, then write to final array
  Count sort on D, then write to final array

You could take this one more level, splitting the range into eighths, requiring 3 passes through the data.
Then a count sort on each 8th, requiring just one more full pass. That is 4 passes, yet you only use 1/8th the memory of a standard count sort. Not a bad tradeoff between memory and time.

---

If you take this to the extreme you get a binary quicksort, which is like a radix sort using 2 as the radix, and just sorting in-place in the array. I terminated that after a 2 or 3 passes, then do a count sort on each partition which is crazy fast.

---
Title: Re: Ordering an array
Post by: RuiLoureiro on August 11, 2010, 04:24:51 PM
Hi cork,
        I dont know the theory behind these methods.
        What is "count sort" ? Could you post an algo to do it ?
        Did you see the MergeSort written by Jing ?
        Thanks
Title: Re: Ordering an array
Post by: Rockoon on August 11, 2010, 05:20:59 PM
Quote from: RuiLoureiro on August 11, 2010, 04:24:51 PM
Hi cork,
        I dont know the theory behind these methods.
        What is "count sort" ? Could you post an algo to do it ?
        Did you see the MergeSort written by Jing ?
        Thanks


You simply count the number of times each value appears, then with just the frequency counts for each value, you can generate a sorted list of values. You can also do other things with such frequency counts, such as produce a list of values that only appear N times... find the median value in O(n) time, etc..

Title: Re: Ordering an array
Post by: cork on August 11, 2010, 05:45:27 PM
Quote from: RuiLoureiro on August 11, 2010, 04:24:51 PM
Hi cork,
        I dont know the theory behind these methods.
        What is "count sort" ? Could you post an algo to do it ?
        Did you see the MergeSort written by Jing ?
        Thanks

Let's say you have an input array of 1,000,000 numbers, and each of the numbers can be between 0 and 255.

Remember, the only numbers possible are from 0 to 255.

First you create an output array of 256 32-bit integers.

Next, loop through your input array of 1,000,000 numbers. If the number is 55 then you'd add 1 to the 55th element in the output array. If the number is 27, you add 1 to the 27th element of the output array.

It is usually called a "count sort" or a "distribution sort".
Title: Re: Ordering an array
Post by: RuiLoureiro on August 11, 2010, 06:26:04 PM
Rockoon & Cork:

                Okay ! Thanks both
                I understood
Title: Re: Ordering an array
Post by: MichaelW on August 11, 2010, 09:36:19 PM
A very crude implementation:

Edit:

Added a qsort timing for comparison. I switched to a random distribution for qsort to put it on a more equal footing, but even so running on my P3 it was 38 times slower.

;==============================================================================
    include \masm32\include\masm32rt.inc
;==============================================================================
    .data
    .code
;==============================================================================
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
compare proc C p1:DWORD, p2:DWORD
    mov eax, [esp+4]
    mov edx, [esp+8]
    mov eax, [eax]
    mov edx, [edx]
    sub eax, edx
    ret
compare endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;==============================================================================
start:
;==============================================================================

    mov esi, alloc(1000000*4)
    mov edi, alloc(256*4)

    mov ebx, 1000000
    .WHILE ebx
        dec ebx
        invoke nrandom, 256
        mov [esi+ebx*4], eax
    .ENDW

    xor ebx, ebx
    .WHILE ebx < 1000000
        print str$([esi+ebx*4]),13,10
        add ebx, 4000
    .ENDW

    inkey "Press any key to continue..."

    invoke Sleep, 2000

    invoke GetTickCount
    push eax

    xor ecx, ecx
    .WHILE ecx < 1000000
        mov eax, [esi+ecx*4]
        inc DWORD PTR [edi+eax*4]
        inc ecx
    .ENDW
    xor ecx, ecx
    xor ebx, ebx
    .WHILE ecx < 256
        mov eax, [edi+ecx*4]
        .WHILE eax
            mov [esi+ebx*4], ecx
            inc ebx
            dec eax
        .ENDW
        inc ecx
    .ENDW

    invoke GetTickCount
    pop edx
    sub eax, edx
    print str$(eax),"ms",13,10

    inkey "Press any key to continue..."

    xor ebx, ebx
    .WHILE ebx < 1000000
        print str$([esi+ebx*4]),13,10
        add ebx, 4000
    .ENDW

    free esi
    free edi

    inkey "Press any key to continue..."

    mov esi, alloc(1000000*4)

    mov ebx, 1000000
    .WHILE ebx
        dec ebx
        invoke nrandom, 0ffffffffh
        mov [esi+ebx*4], eax
    .ENDW

    invoke GetTickCount
    push eax

    invoke crt_qsort, esi, 1000000, 4, compare

    invoke GetTickCount
    pop edx
    sub eax, edx
    print str$(eax),"ms",13,10

    print chr$(13,10,13,10)

    free esi

    inkey "Press any key to exit..."
    exit

;==============================================================================
end start

Title: Re: Ordering an array
Post by: RuiLoureiro on August 11, 2010, 10:05:05 PM
MichaelW,
            I read it just now and it is clear
            Thanks
             Have a nice night  :green
Title: Re: Ordering an array
Post by: cork on August 12, 2010, 01:13:54 AM
Thanks Michael.

This is why I love the radix sort - it utilizes the very simple idea of the count sort in an elegant way.
Title: Re: Ordering an array
Post by: donkey on August 12, 2010, 02:47:38 AM
Most of the sort routines mentioned in this thread are variations of the bucket sort, a Radix sort is a great solution for most cases until you end up with a number of small buckets or a large number of empty buckets. In cases where bucket sort variations are used you really need even distribution of the data to make them effective. A good way to implement a fast bucket sort algorithm is to pre-examine the array then decide where the partitions will fall so that you will get a more even distribution, then on the second pass actually sort the array. This effectively removes the worst case scenario which is all of your data falling into the same bucket. For some cases, such as a highly random distribution of data, it will add some overhead however for real life data it generally pays for itself. In most cases where I have seen sort algorithms tested the author has set up random data and used it as a basis for timing tests, this is generally misleading as you would rarely find anything close to random data in real world scenarios, those same algorithms might perform very badly indeed when facing strings which generally fall into only a few buckets. Best advice is usually to examine the type of data you expect then choose an algorithm for that not just the one that performs fastest in random number tests (unless that's what you expect to feed it).
Title: Re: Ordering an array
Post by: Rockoon on August 12, 2010, 10:58:34 AM
In the atou() benchmarking thread the test cases others were using equal portions of length 10, 7, 3, and 1... which I found distasteful... I would think that 2 and 3 digit numbers (percentages, temperatures, ..) would be the most common for large processing tasks .. but hey...

Anyways, radix sort doesn't have to have an all-in-one-bucket weakness, without any presampling. The problem of everything landing in one bucket is a design issue related to preallocation. Just dont preallocate in a greedy fashion.

One solution to not preallocating greedily is to use a linked list of arrays paradigm. The actual linked list structure is provided by hybrid means, and looks like:

BucketUpkeep:
  dd itemcount, ptr0, ptr1, ptr2, ptr3, ...
  dd itemcount, ptr0, ptr1, ptr2, ptr3, ...
  dd itemcount, ptr0, ptr1, ptr2, ptr3, ...
...

Each ptr here is not a pointer to items in the bucket, but instead pointers to arrays that hold items in the bucket. In this way we can lazily allocate space as the itemcount in a bucket grows.

Let (N) be the # of items to sort
Let (L) be the array size (in bytes)
Let (R) be the # of buckets (ex: radix-256)
Let (E) be the element size (usually 4 byte pointers into the input array)

The number of entries in BucketUpkeep is (R), the size of each entry is (4 + E * N / L), the total Upkeep space is R * (4 + E * N / L)

The configurable values are L and R, with R = 256 being popular for obvious reasons. E also never needs to be larger than 4.

A small L means a larger upkeep structure, memory allocations are more frequent and waste within the buckets is reduced. A large L means a smaller upkeep structure, memory allocations are less frequent and waste within buckets can be substantial. The total amount of waste within buckets is bounded by (R * L). The choice of L also effects cache efficiency, where larger L's will reduce the size of the upkeep structure. So essentially we've got a time/space tradeoff on L.

Assuming N = 10,000,000
L = 16384
R = 256
E = 4

BucketUpkeep is 626,024 bytes in size, with a maximum within-bucket waste of 4,194,304 bytes. Total waste+overhead here is bounded at only a fraction of the size of the input data (10% in the case of sorting a list of pointers to items such as strings)

With this choice of L on this large input set, the algorithm's upkeep structure will miss L1 often, and even L2 on occasion (its randomly accessed) on most processors. However, because the input set is so large its hard to imagine comparison-based sorts not also missing the caches frequently. A smaller input set like 100,000 has the entire upkeep structure (~6KB) easily fitting in L1 even on crappy celerons with their 8KB L1.

Title: Re: Ordering an array
Post by: RuiLoureiro on August 12, 2010, 05:22:49 PM
Quote from: MichaelW on August 11, 2010, 09:36:19 PM
...
Added a qsort timing for comparison.
...
                On P4 i got :  0 ms and 375ms for qsort ?
                Thanks MichaelW

Why not to use LIMIT instead of 256
LIMIT       equ 256
;LIMIT       equ 65536
Title: Re: Ordering an array
Post by: RuiLoureiro on August 14, 2010, 05:38:15 PM
Cork,
        If i understood it correctly
        the following procedure do CountSort (for kids)
        Is this correct ?
        Thanks

Quote
;===========================================================
    include \masm32\include\masm32rt.inc
;=============================================================
LAST        equ 9           ; last number is 9
NUMBERS     equ 20          ; There are 20 numbers in the array

    .data
_ArrayX     dd 1, 5, 6, 6, 2, 4, 8, 9, 1, 7     ; I have an array of numbers
            dd 6, 6, 3, 2, 4, 2, 2, 3, 5, 9     ; from 0 to 9 only

_FreqTable  dd 10 dup (0)   ; Frequency table
    .code
;======================================================
start:
;=======================================================
            mov     esi, offset _ArrayX
            mov     edi, offset _FreqTable
            mov     ecx, NUMBERS
            ;
            ; Fill _FreqTable
            ;
@@:         sub     ecx, 1
            mov     ebx, dword ptr [esi + ecx*4]
            add     dword ptr [edi+ebx*4], 1
            or      ecx, ecx
            jnz     short @B
            ;
            ; Fill _ArrayX
            ;
            xor     eax, eax
_loop:      mov     ecx, dword ptr [edi + eax*4]
            or      ecx, ecx
            jz      short _next
            ;
            ; Put the same, ecx times
            ; »»»»»»»»»»»»»»»»»»»»»»»
@@:         mov     dword ptr [esi], eax
            add     esi, 4
            sub     ecx, 1
            jnz     short @B
                       
_next:      add     eax, 1
            cmp     eax, LAST
            jbe     short _loop
            ;
            ; Show
            ; ----
            mov     esi, offset _ArrayX           
            mov     ecx, NUMBERS
            xor     ebx, ebx
@@:         push    ecx
                print   " index  "
                print   str$(ebx), "    "
                print   str$([esi + ebx*4]),13,10
            pop     ecx

            add     ebx, 1
            sub     ecx, 1
            jnz     short @B
           
; ###########################################################
    print chr$(13,10,13,10)
    inkey "Press any key to exit..."
    exit
;============================================================
end start