News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Ordering an array

Started by RuiLoureiro, August 09, 2010, 04:25:24 PM

Previous topic - Next topic

MichaelW

#15
A very crude implementation:

Edit:

Added a qsort timing for comparison. I switched to a random distribution for qsort to put it on a more equal footing, but even so running on my P3 it was 38 times slower.

;==============================================================================
    include \masm32\include\masm32rt.inc
;==============================================================================
    .data
    .code
;==============================================================================
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
compare proc C p1:DWORD, p2:DWORD
    mov eax, [esp+4]
    mov edx, [esp+8]
    mov eax, [eax]
    mov edx, [edx]
    sub eax, edx
    ret
compare endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;==============================================================================
start:
;==============================================================================

    mov esi, alloc(1000000*4)
    mov edi, alloc(256*4)

    mov ebx, 1000000
    .WHILE ebx
        dec ebx
        invoke nrandom, 256
        mov [esi+ebx*4], eax
    .ENDW

    xor ebx, ebx
    .WHILE ebx < 1000000
        print str$([esi+ebx*4]),13,10
        add ebx, 4000
    .ENDW

    inkey "Press any key to continue..."

    invoke Sleep, 2000

    invoke GetTickCount
    push eax

    xor ecx, ecx
    .WHILE ecx < 1000000
        mov eax, [esi+ecx*4]
        inc DWORD PTR [edi+eax*4]
        inc ecx
    .ENDW
    xor ecx, ecx
    xor ebx, ebx
    .WHILE ecx < 256
        mov eax, [edi+ecx*4]
        .WHILE eax
            mov [esi+ebx*4], ecx
            inc ebx
            dec eax
        .ENDW
        inc ecx
    .ENDW

    invoke GetTickCount
    pop edx
    sub eax, edx
    print str$(eax),"ms",13,10

    inkey "Press any key to continue..."

    xor ebx, ebx
    .WHILE ebx < 1000000
        print str$([esi+ebx*4]),13,10
        add ebx, 4000
    .ENDW

    free esi
    free edi

    inkey "Press any key to continue..."

    mov esi, alloc(1000000*4)

    mov ebx, 1000000
    .WHILE ebx
        dec ebx
        invoke nrandom, 0ffffffffh
        mov [esi+ebx*4], eax
    .ENDW

    invoke GetTickCount
    push eax

    invoke crt_qsort, esi, 1000000, 4, compare

    invoke GetTickCount
    pop edx
    sub eax, edx
    print str$(eax),"ms",13,10

    print chr$(13,10,13,10)

    free esi

    inkey "Press any key to exit..."
    exit

;==============================================================================
end start

eschew obfuscation

RuiLoureiro

MichaelW,
            I read it just now and it is clear
            Thanks
             Have a nice night  :green

cork

Thanks Michael.

This is why I love the radix sort - it utilizes the very simple idea of the count sort in an elegant way.

donkey

Most of the sort routines mentioned in this thread are variations of the bucket sort, a Radix sort is a great solution for most cases until you end up with a number of small buckets or a large number of empty buckets. In cases where bucket sort variations are used you really need even distribution of the data to make them effective. A good way to implement a fast bucket sort algorithm is to pre-examine the array then decide where the partitions will fall so that you will get a more even distribution, then on the second pass actually sort the array. This effectively removes the worst case scenario which is all of your data falling into the same bucket. For some cases, such as a highly random distribution of data, it will add some overhead however for real life data it generally pays for itself. In most cases where I have seen sort algorithms tested the author has set up random data and used it as a basis for timing tests, this is generally misleading as you would rarely find anything close to random data in real world scenarios, those same algorithms might perform very badly indeed when facing strings which generally fall into only a few buckets. Best advice is usually to examine the type of data you expect then choose an algorithm for that not just the one that performs fastest in random number tests (unless that's what you expect to feed it).
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

Rockoon

In the atou() benchmarking thread the test cases others were using equal portions of length 10, 7, 3, and 1... which I found distasteful... I would think that 2 and 3 digit numbers (percentages, temperatures, ..) would be the most common for large processing tasks .. but hey...

Anyways, radix sort doesn't have to have an all-in-one-bucket weakness, without any presampling. The problem of everything landing in one bucket is a design issue related to preallocation. Just dont preallocate in a greedy fashion.

One solution to not preallocating greedily is to use a linked list of arrays paradigm. The actual linked list structure is provided by hybrid means, and looks like:

BucketUpkeep:
  dd itemcount, ptr0, ptr1, ptr2, ptr3, ...
  dd itemcount, ptr0, ptr1, ptr2, ptr3, ...
  dd itemcount, ptr0, ptr1, ptr2, ptr3, ...
...

Each ptr here is not a pointer to items in the bucket, but instead pointers to arrays that hold items in the bucket. In this way we can lazily allocate space as the itemcount in a bucket grows.

Let (N) be the # of items to sort
Let (L) be the array size (in bytes)
Let (R) be the # of buckets (ex: radix-256)
Let (E) be the element size (usually 4 byte pointers into the input array)

The number of entries in BucketUpkeep is (R), the size of each entry is (4 + E * N / L), the total Upkeep space is R * (4 + E * N / L)

The configurable values are L and R, with R = 256 being popular for obvious reasons. E also never needs to be larger than 4.

A small L means a larger upkeep structure, memory allocations are more frequent and waste within the buckets is reduced. A large L means a smaller upkeep structure, memory allocations are less frequent and waste within buckets can be substantial. The total amount of waste within buckets is bounded by (R * L). The choice of L also effects cache efficiency, where larger L's will reduce the size of the upkeep structure. So essentially we've got a time/space tradeoff on L.

Assuming N = 10,000,000
L = 16384
R = 256
E = 4

BucketUpkeep is 626,024 bytes in size, with a maximum within-bucket waste of 4,194,304 bytes. Total waste+overhead here is bounded at only a fraction of the size of the input data (10% in the case of sorting a list of pointers to items such as strings)

With this choice of L on this large input set, the algorithm's upkeep structure will miss L1 often, and even L2 on occasion (its randomly accessed) on most processors. However, because the input set is so large its hard to imagine comparison-based sorts not also missing the caches frequently. A smaller input set like 100,000 has the entire upkeep structure (~6KB) easily fitting in L1 even on crappy celerons with their 8KB L1.

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

RuiLoureiro

Quote from: MichaelW on August 11, 2010, 09:36:19 PM
...
Added a qsort timing for comparison.
...
                On P4 i got :  0 ms and 375ms for qsort ?
                Thanks MichaelW

Why not to use LIMIT instead of 256
LIMIT       equ 256
;LIMIT       equ 65536

RuiLoureiro

Cork,
        If i understood it correctly
        the following procedure do CountSort (for kids)
        Is this correct ?
        Thanks

Quote
;===========================================================
    include \masm32\include\masm32rt.inc
;=============================================================
LAST        equ 9           ; last number is 9
NUMBERS     equ 20          ; There are 20 numbers in the array

    .data
_ArrayX     dd 1, 5, 6, 6, 2, 4, 8, 9, 1, 7     ; I have an array of numbers
            dd 6, 6, 3, 2, 4, 2, 2, 3, 5, 9     ; from 0 to 9 only

_FreqTable  dd 10 dup (0)   ; Frequency table
    .code
;======================================================
start:
;=======================================================
            mov     esi, offset _ArrayX
            mov     edi, offset _FreqTable
            mov     ecx, NUMBERS
            ;
            ; Fill _FreqTable
            ;
@@:         sub     ecx, 1
            mov     ebx, dword ptr [esi + ecx*4]
            add     dword ptr [edi+ebx*4], 1
            or      ecx, ecx
            jnz     short @B
            ;
            ; Fill _ArrayX
            ;
            xor     eax, eax
_loop:      mov     ecx, dword ptr [edi + eax*4]
            or      ecx, ecx
            jz      short _next
            ;
            ; Put the same, ecx times
            ; »»»»»»»»»»»»»»»»»»»»»»»
@@:         mov     dword ptr [esi], eax
            add     esi, 4
            sub     ecx, 1
            jnz     short @B
                       
_next:      add     eax, 1
            cmp     eax, LAST
            jbe     short _loop
            ;
            ; Show
            ; ----
            mov     esi, offset _ArrayX           
            mov     ecx, NUMBERS
            xor     ebx, ebx
@@:         push    ecx
                print   " index  "
                print   str$(ebx), "    "
                print   str$([esi + ebx*4]),13,10
            pop     ecx

            add     ebx, 1
            sub     ecx, 1
            jnz     short @B
           
; ###########################################################
    print chr$(13,10,13,10)
    inkey "Press any key to exit..."
    exit
;============================================================
end start