News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

quicksort

Started by peter84, September 02, 2005, 09:28:09 PM

Previous topic - Next topic

peter84

Okey Eóin, I think I almost got this, just have to find out how to get arguments from a C-function, which I will soon.

I am, however, eager to see the rest of your solution (I guess that would be the first initialising, swap, and the recursion itself) even though it is in FASM.

I pretty much translated from a C-function i wrote.. Not any optimalizations, but it probably works just fine!  :8)

Eóin

#16
Here you go peter. It uses the Tri-Median technique at the start. The initial function QuickSort uses FASM procedure macros. They have in fact been updated recently so don't pay too much attention to them :bg . Anyway to reason for using two functions was to allow for a clean interface while cutting down on any overhead for the qsort recursive bit.

proc QuickSort,pAry,aSze ;aSze in BYTES not dwords
; Sort an array "pAry" of signed ints.
enter
mov eax,[aSze]
push edi esi ebx
sub eax,4
mov ebx,[pAry]
push eax 0
call qsort
pop ebx esi edi
return

rgh equ esp+8
lft equ esp+4

align 4
qsort: mov edi,[rgh]
mov esi,[lft]

lea ecx,[esi+edi]
shr ecx,1
and ecx,not 3

; Tri-Median
mov edx,[ebx+edi]
mov eax,[ebx+esi]
cmp eax,edx ;cmp
cmovg edx,eax
cmovg eax,[ebx+edi]
mov [ebx+edi],edx
mov [ebx+esi],eax

mov eax,[ebx+ecx]
mov edx,[ebx+edi]
cmp eax,edx ;cmp
cmovg edx,eax
cmovg eax,[ebx+edi]
mov [ebx+edi],edx
mov [ebx+ecx],eax

mov eax,[ebx+esi]
mov edx,[ebx+ecx]
cmp eax,edx ;cmp
cmovle edx,eax
cmovle eax,[ebx+ecx]
mov [ebx+ecx],edx
mov [ebx+esi],eax

sub edi,4
mov eax,[ebx+esi]
mov edx,[ebx+edi]
mov [ebx+edi],eax
mov [ebx+esi],edx

mov edx,[ebx+edi]
.lpFor:
.wle1: add esi,4
cmp [ebx+esi],edx ;cmp
jl .wle1
.wle2: sub edi,4
cmp [ebx+edi],edx ;cmp
jg .wle2
cmp esi,edi
jg .exFor

mov eax,[ebx+esi]
mov ecx,[ebx+edi]
mov [ebx+edi],eax
mov [ebx+esi],ecx
jmp .lpFor
.exFor:
mov ecx,[rgh]
sub ecx,4

mov eax,[ebx+esi]
mov edx,[ebx+ecx]
mov [ebx+ecx],eax
mov [ebx+esi],edx

add esi,4
mov eax,[lft]

push pd[rgh]
push esi
push edi
push eax
call qsort
call qsort
.ex:
retn 8


WARNING I appologise but the above code could cause a problem because I made a mistake in removing the Insertion sort code. So I've attached the original hybrid QuickSort/Insertion Sort.

[attachment deleted by admin]

notoka

Quote from: roticv on September 05, 2005, 01:59:55 AM
HAHA
What about equating it to another programming language?
...
Peter84 commenced this thread by inquiring about quicksort.  Sluggy and PBrennick offered some links.
Thanking their efforts, Peter84 replied that he was having some difficulty visualizing the storage of the components of the search, and Eoin offered some pseudocode to explain his approach.  Eoin's post was very informative.  However, a couple of posts later, a question arose regarding an implementation detail, since Peter84 was apparently using GAS, which employs ATT syntax, and Eoin proposed that most folks on this forum do not understand ATT syntax.  Roticv then supported this denunciation of GAS with a well explained illustration showing how the Intel/ATT formats differ.  In offering this lucid explanation, however, Roticv suggested that "gas has a screwed up convention...", a notion with which I feel compelled to disagree.
Eoin replied to my note of indignation by writing: "I would submit that equating English language grammar (or metaphors ) to assembly is ridiculous."  Well, I disagree with Eoin: grammar comparison between human and machine languages is certainly not absurd.  It may not accord with some narrow minded notions of what assembly language should look like, but all grammars are capable of analysis--one is not superfluous, and another spectacular.
Roticv replied to this interchange by asking about comparison between the GAS format and some other programming language.  We need not move to C or other "high level" languages, stay right here, with assembly language programming:
http://simon.baymoo.org/universe/tools/symset/symset.txt
"Other differences aside, you will notice that the AT&T format makes assignments from left to right, and that modifying instructions modify their right-most arguments. The primary reason for this difference is due to the VAX assembly format for which the AT&T style was originally invented. (The Motorola 68000 and its descendents were heavily influenced by the VAX. Likewise, their assembly language format moves in this direction as well!) " 
In other words, it is not GAS which is the odd man out, it is Intel's peculiar notion, which is distinct from the prototypical modern assemblers derived from the BEST EVER ASSEMBLER in my humble opinion:  VAX. I think one should not a priori underestimate the experience of participants of this, or any other forum, by suggesting that most participants would be unable to grasp the notion of an assembler which follows the intuitive notion of specifying a source prior to the destination.  :U

Eóin

Its a very personal issue and so its impossible to say which is better or worse. But equating things to natural languages can't work because there are always loads of ways of doing things in such languages. You suggest an example where src would naturally come before dest, but how about "Make eax the same as edx". Or simply "Set eax equal to edx".

Comparisons really have to be done between formal languages, and perhaps the most widely understood formal language is that used in mathematical proofs, and they use the dest,src convention- eax := edx.

In my personal opinion Intel has mostly been right in its choices because I believe that they picked what would be the intuitive choices most people would go for if they came at things with a clean mind uncluttered by the multitude of (and often conflicting) conventions we all have to deal with in our daily lives. I say believe because of course its largely impossible test or be sure of that belief :bg .

hutch--

Eóin,

I wondered if you have a decent C implimentation of a quick sort that does the tri-median calculation on board and actually works properly. I have been ratting around the internet and while I have a number of normal quick sorts that work OK, I have yet to find a C version that is not broken.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Eóin

Here is a C version which seems to perform quite well. Unfortunatly its a hybrid QuickSort Insertion Sort. I know you wanted just the QuickSort but its reasonably easy to seperate the QuickSort bit out. Just change the constant M to 0 and comment/uncomment out the indicated lines.

void _inline swap(int* a, unsigned l, unsigned r) {
int tmp = a[l];
a[l] = a[r];
a[r] = tmp;
}

void qSort(int* a, unsigned l, unsigned r) {
const int M = 50; unsigned i; unsigned j; int v;

if ((int)(r-l)>M) {
i = (r+l)/2;
if (a[l]>a[i]) swap(a,l,i);
if (a[l]>a[r]) swap(a,l,r);
if (a[i]>a[r]) swap(a,i,r);

// if ((r-l)==1) return; // uncomment

j = r-1;
swap(a,i,j);
i = l;
v = a[j];
for(;;) {
while(a[++i]<v);
while(a[--j]>v);

if (j<i) break;
swap (a,i,j);
}
swap(a,i,r-1);
qSort(a,l,j);
qSort(a,i+1,r);
}
}

void iSort(int* a, unsigned l, unsigned r) {
unsigned i; unsigned j; int v;

for (i=l+1; i<=r; i++) {
v = a[i];
j = i;
while ((j>l) && (a[j-1]>v)) {
a[j] = a[j-1];
j--;
}
a[j] = v;
}
}

void _cdecl QuickSort(int* array, unsigned size) {

qSort(array,0,size);
iSort(array,0,size); // comment
}


If compiled into its own OBJ file use the following declaration in the file that calls it.
extern "C" void _cdecl QuickSort(int* array, unsigned size);

hutch--

Thanks Eóin, this version looks like good stuff. Should be easy enough to test it with and without the insertion sort.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

notoka

Quote from: hutch-- on September 05, 2005, 10:19:45 PM
... I have a number of normal quick sorts that work OK, I have yet to find a C version that is not broken.
You may wish to look at Dandamudi's recursive Quick Sort written in ASM, pages 469-471:
S. Dandamudi, "Introduction to Assembly Language Programming for Pentium and RISC processors" Springer 2005 (second edition), ISBN 0387206361
http://www.scs.carleton.ca/~sivarama/asm_book/index.html
:U

hutch--

His book looks good so far but it looks like 16 bit DOS assembler which is not really on the pace. I wonder if the author wil be doing some 32 bit assembler as well ?
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

Eóin,

This is the first working tri-median quick sort I have seen and its also faster than any of the others I have played with. I stripped te insertion sort and benchmarked it against two of the others and its clearly faster. It is more algo dependent than the rest so optimising it has been more work but in the first attempt its about 5% faster so far. The working loop code is not that hard to work on but getting the three leading swaps tidy enough has been a ton of fun.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

notoka

Quote from: hutch-- on September 06, 2005, 12:23:09 PM
His book looks good so far but it looks like 16 bit DOS assembler which is not really on the pace. I wonder if the author wil be doing some 32 bit assembler as well ?
I may have misunderstood, Hutch, but to my eye, it looks like Dandamudi has authored PRIMARILY 32 bit code:
for example, from his recursive quick sort routine, on page 471:
"
94:  sep_done:
95:   xchg AX,[EBX+ESI*2] ;
96:   xchg AX,[EBX+EDI*2] ;  x <=> x[hi]
97:   xchg AX,[EBX+ESI*2] ;
98:
99:   dec ESI
100: mov EDI,ESI ;  hi = i-1
101:...
102:...
103:...
104:  mov ESI,ECX
105:  call qsort
"
Further, on page ix, i.e. in the preface, Dandamudi writes:

"Here is a summary of the special features that sets this book apart:
....
This book uses NASM and Linux as opposed to scores of other books that use MASM and windows....." 

Linux is, and always has been, 32 bit, from its origin.  MASM on the other hand, BEGAN AS PURE 16 bit code, working strictly with DOS.  I remember from nearly twenty years ago, being obliged to change config.sys or autoexec.bat, forgot now, which, in order to use MASM--had to change the path variable, must have been autoexec.bat.  I have not looked at MASM since 1988, so I have no idea what it is like today. 
Do you describe Dandamudi's code as primarily 16 bit, because of his use of the instructions:
xchg AX,[EBX+ESI*2]
where he is using a 16 bit register, AX, instead of EAX?
As I recall, and it is a bit fuzzy now, it was not until version 5 of MASM that one could use 32 bit registers, (since they did not exist in the cpu itself, until the mid 1980's) though the output of the assembler was still dependent on pure 16 bit DOS.
Regards, notoka
  :U

Eóin

Hi hutch--, this post over on win32asm board is discussing conditional swaps; http://board.win32asmcommunity.net/index.php?topic=21771.0;topicseen . Just the things for optimising the tri-median bit :U

hutch--

I have just run into a weird effect maunally optimising the quick sort that Eóin posted that has caught me a number of times. I start with a DWORD array of 3 million with the same count for random range, optimise the algo them when I change the size in either direction it sometime produces an error in its results. It will produce the correct results at 8 million but not at 10 million. I have tested it on my spare box and its an identical problem.

I could understand if it was just a mistake in the encoding but I have never seen this effect before. I have been using this algo as practice to work out a number of things in code optimisation, among them how to try and avoid the costs of memory page swapping when you are repeatedly working of two addresses that don't fit into the data cache at the same time.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Eóin

The only thing I can think of off hand is that you have to be very careful of the case when you are dealing with a partition of just two elements.

hutch--

Here is try number 3. This one tests correct up to arrays of 50 million so its probably OK. This original tri-median qsort is the best I have seen yet in non hybrid quick sorts. From the unoptimised C output, optimising the loop code first saw about 95% of the speed gain. Cleaning up the 3 median calculations was a percent or so. Inlining the bypass code around each recursive call was worth a couple of percent and unrolling the two partition calculation loops was another couple of percent. It is barely faster than the optimised C code but it is faster on every array size tested from 1 million to 50 million.

This is a very good piece of code optimisation practice as you learn a lot about what matters and what does not. When the array address as replaced with a register there was a measurable speed gain but when 33 odd moves of the array address into the register were removed there was no speed gain at all. This algo is just complcated enough to make you work hard at reusing registers carefully so that you don't need to use any locals. On this PIV there was no gain aligning any of the labels and in the loop code it actually slowed the algo down.

The actualy algorithm is bound by the memory access speed when dealing with two addresses that are far apart enough to not be in cache at the same time and with current hardware there apears little that can be done about it.



; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 16

qSortc proc arr:DWORD,lft:DWORD,rgt:DWORD

    push ebx
    push esi
    push edi

    mov ebx, arr

    mov eax, rgt
    add eax, lft
    cdq
    sub eax, edx
    sar eax, 1
    mov esi, eax

    mov edx, lft
    mov eax, [ebx+esi*4]
    mov ecx, [ebx+edx*4]
    cmp ecx, eax
    jle lbl0
    mov [ebx+esi*4], ecx
    mov [ebx+edx*4], eax

  lbl0:
    mov edi, lft
    mov ecx, rgt
    mov edx, [ebx+edi*4]
    mov eax, [ebx+ecx*4]
    cmp eax, edx
    jg lbl1
    mov [ebx+ecx*4], edx
    mov [ebx+edi*4], eax

  lbl1:
    mov edx, rgt
    mov eax, [ebx+esi*4]
    mov ecx, [ebx+edx*4]
    cmp ecx, eax
    jg lbl2
    mov [ebx+edx*4], eax
    mov [ebx+esi*4], ecx

  lbl2:
    mov ecx, rgt
    sub ecx, lft
    cmp ecx, 1
    je lbl9

  lbl3:
    mov edi, rgt
    sub edi, 1
    mov ecx, [ebx+esi*4]
    mov edx, [ebx+edi*4]
    mov [ebx+edi*4], ecx
    mov [ebx+esi*4], edx

    mov esi, lft
    mov eax, [ebx+edi*4]    ; pivot

    jmp lbl4

; ----------------------------

  pre:
    mov [ebx+edi*4], edx
    mov [ebx+esi*4], ecx

  lbl4:
  REPEAT 7
    add esi, 1
    mov edx, [ebx+esi*4]
    cmp edx, eax
    jge lbl5
  ENDM
    add esi, 1
    mov edx, [ebx+esi*4]
    cmp edx, eax
    jl lbl4

  lbl5:
  REPEAT 7
    sub edi, 1
    mov ecx, [ebx+edi*4]
    cmp ecx, eax
    jle lbl6
  ENDM
    sub edi, 1
    mov ecx, [ebx+edi*4]
    cmp ecx, eax
    jg lbl5

  lbl6:
    cmp edi, esi
    jge pre

; ----------------------------

    mov ecx, rgt
    mov edx, [ebx+esi*4]
    mov eax, [ebx+ecx*4-4]
    mov [ebx+ecx*4-4], edx
    mov [ebx+esi*4], eax

    mov eax, edi
    sub eax, lft
    test eax, eax
    jle @F
    invoke qSortc,ebx,lft,edi
  @@:

    add esi, 1
    mov eax, rgt
    sub eax, esi
    test eax, eax
    jle lbl9
    invoke qSortc,ebx,esi,rgt
  lbl9:

    pop edi
    pop esi
    pop ebx

  quit:

    ret

qSortc endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php