Hi, I'm doing quicksort in assembly (masm), but I have no idea how to start. Maybe I'll get help from the course later, but I want to start anyway. So... what would you do in this case? Splitting the programming-task into small pieces, so, what pieces do you think that should be? Not looking for code, just some help thinking the right way :thumbu
The best thing for you to do is to learn the simple theory behind the quicksort algorithm, then design a bit of code, then ask the questions.
This link (http://www.google.com/search?num=100&hs=NcI&hl=en&lr=&client=opera&rls=en&q=%22how+to%22+quicksort&btnG=Search), and particularly this one (http://www.ics.uci.edu/~eppstein/161/960118.html) should help you.
peter84,
Hutch is working on the exect same thing and will be posting his code sometime today in this thread, http://www.masmforum.com/simple/index.php?topic=2579.msg20396#msg20396
It will give you some ideas.
Paul
Thanks for the replies guys! Guess I forgot to mention that I've already done quicksort in java about a year ago.
I guess what is bothering me is how to represent an array in assembly, how to make the algorithm manipulate it, i.e. choose a pivot and compare all other to that. We have the cmpl-operand (using GNU/linux), and the bigger and smaller elements are to be seperated. Where do I store the 2 smaller parts, and then the 4 smaller parts, etc?
Appreciete your replies! :wink
Here is some example code of an inner QuickSort loop. First the Java code for reference, then the Assembly. I'm not posting the full code to avoid overwhelming you.
j = r-1;
i = l;
v = a[j];
for(;;)
{
while(a[++i]<v);
while(a[--j]>v);
if (j<i) break;
swap (a,i,j);
}
mov edi,[rgh]
mov esi,[lft]
sub edi,4
mov edx,[ebx+edi]
.lpFor:
.wle1: add esi,4
cmp [ebx+esi],edx
jl .wle1
.wle2: sub edi,4
cmp [ebx+edi],edx
jg .wle2
cmp esi,edi
jg .exFor
mov eax,[ebx+esi]
mov ecx,[ebx+edi]
mov [ebx+edi],eax
mov [ebx+esi],ecx
jmp .lpFor
.exFor:
In the java code a is the array. l & r are offsets representing the left and right hand side of the partition to divide. v is chosen as the pivot and i & j become the offsets to the array elements which get compared against the pivot.
The Assembly code has many one-to-one correspondences with the java code. The array in assembly in a continuous block of memory, perhaps allocated by a routine like HeapAlloc on Windows. The array (in this example) stores integers as DWORDs, hence when incrementing or decrementing an offset into the array, 4 is added or subtracted, 4 being the size of a DWORD.
A pointer to the start of this array is kept in ebx. The pivot value is moved into edx, and edi & esi become the offset into the array that scan from the left and right sides of the partition. You can see how after the appropriate elements in the array are determined for swapping a check is performed to see if we need to exit the loop early much like the if (j<i) break; code in the java version.
I will quite happily post the full code if you wish but it’s written in FASM (similar but not identical to MASM) and it uses an Insertion Sort after the partitions get below a critical size, which is a feature you may not want. And of course you say you don't just want to be handed code.
Thanks Eóin! I think I see the solution in front of me now (no, not at the computerscreen :P). I'll keep working, and drop a post if I'm stuck.
Oh, and thanks for the good explanation. Are you a teacher? :)
No, but I hope to teach Math eventually when I finish my studies.
Eóin,
Personally, I think you will be great at that! Keep up the studies and the good work!
Paul
So... I can reserve space for the numbers with the .equ-operand, but how do I define how large it should be? And does anyone know how to translate this to GNU/linux asm: cmp [ebx+esi],edx # the stuff inside the [ ]
Thanks! :8)
It's not so much a question of GNU/linux asm, rather its a question of which Assembler are you using. Unforunatly if you're using GAS not many people here will probably know of it seeing as this forum is mostly MASM users.
gas has a screwed up convention of
opcode src, dest
while most assembler has the convention of
opcode dest, src
So in gas cmp [ebx+esi],edx
is the same as cmp edx, [ebx+esi] in masm
I would submit that it is Intel's format: Destination, Source that is counterintuitive. Do you know of any airline that asks from a prospective passenger the destination, PRIOR to inquiring about the intended port of embarkation? http://portal.lufthansa.com/online/portal?ctest=1
::)
I would submit that equating English language grammar (or metaphors ::) ) to assembly is ridiculous.
HAHA
What about equating it to another programming language?
Like for example in C, we do
var = 1;
isn't that the same as
dest, src
:lol
That remember me time a go... tought dont remember exactly what was my problem with that :green2.
Okey Eóin, I think I almost got this, just have to find out how to get arguments from a C-function, which I will soon.
I am, however, eager to see the rest of your solution (I guess that would be the first initialising, swap, and the recursion itself) even though it is in FASM.
I pretty much translated from a C-function i wrote.. Not any optimalizations, but it probably works just fine! :8)
Here you go peter. It uses the Tri-Median technique at the start. The initial function QuickSort uses FASM procedure macros. They have in fact been updated recently so don't pay too much attention to them :bg . Anyway to reason for using two functions was to allow for a clean interface while cutting down on any overhead for the qsort recursive bit.
proc QuickSort,pAry,aSze ;aSze in BYTES not dwords
; Sort an array "pAry" of signed ints.
enter
mov eax,[aSze]
push edi esi ebx
sub eax,4
mov ebx,[pAry]
push eax 0
call qsort
pop ebx esi edi
return
rgh equ esp+8
lft equ esp+4
align 4
qsort: mov edi,[rgh]
mov esi,[lft]
lea ecx,[esi+edi]
shr ecx,1
and ecx,not 3
; Tri-Median
mov edx,[ebx+edi]
mov eax,[ebx+esi]
cmp eax,edx ;cmp
cmovg edx,eax
cmovg eax,[ebx+edi]
mov [ebx+edi],edx
mov [ebx+esi],eax
mov eax,[ebx+ecx]
mov edx,[ebx+edi]
cmp eax,edx ;cmp
cmovg edx,eax
cmovg eax,[ebx+edi]
mov [ebx+edi],edx
mov [ebx+ecx],eax
mov eax,[ebx+esi]
mov edx,[ebx+ecx]
cmp eax,edx ;cmp
cmovle edx,eax
cmovle eax,[ebx+ecx]
mov [ebx+ecx],edx
mov [ebx+esi],eax
sub edi,4
mov eax,[ebx+esi]
mov edx,[ebx+edi]
mov [ebx+edi],eax
mov [ebx+esi],edx
mov edx,[ebx+edi]
.lpFor:
.wle1: add esi,4
cmp [ebx+esi],edx ;cmp
jl .wle1
.wle2: sub edi,4
cmp [ebx+edi],edx ;cmp
jg .wle2
cmp esi,edi
jg .exFor
mov eax,[ebx+esi]
mov ecx,[ebx+edi]
mov [ebx+edi],eax
mov [ebx+esi],ecx
jmp .lpFor
.exFor:
mov ecx,[rgh]
sub ecx,4
mov eax,[ebx+esi]
mov edx,[ebx+ecx]
mov [ebx+ecx],eax
mov [ebx+esi],edx
add esi,4
mov eax,[lft]
push pd[rgh]
push esi
push edi
push eax
call qsort
call qsort
.ex:
retn 8
WARNING I appologise but the above code could cause a problem because I made a mistake in removing the Insertion sort code. So I've attached the original hybrid QuickSort/Insertion Sort.
[attachment deleted by admin]
Quote from: roticv on September 05, 2005, 01:59:55 AM
HAHA
What about equating it to another programming language?
...
Peter84 commenced this thread by inquiring about quicksort. Sluggy and PBrennick offered some links.
Thanking their efforts, Peter84 replied that he was having some difficulty visualizing the storage of the components of the search, and Eoin offered some pseudocode to explain his approach. Eoin's post was very informative. However, a couple of posts later, a question arose regarding an implementation detail, since Peter84 was apparently using GAS, which employs ATT syntax, and Eoin proposed that most folks on this forum do not understand ATT syntax. Roticv then supported this denunciation of GAS with a well explained illustration showing how the Intel/ATT formats differ. In offering this lucid explanation, however, Roticv suggested that "gas has a screwed up convention...", a notion with which I feel compelled to disagree.
Eoin replied to my note of indignation by writing: "I would submit that equating English language grammar (or metaphors ) to assembly is ridiculous." Well, I disagree with Eoin: grammar comparison between human and machine languages is certainly not absurd. It may not accord with some narrow minded notions of what assembly language should look like, but all grammars are capable of analysis--one is not superfluous, and another spectacular.
Roticv replied to this interchange by asking about comparison between the GAS format and some other programming language. We need not move to C or other "high level" languages, stay right here, with assembly language programming:
http://simon.baymoo.org/universe/tools/symset/symset.txt
"Other differences aside, you will notice that the AT&T format makes assignments from left to right, and that modifying instructions modify their right-most arguments. The primary reason for this difference is due to the VAX assembly format for which the AT&T style was originally invented. (The Motorola 68000 and its descendents were heavily influenced by the VAX. Likewise, their assembly language format moves in this direction as well!) "
In other words, it is not GAS which is the odd man out, it is Intel's peculiar notion, which is distinct from the prototypical modern assemblers derived from the BEST EVER ASSEMBLER in my humble opinion: VAX. I think one should not a priori underestimate the experience of participants of this, or any other forum, by suggesting that most participants would be unable to grasp the notion of an assembler which follows the intuitive notion of specifying a source prior to the destination. :U
Its a very personal issue and so its impossible to say which is better or worse. But equating things to natural languages can't work because there are always loads of ways of doing things in such languages. You suggest an example where src would naturally come before dest, but how about "Make eax the same as edx". Or simply "Set eax equal to edx".
Comparisons really have to be done between formal languages, and perhaps the most widely understood formal language is that used in mathematical proofs, and they use the dest,src convention- eax := edx.
In my personal opinion Intel has mostly been right in its choices because I believe that they picked what would be the intuitive choices most people would go for if they came at things with a clean mind uncluttered by the multitude of (and often conflicting) conventions we all have to deal with in our daily lives. I say believe because of course its largely impossible test or be sure of that belief :bg .
Eóin,
I wondered if you have a decent C implimentation of a quick sort that does the tri-median calculation on board and actually works properly. I have been ratting around the internet and while I have a number of normal quick sorts that work OK, I have yet to find a C version that is not broken.
Here is a C version which seems to perform quite well. Unfortunatly its a hybrid QuickSort Insertion Sort. I know you wanted just the QuickSort but its reasonably easy to seperate the QuickSort bit out. Just change the constant M to 0 and comment/uncomment out the indicated lines.
void _inline swap(int* a, unsigned l, unsigned r) {
int tmp = a[l];
a[l] = a[r];
a[r] = tmp;
}
void qSort(int* a, unsigned l, unsigned r) {
const int M = 50; unsigned i; unsigned j; int v;
if ((int)(r-l)>M) {
i = (r+l)/2;
if (a[l]>a[i]) swap(a,l,i);
if (a[l]>a[r]) swap(a,l,r);
if (a[i]>a[r]) swap(a,i,r);
// if ((r-l)==1) return; // uncomment
j = r-1;
swap(a,i,j);
i = l;
v = a[j];
for(;;) {
while(a[++i]<v);
while(a[--j]>v);
if (j<i) break;
swap (a,i,j);
}
swap(a,i,r-1);
qSort(a,l,j);
qSort(a,i+1,r);
}
}
void iSort(int* a, unsigned l, unsigned r) {
unsigned i; unsigned j; int v;
for (i=l+1; i<=r; i++) {
v = a[i];
j = i;
while ((j>l) && (a[j-1]>v)) {
a[j] = a[j-1];
j--;
}
a[j] = v;
}
}
void _cdecl QuickSort(int* array, unsigned size) {
qSort(array,0,size);
iSort(array,0,size); // comment
}
If compiled into its own OBJ file use the following declaration in the file that calls it.
extern "C" void _cdecl QuickSort(int* array, unsigned size);
Thanks Eóin, this version looks like good stuff. Should be easy enough to test it with and without the insertion sort.
Quote from: hutch-- on September 05, 2005, 10:19:45 PM
... I have a number of normal quick sorts that work OK, I have yet to find a C version that is not broken.
You may wish to look at Dandamudi's recursive Quick Sort written in ASM, pages 469-471:
S. Dandamudi, "Introduction to Assembly Language Programming for Pentium and RISC processors" Springer 2005 (second edition), ISBN 0387206361
http://www.scs.carleton.ca/~sivarama/asm_book/index.html
:U
His book looks good so far but it looks like 16 bit DOS assembler which is not really on the pace. I wonder if the author wil be doing some 32 bit assembler as well ?
Eóin,
This is the first working tri-median quick sort I have seen and its also faster than any of the others I have played with. I stripped te insertion sort and benchmarked it against two of the others and its clearly faster. It is more algo dependent than the rest so optimising it has been more work but in the first attempt its about 5% faster so far. The working loop code is not that hard to work on but getting the three leading swaps tidy enough has been a ton of fun.
Quote from: hutch-- on September 06, 2005, 12:23:09 PM
His book looks good so far but it looks like 16 bit DOS assembler which is not really on the pace. I wonder if the author wil be doing some 32 bit assembler as well ?
I may have misunderstood, Hutch, but to my eye, it looks like Dandamudi has authored PRIMARILY 32 bit code:
for example, from his recursive quick sort routine, on page 471:
"
94: sep_done:
95: xchg AX,[EBX+ESI*2] ;
96: xchg AX,[EBX+EDI*2] ; x
<=> x[hi]
97: xchg AX,[EBX+ESI*2] ;
98:
99: dec ESI
100: mov EDI,ESI ; hi = i-1
101:...
102:...
103:...
104: mov ESI,ECX
105: call qsort
"
Further, on page ix, i.e. in the preface, Dandamudi writes:
"Here is a summary of the special features that sets this book apart:
....
This book uses NASM and Linux as opposed to scores of other books that use MASM and windows....."
Linux is, and always has been, 32 bit, from its origin. MASM on the other hand, BEGAN AS PURE 16 bit code, working strictly with DOS. I remember from nearly twenty years ago, being obliged to change config.sys or autoexec.bat, forgot now, which, in order to use MASM--had to change the path variable, must have been autoexec.bat. I have not looked at MASM since 1988, so I have no idea what it is like today.
Do you describe Dandamudi's code as primarily 16 bit, because of his use of the instructions:
xchg AX,[EBX+ESI*2]
where he is using a 16 bit register, AX, instead of EAX?
As I recall, and it is a bit fuzzy now, it was not until version 5 of MASM that one could use 32 bit registers, (since they did not exist in the cpu itself, until the mid 1980's) though the output of the assembler was still dependent on pure 16 bit DOS.
Regards, notoka
:U
Hi hutch--, this post over on win32asm board is discussing conditional swaps; http://board.win32asmcommunity.net/index.php?topic=21771.0;topicseen . Just the things for optimising the tri-median bit :U
I have just run into a weird effect maunally optimising the quick sort that Eóin posted that has caught me a number of times. I start with a DWORD array of 3 million with the same count for random range, optimise the algo them when I change the size in either direction it sometime produces an error in its results. It will produce the correct results at 8 million but not at 10 million. I have tested it on my spare box and its an identical problem.
I could understand if it was just a mistake in the encoding but I have never seen this effect before. I have been using this algo as practice to work out a number of things in code optimisation, among them how to try and avoid the costs of memory page swapping when you are repeatedly working of two addresses that don't fit into the data cache at the same time.
The only thing I can think of off hand is that you have to be very careful of the case when you are dealing with a partition of just two elements.
Here is try number 3. This one tests correct up to arrays of 50 million so its probably OK. This original tri-median qsort is the best I have seen yet in non hybrid quick sorts. From the unoptimised C output, optimising the loop code first saw about 95% of the speed gain. Cleaning up the 3 median calculations was a percent or so. Inlining the bypass code around each recursive call was worth a couple of percent and unrolling the two partition calculation loops was another couple of percent. It is barely faster than the optimised C code but it is faster on every array size tested from 1 million to 50 million.
This is a very good piece of code optimisation practice as you learn a lot about what matters and what does not. When the array address as replaced with a register there was a measurable speed gain but when 33 odd moves of the array address into the register were removed there was no speed gain at all. This algo is just complcated enough to make you work hard at reusing registers carefully so that you don't need to use any locals. On this PIV there was no gain aligning any of the labels and in the loop code it actually slowed the algo down.
The actualy algorithm is bound by the memory access speed when dealing with two addresses that are far apart enough to not be in cache at the same time and with current hardware there apears little that can be done about it.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
align 16
qSortc proc arr:DWORD,lft:DWORD,rgt:DWORD
push ebx
push esi
push edi
mov ebx, arr
mov eax, rgt
add eax, lft
cdq
sub eax, edx
sar eax, 1
mov esi, eax
mov edx, lft
mov eax, [ebx+esi*4]
mov ecx, [ebx+edx*4]
cmp ecx, eax
jle lbl0
mov [ebx+esi*4], ecx
mov [ebx+edx*4], eax
lbl0:
mov edi, lft
mov ecx, rgt
mov edx, [ebx+edi*4]
mov eax, [ebx+ecx*4]
cmp eax, edx
jg lbl1
mov [ebx+ecx*4], edx
mov [ebx+edi*4], eax
lbl1:
mov edx, rgt
mov eax, [ebx+esi*4]
mov ecx, [ebx+edx*4]
cmp ecx, eax
jg lbl2
mov [ebx+edx*4], eax
mov [ebx+esi*4], ecx
lbl2:
mov ecx, rgt
sub ecx, lft
cmp ecx, 1
je lbl9
lbl3:
mov edi, rgt
sub edi, 1
mov ecx, [ebx+esi*4]
mov edx, [ebx+edi*4]
mov [ebx+edi*4], ecx
mov [ebx+esi*4], edx
mov esi, lft
mov eax, [ebx+edi*4] ; pivot
jmp lbl4
; ----------------------------
pre:
mov [ebx+edi*4], edx
mov [ebx+esi*4], ecx
lbl4:
REPEAT 7
add esi, 1
mov edx, [ebx+esi*4]
cmp edx, eax
jge lbl5
ENDM
add esi, 1
mov edx, [ebx+esi*4]
cmp edx, eax
jl lbl4
lbl5:
REPEAT 7
sub edi, 1
mov ecx, [ebx+edi*4]
cmp ecx, eax
jle lbl6
ENDM
sub edi, 1
mov ecx, [ebx+edi*4]
cmp ecx, eax
jg lbl5
lbl6:
cmp edi, esi
jge pre
; ----------------------------
mov ecx, rgt
mov edx, [ebx+esi*4]
mov eax, [ebx+ecx*4-4]
mov [ebx+ecx*4-4], edx
mov [ebx+esi*4], eax
mov eax, edi
sub eax, lft
test eax, eax
jle @F
invoke qSortc,ebx,lft,edi
@@:
add esi, 1
mov eax, rgt
sub eax, esi
test eax, eax
jle lbl9
invoke qSortc,ebx,esi,rgt
lbl9:
pop edi
pop esi
pop ebx
quit:
ret
qSortc endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
Hi, hutch--
I'd like to give a try to this qSortc PROC. Could you explain the meaning of the three arguments, please?
I guess the first one is the address of the array we want to sort, but I don't know which data I need to pass to procedure as second and third arguments :( . Any help would be appreciated.
Thanks in advance.
Best regards.
Hi tc,
Welcome on board, I will have a look for the example on my local machine and come back to you. Just be aware that a quick sort can effectively lock up if it is fed the wrong type of data and it is inherant in its design where on random data it is the fastest of the exchange sorts.
Thank you very much, hutch-- :)
I've been testing the QuickSort procedure which comes with the masm32 package (nrQsort) and it sometimes fails :( . I'm afraid that I need another kind of algorithm, since the data I need to sort is not completely random. Could you recommend one which is "reliable" working with any kind of data and, at the same time, is the fastest possible? However, I'll check this qSortc procedure, if you would be so kind as to post the meaning of the arguments.
Thanks again.
Best regards.
P.D.: Sorry about my English :red