I've been playing with the nrQsortA routine in Masmlib. Does anyone know its worst-case input, or any other failure cases for this routine?
I've tested presorted, reverse-sorted, constant, alternating high/low, and random input data. The slowest is the random data, so no surprises. But I'm really interested in any "quadratic" behavior.
Jim,
It's a recode of a very old design from the days when stack memory was very limited, but it should be at its fastest on random data and get slower as the data becomes more ordered. No quick sort is safe on highly ordered data, or on the simple degenerate case where all the data is identical. Normally, when you have to process fully unknown data, you use a hybrid that will handle the extreme behaviour so you don't get quadratic behaviour.
That's what I thought too, but testing shows otherwise.
I generated 1,000,000 random numbers using nrandom. This ran the slowest.
I tried a million sequential numbers, which ran in less than half the time of the random number pass.
I tried it reversed, 1000000 down to 1, which ran in about the same time as the sequential number run.
I tried the sequence -500000, 500000, -499999, 499999, -499998, 499998, etc., which ran in about 11/16ths the time of the random pass.
I tried 1000000 numbers all identical, equal to 9999. It ran only slightly slower than the sequential number run.
This seems to be a very robust version of quicksort, which is why I asked for a worst case.
I'll try to put together a test program that's fit for human consumption shortly. In the meantime, here's a gif of my results-
[attachment deleted by admin]
Well, I've found one really bad sequence: tapering down from the outside in, e.g. 10 8 6 4 2 1 3 5 7 9.
Prior to this, the worst case was random numbers. One million random numbers take 132 milliseconds to sort.
One million numbers in this tapering sequence take 33770 milliseconds, or over 250 times longer. Ugh.
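For anyone who wants to reproduce this, here is one way to build that tapering sequence in C (a sketch assuming even n; Jimg's actual generator isn't posted in the thread):

/* Builds the outside-in taper: 10 8 6 4 2 1 3 5 7 9 for n = 10.
   Assumes even n. */
void make_taper(int *a, int n)
{
    for (int i = 0; i < n / 2; i++)
        a[i] = n - 2 * i;           /* descending evens: n, n-2, ..., 2 */
    for (int i = n / 2; i < n; i++)
        a[i] = 2 * i - n + 1;       /* ascending odds: 1, 3, ..., n-1 */
}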
That's why I don't use a quick sort.
Its BEST runtime is O(n log n), which is not an acceptable runtime in some cases.
Its WORST runtime is O(N^2), which is NEVER an acceptable runtime.
For arbitrary data I use merge sort, which has both a BEST and WORST case of O(n log n)
For mostly sorted data I use an insertion sort
For small lists I always use bubble or a network sort
Remember also that if the sort routine will be used in any sort of public setting, you should EXPECT that it will be handed data that produces worst-case behavior. I recall an open-source public chess server that used quicksort; a disgruntled user was bogging the server down, effectively a denial-of-service attack, by handing it worst-case data.
I basically agree with this view; an unprotected quick sort is a free kick for deliberately structured data. It is clearly the fastest on truly random data (not necessarily the nrQsortA in the masm32 lib), but it fails badly on ordered data or on a large count of identical values.
To use it on open data I prefer to set a recursion depth indicator, and if the sort exceeds that depth, process the rest with a sort that is insensitive to data order. My own choice is a double-ended comb sort (cocktail shaker, but on a comb sort), then an insertion sort when the gap reduces to one (1). A comb sort is highly predictable in the number of passes it makes, and its time varies only with the number of swaps to perform.
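A minimal sketch of that guard in C (not Hutch's code: his fallback is a double-ended comb sort finishing with an insertion sort, while this simpler version uses a plain comb sort, and the depth budget is an assumption, typically set around 2*log2(n)):

/* Comb sort: pass count is predictable and insensitive to initial order. */
static void comb_sort(int *a, int n)
{
    int gap = n, swapped = 1;
    while (gap > 1 || swapped) {
        if (gap > 1) gap = gap * 10 / 13;    /* classic 1.3 shrink factor */
        swapped = 0;
        for (int i = 0; i + gap < n; i++)
            if (a[i] > a[i + gap]) {
                int t = a[i]; a[i] = a[i + gap]; a[i + gap] = t;
                swapped = 1;
            }
    }
}

/* Quick sort with a depth budget: when the budget runs out, the
   remaining range is handed to the order-insensitive comb sort. */
static void qsort_guarded(int *a, int lo, int hi, int depth)
{
    while (lo < hi) {
        if (depth-- <= 0) {                  /* data looks hostile, bail out */
            comb_sort(a + lo, hi - lo + 1);
            return;
        }
        int i = lo, j = hi, pivot = a[lo + (hi - lo) / 2];
        while (i <= j) {                     /* Hoare-style partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        qsort_guarded(a, lo, j, depth);      /* recurse into the left part */
        lo = i;                              /* iterate on the right part */
    }
}
/* call as: qsort_guarded(a, 0, n - 1, depth_budget) */

With the guard in place, killer inputs like the tapering sequence above degrade to comb sort's predictable cost instead of quadratic behaviour.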
A tri-median (median-of-three) quick sort usually has some legs on a plain one, but it still fails on ordered data. I found an Incerpi-Sedgewick shell sort that worked OK, but it was easily beaten by the quick sort on random data and would still fail on some types of ordered data.
Comparing an insertion sort to a bubble sort on near-ordered data, the insertion sort was always faster, usually about twice as fast.
Quick is often preferred over Merge because Merge typically requires O(N) elements of auxiliary space, whereas Quick only requires O(log N) elements of auxiliary space (for its recursion stack).
However, Merge has the advantage that it is a stable sort (if two equal sort keys are present, merge sort will preserve their ordering). Quick cannot make such a guarantee unless it is implemented in a way that offers even more avenues of algorithmic attack.
I just can't get around the fact that when Quick is bad, it can be really, really bad. Merge takes essentially the same amount of time over all permutations of its input. It is unattackable from an algorithmic perspective; bulletproof.
Special-case alternatives include radix sort, which is truly one of the fastest sorts you will find; its runtime can approach O(N). Unfortunately it requires a lot of extra storage space, so in practice it is even more prone to clever attacks on the algorithm.
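To make the trade-off concrete, here is a minimal LSD radix sort sketch in C for 32-bit unsigned keys (not from the thread). Note the n-element scratch buffer: that is the extra storage being referred to.

#include <stdlib.h>

/* LSD radix sort for 32-bit unsigned keys: four stable counting-sort
   passes, one per byte. Linear-time passes, but needs an n-element
   scratch buffer plus the count table. */
void radix_sort_u32(unsigned *a, size_t n)
{
    unsigned *buf = malloc(n * sizeof *buf);
    if (!buf) return;
    unsigned *src = a, *dst = buf;
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};
        for (size_t i = 0; i < n; i++)
            count[(src[i] >> shift) & 0xFF]++;
        size_t pos = 0;
        for (int b = 0; b < 256; b++) {    /* prefix sums -> bucket offsets */
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }
        for (size_t i = 0; i < n; i++)     /* stable scatter into buckets */
            dst[count[(src[i] >> shift) & 0xFF]++] = src[i];
        unsigned *t = src; src = dst; dst = t;
    }
    /* after four swaps src points back at a, so nothing to copy back */
    free(buf);
}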
I agree. I'm going back to combsort. But it's been fun playing. If anyone is interested, here are my mods to the masmlib quicksort (qsorta.asm). On my AMD computer it runs 20 to 30 percent faster; on my Celeron, only 5-10 percent faster.
nrQsortAx4 proc Arr:DWORD,count:DWORD
LOCAL First,Last,cntr,bLen,hMem,temp
;note: this first section can be smaller, but need to kill
; some bytes to get good 16 byte alignment at MainLoop anyway
push ebx
push esi
push edi
mov esi,Arr ; source address kept in esi throughout
mov cntr,0 ; pointer into heap
;inv GetProcessHeap ; for testing, will always be known
;mov hHeap,eax
mov eax,count ; allocate temporary buffer
add eax,40
mov bLen,eax
invoke SysAllocStringByteLen,0,eax
;inv HeapAlloc,hHeap,0,eax ; better if you already have the heap handle
mov hMem,eax
mov edi,eax ; hMem ; keep buffer address in edi throughout
mov First,0 ; set First and Last reference values
mov eax,count
sub eax,1
mov Last,eax ; Last, when zero based array
align 16
MainLoop:
mov edx,Last
MainLoop2: ; second entry into Main loop
mov ecx,First
MainLoop3:
mov eax,ecx ; calculate midpoint
add eax,edx
shr eax,1 ; \2 cannot be combined with *4 in next instruction
mov ebx,[esi+eax*4] ; value at midpoint kept in ebx throughout
mov temp,ebx
dec ecx ; setup for tight loop
.repeat
inc ecx ; look for value greater than midpoint on left side of group
InnerLoop2:
.until sdword ptr [esi+ecx*4]>=ebx ; else will match at midpoint and fall out
inc edx ; found one, or midpoint itself
.repeat
dec edx ; look for value less than midpoint on right side
.until sdword ptr [esi+edx*4]<=ebx ; will match at midpoint and fall out
.if ecx<=edx
mov eax,[esi+ecx*4]
mov ebx,[esi+edx*4] ; swap elements
mov [esi+ecx*4], ebx
mov [esi+edx*4], eax
mov ebx,temp ; restore midpoint
inc ecx
dec edx
cmp ecx,edx
jle InnerLoop2 ; go back and look for more values to move
.endif
.if sdword ptr ecx<Last ; changed from <= to <
mov eax, cntr
mov [edi+eax*4], ecx
mov ebx, Last
mov [edi+eax*4+4], ebx
add cntr, 2
.endif
mov Last, edx ; new Last point
cmp edx, First ; compare Last & First
jg MainLoop2
mov eax,cntr
.if eax!=0 ; are we done?
sub eax,2 ; no, get previous First and Last
mov cntr,eax
mov ecx, [edi+eax*4]
mov First, ecx
mov edx, [edi+eax*4+4]
mov Last, edx
jmp MainLoop3
.endif
invoke SysFreeString,hMem
;inv HeapFree,hHeap,0,hMem
pop edi
pop esi
pop ebx
ret
nrQsortAx4 endp
Attached is the interactive test program I use...
[attachment deleted by admin]
Interesting. Here are my results with an AMD XP 2500+ (1800MHz)
[attachment deleted by admin]
Rockoon-
I hate to admit it, but you're right. I hacked together a non-recursive mergesort (I just hate recursion), or at least what I would call a merge sort, even though it's not much like any of the descriptions I've read. There was no particular attempt at optimizing.
The first time out of the box, it ran better than twice as fast as combsort, and only 15% slower than quicksort on random data. I don't really know what O(log N) means, but if N is the array size, it uses at most an N/2 scratch array.
I can hardly wait to see what a little optimizing can do!
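Jimg's MASM routine isn't posted, so for readers who want the shape of a non-recursive (bottom-up) merge sort, here is a generic C sketch, not his code. For simplicity it ping-pongs through a full n-element scratch buffer, whereas his reportedly gets by with n/2 by copying only one run of each merge.

#include <stdlib.h>
#include <string.h>

/* Bottom-up merge sort: merge runs of width 1, 2, 4, ... with no
   recursion, alternating between the array and a scratch buffer. */
void merge_sort_bottomup(int *a, size_t n)
{
    if (n < 2) return;
    int *buf = malloc(n * sizeof *buf);
    if (!buf) return;                        /* allocation failed; leave input untouched */
    int *src = a, *dst = buf;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi  = lo + 2 * width < n ? lo + 2 * width : n;
            size_t i = lo, j = mid, k = lo;
            while (i < mid && j < hi)        /* <= on equal keys keeps it stable */
                dst[k++] = src[i] <= src[j] ? src[i++] : src[j++];
            while (i < mid) dst[k++] = src[i++];
            while (j < hi)  dst[k++] = src[j++];
        }
        int *t = src; src = dst; dst = t;    /* ping-pong between array and buffer */
    }
    if (src != a) memcpy(a, src, n * sizeof *a);
    free(buf);
}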
Quote from: Jimg on November 25, 2007, 07:50:06 PM
I don't really know what O(log N) means, but if N is the array size, it uses at most an N/2 scratch array.
I can hardly wait to see what a little optimizing can do!
Big-O notation is used in computer science to classify algorithms by how their runtime or memory footprint grows with input size (approximately).
For instance, a bubble sort is O(N^2), meaning that its runtime grows quadratically. Big-O isn't an exact count of the iterations, but a statement of how the iteration counts for different values of N relate to each other proportionally. A naive bubble and an optimized bubble are both classified equally as O(N^2).
Most 'divide and conquer' algorithms have at their root a runtime of O(log N); for instance, a single binary search, or a probe into a binary space partition, is O(log N).
(Log is almost always a base-2 logarithm in Big-O notation, which is approximately the number of bits needed to encode N. The base only changes a constant factor, so it doesn't affect the classification.)
In the case of merge sort, there are O(log N) such divide-and-conquer levels, each doing O(N) work merging, so the runtime is said to be O(N log N).
Probing a hash table is O(1), meaning it runs in ~constant time.
For any specific algorithm you also have actual constant costs, such as the cost of each iteration, so an algorithm with a 'worse' Big-O runtime may perform better for sufficiently small N. An obvious case of this is bubble sort, which has a very small per-iteration cost, so when N is very small, bubble trumps most other sorts.
There exists a 'sweet spot': a specific N at which bubble performs the same as merge in practice. For any larger N, bubble is inferior and very rapidly becomes worthless as N grows.
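That sweet spot is why production sorts are usually hybrids: below some threshold, the divide-and-conquer sort hands ranges to a simple sort. A C sketch of the usual trick; the threshold value is a guess, since the real crossover is machine-dependent and found by measurement:

#define SMALL_RUN 24   /* guessed crossover; tune by measurement */

/* Insertion sort: tiny per-iteration cost, ideal for short runs. */
static void insertion_sort(int *a, int n)
{
    for (int i = 1; i < n; i++) {
        int v = a[i], j = i;
        while (j > 0 && a[j - 1] > v) { a[j] = a[j - 1]; j--; }
        a[j] = v;
    }
}

/* Inside a merge or quick sort, any range of SMALL_RUN elements or
   fewer would be passed to insertion_sort instead of being split
   further, so the O(N log N) machinery never runs on tiny N. */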
Common Big-O runtimes you will encounter, from best to worst, are:
O(1)
  ex: Hash lookups
O(log N)
  ex: Binary searches
O(N)
  ex: Linear searches
O(N log N)
  ex: Merge sort
O(N^2)
  ex: N-body problems
O(N^3)
  ex: naive matrix multiplication (not traveling salesman; that one is exponential)
O(B^N)
  ex: MiniMax/AlphaBeta game-tree searches and such
Jim,
One question though: what use do you have for an integer sort? There are tons of examples of the different types, but in most instances an integer sort is useless. The most common application is string sorts, with the slightly more specialised tasks being array sorts where a number of fields or elements are grouped together.
Thanks Rockoon.
Hutch-
I agree, it's just a proof of concept. A string sort will follow. Sorting indexes to strings is very close to sorting integers; I just have to free up an extra register or two for the comparison...
As for arrays, the beauty of the mergesort, as Rockoon said above, is that it's "stable", so multi-element sorts can be done with consecutive sorts (though not very efficient that way).
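A C sketch of the index idea (the names are mine; the MASM version would do the same through registers). Comparing two indexes dereferences into the string table, so the swap machinery stays identical to the integer sort. Note that C's qsort is used here only as a stand-in driver and is not guaranteed stable; the consecutive multi-key trick needs a stable sort underneath, like the merge sort discussed above.

#include <stdlib.h>
#include <string.h>

static const char **g_strings;    /* table being sorted (sketch-only global) */

static int by_string(const void *pa, const void *pb)
{
    unsigned ia = *(const unsigned *)pa, ib = *(const unsigned *)pb;
    return strcmp(g_strings[ia], g_strings[ib]);
}

/* Fill idx with 0..n-1, then sort the indexes by the strings they
   reference; the strings themselves never move. */
void sort_indexes(const char **strings, unsigned *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (unsigned)i;
    g_strings = strings;
    qsort(idx, n, sizeof *idx, by_string);
}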
... Ok, so I'm a little excited. Mergesort seems to be very efficient and, unlike quicksort, it's easy to understand how and why it works.
Guys we should have a contest!
How does this sound...
Someone could produce a 50 MB file of strings between 5 and 40 characters long (lengths random). Then we'd have a general routine, which everyone can use, to load this file into memory. Use as much memory as you want for data structures or a scratch buffer.
The platform would be Windows. The routine would be locked to a single processor and run at high priority. No multi-threaded code, and no calls to outside routines; the code must be all your own x86 ASM code. Each entry would be one file that assembles in MASM32.
Then see who can code the fastest version.
We'd have to have a reference machine... on which the code would be run and judged.
Anybody interested in making an x86 Assembly Language sorting contest happen? It could be a clash of the titans for supremacy and bragging rights, and perhaps a small prize.
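For concreteness, a throwaway generator for the proposed test file might look like the following C sketch. The file name, fixed seed, lowercase-only alphabet, and zero-terminated layout are all assumptions that the contest rules would need to pin down (later posts float 100 MB and Unicode instead).

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("strings.dat", "wb");
    if (!f) return 1;
    long written = 0;
    srand(12345);                       /* fixed seed so everyone gets the same file */
    while (written < 50L * 1024 * 1024) {
        int len = 5 + rand() % 36;      /* random length, 5..40 characters */
        for (int i = 0; i < len; i++)
            fputc('a' + rand() % 26, f);
        fputc('\0', f);                 /* zero-terminated strings */
        written += len + 1;
    }
    fclose(f);
    return 0;
}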
I have a genuinely fast Sedgewick hybrid quick sort for string data, but truly it's no joy coded in MASM.
The gauntlet's been thrown. It would be cool to see a bad-ass brawl in the mud for bragging rights.
:bg
Sad to say, it's all been done before; that's what we have the Laboratory for. Post code in there to get it beaten to death.
Quote from: cork on July 13, 2010, 09:19:43 PM
Guys we should have a contest!
How does this sound...
Someone could produce a 50 MB file of strings between 5 and 40 characters long (length is random).
Great idea. It should also handle Unicode and have an "ignore case" option, though. Don't forget conditional breaks, leading whitespace and the like :U
The problem is still not fully specified. How much extra memory (including both stack and heap) would be afforded?
Clearly a radix sort will dominate more and more as extra memory increases.
If extra memory is set to something small, like on the scale of O(log n), then things like MergeSort become much more complicated (an in-place merge sort variant does exist, though) .. the common implementation of MergeSort is on the O(n) scale in its space requirements.
O(log n) space is required if we wish to support any binary decimation recursive functions (MergeSort and QuickSort)
O(n) space would be the bare minimum for Radix and other bucket sorts (and not easy to use in practice)
If all we are doing is sorting strings with no extra values attached to the elements, then there is also the trie strategy, which will not only sort the strings but also reduce the space required to store them, and will give O(1) search behavior to boot (constant in the number of strings; a lookup costs only the length of the key)...
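A minimal C sketch of that strategy (generic illustration, not Rockoon's code): insert every string, then walk the children in byte order and the strings come out sorted. The 256-way fan-out is memory-hungry; real implementations compress paths.

#include <stdio.h>
#include <stdlib.h>

typedef struct Trie {
    struct Trie *child[256];
    int is_end;                 /* a string terminates at this node */
} Trie;

static Trie *trie_new(void) { return calloc(1, sizeof(Trie)); }

static void trie_insert(Trie *t, const unsigned char *s)
{
    for (; *s; s++) {
        if (!t->child[*s])
            t->child[*s] = trie_new();
        t = t->child[*s];
    }
    t->is_end = 1;
}

/* Depth-first walk in byte order prints the strings sorted.
   buf must be large enough for the longest string, e.g. char buf[64];
   call as trie_dump(root, buf, 0). */
static void trie_dump(const Trie *t, char *buf, int depth)
{
    if (t->is_end) { buf[depth] = '\0'; puts(buf); }
    for (int c = 0; c < 256; c++)
        if (t->child[c]) {
            buf[depth] = (char)c;
            trie_dump(t->child[c], buf, depth + 1);
        }
}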
Quote from: Rockoon on July 14, 2010, 09:42:26 PM
The problem is still not fully specified. How much extra memory (including both stack and heap) would be afforded?
If we get a few people interested, then that group can decide the rules of the contest.
My preference would be to allow as much extra memory as you want and keep the focus entirely on speed.
I'd prefer it to be simple. Initially you're given a 100 MB file of random-length strings, zero-terminated, UTF-8 or UTF-16 Unicode, with lengths from 5 to 40 characters. Use as much memory as you want. We'll have to decide on a reference processor on which the final judging will be done.
Cork
Hey Cork,
I think the problem here is, as Hutch said, 'it's been done before'... For specific and needed sorts it's worth putting them in the Laboratory, but a quick search on Google will turn up a large array of algorithms for multiple sorting purposes... The hardest part of creating a competition like this is understanding the subject well enough to set fair and worthwhile parameters.
Creating random competitions without a baseline takes the fun out of it a bit.... http://sortbenchmark.org/
If you're up for a challenge there is a good ongoing one on compression here: http://prize.hutter1.net/
or there are always the more commercial challenges like: http://en.wikipedia.org/wiki/Eternity_II_Puzzle :lol ... I'm sure this would benefit from a few good algorithms