I wrote this sorting routine for fun and wanted to share it. It is extremely inefficient memory wise but I think it runs in O(N) time (?) and is reasonably easy to understand. I'm sure I've reinvented the wheel here but I haven't seen a sort of this type mentioned anywhere and I'd be curious to know the proper name. I've called it Offsort (Offset Sort) which kind of explains how it works.
TITLE Offsort Demo
INCLUDE Irvine32.inc
.data
array WORD 234,2,1,5,67,67,10,45,3,12,16,3,3,3
addressArray BYTE 65535 DUP(0)
.code
main PROC
call LoadAddressArray
call ReloadArray
exit
main ENDP
;-------------------START LoadAddressArray PROC-----------------------------
LoadAddressArray PROC
mov ecx,LENGTHOF array
mov edx,0
LP1:
movzx eax,array[edx]
mov bl,addressArray[eax]
inc bl
mov addressArray[eax],bl
add edx,2
loop LP1
ret
LoadAddressArray ENDP
;---------------------END LoadAddressArray PROC---------------
;----------------------START ReloadArray PROC --------------------------------
ReloadArray PROC
mov ecx,LENGTHOF addressArray
mov edx,0
mov ebx,0
RA1:
cmp addressArray[edx],0
jnz ReloadElement
add dx,1
loop RA1
ret
ReloadElement:
push ecx
movzx ecx,addressArray[edx]
RE1:
mov array[bx],dx
add bx,2
loop RE1
pop ecx
add dx,1
loop RA1
ret
ReloadArray ENDP
;--------------------- END ReloadArray PROC-----------
END main
One thing that immediately stands out is the use of the LOOP instruction, it's very slow, use DEC ECX, JNE.
How would I go about using JNE? I can see JCXZ:
LoadAddressArray PROC
mov ecx,LENGTHOF array
mov edx,0
LP1:
movzx eax,array[edx]
mov bl,addressArray[eax]
inc bl
mov addressArray[eax],bl
add edx,2
DEC ecx
JCXZ LAADone
JMP LP1
LAADone:
ret
LoadAddressArray ENDP
So that would make a difference in performance? Also, I think the performance would be O(M*N) where M is the number of 'repeats' at a given element in the addressArray but I'm still learning big-O analysis.
Hi,
Jump if not equal, JNE, is the same instruction as jump if
not zero, JNZ. So DEC ECX, JNZ performs the same actions
as LOOP except for not preserving the flags. You would
have to test to see if that is faster than your J(E)CXZ, JMP.
I tend to use LOOP if I need to preserve the flags, or DEC,
JA (jump if above).
LOOP Label1
; Would become
DEC ECX
JNZ Label1
Regards,
Steve N.
OK I changed it to:
LoadAddressArray PROC
mov ecx,LENGTHOF array
mov edx,0
LP1:
movzx eax,array[edx]
mov bl,addressArray[eax]
inc bl
mov addressArray[eax],bl
add edx,2
DEC ecx
JNZ LP1
ret
LoadAddressArray ENDP
..since I'm not worried about the flags.
now, replace the other 2 LOOP's :bg
Right, so that gives us:
TITLE Offsort Demo
INCLUDE Irvine32.inc
.data
array WORD 2,1,5,67,67,10,45,3,12,16,3,3,3
addressArray BYTE 65535 DUP(0) ;initialize a byte array of size WORD range with zeros
.code
main PROC
call LoadAddressArray
call ReloadArray
exit
main ENDP
;-------------------START LoadAddressArray PROC-----------------------------
LoadAddressArray PROC
mov ecx,LENGTHOF array
mov edx,0
LP1:
movzx eax,array[edx]
mov bl,addressArray[eax]
inc bl
mov addressArray[eax],bl
add edx,2
DEC ecx
JNZ LP1
ret
LoadAddressArray ENDP
;---------------------END LoadAddressArray PROC---------------
;----------------------START ReloadArray PROC --------------------------------
ReloadArray PROC
mov ecx,LENGTHOF addressArray
mov edx,0
mov ebx,0
RA1:
cmp addressArray[edx],0
jnz ReloadElement
INC dx
DEC ecx
JNZ RA1
ret
ReloadElement:
push ecx
movzx ecx,addressArray[edx]
RE1:
mov array[bx],dx
add bx,2
DEC ecx
JNZ RE1
pop ecx
add dx,1
JMP RA1
ReloadArray ENDP
;--------------------- END ReloadArray PROC-----------
END main
I've put the changes in caps. I was considering adding a check in the ReloadArray procedure to return if we get to the end of the 'array' array. Right now it will keep going all the way to the end of addressArray even though it is empty. On the other hand, that would introduce one more comparison and I was kind of trying to keep the comparisons to a minimum.
BTW I did some research and this looks a version of a Counting Sort.
looks fishy that you are mixing 16-bit addressing with 32-bit code
i can't see how it would work correctly :bg
also - you must specify a size when comparing immediates or when the size is otherwise unknown
movzx eax,byte ptr array[edx] ; or word ptr
.
.
.
cmp byte ptr addressArray[edx],0 ; or word ptr or dword ptr
.
.
.
movzx ecx,byte ptr addressArray[edx] ; or word ptr
makes it hard to follow when i don't know the sizes - lol
Works very well.
Here is a pic of the memory window that shows a little better how it works:
(http://www.shellnotes.com/quinn/Lists/Photos/Programming/_w/memory_jpg.jpg)
The yellow corresponds to the unsorted array. Everything below is the addressArray. The blue highlit 04 corresponds to the four '3's' in the unsorted array and the 01's correspond to the other numbers.
In the LoadAddressArray proc:
Each element of the array is used as an offset to increment an element of the addressArray.
In the ReloadArray proc:
The procedure cycles through the addressArray and whenever it finds something other than 0, it loads that number into ecx and then loops that many times, converting the address in addressArray into a value to put into array.
So, the big downside is that addressArray must be of size WORD range. If the unsorted array was of type DWORD, then addressArray would would be of size 4,294,967,295.
Also, even if there only two elements in the unsorted array, it would still need to have addressArray the same size as the range of whatever data type was in the unsorted array.