Hope you enjoy this stuff, it keeps me busy when I'm bored :o)
Name ----- 10/10 --- 100/100 --- 1000/1000 --- 13/67 --- 777/101
================================================================
kuntz 32 180 1542 38 206
szCmp 65 397 3723 84 528
lstrcmp 540 2303 22306 978 8163
Unaligned by 1 (100/100)
kuntz 206
szCmp 398
lstrcmp 2299
Unaligned by 2 (100/100)
kuntz 206
szCmp 398
lstrcmp 2299
Unaligned by 3 (100/100)
kuntz 206
szCmp 398
lstrcmp 2299
I wonder what the MS programmer who made lstrcmp was thinking.
.686p
.model flat, stdcall
option casemap :none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
include \masm32\include\masm32.inc
includelib \masm32\lib\masm32.lib
include \masm32\kinc\strcmp.inc
include \masm32\kinc\timer.inc
.data
align 4
str1 db 777 dup("a"), 0
align 4
str2 db 101 dup("a"), 0
msgstr db "Cycles: %d", 0
.data?
buff db 32 dup(?)
.code
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
counter_begin 10000000, REALTIME_PRIORITY_CLASS
invoke lstrcmp, addr str1, addr str2
counter_end
invoke wsprintf, addr buff, addr msgstr, eax
invoke MessageBox, 0, addr buff, 0, 0
invoke ExitProcess, 0
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
strcmp proto :dword, :dword
.code
;##########################################################################
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
strcmp proc str1:dword, str2:dword
push ebx
push edi
push esi
mov esi, dword ptr [esp+16]
mov edi, dword ptr [esp+20]
tp: mov ecx, dword ptr [esi]
mov edx, dword ptr [edi]
mov eax, ecx
mov ebx, edx
add esi, 4
add edi, 4
sub ecx, 16843009
sub edx, 16843009
and ecx, 2155905152
jnz p1
and edx, 2155905152
jnz p2
sub eax, ebx
jz tp
pop esi
pop edi
pop ebx
ret 8
p1: not eax
and ecx, eax
jz tp
test cl, cl
jnz c1
test ch, ch
jnz c2
test ecx, 16711680
jnz c3
jmp c4
p2: not ebx
and edx, ebx
jz tp
test dl, dl
jnz c1
test dh, dh
jnz c2
test edx, 16711680
jnz c3
jmp c4
c1: movzx eax, byte ptr [esi-4]
movzx ebx, byte ptr [edi-4]
sub al, bl
pop esi
pop edi
pop ebx
ret 8
c2: movzx eax, word ptr [esi-4]
movzx ebx, word ptr [edi-4]
sub ax, bx
pop esi
pop edi
pop ebx
ret 8
c3: mov eax, dword ptr [esi-4]
mov ebx, dword ptr [edi-4]
and eax, 16777215
and ebx, 16777215
sub eax, ebx
pop esi
pop edi
pop ebx
ret 8
c4: mov eax, dword ptr [esi-4]
mov ebx, dword ptr [edi-4]
sub eax, ebx
pop esi
pop edi
pop ebx
ret 8
strcmp endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;##########################################################################
-I learned that push/pop are faster then moving data onto/off of the stack
-movzx eax, word ptr [esi-4]
movzx ebx, word ptr [edi-4]
sub ax, bx
and
mov eax, dword ptr [esi+4]
mov ebx, dword ptr [edi+4]
and eax, 0000ffff
and ebx, 0000ffff
sub eax, ebx
and
mov eax, dword ptr [esi+4]
mov ebx, dword ptr [edi+4]
sub eax, ebx
and eax, 0000ffff
all use the same clock cycles to execute, however the one I opted to go with only takes up 11 bytes where as the other ones take up 14+, so I save 3 bytes in the cache!
Nice work. :)
Quote from: kunt0r on August 18, 2005, 02:30:10 AM
I wonder what the MS programmer who made lstrcmp was thinking.
According to Win32.hlp:
* The function
lstrcmp returns the difference of the values of the first unequal characters it encounters. For example, lstrcmp determines that "abcz" is greater than "abcdefg" and returns the difference of z and d.
* With a double-byte character set (DBCS) version of Windows, this function can compare two DBCS strings.
* The Win32 lstrcmp function uses a word sort, rather than a string sort. A word sort treats hyphens and apostrophes differently than it treats other symbols that are not alphanumeric, in order to ensure that words such as "coop" and "co-op" stay together within a sorted list."
Ah mine also returns the difference between the two strings it first encounters as wrong, and I'm not to sure what the word sort stuff means.
Me either. :) Hutch is the "sort-king" around here, maybe he knows what they are talking about?
I am in debt to Greg Falen for the original idea a few years ago where you compare two strings evaluating each byte and exiting with different values depending on if one is greater, lesser or the same. You use this basis when sorting string data.
kunt0r,
Nice work :U
A little recommendation, instead of those 16843009 , you should use the hex notation which is more readable.
kunt0r,
here is something faster: :lol
Please terminate any high-priority tasks and press ENTER to begin.
lstrcmp Tests:
lstrcmp - original: 19314 clocks; Return value: 1
lstrcmp - kunt0r : 957 clocks; Return value: 24832
lstrcmp - Lingo : 501 clocks; Return value: 1
Press ENTER to exit...
Regards,
Lingo
[attachment deleted by admin]
Wow that one is fast, good work.
Lingo is the king... now let's all make s'mores. :)
Thank you guys :lol
Here is a bit faster version on my P4 3.6 GHz
with WinXP Pro SP2
lstrcmp Tests:
lstrcmp - Lingo : 875 clocks; Return value: -1
lstrcmp - Lingo Old : 942 clocks; Return value: -1
Press ENTER to exit...
Regards,
Lingo
[attachment deleted by admin]
Lingo,
Results are even more impressive on my current PIV.
Please terminate any high-priority tasks and press ENTER to begin.
lstrcmp Tests:
lstrcmp - Lingo : 665 clocks; Return value: -1
lstrcmp - Lingo Old : 900 clocks; Return value: -1
Press ENTER to exit...
LATER : Here are the results from 2.4 AMD Sempron.
Please terminate any high-priority tasks and press ENTER to begin.
lstrcmp Tests:
lstrcmp - Lingo : 704 clocks; Return value: -1
lstrcmp - Lingo Old : 679 clocks; Return value: -1
Press ENTER to exit...
Even though the AMD reverses trhe results, the difference between the two algos is a lot smaller on the AMD than on a PIV so the new algo has better averages.
PIII 1 GHz
lstrcmp Tests:
lstrcmp - Lingo : 704 clocks; Return value: -1
lstrcmp - Lingo Old : 695 clocks; Return value: -1
Press ENTER to exit...
AMD XP 2500+
lstrcmp Tests:
lstrcmp - Lingo : 716 clocks; Return value: -1
lstrcmp - Lingo Old : 690 clocks; Return value: -1
Press ENTER to exit...
No such thing as a free lunch. :)
Thank you Hutch,Greg and Mark Jones :U
Here is new version with additional functionality according r22's advice:
"Just a suggestion, make it a straight boolean compare true or false, because the 1 and -1 returns are somewhate useless (knowing that a string is bigger or smaller without knowing by what value has little to no uses)." by r22,
From SDK:
"If the string pointed to by lpString1 is less than the string pointed to by lpString2, the return value is negative. If the string pointed to by lpString1 is greater than the string pointed to by lpString2, the return value is positive. If the strings are equal, the return value is zero.
The function returns the difference of the values of the first unequal characters it encounters. For example, lstrcmp determines that "abcz" is greater than "abcdefg" and returns the difference of z and d. "
Example:
str1 db "abcdefg", 0
str2 db "abcz", 0
So, z=122 d=100 and
z-d=122-100=22
and minus because str1 is less then str2
lstrcmp Tests:
lstrcmp - Lingo : 20 clocks; Return value: -22
lstrcmp - Lingo Old : 29 clocks; Return value: -1
Press ENTER to exit...
Regards,
Lingo
[attachment deleted by admin]
Interesting. :U
lstrcmp - Lingo : 20 clocks; Return value: -22
lstrcmp - Lingo Old : 26 clocks; Return value: -1
Thank you Mark Jones,
but I updated it again :lol
Regards,
Lingo