News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

strcmp optimize

Started by Brett Kuntz, August 18, 2005, 02:30:10 AM

Previous topic - Next topic

Brett Kuntz

Hope you enjoy this stuff, it keeps me busy when I'm bored :o)


Name ----- 10/10 --- 100/100 --- 1000/1000 --- 13/67 --- 777/101
================================================================
kuntz      32        180         1542          38        206
szCmp      65        397         3723          84        528
lstrcmp    540       2303        22306         978       8163



Unaligned by 1 (100/100)
kuntz 206
szCmp 398
lstrcmp 2299

Unaligned by 2 (100/100)
kuntz 206
szCmp 398
lstrcmp 2299

Unaligned by 3 (100/100)
kuntz 206
szCmp 398
lstrcmp 2299


I wonder what the MS programmer who made lstrcmp was thinking.


    .686p
    .model flat, stdcall
    option casemap :none

    include \masm32\include\windows.inc
    include \masm32\include\kernel32.inc
    includelib \masm32\lib\kernel32.lib
    include \masm32\include\user32.inc
    includelib \masm32\lib\user32.lib
    include \masm32\include\masm32.inc
    includelib \masm32\lib\masm32.lib

    include \masm32\kinc\strcmp.inc
    include \masm32\kinc\timer.inc

.data
    align 4
    str1 db 777 dup("a"), 0
    align 4
    str2 db 101 dup("a"), 0
    msgstr db "Cycles: %d", 0
.data?
    buff db 32 dup(?)
.code
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    counter_begin 10000000, REALTIME_PRIORITY_CLASS
    invoke lstrcmp, addr str1, addr str2
    counter_end

    invoke wsprintf, addr buff, addr msgstr, eax
    invoke MessageBox, 0, addr buff, 0, 0
    invoke ExitProcess, 0

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start



    strcmp proto :dword, :dword
.code
;##########################################################################
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

strcmp proc str1:dword, str2:dword

    push ebx
    push edi
    push esi
    mov esi, dword ptr [esp+16]
    mov edi, dword ptr [esp+20]
tp: mov ecx, dword ptr [esi]
    mov edx, dword ptr [edi]
    mov eax, ecx
    mov ebx, edx
    add esi, 4
    add edi, 4
    sub ecx, 16843009
    sub edx, 16843009
    and ecx, 2155905152
    jnz p1
    and edx, 2155905152
    jnz p2
    sub eax, ebx
    jz tp
    pop esi
    pop edi
    pop ebx
    ret 8
p1: not eax
    and ecx, eax
    jz tp
    test cl, cl
    jnz c1
    test ch, ch
    jnz c2
    test ecx, 16711680
    jnz c3
    jmp c4
p2: not ebx
    and edx, ebx
    jz tp
    test dl, dl
    jnz c1
    test dh, dh
    jnz c2
    test edx, 16711680
    jnz c3
    jmp c4
c1: movzx eax, byte ptr [esi-4]
    movzx ebx, byte ptr [edi-4]
    sub al, bl
    pop esi
    pop edi
    pop ebx
    ret 8
c2: movzx eax, word ptr [esi-4]
    movzx ebx, word ptr [edi-4]
    sub ax, bx
    pop esi
    pop edi
    pop ebx
    ret 8
c3: mov eax, dword ptr [esi-4]
    mov ebx, dword ptr [edi-4]
    and eax, 16777215
    and ebx, 16777215
    sub eax, ebx
    pop esi
    pop edi
    pop ebx
    ret 8
c4: mov eax, dword ptr [esi-4]
    mov ebx, dword ptr [edi-4]
    sub eax, ebx
    pop esi
    pop edi
    pop ebx
    ret 8

strcmp endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;##########################################################################


-I learned that push/pop are faster then moving data onto/off of the stack

-movzx eax, word ptr [esi-4]
movzx ebx, word ptr [edi-4]
sub ax, bx

and

mov eax, dword ptr [esi+4]
mov ebx, dword ptr [edi+4]
and eax, 0000ffff
and ebx, 0000ffff
sub eax, ebx

and

mov eax, dword ptr [esi+4]
mov ebx, dword ptr [edi+4]
sub eax, ebx
and eax, 0000ffff

all use the same clock cycles to execute, however the one I opted to go with only takes up 11 bytes where as the other ones take up 14+, so I save 3 bytes in the cache!

Mark Jones

Nice work. :)

Quote from: kunt0r on August 18, 2005, 02:30:10 AM
I wonder what the MS programmer who made lstrcmp was thinking.

According to Win32.hlp:
* The function lstrcmp returns the difference of the values of the first unequal characters it encounters. For example, lstrcmp determines that "abcz" is greater than "abcdefg" and returns the difference of z and d.
* With a double-byte character set (DBCS) version of Windows, this function can compare two DBCS strings.
* The Win32 lstrcmp function uses a word sort, rather than a string sort. A word sort treats hyphens and apostrophes differently than it treats other symbols that are not alphanumeric, in order to ensure that words such as "coop" and "co-op" stay together within a sorted list."
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Brett Kuntz

Ah mine also returns the difference between the two strings it first encounters as wrong, and I'm not to sure what the word sort stuff means.

Mark Jones

Me either. :) Hutch is the "sort-king" around here, maybe he knows what they are talking about?
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

hutch--

I am in debt to Greg Falen for the original idea a few years ago where you compare two strings evaluating each byte and exiting with different values depending on if one is greater, lesser or the same. You use this basis when sorting string data.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Vortex

kunt0r,

Nice work  :U

A little recommendation, instead of those 16843009 , you should use the hex notation which is more readable.

lingo

kunt0r,

here is something faster:  :lol


Please terminate any high-priority tasks and press ENTER to begin.

lstrcmp  Tests:

lstrcmp - original: 19314 clocks; Return value: 1
lstrcmp - kunt0r  : 957 clocks; Return value: 24832
lstrcmp - Lingo   : 501 clocks; Return value: 1

Press ENTER to exit...


Regards,
Lingo

[attachment deleted by admin]

Brett Kuntz

Wow that one is fast, good work.

Mark Jones

Lingo is the king... now let's all make s'mores. :)
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

lingo

Thank you guys  :lol

Here is a bit faster version on my P4 3.6 GHz
with WinXP Pro SP2


lstrcmp  Tests:

lstrcmp - Lingo     : 875 clocks; Return value: -1
lstrcmp - Lingo Old : 942 clocks; Return value: -1

Press ENTER to exit...


Regards,
Lingo


[attachment deleted by admin]

hutch--

Lingo,

Results are even more impressive on my current PIV.


Please terminate any high-priority tasks and press ENTER to begin.


lstrcmp  Tests:

lstrcmp - Lingo     : 665 clocks; Return value: -1
lstrcmp - Lingo Old : 900 clocks; Return value: -1

Press ENTER to exit...


LATER : Here are the results from 2.4 AMD Sempron.


Please terminate any high-priority tasks and press ENTER to begin.


lstrcmp  Tests:

lstrcmp - Lingo     : 704 clocks; Return value: -1
lstrcmp - Lingo Old : 679 clocks; Return value: -1

Press ENTER to exit...


Even though the AMD reverses trhe results, the difference between the two algos is a lot smaller on the AMD than on a PIV so the new algo has better averages.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

GregL

PIII 1 GHz

lstrcmp  Tests:

lstrcmp - Lingo     : 704 clocks; Return value: -1
lstrcmp - Lingo Old : 695 clocks; Return value: -1

Press ENTER to exit...



Mark Jones

AMD XP 2500+


lstrcmp  Tests:

lstrcmp - Lingo     : 716 clocks; Return value: -1
lstrcmp - Lingo Old : 690 clocks; Return value: -1

Press ENTER to exit...


No such thing as a free lunch. :)
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

lingo

#13
Thank you Hutch,Greg and Mark Jones  :U

Here is new version with additional functionality according r22's advice:

"Just a suggestion, make it a straight boolean compare true or false, because the 1 and -1 returns are somewhate useless (knowing that a string is bigger or smaller without knowing by what value has little to no uses)." by r22,

From SDK:
"If the string pointed to by lpString1 is less than the string pointed to by lpString2, the return value is negative. If the string pointed to by lpString1 is greater than the string pointed to by lpString2, the return value is positive. If the strings are equal, the return value is zero.

The function returns the difference of the values of the first unequal characters it encounters. For example, lstrcmp determines that "abcz" is greater than "abcdefg" and returns the difference of z and d. "


Example:
str1  db "abcdefg", 0
str2  db "abcz", 0

So, z=122  d=100  and
z-d=122-100=22
and minus because str1 is less then str2


lstrcmp  Tests:

lstrcmp - Lingo     : 20 clocks; Return value: -22
lstrcmp - Lingo Old : 29 clocks; Return value: -1

Press ENTER to exit...


Regards,
Lingo


[attachment deleted by admin]

Mark Jones

Interesting. :U


lstrcmp - Lingo     : 20 clocks; Return value: -22
lstrcmp - Lingo Old : 26 clocks; Return value: -1
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08