Print Page - String comparision

Title: String comparision
Post by: Danesh on January 04, 2007, 11:08:58 PM

Hi all,

I am looking for a function which can compare strings (it should not be case sensitive) and return a value which represents if the input strings were equal or the first one has had higher order in dictionary order or vice versa. In MASM32 library there is a function which szCmpi but it just compares two strings (it is not case sensitive either) and returned value can be used to know if they were equal or not. Actually, it is easy to find out if two strings are equal or one has higher order, but the problem is that I do not want to convert both to uppercase or lowercase to make the functions insensitive. I also know that szCmpi uses a table to be insensitive but the source code was quite confusing. Can anybody help?

D.

Title: Re: String comparision
Post by: jdoe on January 04, 2007, 11:27:08 PM

Danesh,

lstrcmpi win32 api is doing that and it have the advantage to take care of the language of the string which I don't think szCmpi care. But lstrcmpi is slow.

Title: Re: String comparision
Post by: hutch-- on January 04, 2007, 11:52:57 PM

Danesh,

The algo is not that complicated, it has a table that has the "a" to "z" range and the "A" the "Z" ranges of the same value so that case comparisons are no longer a problem. By using a table "A" and "a" match because their position in the table is different but the value is the same. There should be no problem with just using it if you don't want to analyse the code.

Title: Re: String comparision
Post by: LithiumDex on January 05, 2007, 12:14:08 AM

In psuedo code:

func cisStrCmp ( string a, string b )
len = min(length(a), length(b))
for i = 1 to len
ca = tolower(a)
cb = tolower(b)
if ca > cb then return a comes after b
if cb > ca then return b comes after a
next
if length(a) < length(b) return a comes before b
if length(b) < length(a) return b comes before a
return a and b are equal
end func

I havn't done asm in a while, so I didn't write it as such, but it shouldn't be hard too... and this algorithm is O(n), I don't think it get's much faster, although I could be mistaken... (2nd thought) some optimizations when writing could be:
- store length of a and length of b for reuse at start
- make lookup-table for tolower, which would be a 0-255 array of bytes, where a = i, but [A-Z] = a-z

hope this helps.

- Chris

Title: Re: String comparision
Post by: hutch-- on January 05, 2007, 01:14:11 AM

Danesh,

Here is an over commented version of the algo. The two character ranges for upper and lower case have the same values so that case insensitive comparisons work. The algo code reads the two byte values by zero extending them into 32 bit registers. Gets the first byte value from the table and zero extends it into a 32 bit register then does a direct BYTE comparison to the other value in the table.

Its done this way for two reasons, the table can be modified for different chaacter sets and the table method is very fast.

Code Select


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    .486                      ; force 32 bit code
    .model flat, stdcall      ; memory model & calling convention
    option casemap :none      ; case sensitive

comment * -----------------------------------------------------------
        This algorithm was developed in conjunction with Jim Giordano
        ----------------------------------------------------------- *

    .data
    align 16
      szCmpi_tbl \
      db   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15
      db  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
      db  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47
      db  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63
      db  64, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111    ; < 1st 26 characters
      db 112,113,114,115,116,117,118,119,120,121,122, 91, 92, 93, 94, 95
      db  96, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111    ; < 2nd 26 characters
      db 112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
      db 128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143
      db 144,145,146,147,148,149,150,151,152,153,154,155,156,156,158,159
      db 160,161,162,163,164,165,166,167,168,169,170,171,172,173,173,175
      db 176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191
      db 192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207
      db 208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223
      db 224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239
      db 240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

    .code

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE                        ; remove stack frame
OPTION EPILOGUE:NONE 

align 16

szCmpi proc src:DWORD,dst:DWORD,ln:DWORD

  ; -------------------------------------------------------------------------
  ; entry point of procedure is the address of the following PUSH instruction
  ; -------------------------------------------------------------------------
    push ebx                                ; preserve used registers
    push esi
    push edi

    mov esi, [esp+4+12]                     ; load source adress
    mov edi, [esp+8+12]                     ; load destination address
    sub eax, eax                            ; zero eax as index

  align 4
  @@:
    movzx edx, BYTE PTR [esi+eax]           ; zero extend source byte into EDX
    movzx ebx, BYTE PTR [edi+eax]           ; zero extend dest byte into EBX
    movzx ecx, BYTE PTR [edx+szCmpi_tbl]    ; zero extend 1st byte table value into ECX
    add eax, 1
    cmp cl, [ebx+szCmpi_tbl]                ; compare it to second byte in table
    jne quit                                ; exit on mismatch with count in EAX
    cmp eax, [esp+12+12]                    ; test index against byte count
    jb @B

    sub eax, eax                            ; set EAX to zero on match

  quit:
    pop edi                                 ; restore used registers
    pop esi
    pop ebx

    ret 12                                  ; balance the stack on exit

szCmpi endp

OPTION PROLOGUE:PrologueDef                 ; restore stack frame
OPTION EPILOGUE:EpilogueDef 

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    end

Title: Re: String comparision
Post by: jdoe on January 05, 2007, 02:38:13 AM

I've worked on bytes table to make it worked with french characters also (for szCmpi from masm32.lib).

Table

Code Select


szCmpi_tbl BYTE 000h,001h,002h,003h,004h,005h,006h,007h,008h,009h,00Ah,00Bh,00Ch,00Dh,00Eh,00Fh
           BYTE 010h,011h,012h,013h,014h,015h,016h,017h,018h,019h,01Ah,01Bh,01Ch,01Dh,01Eh,01Fh
           BYTE 020h,021h,022h,023h,024h,025h,026h,027h,028h,029h,02Ah,02Bh,02Ch,02Dh,02Eh,02Fh
           BYTE 030h,031h,032h,033h,034h,035h,036h,037h,038h,039h,03Ah,03Bh,03Ch,03Dh,03Eh,03Fh
           BYTE 040h,061h,062h,063h,064h,065h,066h,067h,068h,069h,06Ah,06Bh,06Ch,06Dh,06Eh,06Fh
           BYTE 070h,071h,072h,073h,074h,075h,076h,077h,078h,079h,07Ah,05Bh,05Ch,05Dh,05Eh,05Fh
           BYTE 060h,061h,062h,063h,064h,065h,066h,067h,068h,069h,06Ah,06Bh,06Ch,06Dh,06Eh,06Fh
           BYTE 070h,071h,072h,073h,074h,075h,076h,077h,078h,079h,07Ah,07Bh,07Ch,07Dh,07Eh,07Fh
           BYTE 080h,081h,082h,083h,084h,085h,086h,087h,088h,089h,08Ah,08Bh,08Ch,08Dh,08Eh,08Fh
           BYTE 090h,091h,092h,093h,094h,095h,096h,097h,098h,099h,09Ah,09Bh,09Ch,09Dh,09Eh,09Fh
           BYTE 0A0h,0A1h,0A2h,0A3h,0A4h,0A5h,0A6h,0A7h,0A8h,0A9h,0AAh,0ABh,0ACh,0ADh,0AEh,0AFh
           BYTE 0B0h,0B1h,0B2h,0B3h,0B4h,0B5h,0B6h,0B7h,0B8h,0B9h,0BAh,0BBh,0BCh,0BDh,0BEh,0BFh
           BYTE 0E0h,0C1h,0E2h,0C3h,0E4h,0C5h,0E6h,0E7h,0E8h,0E9h,0EAh,0EBh,0CCh,0CDh,0EEh,0EFh
           BYTE 0D0h,0D1h,0F2h,0D3h,0F4h,0D5h,0F6h,0D7h,0D8h,0D9h,0DAh,0FBh,0FCh,0DDh,0DEh,0DFh
           BYTE 0E0h,0E1h,0E2h,0E3h,0E4h,0E5h,0E6h,0E7h,0E8h,0E9h,0EAh,0EBh,0ECh,0EDh,0EEh,0EFh
           BYTE 0F0h,0F1h,0F2h,0F3h,0F4h,0F5h,0F6h,0F7h,0F8h,0F9h,0FAh,0FBh,0FCh,0FDh,0FEh,0FFh

I don't think I forgot any but if I forgot a character send me a mail (I'm not sure you'll see these in your browser if not french Windows).

Dites moi si j'ai oublié un accent quelconque pour que j'ajuste la table de caractères. Merci

Code Select


szTest1 BYTE "À Ä Â Ç É È Ë Ê Î Ï Ô Ö Ò Û Ü Æ",0
szTest2 BYTE "à ä â ç é è ë ê î ï ô ö ò û ü æ",0

Title: Re: String comparision
Post by: hutch-- on January 05, 2007, 02:45:18 AM

jdoe,

Code Select


szTest1 BYTE "À Ä Â Ç É È Ë Ê Î Ï Ô Ö Ò Û Ü Æ",0
szTest2 BYTE "à ä â ç é è ë ê î ï ô ö ò û ü æ",0

Displays fine on my Gecko engined K-MELEON.

Title: Re: String comparision
Post by: jdoe on January 05, 2007, 03:24:31 AM

Quote from: hutch-- on January 05, 2007, 02:45:18 AM
Displays fine on my Gecko engined K-MELEON.

I think it must be because the french characters fits in the ASCII table but those who use unicode language could have problems to see it. I can't be sure of that and that's why I'm never sure if it will be displayed right.

:wink

Title: Re: String comparision
Post by: jdoe on January 05, 2007, 03:29:49 PM

This updated table should take care of ALL lower and uppercase characters, even if they are not used.

Code Select


szCmpi_tbl BYTE 000h,001h,002h,003h,004h,005h,006h,007h,008h,009h,00Ah,00Bh,00Ch,00Dh,00Eh,00Fh
           BYTE 010h,011h,012h,013h,014h,015h,016h,017h,018h,019h,01Ah,01Bh,01Ch,01Dh,01Eh,01Fh
           BYTE 020h,021h,022h,023h,024h,025h,026h,027h,028h,029h,02Ah,02Bh,02Ch,02Dh,02Eh,02Fh
           BYTE 030h,031h,032h,033h,034h,035h,036h,037h,038h,039h,03Ah,03Bh,03Ch,03Dh,03Eh,03Fh
           BYTE 040h,061h,062h,063h,064h,065h,066h,067h,068h,069h,06Ah,06Bh,06Ch,06Dh,06Eh,06Fh
           BYTE 070h,071h,072h,073h,074h,075h,076h,077h,078h,079h,07Ah,05Bh,05Ch,05Dh,05Eh,05Fh
           BYTE 060h,061h,062h,063h,064h,065h,066h,067h,068h,069h,06Ah,06Bh,06Ch,06Dh,06Eh,06Fh
           BYTE 070h,071h,072h,073h,074h,075h,076h,077h,078h,079h,07Ah,07Bh,07Ch,07Dh,07Eh,07Fh
           BYTE 080h,081h,082h,083h,084h,085h,086h,087h,088h,089h,09Ah,08Bh,09Ch,08Dh,09Eh,08Fh
           BYTE 090h,091h,092h,093h,094h,095h,096h,097h,098h,099h,09Ah,09Bh,09Ch,09Dh,09Eh,09Fh
           BYTE 0A0h,0A1h,0A2h,0A3h,0A4h,0A5h,0A6h,0A7h,0A8h,0A9h,0AAh,0ABh,0ACh,0ADh,0AEh,0AFh
           BYTE 0B0h,0B1h,0B2h,0B3h,0B4h,0B5h,0B6h,0B7h,0B8h,0B9h,0BAh,0BBh,0BCh,0BDh,0BEh,0BFh
           BYTE 0E0h,0E1h,0E2h,0E3h,0E4h,0E5h,0E6h,0E7h,0E8h,0E9h,0EAh,0EBh,0ECh,0EDh,0EEh,0EFh
           BYTE 0D0h,0F1h,0F2h,0F3h,0F4h,0F5h,0F6h,0D7h,0F8h,0F9h,0FAh,0FBh,0FCh,0FDh,0DEh,0DFh
           BYTE 0E0h,0E1h,0E2h,0E3h,0E4h,0E5h,0E6h,0E7h,0E8h,0E9h,0EAh,0EBh,0ECh,0EDh,0EEh,0EFh
           BYTE 0F0h,0F1h,0F2h,0F3h,0F4h,0F5h,0F6h,0F7h,0F8h,0F9h,0FAh,0FBh,0FCh,0FDh,0FEh,0FFh

Code Select


szTest1 BYTE "À Ä Â Ç É È Ë Ê Î Ï Ô Ö Ò Û Ü Æ Ú Œ Á Ì Í Ó Õ Ù Ú Ø Š Ž Ý Å Ã Ñ",0
szTest2 BYTE "à ä â ç é è ë ê î ï ô ö ò û ü æ ú œ á ì í ó õ ù ú ø š ž ý å ã ñ",0

There is nothing more to do with this table :P

Title: Re: String comparision
Post by: UlliN on January 05, 2007, 05:47:11 PM

Hello Danesh,

that's my way of comparing strings. The main idea is to compare DWORDs instead of bytes. A possible char translation is done on the fly. Its a bit faster than the algo above. :-)
There are 2 procs with and without char translation.

Beside this I've attached my CharTranslate-Routines with some predefined tables e.g. AsciiToAnsi, AnsiUpper, EbcdicToAscii. Significant faster than the MS APIs.

Ulli

[attachment deleted by admin]

Title: Re: String comparision
Post by: Danesh on January 05, 2007, 06:40:29 PM

WOW !!! Thanks everybody! I have modified the szCmpi table and of course the code. My goal was to support English and Swedish characters. (Swedish has three chars which are "äåö").

Thanks again,

Danesh

Title: Re: String comparision
Post by: jdoe on January 05, 2007, 07:11:33 PM

Quote from: Danesh on January 05, 2007, 06:40:29 PM
My goal was to support English and Swedish characters. (Swedish has three chars which are "äåö").

I had the feeling that if these few more characters were there that it could be useful somehow.

:8)

Title: Re: String comparision
Post by: Vortex on January 05, 2007, 07:33:02 PM

Here my strcmp algo, it's a modest one but it does the job :

Code Select

.386
.model flat,stdcall
option casemap:none

.code

strcmp PROC USES esi edi dest:DWORD,src:DWORD

	mov		esi,src
	mov		edi,dest
	xor		eax,eax
	xor		ecx,ecx
	xor		edx,edx
	dec		edx
@@:
	add		edx,1
	mov		al,BYTE PTR [edi+edx]
	mov		cl,BYTE PTR [esi+edx]
	cmp		al,cl
	jne		no_match
	test		al,al
	jnz		@b
	xor		eax,eax
	ret

no_match:

	sub		eax,ecx
	ret

strcmp ENDP
END

The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: Danesh on January 04, 2007, 11:08:58 PM