The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: uosunsetter on April 18, 2012, 11:30:01 PM

Title: Checking whether a string contains another string
Post by: uosunsetter on April 18, 2012, 11:30:01 PM
Hello!

I checked some examples and topics about it but none of them was showing what I exactly wanted. So I tried to write one by myself :)

Let's say there are 2 strings

1: "hi my name is deniz. did u like it?"

2: "deniz"

So what I am trying to achieve is:

hi my name is deniz. did u like it?
deniz
deniz
  deniz

sliding like that until the match.

When I run the code below it always says "yes" :(

Btw, I've used cpu registers as much as possible for better performance (not even sure if the usages were proper thougjh). However I still can't be sure of my code about performance so if there is a faster way of comparing strings, please let me know :)

.386
.model flat,stdcall
option casemap:none

include     \masm32\include\windows.inc
include     \masm32\include\kernel32.inc
include     \masm32\include\masm32.inc
include     \masm32\include\msvcrt.inc

includelib  \masm32\lib\kernel32.lib
includelib  \masm32\lib\msvcrt.lib
includelib  \masm32\lib\masm32.lib



.data
msgyes db 'yes ',0
msgno db 'no ',0

.data?
str1 db 100 dup(?)
str2 db 100 dup(?)

.code

start:

;Get 2 strings
invoke StdIn,ADDR str1,100
invoke StdIn,ADDR str2,100

;Get the length of str1 and save it to bl
xor bl, bl ;Zero bl
lea esi, str1

repeatstr1:
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr1



;Get the length of str2 and save it to bh
xor bh, bh ;Zero bh
lea esi, str2

repeatstr2:
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr2

;Decrease bl and bh because there will be no need to compare '0'
dec bl
dec bh

;Set cl to know how many times to check(in my case bl will be always greater then bh)
mov cl, bl
sub cl, bh

;Set initials

lea esi, str1
lea edi, str2
lea edx, str1 ;will need a copy
mov ch, bh; another copy for len of str2
jmp Compare ;start comparing

;Prepare for next comparision also check if it is the last comparision

PrepareForNextComp:
dec cl
cmp cl, 0
je notfound
lea edi, str2
inc edx
mov esi, edx
mov bh, ch

Compare:
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextComp
dec bh
cmp bh, 0 ;if done str2's len times, than found
je found
inc esi
inc edi
jmp Compare

notfound:
invoke StdOut,ADDR msgno
jmp finish
found:
invoke StdOut,ADDR msgyes
jmp finish
;wait for input and end
finish:
invoke StdIn,ADDR str2,100
invoke ExitProcess,0

END start



Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 19, 2012, 12:34:25 AM
First of all, this is a very nice program, written from scratch and well commented. Compliments :U

Can it be done faster? You have to set up a benchmark, and test it...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
838     cycles for MyTestCode
438     cycles for InString
402     cycles for Instr_ (MasmBasic)


So, yes, it can be done a bit faster, but so much speed rarely matters. What I would suggest, though, is to reflect on how to set up a testing environment for this kind of exercise. For example, every time you launch your code you have to type in the strings - that can be avoided. See the attachment for a proposal that uses conditional assembly. It contains the timer.asm file taken from the Masm32 Laboratory (http://www.masm32.com/board/index.php?topic=770.0). Make sure you tick "use path names" when unzipping.

Keep up the good work, you are on a very good track :thumbu
Title: Re: Checking whether a string contains another string
Post by: dedndave on April 19, 2012, 01:13:15 AM
you might start by searching for the first letter
then scan the rest of the string
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 19, 2012, 01:26:13 AM
The C standard library contains a function for exactly this purpose, called: strstr()
Here's an example of its use:

  .486                       ; use 486 instructions
  .model flat, stdcall       ; Win32 memory model
  option casemap:none        ; case sensitive

    include ..\Include\windows.inc
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc

    includelib ..\Lib\crtdll.lib

    Main  proto

.data
    StringA     DB  'hi my name is deniz. did u like it?',0
    StringB     DB  'deniz',0
    StringC     DB  'result=',0
    style1      DB  '%s',0
    style2      DB  '%d',0

.data?
    iLVAR      dd    ?
    iRVAR      dd    ?
    string1    dd    ?
    string2    dd    ?
    result     dd    ?

.code
start:

  invoke Main
  invoke exit, 0

Main proc
  mov eax, offset StringA
  mov string1, eax
  mov eax, offset StringB
  mov string2, eax
 
  FINIT
  invoke strlen, string1
  mov iLVAR, eax
  invoke strlen, string2
  mov iRVAR, eax
;
  FLDZ
  mov eax, iLVAR
  mov ebx, iRVAR
  cmp eax, ebx
  jl Label1
;
  mov eax, [string1]
  mov iLVAR, eax
  invoke strstr, string1, string2
  mov iRVAR, eax
  .if iRVAR != NULL
      FILD iRVAR
      FILD iLVAR
      FSUBP ST(1), ST
      FLD1
      FADDP ST(1), ST
  .endif
Label1:
  FISTP result ; store integer, pop stack
;
  invoke printf, offset style1, offset StringC
  invoke printf, offset style2, result
;
  call _getch   ; pause
  ret                               ; return to windows
Main endp
;
end start



When run, the output is 15.
Title: Re: Checking whether a string contains another string
Post by: abnuque on April 19, 2012, 04:53:35 AM
Hi Steve,

Where can I get the following:
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc
These are not in the Masm package.
Thanks.
Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 19, 2012, 06:23:59 AM
Quote from: abnuque on April 19, 2012, 04:53:35 AM
Where can I get the following:
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc
These are not in the Masm package.

Use the standard masm32rt include and prepend crt_ to all C functions, as shown below. However, C strstr is a bit slow - and apparently, there is not even a case-insensitive version:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
756     cycles for MyTestCode
624     cycles for InString
417     cycles for Instr_ (MasmBasic)
400     cycles for Instr_, case-insensitive, result=100
560     cycles for C strstr, result=100


include \masm32\include\masm32rt.inc

    Main  proto
...

invoke crt_strlen, string1
invoke crt_strlen, string2
invoke crt_strstr, string1, string2
invoke crt_printf, offset style1, offset StringC
invoke crt_printf, offset style2, result
call crt__getch      ; pause


@Steve: using the FPU for showing the result is unorthodox :bg
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 19, 2012, 06:46:16 AM
Quote from: abnuque on April 19, 2012, 04:53:35 AM
Where can I get the following:
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc
These are not in the Masm package.

Oops, my bad...
in my haste, I forgot to mention that my example was using   jwasm  (http://japheth.de/JWasm.html), which includes those filenames.
But, JJ is correct, use his example: Masm32rt and preappend crt_ as he illustrated.

edit:
fixed typo.
Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 19, 2012, 07:09:46 AM
Quote from: SteveAsm on April 19, 2012, 06:46:16 AM
JJ is correct, use his Masm32rt

The owner is Hutch - unless he declares it common forum property :green
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 19, 2012, 07:48:39 AM
That was an editing typo.  :boohoo:
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 19, 2012, 07:57:03 AM
Alternately,
the win API has a similar set of functions   StrStr (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773436(v=vs.85).aspx) and  StrStrI  (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773439(v=vs.85).aspx).
Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 19, 2012, 08:27:29 AM
Quote from: SteveAsm on April 19, 2012, 07:57:03 AM
Alternately,
the win API has a similar set of functions   StrStr (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773436(v=vs.85).aspx) and  StrStrI  (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773439(v=vs.85).aspx).

They do exist, indeed:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
754     cycles for MyTestCode
615     cycles for InString, result=100
410     cycles for Instr_ (MasmBasic)
400     cycles for Instr_, case-insensitive, result=100
555     cycles for C strstr, result=100
1586    cycles for StrStr, result=100
87280   cycles for StrStrI, result=100
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 19, 2012, 01:15:39 PM
Quote from: jj2007 on April 19, 2012, 12:34:25 AM
First of all, this is a very nice program, written from scratch and well commented. Compliments :U

Can it be done faster? You have to set up a benchmark, and test it...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
838     cycles for MyTestCode
438     cycles for InString
402     cycles for Instr_ (MasmBasic)


So, yes, it can be done a bit faster, but so much speed rarely matters. What I would suggest, though, is to reflect on how to set up a testing environment for this kind of exercise. For example, every time you launch your code you have to type in the strings - that can be avoided. See the attachment for a proposal that uses conditional assembly. It contains the timer.asm file taken from the Masm32 Laboratory (http://www.masm32.com/board/index.php?topic=770.0). Make sure you tick "use path names" when unzipping.

Keep up the good work, you are on a very good track :thumbu

Thx alot!  I'm imitating (ohhh my English... couldn't find a better fitting word but you got the idea :D) what I see in this forum :)
Looking to the attachment right now.
I will be checking like 100.000 strings on memory. Rather than that, for practical purposes I must learn and achieve the fastest.

Quote from: dedndave on April 19, 2012, 01:13:15 AM
you might start by searching for the first letter
then scan the rest of the string

Actually I am doing exactly what you are saying, continuously :) checking first letter if matched checking next or if it doesn't match I slide by 1 letter :)

Btw, I'm avoiding C functions or APIs because some of the comparisions will be done with strings that not actually string (not ending with '0') and I will not need the information where the other string is found. Also all of the strings will be lower case. These made me want to optimize it. The string lengths will not be greater than like 200. So I thought why would i let a C function use the whole register for holding string's length.

I will be checking the examples for sure in order to see a better algorithm or get a better idea.

Soo I still couldn't figure out why it always give me the answer "yes" which means str1 contains str2 even it doesnt. Intrestingly, if I hardcode the length of strings that I will input(setting bl and bh registers which holds length of each string) I get proper results.

Thx in advance :)
Title: Re: Checking whether a string contains another string
Post by: hutch-- on April 19, 2012, 01:57:23 PM
uosunsetter,

Don't give up on writing your own, they are not that hard to write. Usually you search for the first character by scanning one byte at a time, if you find that character you branch and compare the rest of the search string to the current location in the main string, if it matches you return the offset in the string. You can use the C runtime versions if you like and there are plenty of others to use as well but writing your own is good value and you understand it better when you write it yourself.
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 19, 2012, 04:42:53 PM
Quote from: hutch-- on April 19, 2012, 01:57:23 PM
uosunsetter,

Don't give up on writing your own, they are not that hard to write. Usually you search for the first character by scanning one byte at a time, if you find that character you branch and compare the rest of the search string to the current location in the main string, if it matches you return the offset in the string. You can use the C runtime versions if you like and there are plenty of others to use as well but writing your own is good value and you understand it better when you write it yourself.

and

Quote from: dedndave on April 19, 2012, 01:13:15 AM
you might start by searching for the first letter
then scan the rest of the string

Now I get it! I'm writing the new one right now and will benchmark with jj2007's example. Thank you all!
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 19, 2012, 05:36:22 PM
Quote from: jj2007 on April 19, 2012, 08:27:29 AM
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
555     cycles for C strstr, result=100
1586    cycles for StrStr, result=100
87280   cycles for StrStrI, result=100


JJ,
I'm not too disappointed by the performance of the C functions, but, the Win-API functions seem extremely wak.
I don't understand it.
Could you maybe run them several times and average them ?
Title: Re: Checking whether a string contains another string
Post by: MichaelW on April 19, 2012, 06:49:50 PM
One of the reasons for the slowness of the API functions may be built-in buffer-overrun protections that the CRT functions lack. See the Security Note here:

http://msdn.microsoft.com/en-us/library/z9da80kz(v=vs.71).aspx

But this does not explain why StrStrI is so much slower than StrStr. I was able to throw together a replacement for StrStrI:

        push esi
        push edi
        invoke crt__strdup, ADDR str1
        mov esi, eax
        invoke crt__strdup, ADDR str2
        mov edi, eax
        invoke crt__strlwr, esi
        invoke crt__strlwr, edi
        invoke StrStr, esi, edi
        invoke crt_free, esi
        invoke crt_free, edi
        pop esi
        pop edi


That running on my P3 is some 7 times faster than StrStrI.
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 19, 2012, 08:04:17 PM
Quote from: MichaelW on April 19, 2012, 06:49:50 PM
...See the Security Note here:

Yeh, scary stuff.

Quote
But this does not explain why StrStrI is so much slower than StrStr. I was able to throw together a replacement for StrStrI: ...
That running on my P3 is some 7 times faster than StrStrI.

7x is a significant improvement.
Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 19, 2012, 09:20:49 PM
It's probably bad design. Even the Unicode version of MasmBasic Instr, wInstr, runs in 1600 cycles instead of 35000:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
852     cycles for MyTestCode (UoSunSetter)
430     cycles for InString, result=100
391     cycles for Instr_ (MasmBasic), result=100
379     cycles for Instr_, case-insensitive, result=100
1117    cycles for wInstr, case-sensitive, result=100
1564    cycles for wInstr, case-insensitive, result=100
383     cycles for C strstr, result=100
937     cycles for StrStr, result=100
35480   cycles for StrStrI, result=100

... and that one works fine with case-insensitive Russian Unicode strings, for example. Why this would need a check for buffer overrun is a mystery - we are talking zero-delimited strings here ::)
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 19, 2012, 10:39:49 PM
OMG! How about this? :D I hope I didn't make any mistake on benchmark... I put an attachment which includes benchmark. I set MasmBasic flag to 0 'cause couldn't wait to see the results :D

Intel(R) Core(TM) i7 CPU       Q 720  @ 1.60GHz (SSE4)
56   cycles for MyTestCode (UoSunSetter)
560   cycles for InString, result=100
524   cycles for C strstr, result=100
1163   cycles for StrStr, result=100
44536   cycles for StrStrI, result=100

56   cycles for MyTestCode (UoSunSetter)
571   cycles for InString, result=100
468   cycles for C strstr, result=100
1176   cycles for StrStr, result=100
43617   cycles for StrStrI, result=100

.386
.model flat,stdcall
option casemap:none

include     \masm32\include\windows.inc
include     \masm32\include\kernel32.inc
include     \masm32\include\masm32.inc
include     \masm32\include\msvcrt.inc

includelib  \masm32\lib\kernel32.lib
includelib  \masm32\lib\msvcrt.lib
includelib  \masm32\lib\masm32.lib



.data
msgyes db 'yes ',0
msgno db 'no ',0

.data?
str1 db 100 dup(?)
str2 db 100 dup(?)

.code

start:

;Get 2 strings
invoke StdIn,ADDR str1,100
invoke StdIn,ADDR str2,100

;Get the length of str1 and save it to bl
xor bl, bl ;Zero bl
lea esi, str1

repeatstr1: ;repeat until the end of str1
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr1



;Get the length of str2 and save it to bh
xor bh, bh ;Zero bh
lea esi, str2

repeatstr2: ;repeat until the end of str2
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr2

;Decrease bl and bh because there will be no need to compare '0'
dec bl
dec bh

;Set cl to know how many times to check(in my case bl will be always greater then bh)
mov cl, bl
sub cl, bh
inc cl

;Set initials
lea esi, str1
lea edi, str2
lea edx, str1 ;will need a copy
mov ch, bh; another copy for len of str2
jmp MatchFirstLetters ;start by finding first match


PrepareForNextMatchingCheck:     ;Prepare registers for finding next first letter match also test if it is the last check
inc edx
mov esi, edx
dec cl
cmp cl, 0
je NotFound



MatchFirstLetters:
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextMatchingCheck ;no luck? try next

lea edi, str2 ;need to reset edi since it will manipulated by CompareTheRest
inc edi ;first letter already checked so start from second one
inc esi ;same as above
mov bh, ch ;reset the iterator used in CompareTheRest


CompareTheRest: ;Match the rest of the chars one by one
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextMatchingCheck
dec bh
cmp bh, 1
je Found ;if comparision is done str2's length-1 times and there is no mismatch then found
inc esi
inc edi
jmp CompareTheRest




NotFound:
mov eax, 0
jmp finish
Found:
mov eax, 1
jmp finish

;end
finish:
invoke ExitProcess,0

END start
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 19, 2012, 10:55:09 PM
Forgot to delete driver prefix from include lines in asm file included in benchmark.zip :(

Here is the attachment with fix...
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 19, 2012, 11:43:05 PM
In every test I ran this, the C strstr ran the fastest by about 8-9%.


Intel(R) Celeron(R) M processor         1.70GHz (SSE2)
856         cycles for MyTestCode (UoSunSetter)
430         cycles for InString, result=100
399         cycles for Instr_ (MasmBasic), result=100
404         cycles for Instr_, case-insensitive, result=100
1124        cycles for wInstr, case-sensitive, result=100
1606        cycles for wInstr, case-insensitive, result=100
369         cycles for C strstr, result=100
984         cycles for StrStr, result=100
35407       cycles for StrStrI, result=100

849         cycles for MyTestCode (UoSunSetter)
430         cycles for InString, result=100
403         cycles for Instr_ (MasmBasic), result=100
405         cycles for Instr_, case-insensitive, result=100
1262        cycles for wInstr, case-sensitive, result=100
1428        cycles for wInstr, case-insensitive, result=100
369         cycles for C strstr, result=100
930         cycles for StrStr, result=100
35429       cycles for StrStrI, result=100



MasmBasic was always a close second.
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 19, 2012, 11:51:59 PM
Can it be the old codes I've written? Did you download the latest attachment I posted?
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 20, 2012, 01:39:30 AM
Quote from: uosunsetter on April 19, 2012, 11:51:59 PM
Can it be the old codes I've written?

Yes, it is the old code.
Not a mistake, I was trying to figure why the API functions are so slow.

Quote
Did you download the latest attachment I posted?

Yes, I did download the new code.
I haven't had the chance to look at the source code yet.
But, the result is quite impressive, if it's right.
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 20, 2012, 03:08:55 AM
Quote from: uosunsetter on April 19, 2012, 10:39:49 PM
OMG! How about this? :D

Okay, here is your latest benchmark:

Intel(R) Celeron(R) M processor         1.70GHz (SSE2)
51          cycles for MyTestCode (UoSunSetter)
427         cycles for InString, result=100
369         cycles for C strstr, result=100
938         cycles for StrStr, result=100
35567       cycles for StrStrI, result=100

49          cycles for MyTestCode (UoSunSetter)
431         cycles for InString, result=100
393         cycles for C strstr, result=100
944         cycles for StrStr, result=100
35595       cycles for StrStrI, result=100



My concern, is that since MyTestCode does not output a value, we don't really know if it's working correctly.
I think it must be short circuiting somewhere.
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 20, 2012, 04:06:15 AM
Okay,
looking at your code for MyTestCode, I discovered a flaw in the execution.

When the benchmark executes, it does not wait for any input.
All the other programs use a static string to test against, MyTestCode does not.
As the code is written, it wants the user to enter str1 and str2. This never happens.

So, it proceeds to compare two empty strings and it fails.
For varification, I made two modifications to the code to test this.

Additionally, when I assemble and test the code, it works, but, fails the test as well.
Here are the two changes I made:

.data
    msgyes         db  'yes ',0
    msgno          db  'no ',0
    msgFailed      db  'procedure failed',0
    msgSuccess     db  'procedure suceeded',0
   


NotFound:
    mov eax, 0
    printstr msgFailed    ; my own print routine
    printstr lf

    jmp finish
Found:
    mov eax, 1
    printstr msgSuccess    ; my own print routine
    printstr lf

    jmp finish

;end
finish:
    call _getch          ; pause

    invoke ExitProcess,0
END start



The output message is: "procedure failed"

EDIT:
I just compared the code you posted above with the code in "benchmark.asm" and I see a difference.
I was testing the code above.
The "benchmark.asm" code does use static strings for comparison, but, I believe it is still failing.
Title: Re: Checking whether a string contains another string
Post by: MichaelW on April 20, 2012, 04:06:28 AM
I would expect the CRT strstr function to be fairly fast on short strings. But my previous tests against code that utilizes a Boyer-Moore search (the FreeBASIC Instr function), and the tests below, suggest that strstr is nothing fancy, just simple compiler-optimized code.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
    str1 db "The three algorithms are variations of a Boyer Moore design.",0
    size1 equ $-str1
    str2 db "design.",0
    size2 equ $-str2
    str3 db "The three algorithms are variations of a Boyer Moore design."
         db "The original design BMBinSearch is the most flexible across "
         db "different search conditions. The two variations BMHBinsearch "
         db "and SBMBinSearch are respectively a Horspool variation and a "
         db "simplified variation which perform well depending on the "
         db "hardware and the patterns being searched."
         db "The minimum pattern length is 2 bytes but these algorithms are "
         db "slower on very short paterns than a classic byte scanner."
         db "The algorithms have about a 300 character penalty at startup "
         db "as they must construct a table so for either very short "
         db "patterns or short data it is more efficient to use a byte "
         db "scanner.",0
    size3 equ $-str3
    str4 db "it is more efficient to use a byte scanner.",0
    size4 equ $-str4
.code
;==============================================================================
start:

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    invoke crt_strstr, ADDR str1, ADDR str2
    sub eax, OFFSET str1
    printf("%d\n",eax)
    invoke BMBinSearch, 0, ADDR str1, size1, ADDR str2, size2
    printf("%d\n\n",eax)

    invoke crt_strstr, ADDR str3, ADDR str4
    sub eax, OFFSET str3
    printf("%d\n",eax)
    invoke BMBinSearch, 0, ADDR str3, size3, ADDR str4, size4
    printf("%d\n\n",eax)

    invoke Sleep, 3000

    counter_begin 1000000, HIGH_PRIORITY_CLASS
        invoke crt_strstr, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, strstr\n", eax)

    counter_begin 1000000, HIGH_PRIORITY_CLASS
        invoke BMBinSearch, 0, ADDR str1, size1, ADDR str2, size2
    counter_end
    printf("%d cycles, BMBinSearch\n\n", eax)

    counter_begin 1000000, HIGH_PRIORITY_CLASS
        invoke crt_strstr, ADDR str3, ADDR str4
    counter_end
    printf("%d cycles, strstr\n", eax)

    counter_begin 1000000, HIGH_PRIORITY_CLASS
        invoke BMBinSearch, 0, ADDR str3, size3, ADDR str4, size4
    counter_end
    printf("%d cycles, BMBinSearch\n\n", eax)

    inkey
    exit
;==============================================================================
end start


Running on a P3:

53
53

600
600

211 cycles, strstr
812 cycles, BMBinSearch

2579 cycles, strstr
1396 cycles, BMBinSearch


Oops, I'm off by one on my sizes, but it's not worth fixing.
Title: Re: Checking whether a string contains another string
Post by: dedndave on April 20, 2012, 04:23:07 AM
QuoteThe output message is: "procedure failed"

good time on that, though   :P
i am actually a little surprised
Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 20, 2012, 06:27:02 AM
Quote from: SteveAsm on April 19, 2012, 11:43:05 PM
In every test I ran this, the C strstr ran the fastest by about 8-9%.
...
MasmBasic was always a close second.

Timings may change by some % depending on the location of the code (see here (http://www.masm32.com/board/index.php?topic=11454.0)), and of course they differ by CPU:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
754     cycles for MyTestCode (UoSunSetter)
615     cycles for InString, result=100
412     cycles for Instr_ (MasmBasic), result=100
399     cycles for Instr_, case-insensitive, result=100
554     cycles for C strstr, result=100
86779   cycles for StrStrI, result=100


The MasmBasic implementation is pretty fast especially for longer strings, but it has a bit of overhead since it handles also case-insensitive and Unicode strings. The C functions are generally very competitive, though. The only surprise here is the StrStrI result. By the way, Lingo wrote SSE4 code (http://www.masm32.com/board/index.php?topic=9370.msg132644#msg132644) that beats everything here, but beware of GPFs :wink
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 20, 2012, 09:03:32 PM
From the beginning I didn't noticed one of the fixes that jj2007 (about getting strings' pointer) made to my codes. SteveAsm was also right :) I just paid too much attention to algorithm that I missed other points.

Also, after spending 2-3 more hours, I accepted some facts: Whoever wrote the InString function is far more experienced than me. Also the time he/she spent to write the codes is probably more than me. So understanding what it is written and manipulating it according to the cases that I will face seems quite beneficial.

But before I go, I got a question: Is there anyway to check if the CPU supports SSE4 instruction set so that I can take the advantage of the link below(thanks jj2007 for pointing and lingo for codes)???? Also if the CPU doesn't support I should be able to call standart(or edited) InString function. And of course it should be checked on execution.
http://www.masm32.com/board/index.php?topic=9370.msg132644#msg132644
Title: Re: Checking whether a string contains another string
Post by: dedndave on April 20, 2012, 09:11:58 PM
if you enable pentium instructions...
        .586
you can use CPUID
with all it's caveats, it is pretty easy to get the level of SSE support
most everyone has SSE2 or SSE3 - at least (not all, though)
more recent processors support SSE4 and better

http://www.intel.com/Assets/PDF/appnote/241618.pdf
http://support.amd.com/us/Embedded_TechDocs/25481.pdf
Title: Re: Checking whether a string contains another string
Post by: SteveAsm on April 21, 2012, 02:38:44 AM
@ uosunsetter
I was looking at your code, trying to debug it and I decided it was easier to just rewrite it, since I knew what you were trying to do.
Maybe somebody can insert this as a replacement for MyTestCode and time it.
Title: Re: Checking whether a string contains another string
Post by: dedndave on April 21, 2012, 03:22:46 AM
hi Steve

timing code is pretty easy
i know Jochen uses some rather "advanced" conditional assembly techniques   :red

most of us use MichaelW's timers.asm file
it may be found in the first post of the first thread in the Laboratory sub-forum
http://www.masm32.com/board/index.php?topic=770.msg5281#msg5281
i have included the timers.asm file in the attachment, though

we generally place the timers.asm file in the \masm32\macros folder

here is a very simple example of timing some code...
Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 21, 2012, 07:21:45 AM
Quote from: SteveAsm on April 21, 2012, 02:38:44 AM
@ uosunsetter
I was looking at your code, trying to debug it and I decided it was easier to just rewrite it, since I knew what you were trying to do.
Maybe somebody can insert this as a replacement for MyTestCode and time it.


Doesn't even assemble without errors on my Masm32 installation. Could you rewrite it in the form used by the others? Thanks.

InstrSteve proc pString, pPattern
  .. code ..
  ret   ; return position in eax
InstrSteve endp
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 21, 2012, 09:52:29 AM
Thanks dedndave!

Quote from: SteveAsm on April 21, 2012, 02:38:44 AM
@ uosunsetter
I was looking at your code, trying to debug it and I decided it was easier to just rewrite it, since I knew what you were trying to do.
Maybe somebody can insert this as a replacement for MyTestCode and time it.


Actually, as I said, on jj2007's benchmark attachments, he has the fixed version of my code. The errors simply caused by using lea instead of using mov and offset.

What I was doing:
lea esi, str1

What should have been:
mov esi, offset str1

After the replacement I got proper benchmark results which were quite disappointing(and normal :P).
Title: Re: Checking whether a string contains another string
Post by: jj2007 on April 21, 2012, 10:39:43 AM
Quote from: uosunsetter on April 21, 2012, 09:52:29 AM
After the replacement I got proper benchmark results which were quite disappointing(and normal :P).

Your code is already faster than the Windows API StrStr version. Comment out two lines, and it's 100 cycles faster:

PrepareForNextComp:
   dec cl
   ;cmp cl, 0
   je notfound      ; cl=0: NOT FOUND
   mov edi, pPattern
   inc edx
   mov esi, edx
   mov bh, ch

Compare:
   mov al, [esi]
   mov ah, [edi]
   cmp al, ah
   jne PrepareForNextComp
   dec bh
;   cmp bh, 0 ;if done pPattern's len times, than found
   je found
   inc esi
   inc edi
   jmp Compare
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 21, 2012, 11:01:59 AM
Ewww wouldn't it crash or loops infinetely if I remove them?
Title: Re: Checking whether a string contains another string
Post by: dedndave on April 21, 2012, 12:15:38 PM
no - the DEC instruction sets the Zero Flag if the result is zero - no need to compare with 0

also - if the upper 3 bytes of EBX and ECX are 0, then DEC EBX and DEC ECX are probably slightly faster
that's because DWORD operations are generally prefered on 32-bit processors
and - DEC CL/BL are 2 byte instructions - DEC ECX/EBX are 1 byte instructions
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 21, 2012, 12:30:13 PM
I just needed that kind of informations! Very good tips. Also intresting but makes sense. I learned a lot from this topic  :U  :8)
Title: Re: Checking whether a string contains another string
Post by: dedndave on April 21, 2012, 12:40:17 PM
ok - another tip   :P

it seems that DEC ECX is a little slow on some processors
actually, the instruction is fast if you do not have to wait for the flags
        dec     ecx
        jz      notfound

the processor has to wait for the flags to be altered before it can decide which way to branch

        sub     ecx,1
        jz      notfound

may be slightly faster, even though SUB ECX,1 is 3 bytes

another way around it is to put some other instruction in there that does not affect the flags
in this code, you don't have one to stick in there   :P
but, to illustrate...
        dec     ecx
        mov     esi,123456
        jnz     SomeLabel

is probably faster than
        mov     esi,123456
        dec     ecx
        jnz     SomeLabel

the processor can do something else while it waits for the flags to be set
Title: Re: Checking whether a string contains another string
Post by: FORTRANS on April 21, 2012, 01:00:44 PM
Hi,

   As Dave says, DEC and INC will set the Zero Flag properly.
But unlike ADD and SUB they do not affect the Carry Flag.
So you cannot use all of the conditioal jumps with INC and DEC,
while you can with ADD and SUB.  Just a heads up.

$0.02,

Steve N.
Title: Re: Checking whether a string contains another string
Post by: uosunsetter on April 21, 2012, 01:44:33 PM
Quotethe processor can do something else while it waits for the flags to be set

It makes sense. This is, however, a bit confusing for me. I always thought that only one thing can be done at a time on CPU. Again I thought that maybe there as some exceptions on multi-threading; but as far as I know, even multi-threading only organize and put things in queue without blocking each other.

And 1 other thing that I've been always wondering about is waiting. All these stuff including waiting(sleep functions, or waiting for a flag etc) do they rapidly check for the condition to be satisfied?
Title: Re: Checking whether a string contains another string
Post by: dedndave on April 21, 2012, 02:42:06 PM
modern CPU's execute many instructions "out-of-order" to achieve speed
they look ahead to see what instructions may be executed without being affected by those before them
and, they execute code in multiple pipe-lines
a major part of optimizing code is understanding how the CPU executes instructions
with modern CPU's - that is not a simple thing - i am mystified by what is faster quite often - lol

as for the Sleep function - that is unrelated to code execution time
when your process (or program, or thread) is asleep, the CPU and operating system allow other threads to execute
a good example is waiting for a key to be pressed by the user
if i call the operating system function to get a key in a loop, it executes very fast
and - it hogs all the processor power waiting for a key   :P
so - we insert a call to the Sleep function to allow other threads to execute
i typically use something like 40 or 50 milliseconds
40 mS will accommodate a typist going 250 WPM   :bg
even faster, actually...
i write the loop so that, if a keypress is received, it does not Sleep before the next test to see if a key is present
only when no keypress is in the buffer do i call Sleep