News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Checking whether a string contains another string

Started by uosunsetter, April 18, 2012, 11:30:01 PM

Previous topic - Next topic

uosunsetter

Hello!

I checked some examples and topics about it but none of them was showing what I exactly wanted. So I tried to write one by myself :)

Let's say there are 2 strings

1: "hi my name is deniz. did u like it?"

2: "deniz"

So what I am trying to achieve is:

hi my name is deniz. did u like it?
deniz
deniz
  deniz

sliding like that until the match.

When I run the code below it always says "yes" :(

Btw, I've used cpu registers as much as possible for better performance (not even sure if the usages were proper thougjh). However I still can't be sure of my code about performance so if there is a faster way of comparing strings, please let me know :)

.386
.model flat,stdcall
option casemap:none

include     \masm32\include\windows.inc
include     \masm32\include\kernel32.inc
include     \masm32\include\masm32.inc
include     \masm32\include\msvcrt.inc

includelib  \masm32\lib\kernel32.lib
includelib  \masm32\lib\msvcrt.lib
includelib  \masm32\lib\masm32.lib



.data
msgyes db 'yes ',0
msgno db 'no ',0

.data?
str1 db 100 dup(?)
str2 db 100 dup(?)

.code

start:

;Get 2 strings
invoke StdIn,ADDR str1,100
invoke StdIn,ADDR str2,100

;Get the length of str1 and save it to bl
xor bl, bl ;Zero bl
lea esi, str1

repeatstr1:
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr1



;Get the length of str2 and save it to bh
xor bh, bh ;Zero bh
lea esi, str2

repeatstr2:
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr2

;Decrease bl and bh because there will be no need to compare '0'
dec bl
dec bh

;Set cl to know how many times to check(in my case bl will be always greater then bh)
mov cl, bl
sub cl, bh

;Set initials

lea esi, str1
lea edi, str2
lea edx, str1 ;will need a copy
mov ch, bh; another copy for len of str2
jmp Compare ;start comparing

;Prepare for next comparision also check if it is the last comparision

PrepareForNextComp:
dec cl
cmp cl, 0
je notfound
lea edi, str2
inc edx
mov esi, edx
mov bh, ch

Compare:
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextComp
dec bh
cmp bh, 0 ;if done str2's len times, than found
je found
inc esi
inc edi
jmp Compare

notfound:
invoke StdOut,ADDR msgno
jmp finish
found:
invoke StdOut,ADDR msgyes
jmp finish
;wait for input and end
finish:
invoke StdIn,ADDR str2,100
invoke ExitProcess,0

END start




jj2007

First of all, this is a very nice program, written from scratch and well commented. Compliments :U

Can it be done faster? You have to set up a benchmark, and test it...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
838     cycles for MyTestCode
438     cycles for InString
402     cycles for Instr_ (MasmBasic)


So, yes, it can be done a bit faster, but so much speed rarely matters. What I would suggest, though, is to reflect on how to set up a testing environment for this kind of exercise. For example, every time you launch your code you have to type in the strings - that can be avoided. See the attachment for a proposal that uses conditional assembly. It contains the timer.asm file taken from the Masm32 Laboratory. Make sure you tick "use path names" when unzipping.

Keep up the good work, you are on a very good track :thumbu

dedndave

you might start by searching for the first letter
then scan the rest of the string

SteveAsm

The C standard library contains a function for exactly this purpose, called: strstr()
Here's an example of its use:

  .486                       ; use 486 instructions
  .model flat, stdcall       ; Win32 memory model
  option casemap:none        ; case sensitive

    include ..\Include\windows.inc
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc

    includelib ..\Lib\crtdll.lib

    Main  proto

.data
    StringA     DB  'hi my name is deniz. did u like it?',0
    StringB     DB  'deniz',0
    StringC     DB  'result=',0
    style1      DB  '%s',0
    style2      DB  '%d',0

.data?
    iLVAR      dd    ?
    iRVAR      dd    ?
    string1    dd    ?
    string2    dd    ?
    result     dd    ?

.code
start:

  invoke Main
  invoke exit, 0

Main proc
  mov eax, offset StringA
  mov string1, eax
  mov eax, offset StringB
  mov string2, eax
 
  FINIT
  invoke strlen, string1
  mov iLVAR, eax
  invoke strlen, string2
  mov iRVAR, eax
;
  FLDZ
  mov eax, iLVAR
  mov ebx, iRVAR
  cmp eax, ebx
  jl Label1
;
  mov eax, [string1]
  mov iLVAR, eax
  invoke strstr, string1, string2
  mov iRVAR, eax
  .if iRVAR != NULL
      FILD iRVAR
      FILD iLVAR
      FSUBP ST(1), ST
      FLD1
      FADDP ST(1), ST
  .endif
Label1:
  FISTP result ; store integer, pop stack
;
  invoke printf, offset style1, offset StringC
  invoke printf, offset style2, result
;
  call _getch   ; pause
  ret                               ; return to windows
Main endp
;
end start



When run, the output is 15.

abnuque

Hi Steve,

Where can I get the following:
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc
These are not in the Masm package.
Thanks.

jj2007

Quote from: abnuque on April 19, 2012, 04:53:35 AM
Where can I get the following:
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc
These are not in the Masm package.

Use the standard masm32rt include and prepend crt_ to all C functions, as shown below. However, C strstr is a bit slow - and apparently, there is not even a case-insensitive version:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
756     cycles for MyTestCode
624     cycles for InString
417     cycles for Instr_ (MasmBasic)
400     cycles for Instr_, case-insensitive, result=100
560     cycles for C strstr, result=100


include \masm32\include\masm32rt.inc

    Main  proto
...

invoke crt_strlen, string1
invoke crt_strlen, string2
invoke crt_strstr, string1, string2
invoke crt_printf, offset style1, offset StringC
invoke crt_printf, offset style2, result
call crt__getch      ; pause


@Steve: using the FPU for showing the result is unorthodox :bg

SteveAsm

#6
Quote from: abnuque on April 19, 2012, 04:53:35 AM
Where can I get the following:
    include ..\Include\stdio.inc
    include ..\Include\stdlib.inc
    include ..\Include\conio.inc
These are not in the Masm package.

Oops, my bad...
in my haste, I forgot to mention that my example was using jwasm , which includes those filenames.
But, JJ is correct, use his example: Masm32rt and preappend crt_ as he illustrated.

edit:
fixed typo.

jj2007

Quote from: SteveAsm on April 19, 2012, 06:46:16 AM
JJ is correct, use his Masm32rt

The owner is Hutch - unless he declares it common forum property :green

SteveAsm


SteveAsm

Alternately,
the win API has a similar set of functions StrStr and StrStrI .

jj2007

Quote from: SteveAsm on April 19, 2012, 07:57:03 AM
Alternately,
the win API has a similar set of functions StrStr and StrStrI .

They do exist, indeed:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
754     cycles for MyTestCode
615     cycles for InString, result=100
410     cycles for Instr_ (MasmBasic)
400     cycles for Instr_, case-insensitive, result=100
555     cycles for C strstr, result=100
1586    cycles for StrStr, result=100
87280   cycles for StrStrI, result=100

uosunsetter

Quote from: jj2007 on April 19, 2012, 12:34:25 AM
First of all, this is a very nice program, written from scratch and well commented. Compliments :U

Can it be done faster? You have to set up a benchmark, and test it...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
838     cycles for MyTestCode
438     cycles for InString
402     cycles for Instr_ (MasmBasic)


So, yes, it can be done a bit faster, but so much speed rarely matters. What I would suggest, though, is to reflect on how to set up a testing environment for this kind of exercise. For example, every time you launch your code you have to type in the strings - that can be avoided. See the attachment for a proposal that uses conditional assembly. It contains the timer.asm file taken from the Masm32 Laboratory. Make sure you tick "use path names" when unzipping.

Keep up the good work, you are on a very good track :thumbu

Thx alot!  I'm imitating (ohhh my English... couldn't find a better fitting word but you got the idea :D) what I see in this forum :)
Looking to the attachment right now.
I will be checking like 100.000 strings on memory. Rather than that, for practical purposes I must learn and achieve the fastest.

Quote from: dedndave on April 19, 2012, 01:13:15 AM
you might start by searching for the first letter
then scan the rest of the string

Actually I am doing exactly what you are saying, continuously :) checking first letter if matched checking next or if it doesn't match I slide by 1 letter :)

Btw, I'm avoiding C functions or APIs because some of the comparisions will be done with strings that not actually string (not ending with '0') and I will not need the information where the other string is found. Also all of the strings will be lower case. These made me want to optimize it. The string lengths will not be greater than like 200. So I thought why would i let a C function use the whole register for holding string's length.

I will be checking the examples for sure in order to see a better algorithm or get a better idea.

Soo I still couldn't figure out why it always give me the answer "yes" which means str1 contains str2 even it doesnt. Intrestingly, if I hardcode the length of strings that I will input(setting bl and bh registers which holds length of each string) I get proper results.

Thx in advance :)

hutch--

uosunsetter,

Don't give up on writing your own, they are not that hard to write. Usually you search for the first character by scanning one byte at a time, if you find that character you branch and compare the rest of the search string to the current location in the main string, if it matches you return the offset in the string. You can use the C runtime versions if you like and there are plenty of others to use as well but writing your own is good value and you understand it better when you write it yourself.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

uosunsetter

Quote from: hutch-- on April 19, 2012, 01:57:23 PM
uosunsetter,

Don't give up on writing your own, they are not that hard to write. Usually you search for the first character by scanning one byte at a time, if you find that character you branch and compare the rest of the search string to the current location in the main string, if it matches you return the offset in the string. You can use the C runtime versions if you like and there are plenty of others to use as well but writing your own is good value and you understand it better when you write it yourself.

and

Quote from: dedndave on April 19, 2012, 01:13:15 AM
you might start by searching for the first letter
then scan the rest of the string

Now I get it! I'm writing the new one right now and will benchmark with jj2007's example. Thank you all!

SteveAsm

Quote from: jj2007 on April 19, 2012, 08:27:29 AM
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
555     cycles for C strstr, result=100
1586    cycles for StrStr, result=100
87280   cycles for StrStrI, result=100


JJ,
I'm not too disappointed by the performance of the C functions, but, the Win-API functions seem extremely wak.
I don't understand it.
Could you maybe run them several times and average them ?