question about searching strings

Started by starsiege, November 29, 2008, 12:21:30 AM

starsiege

Hi guys, I'm a newbie studying asm, and I have a question in which I have to get a string from the user and compare it with another string I get from the user.

That is, the second string is used to search the first string to see whether it occurs anywhere in the first one.

For example:

Enter text: This is just a random user defined text to make a simple search operation
Enter search word: define
Word found in text. Index is 27


I wrote the code for it but ran into some problems. I'd be much obliged if anyone can help me figure out why I keep getting the error I'm getting.

The code I wrote is this:


TITLE MASM Template                  (main.asm)

; Description:
;
; Revision date:

INCLUDE Irvine32.inc

.data
textArry BYTE 100 DUP(0), 0
charArry BYTE 10 DUP(0), 0
promptText BYTE "ENTER THE TEXT",0
byteCount DWORD ?
promptChar BYTE "ENTER THE SEARCH WORD",0


.code
main PROC
cld
mov edx,OFFSET promptText
call WriteString                               ;prompts the user to enter the main string


mov edx, OFFSET textArry
mov ecx, SIZEOF textArry
call ReadString

mov byteCount, eax
mov edx,OFFSET textArry
call WriteString



cld
mov ebx,OFFSET promptChar
call WriteString                               ;prompts the user to enter the test string


mov ebx, OFFSET charArry
mov ecx, SIZEOF charArry
call ReadString
mov byteCount, eax


mov ebx,OFFSET charArry
call WriteString


;scanning for a matching char


mov edi, OFFSET textArry
mov al, charArry
mov ecx, LENGTHOF textArry
cld

repne scasd
jnz quit
dec edi

quit:

main ENDP

END main

The program compiles.

I get to enter the first string, and the program prints out the first string I entered. But instead of asking for the second string, it automatically prints the first string again and crashes.


If I enter "this is a test string" when I'm prompted,

it prints "this is a test stringthis is a test string" and exits.

Thanks in advance :)


MichaelW

The biggest problem I can see is that you are not properly terminating your program, so execution is continuing past the end of your main procedure. To correct this problem add a RET instruction at the end of the procedure, or use the exit equate defined in SmallWin.inc. Another problem is that your code to compare the strings and find the index of the match is not correct. For the string comparison, try the Str_compare procedure in irvine32.lib, or examine the source code for it in irvine32.asm.
eschew obfuscation

GregL

starsiege,

Also, where you are using ebx, you should be using edx for the calls to the WriteString and ReadString procedures. Look at the IrvineLibHelp.chm file.


Vortex

Hi starsiege,

Welcome to the forum.

Using the procedures supplied with the Masm32 package, here is my version :


include SearchString.inc

SIZEOF_BUFFER equ 128

.data

text1   db 'Enter text: ',0
text2   db 13,10,'Enter search word: ',0
format1 db 13,10,'Word found in text. Index is %d',13,10,0

.data?

buffer1 db SIZEOF_BUFFER dup(?)
buffer2 db SIZEOF_BUFFER dup(?)
buffer3 db 48 dup(?)

.code

start:

    mov     esi,OFFSET buffer2
    invoke  StdOut,ADDR text1
    invoke  StdIn,ADDR buffer1,SIZEOF_BUFFER
    invoke  StdOut,ADDR text2
    invoke  StdIn,esi,SIZEOF_BUFFER     ; StdIn returns the length of the input string
    mov     BYTE PTR [esi+eax-2],0      ; eliminate the CR+LF pair
    invoke  InString,1,ADDR buffer1,ADDR buffer2
    invoke  wsprintf,ADDR buffer3,ADDR format1,eax
    invoke  StdOut,ADDR buffer3
    invoke  ExitProcess,0

END start


InString returns 0 if there is no match. You should also handle the following return values of InString to report the search result correctly:


InString proc StartPos:DWORD, lpszString:DWORD, lpszSubStr:DWORD

Error Values

If the function fails, the following error values apply.
-1 = substring same length or longer than main string
-2 = "StartPos" parameter out of range (less than 1 or greater than main string length)
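
A rough sketch of how those return values could be turned into a message, written in C rather than assembly so the branches are easy to follow. The helper name describe_result and every message except the "Word found" line are made up for illustration; only the meaning of the result codes comes from the description above.

```c
#include <stdio.h>

/* Hypothetical helper: turns an InString-style result into a message.
   Codes as described above:
     > 0 : 1-based index of the match
       0 : no match
      -1 : substring same length or longer than main string
      -2 : StartPos out of range */
const char *describe_result(int result, char *out, size_t out_size)
{
    if (result > 0)
        snprintf(out, out_size, "Word found in text. Index is %d", result);
    else if (result == 0)
        snprintf(out, out_size, "Word not found in text.");
    else if (result == -1)
        snprintf(out, out_size, "Search word is as long as or longer than the text.");
    else if (result == -2)
        snprintf(out, out_size, "Start position is out of range.");
    else
        snprintf(out, out_size, "Unexpected result %d", result);
    return out;
}
```

In the assembly version the same idea would be a chain of compares on eax right after the InString call, before wsprintf is reached.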





starsiege

MichaelW, Greg, Vortex, thank you guys for explaining patiently. I think I got it working but still have to iron out some small issues, mostly aesthetic. I did not get to work on this for a day because I was down with a cold :(



Vortex, I have a little request.

Can you explain what the lines of your program do? I mean not the basic ones such as

.data

text1   db 'Enter text: ',0
text2   db 13,10,'Enter search word: ',0
format1 db 13,10,'Word found in text. Index is %d',13,10,0

.data?

buffer1 db SIZEOF_BUFFER dup(?)
buffer2 db SIZEOF_BUFFER dup(?)
buffer3 db 48 dup(?)


but the following section


    mov     esi,OFFSET buffer2
    invoke  StdOut,ADDR text1
    invoke  StdIn,ADDR buffer1,SIZEOF_BUFFER                           <<<<
    invoke  StdOut,ADDR text2
    invoke  StdIn,esi,SIZEOF_BUFFER     ; StdIn returns the length of the input string
    mov     BYTE PTR [esi+eax-2],0      ; eliminate the CR+LF pair                 <<<<<
    invoke  InString,1,ADDR buffer1,ADDR buffer2
    invoke  wsprintf,ADDR buffer3,ADDR format1,eax                       <<<<<
    invoke  StdOut,ADDR buffer3
    invoke  ExitProcess,0


I think I can follow it, but I'm not sure exactly what the lines I have marked with <<<<< do. I get their general function, but I'm trying to find out exactly what they do. Where is the search being done?

What is the masm32 library? Is it part of the MASM 9.0 that's part of Visual Studio 2008?

I apologize if I'm being too noobish.

And thank you for the welcome. This forum is really helpful for newbies to asm. I was browsing through the threads and found a treasure trove of information, much of it solutions to problems similar to ones I've encountered earlier. I think one can only get so much from a book.

I found this thread
http://www.masm32.com/board/index.php?topic=10243.0
very useful too, because even though I understood the concept of "align", I was never too sure of it. That thread explained it very well :)

I'm going to frequent this great forum often now, even when I don't hit any dead ends while programming, because it's a great learning resource.
Thanks again :)


Vortex

Hi starsiege,

invoke  StdIn,ADDR buffer1,SIZEOF_BUFFER

The StdIn function reads keyboard input from the console and stores the input text in a buffer. The second parameter is the size of this buffer. The string read by StdIn is always terminated by a CR+LF pair (ASCII 13,10).

The InString function expects NULL-terminated strings, so the string received by StdIn has to be modified, since it is terminated by CR+LF instead.

Let's assume that the search word is "text".

The buffer pointed to by the register esi ( esi -> address of buffer2 ) will contain 4+2 characters:

text <CR> <LF> 

0123  4    5  = 6 characters in total.

The NULL terminator should be inserted at the position ( SIZE OF STRING ) - 2

mov     BYTE PTR [esi+eax-2],0      ; eliminate the CR+LF pair

Notice that eax contains the return value of StdIn: the length of the string just read (the substring to be searched for in the main string stored in buffer1), with the CR+LF pair still counted.

Now, this is how it looks :

text <0>   The last character is ASCII 0

0123  4
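
The same step as a small C sketch, in case the indexing is easier to see that way. The function name strip_crlf is made up for illustration, and unlike the single mov instruction it first checks that the last two bytes really are CR+LF.

```c
#include <stddef.h>

/* Replace a trailing CR+LF with a NUL terminator.
   `len` is the number of bytes read, CR+LF included (6 for "text" above).
   Returns the new string length. */
size_t strip_crlf(char *buf, size_t len)
{
    if (len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n') {
        buf[len - 2] = '\0';   /* same effect as mov BYTE PTR [esi+eax-2],0 */
        return len - 2;
    }
    return len;                /* no CR+LF at the end: buffer left untouched */
}
```

For the "text" example, the NUL lands at position 6 - 2 = 4, which matches the layout shown above.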


The wsprintf function description from win32.hlp :

"The wsprintf function formats and stores a series of characters and values in a buffer. Any arguments are converted and copied to the output buffer according to the corresponding format specification in the format string. The function appends a terminating null character to the characters it writes, but the return value does not include the terminating null character in its character count."

wsprintf is a function that uses the C calling convention, and it can accept a variable number of parameters.

The first parameter points to a buffer receiving the formatted output. The second parameter points to a format-control specification:

format1 db 13,10,'Word found in text. Index is %d',13,10,0

If you are familiar with C, you will notice that %d is the symbol of a signed decimal integer argument. wsprintf replaces this %d with the content of eax, the 1 based index of the start of the substring to be searched.

invoke  wsprintf,ADDR buffer3,ADDR format1,eax

eax holds the return value of InString, the 1 based index of the start of the substring.
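
For readers who know C, a comparable call can be sketched with the standard snprintf. This is an analogy only: snprintf is not wsprintf, it takes the buffer size as an extra argument, and the wrapper name format_index is made up for illustration.

```c
#include <stdio.h>

/* Formats the result message into `out`, replacing %d with `index`.
   Returns the number of characters written, not counting the
   terminating NUL (the same convention the wsprintf description gives). */
int format_index(char *out, size_t out_size, int index)
{
    return snprintf(out, out_size,
                    "\r\nWord found in text. Index is %d\r\n", index);
}
```

With a 48-byte buffer like buffer3 there is room to spare for the message and any 32-bit index.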

The masm32 static library ( masm32.lib ) is a collection of specialized routines to handle various programming tasks like calculating the size of a NULL-terminated string ( StrLen ), searching for substrings in larger strings ( InString ), converting a HEX string to a DWORD ( htodw ), etc.

For more information about the library, have a look at the manual masmlib.chm coming with the Masm32 installation :

masm32\help\masmlib.chm


starsiege

Hi Vortex

Thank you. The explanations were very helpful, and yes, the library seems really great! I've been working on some code I found that is similar to what I was working on earlier, to see if I can get it working without using any library other than the ones in irvine32.

Here is the code:



INCLUDE Irvine32.inc

Str_find PROTO,
sourcePtr:PTR BYTE,
targetPtr:PTR BYTE

.data
sourceS BYTE 100 DUP (?),0
targetS BYTE 100 DUP (?),0
pos DWORD ?
sLength DWORD ?
tLength DWORD ?
resultLength DWORD ?
startpoint DWORD ?
counter DWORD 0
prompt1 BYTE "Please enter a string: ",0
prompt2 BYTE "Please enter a search string: ",0
response_found BYTE "The string was found.",0
response_index BYTE "The index is: ",0


.code
main PROC

mov edx, OFFSET prompt1
call writestring

mov ecx, (SIZEOF sourceS)
mov edx, OFFSET sourceS
call Readstring

call crlf
call crlf

mov edx, OFFSET prompt2
call writestring

mov ecx, (SIZEOF targetS)
mov edx, OFFSET targetS
call readstring

call crlf
call crlf


INVOKE Str_find,
ADDR sourceS,
ADDR targetS

jnz FOUND
jz NOT_FOUND

FOUND:


mov edx, offset response_found
call writestring

call crlf
call crlf


mov edx, OFFSET response_index
mov edx, OFFSET response_index
call writestring

mov edx, OFFSET sourceS
mov edx, esi
call writestring

call crlf
call crlf

jmp quit

NOT_FOUND:
jmp quit

quit:
exit
main ENDP


;;;;;;;;
;;;;;;;;
;;;;;;;;


Str_find PROC, sourcePtr:PTR BYTE, targetPtr:PTR BYTE

mov eax, sourcePtr
mov ebx, targetPtr
                           ; SET ecx = Length of [targetPtr] (excluding null terminator) [spans 8 lines]
mov esi, targetPtr         ; esi = beginning of the string
xor ecx, ecx ; ecx = 0

L0:
cmp BYTE PTR [esi], 0 ; [esi] == 0
je L1              ; TRUE: jump to L1
inc esi            ; esi++
inc ecx            ; ecx++
jmp L0             ; FALSE:Jump to L0

L1:
mov edx, ecx       ; edx holds ecx so we don't lose it when looping (REPE)

L2:
mov esi, eax       ; esi used in cmpsb
mov edi, ebx       ; edi used in cmpsb
repe cmpsb         ; Effect: while([esi++]==[edi++] && ecx-- > 0) {Zero flag stays set}
jz FOUND           ; If ZF is still set then the above line found a match
mov ecx, edx       ; ecx was changed 2 lines up. Put its original value back
cmp BYTE PTR [eax + ecx], 0     ; if we end up comparing NULLs we're at the end and we've failed
jz NOT_FOUND                    ; if the above is true, we've failed to find the string.
inc eax                         ; increment eax to be the next character in the sourcePtr string
jmp L2                          ; Do it all again

NOT_FOUND:
or eax, 1                ; un-set the zero flag
ret                      ; return nothing

FOUND:
                         ; mov eax, ebx ; mov the address of the found string into eax
mov tLength,edx
sub esi,tLength


ret                     
Str_find ENDP


END main




Instead of printing out the index of the found string, this prints out the string and everything that comes after it.

e.g.

Please enter a string: this is a testing string im trying out


Please enter a search string: string


The string was found.

The index is: string im trying out

Press any key to continue . . .



So I guess I have to subtract the target string length from the source string length to get the starting index of the string I'm searching for, but I'm having a lot of trouble trying to implement it.

Should I have a separate function for that, or is it possible to add it to the procedure I already have?

Thanks


starsiege

Still stuck trying to get this to do what I want it to do    :red

KeepingRealBusy

If you want to do this yourself, do the following:

Read the target string into a buffer.
Read the test substring into a buffer.
Search the target string for the first occurrence of the first character of the test substring.
  If found, compare from the found point for a match with all of the characters of the substring.
  No need to test for length of the target string remaining, if you run out of target string before
  you run out of test string, the search should quit for not found.
If the search above fails, scan the target string for the next match with the first character of the
test string, until no more matches with the first character of the test string are found.
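
The steps above could be sketched roughly like this in C (not assembly, so the control flow is easier to check). The name find_word, the 0-based index and the -1 "not found" value are choices made for this illustration, not part of the steps themselves.

```c
#include <stddef.h>

/* Returns the 0-based index of the first occurrence of `word` in `text`,
   or -1 if it does not occur. */
int find_word(const char *text, const char *word)
{
    if (word[0] == '\0')
        return 0;                  /* assumption: an empty word matches at 0 */

    for (size_t start = 0; text[start] != '\0'; start++) {
        if (text[start] != word[0])
            continue;              /* keep scanning for the first character */

        /* First character matched: compare the rest of the word. */
        size_t k = 1;
        while (word[k] != '\0' && text[start + k] == word[k])
            k++;

        if (word[k] == '\0')
            return (int)start;     /* the whole word matched */
        /* Otherwise there was a mismatch or the text ran out first,
           so the scan resumes at the next position. */
    }
    return -1;
}
```

No separate length check is needed: if the text ends first, its NUL terminator fails the comparison against the next character of the word.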

Dave.

Mark Jones

This should also be do-able in under twenty lines of code. Consider the following:

http://www.masm32.com/board/index.php?topic=5338.0

In the first example, to find the position, consider that ECX = the offset (or address) to the source string (and this "pointer" ECX is incremented once for each byte tested) then it would only be a matter of saving the address of the first matching byte, and subtracting the starting address from it. i.e., If the source string address were 00402000h, and the first matching byte were found at 00402009h, then (402009 - 402000) = 9. Or, one could find the length of the second string (say it is 6 bytes), then match the second string completely to the first, and: (40200Fh - 402000h - 6h) = 9.
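
Both calculations can be sketched in C with plain pointer subtraction. This sketch uses the standard strstr to locate the match, which is an assumption made to keep it short; the function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Index = address of the first matching byte minus the start address. */
ptrdiff_t index_by_start(const char *text, const char *word)
{
    const char *match = strstr(text, word);
    return match ? match - text : -1;
}

/* Index = address just past the match, minus the start address,
   minus the length of the word. */
ptrdiff_t index_by_end(const char *text, const char *word)
{
    const char *match = strstr(text, word);
    if (match == NULL)
        return -1;
    const char *end = match + strlen(word);   /* one past the last matched byte */
    return (end - text) - (ptrdiff_t)strlen(word);
}
```

For "find the string here" and "string", the match ends at offset 15 (0Fh) and the word is 6 bytes long, so both functions give 9, the same numbers as in the address example above.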

Sounds like a fun class. :U
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

starsiege

Thanks guys! But I think I am almost there! :bg

The answer is right in front of me, but I'm having trouble printing it out! :eek


I did a dumpregs to see the state of the registers right before the function returned to main.


String_search PROC, source_pointer:PTR BYTE, target_pointer:PTR BYTE

mov eax, source_pointer
mov ebx, target_pointer
                                ; SET ecx = Length of [target_pointer], null terminator not included
mov esi, target_pointer         ; esi = start of the string
xor ecx, ecx                    ; ecx = 0

L0:
cmp BYTE PTR [esi], 0           ; [esi] == 0
je L1                           ; TRUE: jump to L1
inc esi                         ; esi incremented by 1
inc ecx                         ; ecx++
jmp L0                          ; FALSE:Jump to L0

L1:
mov edx, ecx                    ; edx holds ecx so we don't lose it when looping (REPE)

;First use scasb to find first matching char
;Then compare rest of the chars by cmpsb
;for each repx, set ecx register appropriately

L2:
mov esi, eax                    ; esi used in cmpsb
mov edi, ebx                    ; edi used in cmpsb
repe cmpsb                     
jz FOUND                        ; If ZF is still set then the above line found a match
mov ecx, edx                    ; ecx was changed 2 lines up. Putting its original value back
cmp BYTE PTR [eax + ecx], 0     ; if null is being compared; we are at the end with no string
jz NOT_FOUND                    ; if the above is true, no string was found
inc eax                         ; incrementing eax to be the next character in the source_pointer string
jmp L2                          ; doing it again

NOT_FOUND:
or eax, 1                       ; clearing the zero flag
ret                             ; nothing being returned

FOUND:
                                ; mov eax, ebx ; mov the address of the found string into eax
   

call dumpregs

;the string i used to test was    :  this is a testing string
;the searched word was :            testing

; seems that edx has the length of the search string
; eax : 10
; esi :11
; edx :7


mov target_Length,edx
sub esi,target_Length

ret                     
String_search ENDP



So all I have to do now is print the index position by printing out ESI (ESI has the starting index of the word I'm searching for).
But when I try to print it out in the function, the thing crashes! :eek I don't know why.

So can anyone tell me how to print the ESI value to the screen? Thanks in advance!

starsiege

Ah, I got it working myself! :dance:

Darn! I should have used WriteDec rather than WriteString :))   :lol
Thanks guys!!
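
For anyone hitting the same crash: WriteString expects EDX to hold the address of a null-terminated string, while WriteDec prints the unsigned number in EAX as decimal digits, so handing a small index to WriteString makes it read from an address that is not a string. What a number-printing routine has to do can be sketched in C as below. This is only an illustration of the idea, not Irvine's actual implementation, and the name u32_to_decimal is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes `value` as decimal digits into `out`, which must have room for
   11 bytes (10 digits plus the NUL). Returns the number of digits. */
size_t u32_to_decimal(uint32_t value, char *out)
{
    char tmp[10];              /* a 32-bit value has at most 10 digits */
    size_t n = 0;

    do {                       /* digits come out least significant first */
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];   /* reverse into the output buffer */
    out[n] = '\0';
    return n;
}
```

Only after a conversion like this is there an actual string that a routine such as WriteString could print.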



The working code, in case anyone wants to comment on it or refer to it, is as follows:



INCLUDE Irvine32.inc

String_search PROTO,
source_pointer:PTR BYTE,
target_pointer:PTR BYTE

;String_length PROTO,
;pString:PTR BYTE ; pointer to string

.data
sourceString BYTE 100 DUP (?),0
targetString BYTE 100 DUP (?),0
pos DWORD ?
string_Length DWORD ?
target_Length DWORD ?
firstPrompt BYTE "Please enter a source string: ",0
secondPrompt BYTE "Please enter a search string: ",0
response_found BYTE "The string was discovered while searching",0
response_index BYTE "The index is: ",0
eax_initial DWORD ?



.code
main PROC

mov edx, OFFSET firstPrompt
call writestring

mov ecx, (SIZEOF sourceString)
mov edx, OFFSET sourceString

call Readstring

call crlf

mov edx, OFFSET secondPrompt
call writestring

mov ecx, (SIZEOF targetString)
mov edx, OFFSET targetString
call readstring

call crlf

INVOKE String_search,
ADDR sourceString,
ADDR targetString

jnz FOUND
jz NOT_FOUND

FOUND:



jmp quit

NOT_FOUND:
jmp quit

quit:
exit
main ENDP



;;;;;;;;


String_search PROC, source_pointer:PTR BYTE, target_pointer:PTR BYTE


mov eax, source_pointer
mov eax_initial,eax
mov ebx, target_pointer
                                ; SET ecx = Length of [target_pointer], null terminator not included
mov esi, target_pointer         ; esi = start of the string
xor ecx, ecx                    ; ecx = 0

L0:
cmp BYTE PTR [esi], 0           ; [esi] == 0
je L1                           ; TRUE: jump to L1
inc esi                         ; esi incremented by 1
inc ecx                         ; ecx++
jmp L0                          ; FALSE:Jump to L0

L1:
mov edx, ecx                    ; edx holds ecx so we don't lose it when looping (REPE)


L2:
mov esi, eax                    ; esi used in cmpsb
mov edi, ebx                    ; edi used in cmpsb
repe cmpsb                     
jz FOUND                        ; If ZF is still set then the above line found a match

mov ecx, edx                    ; ecx was changed 2 lines up. Putting its original value back
cmp BYTE PTR [eax + ecx], 0     ; if null is being compared; we are at the end with no string
jz NOT_FOUND                    ; if the above is true, no string was found
inc eax                         ; incrementing eax to be the next character in the source_pointer string
jmp L2                          ; doing it again

NOT_FOUND:
or eax, 1                       ; clearing the zero flag
ret                             ; nothing being returned

FOUND:
                                ; mov eax, ebx ; mov the address of the found string into eax

sub esi,target_Length

sub eax,eax_initial

mov edx, offset response_Index
call writestring

call writeDec
call crlf

ret                     
String_search ENDP


END main