Problem passing the location of a searched for string

Started by omdown, May 18, 2005, 10:15:05 PM

Previous topic - Next topic

omdown

Okay, here's the issue.  I was having some problems with this program a while back, but we figured out the problem and now there's a new one . . .

The program:  This program is supposed to get two strings from a user (at the moment the strings are predetermined for the sake of keeping it simple while I try to figure out the problem), the first string will be the "source string," which will be searched by the "target string," which is the second string the user will enter.  A Str_find PROC will then run, which will see if it can find the correct string match in the source string.  It's supposed to return with the location of where it found the target in the source, and then pass that along to Str_remove, which will remove the found string in the source.

The problem: I can't seem to get the Str_find to pass the location of the found string properly.  Below is what I have so far, I know something is wrong but I've tried a dozen different things and none of them worked.  I thought maybe putting some kind of a counter in Str_find to count how many spaces from the beginning it found the target string, but I can't figure out where I would put it as the rep movsb doesn't have any space to put in extra code or anything . . . very confused . . . anyone?


TITLE ch09_03                    (ch09_03.asm)

INCLUDE Irvine32.inc

Str_find PROTO,
sourcePtr:PTR BYTE,
targetPtr:PTR BYTE

Str_remove PROTO,
sourcePtr:PTR BYTE,
removeN:DWORD


.data
sourceS BYTE "123ABCDEFG",0
targetS BYTE "BCD",0
pos DWORD ?
sLength DWORD ?
tLength DWORD ?
startpoint DWORD ?
counter DWORD 0

.code
main PROC


INVOKE Str_find,
ADDR sourceS,
ADDR targetS

FOUND:
mov edx,eax
call writestring

call crlf
call crlf

mov eax, OFFSET sourceS
add eax,tLength

call dumpregs

call crlf
call crlf

mov edx,esi
call writestring

call crlf
call crlf

INVOKE Str_remove,
esi,
tLength

mov edx, OFFSET sourceS
call writestring

call crlf
call crlf

jmp quit

quit:
main ENDP
exit

;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
Str_find PROC, sourcePtr:PTR BYTE, targetPtr:PTR BYTE

mov eax, sourcePtr
mov ebx, targetPtr
; SET ecx = Length of [targetPtr] (excluding null terminator) [spans 8 lines]
mov esi, targetPtr ; esi = beginning of the string
xor ecx, ecx ; ecx = 0
L0:
cmp BYTE PTR [esi], 0 ; [esi] == 0
je L1 ; TRUE: jump to L1
inc esi ; esi++
inc ecx ; ecx++
jmp L0 ; FALSE:Jump to L0
L1:
mov edx, ecx ; edx holds ecx so we don't lose it when looping (REPE)
L2:
mov esi, eax ; esi used in cmpsb
mov edi, ebx ; edi used in cmpsb
repe cmpsb ; Effect: while([esi++]==[edi++] && ecx-- > 0) {Zero flag stays set}
jz FOUND ; If ZF is still set then the above line found a match
mov ecx, edx ; ecx was changed 2 lines up. Put its original value back
cmp BYTE PTR [eax + ecx], 0 ; if we end up comparing NULLs we're at the end and we've failed
jz NOT_FOUND ; if the above is true, we've failed to find the string.
inc eax ; increment eax to be the next character in the sourcePtr string
jmp L2 ; Do it all again
NOT_FOUND:
or eax, 1 ; un-set the zero flag
ret ; return nothing
FOUND:
; mov eax, ebx ; mov the address of the found string into eax
mov tLength,edx
ret ; How do I return eax?
Str_find ENDP
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------


;----------------------------------------------------------------------------
; Str_remove - Removes removeN characters after sourcePtr
; If programmer wants to remove more characters than there ARE before
; the null terminator then only the characters up to NULL are removed
Str_remove PROC, sourcePtr:PTR BYTE, removeN:DWORD
;----------------------------------------------------------------------------

mov esi, sourcePtr ; esi = beginning of the string
xor ecx, ecx


L0: ; Get the # of characters until the null terminator

cmp BYTE PTR [esi], 0 ; [esi] == 0
je L1 ; TRUE: jump to L1
inc esi ; esi++
inc ecx ; ecx++
add counter,1
jmp L0 ; FALSE:Jump to L0
L1:
inc ecx ; ecx = # chars EXCEPT the null terminator so add 1 more
cmp ecx, removeN ; user wants to remove more characters than there ARE left?
jb L3 ; TRUE: jump to L3

mov edi, sourcePtr ; edi points to the first byte we want to remove
mov esi, eax ; esi points to the 1st byte after what
sub esi, removeN
cld ; movsb auto-INCREMENTS after each operation

rep movsb ; Effect: while(ecx > 0) { mov [esi++], [edi++] }

ret ; return



L3:
mov esi, sourcePtr
mov BYTE PTR [esi], 0
ret
Str_remove ENDP


END main

omdown

Nevermind, I got it, though I would like to know if there is a more efficient way of doing this...

Here's what I did.  This is the finished product.

TITLE ch09_03                    (ch09_03.asm)

INCLUDE Irvine32.inc

Str_find PROTO,
sourcePtr:PTR BYTE,
targetPtr:PTR BYTE

Str_remove PROTO,
sourcePtr:PTR BYTE,
removeN:DWORD


.data
sourceS BYTE 100 DUP (?),0
targetS BYTE 100 DUP (?),0
pos DWORD ?
sLength DWORD ?
tLength DWORD ?
startpoint DWORD ?
counter DWORD 0
prompt1 BYTE "Please enter a string: ",0
prompt2 BYTE "Please enter a search: ",0
response_found BYTE "The string was found.  Removing . . .",0
response_removed BYTE "The string was removed: ",0




.code
main PROC

mov edx, OFFSET prompt1
call writestring

mov ecx, (SIZEOF sourceS)
mov edx, OFFSET sourceS
call Readstring

call crlf
call crlf

mov edx, OFFSET prompt2
call writestring

mov ecx, (SIZEOF targetS)
mov edx, OFFSET targetS
call readstring

call crlf
call crlf


INVOKE Str_find,
ADDR sourceS,
ADDR targetS

jnz FOUND
jz NOT_FOUND

FOUND:


mov edx, offset response_found
call writestring

call crlf
call crlf

INVOKE Str_remove,
esi,
tLength

mov edx, OFFSET response_removed
call writestring

mov edx, OFFSET sourceS
call writestring

call crlf
call crlf

jmp quit

NOT_FOUND:
jmp quit

quit:
main ENDP
exit

;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
Str_find PROC, sourcePtr:PTR BYTE, targetPtr:PTR BYTE

mov eax, sourcePtr
mov ebx, targetPtr
; SET ecx = Length of [targetPtr] (excluding null terminator) [spans 8 lines]
mov esi, targetPtr ; esi = beginning of the string
xor ecx, ecx ; ecx = 0
L0:
cmp BYTE PTR [esi], 0 ; [esi] == 0
je L1 ; TRUE: jump to L1
inc esi ; esi++
inc ecx ; ecx++
jmp L0 ; FALSE:Jump to L0
L1:
mov edx, ecx ; edx holds ecx so we don't lose it when looping (REPE)
L2:
mov esi, eax ; esi used in cmpsb
mov edi, ebx ; edi used in cmpsb
repe cmpsb ; Effect: while([esi++]==[edi++] && ecx-- > 0) {Zero flag stays set}
jz FOUND ; If ZF is still set then the above line found a match
mov ecx, edx ; ecx was changed 2 lines up. Put its original value back
cmp BYTE PTR [eax + ecx], 0 ; if we end up comparing NULLs we're at the end and we've failed
jz NOT_FOUND ; if the above is true, we've failed to find the string.
inc eax ; increment eax to be the next character in the sourcePtr string
jmp L2 ; Do it all again
NOT_FOUND:
or eax, 1 ; un-set the zero flag
ret ; return nothing
FOUND:
; mov eax, ebx ; mov the address of the found string into eax
mov tLength,edx
sub esi,tLength
ret ; How do I return eax?
Str_find ENDP
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------


;----------------------------------------------------------------------------
; Str_remove - Removes removeN characters after sourcePtr
; If programmer wants to remove more characters than there ARE before
; the null terminator then only the characters up to NULL are removed
Str_remove PROC, sourcePtr:PTR BYTE, removeN:DWORD
;----------------------------------------------------------------------------

mov esi, sourcePtr ; esi = beginning of the string
xor ecx, ecx


L0: ; Get the # of characters until the null terminator

cmp BYTE PTR [esi], 0 ; [esi] == 0
je L1 ; TRUE: jump to L1
inc esi ; esi++
inc ecx ; ecx++
add counter,1
jmp L0 ; FALSE:Jump to L0
L1:
inc ecx ; ecx = # chars EXCEPT the null terminator so add 1 more
cmp ecx, removeN ; user wants to remove more characters than there ARE left?
jb L3 ; TRUE: jump to L3

mov edi, sourcePtr ; edi points to the first byte we want to remove
mov esi, sourcePtr ; esi points to the 1st byte after what
add esi, removeN
cld ; movsb auto-INCREMENTS after each operation

rep movsb ; Effect: while(ecx > 0) { mov [esi++], [edi++] }

ret ; return



L3:
mov esi, sourcePtr
mov BYTE PTR [esi], 0
ret
Str_remove ENDP


END main

Jeff

hello,
it seems that we both are using assembly for intel by irvine, so i could see where you're coming from.  :)

my comments are all referring to the search procedure:
one thing i can say is to write your "string length" portion of code as a seperate procedure.  that could be used in many more places and can save time for you in the long run.

you might want to free up some registers instead of "reserving" them.  meaning, instead of moving the pointers of the strings to eax and ebx at the beginning, rather you may want to move them into the registers when they're needed.  this way, you wont have to be forced to work with less registers than available and eliminate.

it looks like you're using the brute force searching algorithm.  if you are concerned with the speed of your program, you might want to look into other algorithms or at least make a few modifications to your code.  what you have runs on the order of O(n^2).  rather than using the string comparing mnemonics, compare each character and break the search the moment you find a mismatch. 

its like searching for 100000000 in 000000000000000111110000000000.  it will be very inefficient to compare the characters [0,8], then [1,9], [2,10] and so on especially for this case.  it would be better to compare source[0] with target[0]... fail... source[0] with target[1]... fail... and so on.  sure it would require more coding and results in slightly larger program size, it would be a lot more efficient.

i had just finished an assignment i had that was based on string searching.  to give you a better idea on what im trying to say:
BruteForce PROC
    PUSHAD
    MOV ecx,0
    MOV edx,iTextSize
    SUB edx,iPatternSize
    .REPEAT
        INVOKE Find, ecx
        .IF eax == 1
            MOV Position,ecx
            JMP @F
        .ENDIF
        INC ecx
    .UNTIL ecx > edx
@@:
    POPAD
    RET
BruteForce ENDP

Find PROC Location:DWORD
LOCAL Found:DWORD
    MOV Found,1
    PUSHAD
    MOV esi,0
    MOV ebx,bPattern
    MOV edx,bText
    ADD edx,Location
    .REPEAT
        MOV al,BYTE PTR [ebx+esi]
        .IF al != BYTE PTR [edx+esi]
            MOV Found,0
            JMP @F
        .ENDIF
        INC esi
    .UNTIL esi >= iPatternSize
@@:
    POPAD
    MOV eax,Found
    RET
Find ENDP

bPattern and bText are pointers to the buffers containing the strings, iPatternSize is the size of bPattern and likewise for iTextSize.

my assignment was to read in files (pattern and text) and find if pattern was contained in text using different search algorithms.  i dynamically allocated the memory for the buffers which complicated a few things.  :)

i hope i've been some help to you.