
A little trouble with writing an Str_find PROC

Started by omdown, May 17, 2005, 10:59:54 PM


omdown

Okay, I'm trying to write a program where the user enters two strings, and the program searches for the second string in the first one and then removes it. But I'm having trouble figuring out how to get the exact location of the found string back. The Str_find PROC in the code below is borrowed; it has plenty of comments, but I can't quite figure it out. At first I thought it returned with EAX holding the location of the found string, but that doesn't seem to be the case: I tried moving EAX into EDX and calling WriteString, and all I get is an error. Then I tried EBX as well, since the procedure moves EBX into EAX just before it returns, but still no luck. So freakin' lost at this point.   :tdown

By the way, the Str_remove PROTO and PROC are there, but I haven't even started passing anything to them, because I can't figure out for the life of me how to get the position. So um . . . anyone here have any idea what I'm doing wrong? Have I mentioned you guys are the best people ever?   :cheekygreen:  Sorry to suck up, but seriously, without you guys I have no idea where I'd be.

TITLE Str_find and Str_remove aps

INCLUDE Irvine32.inc

Str_find PROTO,
sourcePtr:PTR BYTE,
targetPtr:PTR BYTE

Str_remove PROTO,
sourcePtr:PTR BYTE,
removeN:DWORD


.data
sourceS BYTE "123ABCDEFG",0
targetS BYTE "ABC",0
pos DWORD ?
sLength DWORD ?
tLength DWORD ?
startpoint DWORD ?
counter DWORD 0

.code
main PROC


INVOKE Str_find,
ADDR sourceS,
ADDR targetS



FOUND:

mov edx,eax
call writestring

jmp quit

NOT_FOUND:
mov al,'x'
call writechar    ; just as something to show me an error
                exit


quit:
main ENDP
exit

;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
Str_find PROC USES eax ebx esi edi, sourcePtr:PTR BYTE, targetPtr:PTR BYTE

mov eax, sourcePtr
mov ebx, targetPtr
; SET ecx = Length of [targetPtr] (excluding null terminator) [spans 8 lines]
mov esi, targetPtr ; esi = beginning of the string
xor ecx, ecx ; ecx = 0
L0:
cmp BYTE PTR [esi], 0 ; [esi] == 0
je L1 ; TRUE: jump to L1
inc esi ; esi++
inc ecx ; ecx++
jmp L0 ; FALSE:Jump to L0
L1:
mov edx, ecx ; edx holds ecx so we don't lose it when looping (REPE)
L2:
mov esi, eax ; esi used in cmpsb
mov edi, ebx ; edi used in cmpsb
repe cmpsb ; Effect: while([esi++]==[edi++] && ecx-- > 0) {Zero flag stays set}
jz FOUND ; If ZF is still set then the above line found a match
mov ecx, edx ; ecx was changed 2 lines up. Put its original value back
cmp BYTE PTR [eax + ecx], 0 ; if we end up comparing NULLs we're at the end and we've failed
jz NOT_FOUND ; if the above is true, we've failed to find the string.
inc eax ; increment eax to be the next character in the sourcePtr string
jmp L2 ; Do it all again
NOT_FOUND:
or eax, 1 ; un-set the zero flag
ret ; return nothing
FOUND:
mov eax, ebx ; mov the address of the found string into eax
mov tLength,edx
ret ; How do I return eax?
Str_find ENDP
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------
;------------------------------------------------------------------------------------------------------------------


;----------------------------------------------------------------------------
; Str_remove - Removes removeN characters after sourcePtr
; If programmer wants to remove more characters than there ARE before
; the null terminator then only the characters up to NULL are removed
Str_remove PROC USES eax ecx esi edi, sourcePtr:PTR BYTE, removeN:DWORD
;----------------------------------------------------------------------------
mov esi, sourcePtr ; esi = beginning of the string
xor ecx, ecx
L0: ; Get the # of characters until the null terminator
cmp BYTE PTR [esi], 0 ; [esi] == 0
je L1 ; TRUE: jump to L1
inc esi ; esi++
inc ecx ; ecx++
jmp L0 ; FALSE:Jump to L0
L1:
inc ecx ; ecx = # chars EXCEPT the N.T. so add 1 more
cmp ecx, removeN ; user wants to remove more characters than there ARE left?
jbe L3 ; TRUE: jump to L3
mov edi, sourcePtr ; edi points to the first byte we want to remove
mov esi, sourcePtr ; esi points to the 1st byte after the block being removed
add esi, removeN
sub ecx, removeN ; copy only what follows the removed block, including the N.T.
cld ; movsb auto-INCREMENTS after each operation
rep movsb ; Effect: while(ecx > 0) { mov [edi++], [esi++]; ecx-- }
ret ; return
L3:
mov esi, sourcePtr
mov BYTE PTR [esi], 0
ret
Str_remove ENDP


END main

hutch--

There are basic rules for register preservation. If you are using a normal stack frame, preserve EBX, ESI and EDI if you use any of them; if you manually code your own proc entry and exit, you must also preserve ESP and EBP.

For a normal byte scanner search algo, you scan for the 1st character and, when you find it, branch to a compare that tests the rest of the word. If it matches, you return the address of the 1st character; otherwise you continue scanning until you find another occurrence or reach the end of the source text.
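A minimal C sketch of that scan, for anyone who wants to see the logic outside of MASM. The name `scan_find` is hypothetical, and treating an empty target as a match at the start of the source is an assumption. Like the description, it returns the address of the first character of the match, or NULL once the end of the source text is reached.

```c
#include <stddef.h>

/* Scan for the 1st character of target; on a hit, compare the rest.
   Returns the address of the match inside source, or NULL if none. */
const char *scan_find(const char *source, const char *target)
{
    if (*target == '\0')
        return source;                 /* assumption: empty target matches at the start */
    for (const char *p = source; *p != '\0'; ++p) {
        if (*p != target[0])
            continue;                  /* keep scanning for the 1st character */
        size_t i = 1;
        while (target[i] != '\0' && p[i] == target[i])
            ++i;                       /* branch compare of the remaining characters */
        if (target[i] == '\0')
            return p;                  /* whole word matched: address of the 1st char */
    }
    return NULL;                       /* reached the end of the source text */
}
```

The inner compare stops at the first mismatch, and the source's terminating zero always mismatches a non-zero target character, so the sketch never reads past the end of the source.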
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

MichaelW

Hi omdown,

To return a value in EAX you just leave the value in EAX when you return. But to make this work you must avoid preserving EAX because otherwise the generated code at the end of the procedure will overwrite the return value. This is a disassembly of the code at the start and end of the Str_find procedure:

00401034 55                     push    ebp     ; start of generated code
00401035 8BEC                   mov     ebp,esp
00401037 50                     push    eax
00401038 53                     push    ebx
00401039 56                     push    esi
0040103A 57                     push    edi     ; end of generated code
0040103B 8B4508                 mov     eax,[ebp+8]
0040103E 8B5D0C                 mov     ebx,[ebp+0Ch]
00401041 8B750C                 mov     esi,[ebp+0Ch]
00401044 33C9                   xor     ecx,ecx
...
00401064 83C801                 or      eax,1
00401067 5F                     pop     edi     ; start of generated code
00401068 5E                     pop     esi
00401069 5B                     pop     ebx
0040106A 58                     pop     eax
0040106B C9                     leave
0040106C C20800                 ret     8       ; end of generated code

As you can see, the return value in EAX will be overwritten by the POP EAX instruction that will restore EAX to the value it had at procedure entry. To correct this problem you need to remove EAX from the register list following the USES keyword.
eschew obfuscation

omdown

Okay, that worked . . . I sort of get why, though I don't get why it didn't work when I tried pushing EAX onto the stack, then returning, then popping it again. Shouldn't that have saved EAX and kept this issue from happening?

MichaelW

Pushing a value within the procedure and not popping it before the end of the procedure would cause a stack imbalance. In the code above, the value pushed would be popped into EDI, the preserved EDI value would be popped into ESI, the preserved ESI value into EBX, and so on down until you ran out of pop instructions. At this point the preserved EBP would be left on the stack. The return instruction, expecting the return address at the top of the stack, would attempt to use the preserved value of EBP for the return address. The most likely result would be a GPF or other fault, and Windows would terminate the program.

thomasantony

Hi,
Doesn't the restoration of the stack frame at the end of a procedure correct any imbalances? And omdown, here is a simpler way to get the length of the string in ECX:

mov edi,targetPtr
mov ecx,targetPtr
@@:
    cmp byte ptr[ecx],0
    je @F
    add ecx,1
    jmp @B
@@:
    sub ecx,edi

And VOILA!! The length of the target string is in ECX!! BTW, in case you didn't know, @@ is an anonymous label: JE @F jumps to the next @@ label forward, and JMP @B jumps to the nearest @@ label before the jump instruction.
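The same pointer-difference trick, sketched in C as a cross-check; the name `str_len_by_diff` is hypothetical, and the comments map each step back to the MASM lines above.

```c
#include <stddef.h>

/* Walk a pointer to the terminating zero, then subtract the start address. */
size_t str_len_by_diff(const char *target)
{
    const char *p = target;        /* mov ecx, targetPtr  (EDI keeps the start) */
    while (*p != '\0')             /* cmp byte ptr [ecx], 0 / je @F */
        ++p;                       /* add ecx, 1 / jmp @B */
    return (size_t)(p - target);   /* sub ecx, edi */
}
```

Keeping the start address in a second register and subtracting once at the end saves maintaining a separate counter inside the loop.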

Thomas :U
There are 10 types of people in the world. Those who understand binary and those who don't.



MichaelW

Quote from: thomasantony on May 18, 2005, 06:44:55 AM
Doesn't the restoration of the stackframe at the end of a procedure correct any imbalances?

Yes, the restoration of the stack pointer does. Thanks for the correction. But although nothing would crash and burn, the method would still not return a value in EAX.

chep

And the "preserved" registers (EBX/ESI/EDI) would anyway be messed up, which the caller is likely to dislike. :P

Mirno

MichaelW was right first time, pushing eax will cause a stack imbalance if it's not popped within the same proc.

When the assembler comes to clean up, the USES registers will get the wrong values (edi = pushed eax, esi = original edi, ebx = original esi, eax = original ebx). Then the real fun begins: ebp is popped for the stack frame restore, but instead of getting ebp it gets the original eax value, esp is then set to ebp (original eax), and the function returns to whatever address is referenced by the original eax! This is more than likely to crash...

Mirno

chep

Mirno,

As thomasantony said, mov esp, ebp occurs before pop ebp.

Consider this:

foo PROC
LOCAL whatever:DWORD
    ...
    ret
foo ENDP


is generated as:

push ebp
mov ebp, esp
add esp, -4   ; reserve 4 bytes for the LOCAL
...
leave        ; could also be:   mov esp, ebp   /   pop ebp
ret
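A toy C model of the stack can check the sequence argued over above. It assumes a proc declared `USES eax ebx esi edi` (as Str_find was) whose body does one unmatched PUSH EAX; the `Cpu` struct, `push`/`pop` helpers and `run_unbalanced` are hypothetical, and the "stack" is just an array indexed downward like ESP. Because LEAVE reloads ESP from EBP before popping EBP, RET still finds the real return address, but the registers restored by USES come back shifted by one slot.

```c
#include <stdint.h>

/* Toy 32-bit stack: mem[] stands in for memory; esp is an index that
   grows downward, like the real ESP. */
typedef struct {
    uint32_t mem[32];
    uint32_t esp, ebp, eax, ebx, esi, edi;
} Cpu;

static void push(Cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(Cpu *c) { return c->mem[c->esp++]; }

/* CALL -> PROC USES eax ebx esi edi -> body sets EAX and does an unmatched
   PUSH EAX -> generated pops -> LEAVE -> RET.
   Returns the address RET would jump to. */
uint32_t run_unbalanced(Cpu *c, uint32_t return_address, uint32_t result)
{
    push(c, return_address);            /* CALL pushes the return address */
    push(c, c->ebp);                    /* push ebp */
    c->ebp = c->esp;                    /* mov ebp, esp */
    push(c, c->eax); push(c, c->ebx);   /* USES eax ebx esi edi */
    push(c, c->esi); push(c, c->edi);

    c->eax = result;                    /* body: the value we hoped to keep */
    push(c, c->eax);                    /* body: push eax, never popped */

    c->edi = pop(c);                    /* generated pops are now off by one slot */
    c->esi = pop(c);
    c->ebx = pop(c);
    c->eax = pop(c);

    c->esp = c->ebp;                    /* leave: mov esp, ebp ... */
    c->ebp = pop(c);                    /* ... pop ebp */
    return pop(c);                      /* ret: pops the return address */
}
```

Starting with EAX, EBX, ESI, EDI = 1, 2, 3, 4, the model returns to the real return address with EBP and ESP intact, but EDI ends up holding the pushed value, ESI the old EDI, EBX the old ESI and EAX the old EBX, so the intended return value is lost and the caller's preserved registers are scrambled.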



Mark Jones

"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

omdown

Hmmmmm . . . very interesting . . . very interesting indeed . . . I love this forum.

omdown

Okay, new problem: when the find PROC ends, EAX points to targetS instead of to where the match was found in sourceS, and now I'm having trouble making sure it always points to the right place when I run Str_remove. At first I thought to set the PTR BYTE to the offset of sourceS plus tLength, but then I realized that only worked because tLength was three and ABC just happened to start three characters from the beginning. If I change targetS to BCD, Str_find still finds it and Str_remove still gets 3, so the result is still 123DEFG. How could I make something remember the position of the first character? Or maybe just how many characters from the beginning to start at? I've tried adding counters, but they don't seem to work very well.   :'(
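One way to remember the position is to keep the match address itself and subtract the start of the source to get the offset. In the posted Str_find, EAX already holds the address of the candidate start in sourceS when `jz FOUND` is taken; the `mov eax, ebx` at FOUND is what replaces it with the address of targetS. Below is a C sketch of find-then-remove under those assumptions. It swaps the hand-written search for the standard `strstr`, and `find_offset`, `remove_at` and `find_and_remove` are hypothetical helpers (`remove_at` mirrors the Str_remove contract of truncating when fewer than n characters remain).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring Str_remove: remove n characters starting at p;
   if n reaches or passes the terminator, just truncate at p. */
void remove_at(char *p, size_t n)
{
    size_t len = strlen(p);
    if (n >= len) {
        *p = '\0';
        return;
    }
    memmove(p, p + n, len - n + 1);   /* +1 also moves the terminating zero */
}

/* Zero-based offset of target inside source, or -1 if it is absent. */
long find_offset(const char *source, const char *target)
{
    const char *hit = strstr(source, target);
    return hit ? (long)(hit - source) : -1;   /* match address minus start address */
}

/* Find target and remove it in place; returns 1 if something was removed. */
int find_and_remove(char *source, const char *target)
{
    long pos = find_offset(source, target);
    if (pos < 0)
        return 0;
    remove_at(source + pos, strlen(target));  /* start at the match, not at tLength */
    return 1;
}
```

With "123ABCDEFG", searching for BCD gives offset 4 and leaves 123AEFG, while ABC gives offset 3 and leaves 123DEFG, so the start point follows the match instead of depending on the target's length.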