Simple string manipulations

Started by tanto, February 18, 2005, 03:33:46 PM

tanto

Hi everyone,

I have some questions about doing the equivalent of C#'s [string].Substring and [string].IndexOf, or substr, instr, left, right, etc. in C++, or, perhaps the best comparison, SubStr and Pos in Delphi/Pascal.

Say I want to do the following:

1) Pass a string to a function along with another string acting as keyword to find
2) find the first occurrence of the keyword string in the main string (pos)
3) find the second occurrence of the keyword (or cut the main string so that it starts after the first keyword)
4) use the above information to cut out the sought-after string between the two occurrences of the keyword (substr)
5) return the found substring while keeping the "Main String" in the state we left it in at step 3.

This would not be hard in a high level language.

But in MASM I think I even have a problem understanding what I'm passing to the procedure.

Off the top of my head, I would like to give it a try with some "pseudo MASM code" below. Please feel free to offer pointers or sample URLs where I can find someone who's done this before me (there are bound to be hundreds of you out there).

With a string that looks as follows:   myUrl="http://someurl.com", myUrl2="http://someOtherUrl", etc etc


FindUrlInString proc szUrlList:DWORD, szKeyWord:DWORD
  Local  szResultString:DWORD
 
  ;Let's try to find the position of the keyword 'myUrl'
  mov edi, istring(0,szUrlList,szKeyWord)
 
  ;did we find any?
  .if edi != 0
     ;how to do a substring? pseudo code so you know what I'm after here
     szUrlList SubStr, edi+2, szUrlList  ; i.e. copy the string from position given in edi into the same string (+2 for the =" to go away)
     
     ;now in this pseudo code szUrlList would look like: http://someurl.com", myUrl2="http://someOtherUrl", etc etc
     ;the beginning would be gone. Now we'd just like to cut it off at the next quotation mark (") and get the URL that sits there
     
     ;Let's try to find the position of the quotation mark - pseudo code
     mov edi, istring(0,szUrlList,chr('"'))  ;how do I create a string or char on the spot that contains a " - mark?
   
     .if edi != 0
        szResultString SubStr, edi, szUrlList
     .endif
  .endif

     ;move the result to eax so the caller can use it
     mov eax, szResultString
     ret

FindUrlInString endp



So basically, as you can see, I badly need help with string manipulation. Any suggestions would be very welcome.

Also, please enlighten me as to whether a string can be passed using the DWORD type. Thanks much.

tanto

Oh well, since the answers haven't been pouring in on this one, let me rephrase it a bit to see if we get any takers:

If I have a DWORD pointing to a string, how would I go about invoking a successful substring operation semantically described like so:



Substr(startPositionValue, myDWORDstring)



As always, very grateful for any help.

GregL

You could get started by taking a look at the 'InString' function in the MASM32 library. The source code for it is in 'C:\Masm32\M32LIB'.
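As a rough illustration of that suggestion, here is a minimal sketch of the original task built on InString. It is not the library's own code. It assumes a standard MASM32 install (masm32rt.inc), that InString takes a 1-based start position and returns a 1-based match position with zero or a negative value on failure (worth checking against the source in M32LIB), and that every keyword in the list is followed by =" and a closing quote. The pDest parameter and the sample data are hypothetical. Note that each DWORD parameter simply carries the address of a zero-terminated string.

```asm
; Sketch only: build as a console app so StdOut has somewhere to write.
include \masm32\include\masm32rt.inc

.data
szList   db 'myUrl="http://someurl.com", myUrl2="http://someOtherUrl"', 0
szKey    db "myUrl", 0

.data?
szResult db 256 dup(?)

.code
; pDest (hypothetical parameter) must be at least as large as the list.
; Returns eax = address just past the closing quote, or 0 if not found.
FindUrlInString proc uses esi edi pUrlList:DWORD, pKeyWord:DWORD, pDest:DWORD
    invoke InString, 1, pUrlList, pKeyWord  ; assumed 1-based; <= 0 means no match
    cmp eax, 0
    jle not_found
    mov esi, pUrlList
    lea esi, [esi+eax-1]        ; esi -> first character of the match
    invoke lstrlen, pKeyWord    ; eax = keyword length (esi/edi survive API calls)
    lea esi, [esi+eax+2]        ; skip the keyword and the two characters ="
    mov edi, pDest
copy_loop:
    mov al, [esi]
    test al, al
    jz done                     ; list ended before a closing quote
    inc esi
    cmp al, '"'
    je done                     ; esi already points past the quote
    mov [edi], al
    inc edi
    jmp copy_loop
done:
    mov byte ptr [edi], 0       ; terminate the extracted substring
    mov eax, esi                ; where the caller can resume searching
    ret
not_found:
    xor eax, eax
    ret
FindUrlInString endp

start:
    invoke FindUrlInString, ADDR szList, ADDR szKey, ADDR szResult
    .if eax != 0
        invoke StdOut, ADDR szResult
    .endif
    invoke ExitProcess, 0
end start
```

Returning the resume address instead of modifying the list in place leaves the main string untouched. The caller can pass the returned pointer back in as pUrlList to find the next URL.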

tanto

Thanks. Now I see where to look. I have to apologize for not knowing about this source of information. It really solves all of the problems I had above. Sorry if I wasted anyone's time.

Mark Jones

I'm curious, what are some methods to both A) concatenate multiple strings and B) reset strings? For example, say you want a program to do this:

* Create a diagnostic window
* Read a line from an .ini file
* Parse version data, display in diagnostic window
* Read a line from the .ini file
* Parse date and time, display in diagnostic window
* ... etc
* close files, reset string pointers so this whole process can be done again

I've botched together enough calls to make this work, but it doesn't seem very efficient (or pretty):


Push Dword Ptr pProgressText ; save our original accumulator string offset!
szText t1, "Reading .ini file... ",13,10,0
Invoke lstrcat, Addr pProgressText, Addr t1 ; add text to blank string
Invoke SetWindowText, hEdit, Addr pProgressText ; display initial text
; do some stuff here...
szText t2, "Checking version... ",13,10,0
Invoke lstrcat, Addr pProgressText, Addr t2 ; add text
Invoke SetWindowText, hEdit, Addr pProgressText ; two lines of text now
; do some stuff here...
szText t3, "Loading parameters... ",13,10,0
Invoke lstrcat, Addr pProgressText, Addr t3 ; three lines of diagnostics
Invoke SetWindowText, hEdit, Addr pProgressText
; finish doing stuff here
Pop Dword Ptr pProgressText ; reset our string pointer offset!


I've had bad luck with SetDlgItemText, and the documentation for SetWindowText says it will work fine for controls, so that's what I've been using. Still, I seem to recall some API having to do with multiple string concatenation, but I can't find anything now. Any ideas?

What ends up happening on the second run is that the display window starts off with four high-ASCII characters, as if the pProgressText pointer were being corrupted, overwritten, or not pointed to properly. (Hmm, maybe simply writing four nulls to [pProgressText] at the beginning of the procedure will fix it. Maybe it DOES contain data? Still, that doesn't explain why it would be high-ASCII.)

Thanks for looking, I hope you have a nice weekend.
-Mark
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

pbrennick

Mark,
When using lstrcpy and lstrcat, make very sure that your strings are null-terminated on the source side.  The function will take care of the destination side.  If you are getting high-ASCII values, you may be walking on the wild side!  :naughty:

You will get there,
Paul

tenkey

Does the label pProgressText mark the start of your text buffer, or is it the label for a pointer variable?

If it's a pointer variable, remove the ADDR in front of pProgressText in your INVOKEs.
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

Mark Jones

Thanks Paul, yes, I've seen what happens when strings are not null-terminated. It always results in a c-c-c-craaaash!  :lol

Tenkey, the Dword Ptr to pProgressText is actually wrong, I think, since pProgressText is a 256-byte string. i.e.,


.data?
pProgressText  DB  255 DUP(?)


But neither Dword Ptr nor Byte Ptr seemed to make any difference. Hmm, maybe converting it to a DD will help. :)

tenkey

Then you need to keep the ADDR operator.
Using DD with the same DUP will simply quadruple the space set aside for your string.

The string buffer, pProgressText, will not be reset after use. It will still contain the last text that was created in it. lstrcat will just keep adding text to the end - add enough text and you have buffer overflow, which can affect other data. lstrcpy will reset the buffer.

The PUSH and POP of pProgressText makes no sense. You are only saving and restoring the first four bytes. If you are storing pointers in pProgressText because of other code, you need to ask yourself why you are using one label for two different purposes.

Mark Jones

Well, as I understand it, pProgressText (and any other var) is simply a pointer to flat data. I've triple-checked other code to make sure it is not overwriting the first four bytes at offset pProgressText. My thinking on using Push/Pop was that perhaps something was making pProgressText decrement by four and hence show some other previous data, so I thought saving and restoring the "good" pProgressText offset might fix the problem. (Push/Pop doesn't push the contents of vars, does it?)

Can you use lstrcpy with a null to reset pProgressText? Maybe that is all that needs to be done. :)

tenkey

You misunderstand the nature of labels - they aren't variables. The variable is the memory bytes that contain the data. As a label, "pProgressText" is simply the name of a location - "X marks the spot". Its MASM "type" is the data type that is expected to be stored at that location. (You can lie.) Your code cannot change the address (= location) assigned to the label pProgressText. It can only change the data stored at that address.

PUSH will push 4 bytes of data. It will not push an entire string, if it's longer than four bytes.
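The difference between a label, a register copy of its address, and a separate pointer variable can be sketched as follows. This is only an illustration: szText and pCursor are hypothetical names, and masm32rt.inc is assumed for the usual includes.

```asm
include \masm32\include\masm32rt.inc

.data
szText   db "abc", 0          ; the label szText names this address; it never moves
pCursor  dd OFFSET szText     ; a separate 4-byte variable that holds an address

.code
start:
    mov eax, OFFSET szText    ; eax = address of 'a' (a copy of the label's value)
    inc eax                   ; eax now points at 'b'; szText itself is unchanged
    inc pCursor               ; a pointer variable in memory can be advanced too
    push dword ptr szText     ; pushes the first 4 bytes of data, not an address
    pop ecx                   ; ecx = 00636261h, i.e. 'a','b','c',0 in little-endian order
    invoke ExitProcess, 0
end start
```

Only eax and pCursor change here. The address that the assembler assigned to szText is fixed at build time.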

You can either use lstrcpy without nulls, or null the destination buffer before lstrcat.

; First, and preferred, option - copy

invoke lstrcpy, ADDR pProgressText, ADDR t1

; Second option - catenation

mov    pProgressText,0    ; clear the buffer by storing final 0 byte
invoke lstrcat, ADDR pProgressText, ADDR t1


There is no Win32 API for multiple concatenation, only for appending one string at the end of another - and that is lstrcat, the "concatenation" API. lstrcat requires both strings to be properly terminated. Be sure you have enough space for the resulting string.
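One way to guard against the overflow mentioned earlier is to check the combined length before appending. The following is a minimal sketch under stated assumptions: AppendChecked and szProgress are hypothetical names, masm32rt.inc supplies the includes, and the 256-byte buffer size is only an example.

```asm
include \masm32\include\masm32rt.inc

.data
t1 db "Reading .ini file... ", 13, 10, 0
t2 db "Checking version... ", 13, 10, 0

.data?
szProgress db 256 dup(?)

.code
; Appends pText to pBuf only if the result, including the terminator,
; fits in cbBuf bytes. Returns eax = 1 on success, 0 if it would not fit.
AppendChecked proc uses ebx pBuf:DWORD, cbBuf:DWORD, pText:DWORD
    invoke lstrlen, pBuf
    mov ebx, eax                ; ebx = current length of the buffer text
    invoke lstrlen, pText
    lea eax, [ebx+eax+1]        ; bytes needed, including the final 0
    cmp eax, cbBuf
    ja too_long
    invoke lstrcat, pBuf, pText
    mov eax, 1
    ret
too_long:
    xor eax, eax
    ret
AppendChecked endp

start:
    invoke lstrcpy, ADDR szProgress, ADDR t1    ; the copy resets the buffer on every run
    invoke AppendChecked, ADDR szProgress, SIZEOF szProgress, ADDR t2
    invoke StdOut, ADDR szProgress
    invoke ExitProcess, 0
end start
```

Starting each pass with lstrcpy means no PUSH/POP or manual clearing is needed between runs.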

hutch--

Alex Yakubchik (The Svin) solved that problem with the szMultiCat procedure that is contained in the MASM32 library. Allocate a buffer big enough and add multiple strings one after another.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php
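A hedged sketch of how szMultiCat might be called for the three progress lines above. The argument order (string count, destination buffer, then the string addresses) is an assumption that should be verified against the MASM32 library help before use. szProgress and the buffer size are hypothetical.

```asm
include \masm32\include\masm32rt.inc

.data
t1 db "Reading .ini file... ", 13, 10, 0
t2 db "Checking version... ", 13, 10, 0
t3 db "Loading parameters... ", 13, 10, 0

.data?
szProgress db 256 dup(?)

.code
start:
    mov byte ptr szProgress, 0   ; start from an empty string on every run
    ; Assumed argument order: count, destination buffer, then the strings.
    invoke szMultiCat, 3, ADDR szProgress, ADDR t1, ADDR t2, ADDR t3
    invoke StdOut, ADDR szProgress
    invoke ExitProcess, 0
end start
```

The buffer still has to be large enough for all of the strings plus the terminator, since the routine is not told the buffer size.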

Mark Jones

Aaaha... I thought that since you could mov a pointer value into EAX and increment EAX to write sequential bytes to a string, that the pointer offset itself was incrementable. Guess not!  ::)

Thanks hutch, that was the routine I was looking for.  :U