Parsing string(s)

Started by Ksbunker, October 28, 2007, 04:22:00 AM

Ksbunker

I seem to be having trouble parsing a string for the second-to-last word.
I've been trying this, but it just does not work. Any help would be appreciated, cheers

Grab PROC

.data

szFormat db "%s", 0
string db "anything THIS! blah", 0

.code

Invoke lstrlen, addr string
mov ecx, eax

lea eax, offset string
@@:
cmp byte ptr [eax+ecx], " "
je @F
loop @b

@@:
mov byte ptr [eax+ecx], 0

@@:
cmp byte ptr [eax+ecx], " "
je @F
loop @B

@@:
inc ecx
add eax, ecx

Invoke wsprintf, ADDR szBuffer, addr szFormat, eax

ret
Grab ENDP

MichaelW

One easy method is to use the CRT strtok function to parse the string, storing the address of each word in a pointer array, and then use a zero-based index into the array to access the individual words. Note that strtok alters the string, replacing the delimiters with nulls.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      ptrs    dd 20 dup(0)
      count   dd 0
      string  db "anything THIS! blah", 0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    ; Tokenize the string using a space character as a
    ; delimiter, store the address of each token into
    ; the pointer array, and update the count.

    mov ebx, OFFSET ptrs
    invoke crt_strtok, ADDR string, chr$(" ")
    .WHILE eax
      mov [ebx], eax
      inc count
      add ebx, 4
      invoke crt_strtok, NULL, chr$(" ")
    .ENDW
    print ustr$(count),13,10

    ; Display the tokens from first to last.

    xor ebx, ebx
    .REPEAT
      mov eax, [ptrs+ebx*4]
      print eax,13,10
      inc ebx
    .UNTIL ebx == count

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
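
For reference, the same strtok approach can be sketched in C, going one step further and picking out the second-to-last word that the original question asks for. This is only a sketch under stated assumptions: the helper name second_last_word and the MAX_TOKENS limit are made up for illustration (the limit mirrors the 20-entry ptrs array), and the input must be a writable buffer because strtok modifies it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 20   /* assumption: same capacity as the ptrs array */

/* Hypothetical helper: tokenizes s in place on spaces and returns a
   pointer to the second-to-last token, or NULL if there are fewer
   than two tokens. Tokens beyond MAX_TOKENS are ignored. */
static char *second_last_word(char *s)
{
    char *ptrs[MAX_TOKENS];
    int count = 0;
    char *tok;

    assert(s != NULL);

    tok = strtok(s, " ");
    while (tok != NULL && count < MAX_TOKENS) {
        ptrs[count++] = tok;        /* store the address of each token */
        tok = strtok(NULL, " ");
    }

    /* With a zero-based array the last word is ptrs[count - 1],
       so the second-to-last is ptrs[count - 2]. */
    return (count >= 2) ? ptrs[count - 2] : NULL;
}
```

Called on a writable copy of "anything THIS! blah", the helper returns a pointer to "THIS!". In the assembly version the equivalent lookup would be the ptrs entry at index count-2.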

BTW, all three of the following instructions will load the address of string into eax, but you normally would not use the offset operator in the source operand for lea. I initially expected the first form to be rejected, but when I attempted to assemble it MASM produced the same output as it did for the second form.

lea eax, offset string
lea eax, string
mov eax, offset string

eschew obfuscation

ToutEnMasm

Hello,
I won't answer the question directly, but here are some pointers.

The loop instruction is very slow and can be replaced by "dec ecx" followed by "jnz".
If you use ecx as an index, you must start your search at the end of the string: [eax+ecx] must point at the terminating zero.
Don't forget that loop decrements ecx, and that the end of the search doesn't point at what you searched for but at the character before it.
wsprintf is deprecated (buffer overrun problems) and should be replaced by the approach shown at http://www.masm32.com/board/index.php?topic=8022.msg58718#msg58718
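
The backward scan described in these points can be written out in C as a reference to check the assembly against. It is a rough sketch, not the original poster's routine: the name second_last_backward is hypothetical, the index i plays the role of ecx, and it assumes words are separated by single spaces with no trailing space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scans backward from the terminating zero,
   cuts off the last word in place, and returns a pointer to the
   second-to-last word. Returns NULL if the string has no space. */
static char *second_last_backward(char *s)
{
    size_t i;

    assert(s != NULL);

    i = strlen(s);                  /* s[i] is the zero, like [eax+ecx] */

    /* First pass: walk back to the last space. */
    while (i > 0 && s[i] != ' ')
        i--;
    if (s[i] != ' ')
        return NULL;                /* only one word, nothing to return */
    s[i] = '\0';                    /* terminate the second-to-last word */

    /* Second pass: walk back to the previous space or the string start. */
    while (i > 0 && s[i - 1] != ' ')
        i--;
    return &s[i];
}
```

The explicit i > 0 checks are the point of the sketch: they stop the index from running off the front of the string when no space is found, which is the case a bare loop instruction does not distinguish from a match.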

If you want to see the string starting at the position of the pointer, MessageBox is enough.
The StringCbPrintfEx function is useful when you want to format numbers as ASCII text or when you want to format strings.

A more readable loop is

lea eax,string
@@:
.if byte ptr [eax] == 0             ; no need for lstrlen
    ; end of string reached without finding a space
    ; (handle the not-found case here)
.elseif byte ptr [eax] == " "
    jmp @F                          ; space found, eax points at it
.else
    ; not a space, advance to the next character
    inc eax
    jmp @B
.endif
@@:

hutch--

Ksbunker,

Have a look at the tokeniser in the masm32 library. The wtok procedure is an in-place tokeniser and is genuinely fast.

1. Grab the line containing the word you want.
2. Pass it through the tokeniser.
3. Read the return value for the word count.
4. Get the word that is one less than the count.
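
The four steps above can be sketched in C with a small hand-written in-place tokenizer. This is an illustration only and does not reproduce the actual interface of wtok: the function name tokenize_in_place and its parameters are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-place tokenizer (not the masm32 wtok interface):
   overwrites each space with a zero, stores the start of each word in
   words[], and returns the number of words stored (at most max_words). */
static size_t tokenize_in_place(char *line, char **words, size_t max_words)
{
    size_t count = 0;
    char *p = line;

    assert(line != NULL && words != NULL);

    while (*p != '\0') {
        while (*p == ' ')           /* turn delimiters into terminators */
            *p++ = '\0';
        if (*p == '\0' || count == max_words)
            break;
        words[count++] = p;         /* remember where this word starts */
        while (*p != '\0' && *p != ' ')
            p++;                    /* skip to the end of the word */
    }
    return count;
}
```

If the words are numbered from one, the word "one less than the count" is the second-to-last; in a zero-based array like words[] that is words[count - 2], so the caller should check that count is at least 2 first.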

Mark Jones

Quote from: Ksbunker on October 28, 2007, 04:22:00 AM
Invoke lstrlen, addr string
mov ecx, eax

lea eax, offset string
@@:
cmp byte ptr [eax+ecx], " "
je @F
loop @b

@@:


Hello, here are some notes about this code:


    invoke lstrlen,addr string          ; get length of string
    mov ecx,eax                         ; copy full length into ecx
    lea eax,offset string               ; load address of string into eax
@@:                                     ; loop point here
    cmp byte ptr [eax+ecx]," "          ; is byte at [string address+length] == " "?
    je @F                               ; if it is, jump forward
    ; here needs to be what? a decrement of ecx to search backwards?
    ; here also needs to be what? what happens if you land on a null or if ecx overflows?
    loop @b                             ; else loop
@@:                                     ; match occurred



What might be simpler is to perform the search forwards. Consider something like the following, which copies the string and replaces each space with a "+":

.data
    szSource    db  "Hello world! Masm Forum is great!",0
.data?
    szDest      db  256 dup(?)          ; big enough to hold our string
.code
    ; then set esi and edi to source and destination offsets
    lea esi,szSource
    lea edi,szDest
go: cmp byte ptr[esi],00                ; loop here, always check for end-of-string
    jz breakout                         ; once null is found, break out
    cmp byte ptr[esi]," "               ; ok not a null, is it a space?
    jz @F                               ; if a space, skip next instruction
    movsb   ; copies a byte from [EDI]<--[ESI] and increments both by 1
    jmp go                              ; and loop
@@: movsb                               ; else it was a space... copy it and
    mov byte ptr[edi-1],"+"             ; overwrite copied space with new + char
    jmp go                              ; and loop
breakout:                               ; null found in source string
    movsb                               ; must copy null to dest!
    ; now we're done...
    ; view both of the strings:
    invoke MessageBox,0,addr szDest,addr szSource,MB_OK
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08