News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Partial string matching algorithm.

Started by hutch--, August 06, 2006, 09:48:46 AM

Previous topic - Next topic

hutch--

This one seems to be working OK but I have not exhaustively tested it yet. It does beginnings, middle and ends with no problem so far. File is also attached below.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    pmatch PROTO :DWORD,:DWORD

    .data
      sample db "1234567890abcdefghijklmnopqrstuvwxyz",0

      item1 db "12**56**",0         ; test start
      item2 db "**3456**",0
      item3 db "**uv**yz",0         ; test end
      item4 db "*tu*wx**",0
      item5 db "*9*****e*",0        ; test middle section
      item6 db "*456789*****efgh*****nop***",0
      item7 db "mismatch",0         ; test mismatch
      item8 db "k*m*o*q*s*",0

      align 4

      array dd item1,item2,item3,item4,item5,item6,item7,item8

    .code

start:
   
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call main
    inkey
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

main proc

    cls

    push esi
    push edi

    mov esi, OFFSET array
    mov edi, 8

  @@:
    fn pmatch,ADDR sample,[esi]
    .if eax == -1
      print str$(eax)," No match",13,10
    .else
      push eax
      print "Matched at OFFSET "
      pop eax
      print str$(eax),13,10
    .endif
    add esi, 4
    sub edi, 1
    jnz @B

    pop edi
    pop esi

    ret

main endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

pmatch proc psrc:DWORD,ppatn:DWORD

comment @ ----------------------------------------
        Partial string matching algorithm.
        Wildcard character is fixed at "*".
        Leading, mixed and trailing * allowed.

        Return values
        =============
        -1 NO match
        0 and greater = zero based offset of match
        ---------------------------------------- @

    LOCAL lead  :DWORD          ; number of filler chars at start
    LOCAL lsrc  :DWORD          ; source length in bytes
    LOCAL lptn  :DWORD          ; pattern length in bytes
    LOCAL mptn  :DWORD          ; matching pattern length
    LOCAL quit  :DWORD          ; exit offset for main loop

    push ebx
    push esi
    push edi

    invoke StrLen,ppatn
    mov lptn, eax               ; raw patn length
    invoke StrLen,psrc
    mov lsrc, eax               ; source length

    mov esi, psrc
    mov edi, ppatn
    sub edi, 1

  ; ----------------------------------------
  ; scan beginning of string for lead length
  ; ----------------------------------------
    mov lead, 0
    jmp @F
  lbl0:
    add lead, 1                 ; count the lead filler chars
  @@:
    add edi, 1                  ; increment EDI up to 1st non blank char
    cmp BYTE PTR [edi], "*"
    je lbl0
  ; ----------------------------------------

    mov eax, lptn
    sub eax, lead               ; sub lead from patn length
    mov mptn, eax               ; calc the len of the pattern to match
    sub mptn, 1

    mov bl, [edi]               ; 1st non filler char in BL
    add esi, lead               ; start counting the correct number
    sub esi, 1                  ; of characters in from start

  ; -----------------------
  ; calculate "quit" offset
  ; -----------------------
    mov eax, lsrc
    sub eax, lptn
    add eax, lead
    add eax, esi
    mov quit, eax
  ; -----------------------

  ; ****************************************************************

  mainloop:
    add esi, 1
    cmp [esi], bl               ; scan source for 1st non filler char
    je @F
    cmp esi, quit               ; test against exit offset
    jle mainloop
    jmp nomatch

  @@:
    mov edx, -1                 ; set index to -1
  matchloop:
    add edx, 1                  ; increment the index
    cmp edx, mptn               ; check against pattern length
    jg matched
    mov cl, [edi+edx]           ; load current byte in pattern
    cmp cl, 42                  ; test if its a filler "*"
    je matchloop
    cmp cl, [esi+edx]           ; else test if it matches
    je matchloop                ; at next character
    jmp mainloop

  ; ****************************************************************

  matched:
    sub esi, psrc
    sub esi, lead
    mov eax, esi
    jmp outa_here

  nomatch:
    mov eax, -1

  outa_here:
    pop edi
    pop esi
    pop ebx

    ret

pmatch endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start

[attachment deleted by admin]
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Ghirai

MASM32 Project/RadASM mirror - http://ghirai.com/hutch/mmi.html

James Ladd

Hutch,
Interesting and nice work.
Can you explain the algo a little for those who are challenged at reading the code?
There is also some information in the books by Knuth that could help make this faster, if that is a goal.

hutch--

James,

I am not sure there is much to add, I made a point of properly commenting the code so it could be understood. Its a normal left to right byte scanner with a main loop for scanning for the first non filler character which branches to a matching loop that handles both filler characters and normal characters to test for a match.

You may find info in Knuth on an alternative partial matching algorithm but I doubt his work is much use for an assembler scanner of this type. I have not made any sp[ecial effort to speed optimise this algo yet but there is not all that much room with an algo of its type to make it all that much faster.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

M4D45M

i tried to rip a function which was coded in c. (the algo. is free)
but it somehow messes up the stack, i dunno why, i couldn't figure it out!

may someone plz also debug it, i think it shouldn't be hard to find out what's the problem, i failed.

my code was like:

pusha
;..
fn strcmpwildcard, "?est*", "test001"
;..
popa
ret

the stack value which the 'ret' instruction' took was zero, so the program crashed.

and i don't think that the stack pointer is wrongly shifted, otherwise the 'strcmpwildcard'-function would
not return properly.

would be great if we could get the function run coz it is very powerful!
(see the documentation)



[attachment deleted by admin]

hutch--

Why not do it the easy way and build the C code. Attachment below is the original as is built in VC. I have no idea if the algo works or not but it builds witjout error.

[attachment deleted by admin]
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

lingo

 :lol

It is my working code:


.data
FileName        db "AsmWe12m",0,0,0       ; file name without extension
FilterFileName db "?e*a?re",0           ; user defined file filter

; FilterFileName is in lower case !!!
; You can use only one "*" char and many "?" simultaneously in the same filter string

;mov eax, offset FilterFileName ; Filter  FileName filter
;mov edx, offset FileName         ; edx->offset FileName
;call CheckFilter

;On exit: eax = 0 -> No Match
;         eax # 0 -> Match


Align 16
CheckFilter:
mov ch, byte ptr [edx] ; edx->lp to FileName
add edx, 1
mov cl, byte ptr [eax] ; eax->lp to Filter
add eax, 1
cmp ch, 41h
jl L_1L
cmp ch, 5Ah
jnle L_1L                     
or ch, 20h
L_1L:
cmp cl, ch
je L_2L
cmp cl, "*"
je L_4L
cmp cl, "?"
je L_3L
xor eax, eax         ; No Match
ret
;......... cl = ch
L_2L:
test cl, cl
jne CheckFilter
ret                 ; Match
;......... cl != ch
L_3L:
test    ch, ch                  ; it is ? from left
jne CheckFilter
xor eax, eax         ; No Match
ret
L_4L:
;.......... cl = *
mov cl, byte ptr [eax] ; eax->lp to Filter
add eax, 1
test cl, cl
je Match
test ch, ch
je NotMatch
;..........cl = *         ; from right
cmp cl, ch
jne L_2R
Lopp:
test cl, cl
je Match
mov ch, byte ptr [edx] ; edx->lp to FileName
add edx, 1
mov cl, byte ptr [eax] ; eax->lp to Filter
add eax, 1
cmp ch, 41h
jl L_1R
cmp ch, 5Ah                ; "?o*?",0    3 chars only   
jnle L_1R                     
or ch, 20h               
L_1R:
cmp cl, ch
je Lopp
;......... cl != ch not equal
L_2R:
cmp cl, "?"
je L_3R
add cl, ch                 ; change with add cl, ch
jne NotMatch         ; eax -> 0 or !=0
Match:
ret
Align 16
L_3R:
test    ch, ch               
jne Lopp
NotMatch:
xor eax, eax
ret


Regards,
Lingo



M4D45M

@hutch,
how to get a .obj to a .lib ?

Polizei

Yo don't need to make the object file a library. You can simply link the object file together with the main object file of your program. For example:
\MASM32\BIN\LINK.EXE /OUT:Program.EXE Program.OBJ Algo.OBJ
You only need to add the name of the object file in the line of the batch file that is assemblying and linking your program.

M4D45M

tell me something i don't know.
it's not that i'm a noob just coz i haven't done as many posts on this board as some really cool guys.

i asked for how to get it to a lib coz including the .obj requires to change
my build.bat for everytime or the compiling options of my IDE for every project.
i just want a lib. btw i have had the .obj already, how else could i have ripped the code,
obvious, huh?

Polizei

Well, just start a project for a library and export the procedure in the library, then when linking the library just link the .obj file in it and everything will be done IMHO...

.486
.model flat, stdcall
option casemap: none
end

compile the empty source as library, then link both object files with the specific command line, and just write a include file for the library, e.g.

MatchSomething PROTO : DWORD, : DWORD

I think that this should work :)

Regards,
Polizei.Co

P.S. Sorry if I'd expressed myself making you look like noob ;(

Mark Jones

Quote from: madasm on August 11, 2006, 11:42:45 AM
i asked for how to get it to a lib coz including the .obj requires to change my build.bat for everytime or the compiling options of my IDE for every project.i just want a lib. btw i have had the .obj already, how else could i have ripped the code, obvious, huh?

Try opening a command prompt and type:


    lib /out:MyLib.lib File1.obj File2.obj File3.obj
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

M4D45M

thanx very much, i'll try this as soon as possible.