The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: M4D45M on August 05, 2006, 01:00:32 PM

Title: wildcard string pattern matching ?
Post by: M4D45M on August 05, 2006, 01:00:32 PM

are there functions to compare a string to a wildcard pattern?
can anyone plz point me somewhere?

example: does string "www.masm32.com/index.htm" match pattern "*.masm*.com/index.???"

wildcard | character count
------------------------------------
     '*'    =  [ 0 ; +infinite ]
     '?'    =  1
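
For reference, the rules in the table above can be sketched as a small recursive matcher in C. This is only an illustration of the '*' and '?' semantics as stated in the question, not an existing library routine; the name wild_match is made up for this sketch.

```c
/* Hypothetical helper, not from any library.
 * Returns 1 if the WHOLE of text matches pattern, else 0.
 *   '*' matches any run of characters, including none
 *   '?' matches exactly one character                      */
int wild_match(const char *pattern, const char *text)
{
    if (*pattern == '\0')
        return *text == '\0';           /* both must end together */

    if (*pattern == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters of text in turn. */
        for (;;) {
            if (wild_match(pattern + 1, text))
                return 1;
            if (*text == '\0')
                return 0;
            text++;
        }
    }

    /* Literal character or '?': consume one character from each side. */
    if (*text != '\0' && (*pattern == '?' || *pattern == *text))
        return wild_match(pattern + 1, text + 1);

    return 0;
}
```

With this sketch, wild_match("*.masm*.com/index.???", "www.masm32.com/index.htm") returns 1, while the same pattern against "www.masm32.com/index.html" returns 0 because "???" has to cover exactly three characters. The recursion is simple to read but can get slow on patterns with many '*'.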

Title: Re: wildcard string pattern matching ?
Post by: M4D45M on August 05, 2006, 01:03:10 PM
sorry didn't expect the string to turn into an active URI.
::)
Title: Re: wildcard string pattern matching ?
Post by: hutch-- on August 05, 2006, 02:03:58 PM
I don't know of an algo in assembler that does that but it can be written if you know what you are doing. You need to design an algo that handles the filler character as always matched and it will need to be done for the first character as well. Its logic will be something like this.

**my*word***     ; calculate the lead filler characters if present
my*word          ; write algo to handle any gaps of filler characters
and get the length of the string from the first non filler character to the last non filler character.
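
One possible reading of the preprocessing described above, sketched in C rather than assembler: collapse each run of filler characters into one, and measure the span from the first to the last non-filler character. The function names collapse_stars and core_span are invented for this sketch, and treating '*' as the filler character is an assumption taken from the example pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collapse every run of '*' into a single '*',
 * in place. Returns the new length of the pattern. */
size_t collapse_stars(char *pattern)
{
    char *dst = pattern;
    for (const char *src = pattern; *src != '\0'; src++) {
        if (*src == '*' && dst > pattern && dst[-1] == '*')
            continue;                   /* skip a repeated filler */
        *dst++ = *src;
    }
    *dst = '\0';
    return (size_t)(dst - pattern);
}

/* Hypothetical helper: locate the part of the pattern that runs from
 * the first non-'*' character to the last non-'*' character.
 * Stores the offset of that part in *start and returns its length
 * (0 if the pattern consists of filler characters only). */
size_t core_span(const char *pattern, size_t *start)
{
    size_t len = strlen(pattern);
    size_t first = 0;
    size_t last = len;

    while (first < len && pattern[first] == '*')
        first++;                        /* count the lead fillers */
    while (last > first && pattern[last - 1] == '*')
        last--;                         /* drop the trailing fillers */

    *start = first;
    return last - first;
}
```

For "**my*word***", collapse_stars leaves "*my*word*" (length 9), and core_span reports offset 2 and length 7, which is the "my*word" part shown above.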
Title: Re: wildcard string pattern matching ?
Post by: M4D45M on August 05, 2006, 02:14:09 PM
thx hutch.
as i need this algo, i'll code it.
when it is finished i'll post it for discussion/improvement.
Title: Re: wildcard string pattern matching ?
Post by: Polizei on August 08, 2006, 09:07:36 PM
I have coded a wildcard match algo some years ago, but I can't find it in my files ;(
Sorry.
By the way, the idea of creating a library that supports regular expressions search is kinda good :)
Title: Re: wildcard string pattern matching ?
Post by: Tedd on August 09, 2006, 10:13:54 AM
If this is to be used for filtering files, you can simply use FindFirstFile/FindNextFile - which accepts wildcards :wink
Though, yes, this does require the files to exist, and couldn't be used for anything else, so it would still be a useful function :U
Title: Re: wildcard string pattern matching ?
Post by: gwapo on August 09, 2006, 10:22:27 AM
There is a C implementation in this site:
http://user.cs.tu-berlin.de/~schintke/references/wildcards/

You can use it as a reference, either convert it to MASM or see how it performs wildcard pattern matching.

Regards,

-chris
Title: Re: wildcard string pattern matching ?
Post by: gabor on August 09, 2006, 11:12:20 AM
Hello!

I too came up with the idea of creating a regular expression library. So far all I created is the specification based on Unix regular expressions. I have some ideas of implementation too, but I haven't started to code them yet...

If anyone is interested, PM me!

Greets, Gábor
Title: Re: wildcard string pattern matching ?
Post by: M4D45M on August 11, 2006, 11:53:53 AM
as gwapo already realized, there's a good c-code at the URI he specified.

that's actually the code i attached (and tried to rip) in this topic.
http://www.masm32.com/board/index.php?topic=5417.0

but the rip somehow messed up the stack, haven't found out til now.
maybe someone could have a look at it. (by means of debugging)

this algo is also capable of performing ranges like [a-z] and things like that.
have a look at the test report. the algo is beautiful.
http://user.cs.tu-berlin.de/~schintke/references/wildcards/testwildcards.main

so i think there's no need to recode it in asm. this would be to no purpose,
as this c-code worx pretty fine. it has just to be compiled to make it available.
i already did but i don't like the idea to use an .obj, i think it would be more comfortable
to have a lib. but i dunno how to get there.
coz including the .obj requires to change
my build.bat for everytime or the compiling options of my IDE for every project.

collaboration is appreciated. thx
Title: Re: wildcard string pattern matching ?
Post by: M4D45M on August 11, 2006, 12:06:33 PM
tedd i think most of us already did know about FindFirstFile/FindNextFile
and that they are able to perform wildcard matching.
but though they are very limited:
1. these characters are disallowed in file names:  \ / : * ? " < > |
   ( i know, * and ? can't be used anyway)
2. these functions are case-insensitive but the algo is capable of both.
3. you can't perform chexx of list-files (.sfv/.crc/.md5) against (in-memory-)patterns without the files actually being present.
4. in the above case you had to always create a file just to test it. this is comparatively sloooow.
5. the algo is capable of performing range chexx like [a-z]
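
To illustrate points 2 and 5, here is a rough C sketch of a range check with an optional case-insensitive mode. It is not taken from the Schintke code; the name in_char_set, the '!' negation prefix and the ignore_case flag are all assumptions made for this example.

```c
#include <ctype.h>

/* Hypothetical helper: is character c a member of the set described
 * by set? set is the text found between '[' and ']', for example
 * "a-z0-9_". A leading '!' negates the set (an assumption of this
 * sketch). If ignore_case is nonzero, letters compare without case. */
int in_char_set(const char *set, char c, int ignore_case)
{
    int negate = 0;
    int found = 0;
    unsigned char uc = (unsigned char)c;

    if (ignore_case)
        uc = (unsigned char)tolower(uc);

    if (*set == '!') {
        negate = 1;
        set++;
    }

    while (*set != '\0') {
        unsigned char lo = (unsigned char)set[0];
        unsigned char hi = lo;

        if (set[1] == '-' && set[2] != '\0') {
            hi = (unsigned char)set[2]; /* a range such as a-z */
            set += 3;
        } else {
            set += 1;                   /* a single character */
        }

        if (ignore_case) {
            lo = (unsigned char)tolower(lo);
            hi = (unsigned char)tolower(hi);
        }
        if (uc >= lo && uc <= hi)
            found = 1;
    }
    return negate ? !found : found;
}
```

For example, in_char_set("a-z", 'M', 0) is 0 but in_char_set("a-z", 'M', 1) is 1, which is the case-sensitive versus case-insensitive choice that FindFirstFile doesn't offer.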
Title: Re: wildcard string pattern matching ?
Post by: M4D45M on August 11, 2006, 12:37:32 PM
i just got what the problem with the rip is:

the stack cleanup isn't done properly.
there has to be RETN 08h (C2 08 00)
whereas there's a normal RET instruction executed.
RETN 08h will do proper stack cleanup. (that's just for the two passed arguments [the pointers])

as soon as i get time i'll fix my rip and supply you with a working code which is in fact very powerful !
at this point, thanx to the author Florian Schintke
and to all the cool ppl at masm32 for their quality help, comprehension, and time.
soon we'll have a nice algo working!
thx
Title: Re: wildcard string pattern matching ?
Post by: gabor on August 11, 2006, 02:47:39 PM
Hi!


It would be nice to know how fast these library functions are. When I think of regular expression matching I often think of thousands or even millions of rows of text to check against... At such a size the speed of the code is very important.

Greets, Gábor
Title: Re: wildcard string pattern matching ?
Post by: M4D45M on August 12, 2006, 01:25:50 AM
@gabor: it's just a procedure not a lib.
             it isn't capable of regular expressions (let's say it's a lite-version :D) (see original documentation)
             and i dunno how fast it is. in fact i think it isn't very fast but just give it a try.

so here it is, and it worx. (that's enough to fit my needs at the moment)
thx


[attachment deleted by admin]
Title: Re: wildcard string pattern matching ?
Post by: ecube on September 16, 2006, 11:59:48 PM
I didn't write this, and I forget where I got it , but it works nicely, so if you're the author thanks a lot :)


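; Returns eax = 1 if string matches wild, eax = 0 otherwise.
WildMatch proc uses ebx esi edi wild :DWORD, string :DWORD
    mov        ecx, wild                ; ecx -> current pattern character
    mov        edx, string              ; edx -> current string character
    ; Match literally (or via "?") up to the first "*" in the pattern.
    .while BYTE PTR [edx] != 0 && BYTE PTR [ecx] != "*"
        mov        bl, [ecx]
        mov        bh, [edx]
        .if bl != bh && bl != "?"
            xor        eax, eax         ; mismatch before any "*": no match
            ret
        .endif
        inc        ecx
        inc        edx
    .endw

    .while BYTE PTR [edx] != 0
        mov        bl, [ecx]
        mov        bh, [edx]
        .if bl == "*"
            inc        ecx
            mov        bl, [ecx]
            .if bl == 0                 ; trailing "*" matches the rest
                xor        eax, eax
                inc        eax
                ret
            .endif
            mov        esi, ecx         ; remember pattern position after "*"
            mov        eax, edx
            inc        eax              ; and the next string position to retry from
        .elseif bl == bh || bl == "?"
            inc        ecx
            inc        edx
        .else
            mov        ecx, esi         ; mismatch: back up to just after the last "*"
            mov        edx, eax         ; and let "*" absorb one more character
            inc        eax
        .endif
    .endw
   
    .while    BYTE PTR [ecx] == "*"     ; skip any trailing "*"
        inc        ecx
    .endw
   
    xor eax,eax
    mov al,[ecx]
    or al,al
    sete al                             ; eax = 1 only if the pattern is used up too
    ret
WildMatch endp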
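
For anyone who wants to step through the logic in a debugger, here is a C rendering of the same loop structure as the listing above. It is a sketch written from the listing, so the variable names (retry_wild, retry_str) and the function name wild_match_iter are mine; they stand in for esi and eax.

```c
#include <stddef.h>

/* Sketch of the iterative algorithm used by WildMatch above.
 * Note the argument order: pattern first, string second. */
int wild_match_iter(const char *wild, const char *string)
{
    const char *retry_wild = NULL;  /* pattern position after the last '*' (esi) */
    const char *retry_str = NULL;   /* next string position to retry from (eax)  */

    /* Match up to the first '*' with no backtracking. */
    while (*string != '\0' && *wild != '*') {
        if (*wild != *string && *wild != '?')
            return 0;
        wild++;
        string++;
    }

    while (*string != '\0') {
        if (*wild == '*') {
            wild++;
            if (*wild == '\0')
                return 1;           /* trailing '*' matches the rest */
            retry_wild = wild;
            retry_str = string + 1;
        } else if (*wild == *string || *wild == '?') {
            wild++;
            string++;
        } else {
            /* Mismatch: let the last '*' absorb one more character. */
            wild = retry_wild;
            string = retry_str++;
        }
    }

    while (*wild == '*')            /* skip any trailing '*' */
        wild++;
    return *wild == '\0';
}
```

Only one retry point is kept (the most recent '*'), so there is no recursion and no stack growth. If the procedure seems not to work, the argument order is worth checking first, since swapping pattern and string fails without any error.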
Title: Re: wildcard string pattern matching ?
Post by: TmX on September 22, 2006, 09:43:53 AM
Quote from: Polizei on August 08, 2006, 09:07:36 PM
I have coded a wildcard match algo some years ago, but I can't find it in my files ;(
Sorry.
By the way, the idea of creating a library that supports regular expressions search is kinda good :)

Building an efficient regex engine from scratch could be quite tedious
What about PCRE (Perl Compatible Regular Expression) ?
:bg
Title: Re: wildcard string pattern matching ?
Post by: zooba on September 24, 2006, 09:52:17 AM
I have coded this for the next version of ASM Runtime (http://web.aanet.com.au/zooba/projects.htm) (version 0.300 - coming soon  :wink ) and it has worked fine for everything I've tested it on (mostly files, though it should work for any other strings).

The LCase() and StringFree macros are part of ASM Runtime, however, replacing them with your preferred lowercasing function should result in a working function.

(By the way, this does some naughty stack stuff - kids, close your eyes  :toothy )

Cheers,

Zooba :U

FileMatch PROC USES esi edi spFilename:DWORD, spFilter:DWORD
    LOCAL   dwESP:DWORD
    LOCAL   dwMatchAllCount:DWORD
   
    mov     dwMatchAllCount, 0
    mov     esi, LCase(spFilename)
    mov     edi, LCase(spFilter)
    push    esi
    push    edi
   
    mov     dwESP, esp
   
    .while(BYTE PTR [edi] && BYTE PTR [esi])
@LoopStart:
        mov al, [edi]
        .break .if(!al)
        .if(al == '*')
            push    edi
            .repeat
                inc     edi
                mov     al, [edi]
            .until(al != '*')
           
            .while(BYTE PTR [esi] && BYTE PTR [esi] != al)
                inc esi
            .endw
            .break .if(BYTE PTR [esi] == 0)
            inc     dwMatchAllCount
        .elseif(al == '?')
            inc     esi
            inc     edi
            .break .if(BYTE PTR [esi] == 0)
        .else
            .break .if(al != [esi])
            inc     edi
            inc     esi
        .endif
    .endw
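    ; Filename text left over: if a '*' was seen, rewind the filter to the
    ; most recent '*' and rescan from the current filename position.
    .if(BYTE PTR [esi] && dwMatchAllCount)
        dec     dwMatchAllCount
        pop     edi                     ; filter position of the most recent '*'
        jmp     @LoopStart
    .endif
   
    mov     esp, dwESP                  ; discard any '*' positions still on the stack
   
    ; Match only if both the filename and the filter are fully consumed.
    .if(BYTE PTR [esi] || BYTE PTR [edi])
        jmp @NoMatch
    .endif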
       
@Match:
    or      eax, -1
    jmp     @Exit
   
@NoMatch:
    xor     eax, eax
    jmp     @Exit

@Exit:
    pop     edx
    pop     ecx
    StringFree edx, ecx
    ret
FileMatch ENDP
Title: Re: wildcard string pattern matching ?
Post by: mrpink on November 02, 2006, 08:46:22 AM
Small regular expression search engine.
Easy to port to MASM(I hope)

http://board.flatassembler.net/topic.php?t=6127
Title: Re: wildcard string pattern matching ?
Post by: TNick on January 29, 2007, 12:30:09 PM
Didn't have a closer look to see why, but WildMatch doesn't work.

[later]
I have tested strmatchpattern, too, but this one doesn't work, either...
Am I doing something wrong here?

Attached file updated...
[/later]

[attachment deleted by admin]
Title: Re: wildcard string pattern matching ?
Post by: TNick on January 29, 2007, 12:45:07 PM
Need opinions: Does such a function need to return an offset to where the pattern has been found or it's enough to tell us if it has found the pattern or not?

Nick
Title: Re: wildcard string pattern matching ?
Post by: zooba on January 30, 2007, 07:44:32 AM
Quote from: TNick on January 29, 2007, 12:45:07 PM
Need opinions: Does such a function need to return an offset to where the pattern has been found or it's enough to tell us if it has found the pattern or not?

Nick

Up to the creator. It's probably no harder to say where the pattern is, and if you return 0 for no match then you can treat it as a boolean (zero/non-zero) for whether it was found or not.

My one above doesn't, since the point is that the pattern will match from the start of the string. I have written others (and subsequently lost them in a recent hard drive failure) which will return the position of a match and one which would return a linked list of text which matches parameters specified in the pattern string.
Title: Re: wildcard string pattern matching ?
Post by: TNick on January 31, 2007, 07:24:28 AM
Thanks for your reply, zooba!
I have written my own (http://www.masm32.com/board/index.php?topic=6619.0), but I have to disagree with you. Returning the offset means some more code => function will be slower. Maybe two functions - one boolean and the other with offset ... yep, that seems to be the solution. :)

Regards,
Nick
Title: Re: wildcard string pattern matching ?
Post by: zooba on January 31, 2007, 11:58:14 AM
How can you compare a string if you don't know where in memory the string is? If you know where the string is in memory, you can send that back.  :toothy

Okay, it's not quite that simple :P . When you start matching the pattern, store the address of the character you start at. If it matches, return that value, otherwise increment it and start again. Pattern searching simplifies down quite well to precise matching (ie. entire string matches and/or match must begin at the start of the string) in a separate function (which returns a boolean) and a search function (ie. a loop) which calls it.

If we're talking about pattern matching, as opposed to pattern searching, returning a pointer is (excuse the pun) pointless. If you are searching, a pointer is much more useful than a boolean.
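
A rough C sketch of that split, assuming '*' and '?' as the only wildcards: match_here is the boolean function that must match from the given position, and wild_search is the loop around it that remembers where it started. Both names are made up for this example.

```c
#include <stddef.h>

/* Boolean part: does pattern match starting exactly at text?
 * The text is allowed to continue after the matched part. */
static int match_here(const char *pattern, const char *text)
{
    if (*pattern == '\0')
        return 1;                       /* whole pattern consumed */

    if (*pattern == '*') {
        for (;;) {                      /* '*' absorbs 0..n characters */
            if (match_here(pattern + 1, text))
                return 1;
            if (*text == '\0')
                return 0;
            text++;
        }
    }

    if (*text != '\0' && (*pattern == '?' || *pattern == *text))
        return match_here(pattern + 1, text + 1);

    return 0;
}

/* Search part: try each start position in turn and return a pointer
 * to the first one that matches, or NULL if none does. */
const char *wild_search(const char *pattern, const char *text)
{
    const char *start = text;           /* where the current attempt began */

    for (;;) {
        if (match_here(pattern, start))
            return start;
        if (*start == '\0')
            return NULL;
        start++;
    }
}
```

Because NULL is zero, the caller can still treat the result as a boolean, and the pointer comes for free since the search loop has to keep the start position anyway.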

Cheers,

Zooba :U
Title: Re: wildcard string pattern matching ?
Post by: TNick on January 31, 2007, 05:32:24 PM
Quote from: zooba on January 31, 2007, 11:58:14 AM
How can you compare a string if you don't know where in memory the string is? If you know where the string is in memory, you can send that back.  :toothy
:lol :lol :lol

I will try to add this to WldMatch. However, when I first tried to do that, things were becoming messy. But, if you say it's easy, I should give it a second chance.

Thanks for your reply!
Nick
Title: Re: wildcard string pattern matching ?
Post by: zooba on January 31, 2007, 08:31:15 PM
I've had a look at WldMatchA and I believe you when you said it got messy  :green  My take on pattern searching is that when you start matching, you need to know where you started from so you can go back there if the match fails. The upside of that approach is that the pointer to the start of the match is readily available.

(I have continued this post in your other (http://www.masm32.com/board/index.php?topic=6619.0) thread where it is more relevant)