wildcard string pattern matching ?

Started by M4D45M, August 05, 2006, 01:00:32 PM

M4D45M


are there functions to compare a string to a wildcard pattern?
can anyone plz point me to somewhere.

example: does string "www.masm32.com/index.htm" match pattern "*.masm*.com/index.???"

wildcard | characters matched
---------+-------------------
   '*'   | 0 or more
   '?'   | exactly 1


M4D45M

sorry didn't expect the string to turn into an active URI.
::)

hutch--

I don't know of an algo in assembler that does that but it can be written if you know what you are doing. You need to design an algo that handles the filler character as always matched and it will need to be done for the first character as well. Its logic will be something like this.

**my*word***     ; calculate the lead filler characters if present
my*word          ; write algo to handle any gaps of filler characters
                 ; and get the length of the string from the first non filler
                 ; character to the last non filler character.
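
One possible reading of the pre-processing described above, sketched in C rather than assembler. The function names (`collapse_stars`, `min_match_len`, `can_possibly_match`) are made up for illustration and are not from any library. The sketch only normalises the filler characters and computes the shortest string the pattern could match; it is a cheap pre-check, not a full matcher.

```c
#include <stddef.h>
#include <string.h>

/* Collapse every run of '*' in `pattern` into a single '*', in place.
   "**my*word***" becomes "*my*word*". Returns the new length. */
size_t collapse_stars(char *pattern)
{
    char *out = pattern;
    for (const char *in = pattern; *in != '\0'; ++in) {
        if (*in == '*' && out != pattern && out[-1] == '*')
            continue;               /* skip a repeated filler character */
        *out++ = *in;
    }
    *out = '\0';
    return (size_t)(out - pattern);
}

/* Smallest string length the pattern can match: every character
   other than '*' consumes exactly one character of the input. */
size_t min_match_len(const char *pattern)
{
    size_t n = 0;
    for (; *pattern != '\0'; ++pattern)
        if (*pattern != '*')
            ++n;
    return n;
}

/* Cheap pre-check: a string shorter than the minimum can never match. */
int can_possibly_match(const char *pattern, const char *s)
{
    return strlen(s) >= min_match_len(pattern);
}
```

Running `collapse_stars` once up front means the matching loop never has to deal with adjacent fillers, and `can_possibly_match` rejects strings that are too short before any character comparison is done.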

M4D45M

thx hutch.
as i need this algo, i'll code it.
when it is finished i'll post it for discussion/improvement.

Polizei

I have coded a wildcard match algo some years ago, but I can't find it in my files ;(
Sorry.
By the way, the idea of creating a library that supports regular expressions search is kinda good :)

Tedd

If this is to be used for filtering files, you can simply use FindFirstFile/FindNextFile, which accept wildcards.
Though, yes, this does require the files to exist, and couldn't be used for anything else, so it would still be a useful function.

gwapo

There is a C implementation on this site:
http://user.cs.tu-berlin.de/~schintke/references/wildcards/

You can use it as a reference, either convert it to MASM or see how it performs wildcard pattern matching.

Regards,

-chris

gabor

Hello!

I too came up with the idea of creating a regular expression library. So far all I created is the specification based on Unix regular expressions. I have some ideas of implementation too, but I haven't started to code them yet...

If anyone is interested, PM me!

Greets, Gábor

M4D45M

as gwapo already realized, there's a good c-code at the URI he specified.

that's actually the code i attached (and tried to rip) in this topic.
http://www.masm32.com/board/index.php?topic=5417.0

but the rip somehow messed up the stack, haven't found out til now.
maybe someone could have a look at it. (by means of debugging)

this algo is also capable of performing ranges like [a-z] and things like that.
have a look at the test report. the algo is beautiful.
http://user.cs.tu-berlin.de/~schintke/references/wildcards/testwildcards.main

so i think there's no need to recode it in asm. this would be to no purpose,
as this c-code worx pretty fine. it has just to be compiled to make it available.
i already did but i don't like the idea to use an .obj, i think it would be more comfortable
to have a lib. but i dunno how to get there.
coz including the .obj requires changing
my build.bat every time, or the compiling options of my IDE for every project.

collaboration is appreciated. thx

M4D45M

tedd i think most of us already did know about FindFirstFile/FindNextFile
and that they are able to perform wildcard matching.
but they are very limited:
1. characters are disallowed for files  \ / : * ? " < > |
   ( i know, * and ? can't be used anyway)
2. these functions are case-insensitive but the algo is capable of both.
3. you can't perform chexx of list-files (.sfv/.crc/.md5) against (in-memory-)patterns without the files actually being present.
4. in the above case you had to always create a file just to test it. this is comparatively sloooow.
5. the algo is capable of performing range chexx like [a-z]
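
To make points 2 and 5 concrete, here is a minimal C sketch of a matcher that handles `*`, `?`, simple sets such as `[a-z]`, and an optional case-insensitive mode. It is not Florian Schintke's code and makes no claim to match its behaviour; `wild_match_ex`, `match_set` and `fold` are hypothetical names, and negated sets like `[!a-z]` are left out on purpose to keep it short.

```c
#include <ctype.h>

/* Fold a character to lower case when ignore_case is set. */
static int fold(int c, int ignore_case)
{
    return ignore_case ? tolower((unsigned char)c) : (unsigned char)c;
}

/* Match character c against a set such as "[a-z0-9_]".
   *pp points at the '['; on a match it is advanced just past the ']'.
   Returns 1 on a match, 0 on no match, -1 if the set is unterminated. */
static int match_set(const char **pp, int c, int ignore_case)
{
    const char *p = *pp + 1;                    /* skip '[' */
    int hit = 0;
    c = fold(c, ignore_case);
    while (*p != '\0' && *p != ']') {
        int lo = fold(*p, ignore_case), hi = lo;
        if (p[1] == '-' && p[2] != '\0' && p[2] != ']') {
            hi = fold(p[2], ignore_case);       /* range "lo-hi" */
            p += 2;
        }
        if (c >= lo && c <= hi)
            hit = 1;
        ++p;
    }
    if (*p != ']')
        return -1;
    *pp = p + 1;
    return hit;
}

/* Recursive matcher: '*' = any run (including empty), '?' = one char,
   '[...]' = one char from a set. Returns 1 on match, 0 otherwise. */
int wild_match_ex(const char *pat, const char *s, int ignore_case)
{
    for (; *pat != '\0'; ++s) {
        if (*pat == '*') {
            while (*pat == '*')
                ++pat;                          /* adjacent stars are equivalent to one */
            if (*pat == '\0')
                return 1;                       /* trailing '*' matches the rest */
            for (; *s != '\0'; ++s)             /* try every possible split point */
                if (wild_match_ex(pat, s, ignore_case))
                    return 1;
            return 0;
        }
        if (*s == '\0')
            return 0;                           /* pattern needs a char, string is empty */
        if (*pat == '[') {
            if (match_set(&pat, *s, ignore_case) != 1)
                return 0;
        } else {
            if (*pat != '?' && fold(*pat, ignore_case) != fold(*s, ignore_case))
                return 0;
            ++pat;
        }
    }
    return *s == '\0';
}
```

With this sketch, `wild_match_ex("file[0-9].sfv", "file7.sfv", 0)` matches, and `wild_match_ex("*.MASM*.COM/*", "www.masm32.com/index.htm", 1)` matches only because the last argument asks for case-insensitive comparison. The recursion on `*` is simple to follow but can get slow on patterns with many stars.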

M4D45M

i just got what the problem with the rip is:

the stack cleanup isn't done properly.
there has to be RETN 08h (C2 08 00)
whereas there's a normal RET instruction executed.
RETN 08h will do proper stack cleanup. (that's just for the two passed arguments [the pointers])

as soon as i get time i'll fix my rip and supply you with a working code which is in fact very powerful !
at this point, thanx to the author Florian Schintke
and to all the cool ppl at masm32 for their quality help, comprehension, and time.
soon we'll have a nice algo working!
thx

gabor

Hi!


It would be nice to know how fast these library functions are. When I think of regular expression matching I often think of thousands or even millions of rows of text to check against... At such sizes the speed of the code is very important.

Greets, Gábor

M4D45M

@gabor: it's just a procedure not a lib.
             it isn't capable of regular expressions (let's say it's a lite-version :D) (see original documentation)
             and i dunno how fast it is. in fact i think it isn't very fast but just give it a try.

so here it is, and it worx. (that's enough to fit my needs at the moment)
thx


[attachment deleted by admin]

ecube

I didn't write this, and I forget where I got it, but it works nicely, so if you're the author thanks a lot :)


WildMatch proc uses ebx esi edi wild :DWORD, string :DWORD
    mov        ecx, wild
    mov        edx, string
    .while BYTE PTR [edx] != 0 && BYTE PTR [ecx] != "*"
        mov        bl, [ecx]
        mov        bh, [edx]
        .if bl != bh && bl != "?"
            xor        eax, eax
            ret
        .endif
        inc        ecx
        inc        edx
    .endw

    .while BYTE PTR [edx] != 0
        mov        bl, [ecx]
        mov        bh, [edx]
        .if bl == "*"
            inc        ecx
            mov        bl, [ecx]
            .if bl == 0
                xor        eax, eax
                inc        eax
                ret
            .endif
            mov        esi, ecx
            mov        eax, edx
            inc        eax
        .elseif bl == bh || bl == "?"
            inc        ecx
            inc        edx
        .else
            mov        ecx, esi
            mov        edx, eax
            inc        eax
        .endif
    .endw
   
    .while    BYTE PTR [ecx] == "*"
        inc        ecx
    .endw
   
    ; return 1 if the whole pattern was consumed, 0 otherwise
    xor        eax, eax
    cmp        BYTE PTR [ecx], 0
    sete       al
    ret
WildMatch endp
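
For anyone who finds the control flow easier to follow in C, the proc above reads roughly like the sketch below. This is a hand translation written for illustration, not the original source of the routine. The name `wild_match` and the two `retry_*` variables are made up here; they play the roles of `esi` and `eax` in the MASM version. The argument order (pattern first, then string) follows the proc.

```c
#include <stddef.h>

/* Iterative wildcard match with a single backtracking point.
   Returns 1 if `string` matches `wild`, 0 otherwise. */
int wild_match(const char *wild, const char *string)
{
    const char *retry_wild = NULL;  /* pattern position after the last '*' (esi) */
    const char *retry_str = NULL;   /* next string position to retry from (eax) */

    /* Leading part: must match one-to-one up to the first '*'. */
    while (*string != '\0' && *wild != '*') {
        if (*wild != *string && *wild != '?')
            return 0;
        ++wild;
        ++string;
    }

    while (*string != '\0') {
        if (*wild == '*') {
            if (*++wild == '\0')
                return 1;           /* trailing '*' matches the rest */
            retry_wild = wild;
            retry_str = string + 1;
        } else if (*wild == *string || *wild == '?') {
            ++wild;
            ++string;
        } else {
            /* mismatch: let the last '*' absorb one more character */
            wild = retry_wild;
            string = retry_str++;
        }
    }

    while (*wild == '*')            /* leftover stars match the empty string */
        ++wild;
    return *wild == '\0';
}
```

With the example from the first post, `wild_match("*.masm*.com/index.???", "www.masm32.com/index.htm")` returns 1. Only the most recent `*` is ever revisited, so there is no recursion and no extra stack use.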

TmX

Quote from: Polizei on August 08, 2006, 09:07:36 PM
I have coded a wildcard match algo some years ago, but I can't find it in my files ;(
Sorry.
By the way, the idea of creating a library that supports regular expressions search is kinda good :)

Building an efficient regex engine from scratch could be quite tedious.
What about PCRE (Perl Compatible Regular Expressions)?