are there functions to compare a string to a wildcard pattern?
can anyone plz point me to somewhere.
example: does string "www.masm32.com/index.htm" match pattern "*.masm*.com/index.???"
wildcard | character count
---------+----------------
'*'      | 0 or more
'?'      | exactly 1
sorry didn't expect the string to turn into an active URI.
::)
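To pin down the semantics asked for above, here is a minimal recursive sketch in C. The name wild_match_simple is made up for illustration and is not taken from any library. It assumes '*' matches zero or more characters, '?' matches exactly one, and the whole string has to match. It favors clarity over speed: patterns with many '*' can make it backtrack a lot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the whole of `text` matches `pattern`,
   otherwise 0. '*' matches any run of characters (including none) and
   '?' matches exactly one character. */
int wild_match_simple(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    for (; *pattern != '\0'; ++pattern, ++text) {
        if (*pattern == '*') {
            while (*pattern == '*')      /* a run of stars acts like one star */
                ++pattern;
            if (*pattern == '\0')
                return 1;                /* trailing '*' matches the rest */
            /* Try every possible length for the run the '*' absorbs. */
            for (; *text != '\0'; ++text)
                if (wild_match_simple(pattern, text))
                    return 1;
            return 0;
        }
        if (*text == '\0')
            return 0;                    /* pattern still needs a character */
        if (*pattern != '?' && *pattern != *text)
            return 0;                    /* literal mismatch */
    }
    return *text == '\0';                /* both must be used up */
}
```

Called as wild_match_simple("*.masm*.com/index.???", "www.masm32.com/index.htm"), it should report a match: the first '*' absorbs "www", the second absorbs "32", and "???" covers "htm".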
I don't know of an algo in assembler that does that, but it can be written if you know what you are doing. You need to design an algo that treats the filler character as always matched, and this has to work for the first character as well. Its logic will be something like this:
**my*word***   ; calculate the lead filler characters if present
my*word        ; write algo to handle any gaps of filler characters
and get the length of the string from the first non-filler character to the last non-filler character.
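The trimming step described above could be sketched in C roughly like this. It assumes the filler character is '*'. The helper name pattern_core is hypothetical. It only finds the span from the first non-filler character to the last one; handling the gaps inside that span is left to the actual matcher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the part of `pattern` between the first and the
   last non-'*' character. Returns a pointer to its start and stores its
   length in *core_len (0 when the pattern holds nothing but '*'). */
const char *pattern_core(const char *pattern, size_t *core_len)
{
    assert(pattern != NULL && core_len != NULL);

    const char *first = pattern;
    while (*first == '*')
        ++first;                          /* skip leading filler characters */

    const char *last = pattern + strlen(pattern);
    while (last > first && last[-1] == '*')
        --last;                           /* skip trailing filler characters */

    *core_len = (size_t)(last - first);
    return first;
}
```

For "**my*word***" this yields a pointer to "my*word***" with a length of 7, i.e. the core "my*word".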
thx hutch.
as i need this algo, i'll code it.
when it is finished i'll post it for discussion/improvement.
I have coded a wildcard match algo some years ago, but I can't find it in my files ;(
Sorry.
By the way, the idea of creating a library that supports regular expressions search is kinda good :)
If this is to be used for filtering files, you can simply use FindFirstFile/FindNextFile - which accept wildcards :wink
Though, yes, this does require the files to exist, and couldn't be used for anything else, so it would still be a useful function :U
There is a C implementation in this site:
http://user.cs.tu-berlin.de/~schintke/references/wildcards/
You can use it as a reference, either convert it to MASM or see how it performs wildcard pattern matching.
Regards,
-chris
Hello!
I too came up with the idea of creating a regular expression library. So far all I have created is the specification, based on Unix regular expressions. I have some ideas for the implementation too, but I haven't started to code them yet...
If anyone is interested, PM me!
Greets, Gábor
as gwapo already realized, there's a good c-code at the URI he specified.
that's actually the code i attached (and tried to rip) in this topic.
http://www.masm32.com/board/index.php?topic=5417.0
but the rip somehow messed up the stack; i haven't found out why yet.
maybe someone could have a look at it. (by means of debugging)
this algo is also capable of performing ranges like [a-z] and things like that.
have a look at the test report. the algo is beautiful.
http://user.cs.tu-berlin.de/~schintke/references/wildcards/testwildcards.main
so i think there's no need to recode it in asm. this would be to no purpose,
as this c-code worx pretty fine. it has just to be compiled to make it available.
i already did, but i don't like the idea of using an .obj; i think it would be more comfortable
to have a lib, but i dunno how to get there,
coz including the .obj requires changing
my build.bat every time, or the compiler options of my IDE for every project.
collaboration is appreciated. thx
tedd, i think most of us already knew about FindFirstFile/FindNextFile
and that they are able to perform wildcard matching.
but they are very limited:
1. these characters are disallowed in file names: \ / : * ? " < > |
( i know, * and ? can't be used anyway)
2. these functions are case-insensitive, but the algo is capable of both.
3. you can't perform chexx of list-files (.sfv/.crc/.md5) against (in-memory-)patterns without the files actually being present.
4. in the above case you would always have to create a file just to test it. this is comparatively sloooow.
5. the algo is capable of performing range chexx like [a-z]
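Points 2 and 5 can be illustrated with a small C sketch. This is not Florian Schintke's code, only a minimal stand-in written for this example. The names fold, match_set and glob_match are made up, and only plain sets and ranges such as [a-z0-9] are handled (no negation, no escaping). The ignore_case flag selects case-insensitive matching.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Fold a character to lower case when case-insensitive matching is wanted. */
static int fold(int c, int ignore_case)
{
    return ignore_case ? tolower((unsigned char)c) : (unsigned char)c;
}

/* Matches `c` against a set such as "[a-z0-9_]"; `set` points at the '['.
   On a match stores the position after the closing ']' in *end and returns 1.
   Returns 0 on a mismatch or when the set is never closed. */
static int match_set(const char *set, char c, int ignore_case, const char **end)
{
    int ch = fold(c, ignore_case);
    int found = 0;
    const char *p = set + 1;

    while (*p != '\0' && *p != ']') {
        int lo = fold(*p, ignore_case);
        int hi = lo;
        if (p[1] == '-' && p[2] != '\0' && p[2] != ']') {
            hi = fold(p[2], ignore_case);    /* a range such as a-z */
            p += 3;
        } else {
            ++p;                             /* a single set member */
        }
        if (ch >= lo && ch <= hi)
            found = 1;
    }
    if (*p != ']')
        return 0;
    *end = p + 1;
    return found;
}

/* Returns 1 if the whole of `text` matches `pattern` ('*', '?', '[...]'). */
int glob_match(const char *pattern, const char *text, int ignore_case)
{
    assert(pattern != NULL && text != NULL);

    while (*pattern != '\0') {
        if (*pattern == '*') {
            while (*pattern == '*')
                ++pattern;
            if (*pattern == '\0')
                return 1;
            for (; *text != '\0'; ++text)
                if (glob_match(pattern, text, ignore_case))
                    return 1;
            return 0;
        }
        if (*text == '\0')
            return 0;
        if (*pattern == '[') {
            const char *next;
            if (!match_set(pattern, *text, ignore_case, &next))
                return 0;
            pattern = next;
        } else {
            if (*pattern != '?' &&
                fold(*pattern, ignore_case) != fold(*text, ignore_case))
                return 0;
            ++pattern;
        }
        ++text;
    }
    return *text == '\0';
}
```

Because everything works on in-memory strings, a name taken from an .sfv or .md5 list can be tested without any file existing on disk, e.g. glob_match("*.[sm][fd][v5]", "list.md5", 0).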
i just got what the problem with the rip is:
the stack cleanup isn't done properly.
there has to be RETN 08h (C2 08 00)
whereas there's a normal RET instruction executed.
RETN 08h will do proper stack cleanup. (that's just for the two passed arguments [the pointers])
as soon as i get time i'll fix my rip and supply you with a working code which is in fact very powerful !
at this point, thanx to the author Florian Schintke
and to all the cool ppl at masm32 for their quality help, comprehension, and time.
soon we'll have a nice algo working!
thx
Hi!
It would be nice to know how fast these library functions are. When I think of regular expression matching I often think of thousands or even millions of rows of text to check against... At such sizes the speed of the code is very important.
Greets, Gábor
@gabor: it's just a procedure not a lib.
it isn't capable of regular expressions (let's say it's a lite-version :D) (see original documentation)
and i dunno how fast it is. in fact i think it isn't very fast but just give it a try.
so here it is, and it worx. (that's enough to fit my needs at the moment)
thx
[attachment deleted by admin]
I didn't write this, and I forget where I got it, but it works nicely, so if you're the author, thanks a lot :)
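WildMatch proc uses ebx esi edi wild :DWORD, string :DWORD
    mov ecx, wild                   ; ecx -> current pattern character
    mov edx, string                 ; edx -> current string character
    ; match literally until the first "*" or the end of the string
    .while BYTE PTR [edx] != 0 && BYTE PTR [ecx] != "*"
        mov bl, [ecx]
        mov bh, [edx]
        .if bl != bh && bl != "?"
            xor eax, eax            ; mismatch before any "*": return 0
            ret
        .endif
        inc ecx
        inc edx
    .endw
    .while BYTE PTR [edx] != 0
        mov bl, [ecx]
        mov bh, [edx]
        .if bl == "*"
            inc ecx
            mov bl, [ecx]
            .if bl == 0
                xor eax, eax        ; trailing "*" matches the rest: return 1
                inc eax
                ret
            .endif
            mov esi, ecx            ; esi = pattern position after the "*"
            mov eax, edx
            inc eax                 ; eax = string position for the next retry
        .elseif bl == bh || bl == "?"
            inc ecx
            inc edx
        .else
            mov ecx, esi            ; backtrack to the last "*"
            mov edx, eax
            inc eax
        .endif
    .endw
    .while BYTE PTR [ecx] == "*"    ; skip "*"s left over at the pattern end
        inc ecx
    .endw
    xor eax, eax
    mov al, [ecx]
    or al, al
    sete al                         ; eax = 1 only if the pattern is used up
    ret
WildMatch endp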
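For anyone who wants to follow the logic without reading registers, here is my reading of the WildMatch procedure above as a C sketch. The function name wild_match_iter and the variable names are mine. I map esi to star_wild and eax to retry, which is an interpretation of the listing and not something the original author confirmed.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `string` matches `wild` completely, otherwise 0.
   Note the argument order: the pattern comes first, as in WildMatch. */
int wild_match_iter(const char *wild, const char *string)
{
    assert(wild != NULL && string != NULL);

    const char *star_wild = NULL;   /* pattern position after the last '*' */
    const char *retry     = NULL;   /* string position for the next retry  */

    /* Phase 1: match literally until the first '*' or the end of string. */
    while (*string != '\0' && *wild != '*') {
        if (*wild != *string && *wild != '?')
            return 0;
        ++wild;
        ++string;
    }

    /* Phase 2: on a mismatch, go back to the last '*' and let it absorb
       one more character of the string. */
    while (*string != '\0') {
        if (*wild == '*') {
            ++wild;
            if (*wild == '\0')
                return 1;           /* trailing '*' matches the rest */
            star_wild = wild;
            retry = string + 1;
        } else if (*wild == *string || *wild == '?') {
            ++wild;
            ++string;
        } else {
            wild = star_wild;       /* set, because phase 2 starts at a '*' */
            string = retry++;
        }
    }

    while (*wild == '*')            /* leftover stars match the empty rest */
        ++wild;
    return *wild == '\0';
}
```

Only the most recent '*' is remembered, so no recursion or extra stack is needed. Since the pattern is the first argument, swapping the two arguments is an easy mistake when testing.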
Quote from: Polizei on August 08, 2006, 09:07:36 PM
I have coded a wildcard match algo some years ago, but I can't find it in my files ;(
Sorry.
By the way, the idea of creating a library that supports regular expressions search is kinda good :)
Building an efficient regex engine from scratch could be quite tedious.
What about PCRE (Perl Compatible Regular Expressions)?
:bg
I have coded this for the next version of ASM Runtime (http://web.aanet.com.au/zooba/projects.htm) (version 0.300 - coming soon :wink ) and it has worked fine for everything I've tested it on (mostly files, though it should work for any other strings).
The Lower() and StringFree macros are part of ASM Runtime, however, replacing them with your preferred lowercasing function should result in a working function.
(By the way, this does some naughty stack stuff - kids, close your eyes :toothy )
Cheers,
Zooba :U
FileMatch PROC USES esi edi spFilename:DWORD, spFilter:DWORD
    LOCAL dwESP:DWORD
    LOCAL dwMatchAllCount:DWORD
    mov dwMatchAllCount, 0          ; number of '*' positions available for retry
    mov esi, LCase(spFilename)      ; lowercased strings, so the match ignores case
    mov edi, LCase(spFilter)
    push esi                        ; keep both pointers for StringFree at @Exit
    push edi
    mov dwESP, esp                  ; stack level before any '*' positions are pushed
    .while(BYTE PTR [edi] && BYTE PTR [esi])
@LoopStart:
        mov al, [edi]
        .break .if(!al)
        .if(al == '*')
            push edi                ; save the '*' position for backtracking
            .repeat
                inc edi
                mov al, [edi]
            .until(al != '*')
            ; skip string characters until one equals the character after the '*'
            .while(BYTE PTR [esi] && BYTE PTR [esi] != al)
                inc esi
            .endw
            .break .if(BYTE PTR [esi] == 0)
            inc dwMatchAllCount
        .elseif(al == '?')
            inc esi
            inc edi
            .break .if(BYTE PTR [esi] == 0)
        .else
            .break .if(al != [esi])
            inc edi
            inc esi
        .endif
    .endw
    .if(BYTE PTR [esi] && dwMatchAllCount)
        dec dwMatchAllCount         ; mismatch: retry from the most recent '*'
        pop edi
        jmp @LoopStart
    .endif
    mov esp, dwESP                  ; discard any '*' positions still on the stack
    .if(BYTE PTR [esi] || BYTE PTR [edi])
        jmp @NoMatch
    .endif
@Match:
    or eax, -1                      ; return -1 (non-zero) on a match
    jmp @Exit
@NoMatch:
    xor eax, eax                    ; return 0 on no match
@Exit:
    pop edx
    pop ecx
    StringFree edx, ecx
    ret
FileMatch ENDP
Small regular expression search engine.
Easy to port to MASM (I hope)
http://board.flatassembler.net/topic.php?t=6127
I haven't had a closer look to see why, but WildMatch doesn't work.
[later]
I have tested strmatchpattern, too, but this one doesn't work, either...
Am I doing something wrong here?
Attached file updated...
[/later]
[attachment deleted by admin]
Need opinions: Does such a function need to return an offset to where the pattern has been found, or is it enough to tell us whether it has found the pattern or not?
Nick
Quote from: TNick on January 29, 2007, 12:45:07 PM
Need opinions: Does such a function need to return an offset to where the pattern has been found, or is it enough to tell us whether it has found the pattern or not?
Nick
Up to the creator. It's probably no harder to say where the pattern is, and if you return 0 for no match then you can treat it as a boolean (zero/non-zero) for whether it was found or not.
My one above doesn't, since the point is that the pattern will match from the start of the string. I have written others (and subsequently lost them in a recent hard drive failure) which will return the position of a match and one which would return a linked list of text which matches parameters specified in the pattern string.
Thanks for your reply, zooba!
I have written my own (http://www.masm32.com/board/index.php?topic=6619.0), but I have to disagree with you. Returning the offset means some more code => the function will be slower. Maybe two functions - one boolean and the other with an offset ... yep, that seems to be the solution. :)
Regards,
Nick
How can you compare a string if you don't know where in memory the string is? If you know where the string is in memory, you can send that back. :toothy
Okay, it's not quite that simple :P . When you start matching the pattern, store the address of the character you start at. If it matches, return that value, otherwise increment it and start again. Pattern searching simplifies down quite well to precise matching (ie. entire string matches and/or match must begin at the start of the string) in a separate function (which returns a boolean) and a search function (ie. a loop) which calls it.
If we're talking about pattern matching, as opposed to pattern searching, returning a pointer is (excuse the pun) pointless. If you are searching, a pointer is much more useful than a boolean.
Cheers,
Zooba :U
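The split described above (an anchored match that returns a boolean, plus a search loop that calls it and hands back the start address) might look like this in C. The names match_here and wild_search are hypothetical. As an assumption for this sketch, match_here accepts as soon as the pattern is used up, so text after the match is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Boolean matcher: returns 1 if `pattern` matches starting exactly at `text`.
   Characters left over after the pattern is exhausted are ignored. */
static int match_here(const char *pattern, const char *text)
{
    for (; *pattern != '\0'; ++pattern, ++text) {
        if (*pattern == '*') {
            while (*pattern == '*')
                ++pattern;
            if (*pattern == '\0')
                return 1;
            for (; *text != '\0'; ++text)
                if (match_here(pattern, text))
                    return 1;
            return 0;
        }
        if (*text == '\0')
            return 0;
        if (*pattern != '?' && *pattern != *text)
            return 0;
    }
    return 1;
}

/* Search loop: remembers where each attempt starts and returns that address
   on success, or NULL (usable as "false") when nothing matches. */
const char *wild_search(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    for (const char *start = text; ; ++start) {
        if (match_here(pattern, start))
            return start;           /* the start of the match is already known */
        if (*start == '\0')
            return NULL;            /* every start position has been tried */
    }
}
```

With text pointing at "www.masm32.com/index.htm", wild_search("m?sm", text) returns text + 4. A caller that only needs yes or no can test the result against NULL.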
Quote from: zooba on January 31, 2007, 11:58:14 AM
How can you compare a string if you don't know where in memory the string is? If you know where the string is in memory, you can send that back. :toothy
:lol :lol :lol
I will try to add this to WldMatch. However, when I first tried to do that, things were becoming messy. But, if you say it's easy, I should give it a second chance.
Thanks for your reply!
Nick
I've had a look at WldMatchA and I believe you when you said it got messy :green My take on pattern searching is that when you start matching, you need to know where you started from so you can go back there if the match fails. The upside of that approach is that the pointer to the start of the match is readily available.
(I have continued this post in your other (http://www.masm32.com/board/index.php?topic=6619.0) thread where it is more relevant)