Hello friends!
Some time ago PBrennick asked me in a PM, following up on an earlier one, whether I still feel like creating a library for regular-expression-based matching. I am still interested, but at first sight this is not an easy task, so I'd like some help. Above all I am looking for comments, discussion and ideas.
As an opening, I'd like to share the small specification I put together while reading about Unix-like regular expressions on the net.
Unix-like regular expression processing
1. Basics
- Regular expressions are between '/' characters.
2. Modifiers
- i - case-insensitive match
- g - global match
- x - ignore whitespace in the pattern
3. Metacharacters must be escaped with \ to be matched literally; otherwise they keep their special meaning. (/\\/ matches a '\' character)
4. Special characters
- \d - a digit
- \D - a non-digit
- \w - a word character (alphanumeric)
- \W - a non-word character
- \t - tabulator
- \n - line feed
- \r - carriage return
- \s - a whitespace character (\t,\n,\r,' ')
- \S - a non-whitespace character
- \b - word boundary
- \B - non-word boundary
5. Matching
- ^ - matches beginning of a string
- $ - matches end of a string
- . - matches any character
6. Quantifying
- * - 0 or more times (Ex. /a*/ matches '','a','aa'...)
- + - 1 or more times (Ex. /a+/ matches 'a','aa'...)
- ? - 0 or 1 time (Ex. /a?/ matches '' or 'a' only)
- {n} - exactly n times (Ex. /a{5}/: 'aaaaa')
7. Misc
- [...] - characters in the list (Ex. /[abc0-9]/ matches 'a','b','c' and digits)
- [^...] - characters that are not in the list (Ex. /[^a-zA-Z]/ does not match letters)
- | - alternation/or (Ex. /a|b/ matches 'a' or 'b')
- () - grouping, creates an atom that can be also referenced later
(Ex. /(ab){2}/ matches 'abab', but /ab{2}/ matches 'abb'; /(ab)cd\1/ matches 'abcdab')
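To try the examples in the list above, here is a small sketch in Python. It is not the planned library: it only borrows Python's `re` module, whose syntax covers the constructs listed, to check what each example should match. The helper names `compile_literal` and `match_all` are made up for illustration, and mapping the `g` modifier to "return all matches" is an assumption about what "global" should mean.

```python
import re

# Hypothetical mapping from the spec's modifiers to Python re flags.
# 'g' has no flag equivalent, so it is handled separately below.
FLAG_MAP = {"i": re.IGNORECASE, "x": re.VERBOSE}

def compile_literal(literal):
    """Split a '/pattern/modifiers' string and compile the pattern."""
    if not literal.startswith("/"):
        raise ValueError("expression must start with '/'")
    end = literal.rfind("/")          # the last '/' closes the pattern
    if end == 0:
        raise ValueError("missing closing '/'")
    pattern, mods = literal[1:end], literal[end + 1:]
    flags = 0
    is_global = False
    for m in mods:
        if m == "g":
            is_global = True
        elif m in FLAG_MAP:
            flags |= FLAG_MAP[m]
        else:
            raise ValueError(f"unknown modifier {m!r}")
    return re.compile(pattern, flags), is_global

def match_all(literal, text):
    """Return all matches for a global expression, else at most the first."""
    rx, is_global = compile_literal(literal)
    if is_global:
        return [m.group(0) for m in rx.finditer(text)]
    m = rx.search(text)
    return [m.group(0)] if m else []

print(match_all("/(ab){2}/", "abab"))       # ['abab']
print(match_all("/\\d+/g", "10 and 20"))    # ['10', '20']
print(match_all("/A+/i", "xaaAy"))          # ['aaA']
```

Two differences from the list are worth keeping in mind when comparing results: Python's `\w` also accepts the underscore, and its `\s` accepts more whitespace characters than the four named in section 4.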
I hope this specification covers as much functionality as possible.
My problem is that this is where my knowledge nearly ends. I do have an idea, but I am afraid it is not that good...
How would you implement regular expressions? Do you know of any resources on this topic on the net? Of course, a fast matching method is a requirement here...
Greets, Gábor
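As one possible starting point for the implementation question, here is a minimal backtracking sketch in Python, in the style of the small matcher Rob Pike wrote for "The Practice of Programming". It only handles literal characters, `.`, `^`, `$` and `*`, so it covers sections 5 and part of 6 of the list, and it is meant to show the recursive structure rather than to be fast. The function names are illustrative.

```python
def match(regex, text):
    """Return True if regex matches anywhere in text."""
    if regex.startswith("^"):
        return match_here(regex[1:], text)
    # Try every start position, including the empty tail at the end.
    for i in range(len(text) + 1):
        if match_here(regex, text[i:]):
            return True
    return False

def match_here(regex, text):
    """Return True if regex matches at the very start of text."""
    if not regex:
        return True                      # pattern used up: success
    if len(regex) >= 2 and regex[1] == "*":
        return match_star(regex[0], regex[2:], text)
    if regex == "$":
        return text == ""                # '$' only matches at the end
    if text and regex[0] in (".", text[0]):
        return match_here(regex[1:], text[1:])
    return False

def match_star(c, regex, text):
    """Match zero or more occurrences of c, then the rest of regex."""
    i = 0
    while True:
        if match_here(regex, text[i:]):
            return True
        if i < len(text) and c in (".", text[i]):
            i += 1                       # consume one more c and retry
        else:
            return False

print(match("^ab*c$", "abbbc"))   # True
print(match("^ab*c$", "abbbcd"))  # False
```

Backtracking like this can become slow on some patterns. If speed is the main demand, the usual alternative is to compile the expression into a finite automaton first and then run the input through it once.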
Hmm, I'll take a look at it tomorrow. At the uni, we learned quite a bit about regular expressions last year, so maybe that could help.
Gábor,
That list looks pretty comprehensive. Of special interest to me are the metacharacters, especially ^ and $, which I use almost every day while manipulating text files. They represent the beginning of a line and the end of a line.
Paul
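The list describes ^ and $ as anchoring to the string, while Paul describes them as anchoring to a line. Both readings exist in practice; line-oriented tools typically hand the matcher one line at a time, so the two coincide there. The short Python sketch below shows the difference on a two-line string, using Python's `re.MULTILINE` flag as a stand-in for a "line mode". Whether the planned library needs such a mode is an open design question, not something the list specifies.

```python
import re

text = "first line\nsecond line"

# Default mode: ^ anchors only at the start of the whole string.
string_starts = re.findall(r"^\w+", text)

# MULTILINE mode: ^ and $ also match at the start and end of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(string_starts)  # ['first']
print(line_starts)    # ['first', 'second']
print(line_ends)      # ['line', 'line']
```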