
Title: Regular expressions library
Post by: gabor on November 28, 2006, 08:38:27 AM

Hello friends!

Some time ago PBrennick asked me in a PM, following up on an earlier one, whether I still felt like creating a regexp library to support regular-expression-based matching. Well, I am still interested, but at first sight this is not an easy task, so I'd like to have some help. I am thinking of comments, discussions and ideas in the first place.
As an opening I'd like to share the small specification I put together while looking up Unix-like regular expressions on the net.

Unix-like regular expression processing

1. Basics

Regular expressions are between '/' characters.

2. Modifiers

i - case-insensitive match
g - global match
x - ignore whitespace in the pattern
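
To get a feel for what the three modifiers do, here is a small sketch using Python's re module. Python is only a convenient stand-in for experimenting and is not the library discussed in this thread. Python has no /g flag, so the assumption here is that "global" means "return every match", which re.findall does.

```python
import re

text = "Cat cat CAT dog"

# i: case-insensitive match (first match only)
first_ci = re.search(r"cat", text, re.IGNORECASE).group()

# g: "global" is assumed to mean "find all matches, not just the first"
all_ci = re.findall(r"cat", text, re.IGNORECASE)

# x: whitespace and '#' comments inside the pattern are ignored
verbose = re.compile(r"""
    c a t   # three letters, spaced out for readability
""", re.VERBOSE | re.IGNORECASE)
all_verbose = verbose.findall(text)

print(first_ci, all_ci, all_verbose)
```

Note that the x modifier only affects the pattern, never the text being searched.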

3. Metacharacters must be escaped with \ to match literally; unescaped, they have a special meaning. (/\\/ matches a '\' character)

\ | () [ { ^ $ * + ? .
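
The difference between an escaped and an unescaped metacharacter can be sketched like this, again with Python's re module as a stand-in (its escaping rules happen to agree with the list above for these characters). The sample strings are made up for illustration.

```python
import re

# Unescaped, '.' is a metacharacter and matches any character.
unescaped = re.findall(r"a.b", "a.b axb")

# Escaped, '\.' matches only a literal dot.
escaped = re.findall(r"a\.b", "a.b axb")

# /\\/ from the rule above: matches a single backslash character.
backslashes = re.findall(r"\\", r"C:\masm32\bin")

# '+' and '?' left unescaped act as quantifiers, so the literal text is not found.
as_pattern = re.search(r"1+1=2?", "1+1=2?")

# re.escape() escapes the metacharacters of a literal string for us.
literal = re.escape("1+1=2?")
as_literal = re.fullmatch(literal, "1+1=2?")

print(unescaped, escaped, len(backslashes), as_pattern, bool(as_literal))
```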

4. Special characters

\d - a digit
\D - a non-digit
\w - a word character (alphanumeric)
\W - a non-word character
\t - tab
\n - line feed
\r - carriage return
\s - a whitespace character (\t,\n,\r,' ')
\S - a non-whitespace character
\b - word boundary
\B - non-word boundary
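
A short sketch of these escapes in action, using Python's re module as a reference point. Be aware that Python's definitions are a little wider than the list above: its \w also accepts '_' and its \s also accepts form feed and vertical tab. The sample text is invented.

```python
import re

text = "Order 66:\tship 12 crates\r\n"

digits = re.findall(r"\d+", text)           # runs of digits
words = re.findall(r"\w+", text)            # runs of word characters
spaces = re.findall(r"\s", text)            # single whitespace characters
non_digits = re.findall(r"\D+", "ab12cd")   # runs of non-digits

# \b is zero-width: it matches a position, not a character,
# so only the stand-alone occurrences of "cat" are found.
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")

print(digits, words, spaces, non_digits, whole_word)
```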

5. Matching

^ - matches beginning of a string
$ - matches end of a string
. - matches any character
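
The anchors and the dot can be tried out as follows. This is a sketch with Python's re module and made-up input. One difference to keep in mind: in Python '.' does not match a line feed unless the DOTALL flag is set, whereas the rule above says "any character".

```python
import re

# ^ anchors the match to the beginning of the string.
starts = bool(re.search(r"^mov", "mov eax, 1"))
not_start = bool(re.search(r"^mov", "  mov eax, 1"))

# $ anchors the match to the end of the string.
ends = bool(re.search(r"ret$", "pop ebp; ret"))
not_end = bool(re.search(r"ret$", "ret 4"))

# . stands for exactly one character, so "ex" alone is not matched.
dot = re.findall(r"e.x", "eax ebx e-x ex")

print(starts, not_start, ends, not_end, dot)
```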

6. Quantifying

* - 0 or more times (Ex. /a*/ matches '','a','aa'...)
+ - 1 or more times (Ex. /a+/ matches 'a','aa'...)
? - 0 or 1 time (Ex. /a?/ matches '' or 'a' only)
{n} - exactly n times (Ex. /a{5}/: 'aaaaa')
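
The quantifier examples above can be checked mechanically. In this sketch the helper matches_whole is a hypothetical name; it uses Python's re.fullmatch so that the pattern has to cover the entire string, which is what the examples in the list imply.

```python
import re

def matches_whole(pattern, s):
    """True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

star = [matches_whole(r"a*", s) for s in ("", "a", "aaa", "b")]
plus = [matches_whole(r"a+", s) for s in ("", "a", "aa")]
opt = [matches_whole(r"a?", s) for s in ("", "a", "aa")]
exact = [matches_whole(r"a{5}", s) for s in ("aaaa", "aaaaa", "aaaaaa")]

print(star, plus, opt, exact)
```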

7. Misc

[...] - characters in the list (Ex. /[abc0-9]/ matches 'a','b','c' and digits)
[^...] - characters that are not in the list (Ex. /[^a-zA-Z]/ matches any character that is not a letter)
| - alternation/or (Ex. /a|b/ matches 'a' or 'b')
() - grouping, creates an atom that can also be referenced later
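
A sketch of character classes, alternation and grouping with Python's re module. How a group is "referenced later" is not specified above; the example assumes the usual conventions of numbered groups and \1 style backreferences.

```python
import re

# Character class: one character out of the list.
in_class = re.findall(r"[abc0-9]", "a-d, 7z")

# Negated class: one character that is not a letter.
not_letters = re.findall(r"[^a-zA-Z]", "Hi 5!")

# Alternation: either side may match.
either = re.findall(r"a|b", "cab")

# Grouping: each pair of parentheses captures what it matched.
m = re.search(r"(\w+)=(\d+)", "width=80")
key, value = m.group(1), m.group(2)

# Backreference (assumed syntax): \1 repeats whatever group 1 matched.
doubled = re.search(r"(\w)\1", "hello").group()

print(in_class, not_letters, either, key, value, doubled)
```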

I hope this specification covers as much functionality as possible.
My problem is that this is where my knowledge nearly ends :toothy. I do have an idea, but I am afraid it is not that good...
How would you implement regular expressions? Do you know of any resource on this topic on the net? Of course, fast matching is a requirement here...
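
As one possible starting point for the implementation question, here is a sketch of a tiny recursive backtracking matcher in the style of the one by Rob Pike in "The Practice of Programming". It handles only literals, '.', '^', '$' and '*', so it covers a small part of the specification above. The function names are hypothetical, and Python is used only to keep the logic readable; nothing here is meant as the design for the library.

```python
def match(pattern, text):
    """True if pattern matches anywhere in text."""
    if pattern.startswith("^"):
        return match_here(pattern[1:], text)
    # Try every starting position, including the empty tail.
    for start in range(len(text) + 1):
        if match_here(pattern, text[start:]):
            return True
    return False

def match_here(pattern, text):
    """True if pattern matches at the very beginning of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False

def match_star(char, pattern, text):
    """Match zero or more occurrences of char, then the rest of pattern."""
    i = 0
    while True:
        if match_here(pattern, text[i:]):
            return True
        if i < len(text) and char in (".", text[i]):
            i += 1
        else:
            return False

print(match("a*b", "aaab"), match("^ab$", "abc"))
```

Backtracking like this is easy to follow but can become very slow on some patterns. If speed matters, compiling the expression into an automaton (NFA or DFA) and simulating that avoids the worst cases.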

Greets, Gábor

Title: Re: Regular expressions library
Post by: stanhebben on November 28, 2006, 10:04:58 PM

Hmm, I'll take a look at it tomorrow. At the uni, we learned quite a bit about regular expressions last year, so maybe that could help.

Title: Re: Regular expressions library
Post by: PBrennick on November 28, 2006, 11:10:12 PM

Gábor,
That list looks pretty comprehensive. Of special interest to me are the metacharacters, especially ^ and $, which I use almost every day while manipulating text files. They represent the beginning of a line and the end of a line.
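
The line-oriented reading of ^ and $ can be sketched as follows. Python's re module is used as a stand-in; there the per-line behaviour has to be switched on with the MULTILINE flag, otherwise the anchors refer to the whole string. The sample text is made up.

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ anchors only to the start of the whole string.
string_starts = re.findall(r"^\w+", text)

# With MULTILINE, ^ and $ match at the start and end of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# A typical text-file edit: prefix every line with a comment marker.
commented = re.sub(r"^", "; ", text, flags=re.MULTILINE)

print(string_starts, line_starts, line_ends)
print(commented)
```

Whether the library should treat ^ and $ as line anchors by default, or only with an extra modifier, is a design decision the specification leaves open.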

Paul

The MASM Forum Archive 2004 to 2012
