Perl Regular Expressions.

Started by Eóin, June 12, 2005, 07:00:03 PM

Eóin

Hello all.

I've been playing around with regular expressions recently and I decided to put together a little example program to share with the rest of ye. The included pcre.inc and pcre.lib are from Philip Hazel's PCRE free (BSD license) C library.

I compiled it against msvcrt.lib using the free VCToolKit 2003. See the instructions I wrote up if you want to make any changes, like compiling with more optimising options.

Anyway, my apologies that the example program itself isn't that great. It's the first time I've used the string macros with masm32 and I don't think I was using them correctly. But still, it should give some ideas on how to use the library.

Hope ye enjoy, Eoin.

[EDIT] Forgot to mention that I had to include oldnames.lib. Interestingly, the linker only started complaining about it being missing after I used crt_free from msvcrt.lib. It seems to be a fairly core library, as the MSDN page on linking mentions it. Perhaps someone else here will know more about it.

[EDIT] See below for attachment.

hutch--

Eóin,  :thumbu

Looks like you have done some good work here. I confess to being illiterate in Perl, so is there any info on how to test this app?

The include file looks like a good clean conversion which is a pleasure to see after some of the others I have seen.

QvasiModo


Eóin

hutch-- and QvasiModo, thanks for your comments.

I'll just give a small example of using a RegEx.

Say you have a piece of text like <tag attr="val"> and you want to pick out the important bits. This RegEx could do just that: <(\w+) (\w+)="(\w+)">. The \w matches any alphanumeric character or underscore, and the + means match 1 or more. Enclosing them in round brackets means we want to be able to access what is matched. If we run this through the example program it prints

0:<tag attr="val">
1:tag
2:attr
3:val
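
For anyone without the attachment, the same pattern can be tried in any engine with Perl-style syntax. The sketch below uses Python's built-in re module rather than PCRE (an assumption made purely for convenience), and show_groups is a hypothetical helper name. It lists the groups in the same index:text layout as above.

```python
import re

def show_groups(pattern, subject):
    """Return 'index:text' lines for the whole match and each bracketed group."""
    m = re.search(pattern, subject)
    if m is None:
        return ["no match"]
    # Group 0 is the whole match; groups 1..n are the bracketed parts.
    return [f"{i}:{m.group(i)}" for i in range(m.re.groups + 1)]

lines = show_groups(r'<(\w+) (\w+)="(\w+)">', '<tag attr="val">')
print("\n".join(lines))
```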


And you can see it's picked out the important bits. Unfortunately that example only works for very specific bits of text. For a start, the only whitespace it allows is a single space. By adding a few \s* (match 0 or more whitespace characters) and a \s+ (match 1 or more) we can make the RegEx more robust, e.g. RegEx: <\s*(\w+)\s+(\w+)\s*=\s*"\s*(\w+)\s*"\s*> and string: < tag  attr =" val"  > gives

0:< tag  attr =" val"  >
1:tag
2:attr
3:val
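
A quick way to check that the looser pattern still accepts the tidy input as well as the messy one is sketched here, again with Python's re module standing in for PCRE (an assumption; the variable names are made up for the illustration).

```python
import re

# Assumption: Python's re is used in place of PCRE; \s and \w behave the same
# way for these ASCII inputs.
robust = re.compile(r'<\s*(\w+)\s+(\w+)\s*=\s*"\s*(\w+)\s*"\s*>')

subjects = ['<tag attr="val">', '< tag  attr =" val"  >']
results = [robust.search(s).groups() for s in subjects]

for subject, groups in zip(subjects, results):
    print(subject, "->", groups)
```

Both strings should give the same three captured substrings, since the extra whitespace is matched outside the brackets.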


Finally a quick demo of back referencing. You can use \# where # is a number to back reference a previously captured (i.e. bracketed) substring. This allows you to, say, only match a tag if it is followed by a matching end tag. E.g. RegEx: <(\w+) (\w+)="(\w+)">(\w+)<\1 /> and string: <tag attr="val">text<tag /> gives

0:<tag attr="val">text<tag />
1:tag
2:attr
3:val
4:text


But string: <tag attr="val">text<end /> gives no match, because tag != end.
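
The match and no-match cases can be put side by side in a small sketch. As before this uses Python's re module as a stand-in for PCRE (an assumption), where \1 has the same meaning.

```python
import re

# \1 must repeat exactly what the first bracketed group matched.
paired = re.compile(r'<(\w+) (\w+)="(\w+)">(\w+)<\1 />')

good = paired.search('<tag attr="val">text<tag />')
bad = paired.search('<tag attr="val">text<end />')

print(good.groups())  # the four captured substrings
print(bad)            # None, since "end" is not "tag"
```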

Also I've uploaded the example program again. This one uses a small function, eprintf, for writing output to the edit control, which personally I'm just more at home with. It also guards against buffer overflows. I also fixed a stupid mistake in the original which hardcoded the string length :eek silly me.

[attachment deleted by admin]

zhasm

E:

Thanks for your great work. That's what I was searching for.

It seems that this engine only handles searching and doesn't support replacement, which is only supported in C++ mode.

Could anybody improve it by adding a replacement function?
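
One possible approach, sketched under assumptions: a replace can be layered on top of a search-only engine as long as each search reports where the match starts and ends (PCRE's pcre_exec returns these offsets in its output vector). The sketch below shows the idea in Python, with the re module standing in for PCRE. The name replace_all is hypothetical, and the replacement text is inserted literally, with no \1-style substitution.

```python
import re

def replace_all(pattern, subject, replacement):
    """Global replace built only from repeated searches and match offsets."""
    rx = re.compile(pattern)
    out, pos = [], 0
    while pos <= len(subject):
        m = rx.search(subject, pos)      # search again from the last match end
        if m is None:
            break
        start, end = m.span()
        out.append(subject[pos:start])   # copy the text before the match
        out.append(replacement)          # emit the replacement instead
        if end == start:
            # Empty match: copy one character and step past it to avoid looping.
            out.append(subject[start:start + 1])
            end += 1
        pos = end
    out.append(subject[pos:])            # copy whatever follows the last match
    return "".join(out)

print(replace_all(r'\w+="\w+"', '<tag attr="val">', 'x="y"'))
```

The same loop translates fairly directly to assembly: keep a running offset, call the search routine from that offset, and copy the three pieces into an output buffer large enough for the result.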