Writing a parser

Started by DarkWolf, April 24, 2005, 01:20:56 AM

DarkWolf

I have been trying to convert the notations in the XML specs into HLA.
The first few I did are listed below. I know I made some mistakes; I'll change document to a string.
But overall, are there problems that anyone can see?

// [1] document      ::= prolog element Misc*
// [2] Char               ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
            /* any unicode character excluding surrogate blocks, FFFE, and FFFF */
// [3] S         ::= (#x20 | #x9 | #xD | #xA)+
            /* whitespace is space, tab, cr, lf */
// [4] NameChar      ::= Letter | Digit | '.' | '_' | ':' | CombiningChar | Extender
// [5] Name          ::= (Letter | '_' | ':') (NameChar)*

static
    document:   cset := prolog + element + Misc;
    Char:       cset := {u#$9,u#$A,u#$D,[u#$20..u#$D7FF],[u#$E000..u#$FFFD],[u#$10000..u#$10FFFF]};
    S:          cset := {u#$20,u#$9,u#$D,u#$A};
    NameChar:   cset := Letter + Digit + CombiningChar + Extender + {'.','_',':'};
    Name:       string;

procedure checkName;
begin checkName;
    pat.match( Name );
        pat.zeroOrMoreCset( Letter + {'_',':'} );
        pat.oneOrMoreCset( NameChar );
        pat.EOS;

      pat.if_failure
        stdout.put( "Not a valid name" );
    pat.endmatch;
end checkName;
--
Where's there's smoke, There are mirrors.
Give me Free as in Freedom not Speech or Beer.
Thank You and Welcome to the Internet.

tenkey

The definition for Name doesn't look right.

According to the grammar you posted, an XML name must start with a letter, '_' or ':'. "Zero or more" does not sound like it matches exactly one character.
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8
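
The point about production [5] can be illustrated outside HLA. Below is a minimal Python sketch (not HLA, and not the poster's code) of Name as posted: exactly one start character followed by zero or more NameChar. The character classes are ASCII-only stand-ins, which is an assumption made for brevity; the real Letter, Digit, CombiningChar and Extender sets span many Unicode ranges. The sketch mirrors production [4] as written in the first post, although the published XML 1.0 production also allows '-' in NameChar.

```python
import re

# ASCII-only stand-ins (assumption): the real XML classes are much larger.
NAME_START = r"[A-Za-z_:]"       # (Letter | '_' | ':'), exactly one character
NAME_CHAR = r"[A-Za-z0-9._:]"    # NameChar as posted in production [4]

# One start character, then zero or more name characters, then end of input.
NAME_RE = re.compile(rf"{NAME_START}{NAME_CHAR}*\Z")


def is_name(text: str) -> bool:
    """Return True if text matches Name ::= (Letter | '_' | ':') (NameChar)*."""
    return NAME_RE.match(text) is not None
```

With this sketch, a string such as "1abc" is rejected because a digit is a NameChar but not a valid first character, and the empty string is rejected because the first character is mandatory.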

DarkWolf

I reversed them (oops!).

The first one should be pat.oneOrMoreCset and the second is pat.zeroOrMoreCset.

But I need to exclude 'xml'; the string 'xml' is reserved no matter what case it is in. I am thinking that using pat.alternate may be a good solution.

DarkWolf

Just remembered

I know I can do a case-insensitive pattern match for characters, but what about a case-insensitive match for strings?

Randall Hyde

Quote from: DarkWolf on June 19, 2005, 08:38:23 PM
Just remembered

I know I can do a case-insensitive pattern match for characters, but what about a case-insensitive match for strings?
Try matchistr. IIRC, it should be present.
Cheers,
Randy Hyde
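
For readers who want to see the idea independent of the HLA library, here is a small Python sketch of a case-insensitive literal match at a cursor position. The function name match_istr and its return convention are hypothetical and are not the HLA routine's actual interface.

```python
def match_istr(text: str, pos: int, literal: str):
    """Match literal at text[pos:], ignoring case.

    Returns the index just past the match, or None if there is no match.
    (Hypothetical helper for illustration only.)
    """
    end = pos + len(literal)
    # Slicing past the end of text yields a shorter string, so it cannot match.
    if text[pos:end].lower() == literal.lower():
        return end
    return None
```

Returning the new cursor position rather than a boolean lets the caller continue matching the rest of the input from where the literal ended.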

Randall Hyde

Quote from: DarkWolf on June 19, 2005, 08:35:14 PM
I reversed them (oops!).

The first one should be pat.oneOrMoreCset and the second is pat.zeroOrMoreCset.

But I need to exclude 'xml'; the string 'xml' is reserved no matter what case it is in. I am thinking that using pat.alternate may be a good solution.

Yes. First, try to match "xml" and then, as an alternate, match your identifiers.  It is important, however, to attempt the match on xml first.
Cheers,
Randy Hyde
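
The ordering advice can be sketched in Python as two alternatives tried in sequence: the reserved word first, the general identifier pattern second. This is an illustration under assumptions, not HLA code: the classify function is hypothetical, the name pattern is an ASCII-only approximation, and only the exact string "xml" (in any case) is treated as reserved.

```python
import re

# ASCII-only approximation of Name (assumption; the real classes are larger).
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:]*\Z")


def classify(token: str) -> str:
    """Try the reserved word first, then fall back to the general name rule."""
    # Alternative 1: the reserved string, compared without regard to case.
    if token.lower() == "xml":
        return "reserved"
    # Alternative 2: any other well-formed name.
    if NAME_RE.match(token):
        return "name"
    return "invalid"
```

If the two tests were swapped, "XML" would be accepted as an ordinary name before the reserved-word check ever ran, which is why the reserved alternative has to come first.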

DarkWolf

Thanks,

Figures I would miss that ' i ' in there on my first read through.

Other questions though:

// [1] document      ::= (prolog element Misc*) - (Char* RestrictedChar Char*)
// [10] AttValue   ::=  '"' ([^<&"] | Reference )* '"' | "'" ([^<&'] | Reference )* "'"

Anyone familiar with Backus-Naur? There are two questions above I am not sure about, otherwise I understand them.
Does ^ in 10 mean that the characters that follow are some sort of excluded set ?
In 1 RestrictedChar is excluded from document but why is it preceded and followed by Char which is included ?

DarkWolf

Below should be attached what I have worked on so far.
( Not as much as I would have liked )

Maybe that will help in case my questions have not been descriptive enough.
I organized the project to also show what I intend to do with it.

[attachment deleted by admin]

Sevag.K

Haven't looked too in depth yet, but I do have one comment to make on your use of HIDE.

Apparently, you used "add existing...", which copies the entire path of the file as-is, which means that
it's practically unusable on somebody else's computer (where they may not have the same
folder setup as you).

I've remedied this problem for the next version of HIDE by adding an 'import' option which will
make a copy of an existing file.  I'll release it as soon as I finish the update (which should be in
the next couple of days).


Randall Hyde

Quote from: DarkWolf on June 27, 2005, 09:14:38 PM
Thanks,

Figures I would miss that ' i ' in there on my first read through.

Other questions though:

// [1] document      ::= (prolog element Misc*) - (Char* RestrictedChar Char*)
// [10] AttValue   ::=  '"' ([^<&"] | Reference )* '"' | "'" ([^<&'] | Reference )* "'"

Anyone familiar with Backus-Naur? There are two questions above I am not sure about, otherwise I understand them.
Does ^ in 10 mean that the characters that follow are some sort of excluded set ?
Yes, though this is UNIX regular expression syntax rather than BNF.
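
As an illustration of the negated set, production [10] maps almost directly onto a regular expression. The Python sketch below is an approximation under stated assumptions: the Reference sub-pattern (entity and character references) is a simplified form that is not quoted in this thread, and names inside it are restricted to ASCII.

```python
import re

# Simplified Reference (assumption): &name; or &#123; or &#x1F;
REFERENCE = r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9._:]*);"

# [^<&"] means "any character except <, & and the closing quote".
ATT_VALUE = re.compile(
    rf'"(?:[^<&"]|{REFERENCE})*"'      # double-quoted form
    rf"|'(?:[^<&']|{REFERENCE})*'"     # single-quoted form
)


def is_att_value(text: str) -> bool:
    """Return True if the whole string is a quoted attribute value."""
    return ATT_VALUE.fullmatch(text) is not None
```

Note that the excluded quote character differs between the two alternatives, so a double quote is legal inside a single-quoted value and vice versa.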

Quote
In 1 RestrictedChar is excluded from document but why is it preceded and followed by Char which is included ?
Probably because they're trying to say that *any word* containing the restricted character is to be ignored.
Cheers,
Randy Hyde
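
One way to read the subtraction in production [1] is as a filter: "Char* RestrictedChar Char*" describes any string that contains at least one restricted character anywhere, and such strings are removed from the set of documents. A minimal Python sketch of that reading follows. The RestrictedChar ranges are taken from XML 1.1 and are not quoted in this thread, so treat them as an assumption, and the function name is hypothetical.

```python
import re

# RestrictedChar ranges per XML 1.1 (assumption: not shown in the thread).
RESTRICTED = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]")


def contains_restricted_char(text: str) -> bool:
    """True if text matches Char* RestrictedChar Char*, i.e. it holds a
    restricted character at any position."""
    return RESTRICTED.search(text) is not None
```

Under this reading the surrounding Char* terms exist only to say "anything before and anything after", so a single search for one restricted character is enough to apply the exclusion.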

DarkWolf

Thanks Kain, didn't know using "add existing" would have been a problem.
Actually I was thinking of throwing a generic makefile in there for those not using an IDE or HIDE.
Question, can there be additional subfolders in the "src" directory like src/someotherdir ?

To Randy:

Ah figures, I am not familiar with Unix expressions, no wonder it didn't really make any sense.
W3C has got some screwed up ways of doing things, they shouldn't have mixed two notation styles.

Sevag.K

Quote from: DarkWolf on June 30, 2005, 04:20:40 PM
Thanks Kain, didn't know using "add existing" would have been a problem.
Actually I was thinking of throwing a generic makefile in there for those not using an IDE or HIDE.
Question, can there be additional subfolders in the "src" directory like src/someotherdir ?

Unfortunately no, the project files are set up as a 'relative' system and the 'src' folder is hardcoded in
the source.  It's a design decision I made early on that won't be easy to change.  It's something
to consider for HIDE 2.0.

You did throw in a good idea for me though, I'll see if I can whip up a tool that will convert a HIDE
project to a makefile with Borland make compatibility.



DarkWolf

Edited the first post, will try to keep the most recently worked on files up there.

The project is by no means anywhere near something useful, so keep that in mind.

DarkWolf

I think I am making progress. ( maybe backwards : )  )
There are things I have been stumbling on, and those are noted somewhere in the source or extra docs.

Still want to get some unicode support but I don't know how to code that. Anyone know ?

I have been thinking about the parser's API, should be notes on that too in there.
Right now I still want to finish the productions from the XML specs.
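
On the Unicode question, one possible starting point is to work with code points rather than bytes and test each one against the ranges of production [2] from the first post. The Python sketch below shows only that range check; it is not HLA, the function names are hypothetical, and it assumes the input has already been decoded into code points.

```python
def is_xml_char(ch: str) -> bool:
    """Check one character against production [2] (Char)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to just before the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # skips the surrogates, FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def all_xml_chars(text: str) -> bool:
    """True if every character in text is a legal Char."""
    return all(is_xml_char(ch) for ch in text)
```

Because the ranges reach 0x10FFFF, a set type limited to single-byte characters cannot hold them, so explicit numeric range comparisons like these are one way around that limit.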