HLA v2.0 Progress

Started by Randall Hyde, January 02, 2005, 03:03:42 AM


Randall Hyde

Hi All,
I've just posted some more code for the ADK (assembler developer's kit) to Webster. This is the code foundation for HLA v2.0. The latest version provides support for HLA-like macros.

Read about it (and download the code) at
http://webster.cs.ucr.edu/AsmTools/RollYourOwn/index.html
or
http://webster.cs.ucr.edu/AsmTools/HLA/hla2/0_hla2.html
Cheers,
Randy Hyde

cman

I hear the new version of HLA will be written in assembly code (is this true?). Will the new version still use parser/lexer generator software? If so, how will that work with lex/bison? Thanks.

Sevag.K

The lexer is built in. Check out the source; a good chunk of the lexer is already complete.


Randall Hyde

Quote from: cman on January 06, 2005, 05:09:08 PM
I hear the new version of HLA will be written in assembly code (is this true?). Will the new version still use parser/lexer generator software? If so, how will that work with lex/bison? Thanks.

HLA v2.0 is written completely in HLA (v1.x, for now, but it will port very easily to HLA v2.0 once HLA v2.0 becomes real).

The lexer is highly optimized assembly language. It is relatively complete at this point (no doubt I'll add a few more reserved words as time passes, but this doesn't involve changing any of the human-written code). In a sense, you could say that *part* of the lexer uses "lexer generator" software. I wrote a program (in HLA) that reads lists of HLA reserved words and emits *highly optimized* machine code sequences to recognize those reserved words, but this produces far better code (and easier-to-maintain code) than you'd get by hand. Download the HLA v2.0 source code and check out the "rw" (reserved words) directory. I think you'll agree this is the right approach to recognizing reserved words.
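[Editor's note: a toy sketch, in C, of the generate-a-recognizer idea described above. The real generator is written in HLA and emits optimized assembly; the word list, function names, and output format here are all illustrative assumptions, not HLA's actual code.]

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a "recognizer generator": read a keyword
 * table and print source code for a function that classifies a
 * string, with one switch case per keyword length. The real HLA
 * tool emits optimized assembly instead of C. */
static const char *words[] = { "mov", "add", "while", "begin" };

void emit_recognizer(FILE *out)
{
    size_t n = sizeof words / sizeof words[0];
    fprintf(out, "int keyword_id(const char *s, unsigned len) {\n");
    fprintf(out, "    switch (len) {\n");
    for (unsigned len = 1; len <= 16; len++) {
        int bucket = 0;
        for (size_t i = 0; i < n; i++) {
            if (strlen(words[i]) == len) {
                if (!bucket) { fprintf(out, "    case %u:\n", len); bucket = 1; }
                /* One compare per keyword of this length; the input
                 * length is already known, so no strlen at runtime. */
                fprintf(out, "        if (!memcmp(s, \"%s\", %u)) return %zu;\n",
                        words[i], len, i + 1);
            }
        }
        if (bucket) fprintf(out, "        return 0;\n");
    }
    fprintf(out, "    }\n    return 0;\n}\n");
}
```

Because the generated code is mechanical output from a word list, adding a reserved word means editing the list and rerunning the generator, never touching the emitted recognizer by hand.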

There is no flex or bison code present in HLA v2.0. The intent is to make the whole project "self-compiling" once the HLA v2.0 feature set reaches the HLA v1.x level. My plan is to design HLA v2.0 so you won't require any other tools to work on HLA source code other than the tools you'd use to create normal HLA applications.
cheers,
Randy Hyde


Ghirai

Great work, can't wait to see v2  :U
MASM32 Project/RadASM mirror - http://ghirai.com/hutch/mmi.html

cman

Quote
I wrote a program (in HLA) that reads lists of HLA reserved words and emits *highly optimized* machine code sequences to recognize those reserved words, but this produces far better code (and easier-to-maintain code) than you'd get by hand.


Could you describe this algorithm? Also, what type of parser will the new assembler/compiler use? Will this be code-generated? This is a really beautiful project!

Randall Hyde

Quote from: cman on January 14, 2005, 01:42:06 PM
Quote
I wrote a program (in HLA) that reads lists of HLA reserved words and emits *highly optimized* machine code sequences to recognize those reserved words, but this produces far better code (and easier-to-maintain code) than you'd get by hand.


Could you describe this algorithm? Also, what type of parser will the new assembler/compiler use? Will this be code-generated? This is a really beautiful project!

Well, you can check out the source code yourself at
http://webster.cs.ucr.edu/AsmTools/HLA/hla2/0_hla2.html

But the basic idea is not too different from what Hutch has done here on the MASMForum. I've hand-written a lexer that recognizes things that look like identifiers, and then I call a machine-generated function that compares a scanned (identifier) string against a list of reserved words (returning a match if the "identifier" is actually a reserved word). The basic algorithm is

1) Hash off the length of the string (up to the maximum reserved word length) and use a separate code path to handle identifiers of a given length. This dramatically simplifies string comparisons as I always know the length of the input string (and only have to consider those reserved words of the same length).

2) Hash off a value computed based on the identifier.  Standard hashing stuff here.

3) Load the entire input string into registers and do string comparisons using dword comparisons against immediate values (very fast).

4) Use a binary search (encoded in the code stream) to differentiate reserved words that happen to collide in the hash table.
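[Editor's note: the steps above can be sketched in C. This is a minimal illustration of steps 1 and 3 only, with a hypothetical four-keyword set; steps 2 and 4 (hashing and the binary search) only matter once a length bucket holds many words. The dword-immediate compare assumes a little-endian machine, as on x86.]

```c
#include <stdint.h>
#include <string.h>

/* Pack four characters into a little-endian 32-bit immediate,
 * matching the byte order memcpy produces on x86. */
#define PACK4(a,b,c,d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
                        ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Returns a nonzero token id if s (of length len) is a keyword,
 * else 0. Hypothetical keyword set, not HLA's. */
int keyword_id(const char *s, size_t len)
{
    uint32_t w;
    switch (len) {            /* step 1: dispatch on string length */
    case 4:
        memcpy(&w, s, 4);     /* step 3: whole word in one register */
        if (w == PACK4('e','l','s','e')) return 1;
        if (w == PACK4('t','h','e','n')) return 2;
        return 0;
    case 5:
        memcpy(&w, s, 4);     /* compare 4 bytes at once, then the tail */
        if (w == PACK4('w','h','i','l') && s[4] == 'e') return 3;
        if (w == PACK4('b','e','g','i') && s[4] == 'n') return 4;
        return 0;
    default:
        return 0;             /* no keyword has this length */
    }
}
```

Knowing the length up front is what makes the comparisons cheap: each case only considers same-length keywords, and each test is a register compare against an immediate rather than a byte-by-byte loop.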

With something like 800 reserved words, the number of collisions is actually small, so the reserved word comparison is quite fast. Combined with the fact that the HLA v2.0 lexer uses memory-mapped files (which simplify algorithms that operate on stream data from a file), the whole thing runs very fast. Indeed, based on some benchmarks I've run, the HLA v2.0 code written thus far tends to blow away other assemblers in terms of speed (when processing identical constructs that HLA v2.0 can currently handle).
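[Editor's note: a hedged POSIX sketch of why memory mapping simplifies a lexer. HLA v2.0's actual implementation is in HLA; this only illustrates the technique, counting identifier-like tokens in a mapped file so the scan loop is plain pointer arithmetic with no read() calls or buffering.]

```c
#include <ctype.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and count identifier-like tokens in it.
 * Returns -1 on error. The whole file appears as one flat byte
 * range, so the lexer never refills a buffer mid-token. */
long count_identifiers(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0)    { close(fd); return 0; }

    const char *buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* mapping stays valid after close */
    if (buf == MAP_FAILED) return -1;

    long n = 0;
    const char *p = buf, *end = buf + st.st_size;
    while (p < end) {
        if (isalpha((unsigned char)*p) || *p == '_') {   /* identifier start */
            n++;
            while (p < end && (isalnum((unsigned char)*p) || *p == '_'))
                p++;
        } else {
            p++;
        }
    }
    munmap((void *)buf, st.st_size);
    return n;
}
```

The design point is the one Randy makes: with the file mapped, "stream" algorithms degenerate into walking a pointer between two addresses, and the OS pages the data in on demand.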

I'm not going to make any claims like "HLA v2.0 will be the fastest of all actual assemblers..." because my plans for the code generator and optimizer are going to sacrifice some speed for generality (building data structures that allow one to easily add on different code generators means that the assembler is going to have to do more work). But I can promise that HLA v2.0 will be "nearly instantaneous" for most practical files you'll want to assemble with it. Though it's a bit early to make speed claims, my belief is that loading in the files you process during an assembly will take more time than the assembly process itself.
Cheers,
Randy Hyde