New ADK/HLA v2.0 release

Randall Hyde · February 27, 2005, 03:11:57 PM

Hi All, I've just posted a new version of the "Assembler Developer's Kit" (ADK) to Webster. You can find it at

http://webster.cs.ucr.edu/AsmTools/RollYourOwn/index.html

The ADK is the basis for HLA v2.0, so those interested in watching the development of HLA v2.0 will want to take a look at this.

==========================================
Assembler Developer's Kit (ADK) / HLA v2.0
==========================================

Many beginning assembly language programmers think that
programming tools such as compilers and assemblers are "magic"
systems. Somehow, these amazing programs convert ASCII-based
source files into executable code, using techniques
sufficiently advanced that they truly are "magic" to the
beginning programmer.

Of course, as time passes and the beginning programmer gains
experience and knowledge, they learn the basic concepts of
opcode construction in memory and many such programmers decide
to create their own "assembler" to help them better understand
the process of code generation. Such assemblers generally
start off as "experiments" and rapidly become development
tools that the author (and, possibly, other programmers) use
for software development.

A big problem with such "hobby" assemblers is that they rarely
contain professional-quality features. Once the author gets
past the "interesting" parts of the project, specifically:
converting mnemonics to opcodes, they quickly discover that
the rest of the language system consists of a lot of "grunt"
work and their attention quickly turns elsewhere. As a
result, most of these "hobby" projects wind up offering
few features beyond the processing of basic
machine instructions. A classic indication of this phenomenon
is the lack of a powerful macro processing system (or even a
lack of macro facilities altogether). Another indication of
this problem is the lack of decent data structure support in
the assembly language created by the author (e.g., no
"structs" or "records" in their language). The problem is,
implementing these features requires considerable technical
knowledge and is, quite frankly, a lot of work. Authors of
such products respond with comments like "such features are
not 'true' assembly language and don't belong in an
assembler." Nonsense. Commercial quality assemblers have had
these features for at least two decades. They are an absolute
requirement in a modern, commercial-quality assembly language
product.

Another problem with many "hobby-level" assemblers is low
performance. Though some hobby assemblers are every bit as
fast as (or faster than) their commercial counterparts, it's
also the case that many such hobby endeavors are written by
programmers who don't have much experience writing
high-performance compilers and assemblers. As such, the
performance of such products tends to degenerate as you
attempt to process larger and larger projects with them.

The "Assembler Developer's Kit" (ADK) is a language design
toolkit intended to help alleviate the problem of
low-performance, low-feature, assemblers (and other language
systems). The ADK provides a huge amount of public-domain
(totally free) source code that you can use to create
assemblers, compilers, interpreters, and scripting languages.
It provides a high-performance solution with pre-written
source code that handles all the "hard" stuff found in a
commercial-quality, high-featured assembler:

1.   The ADK provides a very powerful macro and
   "compile-time language" system. Far more
   powerful than any existing assembler (even
   more powerful than MASM, TASM, and HLA).

2.   The ADK provides a high-performance lexer/parser
   that operates at between 100,000 and 1,000,000
   lines per SECOND on a 3.00 GHz Pentium (actual
   speed varies based on the size of the source file
   and the complexity of the source code the ADK code
   is processing). For all intents and purposes,
   this is instantaneous for all but the very largest
   source files.

3.   The ADK supports a wide variety of data types
   and operations. Integers from 8 bits to 128 bits.
   32, 64, and 80 bit IEEE floating point formats.
   Character sets supported include 7-bit ASCII,
   8-bit extended ASCII, and 16-bit Unicode.
   Character sets, arrays, records, unions, and
   classes/objects. Plus many more advanced data
   structures.

4.   The ADK supports a wide range of compile-time
   operators and functions. Included with the
   ADK are hundreds of compile-time functions.
   Plus you have the ability to create your
   own compile-time functions using the ADK's
   macro facilities.

5.   The ADK fully supports namespaces, nested
   declarations (block structured languages),
   and other advanced symbol structures that
   you don't find in hobby, or even in many
   "professional quality" assemblers.

6.   The ADK supports the most powerful macro system
   found in any x86 assembler. It is so powerful,
   you can easily create just about any statement
   (e.g., HLL-like statements) using nothing more
   than the ADK macro facilities. Combined with
   the ADK's "compile-time language" facilities,
   no other assembler (or other language processor)
   can touch the ADK's capabilities.

7.   The ADK's compile-time language fully supports
   facilities such as include files, conditional
   compilation/assembly, compile-time looping,
   and compile-time complex data types for constants
   (e.g., compile-time arrays, records/structs,
   unions, and character sets).

8.   The ADK provides an advanced, high-performance
   symbols management system (symbol table) using
   fast algorithms while providing considerable
   flexibility for nearly any symbol organization
   you can imagine.

9.   The ADK is well-tested. The package includes
   a regression test suite that tests every
   executable line of code in the program
   (i.e., a "code coverage" test). Furthermore,
   the test suite is automated, so an ADK
   developer can easily run the tests after
   making changes or extensions to the ADK.

10.   The ADK is robust. It includes lots of
   "defensive" code to check for unexpected
   conditions. It is also full of ASSERT
   statements that check for illegal values
   and unexpected results (the ASSERTs can
   be disarmed and eliminated from the code
   by changing *one* statement in the source
   files). It handles large and small projects
   with relative ease. For example, one of the
   programs in the test suite is 4MB long, containing
   over 115,000 lines of code (packed, almost
   no blank lines and no comments, with roughly
   one symbol definition and two additional
   symbol references per line). On a 3GHz PIV,
   the ADK compiles this file in 454 ms, a rate
   of 253,682 lines/second.

11.   The ADK will be portable across OSes (currently,
   only Win32 is supported, but there is only one
   module that contains OS-specific code and this
   will be modified to support Linux in the near
   future).

12.   The ADK source code is well-documented and
   well-commented. The lexical analyzer is easy
   to maintain. New reserved words can be added
   to the language by adding them to a text file
   and running a utility to generate the high-
   performance scanner for the lexical analyzer.
   Extending the lexer is a relatively straight-
   forward process. The parser uses an easy-to-
   understand and maintain recursive-descent,
   predictive, parser. Though the ADK provides
   a large amount of source code, programmers
   with a minimum of language design/implementation
   theory behind them can easily extend the
   system according to their own desires. Though
   the current syntactical model of the ADK
   is that for HLA v2.0 (as the ADK is being
   used to develop HLA v2.0), it's easy enough
   to modify the syntax that the ADK accepts
   to support any syntax you choose.

13.   The ADK is written in HLA. As the ADK is
   being used to create HLA v2.0, this means
   that the ADK is written in the language that
   the ADK (natively) supports. Therefore, you
   don't have to learn a different language
   in order to work on the ADK source code
   (assuming you keep the existing ADK syntax).
   The syntax that the ADK accepts is very
   similar to the language that the ADK is
   written in.

14.   The ADK provides a solution for most of the
   "grunt" work needed to create a high-performance
   fully-featured assembler. As this is being
   written, the ADK contains just under 90,000
   lines of source code.

15.   As noted, the ADK is being used to develop
   the High Level Assembler (HLA) v2.0. Therefore,
   you can be assured of continuing development
   of the ADK system.
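
As a quick sanity check on the timing figures in item 10, the
arithmetic can be worked through in a few lines. Python is used
here purely for illustration (the ADK itself is written in HLA).
The 454 ms and 253,682 lines/second figures come from the text
above; the exact line count is not stated there ("over 115,000"),
so this sketch backs it out from the other two numbers.

```python
# Figures quoted in item 10 above.
elapsed_ms = 454
reported_rate = 253_682          # lines per second


def lines_per_second(line_count, elapsed_ms):
    """Throughput for a file of line_count lines compiled in elapsed_ms."""
    return line_count / (elapsed_ms / 1000.0)


# The text only says "over 115,000 lines", so derive the implied count
# from the reported rate and elapsed time.
implied_lines = round(reported_rate * elapsed_ms / 1000.0)

print(implied_lines)                                   # implied file length
print(round(lines_per_second(115_000, elapsed_ms)))    # rate at exactly 115,000
```

The implied count lands a little above 115,000 lines, which agrees
with the "over 115,000" wording, and the rate sits inside the
100,000 to 1,000,000 lines/second range quoted in item 2.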

-------------
Using the ADK
-------------

The ADK is written in assembly language, using HLA v1.x.
Therefore, to use the ADK you will need to install HLA (if you
haven't done so already). You can obtain a copy of HLA
(the High Level Assembler) from http://webster.cs.ucr.edu.

The ADK uses a standard "make" file to control building the
final executable. You will need a "make.exe" program in
order to build the ADK. You can use Microsoft's "nmake.exe"
program for this purpose (note, however, that if you want
to use the automated test suite that accompanies the ADK,
you'll need a make program that is actually named "make.exe";
if you're using Microsoft's nmake.exe program, just make a
copy and name the copy "make.exe").

Once you've installed the HLA v1.x system, you can build the
ADK system by CD'ing into the HLA2 subdirectory (which
contains the ADK) and typing "make". In a minute or so, you
should be greeted by the completion of the ADK compilation.

NOTE TO "FHLA" USERS: if you want to compile the ADK using
HLA and FASM (rather than HLA and MASM), then use the
command line:

   make -f makefile.fhla

This will create the executable file (HLA2.EXE) using HLA and
FASM rather than HLA and MASM. Note that the test suite only
uses the HLA/MASM combination. If you want to use FASM in the
test suite, you'll have to go in and modify all the make files
in the test suite yourself.

Once the compilation of the ADK is complete, you'll wind up
with the executable file (assuming no compilation errors)
"HLA2.EXE". As this is being written, the ADK executable
(HLA2.EXE) compiles declaration sections only. Support for
machine instructions (or other language statements) is left to
the assembler developer who is using the ADK as a development
aid.

--------------------------
Extending the ADK Language
--------------------------

The ADK, as it exists while this is being written, supports
declaration sections for an HLA-like language. If you don't
particularly like this syntax, that's okay. You can easily
change the parser to use any syntax you like. You can remove
features you don't want to support (much easier than adding
features).

The ADK is relatively "processor-neutral" at this time. Though
the lexer does recognize certain x86 reserved words (such as
mnemonics and register names), it's a simple matter to replace
those x86-specific names with others (for example, if you
wanted to develop a PowerPC assembler). The x86 reserved words
are only referenced in a few limited places in the ADK source
code, so you can easily change them, if that's what you want
to do. Other than reserved words, the only place x86-isms
creep in is the code that looks for memory expressions.
This code is easy to remove (and replace by something else) if
you decide to create an assembler for a different processor or
if you decide to create a high-level language rather than an
assembler.

Currently, there are a couple of documentation files in the
"Doc" subdirectory. However, these are starting to get out of
date and the information is eventually going to wind up in
comments in the source code. Nevertheless, you might want to
scan through these documents -- they'll give you a good idea
of how the ADK system operates (internally).

----------------------------------------

===============================
Assembler Developer's Kit (ADK)
===============================

---------------------
Regression Test Suite
---------------------

This directory contains several subdirectories
containing a wide variety of test programs and
data files for the ADK. Using this test
suite requires the following:

1. This subdirectory *must* be installed in the HLA2
(source code) directory.

2. You must build the HLA2.EXE file (by making all
the files in the HLA2 subdirectory) and the
executable must be sitting in the HLA2 subdirectory.

3. You will need some version of the "MAKE" program.
This test suite was built using Borland's MAKE.EXE
program (it comes with their free command-line
C++ compiler kit). Any standard version of MAKE
should work just fine, but it *must* be named
MAKE.EXE (i.e., Microsoft's nmake.exe won't work
unless you make a copy of it to the file "make.exe").

4. You must have successfully installed HLA v1.x.

To run this test, open a command window and CD into
the "...HLA2\tests" subdirectory. Type "make" and
the rest is automatic. The test suite will stop
on one of two conditions:

1. Some error is detected while compiling the programs
in the test suite.

2. The test concludes without error. In this case, the
test file prints the following message:

********************************************************************
*                                                                  *
*   T E S T S   C O M P L E T E D   S U C C E S S F U L L Y !!!    *
*                                                                  *
********************************************************************

------------------------------------------
How the Automated Test Files are Organized
------------------------------------------

Within the ...\hla2\tests subdirectory you will
find several subdirectories. The files in this
test suite are organized according to the HLA v2.0
source files that they test. For example, the
files in the "coerce" subdirectory test the code
found in the HLA v2.0 (ADK) "coerce.hla" source
file module; the files in the "parseConsts"
subdirectory test the code in the "parseConsts.hla"
source file; and so on...

Within each test module subdirectory, you will find:

1. A "text" subdirectory
2. A "tmp" subdirectory
3. A "makefile" source file for the make program
4. A "txt.mak" source file for the make program
5. A set of HLA source files (".hla" and, possibly, ".hhf")

You might also find some ".TXT" files in the module subdirectory,
but these are transient files that go away when you issue a "make
clean" command.

The "makefile" make file is responsible for running the tests on
the particular files in this module's subdirectory. The main
"makefile" (in the ...\hla2\tests subdirectory) invokes each
module's "makefile" when running the whole test. You can run the
tests for an individual module (without running the entire test
suite) by CD'ing into the particular test module's subdirectory and
typing "make" at the command line.
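
The relationship between the main makefile and the per-module
makefiles can be sketched as a small driver. This is only an
illustration in Python, not how the ADK's makefiles are written:
the function names are hypothetical, and the step that actually
runs a module's tests is passed in as a callable so the sketch
stays independent of any particular make program.

```python
import os


def find_test_modules(tests_root):
    """Return the module subdirectories of tests_root that hold a 'makefile'."""
    modules = []
    for name in sorted(os.listdir(tests_root)):
        path = os.path.join(tests_root, name)
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, "makefile")):
            modules.append(path)
    return modules


def run_suite(tests_root, run_module):
    """Run every module's tests in turn, stopping at the first failure.

    run_module is a callable (an assumption of this sketch) that takes a
    module directory and returns an exit status; zero means success.
    Returns (status, names_of_modules_run).
    """
    ran = []
    for module in find_test_modules(tests_root):
        ran.append(os.path.basename(module))
        status = run_module(module)
        if status != 0:
            return status, ran       # stop, as the test suite does on an error
    return 0, ran
```

In real use, run_module might be something like
lambda d: subprocess.call(["make"], cwd=d), which mirrors CD'ing
into one module's subdirectory and typing "make".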

The "txt.mak" file is a special makefile used to build the test
data files in the "text" subdirectory. We'll discuss the purpose of
this file later in this document.

----------------------------
How the Automated Test Works
----------------------------

This regression test suite consists of three basic
file types: makefiles, HLA source files (".hla" and
".hhf" files), and data (text) files.

Each test is run as follows:

1. HLA2 is used to compile a test file. A typical
command line looks like this:

   ..\..\hla2 -test -sym -v xctl5.hla2 >xctl5.txt

The "-test" command-line parameter tells HLA2
not to print certain information to the output
file that can change on each assembly (e.g.,
date and time information, line number information
within the HLA2 compiler's source files, and
machine addresses used during compilation).

The "-sym" command line parameter tells HLA2
to print a symbol table dump after compilation
is complete. This output is written to a text
file (xctl5.txt in this case) via I/O redirection.

The -v option ("verbose" mode) tells HLA2 to print
some extra information during compilation. This
information is also written to the output text file.

2. The output produced by I/O redirection is then
compared against a known, valid, output file to
see if there are any differences. If the newly
produced file exactly matches the known file, then
the test succeeds. If they do not match, then
the comparison program ("hlacmp") will display
the offending lines in the two files (this will
also stop the regression test suite). The
output file comparison is done by the following
command (in the makefile):

   ..\hlacmp xctl1.txt   text\xctl1.txt

   "hlacmp" displays "Files compare OK" if the
   two files specified on its command line are
   identical. It prints the offending lines if
   there are any differences in the files.
   "hlacmp" also returns a program status code of
   zero if the files match; it returns a non-zero
   value if they do not match. This return value is
   what the make.exe program uses to determine whether
   it should stop the test suite. Note that the
   "hlacmp.exe" source code (HLA v1.x sources) is
   included in the test suite; you'll find this
   source code in the "...\hla2\tests" subdirectory.
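
The behavior described for "hlacmp" can be sketched as follows.
This is a minimal Python illustration, not the actual hlacmp
source (which is written in HLA v1.x). Only the "Files compare OK"
message and the zero/non-zero status come from the description
above; the layout used to print differing lines and the function
name compare_files are assumptions.

```python
import sys


def compare_files(new_path, known_path, out=sys.stdout):
    """Compare two text files line by line.

    Prints "Files compare OK" and returns 0 if the files are identical;
    otherwise prints each pair of differing lines and returns 1.
    """
    with open(new_path, encoding="utf-8", errors="replace") as f:
        new_lines = f.read().splitlines()
    with open(known_path, encoding="utf-8", errors="replace") as f:
        known_lines = f.read().splitlines()

    status = 0
    for i in range(max(len(new_lines), len(known_lines))):
        # None marks a line that exists in only one of the two files.
        a = new_lines[i] if i < len(new_lines) else None
        b = known_lines[i] if i < len(known_lines) else None
        if a != b:
            status = 1
            print(f"line {i + 1}:", file=out)
            print(f"  new:   {a!r}", file=out)
            print(f"  known: {b!r}", file=out)

    if status == 0:
        print("Files compare OK", file=out)
    return status
```

A command-line wrapper would pass the function's return value to
sys.exit() so that a make program sees the non-zero status and
stops.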

The "known, valid, output files" have to be created before running
the test suite and validated by hand. That is, you must manually
run the command

   ..\..\hla2 -test -sym -v xctl5.hla2 >xctl5.txt

and then carefully study the output that HLA2 produces and verify
that it is correct. Once you've verified that the output is correct
(perhaps, after correcting defects in the HLA2 source module), you
then copy this text file to the "text" subdirectory in the current
module.

It cannot be overemphasized how important this manual verification
step is. Once you copy the output text file to the "text"
subdirectory, the tests will compare all future output from the
test operation against this "valid" file. If the "valid" file
indicates some sort of error, then the tests will pass as long as
that error exists in the system. Therefore, it is important to make
sure that there are no errors in the "valid" file.

Note that "no errors" here does not mean that HLA2 compiled the
test file without reporting errors.
You may supply HLA2 with syntactically incorrect source code in
order to test HLA2's error handling capabilities. Although there is
an error during compilation, the output text file is still valid if
HLA2 produced the correct error diagnostics. Indeed, the *vast*
majority of the test files in this regression test suite test error
conditions.
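
The capture, verify, and compare cycle described above can be
sketched in a few lines. Again, this is a Python illustration
under stated assumptions rather than part of the ADK: the function
names are hypothetical, and the command to run is left as a
parameter. Note that the compiler's exit status plays no part in
the comparison, because a test of error handling is expected to
produce diagnostics.

```python
import os
import shutil
import subprocess


def run_and_capture(command, output_path):
    """Run command with stdout redirected to output_path (like '>file')."""
    with open(output_path, "w") as out:
        return subprocess.call(command, stdout=out)


def check_against_known(output_path, known_dir):
    """Return True if output_path matches the same-named file in known_dir."""
    known_path = os.path.join(known_dir, os.path.basename(output_path))
    with open(output_path, "rb") as new, open(known_path, "rb") as known:
        return new.read() == known.read()


def accept_output(output_path, known_dir):
    """Install a hand-verified output file as the known, valid output.

    Call this only after studying the output and confirming that it is
    correct; every later run is judged against this copy.
    """
    os.makedirs(known_dir, exist_ok=True)
    shutil.copy(output_path, known_dir)
```

Here known_dir plays the role of the module's "text" subdirectory,
and accept_output stands in for the manual copy step.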

------------------------------------
How to Use the Regression Test Suite
------------------------------------

A big problem with software development, especially on large
projects, is the fact that changes to one part of the project may
have unintended effects on other parts of the project. The purpose
of a regression test suite is to provide a facility to test for
such problems that occur during software development. For example,
if you're working on the parseConsts module in the HLA2 source
code, changes you make in this module could affect how the
"expr.hla" source module operates. By running the full regression
test suite after making changes to some local portion of the HLA2
source system, you can test the global effects of those changes.

The ADK/HLA2 regression test suite contains two types of test data
(HLA2 source files):

1. White-box generated test data intended
to achieve code coverage.

2. Black-box generated functional test
data.

White-box test data is created by studying the HLA2/ADK source
code. Test data (that is, HLA2 sample source code) is chosen in
order to exercise explicit code sequences within the program. The
regression test suite attempts to achieve "code coverage". This
means that once the test suite runs successfully, you've executed
every statement in HLA2 at least once. In reality, total code
coverage is rarely possible because most programs (and HLA2 is
certainly in this category) contain "defensive code" sequences that
should never execute. However, ignoring such code, the regression
test suite does a pretty good job of achieving code coverage on the
HLA2 source files.
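
The idea of code coverage, and why defensive code gets in its way,
can be shown with a toy example. This Python sketch uses
hand-placed probes; it is not how coverage is measured for HLA2,
and the names probe, classify, and uncovered are all hypothetical.

```python
executed = set()


def probe(label):
    """Record that the statement group named label was reached."""
    executed.add(label)


def classify(n):
    probe("entry")
    if n < 0:
        probe("negative")
        return "negative"
    if n >= 0:
        probe("non-negative")
        return "non-negative"
    # Defensive code: no integer input can reach this point.
    probe("defensive")
    raise AssertionError("unreachable")


ALL_PROBES = {"entry", "negative", "non-negative", "defensive"}


def uncovered(test_inputs):
    """Run classify on each input and list the probes never reached."""
    executed.clear()
    for n in test_inputs:
        classify(n)
    return sorted(ALL_PROBES - executed)


print(uncovered([5]))        # one input leaves the "negative" path untested
print(uncovered([-1, 5]))    # only the defensive code remains unexecuted
```

White-box test data corresponds to choosing the inputs (here, -1
and 5) by reading the code so that every reachable probe fires.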

Black-box test data is generated using functional specifications
rather than by looking at the program's source code. Functional
tests are great for finding problems that occur when combinations
of code produce incorrect results (such tests may not be done if
you're only attempting to execute each statement in the program at
least once).

Functional tests using black-box test data generation are perpetual
(as long as the specifications for the program don't invalidate the
test). Code coverage tests and tests involving white-box generated
test data can be invalidated by changes in the source file. For
example, if you delete a range of code, then the tests that checked
out that range of code no longer apply. However, assuming the test
code remains syntactically valid, the test itself is still useful:
the test now becomes a functional test rather than a white-box test
data test.

Maintaining code coverage is a bit of work. Whenever you add code
to the system, you have to extend the white-box generated test data
to handle the new source code. "Code coverage" is the *minimum*
level of testing demanded by the IEEE for professional software
engineering. Therefore, if you add any code to the HLA2/ADK system,
it's important to add additional tests to ensure that you test all
the code you've written (or, at least, verify that existing tests
provide code coverage of your changes).

The HLA2/ADK regression test suite is one of the most important
tools for ensuring that HLA2 achieves "commercial quality".
Therefore, it is important to maintain this test suite so that it
accurately reflects the underlying source code.

Fortunately, because the test suite is automated, there is very
little pain associated with testing changes you make to the
HLA2/ADK system. With one command and a minute or so of your time
you can run the complete test suite. Maintaining the test suite
(which doesn't require *that* much work on an incremental basis)
guarantees that testing changes to the system is painless and easy
-- which means it will get done often.
