Load and go assembler to bootstrap? And one pass assembler.

Started by ThoughtCriminal, May 03, 2005, 08:57:17 AM


ThoughtCriminal

Got married in November, and it seems to be next to impossible to program when you are married.  Now that I'm the project manager at a game company and my job is not to produce content, but to make sure other people are making content, I have a lot of dead time.  If I'm not making the weekly schedules or in a meeting about something or other, I am probably surfing the internet.  Since that rots my brain, I will use assembly programming to keep it sharp.  I know work comes first, but I'm management  :bg

I will now get back to the assembler I wanted to make.  At first, I'm pretty sure I was designing it as a load-and-go assembler: the assembler builds the executable in its own address space and lets the executable run once assembly has finished.  I want COFF support, with debugging data support too, eventually.  My plan is first to make a load-and-go assembler with enough support for a few instructions and imports, then add COFF obj support.

For some reason the idea of a one-pass assembler is easier for me to comprehend.  I ran into one problem with what I tried already: what to do with the two end-of-line bytes.  They have no syntactic significance, unless someone wants to make an end-of-line operator or something.  :eek The other is how to handle included files.  If I have one pass and I hit an include mid-source, I now have to move the source below the include forward by the size of the include.  Not hard, but maybe a cleaner way would be to use a formatting pass: remove the end-of-lines, remove extra whitespace, and include the files in place.  Load the main file; when I hit an include, copy the source up to that point to another buffer and load the include below the source in the new buffer.  Now if the loaded include has its own include, copy again to its own buffer and load the include.  Might have to have a maximum include depth.  Anyway, the idea is to have one pass to make the source easier to parse.  I'm not married to it only being a one-pass assembler, but it will start as one.

Am I understanding this stuff right?
Is any of the above a stupid/difficult way to do things?
I do not want to reinvent every wheel.  I will use the C libs and a hash table made by someone else.

Thanks for any input.


Randall Hyde

Quote from: ThoughtCriminal on May 03, 2005, 08:57:17 AM

For some reason the idea of a one-pass assembler is easier for me to comprehend.  I ran into one problem with what I tried already: what to do with the two end-of-line bytes.  They have no syntactic significance, unless someone wants to make an end-of-line operator or something.
Having an end of statement token makes error recovery a whole lot easier.
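To make the point about error recovery concrete, here is a minimal sketch in Python (chosen for brevity; the structure carries over to C). The token pattern, the `OPERAND_COUNT` table, and all function names are assumptions made up for this illustration and are not taken from any real assembler. The tokenizer closes every line with an explicit EOL token, so when a statement fails to parse, the parser can skip ahead to the next EOL and carry on with the following line.

```python
import re

# Hypothetical token pattern: identifiers, decimal numbers, commas, anything else.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|,|\S")

# Hypothetical instruction table: mnemonic -> number of operands.
OPERAND_COUNT = {"mov": 2, "add": 2, "ret": 0}


def tokenize(source):
    """Yield (kind, text) pairs, closing every source line with an EOL token."""
    for line in source.splitlines():
        for text in TOKEN_RE.findall(line):
            yield ("TOK", text)
        yield ("EOL", "")


def parse_statement(tokens, pos):
    """Parse 'mnemonic [operand {, operand}]'; return (statement, index of EOL)."""
    if tokens[pos][0] == "EOL":
        return None, pos  # blank line
    mnemonic = tokens[pos][1]
    if mnemonic not in OPERAND_COUNT:
        raise SyntaxError(f"unknown mnemonic {mnemonic!r}")
    pos += 1
    operands = []
    while tokens[pos][0] != "EOL":
        if operands:  # every operand after the first needs a comma before it
            if tokens[pos][1] != ",":
                raise SyntaxError("expected ',' between operands")
            pos += 1
            if tokens[pos][0] == "EOL":
                raise SyntaxError("operand expected after ','")
        operands.append(tokens[pos][1])
        pos += 1
    if len(operands) != OPERAND_COUNT[mnemonic]:
        raise SyntaxError(f"{mnemonic} takes {OPERAND_COUNT[mnemonic]} operand(s)")
    return (mnemonic, operands), pos


def parse(source):
    """Parse every line; on an error, resynchronize at the next EOL token."""
    tokens = list(tokenize(source))
    statements, errors = [], []
    pos, line_no = 0, 1
    while pos < len(tokens):
        try:
            stmt, pos = parse_statement(tokens, pos)
            if stmt is not None:
                statements.append(stmt)
        except SyntaxError as exc:
            errors.append((line_no, str(exc)))
            while tokens[pos][0] != "EOL":  # error recovery: skip the bad line
                pos += 1
        pos += 1  # consume the EOL token
        line_no += 1
    return statements, errors
```

Because the EOL token bounds each statement, one bad line produces one error message, and the lines after it are still checked.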

Quote
:eek The other is how to handle included files.  If I have one pass and I hit an include mid-source, I now have to move the source below the include forward by the size of the include.  Not hard, but maybe a cleaner way would be to use a formatting pass: remove the end-of-lines, remove extra whitespace, and include the files in place.
This is, for example, what CPP (the C preprocessor) often does for the C compiler. Indeed, you could make a preprocessor pass and expand macros while you are at it.
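A preprocessing pass of this kind could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `include "file"` directive syntax, the `MAX_INCLUDE_DEPTH` value, and the function names are hypothetical. Note that it keeps one statement per line instead of removing line ends, and that its whitespace collapsing would damage string literals containing runs of spaces.

```python
import os

MAX_INCLUDE_DEPTH = 16  # assumed limit; guards against a file including itself


def _read_file(path):
    """Default reader: return the whole file as text."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def expand_includes(path, read_text=_read_file, depth=0):
    """Return the normalized lines of `path` with includes expanded in place.

    Assumes a hypothetical directive of the form:  include "name"
    `read_text` is injectable so the pass can be tried without real files.
    """
    if depth > MAX_INCLUDE_DEPTH:
        raise RecursionError(f"include depth exceeds {MAX_INCLUDE_DEPTH} at {path}")
    out = []
    for line in read_text(path).splitlines():
        # Collapse runs of whitespace (naive: ignores string literals).
        normalized = " ".join(line.split())
        if not normalized:
            continue  # drop blank lines
        parts = normalized.split(" ", 1)
        if parts[0].lower() == "include" and len(parts) == 2:
            # Resolve the name relative to the including file's directory.
            target = os.path.join(os.path.dirname(path), parts[1].strip('"'))
            out.extend(expand_includes(target, read_text, depth + 1))
        else:
            out.append(normalized)
    return out
```

The recursion takes the place of the buffer copying: each nested include simply contributes its lines to the output list at the point where the directive appeared.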

Quote
Load the main file; when I hit an include, copy the source up to that point to another buffer and load the include below the source in the new buffer.
I have three words for you: memory-mapped files.
They make handling include files very easy. If you want some code you can start with, take a look at the Assembler Developer's Kit on Webster at
http://webster.cs.ucr.edu/AsmTools/RollYourOwn/index.html

The lexical analyzer (which is a pretty good one, btw) handles (nested)  includes, macro expansion, conditional assembly (and other compile-time language facilities). It also uses memory-mapped files for all source code input. Even if you're not interested in all the other stuff that the ADK provides (e.g., HLA-style declarations), the lexer might give you some ideas and a big head-start on your coding.
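As a rough illustration of why memory-mapped files simplify includes, here is a Python sketch using the standard `mmap` module. It is not the ADK's code (which is written in HLA); the `MappedSource` class, the `mapped_lines` generator, the `include "file"` syntax, and the depth limit are all assumptions for this example. Each file is mapped read-only and keeps its own cursor, so hitting an include just pushes a new mapping onto a stack; nothing is copied or shifted in memory.

```python
import mmap
import os

MAX_INCLUDE_DEPTH = 16  # assumed limit


class MappedSource:
    """A read-only memory-mapped source file plus a cursor into it."""

    def __init__(self, path):
        self.path = path
        self._file = open(path, "rb")
        # Note: mapping an empty file raises ValueError; not handled here.
        self.view = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.pos = 0

    def next_line(self):
        """Return the next line as bytes (without CR/LF), or None at the end."""
        if self.pos >= len(self.view):
            return None
        end = self.view.find(b"\n", self.pos)
        if end == -1:
            end = len(self.view)
        line = self.view[self.pos:end].rstrip(b"\r")
        self.pos = end + 1
        return line

    def close(self):
        self.view.close()
        self._file.close()


def mapped_lines(path):
    """Yield source lines, following hypothetical 'include "name"' directives."""
    stack = [MappedSource(path)]
    try:
        while stack:
            line = stack[-1].next_line()
            if line is None:
                stack.pop().close()  # end of this file: resume the includer
                continue
            words = line.split(None, 1)
            if len(words) == 2 and words[0].lower() == b"include":
                if len(stack) >= MAX_INCLUDE_DEPTH:
                    raise RecursionError("includes nested too deeply")
                name = words[1].strip(b'" \t').decode()
                base = os.path.dirname(stack[-1].path)
                stack.append(MappedSource(os.path.join(base, name)))
                continue
            yield line
    finally:
        for source in stack:
            source.close()
```

The including file's cursor stays where it was while the nested file is read, which is the effect the buffer-copying scheme was trying to achieve.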


Quote
Now if the loaded include has its own include, copy again to its own buffer and load the include.  Might have to have a maximum include depth.  Anyway, the idea is to have one pass to make the source easier to parse.  I'm not married to it only being a one-pass assembler, but it will start as one.
Handling include files isn't the difficult thing to do with a single-pass assembler. For example, the ADK code currently makes only a single pass over the source file. The place where you start getting into trouble is during code generation. If you want to do things like branch displacement optimization, you'll need to make multiple passes over the code you generate (we generally call those "phases" to differentiate them from making passes over the source code). Of course, if you use memory-mapped files, the source code is, effectively, sitting around in memory anyway, so making multiple passes over the source code isn't that big of a deal. Indeed, jumping around randomly in the source file is pretty easy, too.
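The need for several phases during branch displacement optimization can be seen in a small Python sketch. This is one simple way to do it, not a description of how the ADK or any particular assembler works. The item format and the `relax_jumps` name are hypothetical; the sizes assume 32-bit x86, where `jmp rel8` takes 2 bytes and `jmp rel32` takes 5. Every jump starts in its short form, and each phase lays out the code again and widens any jump whose target no longer fits in a signed byte.

```python
SHORT, NEAR = 2, 5  # sizes in bytes of x86 'jmp rel8' and 'jmp rel32'


def relax_jumps(items):
    """Choose short or near encodings for jumps.

    `items` is a list of hypothetical tuples:
        ("code", size_in_bytes) | ("label", name) | ("jmp", target_label)
    Returns (sizes, label_addresses, number_of_phases).
    """
    sizes = []
    for kind, arg in items:
        if kind == "jmp":
            sizes.append(SHORT)  # optimistic start: assume every jump is short
        elif kind == "code":
            sizes.append(arg)
        else:
            sizes.append(0)  # labels occupy no space
    phases = 0
    while True:
        phases += 1
        # Lay out the code with the current sizes.
        labels, offsets, address = {}, [], 0
        for (kind, arg), size in zip(items, sizes):
            offsets.append(address)
            if kind == "label":
                labels[arg] = address
            address += size
        # Widen every short jump whose displacement does not fit in rel8.
        changed = False
        for i, (kind, arg) in enumerate(items):
            if kind == "jmp" and sizes[i] == SHORT:
                # Displacement is relative to the end of the jump instruction.
                displacement = labels[arg] - (offsets[i] + SHORT)
                if not -128 <= displacement <= 127:
                    sizes[i] = NEAR
                    changed = True
        if not changed:
            return sizes, labels, phases
```

Jumps only ever grow, so the loop must terminate. Widening one jump can push another out of range, which is why a single phase is not enough: in the input `[("jmp", "a"), ("jmp", "b"), ("code", 125), ("label", "a"), ("code", 3), ("label", "b")]`, the first jump fits in the first layout and only overflows after the second jump has been widened.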


Quote
Am I understanding this stuff right?
Is any of the above a stupid/difficult way to do things?
I do not want to reinvent every wheel.  I will use the C libs and a hash table made by someone else.
Definitely take a look at the ADK. It's written in HLA, not C, but it can give you a big head start on things. And the current lexer and parser are *very* fast (typically processing about 250,000 to 750,000 lines per second on a modern PIV processor).
Cheers,
Randy Hyde


ThoughtCriminal

Thank you for your comments, Randall.  I took a look at Bogdan's assembler.  That helped put more things in perspective.  I'll take a look at the ADK too.  I'll be very interested when you get to the object module generation part.  I have yet to find a single tutorial or helper library for object module generation.

Thanks.

Rifleman

The OBJ Specification [Microsoft Corporation] can be downloaded at http://www.wotsit.org - Page 29.

Paul