For your amusement - a Bootstrap compiler

Started by tenkey, October 11, 2005, 05:27:50 AM


tenkey

A few years ago, I wrote this compiler to get a self-compiling system going with a minimum of effort. To get it going, I used VC to "kick start" the compiler, and left all the hard work of encoding instructions and managing a symbol table to the assembler, MASM. Then I built a compiler for this very primitive language, in the new language. I used this as a starting point for evolving a series of compilers.

Recently, I resurrected the first self-compiling version (actually Version 0), and extended it a bit. It now generates a binary file containing valid executable instructions. It can't be linked with any linker, but it can be loaded and run without any "relocation". I haven't tested the compiler on an old slow machine, but it's definitely slow on a 1+ GHz machine.

In terms of control structures, the language programs like "raw" assembly language. There are no high-level IFs or loops; such control structures must be translated by hand into conditional and unconditional jumps. There are also no clear boundaries marking where subroutines begin and end. As in "raw" assembly language, the compiler has no idea whether a name is a data label or a code label.
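To make the "translated into jumps" point concrete, here is a minimal Python sketch of that lowering. It is not Bootstrap syntax: the `jump_if_not` and `jump` mnemonics, the `L1`-style labels, and the `lower` function are all invented for illustration.

```python
from itertools import count

def lower(stmts, labels=None):
    """Flatten nested ("if", ...) / ("while", ...) tuples into a jump-only list.

    Plain strings are passed through unchanged as "instructions".
    The mnemonics emitted here are hypothetical, not real Bootstrap commands.
    """
    labels = labels if labels is not None else count(1)
    out = []
    for s in stmts:
        if isinstance(s, str):
            out.append(s)
        elif s[0] == "if":
            _, cond, then_body, else_body = s
            else_l, end_l = f"L{next(labels)}", f"L{next(labels)}"
            out.append(f"jump_if_not {cond}, {else_l}")  # skip the THEN part
            out += lower(then_body, labels)
            out.append(f"jump {end_l}")                  # skip the ELSE part
            out.append(f"{else_l}:")
            out += lower(else_body, labels)
            out.append(f"{end_l}:")
        elif s[0] == "while":
            _, cond, body = s
            top_l, end_l = f"L{next(labels)}", f"L{next(labels)}"
            out.append(f"{top_l}:")
            out.append(f"jump_if_not {cond}, {end_l}")   # leave the loop
            out += lower(body, labels)
            out.append(f"jump {top_l}")                  # back to the test
            out.append(f"{end_l}:")
        else:
            raise ValueError(f"unknown statement kind: {s[0]!r}")
    return out
```

Calling `lower([("while", "i < 10", ["i = i + 1"])])` yields a label, a conditional jump out of the loop, the body, and an unconditional jump back to the top, which is roughly what has to be written by hand in a language with no structured control flow.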

You can do what you like with it. The language is ugly, but it's simple. The target (executable) code is also ugly, but it works.

----- And you get a binary executable (as well as a usable MASM equivalent) -----

One of the next steps is to build a proper lexer, so that cryptic commands such as ^ @ . ! and ~ can be replaced by more meaningful symbols such as declare, call, goto, return, and ->

Another followup is to implement subroutines with local (stack) variables, so that a recursive descent parser can be written with some clarity. Having temporary variables local to a subroutine is also a big help in avoiding reuse (sharing) bugs.

And I suppose it wouldn't hurt to wrap the raw binary with some COFF overhead.


[attachment deleted by admin]
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

ThoughtCriminal

Quote from: tenkey on October 11, 2005, 05:27:50 AM
And I suppose it wouldn't hurt to wrap the raw binary with some COFF overhead.
That would be nice to see. There is lots of information on making compilers and assemblers, but almost none offering guidance on making an .OBJ file.


hutch--

ThoughtCriminal,

Do a web search for a file from Microsoft called PECOFF.DOC. I don't class it as good but it has the basic information you need.

Vortex

Quote from: hutch-- on October 13, 2005, 02:54:21 PM
ThoughtCriminal,

Do a web search for a file from Microsoft called PECOFF.DOC. I don't class it as good but it has the basic information you need.

This document is available from:

http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx

ThoughtCriminal

Thanks. I have that doc. What I mean is that there is no real information or example source about implementing the COFF format.

tenkey

As it turned out, putting a COFF wrapper around position-independent code was not too hard.

I've put together a prettified version of Bootstrap that has a lexer and creates a simple COFF module. There's nothing you can use to link into the Win32 API, but it's a start. It's enough for making a subroutine module, provided you don't have a data section and you don't use the equivalent of OFFSET some_label, DWORD some_label, EXTERN some_label, or some_label DWORD some_data.
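For anyone who wants to see the shape of such a wrapper, here is a rough Python sketch. It is not the Bootstrap code: it follows the layout in the PE/COFF document (20-byte file header, 40-byte section header, 18-byte symbol records, then a string table) and assumes a single .text section, no relocations, and one exported symbol with a short name. The `build_coff` function and the `_answer` symbol are made up for the example.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C
TEXT_FLAGS = 0x60500020           # CNT_CODE | ALIGN_16BYTES | MEM_EXECUTE | MEM_READ
IMAGE_SYM_CLASS_EXTERNAL = 2
SYM_TYPE_FUNCTION = 0x20          # "function" in the symbol Type field

def build_coff(code: bytes, symbol: bytes) -> bytes:
    """Wrap position-independent code in a one-section COFF object (sketch)."""
    if len(symbol) > 8:
        raise ValueError("long names need the string table; not handled here")
    code_offset = 20 + 40                     # file header + one section header
    symtab_offset = code_offset + len(code)   # symbol table follows the raw code
    file_header = struct.pack(
        "<HHIIIHH",
        IMAGE_FILE_MACHINE_I386,
        1,               # NumberOfSections
        0,               # TimeDateStamp (left at zero in this sketch)
        symtab_offset,   # PointerToSymbolTable
        1,               # NumberOfSymbols
        0,               # SizeOfOptionalHeader (none in an object file)
        0,               # Characteristics
    )
    section_header = struct.pack(
        "<8sIIIIIIHHI",
        b".text",        # Name, null-padded to 8 bytes by struct
        0, 0,            # VirtualSize, VirtualAddress (zero in objects)
        len(code),       # SizeOfRawData
        code_offset,     # PointerToRawData
        0, 0,            # PointerToRelocations, PointerToLinenumbers
        0, 0,            # NumberOfRelocations, NumberOfLinenumbers
        TEXT_FLAGS,
    )
    symbol_entry = struct.pack(
        "<8sIhHBB",
        symbol,          # short name stored inline
        0,               # Value: offset of the symbol within its section
        1,               # SectionNumber (1-based index of .text)
        SYM_TYPE_FUNCTION,
        IMAGE_SYM_CLASS_EXTERNAL,
        0,               # NumberOfAuxSymbols
    )
    string_table = struct.pack("<I", 4)       # empty table: just its own size field
    return file_header + section_header + code + symbol_entry + string_table
```

As a usage example, `build_coff(b"\xB8\x2A\x00\x00\x00\xC3", b"_answer")` wraps `mov eax, 42` / `ret` and returns the bytes to write out as an .OBJ file. With no relocation records it only works for code that never refers to an absolute address or an external name, which is the same restriction described above.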

The prettified Bootstrap...

[attachment deleted by admin]

ThoughtCriminal