The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: travism on June 27, 2009, 06:45:04 PM

Title: Writing an assembler.
Post by: travism on June 27, 2009, 06:45:04 PM
I've written some basic interpreters in the past and just finished my fun project, a brainf*ck interpreter. Now I want to step it up a little and actually write an assembler. I've got some documents on the COFF file format and the Intel manuals. I've read about instruction encoding, but I'm still not understanding how you're actually supposed to encode each instruction or how you're actually supposed to write the COFF file. I also have 2 PDFs on compiler technology, but nothing explains anything about actually writing object files or encoding instructions; it's all about parsing, which is the easiest part. Does anyone have any information or anything? I've already been looking at the sources of JWasm and FASM to see how others have done it. Any help would be appreciated, thanks! :)
Title: Re: Writing an assembler.
Post by: dedndave on June 27, 2009, 06:58:44 PM
as for the coff format, Vortex is the man - his website has all the proper docs
and, he can assist you if you have a hard time understanding parts
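
To give a feel for what "writing the COFF file" means at the byte level, here is a minimal sketch in Python (chosen for brevity) that packs only the 20-byte COFF file header for a 32-bit x86 object. The function name and its parameters are made up for illustration; section headers, raw data, relocations and the symbol table would all have to follow, so this alone is not a usable object file.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C  # COFF machine type for 32-bit x86

def coff_file_header(num_sections, symtab_offset, num_symbols, timestamp=0):
    """Pack the 20-byte COFF file header (all fields little-endian)."""
    return struct.pack(
        "<HHIIIHH",
        IMAGE_FILE_MACHINE_I386,  # Machine
        num_sections,             # NumberOfSections
        timestamp,                # TimeDateStamp
        symtab_offset,            # PointerToSymbolTable
        num_symbols,              # NumberOfSymbols
        0,                        # SizeOfOptionalHeader (0 in object files)
        0,                        # Characteristics
    )

header = coff_file_header(num_sections=1, symtab_offset=0, num_symbols=0)
print(len(header), header.hex())
```

The rest of the file is built the same way: fixed-layout records packed one after another, with the header fields pointing at their file offsets.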

as for the encoding, it is not too difficult, there are just a lot of instructions to encode
and, you will want to provide support for different extended instruction sets, as well - that part will be confusing
you will also want to look at the amd material - and know what instructions are unique to the different processors
you will make a lot of friends if you make a good 64-bit assembler - lol
an assembler (or compiler, for that matter) is mostly a big text parser
providing macro support will be a big part of the task

which intel docs do you have ? - i may have a couple that can help
i think the main one will be the "Intel 64 and IA-32 Architectures Software Developer's Manual" (5 pdf's i think)
that manual fully describes instruction encoding
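
As a small worked example of the encoding described in that manual, here is a hedged Python sketch for one single form: register-to-register MOV with 32-bit operands in 32-bit code, using opcode 89 /r (MOV r/m32, r32). The helper names are invented for illustration, and memory operands, prefixes and immediates are not handled.

```python
# Register numbers as they appear in the ModRM byte (32-bit general registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Build a ModRM byte: 2 bits mod, 3 bits reg, 3 bits r/m."""
    return (mod << 6) | (reg << 3) | rm

def encode_mov_r32_r32(dst, src):
    """Encode MOV dst, src with opcode 89 /r; mod=11 means register-direct."""
    return bytes([0x89, modrm(0b11, REG32[src], REG32[dst])])

print(encode_mov_r32_r32("eax", "ebx").hex())  # 89d8
```

For `mov eax, ebx` the ModRM byte is 11 011 000 in binary (mod = register, reg = ebx, r/m = eax), which gives the two bytes 89 D8.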
Title: Re: Writing an assembler.
Post by: dedndave on June 27, 2009, 07:20:08 PM
intel manuals (these will get you started)
http://www.intel.com/products/processor/manuals/

amd
http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_875_7044,00.html

cyrix
http://www.sandpile.org/docs/cyrix.htm

(sandpile has lots of documentation links)

Vortex's site
http://www.vortex.masmcode.com/
Title: Re: Writing an assembler.
Post by: travism on June 27, 2009, 08:05:39 PM
Wow, thanks a bunch for those links. Right now I'm still in the planning phase for it, deciding on the syntax to support and writing an outline of the assembler. I really hope this works out. I just wanted to test encoding instructions and outputting to an object file with my brainf*ck interpreter to understand how it works before I use it in an assembler. I will have a look at all those manuals. Thank you again!
Title: Re: Writing an assembler.
Post by: dedndave on June 27, 2009, 08:10:01 PM
ya know, Travis, i admire you for taking on such a large project
i would like to write a debugger/disassembler/resource viewer/pe editor/etc/etc
but, i am not ready for that, yet - maybe someday before i am too old - lol
Title: Re: Writing an assembler.
Post by: travism on June 27, 2009, 08:57:38 PM
Hey thanks, Vortex and also Jeremy with his GoAsm assembler have really inspired me to create my own. It will be a long process, but I will learn a lot. :)
Title: Re: Writing an assembler.
Post by: Farabi on June 28, 2009, 03:21:41 AM
If I'm not mistaken, in 2005 The Svin posted some information on writing an assembler. He wrote about bitfields.
Title: Re: Writing an assembler.
Post by: dedndave on June 28, 2009, 03:48:15 AM
you have a link Farabi ?

i found these - maybe they are what you are talking about

http://www.asmcommunity.net/board/index.php?topic=10554.0

http://www.asmcommunity.net/board/index.php?topic=22393.0

http://www.wasm.ru/baixado.php?mode=tool&id=210
Title: Re: Writing an assembler.
Post by: Farabi on June 28, 2009, 05:03:37 AM
Quote from: dedndave on June 28, 2009, 03:48:15 AM
you have a link Farabi ?

i found these - maybe they are what you are talking about

http://www.asmcommunity.net/board/index.php?topic=10554.0

http://www.asmcommunity.net/board/index.php?topic=22393.0

http://www.wasm.ru/baixado.php?mode=tool&id=210


:clap: Whoa, you are so fast. Yep, that's him. I wonder where he went?
Title: Re: Writing an assembler.
Post by: hutch-- on June 28, 2009, 07:23:01 AM
Travis,

I understand the task you have in mind is a large and complex one and it will take a lot of coding skill and work to get it to work properly. Much of the task will be in the PRE-CODING architecture and this will dictate how much success you will get in doing something this complicated. What I would suggest you do before you take on the doing part of the project is get a good idea of what you need to do to make something like this work.

Japheth's rewrite of JWASM is a good start, as his code is well written and very clear. Tomasz Grysztar's FASM is written directly in FASM, so you will need to be able to read the notation, but it is an excellent assembler. Some of our members know about techniques like recursive descent parsers, various temporary data storage methods and the like, and general parsing knowledge will in fact be very useful to you.
Title: Re: Writing an assembler.
Post by: travism on June 28, 2009, 07:43:13 PM
Thank you all for all the information, I have saved every bit of it. Hutch, thank you for responding. I know this task will be very complicated and it will take a lot of time. I am not even thinking about coding it yet, just some design principles and such. I am most confused about what comes right after the parsing, so I am trying to read a lot on that. I hope I'll soon begin to understand :/

Thank you all again
Title: Re: Writing an assembler.
Post by: travism on July 02, 2009, 03:45:37 PM
Wow, so I have been reading everything under the sun about compiler technology, and I've started at the beginning: the parse tree. A couple of things I'm having trouble understanding: is one line at a time read in and tokenized, then sent to the parser (or whatever you choose next), and then to code generation? Or is the whole file parsed first and then sent to the next stages?

I've tried looking at the FASM and JWasm sources and I am trying to see how they store their parse trees and ASTs. But I guess there are a zillion ways to do it?

Title: Re: Writing an assembler.
Post by: dedndave on July 02, 2009, 03:48:11 PM
the entire file
simplify everything for the next pass
i don't know if they expand includes and macros before (i.e. during) or after that pass
it seems like it would make sense to expand them as well - you could ask the RadAsm and GoAsm guys for a little direction, there
of course, a single-pass assembler would be nice, but i dunno how practical it would be to implement
i think MASM is a 3-pass assembler
they don't count the beautifier pass so it is called a 2-pass assembler (i think that's right)
all this seems rather simple until you take into account very large projects
then, you are generating temporary files and balancing memory against file-size and symbol space
also, everyone likes an assembler that is fast, but i like one that works - lol
maybe, better than trying to find documentation,
you might see if you can get the source for some assembler and browse it for ideas and concepts
i think the Gnu Assembler (GAS) is open-source
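
To illustrate why the whole file is usually walked more than once, here is a minimal Python sketch of a toy two-pass scheme: pass 1 records the offset of every label, pass 2 resolves label operands, including forward references. The `sizes` table of fixed instruction lengths is a simplifying assumption made up for this example; real x86 instructions vary in length (short vs. near jumps, for instance), which is one reason real assemblers may need extra passes.

```python
def two_pass(lines, sizes):
    """lines: source lines; sizes: hypothetical mnemonic -> byte length map."""
    symbols = {}
    offset = 0
    # Pass 1: walk the whole file and record where each label lands.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = offset
        else:
            offset += sizes[line.split()[0]]
    # Pass 2: walk it again; every label is now known, even forward ones.
    resolved = []
    offset = 0
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split()
        mnemonic = parts[0]
        operands = [symbols.get(op, op) for op in parts[1:]]
        resolved.append((offset, mnemonic, operands))
        offset += sizes[mnemonic]
    return symbols, resolved

source = ["jmp done", "nop", "done:", "ret"]
symbols, resolved = two_pass(source, {"jmp": 2, "nop": 1, "ret": 1})
print(symbols)
print(resolved)
```

The `jmp done` on the first line cannot be resolved when it is first seen, because `done` is only defined two lines later; that is the forward-reference problem the second pass solves.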
Title: Re: Writing an assembler.
Post by: travism on July 02, 2009, 04:21:14 PM
Hey dedndave, thanks for the quick reply. Yeah, I was kind of figuring it would be more practical to do it that way... I've been studying the JWasm source but it's a bit much. I'm having an easier time understanding the FASM source code, which is so clean and fast. It's just hard trying to break it down at that low of a level lol. It kind of sucks because I'm only getting bits and pieces of what it takes to make one. I'm not putting myself under any time constraint though, so I have all the time I need to make one :) Thanks for your help and information.
Title: Re: Writing an assembler.
Post by: hutch-- on July 02, 2009, 04:41:08 PM
travis,

It's a bit to do with what target you have in mind in terms of assembler complexity. At its simplest, a bare mnemonic grinder of MASM 5 technology or lower is a far simpler task than one that adds any pseudo high level capacity to it. A macro engine is another layer of complexity that collectively makes the result larger and a lot harder to code.
Title: Re: Writing an assembler.
Post by: travism on July 02, 2009, 05:20:29 PM
Quote from: hutch-- on July 02, 2009, 04:41:08 PM
travis,

It's a bit to do with what target you have in mind in terms of assembler complexity. At its simplest, a bare mnemonic grinder of MASM 5 technology or lower is a far simpler task than one that adds any pseudo high level capacity to it. A macro engine is another layer of complexity that collectively makes the result larger and a lot harder to code.

Yeah, exactly, that's why I'm trying to break down these sources into their simplest forms. And I want to start this project right, so that later, when I decide to add a macro-type engine, I can add it without rewriting the whole thing. I understand the parsing as a whole; it's just me trying to figure out the best way to store the tokens so the parser can read them well and then turn them into an AST and so forth, without getting halfway through and seeing I should have done it a different way. This will be a lot of reading :) Thanks for the input, hutch.
Title: Re: Writing an assembler.
Post by: d0d0 on July 02, 2009, 07:16:39 PM
Travis, Big up mon!

I admire you mate. Here are some more links that might be useful:

Check out Randy Hyde's docs on lexers/parsers. I know it's on HLA but you can learn something out of it. There is also a link to a free book on compiler construction. The first few chapters cover the creation of a basic assembler; it then covers multiple-pass assemblers and macro capabilities.
http://webster.cs.ucr.edu/AsmTools/RollYourOwn/index.html

Check out YASM as well, especially the libyasm library. Maybe you could start with a MASM frontend for YASM.

Good luck mate!

Respect
Title: Re: Writing an assembler.
Post by: travism on July 02, 2009, 07:24:25 PM
Hey, thanks for that link d0d0, I'll check it out! Thank you again for everyone's help and support. I really wanna make this assembler right :)
Title: Re: Writing an assembler.
Post by: travism on July 03, 2009, 09:42:39 PM
Well, I've got a fairly decent-sized document written on the syntax and design of the assembler. Also, a note on multithreading: I was thinking that after the source is broken up into tokens and checked for invalid syntax and errors, it starts a thread, so that while the main thread is parsing the syntax tree it passes the results off to the second thread to generate code, and both work at the same time. Just an idea. Also, I'm still trying to figure out the best way to store the parse tree and AST. In C I can think of a linked list, with a structure breakdown for each line of source, but I'm wondering about some different ideas for assembly. Just some of my thoughts :) Any input?
Title: Re: Writing an assembler.
Post by: Neo on July 04, 2009, 07:33:33 AM
In making the built-in assembling for Inventor IDE, I've put together a huge data file containing all of the instruction encodings expanded out.  It took months of data entry to make a file detailed enough to use for assembling all 16-bit, 32-bit, and 64-bit versions of instructions, but it lets you easily generate huge test cases to exhaustively check against other assemblers (and find bugs in them :wink).  If you'd like to use it, just let me know.  It's in the download of Inventor IDE (http://www.codecortex.com/ide/) as ASM.xml.
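
The general idea of driving an encoder from a data table, and generating test cases from the same table, can be sketched in a few lines of Python. The table layout below is invented purely for illustration and is not the format of ASM.xml; it covers only four one-byte 32-bit-mode forms (NOP, RET, and the register forms of PUSH and POP, whose opcodes are 50+rd and 58+rd).

```python
# Register numbers for the 32-bit general registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# Hypothetical table: (mnemonic, operand count) -> encoding description.
ENCODINGS = {
    ("nop", 0):  {"opcode": 0x90, "plus_reg": False},
    ("ret", 0):  {"opcode": 0xC3, "plus_reg": False},
    ("push", 1): {"opcode": 0x50, "plus_reg": True},   # 50+rd
    ("pop", 1):  {"opcode": 0x58, "plus_reg": True},   # 58+rd
}

def encode(mnemonic, *operands):
    """Look the form up in the table and emit its single opcode byte."""
    entry = ENCODINGS[(mnemonic, len(operands))]
    opcode = entry["opcode"]
    if entry["plus_reg"]:
        opcode += REG32[operands[0]]   # register number is added to the opcode
    return bytes([opcode])

# Generating an exhaustive test list from the same data:
push_cases = {f"push {reg}": encode("push", reg).hex() for reg in REG32}
print(push_cases)
```

Because the encoder is just a lookup, adding an instruction form means adding a table row rather than writing new code, and the same rows can be expanded into test inputs to compare against another assembler's output.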

You can find the code I use for compiling here (http://code.google.com/p/pwnide/source/browse/#svn/trunk/Base/lang) in Encoder.java, asm/Encoder.java, and bits and pieces around there.  It kind of depends on the code already being fully parsed, though (in ASMLine.java and ASMParse.java).  However, the way I've set it up has the advantage that it's very easy to parallelize (most of it is already), and it makes other things easy, like crawling through the code to include only things that are referenced from a specified entry point (for the automated performance test/analysis feature I'm working on for a demo on Tuesday (http://www.facebook.com/event.php?eid=97430891427) :toothy).

For the record, I don't use a parse tree, since assembly without the high-level macros isn't very tree-like; I keep track of lines, since that's how they're edited anyway, but you may want to do things differently since you're not going to do editing.  When I add support for C, I'll keep track of a parse tree (but just inside of functions, 'cause elsewhere in Inventor IDE, it's not necessary to parse much after loading).  The only thing tree-like I do is parsing of global variables that are structure types, but that's just a simple recursive descent parsing whenever it needs to be evaluated instead of keeping it parsed.
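
A line-oriented representation like the one described here can be sketched in Python as one flat record per source line, with no tree at all. The `AsmLine` record and `parse_line` helper are hypothetical names for illustration and are not taken from Inventor IDE; the splitting is deliberately naive (a `;` inside a string or a `:` in a segment override such as `es:[di]` would confuse it).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AsmLine:
    number: int
    label: Optional[str] = None
    mnemonic: Optional[str] = None
    operands: List[str] = field(default_factory=list)

def parse_line(number, text):
    """Split one source line into label / mnemonic / operands (naive)."""
    text = text.split(";", 1)[0].strip()        # drop a trailing comment
    rec = AsmLine(number)
    if ":" in text:                             # "label: rest of line"
        label, text = text.split(":", 1)
        rec.label = label.strip()
        text = text.strip()
    if text:
        parts = text.split(None, 1)
        rec.mnemonic = parts[0].lower()
        if len(parts) > 1:
            rec.operands = [op.strip() for op in parts[1].split(",")]
    return rec

source = ["start: mov eax, ebx ; copy", "ret"]
program = [parse_line(i, t) for i, t in enumerate(source, 1)]
for rec in program:
    print(rec)
```

Each record is independent of its neighbours, which is what makes this layout easy to process line by line or in parallel.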

Cheers!  :U
Title: Re: Writing an assembler.
Post by: travism on July 08, 2009, 03:38:16 AM
Hey, thanks for the information! I'm not very good with Java at all lol, I tried but failed. I'm having some trouble finding information on encoding the instructions and am really not sure of the most efficient way. I really don't think it would be efficient entering all the hex opcodes and cmp and jmps lol. Anyone know of any docs on it? I've read the encoding part in the Intel manuals. :\
Title: Re: Writing an assembler.
Post by: bruce1948 on July 08, 2009, 11:08:04 PM
This might help

[attachment deleted by admin]
Title: Re: Writing an assembler.
Post by: Neo on July 09, 2009, 09:49:02 AM
Quote from: travism on July 08, 2009, 03:38:16 AM
Hey, thanks for the information! I'm not very good with Java at all lol, I tried but failed. I'm having some trouble finding information on encoding the instructions and am really not sure of the most efficient way. I really don't think it would be efficient entering all the hex opcodes and cmp and jmps lol. Anyone know of any docs on it? I've read the encoding part in the Intel manuals. :\
I didn't mean to imply that you should have to enter all that data too.  :wink  Here's the ASM.xml file that's included with Inventor IDE.  It should have most everything you need that doesn't involve directives or macros, in a format that's not too hard to handle.  Everything, that is, unless you want to implement a really fancy error/warning system (like tracking dependency chains to find uninitialized values or ignored values).  I'd post the main huge test case too, but even compressed, it's 1MB, and you probably won't need it for a while.  We can compare our results when the time comes.

Edit: whoops, forgot to attach it, hehe.

[attachment deleted by admin]
Title: Re: Writing an assembler.
Post by: travism on July 09, 2009, 07:46:57 PM
Neo, thanks again for all the help. bruce, I have that downloaded and it is a vital source, thank you! What I'm trying to work out is this: I read in the file, break it into tokens and save it to a syntax tree, which is a structure of the grammar, but then when you move to the next instruction it will overwrite the current instruction in the syntax tree. That's why I thought each instruction went through every phase first before moving on to the next... What is the proper way of using a syntax tree?
Title: Re: Writing an assembler.
Post by: travism on July 13, 2009, 09:27:25 PM
Does anyone have any more sources? I have read PDF after PDF about lexical analysis and parsing, but nothing actually explains how it is used in programming or even in the sense of writing a compiler... I might just have to put this project on hold, since no one has really written too much about it.. :\
Title: Re: Writing an assembler.
Post by: dedndave on July 13, 2009, 09:38:33 PM
in the days of DOS, i have written probably hundreds of command-line parsers
i realize that is nothing like what you are up against, but i have a feel for parsing
i think the first "beautifier" pass is what you are talking about
it needs to convert tabs to spaces, eliminate extra spaces, strip out comments, tokenize instructions and directives, and terminate lines
there is no magical formula for all that
in real-mode DOS, using lodsb, stosb, and loop worked fairly well
with the extended set of pentium instructions, there are probably better ways to do it - i have not learned all these instructions, yet
if i were in your shoes, i would take a good look at source code for other assemblers to get ideas and concepts
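
The clean-up steps listed above (tabs to spaces, collapse extra spaces, strip comments, tokenize) can be sketched in Python as follows. This is only an illustration with invented helper names; it assumes `;` always starts a comment, so a semicolon inside a quoted string would be mishandled.

```python
import re

# Identifiers, numbers (including forms like 0FFh), or any single symbol.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9][0-9A-Za-z]*|\S")

def beautify(line):
    """Strip the comment, turn tabs into spaces, collapse runs of spaces."""
    code = line.split(";", 1)[0]
    return " ".join(code.split())

def tokenize(line):
    """Return the list of tokens on one cleaned-up line."""
    return TOKEN.findall(beautify(line))

raw = "\tmov   eax,\t[ebx+4]  ; load the value"
print(repr(beautify(raw)))
print(tokenize(raw))
```

On the sample line this yields the cleaned text `mov eax, [ebx+4]` and the tokens `mov`, `eax`, `,`, `[`, `ebx`, `+`, `4`, `]`.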
Title: Re: Writing an assembler.
Post by: travism on July 13, 2009, 09:44:25 PM
Yeah, that's what I have been trying to do, but so many of them are highly optimized and very advanced, and I'm just trying to find the basics. Basically the only part I'm having trouble understanding is this: if you have mov eax,ebx or whatnot, of course you would move that into a structure for your syntax tree, but as soon as you hit the next statement it would overwrite that structure with the new information. That's why I thought you actually go through the whole process (tokenize, parse and encode) one line at a time..
Title: Re: Writing an assembler.
Post by: dedndave on July 13, 2009, 09:51:59 PM
try to dynamically allocate memory for the parsed output
(one of those functions lets you "grow" an allocation - don't recall which one)
managing memory throughout the assembly process is going to be tricky
this first pass is an example - as your output data grows, your requirement for input space diminishes
i dunno if i would try to validate any code on that pass or not (that is one thing i would look to others for example)
of course, if it isn't a valid instruction or directive, it must be a label - lol - i dunno
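
A small Python sketch of both ideas in this post: the parsed output lives in a container that grows as lines are added (so a new statement never overwrites the previous one), and the first word on a line is treated as a label when it is neither an instruction nor a directive. The word lists are tiny made-up subsets for illustration. In C the growing buffer would typically be done with something like `realloc`; which API was meant above is not stated.

```python
MNEMONICS = {"mov", "add", "jmp", "ret"}           # tiny illustrative subset
DIRECTIVES = {"db", "dw", "dd", ".code", ".data"}  # likewise

def classify(token):
    """Instruction, directive, or (by elimination) label."""
    word = token.lower()
    if word in MNEMONICS:
        return "instruction"
    if word in DIRECTIVES:
        return "directive"
    return "label"

parsed = []  # grows with every line; earlier entries are never overwritten
for line in ["start:", "mov eax, ebx", "msg db 0", ".data", "ret"]:
    first = line.split()[0].rstrip(":")
    parsed.append((classify(first), line))

for kind, text in parsed:
    print(kind, "|", text)
```

After the loop, `parsed` holds one entry per source line, so later passes can revisit any statement.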
Title: Re: Writing an assembler.
Post by: travism on July 13, 2009, 09:56:23 PM
Wow, I feel dumb. I completely forgot I wrote down how I was going to tokenize the file: it saves each structure of grammar to memory and then parses it, checking it for errors etc... :| Thanks again for your help lol
Title: Re: Writing an assembler.
Post by: dedndave on July 14, 2009, 05:18:42 AM
hiya Travis
i know you feel as though you have seen all the docs and pdf's you ever want - lol
here are a couple short ones that i found and thought you might appreciate
http://gec.di.uminho.pt/Discip/Lesi/AC10203/docs/P4ISAformat.pdf
http://webster.cs.ucr.edu/AoA/Windows/HTML/ISA.html
Title: Re: Writing an assembler.
Post by: travism on July 16, 2009, 03:06:43 AM
Hey thanks those are very helpful also, I will be sure to give those a read. :) Thank you again!