Designing A Language

Started by cman, May 11, 2005, 02:22:04 PM

cman

How does one start to design a new language? I hadn't thought about this until just now. I want to design a small input language, both to help me understand the concepts involved more completely and so that the input to my translator is very simple and easy to translate (close to assembly language, say). I'd like to design something that has maybe high-level arrays and loops, conditionals, mathematical expressions, and that's about it. The only types would be machine storage types (BYTE, WORD, etc.). Anything I can read to motivate this task? Thanks...

AeroASM

If I were you, I would just copy the syntax from C. C is easy to parse because of the line termination and the curly braces. However, some things are too weird, like the *p and &p notation; replace that with [p] and addr p or something else more intuitive.

Also, it may be easier to write a {M|N|F}ASM-compatible listing, which the user must assemble himself, instead of writing directly to a .obj file.

(BTW I have absolutely no experience in this field, so please forgive me if I am totally off the mark)

cman

I wanted to go a little simpler than the C definition and get the experience of doing this myself. I sort of want to write a "real" compiler on a very small scale so it's manageable for me to write. Thanks for your input though :U :bg.

James Ladd

A new language would be hard to do and probably wouldn't have many takers, if you actually wanted people to use it.
Maybe focus on an existing language and add some syntactic sugar that users of the language want, like Java did with C++. This way you have a new language and the potential for users of the older language to migrate.
If you want an entirely new language then the sky is the limit. Just think of what would be fun and easy.
Extensibility should be in there so existing binary code can be used to help get libraries going, etc.
I.e., make a language construct that lets you load and call routines in a DLL.

rgs, striker

cman

Thanks for the input :U! I just want to make a little toy language so that I can study and understand the processes of a compiler more completely. I want a little experience in all the stages of compiler development without having to write something huge. I just want to develop a tiny (probably useless) language that compiles directly to a subset of 80x86 assembly instructions, so I can understand the concepts and experiment. I just need to get a start on how to do this. The language will be fed to Bison (LR), but I'm not sure how to start its design. Thanks for your input! :bg :U

James Ladd

Then maybe write a simple calculator language where you can go:

c:\mycalc\mycalc.exe 1 + 2

and have it output 3

It's a simple enough language that you already understand, and not so complex that you won't understand how to do it or never get it completed.
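
A calculator of the kind suggested here can be sketched in a few dozen lines. The following is a minimal sketch in Python, not the Bison/LR approach discussed elsewhere in this thread: it uses a hand-written recursive-descent parser, assumes integer operands only, and treats `/` as integer division. The names `tokenize` and `evaluate` are hypothetical.

```python
import re

# One token is an integer literal or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input string into a list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Parse and evaluate an integer expression such as '1 + 2'."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        # factor : NUMBER | '(' expr ')'
        tok = take()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        # term : factor (('*' | '/') factor)*
        value = factor()
        while peek() in ("*", "/"):
            op = take()
            rhs = factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def expr():
        # expr : term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

Calling `evaluate("1 + 2")` returns 3. Each parsing function corresponds to one grammar rule, which is the same decomposition a Bison grammar for the calculator would use.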

cman

I was thinking along the lines of a stripped-down programming language that would translate to assembly language, one in which a subset of programming language features could be implemented (repetition and selection structures, variable declarations, and function calls). Something very mild and sparse, like:



DWORD functionName ( WORD param1 )
{
    DWORD temp = 1;

    if ( param1 = temp )
        return 1;
}


Only basic types, maybe a "while" statement for loops or something, just so I can code the basic operations into the compiler. I just want to get the feel for building source code trees, generating intermediate results, and generating some modest assembler from the source. Just as a learning experience. Thanks for the ideas! :U

cman

Maybe I'll just take the C grammar and remove a bunch of features. This seems like an idea, since I have no others. I haven't seen any material that covers this sort of thing in any of the books I'm reading. These usually introduce the common theory and ideas involved in languages and parsing: the Chomsky hierarchy (CFG and non-CFG grammars: type 0, type 1, etc.), top-down and bottom-up parsing, SLR, LR, LALR, LL, etc. parsers, error recovery, conflict resolution... but not ideas on language design. I guess that's a different set of ideas altogether. If I had to guess how to start developing an LR grammar, I would say "start by denoting a top-level construct":


   program : sourceFile
   ;



and maybe work down from there:


  sourceFile : externalDeclaration
  ;



and keep going:



  externalDeclaration : declaration
                      | functionDeclaration
                      ;



I'll start this way unless I learn of a better way (I'm sure that there's an algorithm for this sort of thing; I'll look on the internet).
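
When working a grammar top-down like this, it helps to keep a list of the symbols that have been used but not yet given rules. A minimal sketch of that bookkeeping in Python follows; the dictionary layout and the function name `undefined_symbols` are hypothetical, and the rule names follow the ones above, lowercased for consistency (an assumption, since Bison symbol names are case-sensitive).

```python
# Each nonterminal maps to a list of alternatives;
# each alternative is a list of symbols.
grammar = {
    "program": [["sourceFile"]],
    "sourceFile": [["externalDeclaration"]],
    "externalDeclaration": [["declaration"], ["functionDeclaration"]],
}


def undefined_symbols(grammar, terminals=frozenset()):
    """Return symbols used on a right-hand side that have no rule yet.

    Symbols listed in `terminals` are tokens from the lexer and are
    not expected to have rules.
    """
    missing = []
    for alternatives in grammar.values():
        for alternative in alternatives:
            for symbol in alternative:
                known = symbol in grammar or symbol in terminals
                if not known and symbol not in missing:
                    missing.append(symbol)
    return missing
```

With the three rules above, `undefined_symbols(grammar)` returns `["declaration", "functionDeclaration"]`, which is the list of rules still to be written. The grammar is complete once the list is empty.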

Randall Hyde

Quote from: cman on May 11, 2005, 02:22:04 PM
How does one start to design a new language? I hadn't thought of this before until just now.
There are *lots* of books on programming language design (that discuss the design of the language rather than the implementation of a design). Of course, most of them are geared towards creating new imperative/procedural or functional languages, but there are lots of good ideas there. I'd recommend searching for "Programming Language Design" on Amazon and reading book reviews.

Quote
I want to design a small input language, both to help me understand the concepts involved more completely and so that the input to my translator is very simple and easy to translate (close to assembly language, say). I'd like to design something that has maybe high-level arrays and loops, conditionals, mathematical expressions, and that's about it. The only types would be machine storage types (BYTE, WORD, etc.). Anything I can read to motivate this task? Thanks...
Search for the key phrase "Domain Specific Languages" (or DSLs). Note that you can create some relatively complex DSLs using the compile-time language found in HLA v1.x. See the chapter on DSLs in the Art of Assembly Language.
Cheers,
Randy Hyde

rea

I suppose that you will enter the domain of mathematics and concepts like decidability (or however it is written?) and other things like that :). For example, I know a guy who is (was? I lost connection :S) studying philosophy, and he was doing work on languages ;), so you can get the idea that you will also need a little of that.

Design is very attractive, I think.

cman

Thanks for all the input and ideas. I guess I'll try to design something that will be easy to translate (i.e., make it easy on myself). I like the ideas in Pascal: restricted arrays, ranges for variables, loops that "step through" values (for 1 to 100 do). These ideas make for easier translation and optimization. I'm thinking of things like this now. :bg
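
One reason a Pascal-style counted loop translates easily is that both bounds are known when the loop starts, so code generation reduces to filling in a fixed template. A minimal sketch in Python follows; `emit_counted_loop` and the generated label names are hypothetical, and it assumes the counter lives in a register, the bounds are signed constants, and the body is already a list of assembly lines.

```python
import itertools

# Source of unique numbers so that nested or repeated loops get distinct labels.
_labels = itertools.count()


def emit_counted_loop(var_reg, first, last, body_lines):
    """Emit MASM-style lines for: for var_reg := first to last do body."""
    n = next(_labels)
    top, done = f"loop_top_{n}", f"loop_done_{n}"
    lines = [
        f"    mov {var_reg}, {first}",   # initialise the counter
        f"{top}:",
        f"    cmp {var_reg}, {last}",    # signed compare against upper bound
        f"    jg {done}",                # leave once the counter passes it
    ]
    lines += [f"    {line}" for line in body_lines]
    lines += [
        f"    inc {var_reg}",            # step to the next value
        f"    jmp {top}",
        f"{done}:",
    ]
    return lines
```

For example, `emit_counted_loop("ecx", 1, 100, ["add eax, ecx"])` yields an eight-line listing that adds 1 through 100 into `eax`. A general C-style `for` would instead need arbitrary expressions evaluated for the test and the step.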

cman

Quote
I suppose that you will enter the domain of mathematics and concepts like decidability (or however it is written?) and other things like that :),

Yeah, I noticed a lot of the ideas are similar to those in mathematical logic: growing strings of symbols, deriving strings of symbols from others according to a rule system (decidability is one of the characteristics of a formal logic system, along with completeness and consistency). I'm just looking for something that will translate easily and make instruction selection and memory usage easier to implement (like restricted values for variables, etc.). Thanks for the ideas! :toothy

tenkey

Procedural languages containing only primitive types and one dimensional arrays are relatively easy to translate.

How elaborate a syntax do you want? With Yacc/Bison, you can make the syntax as fancy as you want. But if you want simple syntax, there's Polish notation - Cambridge Polish as used in Lisp, RPN as in Forth.
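
To illustrate how little parsing RPN needs, here is a minimal sketch of an RPN evaluator in Python. It assumes whitespace-separated tokens, integer operands, and integer division for `/`; the name `eval_rpn` is hypothetical.

```python
def eval_rpn(source):
    """Evaluate a whitespace-separated RPN expression such as '1 2 +'."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a // b,
    }
    stack = []
    for token in source.split():
        if token in ops:
            if len(stack) < 2:
                raise SyntaxError(f"operator {token!r} needs two operands")
            b = stack.pop()  # right operand was pushed last
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise SyntaxError(f"expression left {len(stack)} values on the stack")
    return stack[0]
```

There is no grammar and no precedence table: `eval_rpn("2 3 4 + *")` returns 14 because the operand order already encodes the grouping.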

I once created a "bootstrap" language that was basically a high level assembly language. It looked something like this:


dim id
dim id [ integer ]

; var
+ var
- var
* var
/ var
= var
< var
> var
-> var

? var >> jump-label
>> jump-label
% call-label
: jump-or-call-label

{ subroutine-code }


"var" had only two forms:


id
id [ id ]


As the language was "free format", you could write the following increment code:


; count + one -> count
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8
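
The increment example in tenkey's post can be run with a very small interpreter. The sketch below is in Python and rests on guessed semantics, since the post does not spell them out: it assumes `;` loads a variable into a single accumulator, the arithmetic operators combine the accumulator with a variable, and `->` stores the accumulator. Only the plain `id` operand form is handled (no arrays, comparisons, jumps, or subroutines), and the name `run` is hypothetical.

```python
def run(source, memory):
    """Run operator/operand pairs against a dict of named variables."""
    tokens = source.split()
    if len(tokens) % 2:
        raise SyntaxError("expected operator/operand pairs")
    acc = 0  # the single accumulator (assumed)
    for op, name in zip(tokens[0::2], tokens[1::2]):
        if op == ";":
            acc = memory[name]       # assumed: load variable
        elif op == "->":
            memory[name] = acc       # assumed: store accumulator
        elif op == "+":
            acc += memory[name]
        elif op == "-":
            acc -= memory[name]
        elif op == "*":
            acc *= memory[name]
        elif op == "/":
            acc //= memory[name]     # integer division assumed
        else:
            raise SyntaxError(f"unsupported operator {op!r}")
    return memory
```

Under those assumptions, running `"; count + one -> count"` with `memory = {"count": 41, "one": 1}` leaves `memory["count"]` at 42. Because every statement is an operator followed by one operand, the "parser" is just a loop over token pairs.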

James Ladd

Regardless of the "goal" language, I would still recommend that you start with something super simple (like a calculator) to hone your ideas and code. Then make something more involved.
Honestly, this IS the best approach.

cman

OK, thanks for the information! I'll try to write something down in the next few days (I have the next couple of days off work). I'll see if I can come up with anything. I haven't gotten my copy of Lex & Yacc in the mail yet, so I'll hold off on implementing the parser/lexer until I know what I'm doing. Thanks for the tips...