How does one start to design a new language? I hadn't thought of this until just now. I want to design a small input language to help me understand the concepts involved more completely and also so that I can make the input to my translator very simple and easy to translate ( close to assembly language, say ). I'd like to design something that has maybe high-level arrays and loops, conditionals, mathematical expressions and that's about it. The only types would be machine storage types ( BYTE, WORD, etc. ). Anything I can read to motivate this task? Thanks...
If I were you, I would just copy the syntax from C. C is easy to parse because of the line termination and the curly braces. However, some things are too weird, like the *p and &p notation; replace that with [p] and addr p or something else more intuitive.
Also it may be easier if, instead of writing directly to a .obj file, you write a {M|N|F}ASM compatible listing which the user must assemble himself.
(BTW I have absolutely no experience in this field, so please forgive me if I am totally off the mark)
I wanted to go a little simpler than the C definition and get the experience of doing this myself. I sort of want to write a "real" compiler on a very small scale so it's manageable for me to write. Thanks for your input though :U :bg.
A new language would be hard to do and probably wouldn't have many takers, if you actually wanted people to use it.
Maybe focus on an existing language and add some syntactic sugar that users of the language want.
Like Java did with C++. This way you have a new language and the potential for users of the older language to migrate.
If you want an entirely new language then the sky is the limit. Just think of what would be fun and easy.
Extensibility should be in there so existing binary code can be used to help get libraries going etc etc.
ie: make a language construct that lets you load and call routines in a DLL.
rgs, striker
Thanks for the input :U! I just want to make a little toy language so that I can study / understand the processes of a compiler more completely. I just want a little experience in all the stages of compiler development without having to write something huge. I just want to develop a tiny ( probably useless ) language that compiles directly to a subset of 80x86 assembly instructions. Just so I can understand concepts / experiment. I just need to get a start on how to do this. The language will be fed to Bison ( LR ), but I'm not sure how to start its design. Thanks for your input! :bg :U
Then maybe write a simple calculator language where you can go:
c:\mycalc\mycalc.exe 1 + 2
and have it output 3
It's a simple enough language that you already understand, and not so complex that you won't understand how to do it or never get it completed.
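If it helps, a bare-bones Yacc/Bison grammar for that kind of calculator might look something like this ( just a sketch; the NUMBER token and the lexer behind it are assumed, not something already written ):
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%%
input : expr               { printf("%d\n", $1); }
      ;
expr  : expr '+' expr      { $$ = $1 + $3; }
      | expr '-' expr      { $$ = $1 - $3; }
      | expr '*' expr      { $$ = $1 * $3; }
      | expr '/' expr      { $$ = $1 / $3; }   /* no divide-by-zero check in this sketch */
      | '(' expr ')'       { $$ = $2; }
      | NUMBER
      ;
%%
Feed it tokens from a trivial lexer and the two %left lines give '*' and '/' the higher precedence for you.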
I was thinking along the lines of a stripped-down programming language that would translate to assembly language. One in which a subset of programming language features could be implemented ( repetition and selection structures; variables could be declared and functions could be called ). Something very mild and sparse, like:
DWORD functionName ( WORD param1 )
{
    DWORD temp = 1;
    if ( param1 = temp )
        return 1;
}
only basic types, maybe a "while" statement for loops or something, just so I can code the basic operations into the compiler. I just want to get the feel for building source code trees, generating intermediate results and generating some modest assembler from the source. Just as a learning experience. Thanks for the ideas! :U
Maybe I'll just take the C grammar and remove a bunch of features. This seems like an idea, since I have no others. I haven't seen any material that covers this sort of thing in any of the books I'm reading ( these usually introduce the common theory and ideas involved in languages and parsing: the Chomsky hierarchy ( CFG and non-CFG grammars - type 0, type 1, etc. ), top-down and bottom-up parsing, SLR, LR, LALR, LL, etc. parsers, error recovery, conflict resolution... ) but not ideas on language design. I guess that's a different set of ideas altogether. If I had to guess how to start to develop an LR grammar I would say: "start by denoting a top-level construct":
program : sourceFile
;
and maybe work down from there:
sourceFile : externalDeclaration
;
and keep going:
externalDeclaration : declaration
| functionDeclaration
;
I'll start this way unless I learn of a better way ( I'm sure there's an algorithm for this sort of thing - I'll look on the internet ).
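To push that sketch one level further ( these particular production names are only my guesses at how it might flesh out, nothing final ):
functionDeclaration : typeSpecifier IDENTIFIER '(' parameterList ')' compoundStatement
                    ;
compoundStatement : '{' declarationList statementList '}'
                  ;
statement : expressionStatement
          | selectionStatement
          | iterationStatement
          | returnStatement
          ;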
Quote from: cman on May 11, 2005, 02:22:04 PM
How does one start to design a new language? I hadn't thought of this until just now.
There are *lots* of books on programming language design (that discuss the design of the language rather than the implementation of a design). Of course, most of them are geared towards creating new imperative/procedural or functional languages, but there are lots of good ideas there. I'd recommend searching for "Programming Language Design" on Amazon and reading book reviews.
Quote
I want to design a small input language to help me understand the concepts involved more completely and also so that I can make the input to my translator very simple and easy to translate ( close to assembly language, say ). I'd like to design something that has maybe high-level arrays and loops, conditionals, mathematical expressions and that's about it. The only types would be machine storage types ( BYTE, WORD, etc. ). Anything I can read to motivate this task? Thanks...
Search for the key phrase "Domain Specific Languages" (or DSLs). Note that you can create some relatively complex DSLs using the compile-time language found in HLA v1.x. See the chapter on DSLs in the Art of Assembly Language.
Cheers,
Randy Hyde
I suppose that you will enter the domain of mathematics and concepts like decidability (is that how it's written?) and other things like that :). For example, I know a guy who is (was? I lost contact :S) studying philosophy, and he was doing work regarding languages ;), so you get the idea that you will also need a little of that.
Design is very attractive, I think.
Thanks for all the input / ideas. I guess I'll try to design something that will be easy to translate ( i.e. make it easy on myself ). I like the ideas in Pascal: restricted arrays, ranges for variables, loops that "step through" values ( for 1 to 100 do ); these ideas make for easier translation / optimization. I'm thinking of things like this now. :bg
Quote
I suppose that you will enter the domain of mathematics and concepts like decidability (is that how it's written?) and other things like that :),
Yeah, I noticed a lot of the ideas are similar to those in mathematical logic: deriving strings of symbols from others according to a rule system ( decidability is one of the characteristics of a formal logic system, along with completeness and consistency ). I'm just looking for something that will translate easily and make instruction selection / memory usage easier to implement ( like restricted values for variables, etc. ). Thanks for the ideas! :toothy
Procedural languages containing only primitive types and one dimensional arrays are relatively easy to translate.
How elaborate a syntax do you want? With Yacc/Bison, you can make the syntax as fancy as you want. But if you want simple syntax, there's Polish notation - Cambridge Polish as used in Lisp, RPN as in Forth.
I once created a "bootstrap" language that was basically a high level assembly language. It looked something like this:
dim id
dim id [ integer ]
; var
+ var
- var
* var
/ var
= var
< var
> var
-> var
? var >> jump-label
>> jump-label
% call-label
: jump-or-call-label
{ subroutine-code }
"var" had only two forms:
id
id [ id ]
As the language was "free format", you could write the following increment code:
; count + one -> count
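In x86 terms that would come out as something like the following ( roughly; treating ";" as a load into a scratch register, "+" as add, and "->" as store ):
mov eax, count    ; ";" loads the accumulator
add eax, one      ; "+" adds
mov count, eax    ; "->" stores the result back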
Regardless of the "goal" language, I would still recommend that you start with something super simple ( like a calculator ) to hone your ideas and code. Then make something more involved.
Honestly, this IS the best approach.
OK, thanks for the information! I'll try to write something down in the next few days ( I have the next couple of days off work ). I'll see if I can come up with anything. I haven't gotten my copy of Lex & Yacc in the mail yet, so I'll hold off on the implementation of the parser/lexer until I know what I'm doing. Thanks for the tips....
You could do something like BF.
http://www.iwriteiam.nl/Ha_BF.html
Sample BF app that displays "Hello World!" and terminates (just in case you can't tell by looking at the source):
>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.[-]>++++++++[<++++>-]
<.#>+++++++++++[<+++++>-]<.>++++++++[<+++>-]<.+++.------.--------.[-]>++++++++[
<++++>-]<+.[-]++++++++++.
Quote from: MichaelW on May 16, 2005, 08:08:48 AM
You could do something like BF.
http://www.iwriteiam.nl/Ha_BF.html
Sample BF app that displays "Hello World!" and terminates (just in case you can't tell by looking at the source):
>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.[-]>++++++++[<++++>-]
<.#>+++++++++++[<+++++>-]<.>++++++++[<+++>-]<.+++.------.--------.[-]>++++++++[
<++++>-]<+.[-]++++++++++.
Wow, that is really cool! :U I've never seen anything like this! Thanks for the post! This is pretty sparse indeed. Converts one set of unreadable symbols to another. Pretty neat. Can you actually program in this language ( have you written anything big )? Just curious. :bg
I have written a few small test pieces. I once ported a BF interpreter from C to QuickBASIC and I needed to have a reasonable understanding of the language to make it work. But I never really got into it -- I already have enough problems with my brain :green
Reminds me of the idea of a Turing machine: one read head and an infinite tape with 1's and 0's. You can reproduce any function of a more conventional computer with a Turing machine, but I doubt it would be pretty ( it's difficult enough seeing a TM add two numbers ). I guess there are many different ways to look at directing the actions of a piece of translation software. As I look into language design I notice that the easier the language is to translate, the harder it becomes to use ( I guess this should be obvious, though ). I look at ideas in Pascal:
type.name = set of base_type [ value , ... , value ]
restricted value ranges ( that are a pain to specify but make instruction selection easier for the compiler ).
I also noticed step type loops that make it easy to identify Induction Variables:
for j := 1 to 14 do
begin
end
I noticed how early binding times speed up the execution of array processing in languages like Fortran and C ( but reduce the flexibility of data structures at run time ). And I noticed the scripting languages ( Perl, etc. ) that can produce complicated algorithms in just a few lines. If one could just put all the neat features of every language into a blender....... :bg
Quote from: cman on May 17, 2005, 04:40:04 PM
I noticed how early binding times speed up the execution of array processing in languages like Fortran and C ( but reduce the flexibility of data structures at run time ). And I noticed the scripting languages ( Perl, etc. ) that can produce complicated algorithms in just a few lines. If one could just put all the neat features of every language into a blender....... :bg
This is because languages like Perl, etc., are "very high-level" or "domain-specific" languages. They work fantastic for the problems they were created to solve. But once you try to solve a problem outside their domain, they aren't quite so advantageous.
"Kitchen sink languages" (as in "everything, including the kitchen sink") have been tried (PL/I, anyone?) and have failed miserably. They are too complex to write, too complex to generate efficient code for, and too complex for mere mortals to learn (how many people have completely mastered C++, for example?)
Cheers,
Randy Hyde
Quote from: Randall Hyde on May 18, 2005, 12:18:49 AM
(how many people have completely mastered C++, for example?)
Indeed! After 12 years I'm still a complete newbie! (well, almost)
But... ONE DAY I'll be a C++ guru... and then I'll take my revenge... muahahahaha :bdg
:toothy
I guess focus is the key word when designing a language. I suppose I would be best off trying to design a language that solves a very specific problem of some sort. But what would that be.....
This has more to do with implementing a language than designing one, but how about creating an expression evaluator for use with MASM. The evaluator could effectively be a "string calculator" that would accept an expression in the form of a string, and return the result. For MASM applications that needed to perform complex non-time critical calculations, such an evaluator could save a lot of time and code. I think BASIC-style expressions would be the best choice because the notation is easy to use and widely understood. Here are some examples pulled at random from working apps:
Y = .299 * R + .587 * G + .114 * B
U = .492 * (B - Y)
V = .877 * (R - Y)
C7 = ((1 - v ^ 2) * (a / b - b / a)) / 2
C9 = (b / a) * (((1 + v) / 2) * LOG(a / b) + ((1 - v) / 4) * (1 - ((b / a) ^ 2)))
Mt = ((-w * a ^ 3) / b ^ 2) * (1 - v ^ 2) * C9 / C7
id = SQR(od ^ 2 * (s - p) / (s + p))
I think the passed expression probably should not include the assignment. Some simple method of passing in variables and named constants would be required. Using a fixed type for all numbers, say REAL8, would make the task easier.
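As a rough sketch of what the evaluator would have to parse ( my own guess at a grammar, written in Yacc-style notation; the token names and giving '^' the tightest binding are assumptions, and real BASICs vary on the details ):
expr    : expr '+' term
        | expr '-' term
        | term
        ;
term    : term '*' factor
        | term '/' factor
        | factor
        ;
factor  : factor '^' unary
        | unary
        ;
unary   : '-' unary
        | primary
        ;
primary : NUMBER
        | VARIABLE
        | FUNCTION '(' expr ')'     /* LOG, SQR, ... */
        | '(' expr ')'
        ;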
@MichaelW
It would be even better to have a math expression compiler, a little utility that accepts a math expression and then translates it to optimized assembler, because trying to do that oneself is a pain :P
Interesting! It would be nice if there were a utility that could produce MASM code for high-level expressions so as to save programmers the tedious work of coding long calculations in assembly language. I've read of some algorithms for register and memory management that can produce fairly decent code ( not as good as a human programmer's, but OK ). A project like that would be interesting, and maybe even manageable. Hmmm..... :bg
I am thinking along the lines of something like assembler with operators , like this:
DWORD array [ MAX_ARRAY_LENGTH ];
eax = ecx + array [ 1 ]; // the high-level array element is summed with the contents of ecx and the result stored in eax
eax ++; // increment eax C style
edx = ( eax | ecx ); //take logical or of two registers and move result to edx
I don't know how practical something like this would be. It certainly would make assembly more fun in certain tedious situations. I think quality object code ( MASM ) could be obtained if it were used by an experienced assembly language programmer. However, statements like this:
eax = ebx + ecx + edx + esi + edi .....
would cause some internal register and memory management by the compiler ( changing register names and using temporary memory locations ), using data flow analysis ( the knight's armor on the cover of the Dragon Book :bg ), etc... Is this a stupid / bad idea? I've been studying the ANSI C grammar ( a surprisingly elegant and concise grammar, by the way ) to get some ideas on how to develop the language. I like the idea of substituting operators for assembly operations or groups of assembly operations and chaining operations to speed development. The set of C operators seems somewhat sufficient to cover a good subset of assembly operations ( + , - , * , ++ , -- , & , | , || , && , < , > , << , >> , % , ^ , ! , etc. ). Maybe add a few more. Hmmmm....
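For example ( just a guess at the kind of output I have in mind, assuming array is a DWORD array so index 1 sits at byte offset 4 ), the three example statements at the top of this post might come out as:
; eax = ecx + array [ 1 ];
mov eax, ecx
add eax, array[4]
; eax ++;
inc eax
; edx = ( eax | ecx );
mov edx, eax
or edx, ecx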
Go ahead...design yet another high level assembler.
You've got good company. Give Randy a run for his money (on HLA).
Hey, even Randy acknowledges the idea predates him...he even mentions PL/360 (created by Niklaus Wirth of Pascal fame).
I've seen proposals for 8-bit machines like PL/B, and working ones like ML-80. There was even a commercial product (I think it was called BL-80 or similar).
I'm not sure about PL/360, but the others did not allow expressions that caused spillage (storing temporary results in memory). That would spoil the idea of an assembly language, where you have complete control, or at least an easily predictable code sequence.
Thanks for the input tenkey! While I'm not even close to being in Randy's league :'( , I thought this might be a neat "toy compiler" project ( I get to go through all the motions of building a compiler, but won't have to write anything really difficult - like a real compiler ). I'm just looking for ideas for a hobby-style project, and since this style of language would not be too far above the object code, I think this is as easy as it gets. The whole idea of writing something that would half-way work is pretty exciting for me. Thanks for the input! :U
I'd just like to point out that masm's MACRO support is powerful !!
You can write yourself a neato script-based language as a macro set, which allows the user to build the applications with masm (no need for a preprocessor).
If you don't believe me, take a look inside the macros folder of the ObjAsm32 oop model by Biterider.. you're in for a shock !!
The simple truth is that masm's macro support is:
-underrated
-misunderstood
-not well known
by most masm coders, including long-term users :cheekygreen:
I agree,
Apart from piggish documentation, obscure syntax and the odd irritations here and there, the preprocessor in MASM is very powerful and in conjunction with normal library modules, you can do many things with it.
I think it might also be good to consider creating an interpreter for the language you design before moving on to build an actual compiler for it. It's a much simpler task that would allow you to get your choice of language implemented in much less time. It also gives you the advantage of allowing you to enter the statements directly, and it makes it so much easier to debug when you see the results immediately, as you do with an interpreter. Once you have the framework for your language in place and functional, it will be fairly easy to build a compiler for it.
Thanks for all the input guys! I've been working on the grammar for the language for the last few weeks, looking at the C grammar and figuring out how it works and what I like and don't like about it. I like the way expressions are formed ( mathematical expressions, anyway - an assignment expression with the regular precedence rules ) but I don't like the way logical expressions are mixed with mathematical rules and logical operators are given arbitrary precedence rules. For instance, this formula evaluates to true:
int i = 1 || 0 && 1; // i is 1 , since logical or is of higher precedence than and
I had the idea of separating out the operators for math, shift and cast from those for logic and comparison, so that logic expressions would be "well-formed formulas"; they would use parentheses to form precedence levels as the creator intended, like:
p & ( q | s ) is not equal to ( p & q ) | s , since
p q s | p & ( q | s ) | ( p & q ) | s
------+---------------+--------------
F F F |       F       |      F
F F T |       F       |      T
F T F |       F       |      F
F T T |       F       |      T
T F F |       F       |      F
T F T |       T       |      T
T T F |       T       |      T
T T T |       T       |      T
I'll see how this all works out , let you know. :bg
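If it helps to see it concretely, here is roughly how that choice shows up in a Bison grammar ( only a sketch; the token and rule names are made up ). The two %left lines give the usual C behaviour, while replacing them with a single %nonassoc level for both operators turns an unparenthesized mix like 1 || 0 && 1 into a syntax error, forcing explicit parentheses:
%left OR_OP                 /* ||  (lower precedence)            */
%left AND_OP                /* &&  (higher precedence, as in C)  */
/* alternative: %nonassoc OR_OP AND_OP  -- one level, no chaining */
%%
logicalExpr : logicalExpr OR_OP logicalExpr
            | logicalExpr AND_OP logicalExpr
            | '(' logicalExpr ')'
            | relationalExpr
            ;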
Quote from: cman on June 14, 2005, 04:14:07 PM
int i = 1 || 0 && 1; // i is 1 , since logical or is of higher precedence than and
With that expression it does not matter what the order of precedence is:
(1 or 0) and (1) = (1) and (1) = 1
(1) or (0 and 1) = (1) or (0) = 1
Oops! :red Well, you get the idea! :bg
Quote from: Homer on June 11, 2005, 09:16:10 PM
I'd just like to point out that masm's MACRO support is powerful !!
You can write yourself a neato script-based language as a macro set, which allows the user to build the applications with masm (no need for a preprocessor).
If you don't believe me, take a look inside the macros folder of the ObjAsm32 oop model by Biterider.. you're in for a shock !!
The simple truth is that masm's macro support is:
-underrated
-misunderstood
-not well known
by most masm coders, including long-term users :cheekygreen:
The only real problem with MASM's macro support is that it's missing a lot of useful string manipulation operations. It's amazing what you can do with the operators that are available, but the resulting code is so *hard* to read and understand that it makes for "less than robust" results. Keep in mind, long before I started the HLA project I tried to do what HLA does with MASM's macro facilities. Ultimately, I gave up because the thing blew up big time (with crazy error messages) whenever you made a mistake using the macros. It worked okay as long as the input was perfect, but the diagnostics were terrible when things were not syntactically correct. Hardly a good thing for a teaching tool. The other problem I had with MASM's macro facilities was that it was difficult to do things. Yes, the string manipulation was powerful enough to do a lot of things, but you had to jump through a lot of hoops to do them. Those two problems, plus the fact that there are simply some things that you *cannot* do with MASM macros, are what led me to create HLA in the first place.
You are quite right about most MASM users not having a clue about the power of MASM's macros, though. It's really sad how many MASM users don't want to take the time to learn how to use the features of the assembly language they've chosen. Indeed, most of the people I've seen who've defected from MASM to other assemblers like FASM often fall into this category. Advanced MASM users (who really understand how to use MASM's macro facilities) have a hard time giving up that power.
Cheers,
Randy Hyde
Thank you so much for all your input! It really means quite a bit to me that you all are willing to comment on my project! I'll let you know how this goes ( it might be a few years! ) . Thanks! :bg
I've been working on the language design the last few weeks and will try to debug the grammar pretty soon. I decided my initial idea was stupid ( make a compiler - assembler ). What I'm thinking now is to make the back end of the compiler work like this:
eax = ebx + ecx + edx;
would be translated something like this:
add ecx , ebx
add edx , ecx
mov eax , edx
so as to preserve the actual meaning of the assembly source code. Each operator that corrupts the contents of an operand would have a "rule" as to how register contents will be handled ( in the case of the "+" operator, sums would travel left to right so that the right-hand operand would hold the contents of the sum and the left-hand operand would be preserved; this information could be used by the coder when producing algorithms ). This way I could produce a sort of "assembler shorthand". I think this idea is much better than the prior idea. In addition to this format I would add certain high-level features to the language to help speed development of assembly programs. Just a thought... :toothy
I would avoid any reordering, and anything requiring hidden pushes and pops to preserve register contents.
One type of reordering I would tolerate is "instruction scheduling". That is simply interleaving chains of instructions to take advantage of internal parallelism (pipes or micro-ops). For some processors this will be hard because it requires intimate knowledge of chip internals.
The goal of a high level assembler is readability. This includes the ease of mapping high level constructs to low level instructions.
Take your example...
eax = ebx + ecx + edx;
How would an assembly language programmer code this, when manually translating it?
First, the destination register EAX does not appear on the right hand side.
Second, the sequence can be strictly left-to-right in this case.
It won't be an assembler, if it doesn't produce what an assembly language programmer would write, namely...
mov eax,ebx
add eax,ecx
add eax,edx
Note that all three registers on the right hand side are untouched.
Oops! Guess you're right. :red It will be a while until I get to the back end of the compiler, so I guess I have time to think these things over! I also have the idea of "gluing" instructions with zero or one operand to data or primary types, like this:
eax = ebx + ( _AAA ecx );
or something like:
eax = ebx * ( PUSH ecx ) ++ ;
so that an instruction can be executed in a certain order according to its relation to a primary type ( in the segment above, ecx would be pushed before the operation is executed, and then ecx is incremented ). I have many stupid/crazy ideas for this grammar, and I can't think of a way to make enough symbols to represent all of the instructions I want to enable, so I thought I would give symbols to binary operators, for the most part. What do you think of this? :bg
Or maybe the traditional function call notation would be more appropriate:
ecx = ebx + ( push ( ecx ) ) ++ ; // push ecx's value prior to evaluating for expression
and
ecx = ebx + ( ( ecx ) push ) ++; //evaluate ecx for expression and then push its value
this seems like a clearer way to express the meaning of the expression. I haven't decided how zero-operand instructions will be added yet. Hmmmm...... :(
I will try to confuse you more :P.... :bdg
eax = ebx * ( PUSH ecx ) ++ ;
The way I read that code, with the supposition that ++ is "executed" before the "*", I would see it like:
push ecx
add [esp], 1
Then the multiplication with the value at [esp]....
Though perhaps what you want is something like:
eax = ebx * ( PUSH ecx++ );
This, if I'm not wrong, will be something like:
push ecx
add ecx, 1
mul with ecx....
And you will say... what, does moving the "++" operator do that??? Yes, because ++ takes as its argument whatever is "near" it. In the first example you gave, what is near is not the register but the result of the previous operation, and that refers to esp (in some way); that is why I decided that your first example gives "add [esp], 1" instead of "add ecx, 1".
I can suggest a way. I guess in some cases introducing () will be handy, for example when you want to save an intermediate result... (perhaps) in a large operation (not very handy to break up).... If you use only "push whatToPush", you will be restricted to saving only the following operand rather than a possible result of, for example, something like.... ebx+ecx*3
How would you do it without parentheses???
eax = 5 + push ebx+ecx*3
Don't introduce () for a list of arguments; perhaps it would be best to introduce it for "grouping" instructions (rather than for a list of arguments)....
then
eax = 5 + push (ebx+ecx*3)
My other suggestions...
Use/handle opcode mnemonics as complete expressions....
For example
mov eax, ebx
is a complete expression....
The thing about opcodes is that they are operations over operands, but because they are hardware-defined, the operands are "single" and not "composite", for example....
add eax, 5
mov eax, ebx
add eax, ebx
They do not take composite operands....
But you can extend this to composite operands...
mov eax, (ebx+5)
or
mov eax, ebx+5
You see...
You only need to see when an instruction is taking a single or a composite operand... the difference is that a single operand can be recognized by the mnemonic to be assembled, while a composite operand needs to be "broken" down...
Now you can extend.... to
eax = 2*ebx+(push ++(mov ebx, [addr]))
That will be
shl ebx, 1 ;this can be deleted in a further optimization... ¿?¿?
mov ebx, [addr]
inc ebx ; or add ebx, 1
push ebx
mov eax, ebx
Though all of that, I guess, can happen only with eax... :roll: but it is for the sake of confusing you more... oops.. I mean giving examples/suggestions :P.
Also, do you think that eax = 2*ebx+(push ++(mov ebx, [addr])) is equivalent to eax = 2*ebx+(push (mov ebx, [addr])++)???... changing the position of "++" in this case.
Because push only recognizes single operands, not composite ones... then if it takes the first single operand when evaluated... it will try to get the first single operand available for the instruction... that is, something like the following will be generated....
shl ebx, 1
mov ebx, [addr]
push ebx
add [esp], 1
mov eax, [esp]
Hope this helps you a little more :)....
Thanks for the suggestions! :U What I sort of want to do with this is capture the thought process of writing assembler in the syntax of the language. For instance, one might want to save the contents of a register, add the contents of the register to another, and then take the bit-wise or of the sum and store this in another register:
ecx = ( eax + ( push ( ebx ) ) ) | edx;
I just wanted to catch that thought process and use it to design the grammar. I think this would be more self-documenting and faster to generate than the traditional opcode/operand system ( and maybe make assembler accessible to a wider range of programmers ). I'll continue my work on this.... :bg
But if you work from the inside out (which I guess is what makes sense), you will need something like the following (I guess)...
ecx = ( eax + ( push ( ebx ) ) ) | edx;
If I'm not wrong it will be something like...
            =
          /   \
       ecx     |
             /   \
            +     edx
          /   \
       eax     push
                 \
                  ebx
Then you have registers, mnemonics or "pure"/CPU operations (that impact something, and the result ends up somewhere), and operators. At least that is what can be seen easily.
Following the above: first you will see that you have a register... that is good. Then you will see that there is a mnemonic (push); you see that it needs a single operand, and because of how push is defined, as I see it the operand isn't modified and the result is in [esp]. Then you go up and see that you have a +, which needs two operands... you see that the left operand is a single operand (not composite.. no further evaluation), but for the right operand the result is not ebx; the result (for me) is in [esp]. Furthermore, "+" takes two operands and doesn't save the result in the operands but in a temp variable (unlike the add mnemonic). The temp result is one operand for "|" and edx is the other single operand, but "|" doesn't save the result in either of its operands, but in a temp var; then that second temp var will be assigned to ecx. And that is it.
That is a first approach, but you will see that there are paths for optimization, for example deleting the "temp vars" or temporary operands to generate more efficient code. Another optimization: you know how push is, the result is in [esp], but instead of accessing the memory at esp you can use the operand (ebx). See the difference when you do something like:
In the case of (push eax)++ (as I see it), eax will not be touched and the assumption for optimization (take the operand instead of the result in memory) will not work; you need to add 1 to the result in [esp].
;).
Furthermore, the optimization will be to generate the code that a human would generate; for example, we know that we will not use temp vars, but perhaps a register that will be trashed :)...
My assembler is a little rusty ( I haven't been programming for a while - I've been studying compiler theory ) but I think that this statement:
ecx = ( eax + ( push ( ebx ) ) ) | edx;
would translate to ( without optimization ) :
push ebx
mov ecx , eax
add ecx , ebx
or ecx , edx
Not sure yet. Just trying to implement my vision of what the grammar should look like and be able to say. I'd like the grammar to be able to express a series of assembler operations in a way that is natural to human understanding as well as being terse and concise. I don't know if this can be achieved or not ......
Then the question that I have is how you will handle operators... as a shortcut to mnemonics? or as "what they are"?
for example
b+c
followed by a temp var a:
a = b + c
is different than
b+c
followed by a rewrite of the first operand:
b = b + c
The first example is, like I say, one where the operator "+" uses a temp var that can be "deleted" with optimization (considering that the result of a push is at [esp], but it is the same as the operand... though you will normally push to save and reuse the register...); the second one is more a shortcut to asm.
Like I say, handling the operators in the first way, without optimization... if I'm not wrong, will be expanded to the following, and the thing is that this can be "optimized" to what you say it should be, but only after generating the "direct" tree.
push ebx
mov sentinel1, [esp]
add sentinel1, eax
mov sentinel2, edx
or sentinel2, sentinel1 ; strictly speaking this operation isn't valid, but these are temps that should be deleted.
mov ecx, sentinel2
A first approach that apparently can handle this, following the tree:
ecx = ( eax + ( push ( ebx ) ) ) | edx;
            =
          /   \
       ecx     |
             /   \
            +     edx
          /   \
       eax     push
                 \
                  ebx

              =
            /   \
    ecx(r32)     |(s2)
               /   \
          +(s1)     edx(r32)
            /   \
    eax(r32)     push (mem32 [esp])
                   \
                    ebx (r32)
There you can propagate, from the top down, the overwritten ecx to the sentinels; in this case, for easy handling of the "graph", they are called s(entinel)N.
                 =
               /   \
       ecx(r32)     |(s2) = ecx
                  /   \
      ecx = +(s1)      edx(r32)
            /   \
           /     \-----> mov ecx, [esp]/ebx
   eax(r32)       push (mem32 [esp])
                    \
                     ebx (r32)
In this case, I guess it can be generated well.
push ebx
mov ecx, [esp]/ebx (taking the shortcut that the push result equals the operand)
add ecx, eax
or ecx, edx
The "sentinels" that cant be deleted, then they will be extra temp vars....
Thank you for your input rea! :U Sorry I haven't responded for a while, I've been busy. In this language I wanted to "bridge the gap" between how assembler is input and how assembler is considered by an assembly language programmer. When I program assembler I think things like: "load this register with this value and the other with this value, sum the two and move this value to a memory location for storage and future use". I think this is how every programmer considers an algorithm in the primary stages of development. That's essentially what I wanted to do with this language. I think for the most part the expressions in the language would be shortcuts to mnemonics, with additional code generation, where it is not written explicitly, for statements and symbols where such things can be generated in an "algorithmic way" ( where no intuition whatsoever is required to generate the proper instructions for the given situation ). That's my entire premise for the "language". Thank you for your time, tree diagrams, ideas, etc. :U
When I program in assembly language, my thoughts start with a task, such as "I need to add two numbers", followed by "this processor forces me to use a register" or "this processor forces me to do everything through the one register called the accumulator". So I generate a load instruction (in x86, it's one form of "MOV"). Then I think "I can add the second value directly from memory" or "I must load the second value into a register, then add", and generate the appropriate instructions.
I do this for all processors, and all languages. First, the task, followed by possible ways of implementing it, followed by choice (if there is more than one viable way.)
Quote from: rea on July 03, 2005, 04:06:47 PM
Then the question that I have is how you will handle operators... as a shortcut to mnemonics? or as "what they are"?
for example
b+c
followed by a temp var a:
a = b + c
is different than
b+c
followed by a rewrite of the first operand:
b = b + c
The first example is, like I say, one where the operator "+" uses a temp var that can be "deleted" with optimization (considering that the result of a push is at [esp], but it is the same as the operand... though you will normally push to save and reuse the register...); the second one is more a shortcut to asm.
I was thinking in the case of statements like :
eax = eax + ebx;
that the compiler would set up temporary memory storage for the value of eax. I think that there are certain instances where it would be feasible for the compiler to do this, and this would be part of the "high level" features available to the programmer. I don't think that the compiler should alter the contents of registers in any way, as this could corrupt an assembly language programmer's algorithms.
Quote
Code:
                 =
               /   \
       ecx(r32)     |(s2) = ecx
                  /   \
      ecx = +(s1)      edx(r32)
            /   \
           /     \-----> mov ecx, [esp]/ebx
   eax(r32)       push (mem32 [esp])
                    \
                     ebx (r32)
In this case, I guess it can be generated well.
Code:
push ebx
mov ecx, [esp]/ebx (taking the shortcut that the push result equals the operand)
add ecx, eax
or ecx, edx
I think that the statement:
ecx = ( eax + ( push ( ebx ) ) ) | edx;
would generate the simpler :
push ebx
mov ecx , eax
add ecx , ebx
or ecx , edx
as when the push instruction is executed, the next operator will operate on the data or register contained within the parentheses, push ( ebx ), if there is no alteration of the data or register as a side effect of the operation. I'm still working on finishing the grammar. I'm amazed at the level of work involved in this project! :eek As soon as I have something completed and debugged and make sure that the language can express what I mean it to express, I can get on to the business of generating intermediate results and figuring out code generation issues. Thanks again for your input! This is always helpful to me! :U
Quote from: tenkey on July 06, 2005, 06:37:05 AM
When I program in assembly language, my thoughts start with a task, such as "I need to add two numbers", followed by "this processor forces me to use a register" or "this processor forces me to do everything through the one register called the accumulator". So I generate a load instruction (in x86, it's one form of "MOV"). Then I think "I can add the second value directly from memory" or "I must load the second value into a register, then add", and generate the appropriate instructions.
I do this for all processors, and all languages. First, the task, followed by possible ways of implementing it, followed by choice (if there is more than one viable way.)
Yeah, I want to "formalize" this process in terms of a mathematical / symbolic language that could express the process in more "human" terms than the tiny machine steps that are involved in an assembly language program, but still retain the efficiency and power of assembly language programming. I'd like the translator to be able to do the little annoying things for the programmer, like setting up memory for strings and temporary variables and handling mathematical expressions. I always liked the way that C uses operators to perform many algorithmic tasks, so I thought it might be neat if this sort of thing could be used to create an assembly-language-like translator. I'll have to keep thinking about this! :bg