News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Designing A Language

Started by cman, May 11, 2005, 02:22:04 PM

Previous topic - Next topic

Phil

I think it might also be good to consider creating an interpreter for the language you design before moving on to build an actual compiler for it. It's a much simpler task that would allow you to get your choice of language implimented in much less time. It also gives you the advantage of allowing you to enter the statements directly and it makes it so much easier to debug when you see the results immediately as you do with an interpreter. Once you have the framework for your language in place and functional it will be fairly easy to build a compiler for it.

cman

Thanks for all the input guys! I've been working on the grammar for the language for the last few weeks , looking at the C grammar and figuring out how it works and what I like about it and don't like about it. I like the way expressions are formed ( matematical expressions , anyway - an assignment expression with the regular precedence rules ) but I don't like the way logical expressions are mixed  with mathematical rules and logical operators are given arbitrary precedence rules. For instance , this formula compiles to true:


   int i = 1 || 0 && 1; // i is 1 , since logical or is of higher precedence than and


I had the idea of separating out the operators for math and shift and cast from those of logic and c comparison , so that logic expressions would be "well formed formulas" ; they would use braces to for precedence levels and the creators intended , like :



  p & ( q | s )   is not equal to p & ( q | s ) since


p  q  s  |  p & ( q | s )  | ( p & q ) |  s
------------------------------------------------------------
F  F   F |  F          F      |      F          F
F  F  T     F         T              F         T
F  T  F     F         T              F         F
F  T  T    F          T             F         T
T  F   F    F          F             F          F
T  F  T    T          T            F          T
T  T  F    T          T            T          T
T  T  T    T          T           T            T



I'll see how this all works out , let you know. :bg

AeroASM

Quote from: cman on June 14, 2005, 04:14:07 PM

   int i = 1 || 0 && 1; // i is 1 , since logical or is of higher precedence than and


With that expression it does not matter what the order of precedence is:

(1 or 0) and (1) = (1) and (1) = 1
(1) or (0 and 1) = (1) or   (0) = 1

cman

Opps!  :red Well you get the idea! :bg

Randall Hyde

Quote from: Homer on June 11, 2005, 09:16:10 PM
I'd just like to point out that masm's MACRO support is powerful !!
You can write yourself a neato script-based language as a macro set, which allows the user to build the applications with masm (no need for a preprocessor).
If you don't believe me, take a look inside the macros folder of the ObjAsm32 oop model by Biterider.. you're in for a shock !!
The simple truth is that masm's macro support is:
-underrated
-misunderstood
-not well known
by most masm coders, including long-term users :cheekygreen:

The only real problems with MASM's macro support is that it's missing a lot of useful string manipulation operations. It's amazing what you can do with the operators that are available, but the resulting code is so *hard* to read and understand that it makes for "less than robust" results. Keep in mind, long before I started the HLA project I tried to do what HLA does with MASM's macro facilities. Ultimately, I gave up because the thing blew up big time (with crazy error messages) whenever you made a mistake using the macros. It worked okay as long as the input was perfect, but the diagnostics were terrible when things were not syntactically correct. Hardly a good thing for a teaching tool. The other problem I had with MASM macro facilities was that it was difficult to do things. Yes, the string manipulation was powerful to do a lot of things, but you had to jump through a lot of hoops to do those things.  Those two problems, plus the fact that there are simply some things that you *cannot* do with MASM macros, are what led me to create HLA in the first place.

You are quite right about most MASM users not having a clue about the power of MASM's macros, though. It's really sad how many MASM users don't want to take the time to learn how to use the features of the assembly language they've chosen. Indeed, most of the people I've seen who've defected from MASM to other assemblers like FASM often fall into this category. Advanced MASM users (who really understand how to use MASM's macro facilities) have a hard time giving up that power.
Cheers,
Randy Hyde

cman

#35
Thank you so much for all your input! It really means quite a bit to me that you all are willing to comment on my project! I'll let you know how this goes ( it might be a few years! ) . Thanks! :bg

cman

#36
I've been working on the language design the last few weeks and will try to debug the grammar pretty soon. I decided my initial idea was stupid ( make a compiler - assembler ). What I'm thinking now is to make the back end of the compiler work like this:


   
    eax = ebx + ecx + edx;


would be translated something like this:



   add ecx , ebx
   add edx , ecx
   mov eax , edx



so as to preserve the actual meaning of the assembly source code. Each operator that corrupts the contents of an operand would have a "rule" as to how register contents will be handled ( in the case of the "+" operator , sums would travel left to right so that the right hand operand would hold the contents of the sum and the left hand operand would be preserved ( this information could be used to produce algorithms by the coder ). This way I could produce a sort of "assembler short hand" . I think this idea is much better than the prior idea. In addition to this format I would add certain high level features to the language to help speed development of assembly programs. Just a thought...  :toothy

tenkey

I would avoid any reordering, and anything requiring hidden pushes and pops to preserve register contents.

One type of reordering I would tolerate is "instruction scheduling". That is simply interleaving chains of instructions to take advantage of internal parallelism (pipes or micro-ops). For some processors this will be hard because it requires intimate knowledge of chip internals.

The goal of a high level assembler is readability. This includes the ease of mapping high level constructs to low level instructions.

Take your example...

eax = ebx + ecx + edx;

How would an assembly language programmer code this, when manually translating it?

First, the destination register EAX does not appear on the right hand side.
Second, the sequence can be strictly left-to-right in this case.

It won't be an assembler, if it doesn't produce what an assembly language programmer would write, namely...

mov eax,ebx
add eax,ecx
add eax,edx


Note that all three registers on the right hand side are untouched.
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

cman

Opps! Guess your right. :red It will be awhile until I get to the back of the compiler , so I guess I have time to think these things over! I also have the idea of "glueing" instuctions with 0 or 1 operand to data or primary types , like this:




   eax = ebx + ( _AAA ecx );



or something like:



  eax = ebx * ( PUSH ecx ) ++ ;



so that an instruction can be executed in a certain  order according to its relation to a primary type ( in the segment above ecx would be pushed before the operation is executed ( and then ecx is incremented ). I have many stupid/crazy ideas for this grammer , and I can't think of a way to make enough symbols to represent all of the instructions I want to enable , so I though I would give symbols to binary operators , for the most part. What do you think of this?  :bg


cman

Or maybe the traditional function call notation would be more appropriate:



  ecx = ebx + ( push ( ecx ) ) ++ ; // push ecx's value prior to evaluating for expression

  and

  ecx = ebx + ( ( ecx ) push ) ++;  //evaluate ecx for expression and then push its value



this seems like a clearer way to express the meaning of the expression. I haven't decided on how 0 operand instuctions will be added yet. Hmmmm...... :(

rea

I will try to confuse more :P....  :bdg


eax = ebx * ( PUSH ecx ) ++ ;


Like I also read that code is..... with the suposition that ++ is first "executed" than the "*", I will see like..

push ecx
add [esp], 1

Then the multiplication with the value at [esp]....

Tough, perhaps what you whant is some like

eax = ebx * ( PUSH ecx++ );


This if Im not wrong will be some like..

push ecx
add ecx, 1

mul with ecx....


and you will say... what a movement of '++' operator do that???, yes, because ++ take like argument what is "near", in the example that you gice first, what is near, is not the register, but the result of the anterior operation, and that refer to esp (in some way), that is for what I have make the decision than in your first example the "add [esp], 1" thing instead of "add ecx, 1".

I can suguest a way, I guess in some cases introduce () will be handy, for example when you whant to save a intermediate result... (perhaps) in a large (nor very handy to break) operation.... if you use only push whatToPush, you will be restricted to save only the following result rather than a posible result for example of something like.... ebx+ecx*3


How you will doit without parenthesis???

eax = 5 + push ebx+ecx*3

Not introduce () for a list of argument, perhaps will be best introduce it for 'grouping' group of instructions (rather than a list of arguments)....

then

eax = 5 + push (ebx+ecx*3)


My other suguestions...

Use/handle mnemonis for opcodes like complete expresions....

For example

mov eax, ebx

is a complete expresion....


The thing about opcodes is that they are operations over operands...., but because they are hardware defined, the operands are "single" and not "composite" for example....

add eax, 5
mov eax, ebx
add eax, ebx

They take not composite operands....

But you can extend to composite operands...

mov eax, (ebx+5)
or
mov eax, ebx+5

You see...


You only need see when a instructions is taking a single or composite operand... the diference is that a single operand can be recognogized by the mnemonic to be assembled, a composite operand need be "breaked"...

Now you can extend.... to

eax = 2*ebx+(push ++(mov ebx, [addr]))

That will be

shl ebx, 1 ;this can be deleted in a further optimization... ¿?¿?
mov ebx, [addr]
inc ebx ; or add ebx, 1
push ebx
mov eax, ebx

tought all that I guess can happen only with eax... :roll: but is for the shake of confuse you more... ops.. give examples/suguestions :P.



Also eax = 2*ebx+(push ++(mov ebx, [addr])) you think that is equivalent to eax = 2*ebx+(push (mov ebx, [addr])++)???... changind the position in this case of "++"



Because push only recognogize single operands, not composite... then if it take the first single operand when evaluated... it will try get the first single operand available for the instruction... that is, something like the following will be generated....

shl ebx, 1
mov ebx, [addr]
push ebx
add [esp], 1
mov eax, [esp]


Hope this help you a little more :)....

cman

Thanks for the suggestions!  :U What I sort of want to do with this is capture the thought process of writting assembler in the syntax of the language. Like for instance , one might want to save the contents of a register , add the contents of the register to another and then take the bit-wise or of the addition and store this in another register:



  ecx = ( eax + ( push ( ebx ) ) ) | edx;




I just wanted to catch the that thought process and use this to design the grammer. I think this would be more self documenting and faster to generate than the traditional opcode , operand system ( and maybe make assembler accessable to a wider range of programmers ). I'll continue my work on this.... :bg

rea

#42
But if you follow the inner to outside (that I guess is what make sense) You will need some like the following (I guess)...


ecx = ( eax + ( push ( ebx ) ) ) | edx;

If Im not wrong will be something like...


      =
     / \
    /   \
   ecx   |
       /   \
      /     \
     +       edx
    /  \
   /    \
eax    push
          \
           \
            ebx


Then you have registers, mnemonics or "pure"/CPU operations (that impact in something, and the result is somehwhere), operators. At less that is what can be watched easely.

following the anterior, first you will see that you have a register... that is good, then you will see that there is a mnemonic (push) you see that you need a single operator, because how is defined push like I see the operator isn't modified and the result is in [esp], then you go up, and see that you have a +, that need two operators..., you see that the left operator is a single operator (not composite.. no further evaluation), but the right operator, the result is not ebx the result (for me) is in [esp], furthermore "+" take two operands and dosent save the result in the operands but in a temp variable (with diference of add mnemonic) the temp result is a operator for "|" and ebx is the other single operator, but "|" dosent save the result in any of the operands, but in a temp var, then that second temp var will be assigned to ecx. And that is.

That is a first aproach, but you will see, that there are paths for optimization, for example for delete the "temp vars" or temporal operators, for generate more eficient code, another optimization, you know, like is push, the result is in [esp], but instead of access the memory at esp, you can use the operator (ebx), see the diference, when you do some like:
(push eax) ++ in that case (like I see), eax will not be touched and the assumption for optimization (take the operator instead of the result in memory) will not work, you need add 1 to the result in [esp].


;).




fuerthermore, the optimization, will be to generate the code that a human will generate, for example, we know that we not will use temp vars, but perhaps the register that will be trashed :)...

cman

My assembler is a little rusty ( I haven't been programming for awhile - been studing compiler theory ) but I think that this statement:




   ecx = ( eax + ( push ( ebx ) ) ) | edx;



would translate to ( without optimization ) :



   push ebx
   mov ecx , eax
   add ecx , ebx
   or ecx , edx



Not sure yet. Just trying to implement my vision of what the grammar should look like and be able to say. I'd like the grammar to be able to express a series of assembler operators in a way that is natural to human understanding as well as being terse and concise. I don't know if this can be achived or not ......

rea

Then, the question that I have, is how you will handle operators?... like a shorcut to mnemonics? or like "they are"

for example



b+c
follow a temp var a
a = b + c

is diferent than

b+c
follow a rewrite of the first operator
b = b + c


The first example, is like I say, the operator "+" uses a temp var that can be "deleted" with optimization (considering that the result of a push is at [esp], but is the same than the operator... tough you will normally push for save and resuse the register...), the second one, is more a shorcut to asm.



Like I say handling the operators in the first way, without optimization... If Im not wrong, will be expanded to the following, and the thing will be that this can be "optimized" to what you say it should be, but after generate the "direct" tree.


push ebx
mov centinel1, [esp]
add centinel1, eax
mov centinel2, edx
or centinel2, centinel1 ;being strict this operation isnt valid, but this are temp that should be deleted.
mov ecx, centinel2






A first aproach that aparently can handle this following the tree

ecx = ( eax + ( push ( ebx ) ) ) | edx;

      =
     / \
    /   \
   ecx   |
       /   \
      /     \
     +       edx
    /  \
   /    \
eax    push
          \
           \
            ebx




            =
           / \
          /   \
    ecx(r32)   |(s2)
              /   \
             /     \
           +(s1)   edx(r32)
          /  \
         /    \
    eax(r32)   push (mem32 [esp])
                \
                 \
                ebx (r32)



There is you can propagate from top to down the overwrited ecx to the sentinels in this case for easy handling of the "graph" they are called s(entinel)N.



            =
           / \
          /   \
    ecx(r32)   |(s2) = ecx
              /   \
             /     \
     ecx = +(s1)   edx(r32)
          /  \
         /    \-----> mov ecx, [esp]/ebx
    eax(r32)   push (mem32 [esp])
                \
                 \
                ebx (r32)


In this case, I guess can be generated well.



push ebx
mov ecx, [esp]/ebx (taking the shortcut about push result equal operator)
add ecx, eax
or ecx, edx


The "sentinels" that cant be deleted, then they will be extra temp vars....