Pattern Matching for my 6502 Monitor

Started by indiocolifa, October 04, 2005, 05:27:02 AM


indiocolifa

Hi. For my 6502 monitor I'm going to implement a pseudo-assembler, that is, a command that takes mnemonics as input and assembles them into 6502 RAM.
The accepted format of course is:

<mnemonic> <operands>

e.g: LDA #$09
      ADC $FF01
etc.

I was thinking of using the pattern matching routines of the HLA Standard Library, but maybe it's better to use string tokenization functions to do this (token 1: OPCODE, token 2: OPERANDS, using delimiters, etc.).

Which do you think I should use?




Randall Hyde

I've actually written a "Y86" assembler using the HLA pattern matching code for the sample exercises in the on-line version of AoA. You might look up the source code for "simy86" (or whatever I've called it) on Webster.
Cheers,
Randy Hyde

Evenbit

Quote from: indiocolifa on October 04, 2005, 05:27:02 AM
I was thinking of using the pattern matching routines of the HLA Standard Library, but maybe it's better to use string tokenization functions to do this (token 1: OPCODE, token 2: OPERANDS, using delimiters, etc.).

Which do you think I should use?

Seems like str.tokenize( stringArray, string ) would be quite helpful.  Whatever you do, you should think about the order in which you process the information gained.  For instance, right after tokenizing the line, I'd determine the addressing mode first, before passing it on to the code that matches up the mnemonic.  Here's some rough pseudocode to show what I mean:



asmline - input from user
tok[x] - string array of tokens

str.tokenize( tok, asmline );
if (eax = 2)
    if (firstchar(tok[1*4]) = "#")
        flag = immediate
    else
        if (length(tok[1*4]) = 2)
            flag = zeropage
        else
            flag = absolute
        endif
    endif
else
    if (tok[2*4] = "X")
        if (length(tok[1*4]) = 2)
            flag = zeropageX
        else
            flag = absoluteX
        endif
    elseif (tok[2*4] = "X)")
        flag = indirectX
    else
        if (firstchar(tok[1*4]) = "(")
            flag = indirectY
        else
            flag = absoluteY
        endif
    endif
endif



Then you pass this on to code that matches tok[0*4] against one of the 56 instructions.  If it happens to be "ADC", for instance, then the 'flag' will tell you which opcode (hex 69, 65, 75, 6D, 7D, 79, 61, 71) to use.  It also tells you how many bytes are expected after the opcode.
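
The two stages described above (work out the addressing mode, then look up the opcode) can be sketched in Python as follows. This is only an illustration: the names `classify` and `assemble_adc`, the tokenizing rule (split on spaces and commas, ignore `#`, `$` and parentheses when counting digits), and the restriction to ADC are assumptions made for the example, not part of any HLA library.

```python
import re

# ADC opcodes keyed by addressing mode (standard 6502 encodings).
ADC_OPCODES = {
    "immediate": 0x69, "zeropage": 0x65, "zeropageX": 0x75,
    "absolute": 0x6D, "absoluteX": 0x7D, "absoluteY": 0x79,
    "indirectX": 0x61, "indirectY": 0x71,
}

# Number of operand bytes that follow the opcode for each mode.
OPERAND_BYTES = {
    "immediate": 1, "zeropage": 1, "zeropageX": 1,
    "absolute": 2, "absoluteX": 2, "absoluteY": 2,
    "indirectX": 1, "indirectY": 1,
}


def classify(line):
    """Return (mnemonic, mode, value) for one '<mnemonic> <operands>' line."""
    # Assumed tokenizer: spaces, tabs and commas are the delimiters.
    tokens = [t for t in re.split(r"[ ,\t]+", line.strip().upper()) if t]
    if len(tokens) < 2:
        raise ValueError("expected '<mnemonic> <operands>'")
    mnemonic, operand = tokens[0], tokens[1]
    index = tokens[2] if len(tokens) > 2 else None
    digits = operand.strip("#$()")       # bare hex digits of the operand
    short = len(digits) <= 2             # two digits or fewer: zero page
    if index is None:
        if operand.startswith("#"):
            mode = "immediate"
        else:
            mode = "zeropage" if short else "absolute"
    elif index == "X":
        mode = "zeropageX" if short else "absoluteX"
    elif index == "X)":
        mode = "indirectX"
    elif operand.startswith("("):
        mode = "indirectY"
    else:
        mode = "absoluteY"
    return mnemonic, mode, int(digits, 16)


def assemble_adc(line):
    """Assemble a single ADC line into bytes (operand is little-endian)."""
    mnemonic, mode, value = classify(line)
    if mnemonic != "ADC":
        raise ValueError("this sketch only knows ADC")
    return bytes([ADC_OPCODES[mode]]) + value.to_bytes(OPERAND_BYTES[mode], "little")
```

For example, `assemble_adc("ADC $FF01")` yields the three bytes 6D 01 FF, and `classify("LDA #$09")` reports the immediate mode with value 9. A full assembler would replace the single ADC dictionary with a table covering every mnemonic.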

Nathan.


indiocolifa

How do I search my opcode table using pattern matching? I'm using the following code, which does not work:

// do pattern match
    pat.match (asmentry);
   
        // check for valid mnemonic
        push (ecx);
        push (edi);
       
        pat.onePat;
       
        FOR (xor(eax,eax); eax<=256; inc (eax)) DO
            stdout.put ("EaX=",eax);
            intmul (@size(opcode),eax,edi); // calc offset at string table
            mov (OP_STR[edi], esi);
            mov (esi, mnemo);       // get string at table
            pat.alternate;
            pat.matchStr(mnemo);
        ENDFOR;
        pop(edi);           
        pop(ecx);
       
        pat.endOnePat;

   
    pat.if_failure
   
        stdout.put ("Unknown instruction -",asmentry,nl);
   
    pat.endmatch;


What I'm trying to do is:

pat.alternate
<test for opcode 1>
pat.alternate
<test for opcode 2>
.
.
.
pat.alternate
<test for opcode n>

Thank you very much.

Evenbit

Well, I've had an itch for a while now to do a full-fledged 6502/10 assembler (with some support for macros, a few control constructs, an expression evaluator, and some directives and such), so I've occasionally poked around in places like http://webster.cs.ucr.edu/AsmTools/RollYourOwn/index.html and Sevag's Arayna project (take a look at "CmpAtFuncs.hla") for some ideas.  A binary tree might be overkill for a simple assembler that only deals with 56 instructions, so I came up with this approach:

1) put the 56 instructions into one long string and follow each with a space.
2) take the user-supplied instruction, make it the same case, add a space, and put it into EAX.
3) use SCASD to find it in step 1's string.
4) use the resulting position as an index into an array of opcodes.
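
For readers who don't use HLA, the four steps above can be sketched in Python. This is a rough illustration under stated assumptions: the name `find_mnemonic` and the four-entry sample string are made up for the example, and comparing one aligned 4-character slot at a time stands in for what SCASD does with dwords.

```python
# Step 1: every mnemonic padded with a space to exactly 4 characters.
PACKED = "ADC SBC STA LDA "   # hypothetical sample, not the full 56


def find_mnemonic(name, packed=PACKED):
    """Return the zero-based slot of `name` in `packed`, or -1 if absent."""
    # Step 2: same case, padded with a space to 4 characters.
    key = name.upper().ljust(4)
    if len(key) != 4:
        return -1
    # Step 3: compare one aligned 4-character slot at a time.
    for offset in range(0, len(packed), 4):
        if packed[offset:offset + 4] == key:
            # Step 4: the slot number is the index into the opcode array.
            return offset // 4
    return -1
```

Because only aligned slots are compared, a fragment that straddles two entries (such as "DC S") can never produce a false match.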

Here's some test code for the matching part:

program asm;
#include("stdlib.hhf")

static
    mnem: byte[4*4] := ['A','D','C',' ','S','U','B',' ','S','T','A',' ','L','D','A',' '];
    find: byte[4] := ['S','T','A',' '];

begin asm;

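mov( 4, ecx );                  // number of 4-byte entries to scan (not bytes)
lea( edi, mnem );               // EDI -> start of the mnemonic table
mov( (type dword find), eax );  // the 4 characters to search for, as one dword
cld();                          // scan forward
back:
scasd();                        // compare EAX with [EDI], then advance EDI by 4
loopne back;                    // repeat while no match and ECX != 0
mov( 4, eax );
sub( ecx, eax );                // entries examined = 1-based position of the match
stdout.puti32(eax);             // prints 3 for "STA "
stdout.newln();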

end asm;



EAX will contain the 1-based position of the match (3 for "STA " above), so subtract 1 to get a zero-based index into our array.  Just use a 2-dimensional byte array with the first dimension being the mnemonic and the second dimension being the addressing mode.  Here's code that I wrote for the aoaprogramming forum (Yahoo Groups) which shows a straightforward way of dealing with 2D arrays:


program afp;
// Array Fill & Print
// low-level example by Nathan Baker
//---
// esi - 1st dimension counter
// edi - 2nd dimension counter
// ebx - base register
// eax - multi-purpose scratchpad

#include("stdlib.hhf")

var
    MyArray: int32[3, 5];
endvar;

begin afp;

xor( esi, esi );  //clear our first dimension counter
stdout.puts( nl "Enter 15 integers, 5 per column:" );

lp1:
    stdout.puts( nl "Column " );
    mov( esi, eax );  //get current 1st dim count
    inc( eax );  //zero-based, so we adjust for the display
    stdout.puti32( eax );  //display it
    stdout.newln();
    mov( esi, eax );  //lets calc offset from basepointer
    shl( 2, eax );  //multiply by 4 because 'int32' is 4 bytes
    intmul( 5, eax );  //multiply by 5 because second dimension is 5
    lea( ebx, MyArray );  //get array pointer into base register
    add( eax, ebx );  //add the offset
    xor( edi, edi );  //clear our second dimension counter

    lp2:
        mov( edi, eax );  //get current 2nd dim count
        inc( eax );  //zero-based, so we adjust for the display
        stdout.puti32( eax );  //display it
        stdout.puts( ":" );
        stdin.geti32();
        mov( eax, [ebx+edi*4] );  //[base + (1st dim * size_of 2nd)] + 2nd dim * 4
        inc( edi );  //increase 2nd dim counter
        cmp( edi, 5 );  //have we reached upper limit?
        jne lp2;  //no, then jump back -- yes, then continue

    inc( esi );  //increase 1st dim counter
    cmp( esi, 3 );  //have we reached upper limit?
    jne lp1;  //no, then jump back -- yes, then continue
   
xor( esi, esi );
stdout.puts( nl "You entered:" );

lp3:
    stdout.puts( nl "Column " );
    mov( esi, eax );
    inc( eax );
    stdout.puti32( eax );
    stdout.newln();
    mov( esi, eax );
    shl( 2, eax );
    intmul( 5, eax );
    lea( ebx, MyArray );
    add( eax, ebx );
    xor( edi, edi );

    lp4:
        mov( edi, eax );
        inc( eax );
        stdout.puti32( eax );
        stdout.puts( ":" );
        mov( [ebx+edi*4], eax );
        stdout.puti32( eax );
        stdout.newln();
        inc( edi );
        cmp( edi, 5 );
        jne lp4;

    inc( esi );
    cmp( esi, 3 );
    jne lp3;

end afp;



You'll want to remove the "*4" from the end of "[ebx+edi*4]", and the "shl( 2, eax );" needs to be deleted (and maybe a few other tweaks), since you only need a byte array.  Hope this helps.  Have fun!
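
To make the byte-array idea concrete, here is a small Python sketch of a flat opcode table addressed as row * columns + column, the same offset arithmetic the HLA program uses once the element size drops to one byte. The names `MODES`, `TABLE`, `NO_OPCODE` and `opcode_for` are hypothetical, only three mnemonics are filled in, and using $FF as the "no such combination" marker is an assumption of this example.

```python
# Column order: one column per addressing mode.
MODES = ["immediate", "zeropage", "zeropageX", "absolute",
         "absoluteX", "absoluteY", "indirectX", "indirectY"]

NO_OPCODE = 0xFF  # assumed sentinel for "mnemonic has no such mode"

# One row per mnemonic, stored back to back as plain bytes.
TABLE = bytearray([
    0x69, 0x65, 0x75, 0x6D, 0x7D, 0x79, 0x61, 0x71,       # row 0: ADC
    0xA9, 0xA5, 0xB5, 0xAD, 0xBD, 0xB9, 0xA1, 0xB1,       # row 1: LDA
    NO_OPCODE, 0x85, 0x95, 0x8D, 0x9D, 0x99, 0x81, 0x91,  # row 2: STA
])


def opcode_for(row, mode):
    """Return the opcode for mnemonic `row` in `mode`, or None if invalid."""
    col = MODES.index(mode)
    value = TABLE[row * len(MODES) + col]   # offset = row * columns + column
    return None if value == NO_OPCODE else value
```

With this layout, `opcode_for(0, "absolute")` gives 0x6D for ADC, while `opcode_for(2, "immediate")` gives None because STA has no immediate form.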

Nathan.