Question on a recursive ".IF" parser

Started by hutch--, November 11, 2005, 12:18:03 AM

Previous topic - Next topic

hutch--

I have been playing with a recursive .IF parser and everything works fine except I keep messing up the label locations. It accurately get the correct code at each recursion level and I have the code working fine to convert each IF/ENDIF/ELSE/ENDIF statement to the appropriate mnemonics but it has been a nightmare to put all of the labels for forward jumps in the right place.

I have been using this approach to try and perform te complete preprocessor expansion in one pass but I am starting to get the idea that it cannot be done efficiently.

If the task is done as a multiple pass technique there are many things that become a lot easier where you can remove redundant labels and jumps and perform short circuiting of exit jumps and other similar forms of optimisation.

Has anyone ever played with a recursive IF block parser ? I was interested if anyone has a different approach that has been successful.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

tenkey

Attached is one example that I've hand converted to C, so it may have some syntax errors. It produces at most one pair of redundancies - when an ELSE section is missing, there is a redundant jump and the last false jump label and the exit label mark the same location. It can be altered to eliminate the redundant jump. I moved the generation of the false jump label ids for ease in implementing shortcut ANDs and ORs. If you want shortcut expressions, the ids for both true and false jump labels are generated together, and the true jump label is placed just after the false jump following the expression. For efficient code, pass both the true and false jump labels to the expression parser.

Here is an alternate technique I used to handle "break" and "continue" instructions of C.

Keep the current break and continue labels in global variables. Initialize the global variables with invalid values if you want error checking. If a loop (do or while) or switch instruction is begun, stack the labels by copying them to local variables. Generate new labels. When the loop or switch ends, unstack the labels from the local variables before exiting the parsing routine.

[attachment deleted by admin]
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

BogdanOntanu

Actually it is fairly easy:

Use the stack ;)

Push the JMP locations that need fixing on the stack creating an "Current IF CONTEXT"
and later on when you encounter a block finish (either ELSE or ENDIF)
you shall pop the locations from the stack and fix them with the appropiate addresses ;)

Simple ;)
Ambition is a lame excuse for the ones not brave enough to be lazy.
http://www.oby.ro

hutch--

Guys,

Thanks for your help here. I have currently put aside the recursive version for a multipass version that converts one IF ENDIF block at a time and continues iterating until there ae no more IF blocks left but this will be slower than a one pass recursive version. Even if I can get a recursive version going I am tending to favour a multipass aproach to do tasks like redundant jump removal, jump short circuiting and selective opcode replacement when you can use TEST to replace CMP for example.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Randall Hyde

Quote from: hutch-- on November 11, 2005, 12:18:03 AM
I have been playing with a recursive .IF parser and everything works fine except I keep messing up the label locations. It accurately get the correct code at each recursion level and I have the code working fine to convert each IF/ENDIF/ELSE/ENDIF statement to the appropriate mnemonics but it has been a nightmare to put all of the labels for forward jumps in the right place.

I have been using this approach to try and perform te complete preprocessor expansion in one pass but I am starting to get the idea that it cannot be done efficiently.
Everything was easy up to the point you said "efficiently". If by efficient, you mean removing jmps to jmps and the like, I'm afraid that a recursive algorithm is going to be very complex. Best to make it multi-pass if you want simplicity.

Quote
If the task is done as a multiple pass technique there are many things that become a lot easier where you can remove redundant labels and jumps and perform short circuiting of exit jumps and other similar forms of optimisation.
Yes, all this is true. The disadvantage of multi-pass, of course, is that it runs a little slower. But given the speed of today's machines, I wouldn't worry too much about it. Otherwise, you find yourself in the same unenviable position as Rene arguing that his branch displacement optimization algorithm is better because it is faster, even though it doesn't do a very good job of optimization. Get the code right first and worry about speed later.

Quote
Has anyone ever played with a recursive IF block parser ? I was interested if anyone has a different approach that has been successful.
Yes. Several, in fact. The trick to doing a recursive parser/code generator is to pass "thunks" down the recursive calls that execute when doing the equivalent of backpatching, and those thunks can check to see if a jmp has become redundant, and put in the ultimate address.

A thunk is, effectively, a piece of code passed as data to a procedure. Of course, you pass a pointer (and an execution context, which is what makes a thunk different from a procedure pointer) not the actual code itself; when it comes time to patch in the ultimate target address, you call the thunk and it figures out what to actually do at that point. Because the thunk's context (think: stack frame) is down in the call stack, it has access to values that are masked by the current recursive call to your parser. For a brief discussion of thunks, check out:

http://webster.cs.ucr.edu/AoA/Windows/PDFs/Thunks.pdf

or

http://webster.cs.ucr.edu/AoA/DOS/ch12/CH12-1.html

Cheers,
Randy Hyde

hutch--

I have got the multipass version running and its output is correct but I have run into the weirdest problem with allocated memory. I started using two GlobalAlloc() buffers which I rotate each pass so that one is written to the other for each pass. While I was doing the early development I had a messagebox testing a return value before the parser and when I had the output working properly, I removed it and the app instantly GP faults.

Tracking the fault it ended up on the first write attempt to the second buffer in the first pass. If I put the message box back, it worked correctly run from the editor.

I substituted a pushad SleepEx and popad and this will also allow it to run from the editor but any attempt to run it from the command line gives the same fault on first attempt at writing to te second buffer.

I have tried HeapAlloc() and OLE string memory as well but they all crash in the same place, even though I have tested the memory allocation return values which are OK and GetLastError().

This is the basic calling code so the logic is very simple.

  ; ****************************************

  process_if:

    pushad
    invoke SleepEx,1,0                      ; yield time slice, crashes without it
    popad

    invoke scan_lines,gbuf1,gbuf2,lnum      ; process 1st level .IF blocks
    mov lnum, ecx                           ; save the last label number

    push gbuf1                              ; swap memory buffers
    push gbuf2
    pop gbuf1
    pop gbuf2

    test eax, eax                           ; test if no more .IF blocks left
    jnz process_if

    print gbuf1,13,10                       

  ; ****************************************


Run from the ediitor with the SleepEx API, it performs multiple passes on the data by swapping the buffers ad outputs the result as a string to the screen. Anything else and it instantly faults on the first write.

Has anyone ever seen a problem like this one ?

This is the working output. The only optimisation done so far is the removal of dead jumps after and existing JMP, RET or RETN.



Turns this,

  ; ********************
  ; ********************

    .if eax == 1
      xor eax, eax
    .elseif eax != 2
      ret
    .elseif eax >= 3
      sub eax, 1
    .elseif eax <= 3
        .if edx == 100
          sub edx, 1
        .elseif edx == 2
          add edx, 1
        .else
          nop
        .endif
    .elseif eax > 3
      jmp elsewhere
    .elseif eax < 3
      sub eax, 1
    .else
      retn
    .endif

  ; ********************
  ; ********************

to this,

  ; ********************
  ; ********************

    cmp eax, 1
    jne lbl1
      xor eax, eax
    jmp lbl0

  lbl1:
    cmp eax, 2
    je  lbl2
      ret

  lbl2:
    cmp eax, 3
    jl  lbl3
      sub eax, 1
    jmp lbl0

  lbl3:
    cmp eax, 3
    jg  lbl4
    cmp edx, 100
    jne lbl9
          sub edx, 1

  lbl9:
    cmp edx, 2
    jne lbl10
          add edx, 1
    jmp lbl8

  lbl10:
          nop

  lbl8:
    jmp lbl0

  lbl4:
    cmp eax, 3
    jle lbl5
      jmp elsewhere

  lbl5:
    cmp eax, 3
    jge lbl6
      sub eax, 1
    jmp lbl0

  lbl6:
      retn

  lbl0:

  ; ********************
  ; ********************
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

tenkey

Smells like an initialization error to me.

The extra code at the beginning is initializing stack data that will be referenced later, perhaps?
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

hutch--

I gave up and rewrote it as it needed better internal architecture and I have it running properly at the moment. Only optimisation done so far is replacing cmp reg, 0 with test reg, reg.

Next is to remove redundant jumps after JMP RET and RETN. This is a reasonably easy flagged operation from the 1st word of the last normal instruction.

The design is iterative depending on a set flag for an occurence of .IF in the last pass. For 3 levels of nesting it makes 4 passes, the last determines there are no more .IF blocks left.

I will post it when I get it tidied up a bit.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

asmrixstar

just out of curiousity what's this for hutch your own assemble or macro's? :eek