News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Few jumps vs many jumps in a parser loop

Started by jj2007, December 08, 2009, 04:24:05 PM

Previous topic - Next topic

dedndave

#60
Quote$12345

ok - i know the assembler doesn't recognize currency - lol
so - how is this evaluated ?

on the other hand - a nice feature
we could call this routine until the bank account is full   :bg

Net_Worth_Loop:
        call    GetSome
        add     MyBankAccount,eax
        jmp     Net_Worth_Loop

GetSome:
        mov     eax,$999999999
        ret

EDIT - added a little "bonus" code   :P
guess i better optimize that a little bit...
(get rich quick scheme)

        mov     eax,$999999999

Net_Worth_Loop:
        add     MyBankAccount,eax
        jmp     Net_Worth_Loop

dedndave

another question for you, Jochen...
is this parser required to evaluate reals, as well ?
if not - there is a practical limit to the length of strings
the worst case would be binary
does it need to parse 64-bit binary values ? - or is 32-bit sufficient ?
i am just curious as to what the max string length is

and one more question - as far as i know, spaces are not allowed in any of the formats - so they could be interpreted as EOL ?

        mov     eax,10h         ;valid comment
        mov     eax,10h          invalid comment

or do you need to evaluate the rest of the line to test validity ?

dedndave

i see a little bug in some of the logic, Jochen
not timing related, but you might want to mod the code a little

this code has the potential for aliasing errors

            test al, al
            je @F
            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "."
            je HasDot
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY
            cmp al, "("
            je HasB1
            cmp al, "("
            je HasB2

changing the test order so non-alpha chars are tested prior to the "or al,20h" will eliminate the problem

            test al, al
            je @F
            cmp al, "."
            je HasDot
            cmp al, "("
            je HasB1
            cmp al, ")"          ;i think that's what you meant
            je HasB2

            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY

on a related note - and this may speed things up a little
because most of the characters in the strings are ascii numerics, test for that first
the null char only occurs once - even though "or al,al" may be faster than other compares, the occurance frequency is last on the list
don't test for it until you have to

test for the most likely-to-occur first
that way, fewer compares are made
i would start off with:

        xor     al,30h
        cmp     al,9
        jbe     Skip_It

no need to xor it back out - simply re-map the rest of the chars

also, a-f are valid hexadecimal "numbers"
you need a little bit of a state-machine to control some of the branching

also - i am still trying to find something on the "$" - anyone ?

jj2007

Thanks, Dave. The logic still needs to be elaborated. Right now, I just wanted to test if a pre-scan, either sequentially or per character table, is worth the effort, speed and sizewize. The old version of Val() has almost a kByte, and that's no good...

dedndave

i am trying to understand the byte-count portion of the results macro....

   ifidn <algo>, <parseTable>
      add eax, ptEnd-pt+1
   elseifidn <algo>, <parseTableS>
      add eax, ptEndS-ptS+1
   elseifidn <algo>, <parseDave>
      add eax, ptEndS-ptS+1         ; we treat it as if the standard jumps had been used
      ;add eax, Build_T-Build+1
   endif


huh ?    :dazzled:
you lost me, my friend - lol
i will just leave that part out

jj2007

> we treat it as if the standard jumps had been used
Dave,
The macro adds the hand-counted bytes in the .data section. For better comparison, I assumed that the same number of "specific" jumps was implemented in your algo.

dedndave

well, Jochen, here is my first attempt
very impressive   :P

prescott:

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
14608   cycles for PAlgn4, 150 bytes    #h=50,#dots=50

2827    cycles for PAlgn4, 150 bytes    #h=10,#dots=10

131     cycles for PAlgn4, 150 bytes    #h=1,#dots=1

jj2007

Quote from: dedndave on December 12, 2009, 10:42:28 PM
well, Jochen, here is my first attempt

Thanks, Dave. Interesting approach, I hope you can tune it a little bit...
K2Z, J

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
7189    cycles for PAlgn4, 150 bytes    #h=50,#dots=50
1394    cycles for PAlgn4, 150 bytes    #h=10,#dots=10
84      cycles for PAlgn4, 150 bytes    #h=1,#dots=1

dedndave

lol - i don't think "tuning it" is going to be adequate
it needs a major overhaul   :bg
but, i may see something
i made a couple small changes in ecx and got a few strokes out of it
but, i have to keep an open mind to a different concept

rags

Quote from: dedndave on December 12, 2009, 03:15:02 PM
this code has the potential for aliasing errors
Dave, what is an  aliasing error?
God made Man, but the monkey applied the glue -DEVO

dedndave

well - that is, perhaps, a mis-use of the terminology - lol
but, in this case, Jochen uses the OR AL,20h instruction to force upper case alpha characters to lower case
that way, he does not have to test for "A" and "a" - they both become "a"
but, half of the rest of the characters are affected, as well
for example, carriage return (0Dh) becomes hypen or minus sign (2Dh)
the OR instruction destoyed the unique identification of the 0Dh
you can no longer tell which was loaded
when you parse characters using OR 20h (or AND 0DFh to force upper case on lower case alpha characters),
it is best to test all the non-alpha characters first

notice that if you use XOR AL,30h (or SUB AL,30h) to test for numberic characters,
there are still 256 different values - they are just different - they do not lose their unique identity

the term "aliasing error" is normally used in information theory or cryptography when there is a loss of information