Few jumps vs many jumps in a parser loop

dedndave · December 12, 2009, 11:41:56 AM

Quote$12345

ok - i know the assembler doesn't recognize currency - lol
so - how is this evaluated ?

on the other hand - a nice feature
we could call this routine until the bank account is full :bg

Net_Worth_Loop:
call GetSome
add MyBankAccount,eax
jmp Net_Worth_Loop

GetSome:
mov eax,$999999999
ret

EDIT - added a little "bonus" code :P
guess i better optimize that a little bit...
(get rich quick scheme)

mov eax,$999999999

Net_Worth_Loop:
add MyBankAccount,eax
jmp Net_Worth_Loop

dedndave · December 12, 2009, 12:01:50 PM

another question for you, Jochen...
is this parser required to evaluate reals, as well ?
if not - there is a practical limit to the length of strings
the worst case would be binary
does it need to parse 64-bit binary values ? - or is 32-bit sufficient ?
i am just curious as to what the max string length is

and one more question - as far as i know, spaces are not allowed in any of the formats - so they could be interpreted as EOL ?

mov eax,10h ;valid comment
mov eax,10h invalid comment

or do you need to evaluate the rest of the line to test validity ?

dedndave · December 12, 2009, 03:15:02 PM

i see a little bug in some of the logic, Jochen
not timing related, but you might want to mod the code a little

this code has the potential for aliasing errors

            test al, al
            je @F
            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "."
            je HasDot
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY
            cmp al, "("
            je HasB1
            cmp al, "("
            je HasB2

changing the test order so non-alpha chars are tested prior to the "or al,20h" will eliminate the problem

            test al, al
            je @F
            cmp al, "."
            je HasDot
            cmp al, "("
            je HasB1
            cmp al, ")" ;i think that's what you meant
            je HasB2

            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY

on a related note - and this may speed things up a little
because most of the characters in the strings are ascii numerics, test for that first
the null char only occurs once - even though "or al,al" may be faster than other compares, the occurance frequency is last on the list
don't test for it until you have to

test for the most likely-to-occur first
that way, fewer compares are made
i would start off with:

xor al,30h
cmp al,9
jbe Skip_It

no need to xor it back out - simply re-map the rest of the chars

also, a-f are valid hexadecimal "numbers"
you need a little bit of a state-machine to control some of the branching

also - i am still trying to find something on the "$" - anyone ?

jj2007 · December 12, 2009, 05:53:49 PM

Thanks, Dave. The logic still needs to be elaborated. Right now, I just wanted to test if a pre-scan, either sequentially or per character table, is worth the effort, speed and sizewize. The old version of Val() has almost a kByte, and that's no good...

dedndave · December 12, 2009, 07:03:51 PM

i am trying to understand the byte-count portion of the results macro....

   ifidn <algo>, <parseTable>
      add eax, ptEnd-pt+1
   elseifidn <algo>, <parseTableS>
      add eax, ptEndS-ptS+1
   elseifidn <algo>, <parseDave>
      add eax, ptEndS-ptS+1         ; we treat it as if the standard jumps had been used
      ;add eax, Build_T-Build+1
   endif

huh ? :dazzled:
you lost me, my friend - lol
i will just leave that part out

jj2007 · December 12, 2009, 08:22:37 PM

> we treat it as if the standard jumps had been used
Dave,
The macro adds the hand-counted bytes in the .data section. For better comparison, I assumed that the same number of "specific" jumps was implemented in your algo.

dedndave · December 12, 2009, 10:42:28 PM

well, Jochen, here is my first attempt
very impressive :P

prescott:

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
14608 cycles for PAlgn4, 150 bytes #h=50,#dots=50

2827 cycles for PAlgn4, 150 bytes #h=10,#dots=10

131 cycles for PAlgn4, 150 bytes #h=1,#dots=1

jj2007 · December 12, 2009, 11:06:02 PM

Quote from: dedndave on December 12, 2009, 10:42:28 PM
well, Jochen, here is my first attempt

Thanks, Dave. Interesting approach, I hope you can tune it a little bit...
K2Z, J

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
7189    cycles for PAlgn4, 150 bytes    #h=50,#dots=50
1394    cycles for PAlgn4, 150 bytes    #h=10,#dots=10
84      cycles for PAlgn4, 150 bytes    #h=1,#dots=1

dedndave · December 12, 2009, 11:41:43 PM

lol - i don't think "tuning it" is going to be adequate
it needs a major overhaul :bg
but, i may see something
i made a couple small changes in ecx and got a few strokes out of it
but, i have to keep an open mind to a different concept

rags · December 13, 2009, 02:00:45 AM

Quote from: dedndave on December 12, 2009, 03:15:02 PM
this code has the potential for aliasing errors

Dave, what is an aliasing error?

dedndave · December 13, 2009, 02:08:39 AM

well - that is, perhaps, a mis-use of the terminology - lol
but, in this case, Jochen uses the OR AL,20h instruction to force upper case alpha characters to lower case
that way, he does not have to test for "A" and "a" - they both become "a"
but, half of the rest of the characters are affected, as well
for example, carriage return (0Dh) becomes hypen or minus sign (2Dh)
the OR instruction destoyed the unique identification of the 0Dh
you can no longer tell which was loaded
when you parse characters using OR 20h (or AND 0DFh to force upper case on lower case alpha characters),
it is best to test all the non-alpha characters first

notice that if you use XOR AL,30h (or SUB AL,30h) to test for numberic characters,
there are still 256 different values - they are just different - they do not lose their unique identity

the term "aliasing error" is normally used in information theory or cryptography when there is a loss of information

News:

Few jumps vs many jumps in a parser loop