The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on December 08, 2009, 04:24:05 PM

Title: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 08, 2009, 04:24:05 PM
I was testing the hypothesis that jumping is slower than not jumping - see snippet below. Code size is identical, timings are surprisingly similar, but what struck my eye was that the longer string took almost 4 times as long as expected. Any explanation...?
Celeron M:
587     cycles  few jumps, 66 bytes, len=40
616     cycles many jumps, 66 bytes, len=40
45      cycles  few jumps, 66 bytes, len=10
43      cycles many jumps, 66 bytes, len=10
Quotealign 16
Src1   db "123456789.1234789e-123 and a few more..", 0
align 16
Src2   db "1234.567h", 0

.code
start:
   counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      S1:
      mov esi, offset Src1
      jmp @F
      HasE:
      jmp @F
      HasDot:
      jmp @F
      HasH:
      jmp @F
      HasB:
      jmp @F
      HasY:
      jmp @F
      HasB1:
      jmp @F
      HasB2:
      @@:
         mov al, [esi]
         inc esi
         .if al<"0" || al>"9"   ; few jumps
            test al, al
            je @F
            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "."
            je HasDot
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY
            cmp al, "("
            je HasB1
            cmp al, "("
            je HasB2
         .endif
      jmp @B
      @@:
      E1:
      sub esi, offset Src1
   counter_end
   print str$(eax), 9, "cycles  few jumps, "
   mov eax, E1
   sub eax, S1
   print str$(eax), " bytes, len="
   print str$(esi), 13, 10

   counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      S2:
      mov esi, offset Src1
      @@:
         mov al, [esi]
         inc esi
         .if al<"0" || al>"9"   ; many jumps
            test al, al
            je @F
            or al, 20h
            .if al=="e"
               jmp @B
            .endif
            .if al=="."
               jmp @B
            .endif
            .if al=="h"
               jmp @B
            .endif
            .if al=="b"
               jmp @B
            .endif
            .if al=="y"
               jmp @B
            .endif
            .if al=="("
               jmp @B
            .endif
            .if al=="("
               jmp @B
            .endif
         .endif
      jmp @B
      @@:
      E2:
      sub esi, offset Src1
   counter_end
   print str$(eax), 9, "cycles many jumps, "
   mov eax, E2
   sub eax, S2
   print str$(eax), " bytes, len="
   print str$(esi), 13, 10
[/b]
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 08, 2009, 04:56:38 PM
could it be because it is trying to parse through...
Quote" and a few more.."
DOH !
lol

as for jumps - i find that short jumps are pretty quick
it is still good to reduce the number of jumps executed in a loop, if practical
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 08, 2009, 05:21:48 PM
Quote from: dedndave on December 08, 2009, 04:56:38 PM
could it be because it is trying to parse through...
Quote" and a few more.."
DOH !
lol
Yep, you are right. These are non-numbers that slow the algo down... how stupid of me.
504     cycles  few jumps, 66 bytes, len=100
502     cycles many jumps, 66 bytes, len=100
45      cycles  few jumps, 66 bytes, len=10
43      cycles many jumps, 66 bytes, len=10


>as for jumps - i find that short jumps are pretty quick
>it is still good to reduce the number of jumps executed in a loop, if practical

Probably yes, although this test reveals no difference between jumping and not jumping.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 09, 2009, 12:40:37 AM
JJ,

In isolation the difference is reasonably hard to measure but thge general drift is if an algorithm is anything like well advanced in speed terms extra jumps start to become one of the barriers to making it go faster. With very high speed code you will see the difference but with code full of stalls, dependencies and the like it just does not matter much.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 09, 2009, 12:54:42 AM
Hutch,
Maybe I have not been clear. What I wanted to see was the difference between two parser loops, assuming that the text rarely has a dot:
A) 
cmp al, "."
je HasDot   ; ... do stuff there, otherwise fall through

B)
.if al=="."
  ... do stuff
.endif

Now A) is obviously rarely taken; in 99% of cases, it falls through.
In B), the .if translates to:
cmp al, "."
jne @F
... do stuff
@@:

So here it's the opposite: That jump is taken 99% of the time.
According to Opcodes.chm, a jump taken is 3 cycles, not taken is one cycle. An unknown author writes there:
Quote- It's a good programming practice to organize code so the
  expected case is executed without a jump since the actual
  jump takes longer to execute than falling through the test.
:wink
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 09, 2009, 01:01:02 AM
JJ,

That is conventional wisdom and if it is the only factor its usually correct, a fall through is faster than a jump over but almost exclusively there are other factors that effect instruction scheduling. For what its worth a character table usually eats a sequential scanner aliver in terms of character evaluation.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 09, 2009, 03:59:44 AM
what isn't readily obvious or easily quantifiable is "how much work is the jump doing for you"
sometimes, a jump is a sign of clumsy code
other times, it performs a great service in terms of easing the workload for the rest of the routine
sometimes, you can replace a jump inside a loop with one outside the loop
well - you get the idea
if the jump is "doing a lot of work" it is worth the effort
testing conditionals in the right order can speed up the simplest code
Title: Re: Few jumps vs many jumps in a parser loop
Post by: sinsi on December 09, 2009, 04:10:49 AM
I was wondering if a branch hint makes any difference?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 09, 2009, 06:26:29 AM
JJ,

Make a 256 member table of label offsets which you can divide into as many characters classes as you like and for the price of one table access you can seperate punctuation from either lower or upper class words, characters you don't want < 32 except for 9 10 13, numbers, arithmetic operators etc ..... any division you like.

Unwanted characters go back to the loop start, punctuation to another label, arithmetic operators to another label etc .... This type of algo will rip the titz off a sequential evaluation block and this is one place where you need the pace if the text you are parsing is large.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 09, 2009, 07:31:52 AM
Quote from: sinsi on December 09, 2009, 04:10:49 AM
I was wondering if a branch hint makes any difference?

Sinsi, DednDave, Hutch, thanks. The branch hint is not helpful, newer processors do it automatically, the character table is feasible but kind of an overkill if you have only a few branches. I might give it a try, though.
The point is rather that the code with branches runs at 5 cycles per character, independently of whether most branches are taken or not taken. The Opcodes.chm rule that a taken branch needs 3 cycles, a not taken one needs one cycle is thus obsolete.
It might be different for older processors, though - test yourself, code attached.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: sinsi on December 09, 2009, 07:43:58 AM
There are 2 instructions to help with conditional jumps
2EH—Branch not taken (used only with Jcc instructions)
3EH—Branch taken (used only with Jcc instructions)

I think they are the instructions for segment overrides (CS: and DS:) so older processors ignore it.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 09, 2009, 07:54:19 AM
So do newer processors, but I can't find the Intel or Agner Fog page that specifies why ::)
We had that instruction in this thread (http://www.masm32.com/board/index.php?topic=9370.msg84835#msg84835) already.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: sinsi on December 09, 2009, 08:14:14 AM
>We had that instruction in this thread already.
Can you recall the reply? 8 pages is too much...
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 09, 2009, 09:20:37 AM
Quote from: sinsi on December 09, 2009, 08:14:14 AM
>We had that instruction in this thread already.
Can you recall the reply? 8 pages is too much...

There was no reply, but I found the reference: Agner Fog (http://www.agner.org/optimize/microarchitecture.pdf). Search the pdf for branch hint to see several quotes of the "Branch hint prefixes have no useful effect" type.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 09, 2009, 11:39:55 AM
i think the 3 cycles for "branch taken" is from the 486 days - which went by rather fast - lol
actually, i remember thinking how nice that processor was   :bg
well - compared to the 286, it was a super-star
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 09, 2009, 12:25:39 PM
 :bg

I did too but the 33 meg i486 chip cost me $1500 AU, the board cost me about the same and the 300 meg ESDI full height cost me $2200 AU and then I had to buy a controller card for it. I still own the lot but the controller card dies shortly after I retired the box and the next time I tried to get the disk contents it would not work.

The full height NEC disk would make a good doorstopper though.  :bdg
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 10, 2009, 01:00:18 AM
lol
i have an old (XT old) seagate 40 mb full height drive
it is a good drive, really - lol
but the heads have a self-parking solenoid (one of the first)
when that thing kicks in or out, you want to open it to see if your clothes are dry - lol
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 08:44:22 AM
Quote from: hutch-- on December 09, 2009, 06:26:29 AM
JJ,

Make a 256 member table of label offsets ...This type of algo will rip the titz off a sequential evaluation block and this is one place where you need the pace if the text you are parsing is large.

Hutch,
Yes it rips the titz off, twice as fast as the sequential ones, but only on my Celeron M... on a P4, it stinks. Unless you have a better design...? Code is attached.

3804    cycles  few jumps, 66 bytes, len=500
10640   cycles char table, 135 bytes, len=500
3233    cycles many jumps, 66 bytes, len=500
2921    cycles many + xchg al, ah, 78 bytes, len=500

Quote      .data?

PARSETABLE STRUCT
   TheChar   BYTE ?
   TheLabel   DWORD ?
PARSETABLE ENDS
...
   ChTable   dd 256 dup(?)   ; CHARTABLE parser
   .data
   pt PARSETABLE <0, plCTE1>,   ; if nullbyte found, jump out of the loop
   <"e", ChkE>, <"E", ChkE>,   ; exponent, lowercase or uppercase
   <".", ChkDot>,
   <"h", ChkH>, <"H", ChkH>,   ; hex
   <"b", ChkB>, <"B", ChkB>,   ; binary
   <"y", ChkY>, <"Y", ChkY>,   ; binary
   <"(", ChkB1>, <")", ChkB2>   ; brackets
   ptEnd db -128      ; flag end of table
      .code
      push edi
      push esi
      mov esi, offset Src1
      mov edi, offset ChTable
      cmp dword ptr [edi], 0
      jne plCT0   ; chartable already created, so dive directly into the loop
      mov ecx, 256   ; this part will be run only once, so it can be slow
      push edi
      mov eax, plCT   ; set defaults
      rep stosd
      pop edi
      push esi
      mov esi, offset pt
      .Repeat
         lodsb
         .Break .if al==-128
         movzx edx, al
         lodsd
         mov [edi+4*edx], eax
      .Until 0
      pop esi
      jmp plCT0

      ChkE:
      jmp plCT
      ChkDot:
      jmp plCT
      ChkH:
      jmp plCT
      ChkB:
      jmp plCT
      ChkY:
      jmp plCT
      ChkB1:
      jmp plCT
      ChkB2:
   plCT0:
      xor eax, eax
   plCT:
      movzx eax, byte ptr [esi]
      inc esi
      jmp dword ptr [edi+4*eax]   ; jump to the address obtained from the table

      plCTE1:
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 10, 2009, 09:20:48 AM
JJ,

A table like this. 0 is one class, the numbers are class 1, upper case is class 2 and lower case is class 3.


  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0
  db 0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
  db 2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0
  db 0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
  db 3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Then make another data table which has the label offsets as members.


  lbltbl dd lbl0,lbl1,lbl2,lbl2


In the code,


  lbl0:
    ; code
  lbl1:
    ; code
  lbl2:
    ; code
  lbl3:
    ; code


2 table accesses to get to the right label after evaluating each character.


Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 10:26:23 AM
Hutch,
That's exactly what my code does. It's very fast on a Celeron but not on a P4, unfortunately. Here is the innermost loop.

movzx eax, byte ptr [esi]
inc esi
jmp dword ptr [edi+4*eax] ; jump to the address obtained from the table


Can I have some timings for other processors, please?
Thanx, jj
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 10, 2009, 11:06:02 AM
why not "pre-multiply" the byte values in the table by 4 ? (unless, of course, they are over 63 - lol)
alternatively...

        movzx   eax,byte ptr [esi]
        shl     eax,2
        inc     esi
        jmp dword ptr [edi+eax]

the [edi+4*eax] kills you on the P4
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 11:33:12 AM
Quote from: dedndave on December 10, 2009, 11:06:02 AM
why not "pre-multiply" the byte values in the table by 4 ? (unless, of course, they are over 63 - lol)
alternatively...

Can you post a working example? The shl eax, 2 solution is slower, unfortunately...
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 10, 2009, 11:37:47 AM

  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 4,4,4,4,4,4,4,4,4,4,0,0,0,0,0,0
  db 0,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8
  db 8,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0
  db 0,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12
  db 12,12,12,12,12,12,12,12,12,12,12,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.
.
.
        movzx   eax,byte ptr [esi]
        inc     esi
        jmp dword ptr [edi+eax]
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 10, 2009, 11:45:56 AM
another alternative is....

        jmp     lbltbl[eax]

no need to load the vector table address into edi
Title: Re: Few jumps vs many jumps in a parser loop
Post by: z941998 on December 10, 2009, 12:18:59 PM
I was looking at the source code and noticed that there are two back-to-back cmps of the al register with ")" as the value everywhere, but the switch routine has a "(" and a ")".

I believe the all the other rountines should match the switch routine logic, because it doesn't make sense to have back-to-back compares with the same values, waste of space and never gets executed?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 10, 2009, 12:20:43 PM
JJ,

I just downloaded your test piece but I don't know if I will live long enough to decypher it. A passable description of wqhat you are parsing would make it a lot easier to work on.

RE: Complex addressing mode on a PIV, I have never had a problem using it and while the suggestion Dave made can save you a base address register [table+index_reg*4] it tends to be slightly slower than using a register as the based address. A shift is far slower.

Here are the timings on my quad.

2503    cycles  few jumps, 66 bytes, len=500
1637    cycles char table, 135 bytes, len=500
2421    cycles many jumps, 66 bytes, len=500
1796    cycles many + xchg al, ah, 78 bytes, len=500
2355    cycles elseif jumps, 78 bytes, len=500
2890    cycles Switch, 79 bytes, len=500
80      cycles  few jumps, 66 bytes, len=10
42      cycles many jumps, 66 bytes, len=10

--- ok ---
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 01:51:59 PM
Quote from: dedndave on December 10, 2009, 11:37:47 AM

        movzx   eax,byte ptr [esi]
        inc     esi
        jmp dword ptr [edi+eax]

My belly says this will crash.

Quote from: dedndave on December 10, 2009, 11:45:56 AM
another alternative is....

        jmp     lbltbl[eax]

no need to load the vector table address into edi
Yes that works but it yields exactly the same cycles on Celeron (and is 4 bytes longer).
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 01:56:37 PM
Quote from: hutch-- on December 10, 2009, 12:20:43 PM
JJ,

I just downloaded your test piece but I don't know if I will live long enough to decypher it. A passable description of wqhat you are parsing would make it a lot easier to work on.

Will work on it. Sorry that I challenged you - the string to be parsed is in the .data section, and tests possible combinations of text that could be interpreted as decimal, hex or binary values.

Quote
Here are the timings on my quad.
What happened to your trusty P4...?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 10, 2009, 02:29:26 PM
Your wish is my command.

Timing on my 3.8 gig PIV.


3323    cycles  few jumps, 66 bytes, len=500
10468   cycles char table, 135 bytes, len=500
3050    cycles many jumps, 66 bytes, len=500
2830    cycles many + xchg al, ah, 78 bytes, len=500
3156    cycles elseif jumps, 78 bytes, len=500
2801    cycles Switch, 79 bytes, len=500
159     cycles  few jumps, 66 bytes, len=10
102     cycles many jumps, 66 bytes, len=10

--- ok ---
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 10, 2009, 02:35:44 PM
JJ,

Depending on your chosen notation, there are a number of ways to test what a number text type is, C uses 0xBADC0DE while basic uses &h or &B while MASM uses a leading number and a trailing "h" for hex and only a trailing "b" for binary and has to track the size by the string length. Is there a particular format you have in mind ?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 10, 2009, 03:02:52 PM
QuoteMy belly says this will crash.

Jochen - look at the table values
i think your belly is tender - lol - don't make me come over there and rub it
i ain't rubbin' nuthin
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 05:17:29 PM
Quote from: hutch-- on December 10, 2009, 02:35:44 PMIs there a particular format you have in mind ?
Njet. GfaBasic eats almost everything (and correctly), and so does MasmBasic. The current Val() is just a bit slow and a bit oversized, that's why I am testing the parser option.

@Dave:
        movzx   eax,byte ptr [esi]
        inc     esi
        jmp dword ptr [edi+eax]

Addresses are 4 bytes long. If eax contains e.g. "1" aka 49, you are jumping to edi+49. What was your favourite exception? c0000005?
:bg
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 10, 2009, 06:06:19 PM
Jochen - look at this post, then the one of yours right after it.....

http://www.masm32.com/board/index.php?topic=12871.msg99585#msg99585

Hutch showed you an array of byte values in steps of 1
you seemed perfectly happy with that - lol - in fact, you said that is what your code does

then - look at mine again - they are byte values in steps of 4
that simply saves you from having to use [edi+4*eax]

you should try my stuff out before you dismiss it   :bg
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 06:44:16 PM
Quote from: z941998 on December 10, 2009, 12:18:59 PM
I was looking at the source code and noticed that there are two back-to-back cmps of the al register with ")" as the value everywhere, but the switch routine has a "(" and a ")".
Yes, but it's for testing the speed only. The final version would have different tests and actions. Thanks anyway for pointing this out.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 07:03:35 PM
Quote from: dedndave on December 10, 2009, 06:06:19 PM
then - look at mine again - they are byte values in steps of 4
that simply saves you from having to use [edi+4*eax]

you should try my stuff out before you dismiss it   :bg

Apologies, my friend. There seems to be a misunderstanding, though:
.data
align 16
Src1 db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h", 0

That is our source string (not a very valid format, but for timings it's ok).
Now the innermost loop:

mov esi, offset Src1
plCT:
movzx eax, byte ptr [esi]
inc esi
jmp dword ptr [ChTable+4*eax] ; jump to the address obtained from the table

movzx eax, byte ptr [esi] loads a value between 0 and 255. That might be "y", ASCII 121, the terminator of a Bin$.
Now to make it clearer what happens here, I'll use a modified version of the innermost loop:
movzx eax, byte ptr [esi]
inc esi
mov edx, [ChTable+4*eax]
jmp edx

If eax was "y", then edx contains the address found in [ChTable+4*121], set before with the <"y", ChkY> structure.
What my table generator does is to preload the whole table with plCT, via rep stosd, then set a handful of jmp addresses individually. Both you and Hutch have used another table construct, not with addresses but with byte size (1,2,3) values. I have not yet grasped how the innermost loop would look like, but hope you can clarify...
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 10, 2009, 08:15:25 PM
New code attached, somewhat clearer than before.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1682    cycles for parseTable, 97 bytes, #h=55
2397    cycles for parseManyAhAl, 98 bytes, #h=55
2638    cycles for parseFewJumps, 86 bytes, #h=55
2494    cycles for parseManyJumps, 86 bytes, #h=55
3031    cycles for parseElseif, 98 bytes, #h=55
3577    cycles for parseSwitch, 99 bytes, #h=55
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 10, 2009, 10:47:16 PM
prescott...

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
10943   cycles for parseTable, 97 bytes, #h=55
3124    cycles for parseManyAhAl, 98 bytes, #h=55
3106    cycles for parseFewJumps, 86 bytes, #h=55
3405    cycles for parseManyJumps, 86 bytes, #h=55
3065    cycles for parseElseif, 98 bytes, #h=55
3162    cycles for parseSwitch, 99 bytes, #h=55

2231    cycles for parseTable, 97 bytes, #h=11
586     cycles for parseManyAhAl, 98 bytes, #h=11
796     cycles for parseFewJumps, 86 bytes, #h=11
618     cycles for parseManyJumps, 86 bytes, #h=11
670     cycles for parseElseif, 98 bytes, #h=11
692     cycles for parseSwitch, 99 bytes, #h=11

261     cycles for parseTable, 97 bytes, #h=1
71      cycles for parseManyAhAl, 98 bytes, #h=1
69      cycles for parseFewJumps, 86 bytes, #h=1
106     cycles for parseManyJumps, 86 bytes, #h=1
73      cycles for parseElseif, 98 bytes, #h=1
227     cycles for parseSwitch, 99 bytes, #h=1
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 10, 2009, 11:49:11 PM
JJ,

I asked you the question because the original topic was to do with parsing, not just number conversion and what I was after was the data IO format(s) that you were working on. Currently reading your source is not a lot of use because I don't know what it is supposed to be doing.

Your code has a bad bottleneck on Prescott core PIVs that shows up on both Dave's machine and the 3.8 gig PIV I have here and there are a massive number of PIVs that will still run for years so its worth tracking down why you have this problem.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 11, 2009, 12:15:51 AM
Hutch,

The case here is pretty generic: Scan a string, and in rare cases take some action. I tried two variants, one that takes jumps if no action is needed, the other that does not take jumps if no action is needed. Then an experienced programmer suggested a character table:

Quote from: hutch-- on December 09, 2009, 01:01:02 AMFor what its worth a character table usually eats a sequential scanner aliver in terms of character evaluation.

Brilliant idea, so I implemented it and tested it, and voilà! on the Celeron M (a Core CPU) it eats the sequential scanner alive. Thanks a lot for pointing me to this solution - and condolencies to P4 owners :bg

Quote from: hutch-- on December 10, 2009, 11:49:11 PMthere are a massive number of PIVs that will still run for years so its worth tracking down why you have this problem.

I agree. It is indeed the innermost loop, and that one is very simple. I tested various options, including alignment, but no success.

Quote      movzx eax, byte ptr [esi]
      inc esi
      ; now jump to the address obtained from the table
      if 0
         jmp dword ptr [ChTable+4*eax]      ; slowest
      elseif 1
         mov eax, [edi+4*eax]               ; fastest
         jmp eax
      else
         jmp dword ptr [edi+4*eax]         ; fast
      endif

(it may not look like a loop, but Olly says it is a loop :toothy)
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 11, 2009, 12:47:23 AM
I added a "short" variant using a 2-byte per jump table. It's even faster than the 5-byte version. Note that previous timings showed a wrong code size - the space used in the .data section was not reported.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1635    cycles for parseTable, 158 bytes, #h=55
1527    cycles for parseTableS, 128 bytes, #h=55   <--- NEW
2335    cycles for parseManyAhAl, 98 bytes, #h=55
2627    cycles for parseFewJumps, 86 bytes, #h=55
3111    cycles for parseManyJumps, 86 bytes, #h=55
2488    cycles for parseElseif, 98 bytes, #h=55
3388    cycles for parseSwitch, 99 bytes, #h=55
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 11, 2009, 03:39:52 AM
prescott...

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
11181   cycles for parseTable, 158 bytes, #h=55
10707   cycles for parseTableS, 128 bytes, #h=55
3002    cycles for parseManyAhAl, 98 bytes, #h=55
3228    cycles for parseFewJumps, 86 bytes, #h=55
3326    cycles for parseManyJumps, 86 bytes, #h=55
3341    cycles for parseElseif, 98 bytes, #h=55
3164    cycles for parseSwitch, 99 bytes, #h=55
Title: Re: Few jumps vs many jumps in a parser loop
Post by: sinsi on December 11, 2009, 03:54:17 AM

Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
1718    cycles for parseTable, 158 bytes, #h=55
1723    cycles for parseTableS, 128 bytes, #h=55
1806    cycles for parseManyAhAl, 98 bytes, #h=55
2416    cycles for parseFewJumps, 86 bytes, #h=55
2466    cycles for parseManyJumps, 86 bytes, #h=55
2456    cycles for parseElseif, 98 bytes, #h=55
2875    cycles for parseSwitch, 99 bytes, #h=55

Title: Re: Few jumps vs many jumps in a parser loop
Post by: FORTRANS on December 11, 2009, 01:00:38 PM
Hi,

PIII

G:\WORK\TEMP>parserlo
☺☺☻♥ (SSE1)
5885    cycles for parseTable, 158 bytes, #h=55
5933    cycles for parseTableS, 128 bytes, #h=55
3778    cycles for parseManyAhAl, 98 bytes, #h=55
3740    cycles for parseFewJumps, 86 bytes, #h=55
4227    cycles for parseManyJumps, 86 bytes, #h=55
3841    cycles for parseElseif, 98 bytes, #h=55
5515    cycles for parseSwitch, 99 bytes, #h=55

1211    cycles for parseTable, 158 bytes, #h=11
1210    cycles for parseTableS, 128 bytes, #h=11
721     cycles for parseManyAhAl, 98 bytes, #h=11
778     cycles for parseFewJumps, 86 bytes, #h=11
877     cycles for parseManyJumps, 86 bytes, #h=11
801     cycles for parseElseif, 98 bytes, #h=11
1134    cycles for parseSwitch, 99 bytes, #h=11

140     cycles for parseTable, 158 bytes, #h=1
140     cycles for parseTableS, 128 bytes, #h=1
57      cycles for parseManyAhAl, 98 bytes, #h=1
77      cycles for parseFewJumps, 86 bytes, #h=1
84      cycles for parseManyJumps, 86 bytes, #h=1
70      cycles for parseElseif, 98 bytes, #h=1
119     cycles for parseSwitch, 99 bytes, #h=1

--- ok ---


Regards,

Steve
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 11, 2009, 01:14:20 PM
I give up, the problem is simply not described properly but the obvious is, a PIV has no problems with complex addressing mode so there is something wrong with the code.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 11, 2009, 01:55:53 PM
Quote from: hutch-- on December 11, 2009, 01:14:20 PM
I give up, the problem is simply not described properly but the obvious is, a PIV has no problems with complex addressing mode so there is something wrong with the code.

It is simple enough, and the innermost loop is a three-liner. Besides, it works perfectly, and is a lot faster on more recent CPUs.

QuoteparseTableS proc uses esi edi Src:DWORD
      .data?
   ChTableS   dd 512/4 dup(?)            ; character table for parser
      .data
   ptS   db 0, plOut-plBASE            ; if nullbyte found, jump out of the loop
      db "e", plChkE-plBASE, "E", plChkE-plBASE   ; exponent, lowercase or uppercase
      db ".", plChkDot-plBASE
      db "h", plChkH-plBASE, "H", plChkH-plBASE   ; hex
      db "b", plChkB-plBASE, "B", plChkB-plBASE   ; binary
      db "y", plChkY-plBASE, "Y", plChkY-plBASE   ; binary
      db "(", plChkB1-plBASE, ")", plChkB2-plBASE   ; brackets
   ptEndS   db -128               ; flag end of table
      .code
      and NumH, 0               ; first instructions for all algos:
      and NumDots, 0            ; reset the test counters
      mov edi, offset ChTableS
      cmp dword ptr [edi], 0
      jne
plSct0
      mov ecx, 256               ; this part creates the chartable and will be run only once, so it can be slow
      push edi
      mov eax,
plSct            ; set defaults to entry of innermost loop
      rep stosd
      pop edi                     ; edi=offset ChTableS
      mov esi, offset ptS         ; parse table specific chars
      .While 1
         lodsb
         .Break .if al==-128      ; flag we are done
         movzx edx, al            ; the character requiring specific action, e.g. "h"
         lodsb
         movzx eax, al            ; the byte offset against plBASE
         add eax, plBASE         ; translate into a dword address
         mov [edi+4*edx], eax   ; store into the table in the .data? section
      .Endw
      jmp
plSct0                  ; skip action zone and point esi to Src

plBASE:   ; ----------------------- here is the action zone for some specific chars -----------------------
      plChkE:            ; e: maybe an exponent, maybe a hex char, who knows?
      jmp plSct
      plChkDot:
      inc NumDots      ; a dot: it's a float
      jmp plSct
      plChkH:
      inc NumH         ; h: end of a Hex$
      jmp plSct
      plChkB:            ; b: maybe the end of a Bin$, maybe a hex char, who knows?
      jmp plSct
      plChkY:            ; y: surely the end of a Bin$
      jmp plSct
      plChkB1:         ; (: we have brackets, e.g. (1234h)
      jmp plSct
      plChkB2:         ; ): end of a bracketed string
      nop

plSct0:   mov esi, Src      ; let's go
[/b]      ; ----------- innermost loop ----------------------------------------
plSct:   movzx eax, byte ptr [esi]         ; load the char from the source string
      inc esi
      jmp dword ptr [edi+4*eax]      ; jump to the address obtained from the table
      ; ----------- end of innermost loop ------------------------------
   plOut:
   ret
parseTableS endp
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 11, 2009, 02:08:32 PM
i think you are missing what Hutch and I are trying to convey, Jochen
and, neither of us has posted the example code because we can't get a handle on the desired end-result
that is because you are trying to make the comparisons on a generic level
the methods we propose require knowledge of the specific application
we understand that it is a parser, of course - lol
it appears to be for evaluating ASCII decimal reals, but we can't be certain
i.e. we don't want to play the game, if we don't know what the rules are   :P
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 11, 2009, 02:36:41 PM
Dave (and Hutch),
The rules are pretty simple:

- Take a source string and scan it.
- Do nothing for 90% of the characters found, in this example: 01234567890abcdfABCDF, i.e. the characters that have no specific meaning.
- Jump to a specific address if you hit an interesting character, e.g. a dot, an "h", an "e", a nullbyte, tab or space etc.

That is what this code simulates. I am sure there would be other applications, too, although it is clear that I want to use it to evaluate numbers of the WM_XXX EQU 1234h or WM_ZZZ EQU 101010b etc type.

HTH,
Jochen
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 11, 2009, 04:01:02 PM
here you go, my friend
a generic example of what Hutch and i were thinking about

EDIT - there are ways to speed this up - it is merely intended to demonstrate the concept

(http://kidshow.dcmemories.com/rrdobeeplate_226.gif)
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 11, 2009, 05:25:19 PM
JJ,

I just had a quick "struggle" through the technocolor code and noticed one thing, the use of the mnemonic "lodsb" which is known to be very slow on a PIV, especially without the REP prefix as you have no special case circuitry to make it faster. Shoot it and use a register which you increment and you should see a speed improvement.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 11, 2009, 05:58:07 PM
@Dave: I hope this is correctly implemented. It seems to work but is not very fast. The reason might be that you have three instead of two memory accesses:

Class4:
movzx eax, byte ptr [esi]
inc esi
mov al, XlatTbl[eax] ; get the "class"
jmp VectTbl[eax]


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1394    cycles for parseTable, 171 bytes,       #h=50, #dots=50
1336    cycles for parseTableS, 137 bytes,      #h=50, #dots=50
2542    cycles for parseManyAhAl, 111 bytes,    #h=50, #dots=50
2783    cycles for parseFewJumps, 99 bytes,     #h=50, #dots=50
2743    cycles for parseManyJumps, 99 bytes,    #h=50, #dots=50
2742    cycles for parseElseif, 111 bytes,      #h=50, #dots=50
3248    cycles for parseSwitch, 112 bytes,      #h=50, #dots=50
3175    cycles for parseDave, 109 bytes,        #h=50, #dots=400


@Hutch:
The lodsb is in the part that builds the table. It is run once and therefore has no influence on the timings.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 11, 2009, 06:45:12 PM
well - i don't think it is, Jochen
first, it appears as though you rebuild the one-time table in each call to parsedave
second, the program is a demonstration of a concept
it does not do what you do to parse
in order to compare times, you have to design a class list and build the table as required
so that it performs the same task as the other routines
i am sure there are ways to make it faster
the 3 accesses are not as bad as it may seem
the one thing that might be problematic is, as Hutch has mentioned in the past:
accessing a byte just before accessing a dword (in a loop situation) on a p4 is sluggish
that may well be part of the problem with some of the other algos, also
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 11, 2009, 07:13:57 PM
Quote from: dedndave on December 11, 2009, 06:45:12 PM
well - i don't think it is, Jochen
first, it appears as though you rebuild the one-time table in each call to parsedave

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1805    cycles for parseTableS, 136 bytes,      #h=50, #dots=50
3868    cycles for parseDave, 129 bytes,        #h=50, #dots=400
...
43      cycles for parseTableS, 136 bytes,      #h=1, #dots=1
769     cycles for parseDave, 129 bytes,        #h=1, #dots=8

These are timings for disabling these two lines:
; cmp dword ptr [edi], 0 ; table already built?
; jne ParseTheString

As you see, the Build is called only once, just as in the other routines.

Quotesecond, the program is a demonstration of a concept
it does not do what you do to parse
in order to compare times, you have to design a class list and build the table as required
so that it performs the same task as the other routines
It does not do exactly the same, but for the timings the difference is irrelevant since I scan the same Src

Quote
i am sure there are ways to make it faster
the 3 accesses are not as bad as it may seem
the one thing that might be problematic is, as Hutch has mentioned in the past:
accessing a byte just before accessing a dword (in a loop situation) on a p4 is sluggish
that may well be part of the problem with some of the other algos, also

Timings above are for Celeron, not P4. Anyway, if you see a way to speed up the innermost loops, please let me know.

EDIT: I managed to shorten parseTableS by 2 bytes, and speed it up by 25%. Don't ask me why ::)
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1349    cycles for parseTableS, 134 bytes       #h=50, #dots=50
1840    cycles for parseTable, 171 bytes        #h=50, #dots=50
3166    cycles for parseDave, 134 bytes         #h=50, #dots=400
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 11, 2009, 10:44:19 PM
well - if you really want to speed the thing up, here's what you need to do - and this will make the P4's happy
the Src string may or may not be 4-aligned
build the parser to parse bytes until the address IS 4-aligned
then, switch to a different parser that reads 4-aligned dwords from Src
inside the dword parser, use one register to hold the unparsed bytes (EDX, maybe)
shift the bytes into DL (SHR EDX,8) and parse them from there
one of the biggest improvements i made to my ling long kai fang bignum conversion
routines (LLKF9) was to access 4-aligned dwords for both input and output

EDIT - i notice you have Src 16-aligned
that isn't a real-world condition
in fact, you should test it with different alignments (mod 4 should be good enough)
if you look at my ling long kai fang code (LLKF9), in the init section, i scan down non-signifigant bytes from the top of the input value
that code is somewhat similar to what you want, only mine has 3 loops - yours only needs the first 2 loops
i tested that code with "mod 4" alignment cases
no matter how many bytes or how they are aligned, that code is ~ 3 times faster than STD, REP SCASB on a P4 prescott
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 12, 2009, 03:24:07 AM
Quote from: dedndave on December 11, 2009, 10:44:19 PM
EDIT - i notice you have Src 16-aligned
that isn't a real-world condition
There is absolutely no difference in timings between aligned and misaligned strings. Which is not surprising for modern CPUs in general, and a byte scanner in particular...
As to the shift alternative, I doubt that it can speed things up. Can you show me how you would modify the inner loop?

plSct0: mov esi, Src ; let's go
; ----------- innermost loop ----------------------------------------
plSct: movzx eax, byte ptr [esi] ; load the char from the source string
inc esi
jmp dword ptr [edi+4*eax] ; jump to the address obtained from the table
; ----------- end of innermost loop ------------------------------
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 12, 2009, 03:33:45 AM
JJ,

On a PIV,


;  change this

mov al, XlatTbl[eax]   ; get the "class"
jmp VectTbl[eax]

;    to this

movzx eax, BYTE PTR XlatTbl[eax]
; stick something in here if the algo allows it.
jmp VectTbl[eax]


This should prevent a read after write stall.

Also on a PIV ala Intel, use ADD REG/VAR, 1, not INC REG/VAR.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 12, 2009, 03:54:17 AM
Hutch,
movzx eax instead of mov al rocks indeed, in Dave's algo:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1346    cycles for parseTableS, 134 bytes       #h=50
2470    cycles for parseDave, 135 bytes         #h=50 movzx
3156    cycles for parseDave, 134 bytes         #h=50 mov al

But it is still almost a factor 2 slower than my parseTableS.
Can you give me timings for the P4?

I had tested inc esi vs. add esi, 1 in my own algo on my PIV, no significant difference at >12000 cycles.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 12, 2009, 05:13:13 AM
prescott...

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
9735    cycles for parseTableS, 134 bytes       #h=50, #dots=50
10125   cycles for parseTable, 171 bytes        #h=50, #dots=50
3049    cycles for parseFewJumps, 99 bytes      #h=50, #dots=50
3806    cycles for parseManyJumps, 99 bytes     #h=50, #dots=50
12009   cycles for parseDave, 135 bytes         #h=50, #dots=400

2003    cycles for parseTableS, 134 bytes       #h=10, #dots=10
2073    cycles for parseTable, 171 bytes        #h=10, #dots=10
724     cycles for parseFewJumps, 99 bytes      #h=10, #dots=10
887     cycles for parseManyJumps, 99 bytes     #h=10, #dots=10
2543    cycles for parseDave, 135 bytes         #h=10, #dots=80

254     cycles for parseTableS, 134 bytes       #h=1, #dots=1
264     cycles for parseTable, 171 bytes        #h=1, #dots=1
98      cycles for parseFewJumps, 99 bytes      #h=1, #dots=1
73      cycles for parseManyJumps, 99 bytes     #h=1, #dots=1
300     cycles for parseDave, 135 bytes         #h=1, #dots=8

if i get some time tomorrow, i may play with an aligned loop JJ   :8)
Title: Re: Few jumps vs many jumps in a parser loop
Post by: sinsi on December 12, 2009, 07:22:52 AM
Are you wanting to scan a string and get true/false? Use a byte scanner and return when one is found.
Are you counting certain characters? Use a table (inc chars[eax]) and bsf.
Have you tried call/ret instead of jmp/jmp?
Or are you processing the source to a destination on the fly?

Playing around with aligning threw other procs out of whack, but I did get the 'S' one 100 cycles faster.
Maybe you could have one exe per proc and discard the slow ones?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 12, 2009, 08:10:57 AM
Quote from: sinsi on December 12, 2009, 07:22:52 AM
Are you wanting to scan a string and get true/false?
Sinsi,
The scanner is supposed to act, i.e. jump, for specific characters.
For example, look at your post: You have used lots of lowercase characters - ordinary text, no need to act, so get the next char. But do something specific if you find ?/[].' etc, or if you encounter an uppercase character.
My specific application is to get the value of e.g.
1010101b
( 101010 y)
12345
12345h
$12345
0x12345
123.456
(123.456)
-123e99
123eh
I guess you are able to recognise, using your knowledge and logic and a set of rules, that these are all valid number formats; some of them are being used in Windows.inc, and ml.exe has no problem distinguishing them. That is what the scanner should do: Get the info for applying the rules, then, later, branch to the decimal or hex or bin$ algo.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: hutch-- on December 12, 2009, 10:23:04 AM
For most of you number formats, just having a leading number makes the parser a lot easier, branch on anything between ascii "0" to "9" and the minus sign "-" and keep scanning until you find the end of the number, usually by a punctuation character. Depending on what is at the front or end you can determine the numeric format and convert it accordingly.  The ones in brackets "(" and ")" are usually parsed in the preprocessor.

The leading "$" notation is very easy to parse because you only have to test the 1st character to determine the data type but if you are using a general purpose word parser its just as easy to rad the last character as MASM does.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 12, 2009, 11:41:56 AM
Quote$12345

ok - i know the assembler doesn't recognize currency - lol
so - how is this evaluated ?

on the other hand - a nice feature
we could call this routine until the bank account is full   :bg

Net_Worth_Loop:
        call    GetSome
        add     MyBankAccount,eax
        jmp     Net_Worth_Loop

GetSome:
        mov     eax,$999999999
        ret

EDIT - added a little "bonus" code   :P
guess i better optimize that a little bit...
(get rich quick scheme)

        mov     eax,$999999999

Net_Worth_Loop:
        add     MyBankAccount,eax
        jmp     Net_Worth_Loop
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 12, 2009, 12:01:50 PM
another question for you, Jochen...
is this parser required to evaluate reals, as well ?
if not - there is a practical limit to the length of strings
the worst case would be binary
does it need to parse 64-bit binary values ? - or is 32-bit sufficient ?
i am just curious as to what the max string length is

and one more question - as far as i know, spaces are not allowed in any of the formats - so they could be interpreted as EOL ?

        mov     eax,10h         ;valid comment
        mov     eax,10h          invalid comment

or do you need to evaluate the rest of the line to test validity ?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 12, 2009, 03:15:02 PM
i see a little bug in some of the logic, Jochen
not timing related, but you might want to mod the code a little

this code has the potential for aliasing errors

            test al, al
            je @F
            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "."
            je HasDot
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY
            cmp al, "("
            je HasB1
            cmp al, "("
            je HasB2

changing the test order so non-alpha chars are tested prior to the "or al,20h" will eliminate the problem

            test al, al
            je @F
            cmp al, "."
            je HasDot
            cmp al, "("
            je HasB1
            cmp al, ")"          ;i think that's what you meant
            je HasB2

            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY

on a related note - and this may speed things up a little
because most of the characters in the strings are ascii numerics, test for that first
the null char only occurs once - even though "or al,al" may be faster than other compares, the occurance frequency is last on the list
don't test for it until you have to

test for the most likely-to-occur first
that way, fewer compares are made
i would start off with:

        xor     al,30h
        cmp     al,9
        jbe     Skip_It

no need to xor it back out - simply re-map the rest of the chars

also, a-f are valid hexadecimal "numbers"
you need a little bit of a state-machine to control some of the branching

also - i am still trying to find something on the "$" - anyone ?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 12, 2009, 05:53:49 PM
Thanks, Dave. The logic still needs to be elaborated. Right now, I just wanted to test if a pre-scan, either sequentially or per character table, is worth the effort, speed and sizewize. The old version of Val() has almost a kByte, and that's no good...
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 12, 2009, 07:03:51 PM
i am trying to understand the byte-count portion of the results macro....

   ifidn <algo>, <parseTable>
      add eax, ptEnd-pt+1
   elseifidn <algo>, <parseTableS>
      add eax, ptEndS-ptS+1
   elseifidn <algo>, <parseDave>
      add eax, ptEndS-ptS+1         ; we treat it as if the standard jumps had been used
      ;add eax, Build_T-Build+1
   endif


huh ?    :dazzled:
you lost me, my friend - lol
i will just leave that part out
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 12, 2009, 08:22:37 PM
> we treat it as if the standard jumps had been used
Dave,
The macro adds the hand-counted bytes in the .data section. For better comparison, I assumed that the same number of "specific" jumps was implemented in your algo.
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 12, 2009, 10:42:28 PM
well, Jochen, here is my first attempt
very impressive   :P

prescott:

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
14608   cycles for PAlgn4, 150 bytes    #h=50,#dots=50

2827    cycles for PAlgn4, 150 bytes    #h=10,#dots=10

131     cycles for PAlgn4, 150 bytes    #h=1,#dots=1
Title: Re: Few jumps vs many jumps in a parser loop
Post by: jj2007 on December 12, 2009, 11:06:02 PM
Quote from: dedndave on December 12, 2009, 10:42:28 PM
well, Jochen, here is my first attempt

Thanks, Dave. Interesting approach, I hope you can tune it a little bit...
K2Z, J

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
7189    cycles for PAlgn4, 150 bytes    #h=50,#dots=50
1394    cycles for PAlgn4, 150 bytes    #h=10,#dots=10
84      cycles for PAlgn4, 150 bytes    #h=1,#dots=1
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 12, 2009, 11:41:43 PM
lol - i don't think "tuning it" is going to be adequate
it needs a major overhaul   :bg
but, i may see something
i made a couple small changes in ecx and got a few strokes out of it
but, i have to keep an open mind to a different concept
Title: Re: Few jumps vs many jumps in a parser loop
Post by: rags on December 13, 2009, 02:00:45 AM
Quote from: dedndave on December 12, 2009, 03:15:02 PM
this code has the potential for aliasing errors
Dave, what is an  aliasing error?
Title: Re: Few jumps vs many jumps in a parser loop
Post by: dedndave on December 13, 2009, 02:08:39 AM
well - that is, perhaps, a mis-use of the terminology - lol
but, in this case, Jochen uses the OR AL,20h instruction to force upper case alpha characters to lower case
that way, he does not have to test for "A" and "a" - they both become "a"
but, half of the rest of the characters are affected, as well
for example, carriage return (0Dh) becomes hypen or minus sign (2Dh)
the OR instruction destoyed the unique identification of the 0Dh
you can no longer tell which was loaded
when you parse characters using OR 20h (or AND 0DFh to force upper case on lower case alpha characters),
it is best to test all the non-alpha characters first

notice that if you use XOR AL,30h (or SUB AL,30h) to test for numberic characters,
there are still 256 different values - they are just different - they do not lose their unique identity

the term "aliasing error" is normally used in information theory or cryptography when there is a loss of information