News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Few jumps vs many jumps in a parser loop

Started by jj2007, December 08, 2009, 04:24:05 PM

Previous topic - Next topic

hutch--

 :bg

I did too but the 33 meg i486 chip cost me $1500 AU, the board cost me about the same and the 300 meg ESDI full height cost me $2200 AU and then I had to buy a controller card for it. I still own the lot but the controller card dies shortly after I retired the box and the next time I tried to get the disk contents it would not work.

The full height NEC disk would make a good doorstopper though.  :bdg
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

lol
i have an old (XT old) seagate 40 mb full height drive
it is a good drive, really - lol
but the heads have a self-parking solenoid (one of the first)
when that thing kicks in or out, you want to open it to see if your clothes are dry - lol

jj2007

Quote from: hutch-- on December 09, 2009, 06:26:29 AM
JJ,

Make a 256 member table of label offsets ...This type of algo will rip the titz off a sequential evaluation block and this is one place where you need the pace if the text you are parsing is large.

Hutch,
Yes it rips the titz off, twice as fast as the sequential ones, but only on my Celeron M... on a P4, it stinks. Unless you have a better design...? Code is attached.

3804    cycles  few jumps, 66 bytes, len=500
10640   cycles char table, 135 bytes, len=500
3233    cycles many jumps, 66 bytes, len=500
2921    cycles many + xchg al, ah, 78 bytes, len=500

Quote      .data?

PARSETABLE STRUCT
   TheChar   BYTE ?
   TheLabel   DWORD ?
PARSETABLE ENDS
...
   ChTable   dd 256 dup(?)   ; CHARTABLE parser
   .data
   pt PARSETABLE <0, plCTE1>,   ; if nullbyte found, jump out of the loop
   <"e", ChkE>, <"E", ChkE>,   ; exponent, lowercase or uppercase
   <".", ChkDot>,
   <"h", ChkH>, <"H", ChkH>,   ; hex
   <"b", ChkB>, <"B", ChkB>,   ; binary
   <"y", ChkY>, <"Y", ChkY>,   ; binary
   <"(", ChkB1>, <")", ChkB2>   ; brackets
   ptEnd db -128      ; flag end of table
      .code
      push edi
      push esi
      mov esi, offset Src1
      mov edi, offset ChTable
      cmp dword ptr [edi], 0
      jne plCT0   ; chartable already created, so dive directly into the loop
      mov ecx, 256   ; this part will be run only once, so it can be slow
      push edi
      mov eax, plCT   ; set defaults
      rep stosd
      pop edi
      push esi
      mov esi, offset pt
      .Repeat
         lodsb
         .Break .if al==-128
         movzx edx, al
         lodsd
         mov [edi+4*edx], eax
      .Until 0
      pop esi
      jmp plCT0

      ChkE:
      jmp plCT
      ChkDot:
      jmp plCT
      ChkH:
      jmp plCT
      ChkB:
      jmp plCT
      ChkY:
      jmp plCT
      ChkB1:
      jmp plCT
      ChkB2:
   plCT0:
      xor eax, eax
   plCT:
      movzx eax, byte ptr [esi]
      inc esi
      jmp dword ptr [edi+4*eax]   ; jump to the address obtained from the table

      plCTE1:

hutch--

JJ,

A table like this. 0 is one class, the numbers are class 1, upper case is class 2 and lower case is class 3.


  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0
  db 0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
  db 2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0
  db 0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
  db 3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Then make another data table which has the label offsets as members.


  lbltbl dd lbl0,lbl1,lbl2,lbl2


In the code,


  lbl0:
    ; code
  lbl1:
    ; code
  lbl2:
    ; code
  lbl3:
    ; code


2 table accesses to get to the right label after evaluating each character.


Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Hutch,
That's exactly what my code does. It's very fast on a Celeron but not on a P4, unfortunately. Here is the innermost loop.

movzx eax, byte ptr [esi]
inc esi
jmp dword ptr [edi+4*eax] ; jump to the address obtained from the table


Can I have some timings for other processors, please?
Thanx, jj

dedndave

why not "pre-multiply" the byte values in the table by 4 ? (unless, of course, they are over 63 - lol)
alternatively...

        movzx   eax,byte ptr [esi]
        shl     eax,2
        inc     esi
        jmp dword ptr [edi+eax]

the [edi+4*eax] kills you on the P4

jj2007

Quote from: dedndave on December 10, 2009, 11:06:02 AM
why not "pre-multiply" the byte values in the table by 4 ? (unless, of course, they are over 63 - lol)
alternatively...

Can you post a working example? The shl eax, 2 solution is slower, unfortunately...

dedndave


  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 4,4,4,4,4,4,4,4,4,4,0,0,0,0,0,0
  db 0,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8
  db 8,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0
  db 0,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12
  db 12,12,12,12,12,12,12,12,12,12,12,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
  db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.
.
.
        movzx   eax,byte ptr [esi]
        inc     esi
        jmp dword ptr [edi+eax]

dedndave

another alternative is....

        jmp     lbltbl[eax]

no need to load the vector table address into edi

z941998

I was looking at the source code and noticed that there are two back-to-back cmps of the al register with ")" as the value everywhere, but the switch routine has a "(" and a ")".

I believe the all the other rountines should match the switch routine logic, because it doesn't make sense to have back-to-back compares with the same values, waste of space and never gets executed?

hutch--

JJ,

I just downloaded your test piece but I don't know if I will live long enough to decypher it. A passable description of wqhat you are parsing would make it a lot easier to work on.

RE: Complex addressing mode on a PIV, I have never had a problem using it and while the suggestion Dave made can save you a base address register [table+index_reg*4] it tends to be slightly slower than using a register as the based address. A shift is far slower.

Here are the timings on my quad.

2503    cycles  few jumps, 66 bytes, len=500
1637    cycles char table, 135 bytes, len=500
2421    cycles many jumps, 66 bytes, len=500
1796    cycles many + xchg al, ah, 78 bytes, len=500
2355    cycles elseif jumps, 78 bytes, len=500
2890    cycles Switch, 79 bytes, len=500
80      cycles  few jumps, 66 bytes, len=10
42      cycles many jumps, 66 bytes, len=10

--- ok ---
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: dedndave on December 10, 2009, 11:37:47 AM

        movzx   eax,byte ptr [esi]
        inc     esi
        jmp dword ptr [edi+eax]

My belly says this will crash.

Quote from: dedndave on December 10, 2009, 11:45:56 AM
another alternative is....

        jmp     lbltbl[eax]

no need to load the vector table address into edi
Yes that works but it yields exactly the same cycles on Celeron (and is 4 bytes longer).

jj2007

Quote from: hutch-- on December 10, 2009, 12:20:43 PM
JJ,

I just downloaded your test piece but I don't know if I will live long enough to decypher it. A passable description of wqhat you are parsing would make it a lot easier to work on.

Will work on it. Sorry that I challenged you - the string to be parsed is in the .data section, and tests possible combinations of text that could be interpreted as decimal, hex or binary values.

Quote
Here are the timings on my quad.
What happened to your trusty P4...?

hutch--

Your wish is my command.

Timing on my 3.8 gig PIV.


3323    cycles  few jumps, 66 bytes, len=500
10468   cycles char table, 135 bytes, len=500
3050    cycles many jumps, 66 bytes, len=500
2830    cycles many + xchg al, ah, 78 bytes, len=500
3156    cycles elseif jumps, 78 bytes, len=500
2801    cycles Switch, 79 bytes, len=500
159     cycles  few jumps, 66 bytes, len=10
102     cycles many jumps, 66 bytes, len=10

--- ok ---
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

JJ,

Depending on your chosen notation, there are a number of ways to test what a number text type is, C uses 0xBADC0DE while basic uses &h or &B while MASM uses a leading number and a trailing "h" for hex and only a trailing "b" for binary and has to track the size by the string length. Is there a particular format you have in mind ?
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php