News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Few jumps vs many jumps in a parser loop

Started by jj2007, December 08, 2009, 04:24:05 PM

Previous topic - Next topic

dedndave

QuoteMy belly says this will crash.

Jochen - look at the table values
i think your belly is tender - lol - don't make me come over there and rub it
i ain't rubbin' nuthin

jj2007

Quote from: hutch-- on December 10, 2009, 02:35:44 PMIs there a particular format you have in mind ?
Njet. GfaBasic eats almost everything (and correctly), and so does MasmBasic. The current Val() is just a bit slow and a bit oversized, that's why I am testing the parser option.

@Dave:
        movzx   eax,byte ptr [esi]
        inc     esi
        jmp dword ptr [edi+eax]

Addresses are 4 bytes long. If eax contains e.g. "1" aka 49, you are jumping to edi+49. What was your favourite exception? c0000005?
:bg

dedndave

Jochen - look at this post, then the one of yours right after it.....

http://www.masm32.com/board/index.php?topic=12871.msg99585#msg99585

Hutch showed you an array of byte values in steps of 1
you seemed perfectly happy with that - lol - in fact, you said that is what your code does

then - look at mine again - they are byte values in steps of 4
that simply saves you from having to use [edi+4*eax]

you should try my stuff out before you dismiss it   :bg

jj2007

Quote from: z941998 on December 10, 2009, 12:18:59 PM
I was looking at the source code and noticed that there are two back-to-back cmps of the al register with ")" as the value everywhere, but the switch routine has a "(" and a ")".
Yes, but it's for testing the speed only. The final version would have different tests and actions. Thanks anyway for pointing this out.

jj2007

Quote from: dedndave on December 10, 2009, 06:06:19 PM
then - look at mine again - they are byte values in steps of 4
that simply saves you from having to use [edi+4*eax]

you should try my stuff out before you dismiss it   :bg

Apologies, my friend. There seems to be a misunderstanding, though:
.data
align 16
Src1 db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.5678h"
db "1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h1234.567h", 0

That is our source string (not a very valid format, but for timings it's ok).
Now the innermost loop:

mov esi, offset Src1
plCT:
movzx eax, byte ptr [esi]
inc esi
jmp dword ptr [ChTable+4*eax] ; jump to the address obtained from the table

movzx eax, byte ptr [esi] loads a value between 0 and 255. That might be "y", ASCII 121, the terminator of a Bin$.
Now to make it clearer what happens here, I'll use a modified version of the innermost loop:
movzx eax, byte ptr [esi]
inc esi
mov edx, [ChTable+4*eax]
jmp edx

If eax was "y", then edx contains the address found in [ChTable+4*121], set before with the <"y", ChkY> structure.
What my table generator does is to preload the whole table with plCT, via rep stosd, then set a handful of jmp addresses individually. Both you and Hutch have used another table construct, not with addresses but with byte size (1,2,3) values. I have not yet grasped how the innermost loop would look like, but hope you can clarify...

jj2007

New code attached, somewhat clearer than before.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1682    cycles for parseTable, 97 bytes, #h=55
2397    cycles for parseManyAhAl, 98 bytes, #h=55
2638    cycles for parseFewJumps, 86 bytes, #h=55
2494    cycles for parseManyJumps, 86 bytes, #h=55
3031    cycles for parseElseif, 98 bytes, #h=55
3577    cycles for parseSwitch, 99 bytes, #h=55

dedndave

prescott...

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
10943   cycles for parseTable, 97 bytes, #h=55
3124    cycles for parseManyAhAl, 98 bytes, #h=55
3106    cycles for parseFewJumps, 86 bytes, #h=55
3405    cycles for parseManyJumps, 86 bytes, #h=55
3065    cycles for parseElseif, 98 bytes, #h=55
3162    cycles for parseSwitch, 99 bytes, #h=55

2231    cycles for parseTable, 97 bytes, #h=11
586     cycles for parseManyAhAl, 98 bytes, #h=11
796     cycles for parseFewJumps, 86 bytes, #h=11
618     cycles for parseManyJumps, 86 bytes, #h=11
670     cycles for parseElseif, 98 bytes, #h=11
692     cycles for parseSwitch, 99 bytes, #h=11

261     cycles for parseTable, 97 bytes, #h=1
71      cycles for parseManyAhAl, 98 bytes, #h=1
69      cycles for parseFewJumps, 86 bytes, #h=1
106     cycles for parseManyJumps, 86 bytes, #h=1
73      cycles for parseElseif, 98 bytes, #h=1
227     cycles for parseSwitch, 99 bytes, #h=1

hutch--

JJ,

I asked you the question because the original topic was to do with parsing, not just number conversion and what I was after was the data IO format(s) that you were working on. Currently reading your source is not a lot of use because I don't know what it is supposed to be doing.

Your code has a bad bottleneck on Prescott core PIVs that shows up on both Dave's machine and the 3.8 gig PIV I have here and there are a massive number of PIVs that will still run for years so its worth tracking down why you have this problem.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Hutch,

The case here is pretty generic: Scan a string, and in rare cases take some action. I tried two variants, one that takes jumps if no action is needed, the other that does not take jumps if no action is needed. Then an experienced programmer suggested a character table:

Quote from: hutch-- on December 09, 2009, 01:01:02 AMFor what its worth a character table usually eats a sequential scanner aliver in terms of character evaluation.

Brilliant idea, so I implemented it and tested it, and voilà! on the Celeron M (a Core CPU) it eats the sequential scanner alive. Thanks a lot for pointing me to this solution - and condolencies to P4 owners :bg

Quote from: hutch-- on December 10, 2009, 11:49:11 PMthere are a massive number of PIVs that will still run for years so its worth tracking down why you have this problem.

I agree. It is indeed the innermost loop, and that one is very simple. I tested various options, including alignment, but no success.

Quote      movzx eax, byte ptr [esi]
      inc esi
      ; now jump to the address obtained from the table
      if 0
         jmp dword ptr [ChTable+4*eax]      ; slowest
      elseif 1
         mov eax, [edi+4*eax]               ; fastest
         jmp eax
      else
         jmp dword ptr [edi+4*eax]         ; fast
      endif

(it may not look like a loop, but Olly says it is a loop :toothy)

jj2007

I added a "short" variant using a 2-byte per jump table. It's even faster than the 5-byte version. Note that previous timings showed a wrong code size - the space used in the .data section was not reported.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1635    cycles for parseTable, 158 bytes, #h=55
1527    cycles for parseTableS, 128 bytes, #h=55   <--- NEW
2335    cycles for parseManyAhAl, 98 bytes, #h=55
2627    cycles for parseFewJumps, 86 bytes, #h=55
3111    cycles for parseManyJumps, 86 bytes, #h=55
2488    cycles for parseElseif, 98 bytes, #h=55
3388    cycles for parseSwitch, 99 bytes, #h=55

dedndave

prescott...

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
11181   cycles for parseTable, 158 bytes, #h=55
10707   cycles for parseTableS, 128 bytes, #h=55
3002    cycles for parseManyAhAl, 98 bytes, #h=55
3228    cycles for parseFewJumps, 86 bytes, #h=55
3326    cycles for parseManyJumps, 86 bytes, #h=55
3341    cycles for parseElseif, 98 bytes, #h=55
3164    cycles for parseSwitch, 99 bytes, #h=55

sinsi


Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
1718    cycles for parseTable, 158 bytes, #h=55
1723    cycles for parseTableS, 128 bytes, #h=55
1806    cycles for parseManyAhAl, 98 bytes, #h=55
2416    cycles for parseFewJumps, 86 bytes, #h=55
2466    cycles for parseManyJumps, 86 bytes, #h=55
2456    cycles for parseElseif, 98 bytes, #h=55
2875    cycles for parseSwitch, 99 bytes, #h=55

Light travels faster than sound, that's why some people seem bright until you hear them.

FORTRANS

Hi,

PIII

G:\WORK\TEMP>parserlo
☺☺☻♥ (SSE1)
5885    cycles for parseTable, 158 bytes, #h=55
5933    cycles for parseTableS, 128 bytes, #h=55
3778    cycles for parseManyAhAl, 98 bytes, #h=55
3740    cycles for parseFewJumps, 86 bytes, #h=55
4227    cycles for parseManyJumps, 86 bytes, #h=55
3841    cycles for parseElseif, 98 bytes, #h=55
5515    cycles for parseSwitch, 99 bytes, #h=55

1211    cycles for parseTable, 158 bytes, #h=11
1210    cycles for parseTableS, 128 bytes, #h=11
721     cycles for parseManyAhAl, 98 bytes, #h=11
778     cycles for parseFewJumps, 86 bytes, #h=11
877     cycles for parseManyJumps, 86 bytes, #h=11
801     cycles for parseElseif, 98 bytes, #h=11
1134    cycles for parseSwitch, 99 bytes, #h=11

140     cycles for parseTable, 158 bytes, #h=1
140     cycles for parseTableS, 128 bytes, #h=1
57      cycles for parseManyAhAl, 98 bytes, #h=1
77      cycles for parseFewJumps, 86 bytes, #h=1
84      cycles for parseManyJumps, 86 bytes, #h=1
70      cycles for parseElseif, 98 bytes, #h=1
119     cycles for parseSwitch, 99 bytes, #h=1

--- ok ---


Regards,

Steve

hutch--

I give up, the problem is simply not described properly but the obvious is, a PIV has no problems with complex addressing mode so there is something wrong with the code.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on December 11, 2009, 01:14:20 PM
I give up, the problem is simply not described properly but the obvious is, a PIV has no problems with complex addressing mode so there is something wrong with the code.

It is simple enough, and the innermost loop is a three-liner. Besides, it works perfectly, and is a lot faster on more recent CPUs.

QuoteparseTableS proc uses esi edi Src:DWORD
      .data?
   ChTableS   dd 512/4 dup(?)            ; character table for parser
      .data
   ptS   db 0, plOut-plBASE            ; if nullbyte found, jump out of the loop
      db "e", plChkE-plBASE, "E", plChkE-plBASE   ; exponent, lowercase or uppercase
      db ".", plChkDot-plBASE
      db "h", plChkH-plBASE, "H", plChkH-plBASE   ; hex
      db "b", plChkB-plBASE, "B", plChkB-plBASE   ; binary
      db "y", plChkY-plBASE, "Y", plChkY-plBASE   ; binary
      db "(", plChkB1-plBASE, ")", plChkB2-plBASE   ; brackets
   ptEndS   db -128               ; flag end of table
      .code
      and NumH, 0               ; first instructions for all algos:
      and NumDots, 0            ; reset the test counters
      mov edi, offset ChTableS
      cmp dword ptr [edi], 0
      jne
plSct0
      mov ecx, 256               ; this part creates the chartable and will be run only once, so it can be slow
      push edi
      mov eax,
plSct            ; set defaults to entry of innermost loop
      rep stosd
      pop edi                     ; edi=offset ChTableS
      mov esi, offset ptS         ; parse table specific chars
      .While 1
         lodsb
         .Break .if al==-128      ; flag we are done
         movzx edx, al            ; the character requiring specific action, e.g. "h"
         lodsb
         movzx eax, al            ; the byte offset against plBASE
         add eax, plBASE         ; translate into a dword address
         mov [edi+4*edx], eax   ; store into the table in the .data? section
      .Endw
      jmp
plSct0                  ; skip action zone and point esi to Src

plBASE:   ; ----------------------- here is the action zone for some specific chars -----------------------
      plChkE:            ; e: maybe an exponent, maybe a hex char, who knows?
      jmp plSct
      plChkDot:
      inc NumDots      ; a dot: it's a float
      jmp plSct
      plChkH:
      inc NumH         ; h: end of a Hex$
      jmp plSct
      plChkB:            ; b: maybe the end of a Bin$, maybe a hex char, who knows?
      jmp plSct
      plChkY:            ; y: surely the end of a Bin$
      jmp plSct
      plChkB1:         ; (: we have brackets, e.g. (1234h)
      jmp plSct
      plChkB2:         ; ): end of a bracketed string
      nop

plSct0:   mov esi, Src      ; let's go
[/b]      ; ----------- innermost loop ----------------------------------------
plSct:   movzx eax, byte ptr [esi]         ; load the char from the source string
      inc esi
      jmp dword ptr [edi+4*eax]      ; jump to the address obtained from the table
      ; ----------- end of innermost loop ------------------------------
   plOut:
   ret
parseTableS endp