News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Few jumps vs many jumps in a parser loop

Started by jj2007, December 08, 2009, 04:24:05 PM

Previous topic - Next topic

jj2007

I was testing the hypothesis that jumping is slower than not jumping - see snippet below. Code size is identical, timings are surprisingly similar, but what struck my eye was that the longer string took almost 4 times as long as expected. Any explanation...?
Celeron M:
587     cycles  few jumps, 66 bytes, len=40
616     cycles many jumps, 66 bytes, len=40
45      cycles  few jumps, 66 bytes, len=10
43      cycles many jumps, 66 bytes, len=10
Quotealign 16
Src1   db "123456789.1234789e-123 and a few more..", 0
align 16
Src2   db "1234.567h", 0

.code
start:
   counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      S1:
      mov esi, offset Src1
      jmp @F
      HasE:
      jmp @F
      HasDot:
      jmp @F
      HasH:
      jmp @F
      HasB:
      jmp @F
      HasY:
      jmp @F
      HasB1:
      jmp @F
      HasB2:
      @@:
         mov al, [esi]
         inc esi
         .if al<"0" || al>"9"   ; few jumps
            test al, al
            je @F
            or al, 20h
            cmp al, "e"
            je HasE
            cmp al, "."
            je HasDot
            cmp al, "h"
            je HasH
            cmp al, "b"
            je HasB
            cmp al, "y"
            je HasY
            cmp al, "("
            je HasB1
            cmp al, "("
            je HasB2
         .endif
      jmp @B
      @@:
      E1:
      sub esi, offset Src1

   counter_end
   print str$(eax), 9, "cycles  few jumps, "
   mov eax, E1
   sub eax, S1
   print str$(eax), " bytes, len="
   print str$(esi), 13, 10

   counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      S2:
      mov esi, offset Src1
      @@:
         mov al, [esi]
         inc esi
         .if al<"0" || al>"9"   ; many jumps
            test al, al
            je @F
            or al, 20h
            .if al=="e"
               jmp @B
            .endif
            .if al=="."
               jmp @B
            .endif
            .if al=="h"
               jmp @B
            .endif
            .if al=="b"
               jmp @B
            .endif
            .if al=="y"
               jmp @B
            .endif
            .if al=="("
               jmp @B
            .endif
            .if al=="("
               jmp @B
            .endif
         .endif
      jmp @B
      @@:
      E2:
      sub esi, offset Src1

   counter_end
   print str$(eax), 9, "cycles many jumps, "
   mov eax, E2
   sub eax, S2
   print str$(eax), " bytes, len="
   print str$(esi), 13, 10
[/b]

dedndave

could it be because it is trying to parse through...
Quote" and a few more.."
DOH !
lol

as for jumps - i find that short jumps are pretty quick
it is still good to reduce the number of jumps executed in a loop, if practical

jj2007

Quote from: dedndave on December 08, 2009, 04:56:38 PM
could it be because it is trying to parse through...
Quote" and a few more.."
DOH !
lol
Yep, you are right. These are non-numbers that slow the algo down... how stupid of me.
504     cycles  few jumps, 66 bytes, len=100
502     cycles many jumps, 66 bytes, len=100
45      cycles  few jumps, 66 bytes, len=10
43      cycles many jumps, 66 bytes, len=10


>as for jumps - i find that short jumps are pretty quick
>it is still good to reduce the number of jumps executed in a loop, if practical

Probably yes, although this test reveals no difference between jumping and not jumping.

hutch--

JJ,

In isolation the difference is reasonably hard to measure but thge general drift is if an algorithm is anything like well advanced in speed terms extra jumps start to become one of the barriers to making it go faster. With very high speed code you will see the difference but with code full of stalls, dependencies and the like it just does not matter much.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Hutch,
Maybe I have not been clear. What I wanted to see was the difference between two parser loops, assuming that the text rarely has a dot:
A) 
cmp al, "."
je HasDot   ; ... do stuff there, otherwise fall through

B)
.if al=="."
  ... do stuff
.endif

Now A) is obviously rarely taken; in 99% of cases, it falls through.
In B), the .if translates to:
cmp al, "."
jne @F
... do stuff
@@:

So here it's the opposite: That jump is taken 99% of the time.
According to Opcodes.chm, a jump taken is 3 cycles, not taken is one cycle. An unknown author writes there:
Quote- It's a good programming practice to organize code so the
  expected case is executed without a jump since the actual
  jump takes longer to execute than falling through the test.
:wink

hutch--

JJ,

That is conventional wisdom and if it is the only factor its usually correct, a fall through is faster than a jump over but almost exclusively there are other factors that effect instruction scheduling. For what its worth a character table usually eats a sequential scanner aliver in terms of character evaluation.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

what isn't readily obvious or easily quantifiable is "how much work is the jump doing for you"
sometimes, a jump is a sign of clumsy code
other times, it performs a great service in terms of easing the workload for the rest of the routine
sometimes, you can replace a jump inside a loop with one outside the loop
well - you get the idea
if the jump is "doing a lot of work" it is worth the effort
testing conditionals in the right order can speed up the simplest code

sinsi

I was wondering if a branch hint makes any difference?
Light travels faster than sound, that's why some people seem bright until you hear them.

hutch--

JJ,

Make a 256 member table of label offsets which you can divide into as many characters classes as you like and for the price of one table access you can seperate punctuation from either lower or upper class words, characters you don't want < 32 except for 9 10 13, numbers, arithmetic operators etc ..... any division you like.

Unwanted characters go back to the loop start, punctuation to another label, arithmetic operators to another label etc .... This type of algo will rip the titz off a sequential evaluation block and this is one place where you need the pace if the text you are parsing is large.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: sinsi on December 09, 2009, 04:10:49 AM
I was wondering if a branch hint makes any difference?

Sinsi, DednDave, Hutch, thanks. The branch hint is not helpful, newer processors do it automatically, the character table is feasible but kind of an overkill if you have only a few branches. I might give it a try, though.
The point is rather that the code with branches runs at 5 cycles per character, independently of whether most branches are taken or not taken. The Opcodes.chm rule that a taken branch needs 3 cycles, a not taken one needs one cycle is thus obsolete.
It might be different for older processors, though - test yourself, code attached.

sinsi

There are 2 instructions to help with conditional jumps
2EH—Branch not taken (used only with Jcc instructions)
3EH—Branch taken (used only with Jcc instructions)

I think they are the instructions for segment overrides (CS: and DS:) so older processors ignore it.
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

So do newer processors, but I can't find the Intel or Agner Fog page that specifies why ::)
We had that instruction in this thread already.

sinsi

>We had that instruction in this thread already.
Can you recall the reply? 8 pages is too much...
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Quote from: sinsi on December 09, 2009, 08:14:14 AM
>We had that instruction in this thread already.
Can you recall the reply? 8 pages is too much...

There was no reply, but I found the reference: Agner Fog. Search the pdf for branch hint to see several quotes of the "Branch hint prefixes have no useful effect" type.

dedndave

i think the 3 cycles for "branch taken" is from the 486 days - which went by rather fast - lol
actually, i remember thinking how nice that processor was   :bg
well - compared to the 286, it was a super-star