News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

BM Test revisited

Started by hutch--, April 23, 2006, 01:12:04 PM

Previous topic - Next topic

hutch--

I just had a play with a timing test that Lingo had written that tested a number of BM algos, the Horspool version from the masm32 library, one by Cresta and an XMM version by Lingo. I removed the Horspool version and placed the 4 heuristic BM version from the masm32 library and got surprising results on my PIV. These are the results below.

If anyone has the time, would they test this test piece on different hardware, my box is a PIV Prescott 2.8 gig.


Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS(BM Hutch -> Begining of the Buffer): 1180 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 1339 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 740 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 23512 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 38243 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 21795 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 25720 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 45616 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 28803 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 32202 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 63415 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 34826 clocks; Return value: 4294967295

Press ENTER to exit...


Note that to build this example, you must use ML 6.15 or later.

[attachment deleted by admin]
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Mark Jones

Hutch, I get:

Quote
Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS(BM Hutch -> Begining of the Buffer): 842 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 971 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

Then it crashes with error code 0xC000001D at address 0x000000000040208A. AMD XP 2500+
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

arafel

It crashes for me too, but that's probably my fault since I tested it on PIII (933MHz) which doesn't support punpcklqdq instruction.

BMBinS(BM Hutch -> Begining of the Buffer): 932 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 1012 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

dsouza123

The results of three runs that each crash on BMLin

BMBinS(BM Hutch -> Begining of the Buffer): 995 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 951 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

BMBinS(BM Hutch -> Begining of the Buffer): 975 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 950 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

BMBinS(BM Hutch -> Begining of the Buffer): 813 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 950 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

Athlon Thunderbird 1.2 Ghz  Windows XP Pro

I guess the CPU / instruction sets needed to test this without crashing
are Pentium 4 and Athlon 64/Opteron maybe others.
Guessing SSE2 support is required.

I commented out all the invoke BMLin12 lines,
reassembled (as a console app), these are the results of two runs.
Hopefully there are no alignment/timing issues introduced
by commenting out the invokes.

BMBinS(BM Hutch -> Begining of the Buffer): 981 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 955 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Middle of the Buffer): 18663 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15636 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 2 clocks; Return value: 1
BMBinS(BM Hutch -> End of the Buffer): 18676 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 18081 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Not found): 26920 clocks; Return value: -1
BMFJT (BM Cresta-> Not found): 26468 clocks; Return value: -1
BMLin (BM Lingo -> Not found): 2 clocks; Return value: 1

BMBinS(BM Hutch -> Begining of the Buffer): 819 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 953 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Middle of the Buffer): 18528 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15954 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> End of the Buffer): 19134 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 18131 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Not found): 26919 clocks; Return value: -1
BMFJT (BM Cresta-> Not found): 26470 clocks; Return value: -1
BMLin (BM Lingo -> Not found): 2 clocks; Return value: 1

[attachment deleted by admin]

EduardoS

If ML6.15 is required to compile probably SSE2 is required to run...

But i'm a luck guy and Athlon 64 suport SSE2:

BMBinS(BM Hutch -> Begining of the Buffer): 693 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 930 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 442 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19722 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15563 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10776 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20012 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17967 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12438 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 27943 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26340 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18429 clocks; Return value: 4294967295

Phoenix

Athlon64 / WinXP Pro SP2:

BMBinS(BM Hutch -> Begining of the Buffer): 692 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 930 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 440 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19735 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15569 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10774 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20036 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17999 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12451 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 27959 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26349 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18425 clocks; Return value: 4294967295

mnemonic

AMD Turion64 / Win XP Home SP2 (32bit)


BMBinS(BM Hutch -> Begining of the Buffer): 691 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 930 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 440 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19740 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15555 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10767 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20014 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17962 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12433 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 27932 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26333 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18408 clocks; Return value: 4294967295
Be kind. Everyone you meet is fighting a hard battle.--Plato
-------
How To Ask Questions The Smart Way

hutch--

I guess I should have mentioned that the algo by lingo required SSE2. I was mainly interested in the difference on hardware late enough to run all of the instructions. I changed the Horspool BM version to the full version that uses both shifts as it was generally faster on a wider range of data and it ran a lot faster on Lingo's test data than the Horspool version did.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

I have had a play with the 4 heuristic BM and got it marginally faster on my PIV. These are the current timings with it. As before, the example contains SSE2 code so it will not run on earlier machines without it.


Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS(BM Hutch -> Begining of the Buffer): 950 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 1339 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 753 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 22544 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 37920 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 21632 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 24624 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 45327 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 28714 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 30926 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 63282 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 34683 clocks; Return value: 4294967295

Press ENTER to exit...

[attachment deleted by admin]
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Ghirai

The gode above gives these results, on an Athlon 64 3000+:

BMBinS(BM Hutch -> Begining of the Buffer): 647 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 940 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 444 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19965 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15666 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10836 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20652 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 18086 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12524 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 30261 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26523 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18537 clocks; Return value: 4294967295


Why are there so big differences in the last 2 lines of the result, between my system and yours?
Your PIV is 2.8 GHz, and my AMD is 2.0 GHz...
MASM32 Project/RadASM mirror - http://ghirai.com/hutch/mmi.html

EduardoS

Quote from: Ghirai on April 24, 2006, 08:53:46 AM
Why are there so big differences in the last 2 lines of the result, between my system and yours?
Your PIV is 2.8 GHz, and my AMD is 2.0 GHz...
1) The time here is counted in clocks, a P4 1.6GHz or 3.2GHz will have the same value;

2) Athlon execute more instructions per cicle than P4.

A) To have the time in nanoseconds divide the result by the clock of your processor.

y-code

Isn't there a much faster rewrite of that Cresta code in anothr thread here? ANd wasn't that Horspool code shown to be buggy anyway?

EduardoS

I forgot, the second one:
Quote
BMBinS(BM Hutch -> Begining of the Buffer): 642 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 933 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 440 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19803 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15561 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10773 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20516 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17974 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12443 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 30072 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26353 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18422 clocks; Return value: 4294967295

hutch--

Thanks guys, perhaps I should have rewritten the example so it does the timing in real time but it does give a reasonable idea of the difference on different hardware. I wondered if anyone had a later Intel box with an EM64T or similar.

y-code,

I would be more than interested in a testing method that verifies BM algorithms but I have yet to see one. Years ago when I was working on these three on a PIII I wrote testing software that ran many millions of variable length patterns and I could not get them to fail but this is not proof that they are without failure. The reason for replacing the Horspool with a complete BM algo was because the Horspool only uses one of the shift types and is slower in many circumstances than the original design by Bob Boyer.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Ian_B

Some more optimisation, mostly removing redundant code. I believe these times (P4 Northwood) speak for themselves. On P4 at least this stripped-down version of hutch's routine performs 5-20% faster and outclasses the MMX code that's far less flexible anyway (no provision for starting offsets) and doesn't even work on all processors. Should be a good compromise if you don't want to write specific targetted code.

BMBinS (BM Hutch -> Beginning of the Buffer): 890 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 847 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1238 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 669 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 21407 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 16690 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 36571 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 20976 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 22873 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 21122 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 43807 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 27268 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 29011 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 26064 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 62204 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 33544 clocks; Return value: -1


Wasn't this account supposed to have been deleted, BTW...?  :(

Ian_B


[attachment deleted by admin]