The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: hutch-- on April 23, 2006, 01:12:04 PM

Title: BM Test revisited
Post by: hutch-- on April 23, 2006, 01:12:04 PM
I just had a play with a timing test that Lingo had written that tested a number of BM algos: the Horspool version from the masm32 library, one by Cresta and an XMM version by Lingo. I removed the Horspool version, put the 4 heuristic BM version from the masm32 library in its place and got surprising results on my PIV. These are the results below.

If anyone has the time, would they run this test piece on different hardware? My box is a PIV Prescott 2.8 gig.


Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS(BM Hutch -> Begining of the Buffer): 1180 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 1339 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 740 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 23512 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 38243 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 21795 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 25720 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 45616 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 28803 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 32202 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 63415 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 34826 clocks; Return value: 4294967295

Press ENTER to exit...


Note that to build this example, you must use ML 6.15 or later.

[attachment deleted by admin]
Title: Re: BM Test revisited
Post by: Mark Jones on April 23, 2006, 03:01:52 PM
Hutch, I get:

Quote
Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS(BM Hutch -> Begining of the Buffer): 842 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 971 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

Then it crashes with error code 0xC000001D at address 0x000000000040208A. AMD XP 2500+
Title: Re: BM Test revisited
Post by: arafel on April 23, 2006, 03:37:23 PM
It crashes for me too, but that's probably my fault since I tested it on a PIII (933MHz), which doesn't support the punpcklqdq instruction.

BMBinS(BM Hutch -> Begining of the Buffer): 932 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 1012 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):
Title: Re: BM Test revisited
Post by: dsouza123 on April 23, 2006, 06:35:40 PM
The results of three runs that each crash on BMLin

BMBinS(BM Hutch -> Begining of the Buffer): 995 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 951 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

BMBinS(BM Hutch -> Begining of the Buffer): 975 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 950 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

BMBinS(BM Hutch -> Begining of the Buffer): 813 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 950 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer):

Athlon Thunderbird 1.2 GHz  Windows XP Pro

I guess the CPUs / instruction sets needed to test this without crashing
are Pentium 4 and Athlon 64/Opteron, maybe others.
Guessing SSE2 support is required.

I commented out all the invoke BMLin12 lines and
reassembled (as a console app). These are the results of two runs.
Hopefully there are no alignment/timing issues introduced
by commenting out the invokes.

BMBinS(BM Hutch -> Begining of the Buffer): 981 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 955 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Middle of the Buffer): 18663 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15636 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 2 clocks; Return value: 1
BMBinS(BM Hutch -> End of the Buffer): 18676 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 18081 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Not found): 26920 clocks; Return value: -1
BMFJT (BM Cresta-> Not found): 26468 clocks; Return value: -1
BMLin (BM Lingo -> Not found): 2 clocks; Return value: 1

BMBinS(BM Hutch -> Begining of the Buffer): 819 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 953 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Middle of the Buffer): 18528 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15954 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> End of the Buffer): 19134 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 18131 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 1 clocks; Return value: 1
BMBinS(BM Hutch -> Not found): 26919 clocks; Return value: -1
BMFJT (BM Cresta-> Not found): 26470 clocks; Return value: -1
BMLin (BM Lingo -> Not found): 2 clocks; Return value: 1

[attachment deleted by admin]
Title: Re: BM Test revisited
Post by: EduardoS on April 23, 2006, 07:59:05 PM
If ML 6.15 is required to compile, SSE2 is probably required to run...

But I'm a lucky guy and the Athlon 64 supports SSE2:

BMBinS(BM Hutch -> Begining of the Buffer): 693 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 930 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 442 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19722 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15563 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10776 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20012 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17967 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12438 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 27943 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26340 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18429 clocks; Return value: 4294967295
Title: Re: BM Test revisited
Post by: Phoenix on April 23, 2006, 08:37:33 PM
Athlon64 / WinXP Pro SP2:

BMBinS(BM Hutch -> Begining of the Buffer): 692 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 930 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 440 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19735 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15569 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10774 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20036 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17999 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12451 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 27959 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26349 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18425 clocks; Return value: 4294967295
Title: Re: BM Test revisited
Post by: mnemonic on April 23, 2006, 09:44:30 PM
AMD Turion64 / Win XP Home SP2 (32bit)


BMBinS(BM Hutch -> Begining of the Buffer): 691 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 930 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 440 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19740 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15555 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10767 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20014 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17962 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12433 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 27932 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26333 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18408 clocks; Return value: 4294967295
Title: Re: BM Test revisited
Post by: hutch-- on April 24, 2006, 01:37:42 AM
I guess I should have mentioned that the algo by Lingo requires SSE2. I was mainly interested in the difference on hardware late enough to run all of the instructions. I changed the Horspool BM version to the full version that uses both shifts, as it was generally faster on a wider range of data and it ran a lot faster on Lingo's test data than the Horspool version did.
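For anyone wanting to guard against the illegal-instruction crash reported above, here is a minimal sketch of the feature test. It is not part of the test piece. It only decodes the EDX feature flags that CPUID leaf 1 returns (MMX is bit 23, SSE is bit 25, SSE2 is bit 26), and the EDX values in it are made up for illustration rather than read from real hardware. The function names are hypothetical.

```python
# Bit positions of feature flags in EDX after executing CPUID with EAX = 1.
MMX_BIT = 23
SSE_BIT = 25
SSE2_BIT = 26


def has_feature(edx: int, bit: int) -> bool:
    """Return True if the given feature bit is set in the EDX value."""
    return (edx >> bit) & 1 == 1


def can_run_sse2_search(edx: int) -> bool:
    """An SSE2 routine (one using punpcklqdq, say) is only safe if bit 26 is set."""
    return has_feature(edx, SSE2_BIT)


# Hypothetical EDX values, for illustration only.
edx_sse_only = (1 << MMX_BIT) | (1 << SSE_BIT)    # MMX and SSE, no SSE2
edx_with_sse2 = edx_sse_only | (1 << SSE2_BIT)    # MMX, SSE and SSE2

for name, edx in (("SSE only", edx_sse_only), ("with SSE2", edx_with_sse2)):
    verdict = "run the SSE2 version" if can_run_sse2_search(edx) else "skip the SSE2 version"
    print(f"{name}: {verdict}")
```

In an assembler test piece the same check would be done once at startup, before the first call to the SSE2 routine, so older boxes can still time the other algos.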
Title: Re: BM Test revisited
Post by: hutch-- on April 24, 2006, 03:15:14 AM
I have had a play with the 4 heuristic BM and got it marginally faster on my PIV. These are the current timings with it. As before, the example contains SSE2 code so it will not run on earlier machines without it.


Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS(BM Hutch -> Begining of the Buffer): 950 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 1339 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 753 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 22544 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 37920 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 21632 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 24624 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 45327 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 28714 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 30926 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 63282 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 34683 clocks; Return value: 4294967295

Press ENTER to exit...

[attachment deleted by admin]
Title: Re: BM Test revisited
Post by: Ghirai on April 24, 2006, 08:53:46 AM
The code above gives these results on an Athlon 64 3000+:

BMBinS(BM Hutch -> Begining of the Buffer): 647 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 940 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 444 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19965 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15666 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10836 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20652 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 18086 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12524 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 30261 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26523 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18537 clocks; Return value: 4294967295


Why are there such big differences in the last 2 lines of the result, between my system and yours?
Your PIV is 2.8 GHz, and my AMD is 2.0 GHz...
Title: Re: BM Test revisited
Post by: EduardoS on April 24, 2006, 02:06:35 PM
Quote from: Ghirai on April 24, 2006, 08:53:46 AM
Why are there such big differences in the last 2 lines of the result, between my system and yours?
Your PIV is 2.8 GHz, and my AMD is 2.0 GHz...
1) The time here is counted in clocks, so a 1.6GHz P4 and a 3.2GHz P4 will show the same value;

2) The Athlon executes more instructions per cycle than the P4.

A) To get the time in nanoseconds, divide the result by the clock speed of your processor in GHz.
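As a worked example of that conversion, here is a small sketch using the BMLin "Not found" figures posted above (34683 clocks on the 2.8 GHz PIV, 18537 clocks on the 2.0 GHz Athlon 64). The function name is just for illustration.

```python
def clocks_to_ns(clocks: int, cpu_ghz: float) -> float:
    """Convert a clock count to nanoseconds.

    A CPU running at f GHz completes f clock cycles per nanosecond,
    so elapsed nanoseconds = clocks / f.
    """
    return clocks / cpu_ghz


# BMLin "Not found" timings taken from the posts above.
p4_ns = clocks_to_ns(34683, 2.8)    # PIV Prescott at 2.8 GHz
amd_ns = clocks_to_ns(18537, 2.0)   # Athlon 64 3000+ at 2.0 GHz

print(f"PIV:       {p4_ns:.1f} ns")
print(f"Athlon 64: {amd_ns:.1f} ns")
```

That works out to roughly 12.4 microseconds on the PIV and 9.3 microseconds on the Athlon 64, so the Athlon is still ahead in real time on this test despite the lower clock speed.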
Title: Re: BM Test revisited
Post by: y-code on April 24, 2006, 02:48:05 PM
Isn't there a much faster rewrite of that Cresta code in another thread here? And wasn't that Horspool code shown to be buggy anyway?
Title: Re: BM Test revisited
Post by: EduardoS on April 24, 2006, 03:38:59 PM
I forgot, the second one:
Quote
BMBinS(BM Hutch -> Begining of the Buffer): 642 clocks; Return value: 111
BMFJT (BM Cresta-> Begining of the Buffer): 933 clocks; Return value: 111
BMLin (BM Lingo -> Begining of the Buffer): 440 clocks; Return value: 111
BMBinS(BM Hutch -> Middle of the Buffer): 19803 clocks; Return value: 19384
BMFJT (BM Cresta-> Middle of the Buffer): 15561 clocks; Return value: 19384
BMLin (BM Lingo -> Middle of the Buffer): 10773 clocks; Return value: 19384
BMBinS(BM Hutch -> End of the Buffer): 20516 clocks; Return value: 29621
BMFJT (BM Cresta-> End of the Buffer): 17974 clocks; Return value: 29621
BMLin (BM Lingo -> End of the Buffer): 12443 clocks; Return value: 29621
BMBinS(BM Hutch -> Not found): 30072 clocks; Return value: 4294967295
BMFJT (BM Cresta-> Not found): 26353 clocks; Return value: 4294967295
BMLin (BM Lingo -> Not found): 18422 clocks; Return value: 4294967295
Title: Re: BM Test revisited
Post by: hutch-- on April 24, 2006, 11:25:54 PM
Thanks guys, perhaps I should have rewritten the example so it does the timing in real time but it does give a reasonable idea of the difference on different hardware. I wondered if anyone had a later Intel box with an EM64T or similar.

y-code,

I would be more than interested in a testing method that verifies BM algorithms, but I have yet to see one. Years ago, when I was working on these three on a PIII, I wrote testing software that ran many millions of variable length patterns and I could not get them to fail, but this is not proof that they are without failure. The reason for replacing the Horspool with a complete BM algo was that the Horspool only uses one of the shift types and is slower in many circumstances than the original design by Bob Boyer.
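The testing software mentioned above is not posted, so the following is only a sketch of one common approach: run the search under test and a trusted brute-force reference over many random texts and patterns, then count the mismatches. The Horspool routine here is a plain textbook version using only the bad-character shift. It is not the masm32 library code, and all names are hypothetical. As the post says, passing such a test is evidence, not proof.

```python
import random


def horspool_find(text: bytes, pat: bytes) -> int:
    """Textbook Horspool search: offset of the first match, or -1 if not found."""
    m, n = len(pat), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: bytes not in the pattern shift by the full length.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pat[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        # Compare right to left, as Boyer-Moore style searches do.
        j = m - 1
        while j >= 0 and text[pos + j] == pat[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the table entry for the byte under the pattern's last position.
        pos += shift[text[pos + m - 1]]
    return -1


def cross_check(trials: int, seed: int = 1) -> int:
    """Compare horspool_find with bytes.find on random data; return mismatch count."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # A tiny alphabet forces many partial matches and repeated suffixes.
        text = bytes(rng.choice(b"abc") for _ in range(rng.randint(0, 64)))
        pat = bytes(rng.choice(b"abc") for _ in range(rng.randint(1, 8)))
        if horspool_find(text, pat) != text.find(pat):
            failures += 1
    return failures


print("mismatches:", cross_check(2000))
```

The small alphabet is a deliberate choice: random data over all 256 byte values rarely produces the partial matches where shift-table bugs tend to hide.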
Title: Re: BM Test revisited
Post by: Ian_B on April 30, 2006, 04:20:49 PM
Some more optimisation, mostly removing redundant code. I believe these times (P4 Northwood) speak for themselves. On P4 at least, this stripped-down version of hutch's routine performs 5-20% faster and outclasses the SSE2 code, which is far less flexible anyway (no provision for starting offsets) and doesn't even work on all processors. It should be a good compromise if you don't want to write specifically targeted code.

BMBinS (BM Hutch -> Beginning of the Buffer): 890 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 847 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1238 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 669 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 21407 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 16690 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 36571 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 20976 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 22873 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 21122 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 43807 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 27268 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 29011 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 26064 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 62204 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 33544 clocks; Return value: -1


Wasn't this account supposed to have been deleted, BTW...?  :(

Ian_B


[attachment deleted by admin]
Title: Re: BM Test revisited
Post by: hutch-- on May 01, 2006, 05:42:11 AM
Ian,

These are the results on my Prescott PIV.


Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS (BM Hutch -> Beginning of the Buffer): 922 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 934 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1382 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 756 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 22468 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 17980 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 37930 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 21646 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 24660 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 22678 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 45426 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 28452 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 31007 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 27855 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 63169 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 34706 clocks; Return value: -1

Press ENTER to exit...


> Wasn't this account supposed to have been deleted, BTW...?

Sorry, I don't do requests.  :bg


Title: Re: BM Test revisited
Post by: PBrennick on May 01, 2006, 06:33:25 AM
Ian,
The account still seems to be useful to you and you seem to be helpful to us.  Why not stick around.  I, for one, need all the optimization help and/or examples that I can get!

Paul
Title: Re: BM Test revisited
Post by: Ian_B on May 01, 2006, 02:16:28 PM
As you don't need to keep EAX, you can squeeze a few more cycles from it if you replace the last 2 lines of my Calc_Suffix_Shift section before the jump with:
    lea esi, [esi+eax+1]        ; minimum shift is 1
Title: Re: BM Test revisited
Post by: Ian_B on May 01, 2006, 08:38:16 PM
And this version gets just a little more speed by checking for DWORD alignment of the source data before doing the DWORD compares, so at least those memory pulls are aligned. For longer source strings this should make a difference; for shorter ones it adds a little more overhead. By altering all ADD/SUB 1 to INC/DEC I managed to get better times on my P4 too, which is counter-intuitive (according to the thread in the Workshop) but I can't argue with the timings and it might help on AMD. I would like to see some AMD timings for this, as hutch's original code was particularly unfavoured by that silicon.
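The attachment is gone, so the actual alignment check is not shown. The arithmetic behind a DWORD alignment test is simple, though, and can be sketched as follows. The addresses are made-up examples and the function names are hypothetical.

```python
DWORD = 4  # size of a DWORD in bytes


def is_dword_aligned(addr: int) -> bool:
    """An address is DWORD aligned when its low two bits are zero."""
    return (addr & (DWORD - 1)) == 0


def bytes_until_aligned(addr: int) -> int:
    """Number of bytes to handle singly before addr reaches a DWORD boundary."""
    return (-addr) & (DWORD - 1)


# Made-up addresses, for illustration only.
for addr in (0x1000, 0x1001, 0x1002, 0x1003):
    print(hex(addr), is_dword_aligned(addr), bytes_until_aligned(addr))
```

In assembler the test itself is usually a single TEST against 3 followed by a conditional jump, which is where the extra overhead on short strings comes from.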

BMBinS (BM Hutch -> Beginning of the Buffer): 881 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 841 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1235 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 667 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 21351 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 17005 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 36411 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 20900 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 22823 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 21020 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 43618 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 27427 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 29051 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 25311 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 61775 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 33621 clocks; Return value: -1



[attachment deleted by admin]
Title: Re: BM Test revisited
Post by: hutch-- on May 01, 2006, 09:32:28 PM
Seems to be shaping up well. Here are the tests on my Prescott 2.8 PIV.


Boyer - Moore Tests:

BMBinS (BM Hutch -> Beginning of the Buffer): 936 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 893 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1337 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 753 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 22716 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 18740 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 37933 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 21677 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 24643 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 23308 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 45311 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 28530 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 30962 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 27459 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 63085 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 34627 clocks; Return value: -1

Press ENTER to exit...
Title: Re: BM Test revisited
Post by: Ossa on May 01, 2006, 09:36:34 PM
AMD Athlon 64 (Mobile) - added "clock spin up" code at the start to kick the processor up to its consistent 2400MHz rather than mess up the timings with the speed switch midway through. Not that the actual clock counts would change, but it stops the system taking the time to alter the CPU clock midway through a test.


BMBinS (BM Hutch -> Beginning of the Buffer): 715 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 571 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1109 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 481 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 21639 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 13990 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 16568 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 11546 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 21844 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 15044 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 19166 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 13301 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 32096 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 21808 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 28107 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 19680 clocks; Return value: -1


Ossa
Title: Re: BM Test revisited
Post by: Ian_B on May 03, 2006, 04:26:51 PM
Thanks Ossa, it's good to know that my optimisations aren't just effective on Intel. I suspect the additional conditional jumps I eliminated may be the largest source of the speed gains. The DWORD aligned version doesn't seem to be the all-round improvement I'd hoped for, though. It's still kicking the SSE2 version's ass, however, so my work here is done...  :8)

Ian_B