BM Test revisited

Started by hutch--, April 23, 2006, 01:12:04 PM


hutch--

Ian,

These are the results on my Prescott PIV.


Timing routines by MichaelW  - www.masmforum.com
Please terminate any high-priority tasks and press ENTER to begin.


Boyer - Moore Tests:

BMBinS (BM Hutch -> Beginning of the Buffer): 922 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 934 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1382 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 756 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 22468 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 17980 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 37930 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 21646 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 24660 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 22678 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 45426 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 28452 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 31007 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 27855 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 63169 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 34706 clocks; Return value: -1

Press ENTER to exit...


> Wasn't this account supposed to have been deleted, BTW...?

Sorry, I don't do requests.  :bg


Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

PBrennick

Ian,
The account still seems to be useful to you and you seem to be helpful to us.  Why not stick around?  I, for one, need all the optimization help and/or examples that I can get!

Paul
The GeneSys Project is available from:
The Repository or My crappy website

Ian_B

As you don't need to keep EAX, you can squeeze a few more cycles from it if you replace the last 2 lines of my Calc_Suffix_Shift section before the jump with:
    lea esi, [esi+eax+1]        ; minimum shift is 1
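
For anyone following along without the source open, here is a rough C equivalent of what that single LEA computes. It is only a sketch: the function name and the reading of ESI as the scan position and EAX as the computed shift are assumptions for illustration, and the two lines being replaced are not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the thread's source.
   Mirrors  lea esi, [esi+eax+1] :
     pos   plays the role of ESI (current position in the buffer)
     shift plays the role of EAX (assumed to be the computed shift)
   The +1 guarantees the position always advances by at least one byte. */
static const unsigned char *advance(const unsigned char *pos, size_t shift)
{
    return pos + shift + 1;   /* one address calculation: base + index + 1 */
}
```

With a shift of 0 the pointer still moves forward by one byte, which is the "minimum shift is 1" noted in the comment on the LEA line.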

Ian_B

And this version gets just a little more speed by checking for DWORD alignment of the source data before doing the DWORD compares, so at least those memory reads are aligned. For longer source strings this should make a difference; for shorter ones it adds a little more overhead. By altering all ADD/SUB 1 to INC/DEC I managed to get better times on my P4 too, which is counter-intuitive (according to the thread in the Workshop), but I can't argue with the timings and it might help on AMD. I would like to see some AMD timings for this, as hutch's original code was particularly unfavoured by that silicon.
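
To show the idea of the alignment test in a form that is easy to try out, here is a minimal C sketch. It is not the MASM from the attachment: the function names are made up for illustration, and it assumes a DWORD is 4 bytes and that testing the low two address bits is an adequate alignment check on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true when the address is a multiple of 4. */
static int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Hypothetical compare: uses 4-byte compares only when the source
   pointer is DWORD aligned, otherwise falls back to byte compares.
   Returns 1 if the first n bytes match, 0 if they differ. */
static int equal_bytes(const unsigned char *src, const unsigned char *pat, size_t n)
{
    size_t i = 0;
    if (is_dword_aligned(src)) {
        for (; i + 4 <= n; i += 4) {
            uint32_t a, b;
            memcpy(&a, src + i, 4);   /* aligned read from the source data */
            memcpy(&b, pat + i, 4);   /* pattern may be at any alignment   */
            if (a != b)
                return 0;
        }
    }
    for (; i < n; i++) {              /* tail bytes, or the unaligned case */
        if (src[i] != pat[i])
            return 0;
    }
    return 1;
}
```

The extra test is the overhead mentioned above: it is paid on every compare, so it only earns its keep when the pattern is long enough for the DWORD loop to run a few times.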

BMBinS (BM Hutch -> Beginning of the Buffer): 881 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 841 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1235 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 667 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 21351 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 17005 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 36411 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 20900 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 22823 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 21020 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 43618 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 27427 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 29051 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 25311 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 61775 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 33621 clocks; Return value: -1



[attachment deleted by admin]

hutch--

Seems to be shaping up well. Here are the tests on my Prescott 2.8 PIV.


Boyer - Moore Tests:

BMBinS (BM Hutch -> Beginning of the Buffer): 936 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 893 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1337 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 753 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 22716 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 18740 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 37933 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 21677 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 24643 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 23308 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 45311 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 28530 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 30962 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 27459 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 63085 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 34627 clocks; Return value: -1

Press ENTER to exit...

Ossa

AMD Athlon 64 (Mobile). I added "clock spin-up" code at the start to kick the processor up to its steady 2400 MHz, rather than have the speed switch happen midway through and mess up the timings. The clock counts themselves shouldn't change, but it stops the system spending time altering the CPU clock in the middle of a test.
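
For anyone on a similar power-managed CPU, a spin-up step can be sketched along these lines in C. This is not the code used for the results below, which is not shown in the thread: the function name and the use of the standard clock() call as the timer are assumptions for illustration.

```c
#include <time.h>

/* Hypothetical warm-up: keep the CPU busy for roughly `ms` milliseconds
   (as measured by clock()) so a power-managed processor has a chance to
   step up to full speed before the timed tests start.
   Returns the number of loop iterations performed. */
static unsigned long spin_up(unsigned ms)
{
    volatile unsigned long iterations = 0;   /* volatile: keep the loop from being optimised away */
    clock_t ticks = (clock_t)((double)ms * CLOCKS_PER_SEC / 1000.0);
    clock_t start = clock();

    while (clock() - start < ticks)
        iterations++;

    return iterations;
}
```

Calling something like spin_up(500) once before the first test should be enough on most systems, though how long the processor needs to ramp up is a guess and worth checking on the machine in question.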


BMBinS (BM Hutch -> Beginning of the Buffer): 715 clocks; Return value: 111
BMB2   (BM H/IB  -> Beginning of the Buffer): 571 clocks; Return value: 111
BMFJT  (BM Cresta-> Beginning of the Buffer): 1109 clocks; Return value: 111
BMLin  (BM Lingo -> Beginning of the Buffer): 481 clocks; Return value: 111

BMBinS (BM Hutch -> Middle of the Buffer): 21639 clocks; Return value: 19384
BMB2   (BM H/IB  -> Middle of the Buffer): 13990 clocks; Return value: 19384
BMFJT  (BM Cresta-> Middle of the Buffer): 16568 clocks; Return value: 19384
BMLin  (BM Lingo -> Middle of the Buffer): 11546 clocks; Return value: 19384

BMBinS (BM Hutch -> End of the Buffer): 21844 clocks; Return value: 29621
BMB2   (BM H/IB  -> End of the Buffer): 15044 clocks; Return value: 29621
BMFJT  (BM Cresta-> End of the Buffer): 19166 clocks; Return value: 29621
BMLin  (BM Lingo -> End of the Buffer): 13301 clocks; Return value: 29621

BMBinS (BM Hutch -> Not found): 32096 clocks; Return value: -1
BMB2   (BM H/IB  -> Not found): 21808 clocks; Return value: -1
BMFJT  (BM Cresta-> Not found): 28107 clocks; Return value: -1
BMLin  (BM Lingo -> Not found): 19680 clocks; Return value: -1


Ossa
Website (very old): ossa.the-wot.co.uk

Ian_B

Thanks Ossa, it's good to know that my optimisations aren't just effective on Intel. I suspect the additional conditional jumps I eliminated may be the largest source of the speed gains. The DWORD aligned version doesn't seem to be the all-round improvement I'd hoped, though. It's still kicking the MMX's ass, however, so my work here is done...  :8)

Ian_B