News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

String searching

Started by Ian_B, January 05, 2006, 06:32:36 PM

Previous topic - Next topic

lingo

FindInWinInc should be after last byte of the fbuffer! :wink

lingo

#16
If I'm not wrong FJT3 is  "FJT3 Cresta/IanB,  word-length shifts" algo
and BMLin12 is "Boyer-Moore Lingo,dword-length shifts:" algo.
They are diferent algos and it is unreal to compare them... :wink
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)

Please terminate any high-priority tasks and press ENTER to begin...


String search tests:


Search Test 1 - value expected 37
FJT2 Cresta/IanB,  byte-length shifts: 37 ; clocks: 339
FJT3 Cresta/IanB,  word-length shifts: 37 ; clocks: 278
FJT4 Cresta/IanB, dword-length shifts: 37 ; clocks: 331
Boyer-Moore Lingo, byte-length shifts: 37 ; clocks: 88
Boyer-Moore Lingo, word-length shifts: 37 ; clocks: 107
Boyer-Moore Lingo,dword-length shifts: 37 ; clocks: 137

Search Test 2 - value expected 1007
FJT2 Cresta/IanB,  byte-length shifts: 1007 ; clocks: 19015
FJT3 Cresta/IanB,  word-length shifts: 1007 ; clocks: 18973
FJT4 Cresta/IanB, dword-length shifts: 1007 ; clocks: 19046
Boyer-Moore Lingo, byte-length shifts: 1007 ; clocks: 7862
Boyer-Moore Lingo, word-length shifts: 1007 ; clocks: 7867
Boyer-Moore Lingo,dword-length shifts: 1007 ; clocks: 7848

Search Test 3 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 652
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 601
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 763
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 474
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 498
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 599

Search Test 4 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 1853
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 1303
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 1863
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 1096
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 1120
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 1156

Search Test 5 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 2066
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 1299
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 2056
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 1092
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 1114
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 1145

Search Test 6 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 767
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 711
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 776
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 583
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 608
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 638

Search Test 7 - value expected 1009
FJT2 Cresta/IanB,  byte-length shifts: 1009 ; clocks: 959
FJT3 Cresta/IanB,  word-length shifts: 1009 ; clocks: 905
FJT4 Cresta/IanB, dword-length shifts: 1009 ; clocks: 962
Boyer-Moore Lingo, byte-length shifts: 1009 ; clocks: 768
Boyer-Moore Lingo, word-length shifts: 1009 ; clocks: 792
Boyer-Moore Lingo,dword-length shifts: 1009 ; clocks: 826

Press ENTER to exit...




[attachment deleted by admin]

jj2007

Hi Lingo,
I can give you some timings, but I won't have the time to add my algo to your testbed.

By the way, not everybody has Masm32 installed on C:  :wink
    include \masm32\include\windows.inc
instead of
    include C:\masm32\include\windows.inc
would be helpful for others who want to assemble your code.


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

String search tests:


Search Test 1 - value expected 37
FJT2 Cresta/IanB,  byte-length shifts: 37 ; clocks: 424
FJT3 Cresta/IanB,  word-length shifts: 37 ; clocks: 461
FJT4 Cresta/IanB, dword-length shifts: 37 ; clocks: 506
Boyer-Moore Lingo, byte-length shifts: 37 ; clocks: 169
Boyer-Moore Lingo, word-length shifts: 37 ; clocks: 190
Boyer-Moore Lingo,dword-length shifts: 37 ; clocks: 241

Search Test 2 - value expected 1007
FJT2 Cresta/IanB,  byte-length shifts: 1007 ; clocks: 17932
FJT3 Cresta/IanB,  word-length shifts: 1007 ; clocks: 17944
FJT4 Cresta/IanB, dword-length shifts: 1007 ; clocks: 18007
Boyer-Moore Lingo, byte-length shifts: 1007 ; clocks: 10085
Boyer-Moore Lingo, word-length shifts: 1007 ; clocks: 8589
Boyer-Moore Lingo,dword-length shifts: 1007 ; clocks: 9015

Search Test 3 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 778
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 818
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 865
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 550
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 581
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 655

Search Test 4 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 2269
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 2393
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 2367
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 1767
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 1754
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 1322

Search Test 5 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 1759
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 2332
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 1867
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 1181
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 1784
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 1632

Search Test 6 - value expected 1008
FJT2 Cresta/IanB,  byte-length shifts: 1008 ; clocks: 878
FJT3 Cresta/IanB,  word-length shifts: 1008 ; clocks: 910
FJT4 Cresta/IanB, dword-length shifts: 1008 ; clocks: 951
Boyer-Moore Lingo, byte-length shifts: 1008 ; clocks: 668
Boyer-Moore Lingo, word-length shifts: 1008 ; clocks: 685
Boyer-Moore Lingo,dword-length shifts: 1008 ; clocks: 756

Search Test 7 - value expected 1009
FJT2 Cresta/IanB,  byte-length shifts: 1009 ; clocks: 1069
FJT3 Cresta/IanB,  word-length shifts: 1009 ; clocks: 1098
FJT4 Cresta/IanB, dword-length shifts: 1009 ; clocks: 1143
Boyer-Moore Lingo, byte-length shifts: 1009 ; clocks: 849
Boyer-Moore Lingo, word-length shifts: 1009 ; clocks: 875
Boyer-Moore Lingo,dword-length shifts: 1009 ; clocks: 933

jj2007

#18
Quote from: lingo on April 17, 2009, 03:34:37 PM
FindInWinInc should be after last byte of the fbuffer! :wink

Hi Lingo,

To be honest, I don't get the message: Why must the pattern be after the buffer? That is not a realistic constraint imho.

I have incorporated the Windows.inc test into the new version of the testbed, and all your three algos seem to search (as the cycle count demonstrates) but return zero. For comparison, FJT3 and all others return the correct number of bytes.

Another little problem is that if I replace Duplicate inc with Duplicate inx, your algos throw an exception ::)

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

Finding 'Duplicate inc' at the end of \masm32\include\windows.inc:

Lib InString:   849668          ; clocks: 4863499
FJT3:           849667          ; clocks: 1372417
BMLinDB:        0               ; clocks: 1393513
BMLinDW:        0               ; clocks: 1395150
BMLinDD:        0               ; clocks: 1352871
InstrJJ:        849668          ; clocks: 1063916

Code sizes:
FJT3            249
BMLinDB         395
InstrJJ         282


EDIT: Managed to squeeze out a few percent, see timings above and new attachment.

[attachment deleted by admin]

lingo

BM is exact pattern matching algorithm rather than substring (only) searching  algorithm (as InString).
So, "Usage would include tasks like recursively searching files for virus patterns, searching databases for keys or data, text and word processing and any other task that requires handling large amounts of data at very high speed" by Hutch from here
Example: With BM we can search for bytes->  SrcPattern DB 12,0,5,99,0,0,27,0,3
with InString we can't...  :wink

If you want to use my BM algo I have really a constraint:
-your SrcPattern should be more than 8 bytes long
-the SrcPattern must be after the buffer


"That is not a realistic constraint imho."
I don't need such info...


jj2007

Quote from: lingo on April 20, 2009, 06:06:33 PM
BM is exact pattern matching algorithm rather than substring (only) searching  algorithm (as InString).
Well, the title says "string searching"  :bg

Quote
So, "Usage would include tasks like recursively searching files for virus patterns, searching databases for keys or data, text and word processing and any other task that requires handling large amounts of data at very high speed" by Hutch from here
Example: With BM we can search for bytes->  SrcPattern DB 12,0,5,99,0,0,27,0,3
with InString we can't...  :wink
If you mean the zero delimiters: I could adjust my algo accordingly. But I won't because I am mainly interested in a fast InString replacement. Your algo is extremely fast on long search patterns, which makes it excellent for virus detection. My version will often be faster for simple text searches.

Quote
If you want to use my BM algo I have really a constraint:
-your SrcPattern should be more than 8 bytes long
-the SrcPattern must be after the buffer

Strange that the other BM algos don't have these constraints, and don't throw an exception if there is no match... ::)

Quote
"That is not a realistic constraint imho."
I don't need such info...

Lucky you :bg