My results
msvcrt strstr 126
windows api strstr 1112(ugh)
Masm InString 220
InStringL 89 (don't know who wrote this one but is fantastic code)
ntdll strstr (undocumented? I couldn't get it to work)
[attachment deleted by admin]
cube,
On my work PIV the MSVCRT strstr is faster than the InstringL proc. The API is very slow and the MASM32 module carries a bit too much overhead. It was never classed as fast, it just generally worked.
hutch--,
Really? I tested on many strings InStringL is always the fastest, seems to get better with longer strings IMO anyway.
Cube,
What box are you running the tests on ? I did the test on a 2.8 gig Northwood core PIV.
E^cube,
It is my old Instring proc (at the end with L as Lingo)
http://www.asmcommunity.net/board/index.php?topic=13638.msg134462#msg134462
It is still the fastest proc
at my box with P4 3.6 GHz :lol
times:
timing msvcrt strstr - 319
timing strstr - 1830
timing InString - 337
timing InStringL- 148
Regards,
Lingo
Results on my AMD Athlon 64 X2 Dual Core:
msvcrt strstr 124
windows api strstr 1089
Masm InString 217
InStringL 89
Edit:
While I'm at it, would anybody mind explaining why Lingo's code is so much faster than the rests? Does having prologue and epilogue turned off make any difference in terms of speed? This is not the first time I've seen lingo's cryptic and very-hard-to-read code beat other codes, so I must ask: why? :dazzled:
The NTDLL strstr appears to be slightly slower than the MSVCRT strstr. Also, InString returns a character position instead of an address, so it's probably doing more work.
[attachment deleted by admin]
Quote from: Seb on November 18, 2006, 02:00:32 PM
While I'm at it, would anybody mind explaining why Lingo's code is so much faster than the rests? Does having prologue and epilogue turned off make any difference in terms of speed? This is not the first time I've seen lingo's cryptic and very-hard-to-read code beat other codes, so I must ask: why? :dazzled:
Maybe he is skilled? :8)
Quote from: EduardoS on November 18, 2006, 04:20:54 PM
Quote from: Seb on November 18, 2006, 02:00:32 PM
While I'm at it, would anybody mind explaining why Lingo's code is so much faster than the rests? Does having prologue and epilogue turned off make any difference in terms of speed? This is not the first time I've seen lingo's cryptic and very-hard-to-read code beat other codes, so I must ask: why? :dazzled:
Maybe he is skilled? :8)
Yes, obviously he is. ::)
Quote from: Seb on November 18, 2006, 02:00:32 PM
Results on my AMD Athlon 64 X2 Dual Core:
msvcrt strstr 124
windows api strstr 1089
Masm InString 217
InStringL 89
Edit:
While I'm at it, would anybody mind explaining why Lingo's code is so much faster than the rests? Does having prologue and epilogue turned off make any difference in terms of speed? This is not the first time I've seen lingo's cryptic and very-hard-to-read code beat other codes, so I must ask: why? :dazzled:
Yes hes very skilled , if you look at his code it makes sense he doesn't use the stack-frame and access the parameters directly(prologue and epilogue is turning off the ebp setup code,)
mov edi, dword ptr [esp+8] ;param 1
mov esi, dword ptr [esp+12] ;param 2
mov ecx, dword ptr [esp+16] ;param 3
instead of
mov edi, param1
mov esi, param 2
mov ecx,param 3
also if you read his code hes checks 2 things at once, uses byte alignment, and reading somethings in dwords. As Donkey has mentioned before the cpu reads everything in dwords, not bytes or words, all dwords, just shifts up and down when needed, so he uses a lot of speed techniques, all in all lingo is a n asm all-star.
It did have that look of Lingo's code. :bg
Something that is worth checking is doing a test on a much longer string, from memory the test piece was using a very short string and this may account for the bad result with the API which usually have SEH and other things around them to slow them up. A longer test piece would remove those factors to a large degree.
I was thinking of loading an external file of 16 or 32k just so you get past the minor overhead factors.
Quote from: E^cube on November 18, 2006, 09:48:20 PM
all in all lingo is a n asm all-star.
Agreed.
With the risk of going off-topic in the discussion, I have a question for lingo: Would you mind mind telling us curious fokes what optimization manuals or papers you read? I'm very interested in every aspect of Assembly code optimization, so it'd be greatly appreciated if you shared your resources with your fellow comrades at the MASM forum. :U
Seb,
I think Lingo IS the optimisation manual, he has been producing very fast algos for a long time now. :U
Quote from: Seb on November 18, 2006, 10:22:34 PM
Quote from: E^cube on November 18, 2006, 09:48:20 PM
all in all lingo is a n asm all-star.
Agreed.
With the risk of going off-topic in the discussion, I have a question for lingo: Would you mind mind telling us curious fokes what optimization manuals or papers you read? I'm very interested in every aspect of Assembly code optimization, so it'd be greatly appreciated if you shared your resources with your fellow comrades at the MASM forum. :U
I'm not lingo but http://www.mark.masmcode.com/ and http://www.masm32.com/board/index.php?topic=5464.0 are good optimization pages.
Quote from: hutch-- on November 18, 2006, 10:24:01 PM
Seb,
I think Lingo IS the optimisation manual, he has been producing very fast algos for a long time now. :U
Hear, hear! :cheekygreen:
Quote from: E^cube on November 18, 2006, 10:30:43 PM
I'm not lingo but http://www.mark.masmcode.com/ and http://www.masm32.com/board/index.php?topic=5464.0 are good optimization pages.
Thanks!
I had a quick play with the library version of InString by onrolling a couple of the loops and it is a bit faster but the algo properly needs a rewrite. An interesting factor is both the msvcrt and lingo's algos get faster with longer patterns where a byte scanner gets slower in its matching loop.
Here are the timings on my 2 PIVs.
Northwood code 2.8 gig PIV
719 timing msvcrt strstr
763 timing InStringx
480 timing InStringL
699 timing msvcrt strstr
768 timing InStringx
474 timing InStringL
696 timing msvcrt strstr
769 timing InStringx
465 timing InStringL
713 timing msvcrt strstr
764 timing InStringx
465 timing InStringL
698 timing msvcrt strstr
767 timing InStringx
470 timing InStringL
704 timing msvcrt strstr
763 timing InStringx
466 timing InStringL
716 timing msvcrt strstr
767 timing InStringx
462 timing InStringL
702 timing msvcrt strstr
767 timing InStringx
464 timing InStringL
----------------------
705 average msvcrt strstr
766 average InStringx
468 average InStringL
----------------------
Press any key to continue ...
Prescott core 3.0 gig PIV.
845 timing msvcrt strstr
848 timing InStringx
434 timing InStringL
846 timing msvcrt strstr
846 timing InStringx
438 timing InStringL
844 timing msvcrt strstr
845 timing InStringx
447 timing InStringL
845 timing msvcrt strstr
845 timing InStringx
464 timing InStringL
842 timing msvcrt strstr
844 timing InStringx
445 timing InStringL
844 timing msvcrt strstr
843 timing InStringx
450 timing InStringL
843 timing msvcrt strstr
847 timing InStringx
458 timing InStringL
847 timing msvcrt strstr
845 timing InStringx
470 timing InStringL
----------------------
844 average msvcrt strstr
845 average InStringx
450 average InStringL
----------------------
Press any key to continue ...
Lingo's algo has clearly got the legs on the other two.
[attachment deleted by admin]
I have coded a new version of Instring "finstr" which has a lot less overhead than the library version. This version differs from the library version in that it does not use a preset length for either the source string or the search pattern but calculates them as zero terminated strings. The new version is faster on normal to short length strings than either Lingo's or the MSVCRT strstr but as the string gets longer, Lingos starts to become faster. The timings below are on my 2 PIVs but I did run a test on an AMD I own and Lingo's code better suits the AMD than either the MSVCRT or my own does.
These are the results on my 2 PIVs.
2.8 gig Northwood PIV
163 timing msvcrt strstr
70 timing finstr
118 timing InStringL
163 timing msvcrt strstr
71 timing finstr
127 timing InStringL
164 timing msvcrt strstr
71 timing finstr
120 timing InStringL
162 timing msvcrt strstr
70 timing finstr
120 timing InStringL
162 timing msvcrt strstr
71 timing finstr
119 timing InStringL
163 timing msvcrt strstr
71 timing finstr
119 timing InStringL
161 timing msvcrt strstr
72 timing finstr
119 timing InStringL
160 timing msvcrt strstr
71 timing finstr
118 timing InStringL
----------------------
162 average msvcrt strstr
70 average finstr
120 average InStringL
----------------------
Press any key to continue ...
3.0 gig Prescott PIV
257 timing msvcrt strstr
110 timing finstr
141 timing InStringL
259 timing msvcrt strstr
110 timing finstr
141 timing InStringL
258 timing msvcrt strstr
110 timing finstr
142 timing InStringL
253 timing msvcrt strstr
110 timing finstr
142 timing InStringL
257 timing msvcrt strstr
110 timing finstr
142 timing InStringL
257 timing msvcrt strstr
110 timing finstr
142 timing InStringL
257 timing msvcrt strstr
110 timing finstr
142 timing InStringL
257 timing msvcrt strstr
110 timing finstr
141 timing InStringL
----------------------
256 average msvcrt strstr
110 average finstr
141 average InStringL
----------------------
Press any key to continue ...
[attachment deleted by admin]
My AMD, K-8:
instr2:
813 timing msvcrt strstr
749 timing InStringx
313 timing InStringL
769 timing msvcrt strstr
749 timing InStringx
371 timing InStringL
768 timing msvcrt strstr
771 timing InStringx
318 timing InStringL
782 timing msvcrt strstr
749 timing InStringx
307 timing InStringL
768 timing msvcrt strstr
751 timing InStringx
310 timing InStringL
768 timing msvcrt strstr
751 timing InStringx
319 timing InStringL
766 timing msvcrt strstr
749 timing InStringx
318 timing InStringL
771 timing msvcrt strstr
750 timing InStringx
307 timing InStringL
----------------------
775 average msvcrt strstr
752 average InStringx
320 average InStringL
----------------------
Press any key to continue ...
instr3:
122 timing msvcrt strstr
102 timing finstr
87 timing InStringL
118 timing msvcrt strstr
90 timing finstr
91 timing InStringL
118 timing msvcrt strstr
99 timing finstr
89 timing InStringL
119 timing msvcrt strstr
31 timing finstr
56 timing InStringL
118 timing msvcrt strstr
87 timing finstr
92 timing InStringL
117 timing msvcrt strstr
89 timing finstr
89 timing InStringL
118 timing msvcrt strstr
89 timing finstr
91 timing InStringL
118 timing msvcrt strstr
89 timing finstr
91 timing InStringL
----------------------
118 average msvcrt strstr
84 average finstr
85 average InStringL
----------------------
Press any key to continue ...
Sorry Hutch,
I apologize... :(
just forgot about that...
Regards,
Lingo
The source code for the "finstr" proc has been modified by removing the loop unrolls. I removed the comments and the timings are as below. The algo was designed with the large unroll to compensate for the extra testing for zero in both the scan loop and the match loop.
2.8 gig Northwood core PIV
160 timing msvcrt strstr
72 timing finstr
104 timing InStringL
162 timing msvcrt strstr
72 timing finstr
106 timing InStringL
164 timing msvcrt strstr
75 timing finstr
103 timing InStringL
163 timing msvcrt strstr
72 timing finstr
106 timing InStringL
162 timing msvcrt strstr
72 timing finstr
106 timing InStringL
157 timing msvcrt strstr
73 timing finstr
105 timing InStringL
163 timing msvcrt strstr
75 timing finstr
105 timing InStringL
159 timing msvcrt strstr
73 timing finstr
105 timing InStringL
----------------------
161 average msvcrt strstr
73 average finstr
105 average InStringL
----------------------
Press any key to continue ...
3.0 gig Prescott core PIV
257 timing msvcrt strstr
110 timing finstr
140 timing InStringL
257 timing msvcrt strstr
110 timing finstr
142 timing InStringL
257 timing msvcrt strstr
110 timing finstr
142 timing InStringL
255 timing msvcrt strstr
111 timing finstr
140 timing InStringL
257 timing msvcrt strstr
110 timing finstr
142 timing InStringL
257 timing msvcrt strstr
110 timing finstr
143 timing InStringL
257 timing msvcrt strstr
112 timing finstr
141 timing InStringL
258 timing msvcrt strstr
110 timing finstr
142 timing InStringL
----------------------
256 average msvcrt strstr
110 average finstr
141 average InStringL
----------------------
Press any key to continue ...
Later.
Tested with the longer string commented out in the test piece, Lingo's algo is clearly faster as the string length gets longer.
710 timing msvcrt strstr
526 timing finstr
343 timing InStringL
700 timing msvcrt strstr
525 timing finstr
349 timing InStringL
708 timing msvcrt strstr
526 timing finstr
341 timing InStringL
707 timing msvcrt strstr
524 timing finstr
350 timing InStringL
708 timing msvcrt strstr
530 timing finstr
334 timing InStringL
706 timing msvcrt strstr
525 timing finstr
334 timing InStringL
705 timing msvcrt strstr
522 timing finstr
354 timing InStringL
701 timing msvcrt strstr
524 timing finstr
352 timing InStringL
----------------------
705 average msvcrt strstr
525 average finstr
344 average InStringL
----------------------
Press any key to continue ...
[attachment deleted by admin]
114 timing msvcrt strstr
95 timing finstr
108 timing InStringL
114 timing msvcrt strstr
97 timing finstr
106 timing InStringL
115 timing msvcrt strstr
95 timing finstr
107 timing InStringL
115 timing msvcrt strstr
96 timing finstr
112 timing InStringL
113 timing msvcrt strstr
97 timing finstr
106 timing InStringL
115 timing msvcrt strstr
95 timing finstr
107 timing InStringL
113 timing msvcrt strstr
108 timing finstr
107 timing InStringL
113 timing msvcrt strstr
96 timing finstr
107 timing InStringL
----------------------
114 average msvcrt strstr
97 average finstr
107 average InStringL
----------------------
Press any key to continue ...
amd 64 3800+
Hi Hutch-
Looking at your code, I don't understand why you duplicate the code inside the repeat one more time outside the repeat loop.
Isn't this-
scanloop:
REPEAT 15
add esi, ebp
movzx edx, BYTE PTR [esi]
cmp eax, edx
je prematch
test edx, edx ; test for source terminator
jz no_match
ENDM
add esi, ebp
movzx edx, BYTE PTR [esi]
cmp eax, edx
je prematch
test edx, edx ; test for source terminator
jnz scanloop
jmp no_match
the same as this-
REPEAT 16
add esi, ebp
movzx edx, BYTE PTR [esi]
cmp eax, edx
je prematch
test edx, edx ; test for source terminator
jz no_match
ENDM
jmp scanloop
? At best it saves one cycle every 16 bytes tested for the most likely jump being to scanloop.
And if you were to get four bytes to test at once, you could replace 3 movzx's with one rol edx,16 like-
REPEAT 4
add esi, ebp
mov edx,[esi]
test dl, dl ; test for source terminator
jz no_match
cmp al, dl
je prematch
add esi,ebp
test dh, dh ; test for source terminator
jz no_match
cmp al, dh
je prematch
add esi,ebp
rol edx,16
test dl, dl ; test for source terminator
jz no_match
cmp al, dl
je prematch
add esi,ebp
test dh, dh ; test for source terminator
jz no_match
cmp al, dh
je prematch
ENDM
jmp scanloop
which run slightly faster on my amd.
Jim,
> Looking at your code, I don't understand why you duplicate the code inside the repeat one more time outside the repeat loop.
Habit, I usually do an unroll by repeating the first one less than the required unroll count and changing any necessary jumps.
I would be interested in seeing your alternate version with the DWORD read and the ROL. It would probably need to be aligned before the DWORD read which is easy enough to do. The only thing that would worry me is the stall on some older Intel hardware for doing byte reads after a DWORD write.
P3:
129 timing msvcrt strstr
114 timing finstr
111 timing InStringL
129 timing msvcrt strstr
113 timing finstr
111 timing InStringL
129 timing msvcrt strstr
113 timing finstr
110 timing InStringL
129 timing msvcrt strstr
113 timing finstr
111 timing InStringL
129 timing msvcrt strstr
114 timing finstr
110 timing InStringL
129 timing msvcrt strstr
113 timing finstr
111 timing InStringL
128 timing msvcrt strstr
113 timing finstr
111 timing InStringL
129 timing msvcrt strstr
113 timing finstr
110 timing InStringL
----------------------
128 average msvcrt strstr
113 average finstr
110 average InStringL
----------------------
Press any key to continue ...
P3 Mobile (probably 1400MHz, though maybe less, SpeedStep doesn't seem to work properly...):
93 timing msvcrt strstr
108 timing finstr
71 timing InStringL
96 timing msvcrt strstr
111 timing finstr
71 timing InStringL
96 timing msvcrt strstr
131 timing finstr
70 timing InStringL
93 timing msvcrt strstr
108 timing finstr
71 timing InStringL
95 timing msvcrt strstr
108 timing finstr
70 timing InStringL
94 timing msvcrt strstr
118 timing finstr
70 timing InStringL
93 timing msvcrt strstr
119 timing finstr
69 timing InStringL
93 timing msvcrt strstr
108 timing finstr
70 timing InStringL
----------------------
94 average msvcrt strstr
113 average finstr
70 average InStringL
----------------------
Press any key to continue ...
Interesting that finstr is the same for both mine and MichaelW's computers, though the others are approx 30-40 faster on mine (consistently). I guess I am running at 1400MHz :bg
Cheers,
Zooba :U
QuoteGenuineIntel Family 6 Model 13 Stepping 6
Intel Brand String: Intel(R) Pentium(R) M processor
Features: FPU TSC CX8 CMOV CLFSH FXSR MMX SSE SSE2
Quote94 timing msvcrt strstr
107 timing finstr
71 timing InStringL
93 timing msvcrt strstr
122 timing finstr
71 timing InStringL
95 timing msvcrt strstr
129 timing finstr
71 timing InStringL
94 timing msvcrt strstr
127 timing finstr
71 timing InStringL
94 timing msvcrt strstr
122 timing finstr
71 timing InStringL
94 timing msvcrt strstr
107 timing finstr
71 timing InStringL
93 timing msvcrt strstr
123 timing finstr
71 timing InStringL
94 timing msvcrt strstr
107 timing finstr
71 timing InStringL
----------------------
93 average msvcrt strstr
118 average finstr
71 average InStringL
----------------------
Press any key to continue ...
Hutch-
Quote from: hutch-- on November 21, 2006, 05:54:26 AM
>
I would be interested in seeing your alternate version with the DWORD read and the ROL. It would probably need to be aligned before the DWORD read which is easy enough to do. The only thing that would worry me is the stall on some older Intel hardware for doing byte reads after a DWORD write.
Look at the last block of code in my previous post. All I did is replace your block with that one, and it reduced the run from 115 to 107 cycles on the shorter string. Still much slower than Lingo on the longer string however. But look carefully, I could have screwed up the endian thing. I haven't done any masm since february and am a littly rusty.
edit. Actually, SHR edx,16 would be better than the ROL, and move it up one line.
edit2. I just dusted off my old celeron laptop and gave it a try. No difference in speed so basically, forget everything I said.
Shorter substrings and rolling out?
OK, ... new InStringL with new results: :lol
207 timing msvcrt strstr
111 timing finstr
70 timing InStringL
209 timing msvcrt strstr
117 timing finstr
71 timing InStringL
209 timing msvcrt strstr
111 timing finstr
70 timing InStringL
209 timing msvcrt strstr
112 timing finstr
70 timing InStringL
206 timing msvcrt strstr
113 timing finstr
70 timing InStringL
209 timing msvcrt strstr
113 timing finstr
70 timing InStringL
209 timing msvcrt strstr
111 timing finstr
70 timing InStringL
213 timing msvcrt strstr
110 timing finstr
70 timing InStringL
----------------------
208 average msvcrt strstr
112 average finstr
70 average InStringL
----------------------
Press any key to continue ...
Regards,
Lingo
[attachment deleted by admin]
114 timing msvcrt strstr
94 timing finstr
49 timing InStringL
113 timing msvcrt strstr
94 timing finstr
49 timing InStringL
112 timing msvcrt strstr
95 timing finstr
49 timing InStringL
113 timing msvcrt strstr
93 timing finstr
50 timing InStringL
113 timing msvcrt strstr
95 timing finstr
49 timing InStringL
113 timing msvcrt strstr
94 timing finstr
48 timing InStringL
113 timing msvcrt strstr
99 timing finstr
50 timing InStringL
112 timing msvcrt strstr
95 timing finstr
49 timing InStringL
----------------------
112 average msvcrt strstr
94 average finstr
49 average InStringL
----------------------
Press any key to continue ...
Very nice timings! :U
107 timing msvcrt strstr
90 timing finstr
47 timing InStringL
107 timing msvcrt strstr
90 timing finstr
47 timing InStringL
107 timing msvcrt strstr
89 timing finstr
46 timing InStringL
107 timing msvcrt strstr
89 timing finstr
47 timing InStringL
107 timing msvcrt strstr
90 timing finstr
47 timing InStringL
107 timing msvcrt strstr
89 timing finstr
47 timing InStringL
106 timing msvcrt strstr
102 timing finstr
47 timing InStringL
106 timing msvcrt strstr
90 timing finstr
46 timing InStringL
----------------------
106 average msvcrt strstr
91 average finstr
46 average InStringL
----------------------
I have rewritten the scanloop part of finstr using a different technique to determine if the 4 bytes being tested contain a null termniator or not. i have been testing against Lingo's latest which is a very fast algo but it does have some anomolies in terms of what is pointed at it and these make it inconsistent in its speed. The match loop is a lot faster in his latest version but for some reason it has problems with text that starts with a blank space " word " etc ...
I have not added the final part of finstr which is to test up to the last 3 bytes so in its current form it will miss a search pattern up to 3 bytes long if it is at the end of the source being searched. It does not effect any of the examples in the test piece.
These are the results on my dev PIV.
Source length = 185
Mismatch length = 290
280 finstr mismatch recovery
21 finstr short
81 finstr medium
128 finstr long
175 finstr end
----------------------
Short Search
----------------------
74 average finstr
114 average InStringL
----------------------
----------------------
Medium Search
----------------------
195 average finstr
173 average InStringL
----------------------
----------------------
Long Search
----------------------
385 average finstr
467 average InStringL
----------------------
------------------------------------------
Full Length search on high ANSI characters
------------------------------------------
353 average finstr
505 average InStringL
----------------------
----------------------
Mismatch Recovery Test
----------------------
1186 average finstr
1365 average InStringL
----------------------
Press any key to continue ...
[attachment deleted by admin]
Hi,
that's my results on an AMD FX57
Source length = 185
Mismatch length = 290
280 finstr mismatch recovery
21 finstr short
81 finstr medium
128 finstr long
175 finstr end
----------------------
Short Search
----------------------
86 average finstr
86 average InStringL
----------------------
----------------------
Medium Search
----------------------
240 average finstr
111 average InStringL
----------------------
----------------------
Long Search
----------------------
464 average finstr
355 average InStringL
----------------------
------------------------------------------
Full Length search on high ANSI characters
------------------------------------------
426 average finstr
306 average InStringL
----------------------
----------------------
Mismatch Recovery Test
----------------------
1416 average finstr
928 average InStringL
----------------------
Press any key to continue ...
Ulli
Good job Hutch.
Can you explain how this part works?
; ----------------------------------------------
; test if the next 4 bytes contain an ascii zero
; ----------------------------------------------
add esi, 4
mov edx, [esi]
lea ecx, [edx-01010101h]
not edx
and ecx, esp ; modify ECX
test ecx, edx ; test with no modify
jnz no_match
The old method of doing one load of a double into edx and testing dl and dh for example-
; ----------------------------------------
; test each byte for a patn 1st char match
; ----------------------------------------
mov ebx,[esi]
mov edx,[esi+2]
sub bl,al
jz si0
sub bh,al
jz si1
sub dl,al
jz si2
sub dh,al
jz si3
is faster on an AMD, but slower on the celeron.
One problem-
Try
Mainstr db "The time has come the walrus said to speak of many things,"
db " of birds and bugs and ceiling wax, of cabbages and kings."
db " And if the sea is boiling hot and whether pigs have wings «¤»¼µþ®§© z",0
Substr1 db "z",0
It never finds the "z"
Timing results:
your latest on AMD latest code with old your latest on Old method on
Athlon XP 3000+ method of match as Celeron 2.0 Ghz Celeron
at 2.16 Ghz above on AMD
Source length = 185 Source length = 185 Source length = 185 Source length = 185
Mismatch length = 290 Mismatch length = 290 Mismatch length = 290 Mismatch length = 290
280 finstr mismatch r 280 finstr mismatch r 280 finstr mismatch r 280 finstr mismatch r
21 finstr short 21 finstr short 21 finstr short 21 finstr short
81 finstr medium 81 finstr medium 81 finstr medium 81 finstr medium
128 finstr long 128 finstr long 128 finstr long 128 finstr long
175 finstr end 175 finstr end 175 finstr end 175 finstr end
--------------------- --------------------- --------------------- ---------------------
Short Search Short Search Short Search Short Search
--------------------- --------------------- --------------------- ---------------------
110 average finstr 105 average finstr 70 average finstr 88 average finstr
113 average InStringL 113 average InStringL 112 average InStringL 116 average InStringL
--------------------- --------------------- --------------------- ---------------------
--------------------- --------------------- --------------------- ---------------------
Medium Search Medium Search Medium Search Medium Search
--------------------- --------------------- --------------------- ---------------------
304 average finstr 255 average finstr 190 average finstr 239 average finstr
142 average InStringL 141 average InStringL 169 average InStringL 173 average InStringL
--------------------- --------------------- --------------------- ---------------------
--------------------- --------------------- --------------------- ---------------------
Long Search Long Search Long Search Long Search
--------------------- --------------------- --------------------- ---------------------
478 average finstr 421 average finstr 388 average finstr 444 average finstr
377 average InStringL 377 average InStringL 461 average InStringL 453 average InStringL
--------------------- --------------------- --------------------- ---------------------
--------------------- --------------------- --------------------- ---------------------
Full Length search on Full Length search on Full Length search on Full Length search on
--------------------- --------------------- --------------------- ---------------------
557 average finstr 419 average finstr 345 average finstr 425 average finstr
539 average InStringL 539 average InStringL 502 average InStringL 492 average InStringL
--------------------- --------------------- --------------------- ---------------------
--------------------- --------------------- --------------------- ---------------------
Mismatch Recovery Tes Mismatch Recovery Tes Mismatch Recovery Tes Mismatch Recovery Tes
--------------------- --------------------- --------------------- ---------------------
1676 average finstr 1710 average finstr 1151 average finstr 1245 average finstr
1166 average InString 1168 average InString 1322 average InString 1309 average InString
--------------------- --------------------- --------------------- ---------------------
Press any key to cont Press any key to cont Press any key to cont Press any key to cont
So looking at the numbers, I would say your new method is the best overall if you fix the problem with finding the last character.
Jim,
Thanks for doing all of the tests.
I posted the reason why above.
Quote
I have not added the final part of finstr which is to test up to the last 3 bytes so in its current form it will miss a search pattern up to 3 bytes long if it is at the end of the source being searched. It does not effect any of the examples in the test piece.
The block of code that determines if a null is within the 4 bytes read in is AND masking. Agner Fog used it in his classic StrLen algo. I used the preset value in ESP because its a normal speedup for Agner's algo.
QuoteI posted the reason why above.
opps :red
P3:
Source length = 185
Mismatch length = 290
280 finstr mismatch recovery
21 finstr short
81 finstr medium
128 finstr long
175 finstr end
----------------------
Short Search
----------------------
107 average finstr
114 average InStringL
----------------------
----------------------
Medium Search
----------------------
295 average finstr
191 average InStringL
----------------------
----------------------
Long Search
----------------------
503 average finstr
491 average InStringL
----------------------
------------------------------------------
Full Length search on high ANSI characters
------------------------------------------
561 average finstr
493 average InStringL
----------------------
----------------------
Mismatch Recovery Test
----------------------
1647 average finstr
1343 average InStringL
----------------------
Pentium D (family 15, model 4, stepping 7):
Source length = 185
Mismatch length = 290
280 finstr mismatch recovery
21 finstr short
81 finstr medium
128 finstr long
175 finstr end
----------------------
Short Search
----------------------
98 average finstr
96 average InStringL
----------------------
----------------------
Medium Search
----------------------
256 average finstr
180 average InStringL
----------------------
----------------------
Long Search
----------------------
434 average finstr
416 average InStringL
----------------------
------------------------------------------
Full Length search on high ANSI characters
------------------------------------------
518 average finstr
524 average InStringL
----------------------
----------------------
Mismatch Recovery Test
----------------------
1416 average finstr
1358 average InStringL
----------------------