Text only | Text with Images

The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on June 15, 2008, 10:09:46 PM

Title: InstrCi - case-insensitive string search
Post by: jj2007 on June 15, 2008, 10:09:46 PM

I needed a case-insensitive string search and didn't find anything usable provided by the OS, so I got inspired by the old InStringL/finstr (http://www.masm32.com/board/index.php?topic=6181.15) etc posts.
My new function InstrCi (instr case-insensitive) uses brute force and still has a limitation: the pattern should have 4 bytes, not 3, and not 88. However, that's minor polishing. Here are some results - see attachment for code:

----------------------
9441 average crt_strstr res=4217403
7332 average finstr res=2619 (should be 2620)
5805 average InStringL res=4217403
9760 average InstrCi res=4217403
1942446 average StrStrI res=4217403

So it is far from Hutch & Lingo performance but can compete with msvcrt strstr. Interesting the performance of the shlwapi StrStrI, the only function offered by the OS - it's a factor 200 slower. The other code was taken from a 22 Nov 2006 post; finstr has a minor bug, it returns a position one too low (e.g. 0 if the pattern is already found in the first byte of the source).

Now I wonder, of course, what our champions here could tickle out of this code...
One more point: I use a simple OR 20h for getting rid of the case; there are two variants here:
1. strictly case-insensitive: mov edi, 20202020h ; lowercase bit mask for finding "TEST" in "This is a Test"
2. intelligently case-insensitive: mov edi, 20h ; lowercase bit mask for finding "test" and "Test" but not "TEST" in "This is a Test"

Code Select

InstrCi proc uses edi lpSource:DWORD, lpPattern:DWORD
  mov eax, lpSource
  mov edx, lpPattern
  mov edx, [edx]
; mov edi, 20202020h		; lowercase bit mask for finding "TEST" in "This is a Test"
  mov edi, 20h			; lowercase bit mask for finding "test" and "Test" but not "TEST" in "This is a Test"
  or edx, edi			; make pattern lowercase
@@:	mov ecx, [eax]
	test cl, cl		; source end?
	jz @F
	or ecx, edi		; make source lowercase
	add eax, 1		; inc eax would cost around 5%
	cmp ecx, edx
	jne @B
  dec eax
  ret
@@:
  xor eax, eax
  ret
InstrCi endp

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: hutch-- on June 16, 2008, 12:03:35 AM

jj,

It looks like a good short algo which be useful in almost all applications of case insensitive searches. If you need an extended version I would be inclined to try a table based version with the option of providing your own table to suit different languages that have different characters.

One other comment, for doing repeat searches for text, its useful to be able to specify the end of the last result so you can just re-enter the algo and add that result to the start address of the source for the next search. You can just add the last search result to the start address yourself as the caller but its tidier to have the algo do it.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 16, 2008, 12:51:21 PM

Quote from: hutch-- on June 16, 2008, 12:03:35 AM
It looks like a good short algo which be useful in almost all applications of case insensitive searches. If you need an extended version I would be inclined to try a table based version with the option of providing your own table to suit different languages that have different characters.

For searching non-English literature, yes. For tasks related to IT and programming, the simple or 32 algo should do, I guess.

QuoteOne other comment, for doing repeat searches for text, its useful to be able to specify the end of the last result so you can just re-enter the algo and add that result to the start address of the source for the next search. You can just add the last search result to the start address yourself as the caller but its tidier to have the algo do it.

I had taken that option out for making the benchmarks easier to understand. Attached the full version; it has grown in size but speed is still on par with crt_strstr:

Code Select

----------------------
12414 average crt_strstr        res=4217403
10789 average finstr    res=2619
6538 average InStringL  res=4217403
12365 average InstrCi   res=4217403
1520117 average StrStrI res=4217403
----------------------

InstrCi knows three search modes, passed with the fourth para:

StartPos equ 1

1.   invoke InstrCi, StartPos, addr Mainstr, addr SubstrxCi, 0
   start at pos 1, search case-sensitive for SubstrxCi in Mainstr

2.   invoke InstrCi, StartPos, addr Mainstr, addr SubstrxCi, 1
   start at pos 1, search case-insensitive for SubstrxCi in Mainstr

3.   invoke InstrCi, StartPos, addr Mainstr, addr SubstrxCi, 2
   start at pos 1, search for SubstrxCi in Mainstr, ignore case of first character

The first para indicates the start position. Use 1 for a standard search.
If this para is 0, then InstrCi continues directly after the last match.

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: hutch-- on June 16, 2008, 02:57:14 PM

jj,

I have had a good play with your example and it looks like your algo has the legs on the VCRT version by a very long way. I added a boyer moore search algo to the test bed just to show you the speed difference with an algo of this type. They get faster the longer the pattern is but they are usually slower on strings under about 6 or 7 chars long.

One of our members IanB did an optimisation on a BM design that made it a lot faster as well but I don't readily have a copy of Ian's algo.

With your algo there are some reasonable optimisations to do to get it faster. Replace the REPE CMPSB with incremented pointer code, try and aviod code like the following,

Code Select


mov cx, [eax]
test cl, cl

Try to aviod following partial register read and writes, one of the exceptions is MOVZX to a 32 bit reg followed by a CMP AL, ???

These are the times I am getting with this attached test piece.

Code Select


20342 timing crt_strstr
13686 timing finstr
9491 timing InStringL
21796 timing InstrCi
1862545 timing StrStrI
2307 timing BM

20940 timing crt_strstr
13475 timing finstr
10670 timing InStringL
21811 timing InstrCi
1863999 timing StrStrI
2327 timing BM

21103 timing crt_strstr
14070 timing finstr
11019 timing InStringL
21925 timing InstrCi
1797823 timing StrStrI
2354 timing BM

20406 timing crt_strstr
14026 timing finstr
10766 timing InStringL
21892 timing InstrCi
1791850 timing StrStrI
2344 timing BM

20554 timing crt_strstr
13678 timing finstr
10507 timing InStringL
21800 timing InstrCi
1797879 timing StrStrI
2310 timing BM

20038 timing crt_strstr
13881 timing finstr
10708 timing InStringL
21916 timing InstrCi
1820599 timing StrStrI
2353 timing BM

20139 timing crt_strstr
13461 timing finstr
10493 timing InStringL
21789 timing InstrCi
1811869 timing StrStrI
2300 timing BM

20382 timing crt_strstr
13860 timing finstr
10883 timing InStringL
22386 timing InstrCi
1800658 timing StrStrI
2368 timing BM

----------------------
20488 average crt_strstr        res=4221479
13767 average finstr    res=2597
10567 average InStringL res=4221479
21914 average InstrCi   res=4221479
1818402 average StrStrI res=4221479
2332 average BM res=2597
----------------------
Press any key to continue ...

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 16, 2008, 03:24:57 PM

Hutch, thanks a lot for testing this. Really appreciated!

Quote from: hutch-- on June 16, 2008, 02:57:14 PM
(BM is) usually slower on strings under about 6 or 7 chars long.

In my experience, most real life searches are 2-5 characters...

QuoteWith your algo there are some reasonable optimisations to do to get it faster. Replace the REPE CMPSB

I know it's slow, but I use it only in an outer loop, so it doesn't affect speed.

Code Select


mov cx, [eax]
test cl, cl

So what's the faster alternative here? Mov doesn't set the flag.

Code Select

@@:	mov ecx, [eax]
	test cl, cl		; end of source string?
	jz SrcEnd
	or ecx, ebx		; make source lowercase
	add eax, 1		; (inc eax would cost around 5%)
	cmp ecx, edx	; compare 4 bytes of source with pattern (faster than 2 bytes cmp)
	jne @B

Quote
These are the times I am getting with this attached test piece.

No attachment in your post, sorry...

Minor change in my code:

InstrCi proc
mov eax, lpSource
.if StartPos         ; e.g. invoke InstrCi, 1, addr Mainstr, addr SubstrxCi, 0 means
   push StartPos   ; start at pos1, find Substr in Mainstr, case-sensitive
   pop CurPos      ; invoke InstrCi, 0, ... means continue with Current Position
   dec CurPos      ; 1-based index, so if 1 is passed, add nothing to eax
.endif
add eax, CurPos      ; start with current position or StartPos

It would not find the match if it was at position 0 of the source.

Title: Re: InstrCi - case-insensitive string search
Post by: hutch-- on June 16, 2008, 03:36:19 PM

Sorry about that, attached now.

Code Select


mov cx, [eax]
test cl, cl

Becomes

mov ecx, [eax]
test ecx, ecx

This code is problematic as it uses two partial registers which must be obtained internally by masking.

Code Select


  .if cl && ch			; pattern length >=4?

This code,

Code Select


@@:	lodsb		; mov al, [esi], esi++

replace with commented code or

movzx eax, BYTE PTR [esi]
add esi, 1

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 16, 2008, 03:45:33 PM

Quote from: hutch-- on June 16, 2008, 03:36:19 PM
Code Select Expand
mov cx, [eax] test cl, cl

Becomes

Code Select Expand
mov ecx, [eax] test ecx, ecx

Replacing the mov cx with mov ecx does indeed increase the speed significantly, see new attachment. But I have to stick with the test cl, cl because only the 4th character will be the source's null terminator.

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 16, 2008, 04:00:25 PM

Hutch,
Thanks for this. BM speeds are indeed impressive; when I reduce, however, the pattern to "wings",
I get these timings:
----------------------
12114 average crt_strstr res=4221501
11082 average finstr res=2619
6694 average InStringL res=4221501
12296 average InstrCi res=4221501
1534310 average StrStrI res=4221501
10121 average BM res=2619

One minor point:

Code Select

    mov slen, len(OFFSET Mainstr)
    mov plen, len(OFFSET Substrx)

I guess the BM would not suffer too much from taking these off - in real life, you don't know the size beforehand.

Is it difficult to make BM case-insensitive?

Title: Re: InstrCi - case-insensitive string search
Post by: hutch-- on June 16, 2008, 04:11:04 PM

jj,

> when I reduce, however, the pattern to "wings",

This is consistent with the design, the step size is not big enough to counter the extra overhead.

> Is it difficult to make BM case-insensitive?

I think it can be done with an extra table but it will be no joy to code. A long time ago I saw one written by Jeremy Collake but I don't know where it is any longer, it was about 9 years ago.

Re the lengths, the version of the algo was designed to do binary search as well but I have seen variants that read until the terminator in both source and pattern. I think there is an ancient Microsoft one in an early version of MASM for 32 bit winnt 3.5.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 16, 2008, 04:21:49 PM

Quote from: hutch-- on June 16, 2008, 04:11:04 PM
I have seen variants that read until the terminator in both source and pattern

Not a big issue anyway. Len(pattern) is short and fast, while Len(source) might often be known through other sources, e.g. if you load a file for searching into memory then you know the file size.

Do you have any way to get statistics on the average length of search words here in the forum?

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 16, 2008, 10:09:17 PM

Hutch,
I have refined the prog a little bit, allowing to compare long and short patterns.
It needs a minimum pattern of 2 bytes but has no further limitations afaik.
Here is one result - note that all InstrCi versions perform a
case-insensitive search, so they must be a bit slower:
----------------------
9600 average crt_strstr res=4229691
7341 average finstr res=2619
5649 average InStringL res=4229691
8443 average InstrCi 1s res=4229691
8643 average InstrCi 2s res=4229691
8663 average InstrCi 1L res=4229691
8612 average InstrCi 2L res=4229691
8225 avg Boyer-Moore s res=2619
968 avg Boyer-Moore L res=147

This is quite unexpected, so here is one that looks more "expected":
----------------------
9490 average crt_strstr res=4229691
7346 average finstr res=2619
5786 average InStringL res=4229691
8405 average InstrCi 1s res=4229691
8473 average InstrCi 2s res=4229691
8665 average InstrCi 1L res=4229691
8649 average InstrCi 2L res=4229691
8147 avg Boyer-Moore s res=2619
8470 avg Boyer-Moore L res=2619

Version A looks for "and whether pigs have wings and fly like elephants in the sky"
Version B looks for "wings and fly like elephants in the sky"
Not sure whether it's my fault, or whether the BM code has a little bug...

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 16, 2008, 10:18:10 PM

More on the Boyer-Moore problem (see attachment above):
end of MainStr:
   db "and whether pigs have wings and fly like elephants in the sky...", 0

db "ave wings and fly like elephants in the sky", 0   ; yields fast but wrong result, 43 characters plus null
db "ve wings and fly like elephants in the sky", 0   ; yields correct result, 42 characters plus null

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 17, 2008, 10:32:40 AM

New algo, slightly faster and shorter. Speedwise it is now on par with finstr, sizewise still very compact, see below (I could not calculate sizes for the external strstr and StrStrI functions).
The problem with the BM algo persists, see source.
In any case, if Lingo succeeds to make his search case-insensitive without sacrificing too much on speed, we might have a clear winner, at least for short patterns :wink

----------------------
12031 average crt_strstr res=4229691
11006 average finstr res=2619
6615 average InStringL res=4229691
10907 average InstrCi 1s res=4229691
10563 average InstrCi 2s res=4229691
11007 average InstrCi 1L res=4229691
10744 average InstrCi 2L res=4229691
10805 * average InstrCi
201 diff finstr-InstrCi (neg = finstr is faster)
9751 avg Boyer-Moore s res=2619
7768 avg Boyer-Moore L res=2616
1548024 average StrStrI res=4229691

Sizes:
596 finstr
305 InStringL
150 InstrCi
199 Boyer-Moore
----------------------

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: hutch-- on June 17, 2008, 10:51:21 AM

jj,

I think you can do a reasonably fast version using a table depending on the design you use to do it. Once the table is in cache you don't keep getting the same penalty as first time. Even as a classic byte scanner if you convert the first character in the pattern to a single case that matches the table you get a memory read for the character and one from the table but it should only be marginally slower than the case sensitive scanner.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 17, 2008, 12:17:10 PM

Quote from: hutch-- on June 17, 2008, 10:51:21 AM
I think you can do a reasonably fast version using a table depending on the design you use to do it. Once the table is in cache you don't keep getting the same penalty as first time. Even as a classic byte scanner if you convert the first character in the pattern to a single case that matches the table you get a memory read for the character and one from the table but it should only be marginally slower than the case sensitive scanner.

Sure it could be done, but why? InstrCi is already a little bit faster than finstr, and a lot faster than the Masm32 HLL istring - comparing apples and oranges because the latter perform only a case-sensitive search. If there is a true need for doing case-insensitive search on Chinese text, then the correct way is to copy the whole source to a buffer, call CharLower, do the same with the pattern and run Lingo's super fast algo over it; same with BM for long patterns.

----------------------
12176 average crt_strstr res=4233
10966 average finstr res=2619
7619 average InStringL res=4233787
10708 average InstrCi 1s res=4233
10785 average InstrCi 2s res=4233
10757 average InstrCi 1L res=4233
10935 average InstrCi 2L res=4233
10796 * average InstrCi
170 diff finstr-InstrCi (neg = finstr is faster)
10247 avg Boyer-Moore s res=2619
7954 avg Boyer-Moore L res=2616
15156 Hll istring res=2620
1501106 average StrStrI res=4233787

Sizes:
596 finstr
305 InStringL
150 InstrCi
199 Boyer-Moore
----------------------

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on June 17, 2008, 09:31:18 PM

> InstrCi is already a little bit faster than finstr

To my embarassment, this is not true for my other machine, a P4 2.4 GHz. But here InstrCi has an edge over the Boyer-Moore algo, at least for short patterns. Any more benchmarks?

----------------------
9508 average crt_strstr res=4233787
7381 average finstr res=2619
5539 average InStringL res=4233787
7499 average InstrCi 1s res=4233787
7462 average InstrCi 2s res=4233787
7691 average InstrCi 1L res=4233787
7683 average InstrCi 2L res=4233787
7583 * average InstrCi
-202 diff finstr-InstrCi (neg = finstr is faster)
798 diff BM s - InstrCi (neg = BM is faster)
8381 avg Boyer-Moore s res=2619
7409 avg Boyer-Moore L res=2616
11857 Hll istring res=2620
1945881 average StrStrI res=4233787

Sizes:
596 finstr
305 InStringL
187 InstrCi
199 Boyer-Moore
----------------------

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: Ian_B on June 20, 2008, 05:11:16 AM

Hutch, I'm honoured you remember my optimisation experiments. :red

Here is the original test piece I put together. I'm sure other people (Lingo) have better versions. My starting point was the existing MASM library, especially Hutch's Boyer-Moore which I thought I could see scope for improvement in. There was a different Boyer-Moore design floating around by Cresta that was extremely poorly optimised that looked as if it had potential, especially once an existing bug had been removed. I also wanted to confirm that the speed of the Boyer-Moore designs can be varied by the size of the shift-table used. For a 255-byte table, you can search strings up to 254 chars in length, or for 65535 for strings 65534 in length, etc. Obviously the most efficient searching takes place with smaller tables, because they take less time setting up, and because the algorithm has to do some memory operations using [ECX*n] modifiers, and if you can avoid using the *4 modifier needed for the 4Gb stringlength search table (256 DWORDs in table) you get less latency on those ops. For max efficiency you just need to select the version that handles strings at least as long as the one you want to search for.

I wanted to see how the variations compared not just on classic random text, their prime use for most people I guess, but also on some synthetic data. The reason was that Hutch's original complaint about my reducing the table to 255 chars IIRC was that the BMs were good for doing viral pattern matching with extremely large search lengths, which obviously isn't the same as plain text searching. That's why the search strings are a little odd. There's a random text search at the end of the test suite, though, but really it should be done with a much larger text sample for better results.

There's some comments in the file about the results on my P4 Northwood. The results I got indicated the Cresta rewrites are definitely the best overall, due to their efficient DWORD test approach, and they might be a good basis for further optimisation if anyone's interested. They are also easily altered to produce compact dedicated search routines for specific text where you don't need a general algo, and you can add the search table in as pre-calculated data that avoids the setup overhead, useful if you need the same few searches over and over in an app.

Regarding case-insensitive searching, if I was using this code to create such a search I'd add a setup section to the algo in which both the search string and the text to search were capitalised before running the search using your OR 20202020h method. That way you only do the OR once for each byte in the string, rather than speculatively over and over in the algo.

Note my preference for more efficient functions using params passed in registers over stack params and frames. It also allows easier use of EBP as a spare reg. Sorry if that annoys people. You can recode to your preference if you like. :P

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 07, 2009, 06:30:06 PM

Update on Instring:

Code Select

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

Average cycle count:
5035     InstrJJ (jj2007)
4097     BMLinDB (Lingo)
16176    MasmLib InstringL
InstrJJ : InstringL = 31 %

Code Select


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Average cycle counts:
2476     InstrJJ (jj2007)
2990     BMLinDB (Lingo)
6773     BM (Boyer-Moore Hutch)
10158    InString (Masm32 Lib)
InstrJJ : InString = 24 %

Code sizes:
InstrJJ      = 388
BM           = 199
BMLinDB      = 395
InStringL    = 305

Requires SSE2. I would be interested to see some AMD and Dual Core timings :bg

EDIT: Shaved off 7 bytes and >50 cycles.
EDIT(2): Minor bugfix in the one-byte pattern section, 2 bytes less (they caused the bug :red).
EDIT(3): Attachment removed - will be back when certain oddities are fixed.
EDIT(4): Oddities are fixed, attachment is back, but code is longer now :bg
EDIT(5): Could not resist to add the find "Duplicate inc" in Windows.inc test :wink

I have tried to test quite a number of odd mainstring/pattern combinations, and it seems to be foolproof now, but there might still be surprises (also for other included algos). Use at your own risk... and please give me feedback if it bounces.

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: lingo on May 07, 2009, 10:14:50 PM

Latest result:

Code Select


Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)

Search Test 1 - value expected 37; lenSrchPattern ->22
InString - JJ:                         38 ; clocks: 104
InString - Lingo:                      37 ; clocks: 39

Search Test 2 - value expected 1007; lenSrchPattern ->17
InString - JJ:                         1008 ; clocks: 23455
InString - Lingo:                      1007 ; clocks: 6395

Search Test 3 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 686
InString - Lingo:                      1008 ; clocks: 502

Search Test 4 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 4326
InString - Lingo:                      1008 ; clocks: 1417

Search Test 5 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 3367
InString - Lingo:                      1008 ; clocks: 1308

Search Test 6 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 641
InString - Lingo:                      1008 ; clocks: 498

Search Test 7 - value expected 1009 ;lenSrchPattern ->14
InString - JJ:                         1010 ; clocks: 621
InString - Lingo:                      1009 ; clocks: 502

Search Test 8 - value expected 1001 ;lenSrchPattern ->1
InString - JJ:                         1002 ; clocks: 324
InString - Lingo:                      1001 ; clocks: 102

Search Test 9 - value expected 1001 ;lenSrchPattern ->2
InString - JJ:                         1002 ; clocks: 629
InString - Lingo:                      1001 ; clocks: 470

Search Test 10 - value expected 1001 ;lenSrchPattern ->3
InString - JJ:                         1002 ; clocks: 638
InString - Lingo:                      1001 ; clocks: 448

Search Test 11 - value expected 1001 ;lenSrchPattern ->4
InString - JJ:                         1002 ; clocks: 643
InString - Lingo:                      1001 ; clocks: 504

Search Test 12 - value expected 1001 ;lenSrchPattern ->5
InString - JJ:                         1002 ; clocks: 787
InString - Lingo:                      1001 ; clocks: 639

Search Test 13 --Find 'Duplicate inc' in 'windows.inc' ;lenSrchPattern ->13
InString - JJ:                         1127625 ; clocks: 679641
InString - Lingo:                      1127624 ; clocks: 542808

 Press ENTER to exit...

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 07, 2009, 10:56:07 PM

what makes it faster ?

(where's the beef?)

Title: Re: InstrCi - case-insensitive string search
Post by: Mark Jones on May 07, 2009, 11:48:19 PM

Code Select

AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
...
Average cycle count:
2955     InstrJJ (jj2007)
3891     BMLinDB (Lingo)
7274     Boyer-Moore
13815    MasmLib InstringL
InstrJJ : InstringL = 21 %
Code size InstrJJ=366

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 08, 2009, 05:48:14 AM

Quote from: dedndave on May 07, 2009, 10:56:07 PM
what makes it faster ?

(where's the beef?)

Here is the inner loop (simplified):

Code Select

L1:	movdqa xmm1, [esi]	; load 16 bytes from current aligned address
	movdqu xmm4, [esi+1]	; load another 16 bytes UNALIGNED - must be in the middle
	movdqa xmm2, [esi]	; save 16 bytes for testing the zero delimiter - faster than movdqa xmm2, xmm1
Lu:	lea esi, [esi+16]	; position in mainstring (moving up/down lea or add costs cycles)
	pcmpeqw xmm1, xmm3	; compare packed words in xmm1 and xmm3 for equality
	pcmpeqw xmm4, xmm3	; compare packed words in xmm4 and xmm3 for equality
	pcmpeqb xmm2, xmm1	; xmm1 is filled with either 0 or FF; if it's FF, the byte at that position cannot be zero
	pmovmskb eax, xmm2	; set byte mask in ecx for zero delimiter
	pmovmskb edx, xmm1	; set byte mask in edx for search pattern word
	pmovmskb ecx, xmm4	; set byte mask in ecx for search pattern word
	shl ecx, 1		; adjust for esi+1 (add ecx, ecx is a lot slower)
	test eax, eax		; zero byte found?
	jnz @F			; check/clear ebp, then ChkNull
	or edx, ecx		; one of them needs to have the word
	je L1			; 0=no pattern byte found, go back

The secret is in pcmpeqw:
Compare packed words in xmm2/m128 and xmm1 for equality (http://www.intel.com/software/products/documentation/vlin/mergedprojects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc231.htm)

The SSE2 loop is very fast, much faster than all the rest, because it reads 16 bytes at a time. The art is not to leave it too often. Traditional Instring implementations have a byte scanner as the inner loop: Take the first character of your search pattern, e.g. d for dedndave, and loop until you find it; then check the rest, if it's edndave, great, otherwise back to the inner loop. You can imagine that you have to leave the inner loop very often for e, n, s etc. - and that your algo gets slowed down drastically, in spite of the fast inner loop.

Now the trick here is to use a word scanner: de is a lot less frequent that a single d - statistically speaking 256 times less frequent. The inner loop is a little bit slower, because you need some acrobatics (and one slow unaligned movedqu) to implement a word scanner, but it pays off in general. The only algo that comes close is Lingo's Boyer-Moore implementation (http://www.masm32.com/board/index.php?topic=3601.msg83512#msg83512) (and it is even faster on a P4).

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 08, 2009, 06:05:42 AM

very cool JJ - I have never used any of the SSE instructions
I am finding it hard to remember all the regular ones - lol
Not as young as I used to be, but I will get there
For my brain to remember, I have to use an instruction someplace at least once

Title: Re: InstrCi - case-insensitive string search
Post by: Mark Jones on May 08, 2009, 09:44:37 PM

Whatever happened with measuring code in "LAMPs"? :bg

Code Select

AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)

Cycle counts:
3224    BMLinDB, addr Mainstr, addr TestSubEx, (SIZEOF TestSubEx)-1
4408    BMLinDB, addr Mainstr, addr TestSubDx, (SIZEOF TestSubDx)-1
4386    BMLinDB, addr Mainstr, addr TestSubXx, (SIZEOF TestSubXx)-1

11922   InString, 1, addr Mainstr, addr TestSubE
13401   InString, 1, addr Mainstr, addr TestSubD
16522   InString, 1, addr Mainstr, addr TestSubX

1774    InstrJJ, 1, addr Mainstr, addr TestSubE, 0
1770    InstrJJ, 1, addr Mainstr, addr TestSubE, 0
1771    InstrJJ, 1, addr Mainstr, addr TestSubE, 0

1834    InstrJJ, 1, addr Mainstr, addr TestSubD, 0
1823    InstrJJ, 1, addr Mainstr, addr TestSubD, 0
1835    InstrJJ, 1, addr Mainstr, addr TestSubD, 0

5656    InstrJJ, 1, addr Mainstr, addr TestSubX, 0
5609    InstrJJ, 1, addr Mainstr, addr TestSubX, 0
5633    InstrJJ, 1, addr Mainstr, addr TestSubX, 0

Average cycle count:
3078     InstrJJ (jj2007)
4006     BMLinDB (Lingo)
13948    MasmLib InstringL
InstrJJ : InstringL = 22 %
Code sizes:
InstrJJ      = 359
BM           = 199
BMLinDB      = 395
InStringL    = 305

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 08, 2009, 10:22:10 PM

never heard of LAMPS

we used KOPS
MOPS
FLOPS
kinda like "gallons per mile" - lol

they were really meaningless terms because noone could agree on what constituted one operation
i think each benchmark i saw used a slightly different definition

many benchmarks measured how fast they could do a bunch of NOP's

the real-world standard was how fast you could re-calculate some pre-defined spread-sheet with Lotus 123 - lol

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 08, 2009, 11:31:32 PM

Quote from: Mark Jones on May 08, 2009, 09:44:37 PM
Whatever happened with measuring code in "LAMPs"? :bg

Right now I am a bit confused with my macros :dazzled:

Code Select

AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)

3078     InstrJJ (jj2007)
4006     BMLinDB (Lingo)
13948    MasmLib InstringL
InstrJJ : InstringL = 22 %
Code sizes:
InstrJJ      = 359
BM           = 199
BMLinDB      = 395
InStringL    = 305

I love it, thanks for the timings :bg

Anyway, it seems even LAMP (http://www.masm32.com/board/index.php?topic=9437.msg68698#msg68698)-wise the algo is competitive - SSE2 rocks :toothy

I had to post a bugfix, see above. The algo reported wrong values when a one-byte pattern was found behind the main strings zero delimiter.

Title: Re: InstrCi - case-insensitive string search
Post by: Mark Jones on May 09, 2009, 12:07:01 AM

Welcome. Keep in mind that Lingo optimizes for his hardware, so his timings are still valid. In fact, AMD has been continually losing market share (http://www.xbitlabs.com/news/cpu/display/20080407081556_AMD_Continues_to_Lose_Market_Share_to_Intel_Analysts.html) for a long time now, and AMD is in the vast majority. Someday, there may not be any more AMD chips (http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~131177,00.html) to worry about timing. ::)

Curious, why did you choose CLOCKS * SQRT(SIZE) for the LAMP rating?

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 09, 2009, 06:50:22 AM

Quote from: Mark Jones on May 09, 2009, 12:07:01 AM

Curious, why did you choose CLOCKS * SQRT(SIZE) for the LAMP rating?

There is no "exact science" behind this formula, but it reflects a bit the philosophy of many who come to this forum: Lean and mean performance, i.e. code that is small and fast - the equivalent to WIN32_LEAN_AND_MEAN. Of course, nobody would choose code that is half in size but twice as slow, but on the other hand many here would hesitate to put into their libraries code that is 2% faster but twice as big. So I chose the square root of size, in order to reduce its influence.

In the end, you will often opt for the faster code, but the LAMP's are an incentive to care also for size.
:bg

Code Select

MinLampSize = 200
LAMP$ MACRO cycles:REQ, csize:REQ	; Usage: print LAMP$(cycles, size), " LAMPs", 13, 10
	push ecx
	push cycles
	ffree ST(7)			;; free the stack for pushing
	mov ecx, csize
	.if ecx==0
		add ecx, MinLampSize
	.endif
	push ecx
	fild dword ptr [esp]		;; move into ST (0)
	pop ecx
	fsqrt
	push cycles
	fild dword ptr [esp]		;; push cycles on FPU
	fmul ST, ST(1)		;; multiply with sqrt(size)
	fistp dword ptr [esp]
	pop eax
	ifndef LampsBuffer
	.data?
	LampsCycles	dd ?
	LampsBuffer	dd 80 dup (?)
	.code
	endif
	shr eax, 10					;; kiloLAMPS
	invoke dwtoa, eax, addr LampsBuffer
	pop eax
	shr eax, 10
	mov LampsCycles, eax
	pop ecx
EXITM <offset LampsBuffer>
ENDM

Usage:

Code Select

	print "getlines:", 13, 10
	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		invoke getlines, Win$, cbFile
		mov esi, eax
	counter_end
	print LAMP$(eax, offset getlinesE-getlines), " kiloLAMPs"
	print str$(LampsCycles), " kilocycles"

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 09, 2009, 11:45:58 PM

Bugfix for the InstrJJ SSE2 version posted, see attachment above (http://www.masm32.com/board/index.php?topic=9370.msg84533#msg84533). Size increased a bit, speed is still the same:

Code Select

Average cycle counts:
2485     InstrJJ (jj2007)
2990     BMLinDB (Lingo)
10139    InString (Masm32 Lib)
InstrJJ : InString = 24 %

Apologies to those who downloaded the previous versions.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 11, 2009, 09:58:51 AM

Hi everybody,
I have switched from SSE2 word scanner to byte scanner with improved non-SSE loops, it is now a few cycles slower on a P4 but I have a suspicion that it rocks on Core & AMD.

Code Select

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
Average cycle counts:
4280     InstrJJ (jj2007)
16486    InString (Masm32 Lib)

Can I have some timings, please? Thanks.

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 11, 2009, 10:18:05 AM

Code Select

c:\china>\masm32\bin\ml.exe /c /coff InstrCi11_BL_BSF.asm
Microsoft (R) Macro Assembler Version 6.15.8803
        Patched for you by promethee [ECL] in the year 2001 - enjoy
Copyright (C) Microsoft Corp 1981-2000.  All rights reserved.

 Assembling: InstrCi11_BL_BSF.asm

c:\china>\masm32\bin\link.exe /subsystem:console InstrCi11_BL
_BSF.obj
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

c:\china>InstrCi11_BL_BSF.exe
Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Test correctness for InstrJJ: OK
Average cycle counts:
1748     InstrJJ (jj2007)
2817     BMLinDB (Lingo)
10305    InString (Masm32 Lib)
InstrJJ : InString = 16 %

Code sizes:
InstrJJ      = 319
BMLinDB      = 395
InStringL    = 305

c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 11, 2009, 11:27:22 AM

> InstrJJ : InString = 16 %

Thanks, UtillMasm. One third faster than the previous version, not too bad :8)

Title: Re: InstrCi - case-insensitive string search
Post by: fearless on May 11, 2009, 03:41:20 PM

Code Select

Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE4)
Test correctness for InstrJJ: OK
Average cycle counts:
1148     InstrJJ (jj2007)
2725     BMLinDB (Lingo)
9355     InString (Masm32 Lib)
InstrJJ : InString = 12 %

Code sizes:
InstrJJ      = 319
BMLinDB      = 395
InStringL    = 305

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 11, 2009, 04:14:10 PM

Quote from: fearless on May 11, 2009, 03:41:20 PM

Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz (SSE4)
..
InstrJJ : InString = 12 %

Thanks. For a P4 I get now 25%, for the T2400 16%, and a Quad with SSE4 gives factor 8 compared to non-SSE2. Interesting pattern...

Quote

Title: Re: InstrCi - case-insensitive string search
Post by: redskull on May 11, 2009, 04:16:54 PM

Code Select

Intel(R) Core(TM)2 Duo CPU     E4500  @ 2.20GHz (SSE4)
Test correctness for InstrJJ: OK
Average cycle counts:
1071     InstrJJ (jj2007)
2728     BMLinDB (Lingo)
9324     InString (Masm32 Lib)
InstrJJ : InString = 11 %

Code sizes:
InstrJJ      = 319
BMLinDB      = 395
InStringL    = 305

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 11, 2009, 04:49:51 PM

Quote from: redskull on May 11, 2009, 04:16:54 PM
Code Select Expand
Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (SSE4) Test correctness for InstrJJ: OK Average cycle counts: 1071 InstrJJ (jj2007) 2728 BMLinDB (Lingo) 9324 InString (Masm32 Lib) InstrJJ : InString = 11 %

Thanks. Not a Quad but SSE4. On my old one-core CPU, I get much less of an improvement:

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Test correctness for InstrJJ: OK
Average cycle counts:
1717     InstrJJ (jj2007)
2780     BMLinDB (Lingo)
10196    InString (Masm32 Lib)
InstrJJ : InString = 16 %

Lingo's algo is SSE2, too, and pretty stable, so I guess Intel has made progress on a few very specific instructions that I use in the inner loop.

Code Select

L1:	movdqa xmm2, [esi]
Lu:	movdqa xmm1, xmm0	; xmm0 has first byte of pattern
	lea esi, [esi+16]	; position in mainstring
	pcmpeqb xmm1, xmm2	; compare memory to pattern byte
	pmovmskb edx, xmm1	; set byte mask in edx for search pattern byte
	pcmpeqb xmm1, xmm2	; look for zero byte (xmm1 is either 0 or ff)
	pmovmskb eax, xmm1	; set byte mask in eax for zero delimiter byte
	test eax, eax		; db 02eh	; branch hint: not taken, no effect
	jne ZBF			; zero byte found? check/clear ebp, then ChkNull
	test edx, edx		; db 03eh	; branch hint: taken, no effect
	je L1			; pattern byte found? no=go back, yes=repeat the exercise with a word

Title: Re: InstrCi - case-insensitive string search
Post by: FORTRANS on May 11, 2009, 05:39:30 PM

Hi,

AMD Athlon(tm) 64 Processor 3200+ (SSE2)
Test correctness for InstrJJ: OK
Average cycle counts:
3106 InstrJJ (jj2007)
3872 BMLinDB (Lingo)
13736 InString (Masm32 Lib)
InstrJJ : InString = 22 %

Code sizes:
InstrJJ = 319
BMLinDB = 395
InStringL = 305

Regards,

Steve N.

Title: Re: InstrCi - case-insensitive string search
Post by: Mark Jones on May 11, 2009, 06:51:23 PM

Code Select


AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
Test correctness for InstrJJ: OK
Average cycle counts:
2985     InstrJJ (jj2007)
3872     BMLinDB (Lingo)
13725    InString (Masm32 Lib)
InstrJJ : InString = 21 %

Code sizes:
InstrJJ      = 319
BMLinDB      = 395
InStringL    = 305

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 11, 2009, 07:33:48 PM

Thanks to all who tested this code. I have run into a very odd problem that's driving me nuts:

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Test correctness for InstrJJ: OK
Average cycle counts:
2636 InstrJJ (jj2007)
2778 BMLinDB (Lingo)
10198 InString (Masm32 Lib)
InstrJJ : InString = 25 %

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Test correctness for InstrJJ: OK
Average cycle counts:
1719 InstrJJ (jj2007)
2778 BMLinDB (Lingo)
10088 InString (Masm32 Lib)
InstrJJ : InString = 17 %

The only difference is the IFNOP switch in line 6 of the attachment.
If set to 1, it inserts an apparently harmless

   .if res1            ;; any global variable will do, but not .if eax
      nop
   .endif

into the main loop of the timings macro:

Code Select

  REPEAT REP_CT
  	if IFNOP
	.if res1				;; any global variable will do, but not .if eax
		nop
	.endif
	endif
	invoke Sleep, 50
	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		tmp$
	counter_end
	add globcycles, eax
	inc globcount
	if DisplayCycles
	.if eax>65536
		shr eax, 10
	.endif
	out$ CATSTR <print str$(eax), 9, ">, Description, <", 13, 10>
	out$
	endif
  ENDM

Otherwise the code is identical. Any ideas?
::)

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: Jimg on May 11, 2009, 08:14:15 PM

Why are you so surprised? I see this all the time on amd.
You move the call to the routine by n bytes. I didn't actually assemble it since I don't have sse2, but check for odd/even address, etc.
Some instructions prefer to be on an even address while others an odd address.

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 11, 2009, 09:10:33 PM

that is a HUGE difference, though
it makes me wonder if it is worth the effort to optimize, beyond a certain point
get it running sufficiently fast, then insert nop's behind each instruction to tune it in - lol

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 11, 2009, 09:43:22 PM

Quote from: Jimg on May 11, 2009, 08:14:15 PM
Some instructions prefer to be on an even address while others an odd address.

Both data and code are aligned 16 plus a fixed offset. But it is an "alignment" issue - I can reproduce the effect by inserting a given number of nops before the getkey+exit...

Do the two exes produce different results on your CPUs??

:(

EDIT: I managed to get the difference between the slow and fast calls to the algo:

Code Select

;CPU Disasm FAST
;Address               Hex dump                       Command
;004036D2              ? 68 6C614000                  push offset InstrCi11_BL_BSF.0040616C
;004036D7              ? 68 47574000                  push offset InstrCi11_BL_BSF.00405747
;004036DC              . 6A 01                        push 1
;004036DE              . E8 3D0E0000                  call 00404520 #######
;004036E3              . 832D 047A4000 01             sub dword ptr [407A04], 1

;CPU Disasm slow
;004036D2              ? 68 6C614000                  push offset InstrCi11_BL_BSF.0040616C
;004036D7              ? 68 47574000                  push offset InstrCi11_BL_BSF.00405747
;004036DC              . 6A 01                        push 1
;004036DE              . E8 2D0E0000                  call 00404510 #######
;004036E3              . 832D 047A4000 01             sub dword ptr [407A04], 1

A code caching issue maybe??

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 11, 2009, 10:08:22 PM

Quote from: dedndave on May 11, 2009, 09:10:33 PM
that is a HUGE difference, though
it makes me wonder if it is worth the effort to optimize, beyond a certain point
get it running sufficiently fast, then insert nop's behind each instruction to tune it in - lol

Inserting nops inside the algo helps sometimes :bg

But here it's from where you call it that makes the difference. The algo itself, and the data addresses, are unchanged... same algo size, no sudden conversions from near to far jumps etc...

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 11, 2009, 11:11:41 PM

i don't think the stack cares if you push an odd number or an even one
it has to be related to the EIP, itself
perhaps the lag occurs upon returning from the call, as opposed to making the call
so that, after the routine is done, it wants an even addy to return to ?
it does have to re-fill the queue at that point

now, we just need an assembler that will even-align (16-byte align - ouch) the return addy of calls - what a difference !
hint - hint GoAsm guys

Title: Re: InstrCi - case-insensitive string search
Post by: MichaelW on May 12, 2009, 03:59:54 AM

I haven't been following this thread, but if the alignment of the return address is a problem it can readily be aligned by any one of several methods. For example:

Code Select


    ; push arguments
    push OFFSET return_label
    push OFFSET procedure
    ret
    align 16
  return_label:

Note that the OFFSET directive is not necessary, it was just added for clarity.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 05:06:37 AM

Quote from: MichaelW on May 12, 2009, 03:59:54 AM
I haven't been following this thread, but if the alignment of the return address is a problem

Thanks. It is unlikely that this is the problem because the return address is 004036E3 in both the slow and fast case. Somewhere about 1000 cycles per call get lost, and I would really like to understand why...

Title: Re: InstrCi - case-insensitive string search
Post by: MichaelW on May 12, 2009, 06:27:35 AM

I can't see any reason for the difference, but your app is so complex that it would be easy to miss a problem. You could try isolating the problem components in a simple test app, where there are fewer opportunities for errors and side effects.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 06:40:54 AM

Quote from: MichaelW on May 12, 2009, 06:27:35 AM
I can't see any reason for the difference, but your app is so complex that it would be easy to miss a problem. You could try isolating the problem components in a simple test app, where there are fewer opportunities for errors and side effects.

Thanks, Michael, for looking at this. The app is 28k including the "Mainstring" it searches; it may be complex in a way, but algo start is para-aligned, data alignment is fixed, too, and it is so small that code and data should fit into the cache. So how can moving the algo 16 bytes further down cost 1,000 cycles per call?

Isolating the problem components is of course a good idea, but most probably I won't be able to reproduce the effect any more.
I am lost... ::)

EDIT: Just tested the two exes attached above on P4:

Code Select

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

InstrCi11_IfNop_OFF.exe
4233     InstrJJ (jj2007)
4020     BMLinDB (Lingo)

InstrCi11_IfNop_ON.exe
4275     InstrJJ (jj2007)
4045     BMLinDB (Lingo)

Not the slightest difference. On the Celeron, there is a consistent difference of 1,000 cycles for InstrJJ but none for BMLinDB. So it is even CPU-specific... and the P4 is usually more sensitive to alignment than the more recent Celeron M.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 07:44:15 AM

Quote from: Jimg on May 11, 2009, 08:14:15 PM
Why are you so surprised? I see this all the time on amd.

Jim,
Did you ever find an explanation why this can happen? Some pattern that would point to possible causes? Does it happen in your big and small apps alike? What is the order of magnitude you observed: +40%=1000 cycles difference, or much less?

Title: Re: InstrCi - case-insensitive string search
Post by: MichaelW on May 12, 2009, 08:37:24 AM

Perhaps the code with the extra 16 bytes will not fit in the cache, or some critical part of it will not fit. If this is so, adding more bytes should eventually cause the code to misbehave on other processors.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 11:55:25 AM

Quote from: MichaelW on May 12, 2009, 08:37:24 AM
Perhaps the code with the extra 16 bytes will not fit in the cache, or some critical part of it will not fit. If this is so, adding more bytes should eventually cause the code to misbehave on other processors.

The Celeron M 420 has 32kb of L1 cache (http://en.kioskea.net/guide/details/989288-intel-celeron-420-1-60-ghz-socket-775-800-mhz-bus), so the 28k exe should comfortably fit into the cache. But I am not a hardware specialist...

By the way, I cross-checked with GetTickCount, and the cycle counts are correct. Calling the algo is 1,000 cycles slower when it sits 16 bytes further ~~down~~ up:boohoo:

EDIT: Found a good article on Optimizing for instruction caches (http://www.dspdesignline.com/howto/202601670):

(page 2): "The effective size of a function in the cache can vary depending on its alignment. Suppose a function requires less than one line of cache, but that it overlaps a cache line boundary. In this case, the function will require allocation of two cache lines, substantially increasing it effective size".

Sounds plausible, but can an executable with only 28k overlap with a cache line boundary if the L1 cache is 32k? Is that the explanation?? If I understand the article correctly, two cache lines may be allocated, fine. But only once, or for every call, costing 1000 cycles??

Title: Re: InstrCi - case-insensitive string search
Post by: Jimg on May 12, 2009, 02:56:44 PM

Quote from: jj2007 on May 12, 2009, 07:44:15 AM
Quote from: Jimg on May 11, 2009, 08:14:15 PM
Why are you so surprised? I see this all the time on amd.

Jim,
Did you ever find an explanation why this can happen? Some pattern that would point to possible causes? Does it happen in your big and small apps alike? What is the order of magnitude you observed: +40%=1000 cycles difference, or much less?

I only know it happens. In fact, a couple of years ago, I set up my time testing routines to actually copy the code under test to the same physical memory location before starting the test, so every call was from and to the same location. About the time I got this all working, I realized I would never be able to play with the big boys because I had an AMD and they were all Pentium-4 centric. Took all the fun out of it.

Here's a sample. I was getting rock solid repeatability with only ten loops.

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 12, 2009, 02:59:40 PM

I only see slight differences with a Prescott....

InstrCi11_IfNop_ON.exe

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Test correctness for InstrJJ: OK
Average cycle counts:
4295 InstrJJ (jj2007)
3995 BMLinDB (Lingo)
16355 InString (Masm32 Lib)
InstrJJ : InString = 26 %

InstrCi11_IfNop_OFF.exe

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Test correctness for InstrJJ: OK
Average cycle counts:
4286 InstrJJ (jj2007)
4045 BMLinDB (Lingo)
16671 InString (Masm32 Lib)
InstrJJ : InString = 25 %

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 04:02:17 PM

Quote from: Jimg on May 12, 2009, 02:56:44 PM

I only know it happens. In fact, a couple of years ago, I set up my time testing routines to actually copy the code under test to the same physical memory location before starting the test, so every call was from and to the same location. About the time I got this all working, I realized I would never be able to play with the big boys because I had an AMD and they were all Pentium-4 centric. Took all the fun out of it.

Here's a sample. I was getting rock solid repeatability with only ten loops.

Thanks, very interesting. It seems to be linked to
instruction cache locality (http://www.google.it/search?hl=en&safe=off&num=50&newwindow=1&q=%22instruction+cache+locality%22&btnG=Search), although I still do not understand how this should affect a 300-byte algo in a 28k executable on a CPU with 32k L1 cache...

Quote from: dedndave on May 12, 2009, 02:59:40 PM
I only see slight differences with a Prescott....

My P4 is a Prescott, too - same result: no difference. It is CPU-specific.

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 12, 2009, 04:09:25 PM

Code Select

c:\china>\masm32\bin\ml.exe /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997.  All rights reserved.

 Assembling: instrnew.asm

c:\china>\masm32\bin\link.exe /subsystem:console instrnew.obj
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.


c:\china>instrnew.exe
Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mmword
======= ======== ======== ======== ======== ========
OVH          264        0        0        0        0
InStrL        88      132      363      462     1155
finstr        88      286      429      550     1496
finstr2      154      209      363      363     1617
finstr3      143      209      352      352     1507
finstr4       88      308      451      594     1496
finstr5       88      319      451      605     1496
finstr6      154      209      363      363     1617

Press enter to exit...

c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 12, 2009, 04:17:14 PM

This subject brings to mind something that has troubled me a bit.
If a programmer tests/optimizes his/her code on one machine, as many of us would
during a normal session, the code is likely to become biased for that CPU.
As a group, it would be nice if we could standardize testing and document
most of our results in one easy-to-use reference. That way, if a programmer
(especially a n00b like me) wants to find an optimization that has been
tested on a variety of platforms, he/she doesn't have to search and read all
the forum threads concerning a single subject. I am not refering to major
algorithms, but simple little things that have been widely tested.
It seems impractical to have all the members test every algo we write, in
order to acquire a multi-platform sampling. On the other side of the coin,
we don't want finished code that is (sometimes unknowingly) optimized
for a specific processor. There must be a solution someplace.

As for Jochen's issue, it is machine specific, but all of us want to know what
the solution is so that we may attempt to avoid the issue in code that is
written and tested on a different CPU. It makes me wonder how many
programs out there suffer on one or more specific CPU(s).

JJ - This morning I was reading Agner's assembly optimization manual.
One of the things that made me think of your problem was his
statement "Make sure it isn't the test program". He went on to say
that many hours are spent trying to fix an algorithm, when it turns
out to be the testing method at fault. I am not saying your test-bed
has a flaw, I just wanted you to consider the possibilities. At our age,
we can't afford to pull out too much of the hair we have.

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 12, 2009, 04:28:17 PM

it's possible, we can make a custom cpu(or SoftCpu) for our program in furture! :toothy

Title: Re: InstrCi - case-insensitive string search
Post by: fearless on May 12, 2009, 06:21:20 PM

Quote from: dedndave on May 12, 2009, 04:17:14 PM
..he/she doesn't have to search and read all
the forum threads concerning a single subject.

I agree. Also could there be a summary thread made for all the tested algos that would allow us just to refer to it for the very latest downloads and information. Another thing i would like to see is an indicator of the definitive best solution algo (not always possible i know) or in place of that a recommendation on the best one (or more than one - but try to keep it to one) to go with and a note on the one to avoid or at use the least.

All this in one place would be a very handy resource. Some threads are massive and/or there are multiple threads for similar topics. For example szLen optimize (http://www.masm32.com/board/index.php?topic=1807.0) has 25 pages, and there are two threads on string copying: lstrcpy vs szCopy (http://www.masm32.com/board/index.php?topic=10830.0) has 4 pages, SzCpy vs. lstrcpy (http://www.masm32.com/board/index.php?topic=1589.0) has 8 pages. Could the lab implement other subforums here for different categories: string algos & optimizing, memory algos & optimizing, etc etc. I realise that could take a lot of work, also the placement of topics into the correct subforums may coz moderators grief in moving and sorting etc. Maybe a post template could also be created that defines the information that should be provided when creating a new thread in the lab for purposes of presenting a new or modified algo.

Just some ideas, hope no one takes offence.

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 06:35:33 PM

Quote from: Jimg on May 12, 2009, 02:56:44 PM

Here's a sample. I was getting rock solid repeatability with only ten loops.

My 2 cts - Celeron M. It took me a while because it choked with ml 9.0 ;-)
Well organised table, by the way :U

Code Select

Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mmword
======= ======== ======== ======== ======== ========
OVH          264        0        0        0        0
InStrL        96      132      348      468     1152
finstr        96      288      432      552     1500
finstr2      144      216      372      372     1608
finstr3      144      204      348      348     1512
finstr4       84      300      456      588     1500
finstr5       84      324      456      612     1488
finstr6      144      216      360      372     1608

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 07:01:53 PM

Quote from: fearless on May 12, 2009, 06:21:20 PM
Just some ideas, hope no one takes offence.

Excellent ideas, fearless. Some of these threads are really too detailed (and I plead guilty :bg), and unfortunately not always on topic. Good coders tend to have strong personalities :wink

Standardisation could help a lot - Michael's timer macros are a fantastic achievement, but we might add some rules, e.g. how to pass parameters (we all know that fastcall is a few cycles faster...). You may have seen the timings macro in my posts above, I have done that one to suit my own laziness, but I believe we could work out something along those lines. Not to mention LAMPs :wink

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 12, 2009, 07:16:41 PM

all good ideas
i agree with fearless - some of the threads require a lot of reading that does not pertain to what you want to find
perhaps we could start by outlining some goals and making a list of which members have which CPU's
that way, when a new piece of code is introduced, we can get it tested on several platforms, then hash over the optimization
it would give all members a chance to contribute - acknowledging that a few members are really good at it - some not so good (like me - lol)
along the way, we all get to learn tricks and tips
i know some old stuff that needs to be modernized by the optimizing wizards that hang out in here
but, for me, it will be mostly a learning experience

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 12, 2009, 08:02:30 PM

One more element to the never ending story: I tried another mainstring/pattern with otherwise identical settings.

Result: For the "slow" setting, execution is around half as fast and cycle counts are very erratic, but they never reach 1000-1200 cycles more as with the long mainstring. Which means a proportional pattern, i.e. the algo just runs with breaks on. The fast version runs very stable at 325 cycles.

Intel(R) Celeron(R) M CPU 420
Average cycle counts:

Slow version:
600-800 InstrJJ

Fast version:
325 InstrJJ

Title: Re: InstrCi - case-insensitive string search
Post by: Jimg on May 12, 2009, 09:27:29 PM

Quote from: jj2007 on May 12, 2009, 06:35:33 PM
Quote from: Jimg on May 12, 2009, 02:56:44 PM

Here's a sample. I was getting rock solid repeatability with only ten loops.

My 2 cts - Celeron M. It took me a while because it choked with ml 9.0 ;-)
Well organised table, by the way :U

Code Select Expand
Type Short Medium Long HighAnsi MisMatch Source Mainstr Mainstr Mainstr Mainstr mismatch Pattern Substr1 Substr2 Substr3 Substr4 mmword ======= ======== ======== ======== ======== ======== OVH 264 0 0 0 0 InStrL 96 132 348 468 1152 finstr 96 288 432 552 1500 finstr2 144 216 372 372 1608 finstr3 144 204 348 348 1512 finstr4 84 300 456 588 1500 finstr5 84 324 456 612 1488 finstr6 144 216 360 372 1608

So where's instrjj ?

Title: Re: InstrCi - case-insensitive string search
Post by: fearless on May 12, 2009, 11:00:51 PM

Code Select

Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mmword
======= ======== ======== ======== ======== ========
OVH          229        0        0        0        0
InStrL        51       94      383      340      842
finstr        51      153      238      281      816
finstr2      111      145      264      255      952
finstr3      111      153      247      238      986
finstr4       51      204      289      417      799
finstr5       60      187      272      383      808
finstr6      111      145      264      255      952

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 12, 2009, 11:48:00 PM

fearless CPU: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz (SSE4)
nice numbers
i like the core 2 quad q9550, i think

here are mine for a prescott - here we go with the funny minus signs, again
i hope some of the other numbers are wrong, as well - lol

Type Short Medium Long HighAnsi MisMatch
Source Mainstr Mainstr Mainstr Mainstr mismatch
Pattern Substr1 Substr2 Substr3 Substr4 mmword
======= ======== ======== ======== ======== ========
OVH 555 -30 -15 -30 -23
InStrL 120 232 577 855 2482
finstr 142 405 578 750 2295
finstr2 158 337 750 690 2670
finstr3 120 412 712 735 2745
finstr4 82 420 712 893 2175
finstr5 127 487 773 1005 2408
finstr6 158 360 600 720 2760

Title: Re: InstrCi - case-insensitive string search
Post by: NightWare on May 13, 2009, 12:38:19 AM

Quote from: jj2007 on May 12, 2009, 11:55:25 AM
Sounds plausible, but can an executable with only 28k overlap with a cache line boundary if the L1 cache is 32k? Is that the explanation?? If I understand the article correctly, two cache lines may be allocated, fine. But only once, or for every call, costing 1000 cycles??

hmm, it is an alignment problem, but not the "normal alignment", not directly... a pentium M can't deal with more than 4 uops in one operation (it's under the abilities of a P3), with this "skill" (i see lingo laughting here...), a single byte can change lot of things (especially with recent instructions sets)...

Quote from: jj2007 on May 12, 2009, 07:01:53 PM
Standardisation could help a lot - Michael's timer macros are a fantastic achievement, but we might add some rules, e.g. how to pass parameters (we all know that fastcall is a few cycles faster...). You may have seen the timings macro in my posts above, I have done that one to suit my own laziness, but I believe we could work out something along those lines. Not to mention LAMPs :wink

hmm, look like a racism against real coders :'(

Quote from: fearless on May 12, 2009, 06:21:20 PM
All this in one place would be a very handy resource. Some threads are massive and/or there are multiple threads for similar topics.

it's not as simple, periodically someone found another way to do stuff, it's also very often that algos don't do exactly the same things... or don't use the same instructions sets... etc... for example, in my case (as french), a BM algo or even a case insensitive algo are useless, why ? because with all the accents, etc... (and possible errors linked to this) it can't give satisfying results. so i have to use something else (i post it coz it can be usefull on others languages, however you must correct the values of the table, depending of your language.) :

Code Select

.DATA
;
; tables pour obtenir les caractères majuscules équivalents (fait 0.25 ko, cette table est utilisée pour réduire la recherche/comparaison aux
; caractères majuscules basiques, pour tenir compte des erreurs concernant les accents, trémas, cédilles, etc...)
;
_Search_Table_		BYTE 000,001,002,003,004,005,006,007,008,009,010,011,012,013,014,015
				BYTE 016,017,018,019,020,021,022,023,024,025,026,027,028,029,030,031
; le jeu de caractères classique :
				BYTE 032,033,034,035,036,037,038,039,040,041,042,043,044,045,046,047
				BYTE 048,049,050,051,052,053,054,055,056,057,058,059,060,061,062,063
				BYTE 064,065,066,067,068,069,070,071,072,073,074,075,076,077,078,079
				BYTE 080,081,082,083,084,085,086,087,088,089,090,091,092,093,094,095
				BYTE 096,065,066,067,068,069,070,071,072,073,074,075,076,077,078,079
				BYTE 080,081,082,083,084,085,086,087,088,089,090,123,124,125,126,127
; le jeu de caractères étendu :
				BYTE 128,129,130,131,132,133,134,135,136,137,83,139,140,141,90,143
				BYTE 144,145,146,147,148,149,150,151,152,153,83,155,140,157,90,89
				BYTE 160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175
				BYTE 176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191
				BYTE 65,65,65,65,65,65,198,67,69,69,69,69,73,73,73,73
				BYTE 68,78,79,79,79,79,79,215,216,85,85,85,85,89,222,223
				BYTE 65,65,65,65,65,65,198,67,69,69,69,69,73,73,73,73
				BYTE 68,78,79,79,79,79,79,247,216,85,85,85,85,89,222,89

.CODE
ALIGN 16
;
; trouver une chaine de caractères dans une chaîne/un texte ascii (la première occurrence)
; note : insensible à la casse, accents compris... attention, ne teste pas si la position est valide
;
; syntax :
; mov eax,starting position of the search (in the string to scan)
; mov esi,OFFSET (address of the string to find)
; mov edi,OFFSET (address of the string to scan)
; call  FindNextString
;
; Return :
; eax = -1 if the string isn't found, otherwise eax has the position of the string
;
FindNextString PROC
		push ebx										;; empiler ebx
		push ecx										;; empiler ecx
		push edx										;; empiler edx

		add eax,edi									;; on ajoute la position à l'adresse de la chaîne
; on stocke le premier octet à trouver
		movzx ebx,BYTE PTR [esi]							;; placer le premier octet à trouver dans bl
		mov bl,BYTE PTR [_Search_Table_+ebx]				;; placer la majuscule de la table dans bl
		dec eax										;; enlever 1 à l'adresse ( cause de la ligne qui suit...)
; maintenant, recherche du premier caractère identique
Label1:	inc eax										;; ajouter 1 à l'adresse de loctet à comparer
		movzx edx,BYTE PTR [eax]							;; lire l'octet à comparer, et le placer dans dl
		test edx,edx									;; fixer les flags de dl
		jz Label5										;; si c'est égal à 0, aller Label5
		mov dl,BYTE PTR [_Search_Table_+edx]				;; sinon, placer la majuscule de la table dans dl
		cmp dl,bl										;; comparer ensuite à la majuscule à trouver
		jne Label1									;; si c'est différent, on continu la recherche, aller Label1
; sinon, on teste les caractères suivants 
Label2:	xor ecx,ecx									;; effacer ecx
Label3:	inc ecx										;; augmenter ecx d'1
		movzx edx,BYTE PTR [esi+ecx]						;; placer l'octet suivant dans dl
		test edx,edx									;; fixer les flags de edx
		jz Label4										;; si c'est égal à 0, aller Label4
		mov bh,BYTE PTR [_Search_Table_+edx]				;; sinon, placer la majuscule de la table dans bh
		movzx edx,BYTE PTR [eax+ecx]						;; lire l'octet suivant à comparer
		mov dl,BYTE PTR [_Search_Table_+edx]				;; placer la majuscule de la table dans dl
		cmp dl,bh										;; comparer ensuite à la majuscule à trouver
		je Label3										;; si c'est égal à 0, aller Label3
		jmp Label1									;; aller Label1
; ici la chaîne est trouvée, alors on defini eax qui contiendra l'adresse 
Label4:	sub eax,edi									;; on soustrait l'adresse de la chaîne pour obtenir la position trouvée dans la chaîne

		pop edx										;; désempiler edx
		pop ecx										;; désempiler ecx
		pop ebx										;; désempiler ebx
	ret												;; retourner (sortir de la procédure)
; chaîne non trouvée
Label5:	mov eax,-1									;; sinon on met -1 (chaîne non trouvée dans eax)

		pop edx										;; désempiler edx
		pop ecx										;; désempiler ecx
		pop ebx										;; désempiler ebx
	ret												;; retourner (sortir de la procédure)
FindNextString ENDP

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 13, 2009, 01:12:12 AM

Quotehmm, look like a racism against real coders :'(

i understand what you mean by this, but you can't tell me you learned everything out of a book
somehwere along the line, you learned from someone else, also
i considered myself to be a pretty fair programmer in 16-bit
not that i was a pro, but i have done a lot of math that many cannot do
i'll show you mine if you show me yours :lol

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 13, 2009, 05:41:38 AM

Quote from: NightWare on May 13, 2009, 12:38:19 AM
a single byte can change lot of things (especially with recent instructions sets)...

Algo is unchanged, code start is 16-byte aligned in both cases. Just 16 bytes further up, and it slows down by 40% and timings get erratic...

Quote
Quote from: jj2007 on May 12, 2009, 07:01:53 PM
Standardisation could help a lot - Michael's timer macros are a fantastic achievement, but we might add some rules, e.g. how to pass parameters (we all know that fastcall is a few cycles faster...). You may have seen the timings macro in my posts above, I have done that one to suit my own laziness, but I believe we could work out something along those lines. Not to mention LAMPs :wink
hmm, look like a racism against real coders :'(

No racism intended. We want a testbed that makes algos comparable. Usually you do that with macros that allow a syntax of this type

Code Select

	timings crt_strstr, addr Mainstr, addr TestSubD
	timings InstrCi, 1, addr Mainstr, addr TestSubE, 0
	timings BMLinDB, addr Mainstr, addr TestSubEx, (SIZEOF TestSubEx)-1

where each line replaces

Code Select

	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		invoke crt_strstr, addr Mainstr, addr TestSubD
	counter_end
	print str$(eax), 9, "cycles for rep stosd", 13, 10

Now it gets messy if one programmer says "my algo needs Mainstr in eax and Pattern in ecx, only then it will be really fast".

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 13, 2009, 06:04:06 AM

Quote from: Jimg on May 12, 2009, 09:27:29 PM

So where's instrjj ?

Here it is. I had to change quite a number of things to make it compatible ;-)

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mXword
======= ======== ======== ======== ======== ========
OVH          264        0        0        0        0
InStrL        84      132      360      468     1152
finstr        96      288      432      552     1500
finstr2      144      216      372      372     1608
finstr3      144      204      348      348     1512
finstr4       84      324      456      624     1500
finstr5       84      324      456      612     1488
finstr6      156      216      360      372     1608
InstrJJ       84      120      204      132     1536

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 13, 2009, 07:09:24 AM

i find this very disturbing
i don't know what's going on, here
i get very different numbers from everyone else on my prescott
also, i get very different numbers if i redirect the output to a text file
and different numbers again if i open it with Windows Explorer
i am running XP MCE2005 SP2 (the OS shouldn't make much difference - who knows)

compare these results with the ones i pasted a few posts back ^

here, i open a command window and redirect the output with "instrnew>test.txt"
i get similar results by running it without redirection (some test programs, i get different results by redirecting)

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Type Short Medium Long HighAnsi MisMatch
Source Mainstr Mainstr Mainstr Mainstr mismatch
Pattern Substr1 Substr2 Substr3 Substr4 mXword
======= ======== ======== ======== ======== ========
OVH 487 0 0 0 0
InStrL 91 203 495 750 1433
finstr 90 278 420 511 1418
finstr2 105 270 675 540 1695
finstr3 105 285 548 540 1718
finstr4 98 285 473 638 1343
finstr5 105 353 623 705 1403
finstr6 105 270 428 533 1696
InstrJJ 188 180 451 248 2888

on these two, i ran it by clicking on it in Windows Explorer
notice the inconsistencies, especially the minus signs

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Type Short Medium Long HighAnsi MisMatch
Source Mainstr Mainstr Mainstr Mainstr mismatch
Pattern Substr1 Substr2 Substr3 Substr4 mXword
======= ======== ======== ======== ======== ========
OVH 517 -7 8 0 -7
InStrL 143 248 555 923 2303
finstr 158 443 675 788 2408
finstr2 188 398 668 765 2738
finstr3 188 413 818 736 2783
finstr4 188 450 908 938 2198
finstr5 188 533 781 1035 2258
finstr6 188 390 668 728 2775
InstrJJ 278 300 841 391 4531

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Type Short Medium Long HighAnsi MisMatch
Source Mainstr Mainstr Mainstr Mainstr mismatch
Pattern Substr1 Substr2 Substr3 Substr4 mXword
======= ======== ======== ======== ======== ========
OVH 495 23 15 23 22
InStrL 218 270 653 915 2400
finstr 180 443 622 780 2467
finstr2 232 405 773 795 2813
finstr3 202 458 780 802 2850
finstr4 173 488 892 960 2220
finstr5 165 548 787 1072 2288
finstr6 180 412 690 780 2745
InstrJJ 285 322 698 397 4417

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 13, 2009, 07:35:50 AM

Quote from: dedndave on May 13, 2009, 07:09:24 AM
i get very different numbers from everyone else on my prescott

Just launched my Prescott:

Code Select

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mXword
======= ======== ======== ======== ======== ========
OVH          459       17       17       17       -9
InStrL       110      229      382      535     1402
finstr       119      263      399      519     1428
finstr2      127      280      450      561     1708
finstr3      127      289      459      569     1725
finstr4      110      323      527      620     1360
finstr5      119      340      544      722     1428
finstr6      127      297      450      552     1708
InstrJJ      196      187      467      280     2661

This is with TimesToLoop equ 10000. With only 10, timings are pretty erratic...

Title: Re: InstrCi - case-insensitive string search
Post by: sinsi on May 13, 2009, 07:37:07 AM

I tried the redirection thing and got exactly the same numbers as from explorer (no joke, they all matched).

Code Select


Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mXword
======= ======== ======== ======== ======== ========
OVH          252        0        0        0        0
InStrL        63      108      270      360      882
finstr        54      180      243      297      819
finstr2      108      171      288      261      954
finstr3      117      180      261      252      954
finstr4       54      216      279      387      810
finstr5       45      216      279      387      810
finstr6      108      171      288      261      954
InstrJJ       63       72      135      126     1323

Title: Re: InstrCi - case-insensitive string search
Post by: fearless on May 13, 2009, 07:41:55 AM

Code Select

Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE
Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mXword
======= ======== ======== ======== ======== ========
OVH          229        0        0        0        0
InStrL        51      102      187      340      842
finstr        51      153      238      281      808
finstr2      102      145      247      255      952
finstr3      102      162      255      247      986
finstr4       51      213      272      417      799
finstr5       43      196      264      374      808
finstr6      111      145      247      255      961
InstrJJ       60       68      136       77     1369

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 13, 2009, 08:29:45 AM

lol - fearless gets such nice numbers with that CPU
i hafta go buy one :U

i have been playing with selecting cores on a multi-core machine
i have tried to keep in mind that, in some cases, the cores may not be the same CPU type, especially in the case of overdrive chips
it seems that, if they have an overdrive chip, you have to use SetProcessAffinityMask to tell the system which core you want to look at
this includes the CPUID - how else would you select which processor to run CPUID, right ?
if they have different cores, you may have to run the CPUID routine more than once, right?
(also, for certain non-Intel/non-AMD CPU's, you have to enable the CPUID instruction)
you have to use SetProcessAffinityMask on a multi-core for the RDTSC instruction, also

now, here are a couple things that i have not seen in any of the test code that i have looked at to date

in order to use Get/SetProcessAffinityMask to control core selection.....

1) use GetSecurityInfo to see ... (of course, there is no documented "SetSecurityInfo" function - :P )

to use GetProcessAffinityMask on XP or Windows server 2003, you must have:
PROCESS_QUERY_INFORMATION access right
to use GetProcessAffinityMask on other OS's, you must have:
PROCESS_QUERY_INFORMATION or PROCESS_QUERY_LIMITED_INFORMATION access right

to use SetProcessAffinityMask (to manipulate core selection), you must have:
PROCESS_SET_INFORMATION access right

2) to use SetProcessAffinityMask (to manipulate core selection)
use SetProcessAffinityUpdateMode to set PROCESS_AFFINITY_ENABLE_AUTO_UPDATE

The system can adjust process affinity under various conditions, such as when a processor is added dynamically. By default, dynamic updates to the process affinity are disabled for each process.

Processes should use this function to indicate whether they can handle dynamic adjustment of process affinity by the system. After a process enables affinity update mode, it can call this function to disable it. However, a process cannot enable affinity update mode after it has used this function to disable it.

in other words, i don't think anyone is looking at the return values when they try to control core selection for RDTSC
if they were, they should get an error condition, no ?
at any rate, it should be verifiable by using GetProcessAffinityMask to see if they have selected a single core to read the counter
also, for a complete CPU ID routine, you need to ID the OS to see if you can even read the GetProcessAffinityMask value
and, you need to be able to control SetProcessAffinityMask in order to specify which core the CPUID instruction is executing on

running a test on a single core machine - easy
multi-core - not so easy

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 13, 2009, 09:14:16 AM

Code Select

c:\china>"C:\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin\ml.exe" /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: instrnew.asm

c:\china>\masm32\bin\link.exe /subsystem:console instrnew.obj
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.


c:\china>instrnew.exe
Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Type       Short   Medium     Long HighAnsi MisMatch
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch
Pattern  Substr1  Substr2  Substr3  Substr4   mXword
======= ======== ======== ======== ======== ========
OVH          264        0        0        0        0
InStrL        88      132      352      462     1155
finstr        88      286      429      550     1507
finstr2      143      209      363      363     1617
finstr3      143      209      352      352     1507
finstr4       88      308      451      594     1496
finstr5       88      319      462      605     1496
finstr6      143      209      363      363     1617
InstrJJ       88      121      209      132     1518

Press enter to exit...

c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 13, 2009, 09:25:03 AM

Made a few changes to Jim's package:
- ShowCPU
- Overall cycles (helps to quickly identify problems with timing consistency etc)
- increased TimesToLoop to 10000
- added an "easy" mismatch case ("xillabong"; note the old mmword has become a reserved word)

Code Select

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH          459       17       17       17       17       17      544
InStrL       162      162      502      527     1326      366     3045
finstr       119      272      391      527     1411      722     3442
finstr2      119      280      442      552     1708      765     3866
finstr3      119      280      467      552     1717      773     3908
finstr4      111      306      535      671     1352      927     3902
finstr5      110      340      637      689     1411     1045     4232
finstr6      111      272      459      544     1708      765     3859
InstrJJ      204      187      476      280     2644      297     4088

EDIT: Timings on the Celeron are a lot more stable:

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH          264        0        0        0        0        0      264
InStrL        84      132      360      468     1152      300     2496
finstr       108      288      432      552     1500      828     3708
finstr2      144      216      360      372     1608      540     3240
finstr3      144      204      348      348     1512      528     3084
finstr4       84      324      456      636     1500      924     3924
finstr5       84      336      456      612     1488      996     3972
finstr6      144      216      360      372     1608      540     3240
InstrJJ       84      120      204      132     1548      156     2244

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 13, 2009, 09:46:53 AM

much better
i get very close results either way i run it
now, can you make my prescott run like fearless's machine ? lol

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Type Short Medium Long HighAnsi MisMatch MisMatch Overall
Source Mainstr Mainstr Mainstr Mainstr mismatch mismatch sum of
Pattern Substr1 Substr2 Substr3 Substr4 mmStr_b mmStr_x cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH 487 0 0 0 0 0 487
InStrL 135 203 308 518 1290 360 2814
finstr 90 255 375 473 1403 683 3279
finstr2 113 270 435 495 1695 758 3766
finstr3 105 263 435 495 1695 750 3743
finstr4 98 278 540 601 1335 915 3767
finstr5 98 345 495 660 1403 1028 4029
finstr6 98 270 421 533 1673 758 3753
InstrJJ 188 180 450 248 2640 270 3976

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 13, 2009, 09:59:08 AM

i may have misunderstood the use of
SetProcessAffinityUpdateMode to set PROCESS_AFFINITY_ENABLE_AUTO_UPDATE
i need to research this one a little more
as i read it, it needed to be enabled to use SetProcessAffinityMask
after a re-read, it may need to be disabled to dis-allow the system from altering the ProcessAffinity dynamically
i can write a little program to test this - probably easier than reading MS docs

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 13, 2009, 10:12:09 AM

Code Select

#c:\china>"\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin\ml.exe" /c /coff instrnew.asm
#Microsoft (R) Macro Assembler Version 8.00.50727.762
#Copyright (C) Microsoft Corporation.  All rights reserved.
#
# Assembling: instrnew.asm
#\masm32\include\windows.inc(16610) : error A2138: invalid data initializer
#
#c:\china>

#
why?

Code Select

#ImportRec record Reserved:11,NameType:3,Type2:2
#IMPORT_OBJECT_HEADER struct
# Sig1 dw ?
# Sig2 dw ?
# Version dw ?
# Machine dw ?
# TimeDateStamp dd ?
# SizeOfData dd ?
# union
#  Ordinal dw ?
#  Hint dw ?
# ends
# rImport ImportRec <> ;line 16610
#IMPORT_OBJECT_HEADER ends

no why (because GeneSys windows.inc v1.30 comment IMPORT_OBJECT_HEADER struct.), comment line 16610 to continue...

Code Select

c:\china>"\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin\ml.exe" /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.

  Assembling: instrnew.asm

c:\china>\masm32\bin\link.exe /subsystem:console instrnew.obj
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.


c:\china>instrnew.exe
Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH          264        0        0        0        0        0      264
InStrL        88      132      341      462     1133      297     2453
finstr        88      286      429      550     1496      836     3685
finstr2      143      209      352      363     1617      539     3223
finstr3      121      209      352      352     1507      517     3058
finstr4       88      297      451      594     1496      924     3850
finstr5       77      308      451      605     1496      935     3872
finstr6      143      209      363      363     1606      539     3223
InstrJJ       88      121      209      132     1474      154     2178

Press any key to exit...
c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: fearless on May 13, 2009, 12:38:42 PM

Code Select

Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE4)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH          229        0        0        0        0        0      229
InStrL        43       94      162      340      833      204     1676
finstr        43      153      230      281      808      434     1949
finstr2      111      145      247      255      952      383     2093
finstr3      102      162      255      247      978      349     2093
finstr4       43      204      289      417      799      663     2415
finstr5       43      187      264      366      800      587     2247
finstr6      102      145      239      255      952      375     2068
InstrJJ       60       68      128       77     1386       85     1804

I ran it a few times and noticed a few minor variations in some of the categories.

Code Select

Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE4)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH          229        0        0        0        0        0      229
InStrL        43      102      230      332      842      204     1753
finstr        43      153      230      281      808      434     1949
finstr2      102      145      230      247      952      374     2050
finstr3      111      162      255      238      978      357     2101
finstr4       43      204      289      417      799      663     2415
finstr5       43      187      264      366      808      587     2255
finstr6      102      145      247      247      952      366     2059
InstrJJ       60       68      128       77     1386       85     1804

Press any key to exit...
C:\Downloads\InstrTable>instrnew
Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE4)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH          229        0        0        0        0        0      229
InStrL        51      102      179      340      842      204     1718
finstr        51      153      230      281      808      434     1957
finstr2      103      145      238      255      952      374     2067
finstr3      111      162      255      247      978      357     2110
finstr4       43      204      289      417      799      663     2415
finstr5       43      188      272      366      799      604     2272
finstr6      102      145      247      255      952      374     2075
InstrJJ       60       68      128       77     1386       85     1804

Press any key to exit...
C:\Downloads\InstrTable>instrnew
Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE4)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH          229        0        0        0        0        0      229
InStrL        43       94      162      340      842      204     1685
finstr        43      153      238      281      808      434     1957
finstr2      103      145      247      255      952      383     2085
finstr3      111      153      255      230      986      357     2092
finstr4       43      204      289      417      799      663     2415
finstr5       43      187      264      383      799      587     2263
finstr6      102      145      230      255      952      383     2067
InstrJJ       60       68      128       77     1386       85     1804

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 13, 2009, 01:17:41 PM

to simplify the security access rights issue.....

because the requirements are different for different operating systems,
rather than using GetSecurityInfo to determine of you have sufficient access rights,
it is easier to just use something like this:
1) GetProcessAffinityMask (save the values)
2) SetProcessAffinityMask (select a single core mask)
3) GetProcessAffinityMask (test to see if you changed it)

using the Get/SetPriorityClass has the same access rights requirements
as Get/SetProcessAffinityMask and may be tested in a similar manner

i guess if you can change PriorityClass level, then you can change ProcessAffinity as well
only need to test one

Title: Re: InstrCi - case-insensitive string search
Post by: Mark Jones on May 13, 2009, 03:50:52 PM

Code Select


AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
Type       Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source   Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern  Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======= ======== ======== ======== ======== ======== ======== ========
OVH           71        0        0        0        0        0       71
InStrL       100      110      328      425      905      263     2131
finstr        88      235      471      444     1382      663     3283
finstr2       99      201      407      335     1504      468     3014
finstr3       98      188      371      320     1467      457     2901
finstr4       79      214      439      420     1262      609     3023
finstr5       74      215      449      403     1274      593     3008
finstr6       99      202      388      335     1505      460     2989
InstrJJ      136      134      358      161     2006      204     2999

Run2:
======= ======== ======== ======== ======== ======== ======== ========
OVH           71        0        0        0        0        0       71
InStrL        82      110      328      424      905      263     2112
finstr        88      235      471      444     1382      663     3283
finstr2       99      202      392      319     1501      465     2978
finstr3       98      188      386      320     1467      457     2916
finstr4       79      214      439      420     1262      616     3030
finstr5       74      215      449      403     1274      593     3008
finstr6       99      202      388      335     1504      468     2996
InstrJJ      136      134      362      161     2006      206     3005

Run3:
======= ======== ======== ======== ======== ======== ======== ========
OVH           71        0        0        0        0        0       71
InStrL        83      110      328      425      905      263     2114
finstr        88      235      469      444     1380      663     3279
finstr2       99      202      392      327     1507      462     2989
finstr3       98      188      386      320     1466      457     2915
finstr4       79      214      423      420     1261      616     3013
finstr5       74      215      449      403     1274      593     3008
finstr6       99      202      407      334     1507      468     3017
InstrJJ      136      134      349      161     2006      206     2992

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 13, 2009, 04:47:37 PM

More than 2000 cycles on my ultrafast Celeron was just not acceptable, so I decided to dig out my trusty old word scanner:

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Type        Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source    Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern   Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InStrL         96      144      252      468     1152      300     2412
finstr         84      288      444      588     1500      828     3732
finstr2       144      216      360      372     1608      540     3240
finstr3       144      204      348      348     1512      528     3084
finstr4        84      300      456      648     2352      924     4764
finstr5        84      336      492      648     1488     1008     4056
finstr6       144      216      360      372     1608      540     3240
InstrJJ        84      120      204      132     1536      156     2232
InstrJJw       60      120      180      156     1164      240     1920

:bg

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 13, 2009, 05:09:18 PM

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Type Short Medium Long HighAnsi MisMatch MisMatch Overall
Source Mainstr Mainstr Mainstr Mainstr mismatch mismatch sum of
Pattern Substr1 Substr2 Substr3 Substr4 mmStr_b mmStr_x cycles
======== ======== ======== ======== ======== ======== ======== ========
OVH 487 0 0 0 0 0 487
InStrL 90 150 480 525 1290 315 2850
finstr 90 270 383 473 1403 698 3317
finstr2 106 248 428 540 1673 758 3753
finstr3 98 263 510 495 1695 750 3811
finstr4 91 270 458 600 1343 915 3677
finstr5 90 323 608 661 1403 1028 4113
finstr6 98 255 428 495 1680 758 3714
InstrJJ 188 180 450 248 2648 278 3992
InstrJJw 90 180 338 293 2730 405 4036

Title: Re: InstrCi - case-insensitive string search
Post by: fearless on May 13, 2009, 05:51:08 PM

Code Select

Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE4)
Type        Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source    Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern   Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======== ======== ======== ======== ======== ======== ======== ========
OVH           229        0        0        0        0        0      229
InStrL         51       94      162      332      842      204     1685
finstr         43      153      230      281      808      434     1949
finstr2       102      145      247      255      952      374     2075
finstr3       111      145      247      238      978      349     2068
finstr4        51      179      264      374      799      587     2254
finstr5        43      187      272      383      808      595     2288
finstr6       102      145      247      255      952      374     2075
InstrJJ        60       68      128       77     1318       85     1736
InstrJJw       43       77      153      136      910      187     1506

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 13, 2009, 06:30:29 PM

Just for the fun of it, I wrote a little proggie that monitors how INSTR gets used in my main baby, simulating a typical user session. Wow what a surprise...!

Code Select

Avg Mainstr=	55.1	bytes
Avg Pattern=	3.46	bytes
Total calls=	13,237,796

Looks as if we should concentrate more on short strings around 55 bytes long, and ultrashort patterns ::)

EDIT: In the light of these findings, I added a switch called OptiStrings, which forces shorter Mainstrings - see below. I also added the current Masm32 library InString for comparison, as InstrM32 (that was a bit tricky with JimG's code copy scheme :8))

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Type        Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source    Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern   Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InStrL         36       36       72      264       12      120      540
InstrM32      192      204      336      408      168      396     1704
finstr         48       84      132      300       24      312      900
finstr2        48       72      120      204       24      216      684
finstr3        48       72      120      204       36      192      672
finstr4        48       96      144      348       24      336      996
finstr5        48       84      132      336       24      324      948
finstr6        48       72      120      204       24      216      684
InstrJJ        48       60       96       96       48       84      432
InstrJJw       36       60       84      108       48      120      456

Sizes:
Mainstr         100
Substr1         4
Substr2         7
Substr3         14
Substr4         9

mismatch        100
mmStr_b         5
mmStr_x         5

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 13, 2009, 07:17:58 PM

Code Select

Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Type        Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source    Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern   Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InStrL         88      132      253      462     1133      297     2365
finstr         88      286      429      550     1496      825     3674
finstr2       132      209      363      363     1606      539     3212
finstr3       132      209      352      352     1507      528     3080
finstr4        88      297      451      594     1496      924     3850
finstr5        77      308      451      605     1496      935     3872
finstr6       143      209      363      363     1606      539     3223
InstrJJ        88      121      209      132     1518      165     2233
InstrJJw       55      110      198      154     1155      231     1903

Title: Re: InstrCi - case-insensitive string search
Post by: Mark Jones on May 13, 2009, 09:31:04 PM

Interesting...

Code Select

AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
Type        Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source    Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern   Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======== ======== ======== ======== ======== ======== ======== ========
OVH            71        0        0        0        0        0       71
InStrL         30       32       63      235       21      111      492
InstrM32      177      249      354      558      168      520     2026
finstr         42       70      110      238       29      225      714
finstr2        45       64      109      186       33      162      599
finstr3        47       65      106      179       33      159      589
finstr4        42       67      111      228       26      216      690
finstr5        39       66      110      219       24      205      663
finstr6        45       64      109      186       33      166      603
InstrJJ        60       72      133      108       62      126      561
InstrJJw       53       76      122      130       58      148      587

Title: Re: InstrCi - case-insensitive string search
Post by: lingo on May 13, 2009, 11:44:46 PM

I'm wondering why the last version of InString-JJ is so slow
for my Test 13 --Find 'Duplicate inc' in 'windows.inc' -
clocks: 844498 against clocs: 679641 of previous version

Code Select



Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)

Search Test 1 - value expected 37; lenSrchPattern ->22
InString - JJ:                         38 ; clocks: 84
SrchBytes - Lingo:                     37 ; clocks: 49

Search Test 2 - value expected 1007; lenSrchPattern ->17
InString - JJ:                         1008 ; clocks: 13623
SrchBytes - Lingo:                     1007 ; clocks: 11672

Search Test 3 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 292
SrchBytes - Lingo:                     1008 ; clocks: 134

Search Test 4 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 2979
SrchBytes - Lingo:                     1008 ; clocks: 1400

Search Test 5 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 1829
SrchBytes - Lingo:                     1008 ; clocks: 1285

Search Test 6 - value expected 1008 ;lenSrchPattern ->16
InString - JJ:                         1009 ; clocks: 292
SrchBytes - Lingo:                     1008 ; clocks: 129

Search Test 7 - value expected 1009 ;lenSrchPattern ->14
InString - JJ:                         1010 ; clocks: 544
SrchBytes - Lingo:                     1009 ; clocks: 521

Search Test 8 - value expected 1001 ;lenSrchPattern ->1
InString - JJ:                         1002 ; clocks: 260
SrchBytes - Lingo:                     1001 ; clocks: 92

Search Test 9 - value expected 1001 ;lenSrchPattern ->2
InString - JJ:                         1002 ; clocks: 254
SrchBytes - Lingo:                     1001 ; clocks: 95

Search Test 10 - value expected 1001 ;lenSrchPattern ->3
InString - JJ:                         1002 ; clocks: 257
SrchBytes - Lingo:                     1001 ; clocks: 96

Search Test 11 - value expected 1001 ;lenSrchPattern ->4
InString - JJ:                         1002 ; clocks: 257
SrchBytes - Lingo:                     1001 ; clocks: 96

Search Test 12 - value expected 1001 ;lenSrchPattern ->5
InString - JJ:                         1002 ; clocks: 267
SrchBytes - Lingo:                     1001 ; clocks: 98

Search Test 13 --Find 'Duplicate inc' in 'windows.inc' ;lenSrchPattern ->13
InString - JJ:                         1127625 ; clocks: 844498
SrchBytes - Lingo:                     1127624 ; clocks: 510066

 Press ENTER to exit...

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 14, 2009, 04:48:12 AM

Code Select

Microsoft Windows [Version 6.0.6001]
Copyright (c) 2006 Microsoft Corporation.  All rights reserved.

c:\Users\Administrator>cd\china

c:\china>\masm32\bin\ml.exe /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997.  All rights reserved.

 Assembling: instrnew.asm
instrnew.asm(1315) : error A2008: syntax error : pshufd
instrnew.asm(1341) : error A2008: syntax error : xmm2
instrnew.asm(1348) : error A2008: syntax error : movdqa
instrnew.asm(1349) : error A2008: syntax error : movdqa
instrnew.asm(1495) : error A2008: syntax error : pshufd
instrnew.asm(1517) : error A2008: syntax error : xmm1
instrnew.asm(1518) : error A2008: syntax error : xmm4
instrnew.asm(1521) : error A2008: syntax error : movdqa
instrnew.asm(1527) : error A2008: syntax error : movdqa
instrnew.asm(1528) : error A2008: syntax error : xmm4
instrnew.asm(1529) : error A2008: syntax error : xmm2
instrnew.asm(1637) : error A2008: syntax error : pshufd
instrnew.asm(1638) : error A2008: syntax error : movdqu
instrnew.asm(1639) : error A2008: syntax error : movdqa
instrnew.asm(1314) : error A2006: undefined symbol : xmm0
instrnew.asm(1352) : error A2006: undefined symbol : xmm1
instrnew.asm(1353) : error A2006: undefined symbol : xmm1
instrnew.asm(1354) : error A2006: undefined symbol : xmm1
instrnew.asm(1355) : error A2006: undefined symbol : xmm1
instrnew.asm(1492) : error A2006: undefined symbol : xmm3
instrnew.asm(1531) : error A2006: undefined symbol : xmm1
instrnew.asm(1532) : error A2006: undefined symbol : xmm1
instrnew.asm(1533) : error A2006: undefined symbol : xmm4
instrnew.asm(1535) : error A2006: undefined symbol : xmm4
instrnew.asm(1536) : error A2006: undefined symbol : xmm2
instrnew.asm(1537) : error A2006: undefined symbol : xmm2
instrnew.asm(1636) : error A2006: undefined symbol : xmm3
instrnew.asm(1641) : error A2006: undefined symbol : xmm1
instrnew.asm(1642) : error A2006: undefined symbol : xmm2
instrnew.asm(1643) : error A2006: undefined symbol : xmm1
instrnew.asm(1644) : error A2006: undefined symbol : xmm2

c:\china>

replace ml.exe by 615 to continue.

Code Select

c:\china>\masm32\bin\ml.exe /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 6.15.8803
        Patched for you by promethee [ECL] in the year 2001 - enjoy
Copyright (C) Microsoft Corp 1981-2000.  All rights reserved.

 Assembling: instrnew.asm
instrnew.asm(1314) : error A2006: undefined symbol : xmm0
instrnew.asm(1315) : error A2006: undefined symbol : xmm0
instrnew.asm(1342) : error A2006: undefined symbol : xmm2
instrnew.asm(1349) : error A2006: undefined symbol : xmm2
instrnew.asm(1350) : error A2006: undefined symbol : xmm1
instrnew.asm(1352) : error A2006: undefined symbol : xmm1
instrnew.asm(1353) : error A2006: undefined symbol : xmm1
instrnew.asm(1354) : error A2006: undefined symbol : xmm1
instrnew.asm(1355) : error A2006: undefined symbol : xmm1
instrnew.asm(1492) : error A2006: undefined symbol : xmm3
instrnew.asm(1495) : error A2006: undefined symbol : xmm3
instrnew.asm(1518) : error A2006: undefined symbol : xmm1
instrnew.asm(1519) : error A2006: undefined symbol : xmm4
instrnew.asm(1522) : error A2006: undefined symbol : xmm2
instrnew.asm(1528) : error A2006: undefined symbol : xmm1
instrnew.asm(1529) : error A2006: undefined symbol : xmm4
instrnew.asm(1530) : error A2006: undefined symbol : xmm2
instrnew.asm(1531) : error A2006: undefined symbol : xmm1
instrnew.asm(1532) : error A2006: undefined symbol : xmm1
instrnew.asm(1533) : error A2006: undefined symbol : xmm4
instrnew.asm(1535) : error A2006: undefined symbol : xmm4
instrnew.asm(1536) : error A2006: undefined symbol : xmm2
instrnew.asm(1537) : error A2006: undefined symbol : xmm2
instrnew.asm(1636) : error A2006: undefined symbol : xmm3
instrnew.asm(1637) : error A2006: undefined symbol : xmm3
instrnew.asm(1638) : error A2006: undefined symbol : xmm1
instrnew.asm(1639) : error A2006: undefined symbol : xmm2
instrnew.asm(1641) : error A2006: undefined symbol : xmm1
instrnew.asm(1642) : error A2006: undefined symbol : xmm2
instrnew.asm(1643) : error A2006: undefined symbol : xmm1
instrnew.asm(1644) : error A2006: undefined symbol : xmm2

oh my fo!

jump to another way.

Code Select

c:\china>"\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin\ml.exe" /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: instrnew.asm
\masm32\include\windows.inc(16569) : error A2191: cannot include structure in se
lf
\masm32\include\windows.inc(16610) : error A2138: invalid data initializer

oh my masm32!

comment 'FPO_DATA struct' (16569) and 'IMPORT_OBJECT_HEADER struct' (16610) to continue.

Code Select

c:\china>"\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin\ml.exe" /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: instrnew.asm

c:\china>\masm32\bin\link.exe /subsystem:console instrnew.obj
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.


c:\china>instrnew.exe
Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Type        Short   Medium     Long HighAnsi MisMatch MisMatch  Overall
Source    Mainstr  Mainstr  Mainstr  Mainstr mismatch mismatch  sum of
Pattern   Substr1  Substr2  Substr3  Substr4  mmStr_b  mmStr_x  cycles
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InStrL         33       33       77      253       22      121      539
InstrM32      198      231      330      418      176      407     1760
finstr         44       77      132      297       22      297      869
finstr2        44       66      121      209       33      198      671
finstr3        55       66      110      198       33      198      660
finstr4        44       77      132      319       22      319      913
finstr5        44       77      132      330       22      330      935
finstr6        55       66      110      209       33      198      671
InstrJJ        55       66       99       99       55       88      462
InstrJJw       44       55       88      110       44      121      462

Press any key to exit...
c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 14, 2009, 05:20:07 AM

Quote from: lingo on May 13, 2009, 11:44:46 PM
I'm wondering why the last version of InString-JJ is so slow
for my Test 13 --Find 'Duplicate inc' in 'windows.inc' -
clocks: 844498 against clocs: 679641 of previous version

Latest version is a byte scanner again, so it runs an ultrafast inner SSE2 loop with D instead of Du - at the expense of many false hits that forces it to run slow code in the bsf loop (there are more than 18,000 uppercase D in windows.inc). Try replacing Duplicate with WARNING...

The new package (the one that produces the table) has both versions. InStringJJw is the word scanner explained here (http://www.masm32.com/board/index.php?topic=9370.msg84571#msg84571), while the first example of the more recent byte scanner version is at the bottom of that page. As the tables show, they are roughly equivalent. If the search string starts with a rare character, the byte scanner rocks, of course.

The version attached contains the file test. Try playing with the WindInc patterns :wink

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InStrL         36       36      264       12      120  1558560      468
InstrM32      192      240      408      168      384  3035448     1392
finstr         48       84      300       24      300  2806320      756
finstr2        48       60      204       24      204  2342100      540
finstr3        48       60      204       24      192  2353740      528
finstr4        36       72      312       24      312  3027360      756
finstr5        36       96      372       24      348  3315468      876
finstr6        48       60      204       24      204  2342112      540
InstrJJ        48       72       96       60      108   815556      384
InstrJJw       36       60      108       48      120   667620      372

* not including file test

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 14, 2009, 11:36:35 AM

Code Select

c:\china>"C:\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin\ml.exe" /c /coff
 instrnew.asm
Microsoft (R) Macro Assembler Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: instrnew.asm

c:\china>\masm32\bin\link.exe /subsystem:console instrnew.obj
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.


c:\china>instrnew.exe
Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InStrL         33       33      264       11      110  1577950      451
InstrM32      253      275      495      187      407  3063731     1617
finstr         44       88      297       22      297  2874828      748
finstr2        55       66      209       33      198  2409429      561
finstr3        44       66      198       33      187  2414093      528
finstr4        44       77      319       22      308  3129148      770
finstr5        44       77      330       22      330  3298768      803
finstr6        55       66      209       33      198  2410782      561
InstrJJ        44       66       99       55      110   808071      374
InstrJJw       33       55       99       44      121   687203      352

* not including file test

Sizes:
Mainstr         100
Substr1         4
Substr2         7
Substr4         9

mismatch        100
mmStr_b         5
mmStr_x         5

File            874032
WindInc         13 = Duplicate inc

Press any key to exit...
c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: lingo on May 14, 2009, 11:49:45 AM

I can't start included exe from InstrNewSSE_FileTest.zip (see attachment)

Code Select


Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)

Search Test 1 - value expected 37; lenSrchPattern ->22
InString - JJb:                         38 ; clocks: 86
InString - JJw:                         38 ; clocks: 92
SrchBytes -  Lingo:                     37 ; clocks: 49

Search Test 2 - value expected 1007; lenSrchPattern ->17
InString - JJb:                         1008 ; clocks: 13628
InString - JJw:                         1008 ; clocks: 11236
SrchBytes -  Lingo:                     1007 ; clocks: 11592

Search Test 3 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 303
InString - JJw:                         1009 ; clocks: 658
SrchBytes -  Lingo:                     1008 ; clocks: 135

Search Test 4 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 3108
InString - JJw:                         1009 ; clocks: 3634
SrchBytes -  Lingo:                     1008 ; clocks: 1544

Search Test 5 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 1822
InString - JJw:                         1009 ; clocks: 2470
SrchBytes -  Lingo:                     1008 ; clocks: 1273

Search Test 6 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 303
InString - JJw:                         1009 ; clocks: 640
SrchBytes -  Lingo:                     1008 ; clocks: 130

Search Test 7 - value expected 1009 ;lenSrchPattern ->14
InString - JJb:                         1010 ; clocks: 567
InString - JJw:                         1010 ; clocks: 597
SrchBytes -  Lingo:                     1009 ; clocks: 523

Search Test 8 - value expected 1001 ;lenSrchPattern ->1
InString - JJb:                         1002 ; clocks: 265
InString - JJw:                         1002 ; clocks: 326
SrchBytes -  Lingo:                     1001 ; clocks: 93

Search Test 9 - value expected 1001 ;lenSrchPattern ->2
InString - JJb:                         1002 ; clocks: 256
InString - JJw:                         1002 ; clocks: 611
SrchBytes -  Lingo:                     1001 ; clocks: 96

Search Test 10 - value expected 1001 ;lenSrchPattern ->3
InString - JJb:                         1002 ; clocks: 257
InString - JJw:                         1002 ; clocks: 609
SrchBytes -  Lingo:                     1001 ; clocks: 97

Search Test 11 - value expected 1001 ;lenSrchPattern ->4
InString - JJb:                         1002 ; clocks: 257
InString - JJw:                         1002 ; clocks: 613
SrchBytes -  Lingo:                     1001 ; clocks: 96

Search Test 12 - value expected 1001 ;lenSrchPattern ->5
InString - JJb:                         1002 ; clocks: 265
InString - JJw:                         1002 ; clocks: 771
SrchBytes -  Lingo:                     1001 ; clocks: 97

Search Test 13 --Find 'Duplicate inc' in 'windows.inc' ;lenSrchPattern ->13
InString - JJb:                         1127625 ; clocks: 845967
InString - JJw:                         1127625 ; clocks: 665849
SrchBytes -  Lingo:                     1127624 ; clocks: 510086

Search Test 14 --Find 'warning or' in 'windows.inc' ;lenSrchPattern ->10
InString - JJb:                         1127404 ; clocks: 362831
InString - JJw:                         1127404 ; clocks: 665390
SrchBytes -  Lingo:                     1127403 ; clocks: 207779

 Press ENTER to exit...

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 14, 2009, 12:12:48 PM

Quote from: lingo on May 14, 2009, 11:49:45 AM
I can't start included exe from InstrNewSSE_FileTest.zip (see attachment)

Strange. It's your own algo - I swear I didn't modify a single byte. Maybe a problem with Vista?

Your timings look good. When will you post code? I guess I can speak for everybody when I say we are curious :bg

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 14, 2009, 12:21:04 PM

on my 'Microsoft Windows Vista Ultimate 32bit english with Service Pack 1' is ok.

Title: Re: InstrCi - case-insensitive string search
Post by: lingo on May 14, 2009, 01:28:27 PM

"Strange. It's your own algo - I swear I didn't modify a single byte."

I'm not a newbie to talk with me this way...
Your "old" instrings work for me OK (see the results.txt in every subdir)
Your last two tests do not work for me and I have no time to dig in
your macros.

UtillMasm,

I am with Vista Ultimate 64 bit with SP 1

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 14, 2009, 01:33:10 PM

oh my fo!

i have assume you with 32bit. :lol

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 14, 2009, 01:38:25 PM

Quote from: lingo on May 14, 2009, 01:28:27 PM
"Strange. It's your own algo - I swear I didn't modify a single byte."

I have no time to dig in your macros.

Just to avoid embarrassing copyright battles: the macros are JimG (TM). And they work on almost all PCs... :bg

Title: Re: InstrCi - case-insensitive string search
Post by: Jimg on May 14, 2009, 02:17:53 PM

Just to remove any misunderstanding, they are not trademarked, they are free for any use whatsoever, and the printing macros were mods of some code originally written by Phil. (wonder what ever happen to Phil?)

Title: Re: InstrCi - case-insensitive string search
Post by: dedndave on May 14, 2009, 03:12:28 PM

we don't support 64 bit OS's here at MASM32
MS doesn't support us - we don't have to support them :P

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 14, 2009, 08:57:59 PM

It was a bit tricky, but I managed to get Lingo's byte shift Boyer-Moore algo (http://www.masm32.com/board/index.php?topic=3601.msg83057#msg83057) on board:

Code Select

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InstrM32      192      240      408      168      384  3034608     1392
finstr2        48       60      204       24      216  2342292      552
finstr3        48       60      204       24      216  2355540      552
finstr6        48       60      204       24      204  2342484      540
InStrL         36       36      264       12      120  1557576      468
InstrJJ        48       72      108       60       96   815124      384
InstrJJw       36       60      108       48      120   665688      372
BMLinDB         0        0      168        0        0   941484      168

Note:
a) 0 cycles means the pattern was too short for Lingo's algo - minimum is 8 bytes.
b) Lingo's code had to be modified for purely technical reasons - the macros supplied by JimG/Phil require 2 args instead of 3.

Warning: The executable has no idea on which drive your \masm32\include\windows.inc sits. Therefore extract the exe's to the same drive as your Masm32 installation, and run it from there.

EDIT: Error check added. Watch out for e after the cycle count.

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: fearless on May 14, 2009, 10:19:48 PM

Code Select

Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz (SSE4)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH           229        0        0        0        0        0      229
InstrM32      128      162      324      111      306  2664793     1031
finstr2        26       43      187       18      136  1876070      410
finstr3        34       43      128       17      128  1918850      350
finstr6        26       43      145       17      128  1873902      359
InStrL         17       26      179        9       68  1328491      299
InstrJJ        26       34       51       34       43   651534      188
InstrJJw       17       43       68       26       85   498678      239
BMLinDB         0        0      102        0        0   847875      102

* not including file test

Sizes:
Mainstr         100
Substr1         4
Substr2         7
Substr4         9

mismatch        100
mmStr_b         5
mmStr_x         5

File            851328
WindInc         13 = Duplicate inc

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 15, 2009, 03:53:07 AM

Code Select

c:\china>"\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin\ml.exe" /c /coff instrnew.asm
Microsoft (R) Macro Assembler Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: instrnew.asm

c:\china>\masm32\bin\link.exe /subsystem:console instrnew.obj
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.


c:\china>instrnew.exe
Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InstrM32      209      242      429      187      407  3089163     1474
finstr2        55       66      242       33      209  2425984      605
finstr3        44       66      198       33      187  2434069      528
finstr6        55       66      198       33      198  2423784      550
InStrL         44       33      253       11      121  1559305      462
InstrJJ        44       66       99       55      110   812658      374
InstrJJw       33       55       99       44      110   697653      341
BMLinDB       -11        0      165        0        0   660748      154

* not including file test

Sizes:
Mainstr         100
Substr1         4
Substr2         7
Substr4         9

mismatch        100
mmStr_b         5
mmStr_x         5

File            874031
WindInc         13 = Duplicate inc

Press any key to exit...
c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 15, 2009, 06:46:50 AM

Error check added, see attachment above. Watch out for e after the cycle count:

Code Select


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH           264        0        0        0        0        0      264
InstrM32      192      216      408      168      384  3034320     1368
finstr2        48       72      204       24      204  2343300      552
finstr3        48       72      204       24      192  2350572      540
finstr6        48       72      204       24      204  2342664      552
InStrL         36       48      264       12      120 1558824e      480
InstrJJ        48       96      120       48       84   813828      396
InstrJJw       36       60      108       36      108   667296      348
BMLinDB        0e      132      168        0       0e   947220      300

* not including file test

Sizes:
Mainstr         100
Substr1         4
Substr2         9
Substr4         9

mismatch        100
mmStr_b         5
mmStr_x         5

File            849788
WindInc         13 = Duplicate inc

Note 0e is not an error - this algo was not designed for simple string searches and needs at least an 8-byte pattern. The overall cycle count is not valid in this case, of course.

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 15, 2009, 07:11:39 AM

Code Select

c:\china>instrnew.exe
Genuine Intel(R) CPU           T2400  @ 1.83GHz (SSE3)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH           264      -11      -11        0        0        0      242
InstrM32      198      253      429      187      396  3053754     1463
finstr2        55       66      209       33      198  2415127      561
finstr3        44       66      198       22      187  2429416      517
finstr6        55       66      198       33      198  2424917      550
InStrL         33       44      253       11      121  1577477      462
InstrJJ        44       66       99       44       77   808027      330
InstrJJw       33       66       99       33       99   683859      330
BMLinDB        0e      121      165      -11       0e   649583      275

* not including file test

Sizes:
Mainstr         100
Substr1         4
Substr2         9
Substr4         9

mismatch        100
mmStr_b         5
mmStr_x         5

File            874029
WindInc         13 = Duplicate inc

Press any key to exit...
c:\china>

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on May 15, 2009, 07:45:04 AM

Quote from: UtillMasm on May 15, 2009, 07:11:39 AM
Code Select Expand
Genuine Intel(R) CPU T2400 @ 1.83GHz (SSE3) Type Short Medium HighAnsi MisMatch MisMatch File Overall InStrL 33 44 253 11 121 1577477 462 File 874029

Code Select

Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
InStrL         26       43      298        1      170 2512065e  <------------------

File 849788

Strange. Which Masm32/Windows.inc version do you use? For my version, the algo produces an error...

Directory of D:\masm32\include
05.06.08 03:57 849,788 windows.inc

Title: Re: InstrCi - case-insensitive string search
Post by: UtillMasm on May 15, 2009, 07:51:29 AM

yes, because i comment two lines.

\masm32\include\windows.inc(16569)
\masm32\include\windows.inc(16610)

atachment is before comment.

[attachment deleted by admin]

Title: Re: InstrCi - case-insensitive string search
Post by: Mark Jones on May 15, 2009, 03:34:53 PM

Code Select

AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
Type        Short   Medium HighAnsi MisMatch MisMatch     File  Overall
Source    Mainstr  Mainstr  Mainstr mismatch mismatch  fbuffer  sum of
Pattern   Substr1  Substr2  Substr4  mmStr_b  mmStr_x  WindInc  cycles*
======== ======== ======== ======== ======== ======== ======== ========
OVH            71        0        0        0        0        0       71
InstrM32      177      262      555      168      520  4635503     1682
finstr2        45       74      186       33      165  2235211      503
finstr3        47       71      179       33      158  2177821      488
finstr6        45       71      186       33      165  2232679      500
InStrL         30       36      197       21      109 1517858e      393
InstrJJ        61       80      108       59      116  1163430      424
InstrJJw       53       86      128       54      135   832986      456
BMLinDB       11e      125      199       11      11e  1446577      357

* not including file test

Sizes:
Mainstr         100
Substr1         4
Substr2         9
Substr4         9

mismatch        100
mmStr_b         5
mmStr_x         5

File            849788
WindInc         13 = Duplicate inc

Title: Re: InstrCi - case-insensitive string search
Post by: lingo on February 08, 2011, 12:08:52 AM

New CPU with new instructions and results: :lol

Code Select


comment	* SrchBytes by Lingo *			
align	16		
SrchBytes	proc lpBuffer:dword, lpSrchPatt:dword
        mov	edx, [esp+2*4]
        mov	eax, [esp+1*4]  
        movdqu	xmm2, xmmword ptr[edx]
        pxor	xmm3, xmm3
@Loop:
        pcmpistri xmm2, xmmword ptr [eax], 0Ch
        jna	@f		 
	pcmpistri xmm2, xmmword ptr [eax+16], 0Ch
        lea	eax, [eax+32]
        ja	@Loop
	lea	ecx, [ecx-16]
@@:
	lea	eax, [eax+ecx]
        jnc	L_not_found
	sub	edx, eax
	movd	xmm7, eax
@@:
    movdqu	xmm1,	xmmword ptr[eax + edx]
    pcmpistrm	xmm3,	xmm1, 58h
    movdqu	xmm4,	xmmword ptr[eax]
    add		eax,	16
    pand	xmm4,	xmm0
    pcmpistri	xmm1,	xmm4, 18h
    ja		@b
	movd	eax,	xmm7
	jnc	L_found
	add	edx, eax
	add	eax, 1
	jne	@Loop
align 8
L_not_found:
	or	eax, -1
	ret	2*4
align 8
L_found:
	sub	eax, [esp+1*4] 
	ret	2*4
SrchBytes	endp

The results are:

       Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (SSE4)

Search Test 1 - value expected 37; lenSrchPattern ->22
InString - JJb:                         38 ; clocks: 52
InString - JJw:                         38 ; clocks: 60
SrchBytes -  Lingo:                     37 ; clocks: 23

Search Test 2 - value expected 1007; lenSrchPattern ->17
InString - JJb:                         1008 ; clocks: 10432
InString - JJw:                         1008 ; clocks: 7897
SrchBytes -  Lingo:                     1007 ; clocks: 1710

Search Test 3 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 208
InString - JJw:                         1009 ; clocks: 284
SrchBytes -  Lingo:                     1008 ; clocks: 166

Search Test 4 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 1994
InString - JJw:                         1009 ; clocks: 2183
SrchBytes -  Lingo:                     1008 ; clocks: 1409

Search Test 5 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 1284
InString - JJw:                         1009 ; clocks: 1377
SrchBytes -  Lingo:                     1008 ; clocks: 1050

Search Test 6 - value expected 1008 ;lenSrchPattern ->16
InString - JJb:                         1009 ; clocks: 205
InString - JJw:                         1009 ; clocks: 289
SrchBytes -  Lingo:                     1008 ; clocks: 174

Search Test 7 - value expected 1009 ;lenSrchPattern ->14
InString - JJb:                         1010 ; clocks: 296
InString - JJw:                         1010 ; clocks: 281
SrchBytes -  Lingo:                     1009 ; clocks: 214

Search Test 8 - value expected 1001 ;lenSrchPattern ->1
InString - JJb:                         1002 ; clocks: 175
InString - JJw:                         1002 ; clocks: 186
SrchBytes -  Lingo:                     1001 ; clocks: 138

Search Test 9 - value expected 1001 ;lenSrchPattern ->2
InString - JJb:                         1002 ; clocks: 181
InString - JJw:                         1002 ; clocks: 261
SrchBytes -  Lingo:                     1001 ; clocks: 158

Search Test 10 - value expected 1001 ;lenSrchPattern ->3
InString - JJb:                         1002 ; clocks: 180
InString - JJw:                         1002 ; clocks: 264
SrchBytes -  Lingo:                     1001 ; clocks: 151

Search Test 11 - value expected 1001 ;lenSrchPattern ->4
InString - JJb:                         1002 ; clocks: 176
InString - JJw:                         1002 ; clocks: 269
SrchBytes -  Lingo:                     1001 ; clocks: 159

Search Test 12 - value expected 1001 ;lenSrchPattern ->5
InString - JJb:                         1002 ; clocks: 188
InString - JJw:                         1002 ; clocks: 264
SrchBytes -  Lingo:                     1001 ; clocks: 171

Search Test 13 --Find 'Duplicate inc' in 'windows.inc' ;lenSrchPattern ->13
InString - JJb:                         1127625 ; clocks: 550930
InString - JJw:                         1127625 ; clocks: 249918
SrchBytes -  Lingo:                     1127624 ; clocks: 202321

Search Test 14 --Find 'warning or' in 'windows.inc' ;lenSrchPattern ->10
InString - JJb:                         1127404 ; clocks: 196414
InString - JJw:                         1127404 ; clocks: 250335
SrchBytes -  Lingo:                     1127403 ; clocks: 171055

 Press ENTER to exit...

Title: Re: InstrCi - case-insensitive string search
Post by: jj2007 on February 08, 2011, 06:57:35 AM

Since I fully trust my old friend Lingo, I double-clicked the exe - and wow, that's probably called a "SSE4.2 GPF" :green2

Text only | Text with Images

SMF 2.1.4 © 2023, Simple Machines