StrChar revisited

Started by KeepingRealBusy, November 07, 2011, 03:58:17 AM

KeepingRealBusy

Somehow my attachment didn't make it! It should be there now!
Dave.

I decided to start this as a new topic since the last one was over a year old.

This will be my final contribution to the StrChr saga. The test source is an
accumulation of versions of my code; KRBNew4 is Lingo's last contribution (that
I know of or remember seeing). In my real version (but not in the KRBNew4
version) I include code to save all regs, and thus must return the match point
in a target DWORD. That code makes my real version run slower than Lingo's
version here (AKA KRBNew4), so my relative time savings would be even better
than what I have reported here.

I can see no way to improve Lingo's NULL char processing; his inner loop is
exactly 16 bytes long! However, the non-NULL code has to make multiple compares,
looking both for a match with the match character and for a NULL char, to
ensure that you do not run off the end of the string and get a memory access
error.
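To make the "multiple compares" point concrete, here is a rough C sketch using SSE2 intrinsics of a NULL-terminated search in the spirit of the routines being timed (it is not Lingo's or KRB's code). The name sse2_strchr_aligned is hypothetical, and the sketch assumes the string is 16-byte aligned and that its buffer is readable up to the next 16-byte boundary past the NULL; that assumption is what makes reading a whole block harmless. Each 16-byte block costs two compares and two mask extractions: one for the match character and one for NULL.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SSE2 strchr sketch.
   Assumption: s is 16-byte aligned and the buffer is readable up to the
   next 16-byte boundary past the terminating NULL. */
const char *sse2_strchr_aligned(const char *s, char c)
{
    assert(((uintptr_t)s & 15) == 0);           /* alignment precondition */
    const __m128i needle = _mm_set1_epi8(c);    /* c replicated 16 times */
    const __m128i zero   = _mm_setzero_si128();

    for (;; s += 16) {
        __m128i block = _mm_load_si128((const __m128i *)s);
        int hits = _mm_movemask_epi8(_mm_cmpeq_epi8(block, needle)); /* compare #1 */
        int nuls = _mm_movemask_epi8(_mm_cmpeq_epi8(block, zero));   /* compare #2 */
        int any  = hits | nuls;
        if (any) {
            int i = 0;
            while (!(any & (1 << i)))   /* lowest set bit = first match or NULL */
                ++i;
            /* If the match char comes first, return it; otherwise the NULL came
               first and there is no match. Searching for c == 0 returns the NULL. */
            return (s[i] == c) ? s + i : NULL;
        }
    }
}
```

The second compare exists only because the end of the string is unknown; with a known length it is no longer needed to stop the scan.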

For most of my uses, the string length of all defined strings is known, and
strings I read from a file are also put into known length strings in the
following form:

    WORD    BYTE count
    BYTE    array of characters

These strings have no trailing null (they are tokenized in place by skipping the
first two BYTES of the read buffer, then replacing those first two BYTES with the
length of the first string, and then replacing each CRLF with the length of the
following string). But the length of each string is known!
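A minimal C sketch of the in-place tokenizing described above, assuming two spare BYTES sit in front of the file data and every line ends in CRLF. The names tokenize_in_place and put_word are hypothetical, and this is a plain scalar loop, not the SSE tokenizing routine mentioned in this post. Lengths are written as little-endian WORDs to match the x86 layout of the MASM strings.

```c
#include <assert.h>
#include <stddef.h>

/* Store a length as a little-endian WORD (x86 byte order). */
static void put_word(unsigned char *p, size_t len)
{
    assert(len <= 0xFFFF);              /* must fit in a WORD */
    p[0] = (unsigned char)(len & 0xFF);
    p[1] = (unsigned char)(len >> 8);
}

/* Hypothetical in-place tokenizer. Assumptions:
   - buf[0..1] are two spare bytes in front of the file data,
   - the data (data_len bytes) starts at buf + 2, lines end with CRLF.
   The first two bytes receive the length of the first line, and each CRLF
   is overwritten with the length of the line that follows it.
   Returns the number of length WORDs written. */
size_t tokenize_in_place(unsigned char *buf, size_t data_len)
{
    unsigned char *data = buf + 2;
    size_t slot  = 0;   /* offset in buf of the next length WORD */
    size_t start = 0;   /* offset in data of the current line */
    size_t count = 0;

    for (size_t i = 0; i + 1 < data_len; ++i) {
        if (data[i] == '\r' && data[i + 1] == '\n') {
            put_word(buf + slot, i - start);  /* length of the line just ended */
            ++count;
            slot  = 2 + i;                    /* this CRLF becomes the next slot */
            start = i + 2;
            ++i;                              /* skip the LF */
        }
    }
    put_word(buf + slot, data_len - start);   /* last line; 0 if data ended in CRLF */
    return count + 1;
}
```

After tokenizing, the strings can be walked by reading the WORD length at the current position, taking the characters that follow it, and advancing by 2 + length bytes; a file that ends in CRLF leaves a final zero-length entry.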

I figured out that I could write a Strnchr procedure that would be faster than
before. Basically, I save the regs, use the string length to determine the end
of the string, capture the contents of the trailing DWORD, and store a DWORD of
match characters at the end of the string. The search then only looks for the
match character. When a match is found, I check whether the match point is in
the modified end DWORD; if so, the return match point value is zeroed. Finally
I restore the modified string, save the match point, restore the regs, and
return.
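The sentinel idea can be sketched in scalar C as follows. This is a simplification, not the posted SSEStrnchr: it plants a single byte rather than a DWORD and compares one byte at a time. The name sentinel_strnchr is hypothetical, and it assumes the byte just past the string is writable (in the WORD-length format that byte belongs to the next string's length or to a CRLF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sentinel search.
   Assumption: s holds len characters and s[len] is writable.
   Returns a pointer to the first c in s[0..len-1], or NULL. */
char *sentinel_strnchr(char *s, size_t len, char c)
{
    assert(s != NULL);
    char saved = s[len];        /* capture the byte just past the string */
    s[len] = c;                 /* plant the match char so the scan must stop */

    char *p = s;
    while (*p != c)             /* one compare per byte: no NULL or length test */
        ++p;

    s[len] = saved;             /* restore the modified byte */
    return (p == s + len) ? NULL : p;   /* stopped on the sentinel = no match */
}
```

The loop body tests only for the match character; the end-of-string check is paid once, after the loop, instead of on every byte.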

The following is a brief list of the timing differences between Lingo's
version (KRBNew4) and my newest version, called SSEStrnchr:

1931    cycles for KRBNew4, match long string
1293    cycles for KRBNew4, match null in long string
28      cycles for KRBNew4, match null in short string
1317    cycles for SSEStrnchr, match long string
1297    cycles for SSEStrnchr match null in long string
34      cycles for SSEStrnchr, match null in short string

The newest version saves about 32% of the time when searching for a non-NULL
character. A NULL match takes approximately the same time as with the older
version, so both versions should probably be kept: the older version for NULL
matches and for cases where the string length is not known.
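For reference, the percentages follow directly from the "long string" lines of the list above (1931 vs. 1317 cycles for a character match, 1293 vs. 1297 cycles for a NULL match):

```latex
\frac{1931 - 1317}{1931} = \frac{614}{1931} \approx 0.318 \approx 32\%
\qquad
\frac{1297 - 1293}{1293} = \frac{4}{1293} \approx 0.003
```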

Note: For a defined string, the following will create the known-length string
format:

                WORD    (LENGTHOF szString - 1)
    szString    BYTE    "This is a string",0
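For readers working in C, a rough analogue of that MASM layout is a WORD count followed by the characters and a trailing 0. KNOWN_LEN_STRING and checked_len are hypothetical names introduced only for this sketch; as with LENGTHOF szString - 1, the stored count excludes the trailing 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: a WORD length followed by the characters and a 0,
   mirroring  WORD (LENGTHOF szString - 1) / szString BYTE "...",0  */
#define KNOWN_LEN_STRING(name, lit)                          \
    static const struct {                                    \
        uint16_t len;                     /* WORD count */   \
        char     text[sizeof(lit)];       /* chars + 0  */   \
    } name = { (uint16_t)(sizeof(lit) - 1), lit }

KNOWN_LEN_STRING(szString, "This is a string");

/* Hypothetical debug aid: confirm the stored count matches the text. */
size_t checked_len(uint16_t len, const char *text)
{
    assert(strlen(text) == len);
    return len;
}
```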

A companion StrRnchr procedure was also created which searches a string in
reverse, but first replaces the DWORD in front of the head of the string with
the match character, then restores it as it exits.
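The reverse search can be sketched the same way in scalar C. sentinel_strrnchr is a hypothetical name, it plants one byte instead of a DWORD, and it assumes the byte in front of the string is writable (in the WORD-length format that is the high byte of the length).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reverse sentinel search.
   Assumption: s[-1] is writable. Returns the last c in s[0..len-1], or NULL. */
char *sentinel_strrnchr(char *s, size_t len, char c)
{
    assert(s != NULL);
    char saved = s[-1];         /* capture the byte in front of the head */
    s[-1] = c;                  /* plant the sentinel */

    char *p = s + len - 1;      /* last character; s - 1 (the sentinel) if len == 0 */
    while (*p != c)             /* walk backward, one compare per byte */
        --p;

    s[-1] = saved;              /* restore it before returning */
    return (p == s - 1) ? NULL : p;     /* stopped on the sentinel = no match */
}
```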

I have attached a zip of the make file, the timing source, the exe, the object,
the timing results, and a copy of this description.

I can supply a copy of my tokenizing routine. It is SSE and fast!

Dave.

Antariy


9506    cycles for crt_strchr, match long string
9481    cycles for crt_strchr, no match long string
22302   cycles for KRBOld, match long string
7874    cycles for KRBNew, match long string
7948    cycles for KRBNew2, match long string
4682    cycles for KRBNew3, match long string
2274    cycles for KRBNew3, match null in long string

2813    cycles for KRBNew4, match long string
2206    cycles for KRBNew4, match null in long string
21      cycles for KRBNew4, match null in short string
2292    cycles for SSEStrnchr, match long string
2297    cycles for SSEStrnchr match null in long string
40      cycles for SSEStrnchr, match null in short string

7862    cycles for KRBNew, no match long string
7911    cycles for KRBNew2, no match long string
4815    cycles for KRBNew3, no match long string
2815    cycles for KRBNew4, no match long string
2261    cycles for SSEStrnchr, no match long string

9422    cycles for crt_strchr, match long string
9458    cycles for crt_strchr, no match long string
23350   cycles for KRBOld, match long string
7852    cycles for KRBNew, match long string
7833    cycles for KRBNew2, match long string
4759    cycles for KRBNew3, match long string
2357    cycles for KRBNew3, match null in long string

2812    cycles for KRBNew4, match long string
2283    cycles for KRBNew4, match null in long string
22      cycles for KRBNew4, match null in short string
2287    cycles for SSEStrnchr, match long string
2283    cycles for SSEStrnchr match null in long string
39      cycles for SSEStrnchr, match null in short string

8830    cycles for KRBNew, no match long string
7916    cycles for KRBNew2, no match long string
4822    cycles for KRBNew3, no match long string
2823    cycles for KRBNew4, no match long string
2269    cycles for SSEStrnchr, no match long string

9474    cycles for crt_strchr, match long string
9392    cycles for crt_strchr, no match long string
22291   cycles for KRBOld, match long string
7832    cycles for KRBNew, match long string
7910    cycles for KRBNew2, match long string
4735    cycles for KRBNew3, match long string
2357    cycles for KRBNew3, match null in long string

2813    cycles for KRBNew4, match long string
2222    cycles for KRBNew4, match null in long string
22      cycles for KRBNew4, match null in short string
2334    cycles for SSEStrnchr, match long string
2214    cycles for SSEStrnchr match null in long string
31      cycles for SSEStrnchr, match null in short string

7862    cycles for KRBNew, no match long string
7979    cycles for KRBNew2, no match long string
4768    cycles for KRBNew3, no match long string
2846    cycles for KRBNew4, no match long string
2343    cycles for SSEStrnchr, no match long string

9453    cycles for crt_strchr, match long string
9475    cycles for crt_strchr, no match long string
22285   cycles for KRBOld, match long string
7777    cycles for KRBNew, match long string
7931    cycles for KRBNew2, match long string
4769    cycles for KRBNew3, match long string
2273    cycles for KRBNew3, match null in long string

2732    cycles for KRBNew4, match long string
2220    cycles for KRBNew4, match null in long string
28      cycles for KRBNew4, match null in short string
2268    cycles for SSEStrnchr, match long string
2222    cycles for SSEStrnchr match null in long string
31      cycles for SSEStrnchr, match null in short string

8428    cycles for KRBNew, no match long string
7836    cycles for KRBNew2, no match long string
4836    cycles for KRBNew3, no match long string
2725    cycles for KRBNew4, no match long string
2253    cycles for SSEStrnchr, no match long string

9474    cycles for crt_strchr, match long string
9489    cycles for crt_strchr, no match long string
22286   cycles for KRBOld, match long string
7789    cycles for KRBNew, match long string
7832    cycles for KRBNew2, match long string
4785    cycles for KRBNew3, match long string
2269    cycles for KRBNew3, match null in long string

2814    cycles for KRBNew4, match long string
2207    cycles for KRBNew4, match null in long string
22      cycles for KRBNew4, match null in short string
2262    cycles for SSEStrnchr, match long string
2303    cycles for SSEStrnchr match null in long string
31      cycles for SSEStrnchr, match null in short string

7764    cycles for KRBNew, no match long string
8201    cycles for KRBNew2, no match long string
4690    cycles for KRBNew3, no match long string
2802    cycles for KRBNew4, no match long string
2322    cycles for SSEStrnchr, no match long string

Codesizes:
dostrchr:       12
KRBOld: 32
KRBNew: 97
KRBNew2:        141
KRBNew3:        219
KRBNew4:        141
SSEStrnchr:     173

jj2007

Hi Steve,
Here are my timings.
@Alex: Welcome back :U

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Find character in string: long string 5000 bytes.
9695    cycles for crt_strchr, match long string
9732    cycles for crt_strchr, no match long string
13526   cycles for KRBOld, match long string
3385    cycles for KRBNew, match long string
3739    cycles for KRBNew2, match long string
2495    cycles for KRBNew3, match long string
1130    cycles for KRBNew3, match null in long string

2136    cycles for KRBNew4, match long string
1009    cycles for KRBNew4, match null in long string
7       cycles for KRBNew4, match null in short string
1342    cycles for SSEStrnchr, match long string
1044    cycles for SSEStrnchr match null in long string
14      cycles for SSEStrnchr, match null in short string

3372    cycles for KRBNew, no match long string
3583    cycles for KRBNew2, no match long string
2518    cycles for KRBNew3, no match long string
2131    cycles for KRBNew4, no match long string
1344    cycles for SSEStrnchr, no match long string

Vortex

Hi KeepingRealBusy,

Some minor problems:

When assembling with ml.exe V6.14.8444, which is supplied with MASM32, I got a lot of syntax error messages:

strchar4a.asm(559) : error A2008: syntax error : xmm

Switching to ml.exe V6.15.8803 solved the problem.

Your original strchar4a.exe is reported as malware by some AV engines, but these are false positives:

http://virusscan.jotti.org/tr/scanresult/e97eaab391ed597b4b9c046c76c257fb4b127a16

Here is my report:


Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE3)
Find character in string: long string 5000 bytes.
8896    cycles for crt_strchr, match long string
9047    cycles for crt_strchr, no match long string
20933   cycles for KRBOld, match long string
7371    cycles for KRBNew, match long string
7347    cycles for KRBNew2, match long string
4411    cycles for KRBNew3, match long string
2144    cycles for KRBNew3, match null in long string

2595    cycles for KRBNew4, match long string
2081    cycles for KRBNew4, match null in long string
20      cycles for KRBNew4, match null in short string
2121    cycles for SSEStrnchr, match long string
2089    cycles for SSEStrnchr match null in long string
31      cycles for SSEStrnchr, match null in short string

7366    cycles for KRBNew, no match long string
7375    cycles for KRBNew2, no match long string
4447    cycles for KRBNew3, no match long string
2595    cycles for KRBNew4, no match long string
2121    cycles for SSEStrnchr, no match long string

8871    cycles for crt_strchr, match long string
8868    cycles for crt_strchr, no match long string
20954   cycles for KRBOld, match long string
7318    cycles for KRBNew, match long string
7351    cycles for KRBNew2, match long string
4418    cycles for KRBNew3, match long string
2145    cycles for KRBNew3, match null in long string

2590    cycles for KRBNew4, match long string
2104    cycles for KRBNew4, match null in long string
22      cycles for KRBNew4, match null in short string
2122    cycles for SSEStrnchr, match long string
2098    cycles for SSEStrnchr match null in long string
32      cycles for SSEStrnchr, match null in short string

7314    cycles for KRBNew, no match long string
7354    cycles for KRBNew2, no match long string
4456    cycles for KRBNew3, no match long string
2597    cycles for KRBNew4, no match long string
2122    cycles for SSEStrnchr, no match long string

9196    cycles for crt_strchr, match long string
8974    cycles for crt_strchr, no match long string
20933   cycles for KRBOld, match long string
7317    cycles for KRBNew, match long string
7355    cycles for KRBNew2, match long string
4504    cycles for KRBNew3, match long string
2210    cycles for KRBNew3, match null in long string

2643    cycles for KRBNew4, match long string
2088    cycles for KRBNew4, match null in long string
22      cycles for KRBNew4, match null in short string
2128    cycles for SSEStrnchr, match long string
2100    cycles for SSEStrnchr match null in long string
33      cycles for SSEStrnchr, match null in short string

7327    cycles for KRBNew, no match long string
7403    cycles for KRBNew2, no match long string
4563    cycles for KRBNew3, no match long string
2623    cycles for KRBNew4, no match long string
2125    cycles for SSEStrnchr, no match long string

8845    cycles for crt_strchr, match long string
9214    cycles for crt_strchr, no match long string
20972   cycles for KRBOld, match long string
7320    cycles for KRBNew, match long string
7350    cycles for KRBNew2, match long string
4422    cycles for KRBNew3, match long string
2153    cycles for KRBNew3, match null in long string

2590    cycles for KRBNew4, match long string
2080    cycles for KRBNew4, match null in long string
34      cycles for KRBNew4, match null in short string
2122    cycles for SSEStrnchr, match long string
2083    cycles for SSEStrnchr match null in long string
34      cycles for SSEStrnchr, match null in short string

7329    cycles for KRBNew, no match long string
7434    cycles for KRBNew2, no match long string
4489    cycles for KRBNew3, no match long string
2622    cycles for KRBNew4, no match long string
2130    cycles for SSEStrnchr, no match long string

8873    cycles for crt_strchr, match long string
9074    cycles for crt_strchr, no match long string
21162   cycles for KRBOld, match long string
7427    cycles for KRBNew, match long string
7406    cycles for KRBNew2, match long string
4475    cycles for KRBNew3, match long string
2149    cycles for KRBNew3, match null in long string

2640    cycles for KRBNew4, match long string
2082    cycles for KRBNew4, match null in long string
22      cycles for KRBNew4, match null in short string
2154    cycles for SSEStrnchr, match long string
2145    cycles for SSEStrnchr match null in long string
34      cycles for SSEStrnchr, match null in short string

7337    cycles for KRBNew, no match long string
7409    cycles for KRBNew2, no match long string
4493    cycles for KRBNew3, no match long string
2624    cycles for KRBNew4, no match long string
2163    cycles for SSEStrnchr, no match long string

Codesizes:
dostrchr:       12
KRBOld: 32
KRBNew: 97
KRBNew2:        141
KRBNew3:        219
KRBNew4:        141
SSEStrnchr:     173
--- ok ---

FORTRANS


 (SSE1)
Find character in string: long string 5000 bytes.
9885 cycles for crt_strchr, match long string
9839 cycles for crt_strchr, no match long string
20186 cycles for KRBOld, match long string
1949 cycles for KRBNew, match long string
2065 cycles for KRBNew2, match long string
2073 cycles for KRBNew3, match long string
1295 cycles for KRBNew3, match null in long string

1606 cycles for KRBNew4, match long string
966 cycles for KRBNew4, match null in long string
12 cycles for KRBNew4, match null in short string
1301 cycles for SSEStrnchr, match long string
998 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string

1946 cycles for KRBNew, no match long string
2055 cycles for KRBNew2, no match long string
2072 cycles for KRBNew3, no match long string
1612 cycles for KRBNew4, no match long string
1310 cycles for SSEStrnchr, no match long string

9914 cycles for crt_strchr, match long string
9817 cycles for crt_strchr, no match long string
20204 cycles for KRBOld, match long string
1951 cycles for KRBNew, match long string
2063 cycles for KRBNew2, match long string
2083 cycles for KRBNew3, match long string
1293 cycles for KRBNew3, match null in long string

1607 cycles for KRBNew4, match long string
968 cycles for KRBNew4, match null in long string
13 cycles for KRBNew4, match null in short string
1300 cycles for SSEStrnchr, match long string
996 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string

1948 cycles for KRBNew, no match long string
2075 cycles for KRBNew2, no match long string
2074 cycles for KRBNew3, no match long string
1612 cycles for KRBNew4, no match long string
1313 cycles for SSEStrnchr, no match long string

9879 cycles for crt_strchr, match long string
9815 cycles for crt_strchr, no match long string
20188 cycles for KRBOld, match long string
1951 cycles for KRBNew, match long string
2175 cycles for KRBNew2, match long string
2074 cycles for KRBNew3, match long string
1292 cycles for KRBNew3, match null in long string

1607 cycles for KRBNew4, match long string
970 cycles for KRBNew4, match null in long string
13 cycles for KRBNew4, match null in short string
1297 cycles for SSEStrnchr, match long string
998 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string

1951 cycles for KRBNew, no match long string
2051 cycles for KRBNew2, no match long string
2072 cycles for KRBNew3, no match long string
1612 cycles for KRBNew4, no match long string
1309 cycles for SSEStrnchr, no match long string

9875 cycles for crt_strchr, match long string
9818 cycles for crt_strchr, no match long string
20187 cycles for KRBOld, match long string
1953 cycles for KRBNew, match long string
2075 cycles for KRBNew2, match long string
2074 cycles for KRBNew3, match long string
1292 cycles for KRBNew3, match null in long string

1612 cycles for KRBNew4, match long string
966 cycles for KRBNew4, match null in long string
14 cycles for KRBNew4, match null in short string
1297 cycles for SSEStrnchr, match long string
999 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string

1949 cycles for KRBNew, no match long string
2054 cycles for KRBNew2, no match long string
2072 cycles for KRBNew3, no match long string
1614 cycles for KRBNew4, no match long string
1311 cycles for SSEStrnchr, no match long string

9876 cycles for crt_strchr, match long string
9942 cycles for crt_strchr, no match long string
20234 cycles for KRBOld, match long string
1952 cycles for KRBNew, match long string
2062 cycles for KRBNew2, match long string
2077 cycles for KRBNew3, match long string
1296 cycles for KRBNew3, match null in long string

1608 cycles for KRBNew4, match long string
967 cycles for KRBNew4, match null in long string
13 cycles for KRBNew4, match null in short string
1299 cycles for SSEStrnchr, match long string
996 cycles for SSEStrnchr match null in long string
22 cycles for SSEStrnchr, match null in short string

1947 cycles for KRBNew, no match long string
2053 cycles for KRBNew2, no match long string
2074 cycles for KRBNew3, no match long string
1617 cycles for KRBNew4, no match long string
1310 cycles for SSEStrnchr, no match long string

Codesizes:
dostrchr: 12
KRBOld: 32
KRBNew: 97
KRBNew2: 141
KRBNew3: 219
KRBNew4: 141
SSEStrnchr: 173
--- ok ---

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Find character in string: long string 5000 bytes.
9068    cycles for crt_strchr, match long string
8910    cycles for crt_strchr, no match long string
21361   cycles for KRBOld, match long string
7354    cycles for KRBNew, match long string
7402    cycles for KRBNew2, match long string
4434    cycles for KRBNew3, match long string
2155    cycles for KRBNew3, match null in long string

2604    cycles for KRBNew4, match long string
2097    cycles for KRBNew4, match null in long string
20      cycles for KRBNew4, match null in short string
2133    cycles for SSEStrnchr, match long string
2094    cycles for SSEStrnchr match null in long string
34      cycles for SSEStrnchr, match null in short string

7349    cycles for KRBNew, no match long string
7395    cycles for KRBNew2, no match long string
4479    cycles for KRBNew3, no match long string
2613    cycles for KRBNew4, no match long string
2137    cycles for SSEStrnchr, no match long string

8880    cycles for crt_strchr, match long string
9442    cycles for crt_strchr, no match long string
22416   cycles for KRBOld, match long string
8575    cycles for KRBNew, match long string
8173    cycles for KRBNew2, match long string
5244    cycles for KRBNew3, match long string
2160    cycles for KRBNew3, match null in long string

2607    cycles for KRBNew4, match long string
2089    cycles for KRBNew4, match null in long string
20      cycles for KRBNew4, match null in short string
2136    cycles for SSEStrnchr, match long string
2100    cycles for SSEStrnchr match null in long string
31      cycles for SSEStrnchr, match null in short string

9165    cycles for KRBNew, no match long string
8036    cycles for KRBNew2, no match long string
4477    cycles for KRBNew3, no match long string
2604    cycles for KRBNew4, no match long string
2139    cycles for SSEStrnchr, no match long string

8901    cycles for crt_strchr, match long string
8886    cycles for crt_strchr, no match long string
21064   cycles for KRBOld, match long string
7375    cycles for KRBNew, match long string
7395    cycles for KRBNew2, match long string
4435    cycles for KRBNew3, match long string
2153    cycles for KRBNew3, match null in long string

2612    cycles for KRBNew4, match long string
2140    cycles for KRBNew4, match null in long string
23      cycles for KRBNew4, match null in short string
2205    cycles for SSEStrnchr, match long string
2301    cycles for SSEStrnchr match null in long string
36      cycles for SSEStrnchr, match null in short string

7352    cycles for KRBNew, no match long string
7369    cycles for KRBNew2, no match long string
4682    cycles for KRBNew3, no match long string
2623    cycles for KRBNew4, no match long string
2136    cycles for SSEStrnchr, no match long string

9043    cycles for crt_strchr, match long string
8897    cycles for crt_strchr, no match long string
21042   cycles for KRBOld, match long string
7395    cycles for KRBNew, match long string
7449    cycles for KRBNew2, match long string
4486    cycles for KRBNew3, match long string
2190    cycles for KRBNew3, match null in long string

2644    cycles for KRBNew4, match long string
2096    cycles for KRBNew4, match null in long string
27      cycles for KRBNew4, match null in short string
2163    cycles for SSEStrnchr, match long string
2133    cycles for SSEStrnchr match null in long string
37      cycles for SSEStrnchr, match null in short string

7687    cycles for KRBNew, no match long string
7463    cycles for KRBNew2, no match long string
4608    cycles for KRBNew3, no match long string
2650    cycles for KRBNew4, no match long string
2220    cycles for SSEStrnchr, no match long string

9459    cycles for crt_strchr, match long string
8909    cycles for crt_strchr, no match long string
21264   cycles for KRBOld, match long string
7402    cycles for KRBNew, match long string
7449    cycles for KRBNew2, match long string
4488    cycles for KRBNew3, match long string
2184    cycles for KRBNew3, match null in long string

2927    cycles for KRBNew4, match long string
2125    cycles for KRBNew4, match null in long string
25      cycles for KRBNew4, match null in short string
2186    cycles for SSEStrnchr, match long string
2111    cycles for SSEStrnchr match null in long string
36      cycles for SSEStrnchr, match null in short string

7348    cycles for KRBNew, no match long string
7539    cycles for KRBNew2, no match long string
4466    cycles for KRBNew3, no match long string
2823    cycles for KRBNew4, no match long string
2133    cycles for SSEStrnchr, no match long string

lingo

It is old stuff for archaic CPUs, but I'm wondering why it is so slow...  :lol
One reason is:
La malattia italiano per lo stoccaggio inutili di registri è molto contagiosa.. :lol
(The Italian disease for unnecessary storing of registers is very contagious...)

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (SSE4)
Find character in string: long string 5000 bytes.
9490    cycles for crt_strchr, match long string
6959    cycles for crt_strchr, no match long string
9671    cycles for KRBOld, match long string
1180    cycles for KRBNew, match long string
1377    cycles for KRBNew2, match long string
1349    cycles for KRBNew3, match long string
707     cycles for KRBNew3, match null in long string

828     cycles for KRBNew4, match long string
618     cycles for KRBNew4, match null in long string
3       cycles for KRBNew4, match null in short string
738     cycles for SSEStrnchr, match long string
570     cycles for SSEStrnchrLingo, match long string
579     cycles for SSEStrnchr match null in long string
1       cycles for SSEStrnchrLingo match null in long string
6       cycles for SSEStrnchr, match null in short string
1       cycles for SSEStrnchrLingo, match null in short string

1229    cycles for KRBNew, no match long string
1444    cycles for KRBNew2, no match long string
1335    cycles for KRBNew3, no match long string
816     cycles for KRBNew4, no match long string
730     cycles for SSEStrnchr, no match long string
572     cycles for SSEStrnchrLingo, no match long string

9840    cycles for crt_strchr, match long string
7071    cycles for crt_strchr, no match long string
9657    cycles for KRBOld, match long string
1222    cycles for KRBNew, match long string
1457    cycles for KRBNew2, match long string
1353    cycles for KRBNew3, match long string
716     cycles for KRBNew3, match null in long string

831     cycles for KRBNew4, match long string
615     cycles for KRBNew4, match null in long string
0       cycles for KRBNew4, match null in short string
746     cycles for SSEStrnchr, match long string
571     cycles for SSEStrnchrLingo, match long string
581     cycles for SSEStrnchr match null in long string
0       cycles for SSEStrnchrLingo match null in long string
5       cycles for SSEStrnchr, match null in short string
3       cycles for SSEStrnchrLingo, match null in short string

1202    cycles for KRBNew, no match long string
1407    cycles for KRBNew2, no match long string
1351    cycles for KRBNew3, no match long string
826     cycles for KRBNew4, no match long string
741     cycles for SSEStrnchr, no match long string
569     cycles for SSEStrnchrLingo, no match long string

10428   cycles for crt_strchr, match long string
5747    cycles for crt_strchr, no match long string
9639    cycles for KRBOld, match long string
1215    cycles for KRBNew, match long string
1404    cycles for KRBNew2, match long string
1369    cycles for KRBNew3, match long string
723     cycles for KRBNew3, match null in long string

824     cycles for KRBNew4, match long string
606     cycles for KRBNew4, match null in long string
1       cycles for KRBNew4, match null in short string
725     cycles for SSEStrnchr, match long string
564     cycles for SSEStrnchrLingo, match long string
579     cycles for SSEStrnchr match null in long string
3       cycles for SSEStrnchrLingo match null in long string
6       cycles for SSEStrnchr, match null in short string
0       cycles for SSEStrnchrLingo, match null in short string

1204    cycles for KRBNew, no match long string
1446    cycles for KRBNew2, no match long string
1366    cycles for KRBNew3, no match long string
847     cycles for KRBNew4, no match long string
731     cycles for SSEStrnchr, no match long string
573     cycles for SSEStrnchrLingo, no match long string

12068   cycles for crt_strchr, match long string
5762    cycles for crt_strchr, no match long string
9737    cycles for KRBOld, match long string
1227    cycles for KRBNew, match long string
1453    cycles for KRBNew2, match long string
1350    cycles for KRBNew3, match long string
705     cycles for KRBNew3, match null in long string

815     cycles for KRBNew4, match long string
610     cycles for KRBNew4, match null in long string
2       cycles for KRBNew4, match null in short string
738     cycles for SSEStrnchr, match long string
573     cycles for SSEStrnchrLingo, match long string
582     cycles for SSEStrnchr match null in long string
2       cycles for SSEStrnchrLingo match null in long string
4       cycles for SSEStrnchr, match null in short string
1       cycles for SSEStrnchrLingo, match null in short string

1215    cycles for KRBNew, no match long string
1409    cycles for KRBNew2, no match long string
1354    cycles for KRBNew3, no match long string
828     cycles for KRBNew4, no match long string
742     cycles for SSEStrnchr, no match long string
571     cycles for SSEStrnchrLingo, no match long string

8994    cycles for crt_strchr, match long string
7428    cycles for crt_strchr, no match long string
10900   cycles for KRBOld, match long string
1212    cycles for KRBNew, match long string
1389    cycles for KRBNew2, match long string
1363    cycles for KRBNew3, match long string
705     cycles for KRBNew3, match null in long string

825     cycles for KRBNew4, match long string
615     cycles for KRBNew4, match null in long string
3       cycles for KRBNew4, match null in short string
738     cycles for SSEStrnchr, match long string
570     cycles for SSEStrnchrLingo, match long string
571     cycles for SSEStrnchr match null in long string
0       cycles for SSEStrnchrLingo match null in long string
2       cycles for SSEStrnchr, match null in short string
-2      cycles for SSEStrnchrLingo, match null in short string

1219    cycles for KRBNew, no match long string
1449    cycles for KRBNew2, no match long string
1355    cycles for KRBNew3, no match long string
826     cycles for KRBNew4, no match long string
726     cycles for SSEStrnchr, no match long string
568     cycles for SSEStrnchrLingo, no match long string

Codesizes:
dostrchr:       12
KRBOld: 32
KRBNew: 97
KRBNew2:        141
KRBNew3:        219
KRBNew4:        141
SSEStrnchr:     173
SSEStrnchrLingo:123
--- ok ---





jj2007

A bit slow, but at least no exceptions thrown :bg
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Find character in string: long string 5000 bytes.
1346    cycles for SSEStrnchr, no match long string
1463    cycles for SSEStrnchrLingo, no match long string


That looks a bit inefficient:
    pcmpeqb   xmm2, [eax+32]   ; Compare 16 match characters to 16 BYTES of the source.
    add       eax, 32          ; Increment the source string pointer by 32.
    movdqa    xmm1, xmm0
    pmovmskb  edx, xmm2        ; Return a 1 for each matched character in the low 16 BITS of edx.


One percent faster on my archaic CPU:
    add       eax, 32          ; Increment the source string pointer by 32.
    movdqa    xmm1, xmm0
    pcmpeqb   xmm2, [eax]      ; Compare 16 match characters to 16 BYTES of the source.
    pmovmskb  edx, xmm2        ; Return a 1 for each matched character in the low 16 BITS of edx.


What I like most with Lingo's code is that it can be so easily improved:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1463    cycles for SSEStrnchrLingo, no match long string
1320    cycles for SSEStrnchrLingoJJ, no match long string


Greetings to Toronto :U

clive

AMD Phenom(tm) II X6 1055T Processor (SSE3)
Find character in string: long string 5000 bytes.
7691    cycles for crt_strchr, match long string
7638    cycles for crt_strchr, no match long string
15201   cycles for KRBOld, match long string
1943    cycles for KRBNew, match long string
1881    cycles for KRBNew2, match long string
1743    cycles for KRBNew3, match long string
1460    cycles for KRBNew3, match null in long string

1426    cycles for KRBNew4, match long string
1137    cycles for KRBNew4, match null in long string
7       cycles for KRBNew4, match null in short string
1322    cycles for SSEStrnchr, match long string
855     cycles for SSEStrnchrLingo, match long string
1121    cycles for SSEStrnchr match null in long string
6       cycles for SSEStrnchrLingo match null in long string
9       cycles for SSEStrnchr, match null in short string
6       cycles for SSEStrnchrLingo, match null in short string

1913    cycles for KRBNew, no match long string
1932    cycles for KRBNew2, no match long string
1790    cycles for KRBNew3, no match long string
1476    cycles for KRBNew4, no match long string
1311    cycles for SSEStrnchr, no match long string
866     cycles for SSEStrnchrLingo, no match long string

7662    cycles for crt_strchr, match long string
7641    cycles for crt_strchr, no match long string
15238   cycles for KRBOld, match long string
1945    cycles for KRBNew, match long string
1941    cycles for KRBNew2, match long string
1748    cycles for KRBNew3, match long string
1467    cycles for KRBNew3, match null in long string

1475    cycles for KRBNew4, match long string
1128    cycles for KRBNew4, match null in long string
7       cycles for KRBNew4, match null in short string
1329    cycles for SSEStrnchr, match long string
857     cycles for SSEStrnchrLingo, match long string
1147    cycles for SSEStrnchr match null in long string
6       cycles for SSEStrnchrLingo match null in long string
9       cycles for SSEStrnchr, match null in short string
6       cycles for SSEStrnchrLingo, match null in short string

1937    cycles for KRBNew, no match long string
1939    cycles for KRBNew2, no match long string
1824    cycles for KRBNew3, no match long string
1479    cycles for KRBNew4, no match long string
1320    cycles for SSEStrnchr, no match long string
850     cycles for SSEStrnchrLingo, no match long string

7652    cycles for crt_strchr, match long string
7653    cycles for crt_strchr, no match long string
15247   cycles for KRBOld, match long string
1955    cycles for KRBNew, match long string
1947    cycles for KRBNew2, match long string
1801    cycles for KRBNew3, match long string
1457    cycles for KRBNew3, match null in long string

1468    cycles for KRBNew4, match long string
1116    cycles for KRBNew4, match null in long string
7       cycles for KRBNew4, match null in short string
1326    cycles for SSEStrnchr, match long string
851     cycles for SSEStrnchrLingo, match long string
1140    cycles for SSEStrnchr match null in long string
6       cycles for SSEStrnchrLingo match null in long string
9       cycles for SSEStrnchr, match null in short string
6       cycles for SSEStrnchrLingo, match null in short string

1958    cycles for KRBNew, no match long string
1947    cycles for KRBNew2, no match long string
1786    cycles for KRBNew3, no match long string
1472    cycles for KRBNew4, no match long string
1322    cycles for SSEStrnchr, no match long string
854     cycles for SSEStrnchrLingo, no match long string

7684    cycles for crt_strchr, match long string
7694    cycles for crt_strchr, no match long string
15041   cycles for KRBOld, match long string
1956    cycles for KRBNew, match long string
1956    cycles for KRBNew2, match long string
1794    cycles for KRBNew3, match long string
1463    cycles for KRBNew3, match null in long string

1475    cycles for KRBNew4, match long string
1136    cycles for KRBNew4, match null in long string
7       cycles for KRBNew4, match null in short string
1317    cycles for SSEStrnchr, match long string
904     cycles for SSEStrnchrLingo, match long string
1140    cycles for SSEStrnchr match null in long string
6       cycles for SSEStrnchrLingo match null in long string
9       cycles for SSEStrnchr, match null in short string
5       cycles for SSEStrnchrLingo, match null in short string

1966    cycles for KRBNew, no match long string
1931    cycles for KRBNew2, no match long string
1769    cycles for KRBNew3, no match long string
1442    cycles for KRBNew4, no match long string
1325    cycles for SSEStrnchr, no match long string
851     cycles for SSEStrnchrLingo, no match long string

7645    cycles for crt_strchr, match long string
7611    cycles for crt_strchr, no match long string
15324   cycles for KRBOld, match long string
1944    cycles for KRBNew, match long string
1935    cycles for KRBNew2, match long string
1801    cycles for KRBNew3, match long string
1445    cycles for KRBNew3, match null in long string

1469    cycles for KRBNew4, match long string
1146    cycles for KRBNew4, match null in long string
7       cycles for KRBNew4, match null in short string
1315    cycles for SSEStrnchr, match long string
869     cycles for SSEStrnchrLingo, match long string
1149    cycles for SSEStrnchr match null in long string
6       cycles for SSEStrnchrLingo match null in long string
9       cycles for SSEStrnchr, match null in short string
6       cycles for SSEStrnchrLingo, match null in short string

1958    cycles for KRBNew, no match long string
1937    cycles for KRBNew2, no match long string
1779    cycles for KRBNew3, no match long string
1486    cycles for KRBNew4, no match long string
1294    cycles for SSEStrnchr, no match long string
856     cycles for SSEStrnchrLingo, no match long string

Codesizes:
dostrchr:       12
KRBOld: 32
KRBNew: 97
KRBNew2:        141
KRBNew3:        219
KRBNew4:        141
SSEStrnchr:     173
SSEStrnchrLingo:123
--- ok ---
It could be a random act of randomness. Those happen a lot as well.

KeepingRealBusy

Lingo (and jj, since your modification has the same problems),

The following will not work. add edx, [esp+3*4] copies the whole argument DWORD into edx, so there is no guarantee that dh is zero:


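    xor       edx, edx                    ;   Clear edx.
    add       edx, [esp+3*4]              ;   Get the match character (copies the whole DWORD).
     .
     .
     .
    add       dh, dl                      ;   Is the match character null? (Valid only if dh is 0.)
    je        @Zero                       ;   If so, branch to the null-character case.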


The following will work:

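    xor       edx, edx                    ;   Clear edx, so dh is 0.
    add       dl, [esp+3*4]               ;   Get the match character (low BYTE only).
     .
     .
     .
    add       dh, dl                      ;   dh is 0, so ZF is set only for a null match character.
    je        @Zero                       ;   If so, branch to the null-character case.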
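To make the dh problem concrete, here is a small C model of the two loads. It is hypothetical and not part of the test source. It treats the argument at [esp+3*4] as a 32-bit value whose low byte is the match character and whose upper bytes hold whatever the caller left there. Each function reports whether "add dh, dl / je @Zero" would branch to @Zero.

```c
#include <stdint.h>

/* Hypothetical model: 'slot' stands for the DWORD at [esp+3*4]. Only its low
   byte is the match character; the upper bytes are whatever the caller left.
   Each function returns 1 if "add dh, dl / je @Zero" would branch to @Zero. */

/* xor edx, edx / add edx, [esp+3*4]: the whole DWORD lands in edx. */
static int zero_branch_dword_load(uint32_t slot)
{
    uint32_t edx = 0;
    edx += slot;
    uint8_t dl = (uint8_t)(edx & 0xFFu);
    uint8_t dh = (uint8_t)((edx >> 8) & 0xFFu);   /* bits 8..15 of the slot */
    return (uint8_t)(dh + dl) == 0;               /* 8-bit add sets ZF on a zero result */
}

/* xor edx, edx / add dl, [esp+3*4]: only the low byte is added, dh stays 0. */
static int zero_branch_byte_load(uint32_t slot)
{
    uint32_t edx = 0;
    uint8_t dl = (uint8_t)((edx & 0xFFu) + (slot & 0xFFu));
    uint8_t dh = (uint8_t)((edx >> 8) & 0xFFu);   /* still 0 after xor edx, edx */
    return (uint8_t)(dh + dl) == 0;
}
```

Take a slot of 0x00000100, which holds a null character with garbage in the second byte. The DWORD load misses the null case. With 0x0000C040, which is '@' with 0xC0 above it, the DWORD load wrongly takes the null branch. The byte load gets both right.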



The following will destroy the input string with no recovery possible:

    mov       [ecx], dl           ;


The following rewrite will work:


    mov       eax, [esp+1*4]              ;   Get the source string pointer.
    lea       ecx, [eax-1]                ;   ecx = source pointer - 1.
    add       ecx, [esp+2*4]              ;   Add the string size: ecx -> last character of the string.
    mov       dl, [ecx]                   ;   Get the original trailing character.
    mov       [esp+1*4], dl               ;   Save it over the source pointer argument.
     .
     .
     .
    xor       edx, edx                    ;   Clear edx, so dh is 0.
    add       dl, [esp+3*4]               ;   Get the match character.
     .
     .
     .
    cmp       eax, ecx                    ;   Is the match at or past the last character?
    cmovae    eax, edx                    ;   If so, return a null response (Not found).
    mov       dl, [esp+1*4]               ;   Get the saved trailing character.
    mov       [ecx], dl                   ;   Restore the string.
    ret       4*3
     .
     .
     .
@Zero:
    mov       eax, ecx                    ;   Return the end-of-string pointer in eax.
    mov       dl, [esp+1*4]               ;   Get the saved trailing character.
    mov       [ecx], dl                   ;   Restore the string.
    ret       4*3
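For readers less at home in MASM, the sentinel idea behind this rewrite can be sketched in C. This is a hypothetical model, not the posted code, and the function name is made up.

It works in three steps:
1. Save the last character of the counted string.
2. Plant the match character there, so the inner loop needs only one compare per byte.
3. Always restore the saved byte before returning.

It also checks whether the original last character was itself a match. Otherwise a hit at the sentinel position cannot be told apart from "not found".

```c
#include <stddef.h>

/* Hypothetical C model of a sentinel strnchr on a counted (not null-terminated)
   string. The buffer is modified temporarily and restored before returning. */
static char *sentinel_strnchr(char *s, size_t len, char c)
{
    if (len == 0)
        return NULL;
    char *last = s + len - 1;     /* like ecx: source + size - 1 */
    char saved = *last;           /* original trailing character */
    *last = c;                    /* plant the sentinel */
    char *p = s;
    while (*p != c)               /* one compare per byte, no end test needed */
        p++;
    *last = saved;                /* restore the string */
    if (p < last)
        return p;                 /* genuine match before the sentinel */
    return (saved == c) ? last : NULL;  /* stopped at the sentinel position */
}
```

Because the string is written to, this only works on writable buffers. As with the asm version, it is not safe if another thread reads the string during the search.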


The next point is a conceptual one. There is no reason to assume that the string
length marks an end of string that is followed by a null; it is only the maximum
number of characters to search. Consider parsing an expression with embedded
parentheses. Search to the end of the string for a "(". If one is found, search
past that "(" to the end of the string for another "(", and also search past the
first "(" to the end of the string for a ")". Accept whichever of the two is
found first (the lower address) as the found character. If that character is
another "(", repeat the search recursively from it. If the found character is a
")", then you have an expression to parse that lies between the "(" and the ")".
Search the expression for operators from its start, for its length, but note
that there is no null at the end of the expression.
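The parenthesis search above needs only a bounded search primitive. Here it is sketched in C with the standard memchr, which never looks for a terminating null. The function name and its return convention are hypothetical, and the loop replaces the recursion with iteration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first innermost "( ... )" in s[0..len).
   s need not be null-terminated. Returns 1 and the offsets of the "(" and ")"
   on success, 0 if there is no complete parenthesized expression. */
static int innermost_parens(const char *s, size_t len, size_t *open, size_t *close)
{
    const char *end = s + len;
    const char *lp = memchr(s, '(', len);
    if (lp == NULL)
        return 0;
    for (;;) {
        size_t rest = (size_t)(end - (lp + 1));
        const char *nl = memchr(lp + 1, '(', rest);   /* next "(" */
        const char *rp = memchr(lp + 1, ')', rest);   /* next ")" */
        if (rp == NULL)
            return 0;                   /* no closing ")" within the length */
        if (nl != NULL && nl < rp) {
            lp = nl;                    /* "(" comes first: descend into it */
            continue;
        }
        *open = (size_t)(lp - s);
        *close = (size_t)(rp - s);
        return 1;                       /* expression is s[open+1 .. close-1] */
    }
}
```

The operator search inside the result would be another bounded memchr over s + open + 1 for close - open - 1 bytes, with no null required at the end.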

The same problem comes up when scanning MASM source files or .LST files and
encountering string data that can be delimited with either '"' characters or
"'", or both in the same string declaration:


    szStr   BYTE    "This is John","'", "s book titled " '"A tale".',0


which in plain text would read:


    This is John's book titled "A tale".
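Turning such a declaration into its plain text is another job for a bounded search: each closing delimiter is looked for only within the bytes left in the span. This hypothetical C sketch makes two simplifying assumptions. Numeric items such as the trailing 0 are skipped, and doubled-quote escapes inside a piece are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: join the quoted pieces of a MASM BYTE initializer into
   plain text. src is a counted span (no terminating null is required) and each
   closing-delimiter search is bounded by the bytes left. Assumptions: numeric
   items are skipped and doubled-quote escapes are not handled. Returns the text
   length, or -1 on an unterminated piece or if out is too small. */
static long join_masm_pieces(const char *src, size_t len, char *out, size_t out_size)
{
    const char *p = src;
    const char *end = src + len;
    size_t used = 0;

    if (out_size == 0)
        return -1;
    while (p < end) {
        if (*p != '"' && *p != '\'') {          /* commas, blanks, numbers */
            p++;
            continue;
        }
        char delim = *p++;                       /* this piece's delimiter */
        const char *close = memchr(p, delim, (size_t)(end - p));
        if (close == NULL)
            return -1;                           /* no matching delimiter */
        size_t n = (size_t)(close - p);
        if (used + n >= out_size)
            return -1;                           /* keep room for the null */
        memcpy(out + used, p, n);
        used += n;
        p = close + 1;
    }
    out[used] = '\0';
    return (long)used;
}
```

The piece delimited by "'" may contain '"' and vice versa, because each search looks only for the delimiter that opened the current piece.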


If the data to be searched is a text file that has been read into a buffer, the
buffer holds a CRLF preceding the actual data, a CRLF following it (in case the
final line has no CRLF of its own), and then a null. If you want to output all
lines that contain a match string, search the buffer for each occurrence of the
first character of the match string. At each such point, if the rest of the
match string does not match the buffer, increment the buffer pointer by one and
find the next first-character match. If the match string does match, back-scan
from the match point for a 0Ah (the head of the buffer line - 1), then
forward-scan from the match point for a 0Dh (the end of the buffer line).
Increment the head pointer by 1, calculate the length as end-head+2, and output
the string and CRLF. Then continue the first-character scan from one of two
points. If you want to output the line once for each occurrence of the match
string (there may be several on a line), increment the match point by 1 and
search for the first character again. Otherwise, restart the search for the
first character at the end-of-line pointer (or increment it by 2 and then
search). When you finally get a null response, all matches have been found.
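The scan just described can be sketched in C, with strchr standing in for the first-character search. The function name, the output buffer and the per_match flag are hypothetical. The sketch assumes the CRLF/data/CRLF/null buffer layout above and a non-empty match string with no CR or LF in it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the line-matching scan described above.
   Assumed buffer layout: "\r\n" + text + "\r\n" + '\0', so every line has an LF
   before it and a CR after it. pat must contain no CR or LF.
   Each matching line (with its CRLF) is appended to out while it fits.
   per_match != 0 emits the line once per occurrence; otherwise once per line.
   Returns the number of matching lines found. */
static int grep_lines(const char *buf, const char *pat, int per_match,
                      char *out, size_t out_size)
{
    size_t m = strlen(pat);
    size_t used = 0;
    int count = 0;
    const char *p = buf;

    if (out_size > 0)
        out[0] = '\0';
    if (m == 0)
        return 0;
    while ((p = strchr(p, pat[0])) != NULL) {    /* next first-character match */
        if (strncmp(p, pat, m) != 0) {           /* rest of the pattern differs */
            p++;
            continue;
        }
        const char *head = p;
        while (*head != '\n')                    /* back-scan for 0Ah */
            head--;
        head++;                                  /* first character of the line */
        const char *end = strchr(p, '\r');       /* forward-scan for 0Dh */
        size_t n = (size_t)(end - head) + 2;     /* end - head + 2: line plus CRLF */
        if (used + n < out_size) {
            memcpy(out + used, head, n);
            used += n;
            out[used] = '\0';
        }
        count++;
        p = per_match ? p + 1 : end + 2;         /* next occurrence, or next line */
    }
    return count;
}
```

The leading CRLF guarantees the back-scan always finds a 0Ah. The trailing CRLF guarantees the forward-scan always finds a 0Dh, even for a final line that had none in the file.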

Dave.

KeepingRealBusy

This timing run was done after the following changes had been made to
StrChar4b.asm (the Lingojj version in StrChar4b.zip):

Since Lingo and LingoJJ were clobbering the null at the end of the strings
(which would affect any routine timed after them), I added code to every timing
loop to re-initialize the null at the end of the appropriate test string on each
iteration. The same instruction is added to all loops to keep the relative
timing the same. It may add 1 or 2 cycles to the reported time for each loop,
but the relative times should still be comparable.
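The harness change amounts to one extra byte store per iteration. A hypothetical C sketch of the idea follows; both function names are made up. The stand-in search routine mimics the problem by writing the match character over the terminating null and never restoring it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a routine that stores the match character over the
   terminating null and does not restore it. */
static char *clobbering_search(char *s, size_t len, char c)
{
    s[len] = c;                       /* sentinel written over the null */
    char *p = s;
    while (*p != c)
        p++;
    return (p < s + len) ? p : NULL;
}

/* Timing-loop body: the same one-byte store re-initializes the null on every
   iteration, so each call (and each routine timed later) sees the same string. */
static int timing_loop(char *s, size_t len, char c, int iterations)
{
    int found = 0;
    for (int i = 0; i < iterations; i++) {
        if (clobbering_search(s, len, c) != NULL)
            found++;
        s[len] = '\0';                /* re-initialize the null after the call */
    }
    return found;
}
```

Because every loop pays for the same store, it shifts all timings by about the same amount rather than favoring one routine.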

I renamed KRBNew4 back to OldLingo since that is what it really is - Lingo's
code from a year ago.

I made the appropriate changes to SSEStrnchr based on my prior post discussing
the proposed uses of the code for the project at hand.

The following timing runs had to be repeated about 6 or 7 times to get consistent results.
There were wildly different times between the two reported sequences, and
several negative times. The following times seem to track the overall average of
what was seen for the individual times (these times are the actual times for the
last pass - not some average taken over the 7 runs).


AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
Find character in string: long string 5000 bytes.
7555    cycles for crt_strchr, match long string
7571    cycles for crt_strchr, no match long string
16538   cycles for KRBOld, match long string
927     cycles for KRBNew, match long string
3123    cycles for KRBNew2, match long string
4838    cycles for KRBNew3, match long string
1598    cycles for KRBNew3, match null in long string

1932    cycles for OldLingo, match long string
1287    cycles for OldLingo, match null in long string
27      cycles for OldLingo, match null in short string
1305    cycles for SSEStrnchr, match long string
1457    cycles for SSEStrnchrLingo, match long string
37      cycles for SSEStrnchr match null in long string
8       cycles for SSEStrnchrLingo match null in long string
35      cycles for SSEStrnchr, match null in short string
7       cycles for SSEStrnchrLingo, match null in short string

3182    cycles for KRBNew, no match long string
2900    cycles for KRBNew2, no match long string
2279    cycles for KRBNew3, no match long string
1939    cycles for OldLingo, no match long string
1308    cycles for SSEStrnchr, no match long string
1458    cycles for SSEStrnchrLingo, no match long string
1316    cycles for SSEStrnchrLingoJJ, no match long string

10613   cycles for crt_strchr, match long string
8064    cycles for crt_strchr, no match long string
15312   cycles for KRBOld, match long string
3168    cycles for KRBNew, match long string
2881    cycles for KRBNew2, match long string
2257    cycles for KRBNew3, match long string
1597    cycles for KRBNew3, match null in long string

1931    cycles for OldLingo, match long string
3761    cycles for OldLingo, match null in long string
27      cycles for OldLingo, match null in short string
1306    cycles for SSEStrnchr, match long string
1458    cycles for SSEStrnchrLingo, match long string
37      cycles for SSEStrnchr match null in long string
7       cycles for SSEStrnchrLingo match null in long string
35      cycles for SSEStrnchr, match null in short string
7       cycles for SSEStrnchrLingo, match null in short string

3168    cycles for KRBNew, no match long string
2884    cycles for KRBNew2, no match long string
2270    cycles for KRBNew3, no match long string
1932    cycles for OldLingo, no match long string
1320    cycles for SSEStrnchr, no match long string
1557    cycles for SSEStrnchrLingo, no match long string
1385    cycles for SSEStrnchrLingoJJ, no match long string

Codesizes:
dostrchr:       12
KRBOld: 32
KRBNew: 97
KRBNew2:        141
KRBNew3:        219
OldLingo:       141
SSEStrnchr:     204
SSEStrnchrLingo:123
SSEStrnchrLingoJJ:121
--- ok ---


Dave.