Somehow my attachment didn't attach! It should be there now!
Dave.
I decided to start this as a new topic since the last one was over a year old.
This will be my final contribution to the StrChr saga. The test source is an
accumulation of versions of my code, where KRBNew4 is Lingo's last contribution
(that I know of or remember seeing). My real version, unlike the KRBNew4 version
here, includes code to save all regs and thus must return the match point to a
target DWORD. That extra code makes it run slower than Lingo's version as timed
here (AKA KRBNew4), so my relative time savings would be even better than what I
have reported here.
I can see no way to improve Lingo's NULL char processing; his internal loop is
exactly 16 bytes long! However, the non-NULL code makes multiple compares,
looking for a match with the match character and also for a NULL char, to ensure
that you do not run off the end of the string and thus get a memory access error.
For most of my uses, the string length of all defined strings is known, and
strings I read from a file are also put into known length strings in the
following form:
WORD BYTE count
BYTE array of characters
These strings have no trailing null. They are tokenized in place by skipping the
first two BYTES of the read buffer, then replacing those first two BYTES with
the length of the first string, and then replacing each CRLF with the length of
the following string. So the length of each string is known!
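The in-place tokenizing described above can be sketched in portable C. This is only an illustration, not my SSE routine: the names tokenize_in_place, put_len and get_len are made up for the example, and it assumes the 16-bit length is stored low byte first and that no line is longer than 65535 bytes. A text that ends with CRLF yields a final empty string of length 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store a 16-bit length, low byte first. */
void put_len(unsigned char *p, size_t len)
{
    p[0] = (unsigned char)(len & 0xFF);
    p[1] = (unsigned char)((len >> 8) & 0xFF);
}

/* Hypothetical helper: read a 16-bit length back. */
size_t get_len(const unsigned char *p)
{
    return (size_t)p[0] | ((size_t)p[1] << 8);
}

/*
 * Tokenize in place. buf[0..1] is the reserved (skipped) prefix and the
 * text occupies buf[2..total-1], with lines separated by CRLF.
 * buf[0..1] receives the length of the first line, and each CRLF pair is
 * overwritten with the length of the line that follows it.
 * Returns the number of strings produced.
 */
size_t tokenize_in_place(unsigned char *buf, size_t total)
{
    assert(buf != NULL && total >= 2);
    size_t count = 0;
    size_t prefix = 0;   /* where the current line's length will be stored */
    size_t start = 2;    /* index of the current line's first character */
    size_t i = 2;

    while (i < total) {
        if (buf[i] == '\r' && i + 1 < total && buf[i + 1] == '\n') {
            put_len(buf + prefix, i - start);
            count++;
            prefix = i;  /* this CRLF slot becomes the next length prefix */
            start = i + 2;
            i += 2;
        } else {
            i++;
        }
    }
    /* Last line: everything after the final CRLF (possibly empty). */
    put_len(buf + prefix, total - start);
    count++;
    return count;
}
```

Because CRLF and the length WORD are both two bytes, nothing has to move: every string ends up as a length followed directly by its characters.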
I figured out that I could write a Strnchr procedure that would be faster than
before. Basically, I save the regs, use the string length to determine the end
of the string, capture the contents of the trailing DWORD, and save the match
character DWORD at the end of the string. The search then looks only for matches
with the match character. When a match is found, I check whether the match point
is the modified end DWORD; if so, I zero the return match point value. Finally I
restore the modified string, save the match point, restore the regs, and return.
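The idea in the paragraph above, reduced to a scalar C sketch. This is not the SSE code from the attachment: it plants a single sentinel byte rather than a DWORD, the name strnchr_sentinel is made up for the example, and it assumes the byte just past the end of the string (s[len]) is writable.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Search a known-length string for ch using a sentinel.
 * Assumption: s[len] is writable memory.
 * Returns a pointer to the first match, or NULL if ch is not in s[0..len-1].
 */
char *strnchr_sentinel(char *s, size_t len, char ch)
{
    assert(s != NULL);
    char saved = s[len];   /* capture what sits just past the end */
    s[len] = ch;           /* plant the match character as a sentinel */

    char *p = s;
    while (*p != ch)       /* one compare per byte; the sentinel stops the scan */
        p++;

    s[len] = saved;        /* restore the modified byte */
    return (p == s + len) ? NULL : p;   /* a hit on the sentinel means no match */
}
```

The inner loop needs no end-of-string test, which is where the saving over the two-compare version comes from. The price is that the string is briefly modified, so this does not suit read-only strings or strings shared between threads.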
The following is a brief list of the timing differences between the Lingo
version (KRBNew4) and my newest version called SSEStrnchr:
1931 cycles for KRBNew4, match long string
1293 cycles for KRBNew4, match null in long string
28 cycles for KRBNew4, match null in short string
1317 cycles for SSEStrnchr, match long string
1297 cycles for SSEStrnchr match null in long string
34 cycles for SSEStrnchr, match null in short string
The newest version saves about 32% of the time when searching for a non-NULL
character match (1317 vs. 1931 cycles). A NULL match takes approximately the
same time as the older version. Both versions should probably be kept, with the
older version used for NULL matches and in cases where the string length is not
known.
Note: For a defined string, the following will create a known length string
format:
WORD (LENGTHOF szString - 1)
szString BYTE "This is a string",0
A companion StrRnchr procedure was also created. It searches a string in
reverse, first replacing the DWORD in front of the head of the string with the
match character, then restoring it as it exits.
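The reverse search can be sketched the same way in scalar C. Again this is an illustration and not the attached code: it uses one sentinel byte instead of a DWORD, the name strrnchr_sentinel is made up, and it assumes the byte in front of the string (s[-1]) belongs to the same buffer and is writable, which holds for the length-prefixed format described earlier.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Search a known-length string for the LAST occurrence of ch.
 * Assumption: s[-1] is writable and part of the same buffer
 * (for example, the high byte of the length WORD).
 * Returns a pointer to the last match, or NULL if ch is not in s[0..len-1].
 */
char *strrnchr_sentinel(char *s, size_t len, char ch)
{
    assert(s != NULL);
    char saved = s[-1];    /* capture the byte in front of the head */
    s[-1] = ch;            /* plant the sentinel in front of the string */

    char *p = s + len;
    do {
        p--;               /* walk backwards from the last character */
    } while (*p != ch);

    s[-1] = saved;         /* restore the byte in front of the head */
    return (p == s - 1) ? NULL : p;   /* a hit on the sentinel means no match */
}
```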
I have attached a zip of the make file, the timing source, the exe, the object,
the timing results, and a copy of this description.
I can supply a copy of my tokenizing routine. It is SSE and fast!
Dave.
9506 cycles for crt_strchr, match long string
9481 cycles for crt_strchr, no match long string
22302 cycles for KRBOld, match long string
7874 cycles for KRBNew, match long string
7948 cycles for KRBNew2, match long string
4682 cycles for KRBNew3, match long string
2274 cycles for KRBNew3, match null in long string
2813 cycles for KRBNew4, match long string
2206 cycles for KRBNew4, match null in long string
21 cycles for KRBNew4, match null in short string
2292 cycles for SSEStrnchr, match long string
2297 cycles for SSEStrnchr match null in long string
40 cycles for SSEStrnchr, match null in short string
7862 cycles for KRBNew, no match long string
7911 cycles for KRBNew2, no match long string
4815 cycles for KRBNew3, no match long string
2815 cycles for KRBNew4, no match long string
2261 cycles for SSEStrnchr, no match long string
9422 cycles for crt_strchr, match long string
9458 cycles for crt_strchr, no match long string
23350 cycles for KRBOld, match long string
7852 cycles for KRBNew, match long string
7833 cycles for KRBNew2, match long string
4759 cycles for KRBNew3, match long string
2357 cycles for KRBNew3, match null in long string
2812 cycles for KRBNew4, match long string
2283 cycles for KRBNew4, match null in long string
22 cycles for KRBNew4, match null in short string
2287 cycles for SSEStrnchr, match long string
2283 cycles for SSEStrnchr match null in long string
39 cycles for SSEStrnchr, match null in short string
8830 cycles for KRBNew, no match long string
7916 cycles for KRBNew2, no match long string
4822 cycles for KRBNew3, no match long string
2823 cycles for KRBNew4, no match long string
2269 cycles for SSEStrnchr, no match long string
9474 cycles for crt_strchr, match long string
9392 cycles for crt_strchr, no match long string
22291 cycles for KRBOld, match long string
7832 cycles for KRBNew, match long string
7910 cycles for KRBNew2, match long string
4735 cycles for KRBNew3, match long string
2357 cycles for KRBNew3, match null in long string
2813 cycles for KRBNew4, match long string
2222 cycles for KRBNew4, match null in long string
22 cycles for KRBNew4, match null in short string
2334 cycles for SSEStrnchr, match long string
2214 cycles for SSEStrnchr match null in long string
31 cycles for SSEStrnchr, match null in short string
7862 cycles for KRBNew, no match long string
7979 cycles for KRBNew2, no match long string
4768 cycles for KRBNew3, no match long string
2846 cycles for KRBNew4, no match long string
2343 cycles for SSEStrnchr, no match long string
9453 cycles for crt_strchr, match long string
9475 cycles for crt_strchr, no match long string
22285 cycles for KRBOld, match long string
7777 cycles for KRBNew, match long string
7931 cycles for KRBNew2, match long string
4769 cycles for KRBNew3, match long string
2273 cycles for KRBNew3, match null in long string
2732 cycles for KRBNew4, match long string
2220 cycles for KRBNew4, match null in long string
28 cycles for KRBNew4, match null in short string
2268 cycles for SSEStrnchr, match long string
2222 cycles for SSEStrnchr match null in long string
31 cycles for SSEStrnchr, match null in short string
8428 cycles for KRBNew, no match long string
7836 cycles for KRBNew2, no match long string
4836 cycles for KRBNew3, no match long string
2725 cycles for KRBNew4, no match long string
2253 cycles for SSEStrnchr, no match long string
9474 cycles for crt_strchr, match long string
9489 cycles for crt_strchr, no match long string
22286 cycles for KRBOld, match long string
7789 cycles for KRBNew, match long string
7832 cycles for KRBNew2, match long string
4785 cycles for KRBNew3, match long string
2269 cycles for KRBNew3, match null in long string
2814 cycles for KRBNew4, match long string
2207 cycles for KRBNew4, match null in long string
22 cycles for KRBNew4, match null in short string
2262 cycles for SSEStrnchr, match long string
2303 cycles for SSEStrnchr match null in long string
31 cycles for SSEStrnchr, match null in short string
7764 cycles for KRBNew, no match long string
8201 cycles for KRBNew2, no match long string
4690 cycles for KRBNew3, no match long string
2802 cycles for KRBNew4, no match long string
2322 cycles for SSEStrnchr, no match long string
Codesizes:
dostrchr: 12
KRBOld: 32
KRBNew: 97
KRBNew2: 141
KRBNew3: 219
KRBNew4: 141
SSEStrnchr: 173
Hi Steve,
Here are my timings.
@Alex: Welcome back :U
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Find character in string: long string 5000 bytes.
9695 cycles for crt_strchr, match long string
9732 cycles for crt_strchr, no match long string
13526 cycles for KRBOld, match long string
3385 cycles for KRBNew, match long string
3739 cycles for KRBNew2, match long string
2495 cycles for KRBNew3, match long string
1130 cycles for KRBNew3, match null in long string
2136 cycles for KRBNew4, match long string
1009 cycles for KRBNew4, match null in long string
7 cycles for KRBNew4, match null in short string
1342 cycles for SSEStrnchr, match long string
1044 cycles for SSEStrnchr match null in long string
14 cycles for SSEStrnchr, match null in short string
3372 cycles for KRBNew, no match long string
3583 cycles for KRBNew2, no match long string
2518 cycles for KRBNew3, no match long string
2131 cycles for KRBNew4, no match long string
1344 cycles for SSEStrnchr, no match long string
Hi KeepingRealBusy,
Some minor problems :
Trying to assemble with ml.exe V6.14.8444 supplied with Masm32, I got a lot of syntax error messages :
strchar4a.asm(559) : error A2008: syntax error : xmm
Switching to ml.exe V6.15.8803 solved the problem.
Your original strchar4a.exe is reported as malware by some AV engines but they are false positives :
http://virusscan.jotti.org/tr/scanresult/e97eaab391ed597b4b9c046c76c257fb4b127a16
Here is my report :
Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE3)
Find character in string: long string 5000 bytes.
8896 cycles for crt_strchr, match long string
9047 cycles for crt_strchr, no match long string
20933 cycles for KRBOld, match long string
7371 cycles for KRBNew, match long string
7347 cycles for KRBNew2, match long string
4411 cycles for KRBNew3, match long string
2144 cycles for KRBNew3, match null in long string
2595 cycles for KRBNew4, match long string
2081 cycles for KRBNew4, match null in long string
20 cycles for KRBNew4, match null in short string
2121 cycles for SSEStrnchr, match long string
2089 cycles for SSEStrnchr match null in long string
31 cycles for SSEStrnchr, match null in short string
7366 cycles for KRBNew, no match long string
7375 cycles for KRBNew2, no match long string
4447 cycles for KRBNew3, no match long string
2595 cycles for KRBNew4, no match long string
2121 cycles for SSEStrnchr, no match long string
8871 cycles for crt_strchr, match long string
8868 cycles for crt_strchr, no match long string
20954 cycles for KRBOld, match long string
7318 cycles for KRBNew, match long string
7351 cycles for KRBNew2, match long string
4418 cycles for KRBNew3, match long string
2145 cycles for KRBNew3, match null in long string
2590 cycles for KRBNew4, match long string
2104 cycles for KRBNew4, match null in long string
22 cycles for KRBNew4, match null in short string
2122 cycles for SSEStrnchr, match long string
2098 cycles for SSEStrnchr match null in long string
32 cycles for SSEStrnchr, match null in short string
7314 cycles for KRBNew, no match long string
7354 cycles for KRBNew2, no match long string
4456 cycles for KRBNew3, no match long string
2597 cycles for KRBNew4, no match long string
2122 cycles for SSEStrnchr, no match long string
9196 cycles for crt_strchr, match long string
8974 cycles for crt_strchr, no match long string
20933 cycles for KRBOld, match long string
7317 cycles for KRBNew, match long string
7355 cycles for KRBNew2, match long string
4504 cycles for KRBNew3, match long string
2210 cycles for KRBNew3, match null in long string
2643 cycles for KRBNew4, match long string
2088 cycles for KRBNew4, match null in long string
22 cycles for KRBNew4, match null in short string
2128 cycles for SSEStrnchr, match long string
2100 cycles for SSEStrnchr match null in long string
33 cycles for SSEStrnchr, match null in short string
7327 cycles for KRBNew, no match long string
7403 cycles for KRBNew2, no match long string
4563 cycles for KRBNew3, no match long string
2623 cycles for KRBNew4, no match long string
2125 cycles for SSEStrnchr, no match long string
8845 cycles for crt_strchr, match long string
9214 cycles for crt_strchr, no match long string
20972 cycles for KRBOld, match long string
7320 cycles for KRBNew, match long string
7350 cycles for KRBNew2, match long string
4422 cycles for KRBNew3, match long string
2153 cycles for KRBNew3, match null in long string
2590 cycles for KRBNew4, match long string
2080 cycles for KRBNew4, match null in long string
34 cycles for KRBNew4, match null in short string
2122 cycles for SSEStrnchr, match long string
2083 cycles for SSEStrnchr match null in long string
34 cycles for SSEStrnchr, match null in short string
7329 cycles for KRBNew, no match long string
7434 cycles for KRBNew2, no match long string
4489 cycles for KRBNew3, no match long string
2622 cycles for KRBNew4, no match long string
2130 cycles for SSEStrnchr, no match long string
8873 cycles for crt_strchr, match long string
9074 cycles for crt_strchr, no match long string
21162 cycles for KRBOld, match long string
7427 cycles for KRBNew, match long string
7406 cycles for KRBNew2, match long string
4475 cycles for KRBNew3, match long string
2149 cycles for KRBNew3, match null in long string
2640 cycles for KRBNew4, match long string
2082 cycles for KRBNew4, match null in long string
22 cycles for KRBNew4, match null in short string
2154 cycles for SSEStrnchr, match long string
2145 cycles for SSEStrnchr match null in long string
34 cycles for SSEStrnchr, match null in short string
7337 cycles for KRBNew, no match long string
7409 cycles for KRBNew2, no match long string
4493 cycles for KRBNew3, no match long string
2624 cycles for KRBNew4, no match long string
2163 cycles for SSEStrnchr, no match long string
Codesizes:
dostrchr: 12
KRBOld: 32
KRBNew: 97
KRBNew2: 141
KRBNew3: 219
KRBNew4: 141
SSEStrnchr: 173
--- ok ---
(SSE1)
Find character in string: long string 5000 bytes.
9885 cycles for crt_strchr, match long string
9839 cycles for crt_strchr, no match long string
20186 cycles for KRBOld, match long string
1949 cycles for KRBNew, match long string
2065 cycles for KRBNew2, match long string
2073 cycles for KRBNew3, match long string
1295 cycles for KRBNew3, match null in long string
1606 cycles for KRBNew4, match long string
966 cycles for KRBNew4, match null in long string
12 cycles for KRBNew4, match null in short string
1301 cycles for SSEStrnchr, match long string
998 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string
1946 cycles for KRBNew, no match long string
2055 cycles for KRBNew2, no match long string
2072 cycles for KRBNew3, no match long string
1612 cycles for KRBNew4, no match long string
1310 cycles for SSEStrnchr, no match long string
9914 cycles for crt_strchr, match long string
9817 cycles for crt_strchr, no match long string
20204 cycles for KRBOld, match long string
1951 cycles for KRBNew, match long string
2063 cycles for KRBNew2, match long string
2083 cycles for KRBNew3, match long string
1293 cycles for KRBNew3, match null in long string
1607 cycles for KRBNew4, match long string
968 cycles for KRBNew4, match null in long string
13 cycles for KRBNew4, match null in short string
1300 cycles for SSEStrnchr, match long string
996 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string
1948 cycles for KRBNew, no match long string
2075 cycles for KRBNew2, no match long string
2074 cycles for KRBNew3, no match long string
1612 cycles for KRBNew4, no match long string
1313 cycles for SSEStrnchr, no match long string
9879 cycles for crt_strchr, match long string
9815 cycles for crt_strchr, no match long string
20188 cycles for KRBOld, match long string
1951 cycles for KRBNew, match long string
2175 cycles for KRBNew2, match long string
2074 cycles for KRBNew3, match long string
1292 cycles for KRBNew3, match null in long string
1607 cycles for KRBNew4, match long string
970 cycles for KRBNew4, match null in long string
13 cycles for KRBNew4, match null in short string
1297 cycles for SSEStrnchr, match long string
998 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string
1951 cycles for KRBNew, no match long string
2051 cycles for KRBNew2, no match long string
2072 cycles for KRBNew3, no match long string
1612 cycles for KRBNew4, no match long string
1309 cycles for SSEStrnchr, no match long string
9875 cycles for crt_strchr, match long string
9818 cycles for crt_strchr, no match long string
20187 cycles for KRBOld, match long string
1953 cycles for KRBNew, match long string
2075 cycles for KRBNew2, match long string
2074 cycles for KRBNew3, match long string
1292 cycles for KRBNew3, match null in long string
1612 cycles for KRBNew4, match long string
966 cycles for KRBNew4, match null in long string
14 cycles for KRBNew4, match null in short string
1297 cycles for SSEStrnchr, match long string
999 cycles for SSEStrnchr match null in long string
21 cycles for SSEStrnchr, match null in short string
1949 cycles for KRBNew, no match long string
2054 cycles for KRBNew2, no match long string
2072 cycles for KRBNew3, no match long string
1614 cycles for KRBNew4, no match long string
1311 cycles for SSEStrnchr, no match long string
9876 cycles for crt_strchr, match long string
9942 cycles for crt_strchr, no match long string
20234 cycles for KRBOld, match long string
1952 cycles for KRBNew, match long string
2062 cycles for KRBNew2, match long string
2077 cycles for KRBNew3, match long string
1296 cycles for KRBNew3, match null in long string
1608 cycles for KRBNew4, match long string
967 cycles for KRBNew4, match null in long string
13 cycles for KRBNew4, match null in short string
1299 cycles for SSEStrnchr, match long string
996 cycles for SSEStrnchr match null in long string
22 cycles for SSEStrnchr, match null in short string
1947 cycles for KRBNew, no match long string
2053 cycles for KRBNew2, no match long string
2074 cycles for KRBNew3, no match long string
1617 cycles for KRBNew4, no match long string
1310 cycles for SSEStrnchr, no match long string
Codesizes:
dostrchr: 12
KRBOld: 32
KRBNew: 97
KRBNew2: 141
KRBNew3: 219
KRBNew4: 141
SSEStrnchr: 173
--- ok ---
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Find character in string: long string 5000 bytes.
9068 cycles for crt_strchr, match long string
8910 cycles for crt_strchr, no match long string
21361 cycles for KRBOld, match long string
7354 cycles for KRBNew, match long string
7402 cycles for KRBNew2, match long string
4434 cycles for KRBNew3, match long string
2155 cycles for KRBNew3, match null in long string
2604 cycles for KRBNew4, match long string
2097 cycles for KRBNew4, match null in long string
20 cycles for KRBNew4, match null in short string
2133 cycles for SSEStrnchr, match long string
2094 cycles for SSEStrnchr match null in long string
34 cycles for SSEStrnchr, match null in short string
7349 cycles for KRBNew, no match long string
7395 cycles for KRBNew2, no match long string
4479 cycles for KRBNew3, no match long string
2613 cycles for KRBNew4, no match long string
2137 cycles for SSEStrnchr, no match long string
8880 cycles for crt_strchr, match long string
9442 cycles for crt_strchr, no match long string
22416 cycles for KRBOld, match long string
8575 cycles for KRBNew, match long string
8173 cycles for KRBNew2, match long string
5244 cycles for KRBNew3, match long string
2160 cycles for KRBNew3, match null in long string
2607 cycles for KRBNew4, match long string
2089 cycles for KRBNew4, match null in long string
20 cycles for KRBNew4, match null in short string
2136 cycles for SSEStrnchr, match long string
2100 cycles for SSEStrnchr match null in long string
31 cycles for SSEStrnchr, match null in short string
9165 cycles for KRBNew, no match long string
8036 cycles for KRBNew2, no match long string
4477 cycles for KRBNew3, no match long string
2604 cycles for KRBNew4, no match long string
2139 cycles for SSEStrnchr, no match long string
8901 cycles for crt_strchr, match long string
8886 cycles for crt_strchr, no match long string
21064 cycles for KRBOld, match long string
7375 cycles for KRBNew, match long string
7395 cycles for KRBNew2, match long string
4435 cycles for KRBNew3, match long string
2153 cycles for KRBNew3, match null in long string
2612 cycles for KRBNew4, match long string
2140 cycles for KRBNew4, match null in long string
23 cycles for KRBNew4, match null in short string
2205 cycles for SSEStrnchr, match long string
2301 cycles for SSEStrnchr match null in long string
36 cycles for SSEStrnchr, match null in short string
7352 cycles for KRBNew, no match long string
7369 cycles for KRBNew2, no match long string
4682 cycles for KRBNew3, no match long string
2623 cycles for KRBNew4, no match long string
2136 cycles for SSEStrnchr, no match long string
9043 cycles for crt_strchr, match long string
8897 cycles for crt_strchr, no match long string
21042 cycles for KRBOld, match long string
7395 cycles for KRBNew, match long string
7449 cycles for KRBNew2, match long string
4486 cycles for KRBNew3, match long string
2190 cycles for KRBNew3, match null in long string
2644 cycles for KRBNew4, match long string
2096 cycles for KRBNew4, match null in long string
27 cycles for KRBNew4, match null in short string
2163 cycles for SSEStrnchr, match long string
2133 cycles for SSEStrnchr match null in long string
37 cycles for SSEStrnchr, match null in short string
7687 cycles for KRBNew, no match long string
7463 cycles for KRBNew2, no match long string
4608 cycles for KRBNew3, no match long string
2650 cycles for KRBNew4, no match long string
2220 cycles for SSEStrnchr, no match long string
9459 cycles for crt_strchr, match long string
8909 cycles for crt_strchr, no match long string
21264 cycles for KRBOld, match long string
7402 cycles for KRBNew, match long string
7449 cycles for KRBNew2, match long string
4488 cycles for KRBNew3, match long string
2184 cycles for KRBNew3, match null in long string
2927 cycles for KRBNew4, match long string
2125 cycles for KRBNew4, match null in long string
25 cycles for KRBNew4, match null in short string
2186 cycles for SSEStrnchr, match long string
2111 cycles for SSEStrnchr match null in long string
36 cycles for SSEStrnchr, match null in short string
7348 cycles for KRBNew, no match long string
7539 cycles for KRBNew2, no match long string
4466 cycles for KRBNew3, no match long string
2823 cycles for KRBNew4, no match long string
2133 cycles for SSEStrnchr, no match long string
It is old stuff for archaic CPUs, but I'm wondering why it is so slow... :lol
One reason is:
The Italian disease of unnecessarily storing registers is very contagious... :lol
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (SSE4)
Find character in string: long string 5000 bytes.
9490 cycles for crt_strchr, match long string
6959 cycles for crt_strchr, no match long string
9671 cycles for KRBOld, match long string
1180 cycles for KRBNew, match long string
1377 cycles for KRBNew2, match long string
1349 cycles for KRBNew3, match long string
707 cycles for KRBNew3, match null in long string
828 cycles for KRBNew4, match long string
618 cycles for KRBNew4, match null in long string
3 cycles for KRBNew4, match null in short string
738 cycles for SSEStrnchr, match long string
570 cycles for SSEStrnchrLingo, match long string
579 cycles for SSEStrnchr match null in long string
1 cycles for SSEStrnchrLingo match null in long string
6 cycles for SSEStrnchr, match null in short string
1 cycles for SSEStrnchrLingo, match null in short string
1229 cycles for KRBNew, no match long string
1444 cycles for KRBNew2, no match long string
1335 cycles for KRBNew3, no match long string
816 cycles for KRBNew4, no match long string
730 cycles for SSEStrnchr, no match long string
572 cycles for SSEStrnchrLingo, no match long string
9840 cycles for crt_strchr, match long string
7071 cycles for crt_strchr, no match long string
9657 cycles for KRBOld, match long string
1222 cycles for KRBNew, match long string
1457 cycles for KRBNew2, match long string
1353 cycles for KRBNew3, match long string
716 cycles for KRBNew3, match null in long string
831 cycles for KRBNew4, match long string
615 cycles for KRBNew4, match null in long string
0 cycles for KRBNew4, match null in short string
746 cycles for SSEStrnchr, match long string
571 cycles for SSEStrnchrLingo, match long string
581 cycles for SSEStrnchr match null in long string
0 cycles for SSEStrnchrLingo match null in long string
5 cycles for SSEStrnchr, match null in short string
3 cycles for SSEStrnchrLingo, match null in short string
1202 cycles for KRBNew, no match long string
1407 cycles for KRBNew2, no match long string
1351 cycles for KRBNew3, no match long string
826 cycles for KRBNew4, no match long string
741 cycles for SSEStrnchr, no match long string
569 cycles for SSEStrnchrLingo, no match long string
10428 cycles for crt_strchr, match long string
5747 cycles for crt_strchr, no match long string
9639 cycles for KRBOld, match long string
1215 cycles for KRBNew, match long string
1404 cycles for KRBNew2, match long string
1369 cycles for KRBNew3, match long string
723 cycles for KRBNew3, match null in long string
824 cycles for KRBNew4, match long string
606 cycles for KRBNew4, match null in long string
1 cycles for KRBNew4, match null in short string
725 cycles for SSEStrnchr, match long string
564 cycles for SSEStrnchrLingo, match long string
579 cycles for SSEStrnchr match null in long string
3 cycles for SSEStrnchrLingo match null in long string
6 cycles for SSEStrnchr, match null in short string
0 cycles for SSEStrnchrLingo, match null in short string
1204 cycles for KRBNew, no match long string
1446 cycles for KRBNew2, no match long string
1366 cycles for KRBNew3, no match long string
847 cycles for KRBNew4, no match long string
731 cycles for SSEStrnchr, no match long string
573 cycles for SSEStrnchrLingo, no match long string
12068 cycles for crt_strchr, match long string
5762 cycles for crt_strchr, no match long string
9737 cycles for KRBOld, match long string
1227 cycles for KRBNew, match long string
1453 cycles for KRBNew2, match long string
1350 cycles for KRBNew3, match long string
705 cycles for KRBNew3, match null in long string
815 cycles for KRBNew4, match long string
610 cycles for KRBNew4, match null in long string
2 cycles for KRBNew4, match null in short string
738 cycles for SSEStrnchr, match long string
573 cycles for SSEStrnchrLingo, match long string
582 cycles for SSEStrnchr match null in long string
2 cycles for SSEStrnchrLingo match null in long string
4 cycles for SSEStrnchr, match null in short string
1 cycles for SSEStrnchrLingo, match null in short string
1215 cycles for KRBNew, no match long string
1409 cycles for KRBNew2, no match long string
1354 cycles for KRBNew3, no match long string
828 cycles for KRBNew4, no match long string
742 cycles for SSEStrnchr, no match long string
571 cycles for SSEStrnchrLingo, no match long string
8994 cycles for crt_strchr, match long string
7428 cycles for crt_strchr, no match long string
10900 cycles for KRBOld, match long string
1212 cycles for KRBNew, match long string
1389 cycles for KRBNew2, match long string
1363 cycles for KRBNew3, match long string
705 cycles for KRBNew3, match null in long string
825 cycles for KRBNew4, match long string
615 cycles for KRBNew4, match null in long string
3 cycles for KRBNew4, match null in short string
738 cycles for SSEStrnchr, match long string
570 cycles for SSEStrnchrLingo, match long string
571 cycles for SSEStrnchr match null in long string
0 cycles for SSEStrnchrLingo match null in long string
2 cycles for SSEStrnchr, match null in short string
-2 cycles for SSEStrnchrLingo, match null in short string
1219 cycles for KRBNew, no match long string
1449 cycles for KRBNew2, no match long string
1355 cycles for KRBNew3, no match long string
826 cycles for KRBNew4, no match long string
726 cycles for SSEStrnchr, no match long string
568 cycles for SSEStrnchrLingo, no match long string
Codesizes:
dostrchr: 12
KRBOld: 32
KRBNew: 97
KRBNew2: 141
KRBNew3: 219
KRBNew4: 141
SSEStrnchr: 173
SSEStrnchrLingo:123
--- ok ---
A bit slow, but at least no exceptions thrown :bg
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Find character in string: long string 5000 bytes.
1346 cycles for SSEStrnchr, no match long string
1463 cycles for SSEStrnchrLingo, no match long string
That looks a bit inefficient:
pcmpeqb xmm2, [eax+32] ; Compare 16 match characters to 16 BYTES of the source.
add eax, 32 ; Increment the source string pointer by 32
movdqa xmm1, xmm0
pmovmskb edx, xmm2 ; Return a 1 for each matched character to the low 16 BITS of edx.
One percent faster on my archaic CPU:
add eax, 32 ; Increment the source string pointer by 32
movdqa xmm1, xmm0
pcmpeqb xmm2, [eax] ; Compare 16 match characters to 16 BYTES of the source.
pmovmskb edx, xmm2 ; Return a 1 for each matched character to the low 16 BITS of edx.
What I like most with Lingo's code is that it can be so easily improved:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
1463 cycles for SSEStrnchrLingo, no match long string
1320 cycles for SSEStrnchrLingoJJ, no match long string
Greetings to Toronto :U
AMD Phenom(tm) II X6 1055T Processor (SSE3)
Find character in string: long string 5000 bytes.
7691 cycles for crt_strchr, match long string
7638 cycles for crt_strchr, no match long string
15201 cycles for KRBOld, match long string
1943 cycles for KRBNew, match long string
1881 cycles for KRBNew2, match long string
1743 cycles for KRBNew3, match long string
1460 cycles for KRBNew3, match null in long string
1426 cycles for KRBNew4, match long string
1137 cycles for KRBNew4, match null in long string
7 cycles for KRBNew4, match null in short string
1322 cycles for SSEStrnchr, match long string
855 cycles for SSEStrnchrLingo, match long string
1121 cycles for SSEStrnchr match null in long string
6 cycles for SSEStrnchrLingo match null in long string
9 cycles for SSEStrnchr, match null in short string
6 cycles for SSEStrnchrLingo, match null in short string
1913 cycles for KRBNew, no match long string
1932 cycles for KRBNew2, no match long string
1790 cycles for KRBNew3, no match long string
1476 cycles for KRBNew4, no match long string
1311 cycles for SSEStrnchr, no match long string
866 cycles for SSEStrnchrLingo, no match long string
7662 cycles for crt_strchr, match long string
7641 cycles for crt_strchr, no match long string
15238 cycles for KRBOld, match long string
1945 cycles for KRBNew, match long string
1941 cycles for KRBNew2, match long string
1748 cycles for KRBNew3, match long string
1467 cycles for KRBNew3, match null in long string
1475 cycles for KRBNew4, match long string
1128 cycles for KRBNew4, match null in long string
7 cycles for KRBNew4, match null in short string
1329 cycles for SSEStrnchr, match long string
857 cycles for SSEStrnchrLingo, match long string
1147 cycles for SSEStrnchr match null in long string
6 cycles for SSEStrnchrLingo match null in long string
9 cycles for SSEStrnchr, match null in short string
6 cycles for SSEStrnchrLingo, match null in short string
1937 cycles for KRBNew, no match long string
1939 cycles for KRBNew2, no match long string
1824 cycles for KRBNew3, no match long string
1479 cycles for KRBNew4, no match long string
1320 cycles for SSEStrnchr, no match long string
850 cycles for SSEStrnchrLingo, no match long string
7652 cycles for crt_strchr, match long string
7653 cycles for crt_strchr, no match long string
15247 cycles for KRBOld, match long string
1955 cycles for KRBNew, match long string
1947 cycles for KRBNew2, match long string
1801 cycles for KRBNew3, match long string
1457 cycles for KRBNew3, match null in long string
1468 cycles for KRBNew4, match long string
1116 cycles for KRBNew4, match null in long string
7 cycles for KRBNew4, match null in short string
1326 cycles for SSEStrnchr, match long string
851 cycles for SSEStrnchrLingo, match long string
1140 cycles for SSEStrnchr match null in long string
6 cycles for SSEStrnchrLingo match null in long string
9 cycles for SSEStrnchr, match null in short string
6 cycles for SSEStrnchrLingo, match null in short string
1958 cycles for KRBNew, no match long string
1947 cycles for KRBNew2, no match long string
1786 cycles for KRBNew3, no match long string
1472 cycles for KRBNew4, no match long string
1322 cycles for SSEStrnchr, no match long string
854 cycles for SSEStrnchrLingo, no match long string
7684 cycles for crt_strchr, match long string
7694 cycles for crt_strchr, no match long string
15041 cycles for KRBOld, match long string
1956 cycles for KRBNew, match long string
1956 cycles for KRBNew2, match long string
1794 cycles for KRBNew3, match long string
1463 cycles for KRBNew3, match null in long string
1475 cycles for KRBNew4, match long string
1136 cycles for KRBNew4, match null in long string
7 cycles for KRBNew4, match null in short string
1317 cycles for SSEStrnchr, match long string
904 cycles for SSEStrnchrLingo, match long string
1140 cycles for SSEStrnchr match null in long string
6 cycles for SSEStrnchrLingo match null in long string
9 cycles for SSEStrnchr, match null in short string
5 cycles for SSEStrnchrLingo, match null in short string
1966 cycles for KRBNew, no match long string
1931 cycles for KRBNew2, no match long string
1769 cycles for KRBNew3, no match long string
1442 cycles for KRBNew4, no match long string
1325 cycles for SSEStrnchr, no match long string
851 cycles for SSEStrnchrLingo, no match long string
7645 cycles for crt_strchr, match long string
7611 cycles for crt_strchr, no match long string
15324 cycles for KRBOld, match long string
1944 cycles for KRBNew, match long string
1935 cycles for KRBNew2, match long string
1801 cycles for KRBNew3, match long string
1445 cycles for KRBNew3, match null in long string
1469 cycles for KRBNew4, match long string
1146 cycles for KRBNew4, match null in long string
7 cycles for KRBNew4, match null in short string
1315 cycles for SSEStrnchr, match long string
869 cycles for SSEStrnchrLingo, match long string
1149 cycles for SSEStrnchr match null in long string
6 cycles for SSEStrnchrLingo match null in long string
9 cycles for SSEStrnchr, match null in short string
6 cycles for SSEStrnchrLingo, match null in short string
1958 cycles for KRBNew, no match long string
1937 cycles for KRBNew2, no match long string
1779 cycles for KRBNew3, no match long string
1486 cycles for KRBNew4, no match long string
1294 cycles for SSEStrnchr, no match long string
856 cycles for SSEStrnchrLingo, no match long string
Codesizes:
dostrchr: 12
KRBOld: 32
KRBNew: 97
KRBNew2: 141
KRBNew3: 219
KRBNew4: 141
SSEStrnchr: 173
SSEStrnchrLingo: 123
--- ok ---
Lingo (and jj since your modification has the same problems),
The following will not work. It loads the whole DWORD, so there is no guarantee
that dh is null:
xor edx, edx ;
add edx, [esp+3*4] ; Get the match character.
.
.
.
add dh, dl ;
je @Zero ;
The following will work:
xor edx, edx ;
add dl, [esp+3*4] ; Get the match character.
.
.
.
add dh, dl ;
je @Zero ;
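To make the dh problem concrete, here is a minimal C sketch of what the two
sequences compute. It is only an illustration: the function names are
hypothetical, and `arg` stands in for the 32-bit stack slot at [esp+3*4], whose
upper bytes are whatever the caller happened to push.

```c
#include <stdint.h>

/* Models: xor edx, edx / add edx, [esp+3*4] / add dh, dl / je @Zero.
   The DWORD load copies bits 8..15 of the argument into dh. */
static int is_null_char_unsafe(uint32_t arg)
{
    uint8_t dl = (uint8_t)arg;
    uint8_t dh = (uint8_t)(arg >> 8);   /* whatever the caller left there */
    return (uint8_t)(dh + dl) == 0;     /* zero flag after add dh, dl */
}

/* Models: xor edx, edx / add dl, [esp+3*4] / add dh, dl / je @Zero.
   The BYTE load leaves dh at the zero set by xor edx, edx. */
static int is_null_char_safe(uint32_t arg)
{
    uint8_t dl = (uint8_t)arg;
    uint8_t dh = 0;
    return (uint8_t)(dh + dl) == 0;     /* true only when dl is zero */
}
```

With an argument such as 0000FF00h the first version misses a null match
character, and with 0000FF01h it reports a null that is not there.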
The following will destroy the input string with no recovery possible:
mov [ecx], dl ;
The following re-write will work:
mov eax, [esp+1*4] ; Get the source string pointer.
lea ecx, [eax-1] ;
add ecx, [esp+2*4] ; Get the string size.
mov dl, [ecx] ; Get the original trailing character.
mov [esp+1*4], dl ; Save over the source pointer.
.
.
.
xor edx, edx ;
add dl, [esp+3*4] ; Get the match character.
.
.
.
cmp eax, ecx ; Is the match past the end of the string?
cmovae eax, edx ; If so, return a null response (Not found).
mov dl, [esp+1*4] ; Get the saved trailing character.
mov [ecx], dl ; Restore the string.
ret 4*3
.
.
.
@Zero:
mov eax, ecx ; Return in eax-> offset of zero (end of string)
mov dl, [esp+1*4] ; Get the saved trailing character.
mov [ecx], dl ; Restore the string.
ret 4*3
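The save, plant, search, restore sequence can be sketched in C as follows. This
is not the SSE code from the attachment, only a byte-at-a-time illustration of
the same idea; `sentinel_strnchr` is a hypothetical name, and treating a
genuine match in the final byte as found is my own choice here.

```c
#include <stddef.h>

/* Find ch in the first len bytes of s by temporarily planting ch in the
   last byte as a sentinel, so the scan needs only one compare per byte. */
static char *sentinel_strnchr(char *s, size_t len, char ch)
{
    char *last, *p, saved;

    if (len == 0)
        return NULL;

    last  = s + len - 1;    /* lea ecx, [eax-1] / add ecx, size */
    saved = *last;          /* keep the original trailing character */
    *last = ch;             /* plant the sentinel */

    p = s;
    while (*p != ch)        /* cannot run off the end: sentinel stops it */
        ++p;

    *last = saved;          /* restore the string before returning */

    if (p < last)
        return p;           /* real match before the final byte */
    return (saved == ch) ? last : NULL;   /* did the final byte match? */
}
```

The string is writable only for the duration of the call and is left exactly
as it was found, which is the point of the re-write above.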
The next point is one of concept. The string length does not need to mark an end
of string that is followed by a null; it is only the maximum number of characters
to search. Consider parsing an expression with
embedded parentheses. Search to the end of string for a "(". If found then
recursively search past the "(" to the end of the string for a "(", then search
past the first "(" to the end of the string for a ")", and accept the lowest
found character of the two as the found character. If that character is another
"(", then recursively search again... If the found character is a ")", then
you have an expression to parse that lies between the "(" and the ")". Search in
the expression for operators from the start of the expression for the length of
the expression, but note that there is no null at the end of the expression.
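A rough C sketch of that parenthesis search, written as a loop rather than a
recursion. The standard `memchr` stands in for a length-limited character
search such as SSEStrnchr, and `innermost_expr` is a hypothetical name. Note
that nothing here depends on a terminating null.

```c
#include <stddef.h>
#include <string.h>

/* Locate the innermost parenthesized expression in the len bytes at s.
   On success *start and *n describe the text between "(" and ")".
   Returns 1 if found, 0 if there is no "(" or no closing ")". */
static int innermost_expr(const char *s, size_t len,
                          const char **start, size_t *n)
{
    const char *end  = s + len;
    const char *open = memchr(s, '(', len);

    if (open == NULL)
        return 0;

    for (;;) {
        const char *from = open + 1;            /* search past the "(" */
        size_t rest = (size_t)(end - from);
        const char *next_open  = memchr(from, '(', rest);
        const char *next_close = memchr(from, ')', rest);

        if (next_close == NULL)
            return 0;                           /* unbalanced input */
        if (next_open != NULL && next_open < next_close) {
            open = next_open;                   /* another "(" first: descend */
            continue;
        }
        *start = from;                          /* expression to parse */
        *n = (size_t)(next_close - from);       /* known length, no null */
        return 1;
    }
}
```

For the input a*(b+(c-d))*e this yields the three characters c-d, which can
then be searched for operators using the start pointer and length alone.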
The same problem comes up when scanning MASM source files or .LST files and
encountering string data that can be delimited with either '"' characters or
"'", or both in the same string declaration:
szStr BYTE "This is John","'", "s book titled ", '"A tale".',0
which in plain text would read:
This is John's book titled "A tale".
If the data to be searched is a text file that has been read into a buffer (with
a CRLF preceding the actual data and a CRLF following the actual data in case
of a naked final string, then followed by a null), and it is desired to output
all lines that contain a match string, then search the buffer for all occurrences
of the first character of the match string. At each first-character match,
compare the match string against the characters in the buffer. If they do not
match, increment the buffer pointer by one and find the next first-character
match. If the match string matches,
then back scan from the match point for a 0Ah (the head of the buffer line -1),
then forward scan from the match point for a 0Dh (the end of the buffer line),
increment the head pointer by 1, calculate the length as end-head+2, output the
string and CRLF, then continue the scan for a first character match from one of
two points. If you want to output the line for each string that matches the
match string (maybe multiple matches on a line), then increment the match point
by 1 and search for the first character again, otherwise, restart the search for
the first character at the end of line pointer (or increment it by 2 and then
search). When you finally get a null response, then all matches have been found.
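The line-matching scan can be sketched in C along these lines. It is a sketch
under stated assumptions: the buffer layout is the CRLF, lines, CRLF form
described above, `len` does not count the final null, the match string holds at
least one character and no CR or LF, and `next_matching_line` is a hypothetical
name. It takes the "one output per line" branch by resuming at the end of line
pointer, and it reports the line length without the CRLF rather than
end-head+2.

```c
#include <stddef.h>
#include <string.h>

/* Starting at *pos, find the next line of buf that contains pat.
   Returns the head of the line and stores its length in *line_len, or
   returns NULL when no more matches exist. *pos is advanced to the CR
   that ends the reported line. */
static const char *next_matching_line(const char *buf, size_t len, size_t *pos,
                                      const char *pat, size_t plen,
                                      size_t *line_len)
{
    const char *end = buf + len;
    const char *p = buf + *pos;

    while (p < end && (size_t)(end - p) >= plen) {
        const char *head, *tail;

        /* Search only for the first character of the match string. */
        p = memchr(p, pat[0], (size_t)(end - p) - plen + 1);
        if (p == NULL)
            break;
        if (memcmp(p, pat, plen) != 0) {
            ++p;                        /* not a match: step by one */
            continue;
        }
        head = p;                       /* back scan for 0Ah */
        while (head > buf && head[-1] != '\n')
            --head;
        tail = memchr(p, '\r', (size_t)(end - p));   /* forward scan for 0Dh */
        if (tail == NULL)
            tail = end;
        *line_len = (size_t)(tail - head);
        *pos = (size_t)(tail - buf);    /* resume at the end of this line */
        return head;
    }
    *pos = len;
    return NULL;                        /* null response: all matches found */
}
```

Calling it in a loop until it returns NULL visits each matching line once. To
report every match on a line instead, the caller would resume at the match
point plus 1.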
Dave.
This timing run was done after the following changes had been made to
StrChar4b.asm (the Lingojj version in StrChar4b.zip):
Since Lingo and Lingojj were clobbering the null at the end of the strings
(would affect any routine following), I added code to all timing loops to
initialize the null at the end of the appropriate test string at every iteration
in the loop. This is the same instruction added to all loops to keep relative
timing the same. It may add 1 or 2 cycles to the reported time for the loop, but
the relative times should be comparable.
I renamed KRBNew4 back to OldLingo since that is what it really is - Lingo's
code from a year ago.
I made appropriate changes to SSEStrnchr based on my prior post, discussing the
proposed usages of the code for the project at hand.
The following timings had to be executed about 6 or 7 times to get consistency.
There were wildly different times between the two reported sequences, and
several negative times. The following times seem to track the overall average of
what was seen for the individual times (these times are the actual times for the
last pass - not some average taken over the 7 runs).
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
Find character in string: long string 5000 bytes.
7555 cycles for crt_strchr, match long string
7571 cycles for crt_strchr, no match long string
16538 cycles for KRBOld, match long string
927 cycles for KRBNew, match long string
3123 cycles for KRBNew2, match long string
4838 cycles for KRBNew3, match long string
1598 cycles for KRBNew3, match null in long string
1932 cycles for OldLingo, match long string
1287 cycles for OldLingo, match null in long string
27 cycles for OldLingo, match null in short string
1305 cycles for SSEStrnchr, match long string
1457 cycles for SSEStrnchrLingo, match long string
37 cycles for SSEStrnchr match null in long string
8 cycles for SSEStrnchrLingo match null in long string
35 cycles for SSEStrnchr, match null in short string
7 cycles for SSEStrnchrLingo, match null in short string
3182 cycles for KRBNew, no match long string
2900 cycles for KRBNew2, no match long string
2279 cycles for KRBNew3, no match long string
1939 cycles for OldLingo, no match long string
1308 cycles for SSEStrnchr, no match long string
1458 cycles for SSEStrnchrLingo, no match long string
1316 cycles for SSEStrnchrLingoJJ, no match long string
10613 cycles for crt_strchr, match long string
8064 cycles for crt_strchr, no match long string
15312 cycles for KRBOld, match long string
3168 cycles for KRBNew, match long string
2881 cycles for KRBNew2, match long string
2257 cycles for KRBNew3, match long string
1597 cycles for KRBNew3, match null in long string
1931 cycles for OldLingo, match long string
3761 cycles for OldLingo, match null in long string
27 cycles for OldLingo, match null in short string
1306 cycles for SSEStrnchr, match long string
1458 cycles for SSEStrnchrLingo, match long string
37 cycles for SSEStrnchr match null in long string
7 cycles for SSEStrnchrLingo match null in long string
35 cycles for SSEStrnchr, match null in short string
7 cycles for SSEStrnchrLingo, match null in short string
3168 cycles for KRBNew, no match long string
2884 cycles for KRBNew2, no match long string
2270 cycles for KRBNew3, no match long string
1932 cycles for OldLingo, no match long string
1320 cycles for SSEStrnchr, no match long string
1557 cycles for SSEStrnchrLingo, no match long string
1385 cycles for SSEStrnchrLingoJJ, no match long string
Codesizes:
dostrchr: 12
KRBOld: 32
KRBNew: 97
KRBNew2: 141
KRBNew3: 219
OldLingo: 141
SSEStrnchr: 204
SSEStrnchrLingo: 123
SSEStrnchrLingoJJ: 121
--- ok ---
Dave.