Can I have some timings on more recent CPUs, please? Thanks.
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
306 cycles for RevString
345 cycles for RevString2a
219 cycles for RevString2b
221 cycles for RevString3
220 cycles for RevStr
375 cycles for szRev1
175 cycles for RevLingo (needs aligned strings)
Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (SSE4)
107 cycles for RevString
103 cycles for RevString2a
103 cycles for RevString2b
107 cycles for RevString3
109 cycles for RevStr
200 cycles for szRev1
62 cycles for RevLingo (needs aligned strings)
A small variant: the executable is assembled with MasmBasic, i.e. useMB = 1 in line 4. If you want to assemble it yourself, either set the switch to 0 or use the library (http://www.masm32.com/board/index.php?topic=12460).
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
******** timings for unaligned strings:
259 cycles for RevString
261 cycles for RevString2a
164 cycles for RevString2b
165 cycles for RevString3
222 cycles for RevStr
400 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
228 cycles for RevString
264 cycles for RevString2a
164 cycles for RevString2b
163 cycles for RevString3
223 cycles for RevStr
389 cycles for Masm32 szRev
177 cycles for RevLingo (needs aligned strings)
Hi JJ,
Quote
Intel(R) Pentium(R) D CPU 2.80GHz (SSE3)
******** timings for unaligned strings, useMB=1
219 cycles for RevString
228 cycles for RevString2a
161 cycles for RevString2b
158 cycles for RevString3
221 cycles for RevStr
371 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
214 cycles for RevString
243 cycles for RevString2a
157 cycles for RevString2b
157 cycles for RevString3
214 cycles for RevStr
373 cycles for Masm32 szRev
180 cycles for RevLingo (needs aligned strings)
..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..
Sizes:
64 RevString
64 RevString2a
67 RevString2b
67 RevString3
64 RevStr
64 szRev
110 RevLingo
--- ok ---
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
******** timings for unaligned strings, useMB=1
120 cycles for RevString
82 cycles for RevString2a
80 cycles for RevString2b
80 cycles for RevString3
127 cycles for RevStr
216 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
120 cycles for RevString
83 cycles for RevString2a
80 cycles for RevString2b
80 cycles for RevString3
128 cycles for RevStr
210 cycles for Masm32 szRev
63 cycles for RevLingo (needs aligned strings)
..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..
Sizes:
64 RevString
64 RevString2a
67 RevString2b
67 RevString3
64 RevStr
64 szRev
110 RevLingo
--- ok ---
AMD Phenom(tm) II X6 1100T Processor (SSE3)
******** timings for unaligned strings, useMB=1
111 cycles for RevString
92 cycles for RevString2a
77 cycles for RevString2b
77 cycles for RevString3
108 cycles for RevStr
259 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
111 cycles for RevString
91 cycles for RevString2a
76 cycles for RevString2b
77 cycles for RevString3
108 cycles for RevStr
256 cycles for Masm32 szRev
67 cycles for RevLingo (needs aligned strings)
..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..
Sizes:
64 RevString
64 RevString2a
67 RevString2b
67 RevString3
64 RevStr
64 szRev
110 RevLingo
--- ok ---
Thanks a lot, folks :thumbu
I attach one more version: RevString2c is about 5% faster on a P4, but only for unaligned strings.
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
******** timings for unaligned strings, useMB=1
344 cycles for RevString
335 cycles for RevString2a
312 cycles for RevString2b
310 cycles for RevString2c
313 cycles for RevString3
345 cycles for RevStr
401 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
217 cycles for RevString
257 cycles for RevString2a
158 cycles for RevString2b
158 cycles for RevString2c
160 cycles for RevString3
216 cycles for RevStr
375 cycles for Masm32 szRev
172 cycles for RevLingo (needs aligned strings)
AMD Phenom(tm) II X6 1100T Processor (SSE3)
******** timings for unaligned strings, useMB=1
238 cycles for RevString
209 cycles for RevString2a
192 cycles for RevString2b
204 cycles for RevString2c
193 cycles for RevString3
246 cycles for RevStr
283 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
116 cycles for RevString
95 cycles for RevString2a
81 cycles for RevString2b
81 cycles for RevString2c
81 cycles for RevString3
114 cycles for RevStr
268 cycles for Masm32 szRev
70 cycles for RevLingo (needs aligned strings)
Wow, what changed?
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
******** timings for unaligned strings, useMB=1
345 cycles for RevString
341 cycles for RevString2a
333 cycles for RevString2b
297 cycles for RevString2c
315 cycles for RevString3
333 cycles for RevStr
402 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
216 cycles for RevString
246 cycles for RevString2a
160 cycles for RevString2b
160 cycles for RevString2c
161 cycles for RevString3
218 cycles for RevStr
375 cycles for Masm32 szRev
177 cycles for RevLingo (needs aligned strings)
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..
..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
Sizes:
64 RevString
64 RevString2a
67 RevString2b
67 RevString2c
67 RevString3
64 RevStr
64 szRev
110 RevLingo
--- ok ---
Quote from: sinsi on October 14, 2011, 12:40:18 PM
Wow, what changed?
Surprisingly, nothing for Dave's P4, but see ToutEnMasm's results for 2b/2c...
Your results are a lot slower because I changed the unaligned string from 16+4 to 16+3. It seems the AMD is very sensitive to alignment...
The Celeron gets slow for 16+1, 16+2, 16+3, but 16+4 is exactly as fast as 16+0.
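For anyone who wants to reproduce the 16+3 versus 16+4 effect outside the testbed, here is a rough C sketch of how a string can be placed at a chosen offset past a 16-byte boundary. The names `place_at_offset` and `make_test_string` are made up for this illustration and are not taken from the attached source; the actual testbed may set up its buffers differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns a pointer inside `pool` that sits exactly
   `offset` bytes (0..15) past a 16-byte boundary. */
static char *place_at_offset(char *pool, size_t offset)
{
    assert(offset < 16);
    /* round the pool address up to the next multiple of 16 */
    uintptr_t base = ((uintptr_t)pool + 15u) & ~(uintptr_t)15u;
    return (char *)(base + offset);
}

/* Hypothetical helper: copies `src` (including its terminator) to a spot
   with the requested misalignment and returns that spot, or NULL if the
   pool cannot hold the string plus the worst-case shift of 30 bytes. */
static char *make_test_string(char *pool, size_t pool_size,
                              const char *src, size_t offset)
{
    size_t len = strlen(src);
    if (pool_size < len + 1 + 32)
        return NULL;
    char *dst = place_at_offset(pool, offset);
    memcpy(dst, src, len + 1);
    return dst;
}
```

Calling `make_test_string(pool, sizeof pool, text, 3)` gives the "16+3" case and an offset of 4 gives "16+4", so the same routine can be timed against both placements.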
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
******** timings for unaligned strings (16+3), useMB=1
188 cycles for RevString2a
201 cycles for RevString2b
197 cycles for RevString2c
199 cycles for RevString2d
202 cycles for RevString3
268 cycles for RevStr
267 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
104 cycles for RevString2a
105 cycles for RevString2b
111 cycles for RevString2c
104 cycles for RevString2d
105 cycles for RevString3
180 cycles for RevStr
260 cycles for Masm32 szRev
88 cycles for RevLingo (needs aligned strings)
Just for fun, here (http://www.sqlservercentral.com/Forums/Topic697178-23-1.aspx) is a thread showing how it can be done in SQL. Hilarious... :green2
For what it's worth, I did a quick test of the CRT strrev function against the MASM32 szRev procedure, and on my P3 I got these results for aligned 100-byte and 500-byte strings:
560 cycles, szRev
714 cycles, crt__strrev
2672 cycles, szRev
3311 cycles, crt__strrev
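As a point of reference for what a plain, non-SSE reversal does, here is a minimal C sketch that swaps bytes from both ends toward the middle. It is only a baseline written for this illustration: `reverse_in_place` is a made-up name, and the code is not the implementation of the CRT routine or of szRev.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverses a NUL-terminated string in place and returns it.
   One byte is swapped per step, so the cost grows linearly with length. */
static char *reverse_in_place(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len < 2)
        return s;                 /* nothing to swap */
    size_t i = 0, j = len - 1;
    while (i < j) {
        char t = s[i];
        s[i] = s[j];
        s[j] = t;
        i++;
        j--;
    }
    return s;
}
```

A loop like this touches single bytes only, which is one reason such routines tend not to care where the string starts in memory.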
Thanks, Michael. crt__strrev is slow but remarkably immune to misalignment:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
******** timings for unaligned strings, useMB=1
687 cycles for crt__strrev
218 cycles for RevString2b
207 cycles for RevString2c
209 cycles for RevString2d
217 cycles for RevString3
278 cycles for RevStr
289 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
687 cycles for crt__strrev
110 cycles for RevString2b
116 cycles for RevString2c
112 cycles for RevString2d
110 cycles for RevString3
189 cycles for RevStr
273 cycles for Masm32 szRev
92 cycles for RevLingo (GPF for non-aligned strings)
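Since an aligned-only routine faults on misaligned input (aligned SSE moves such as movdqa raise a GPF when the address is not a multiple of 16, which is presumably what happens in RevLingo), a caller may want to test the pointers first and fall back to a slower path. The C sketch below shows one way to do that; `is_aligned16`, `rev_bytewise`, `reverse_checked` and the `rev_fn` signature are assumptions made for this example, not the interface of any routine in the testbed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed signature for a reversal routine that writes len bytes plus a
   terminator to dst. */
typedef void (*rev_fn)(char *dst, const char *src, size_t len);

/* Returns 1 if p is a multiple of 16, else 0. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Alignment-agnostic fallback: copies src to dst back to front. */
static void rev_bytewise(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[len - 1 - i];
    dst[len] = '\0';
}

/* Calls the aligned-only routine when both buffers are 16-byte aligned,
   otherwise the fallback. Returns 1 if the fast path was taken, else 0. */
static int reverse_checked(char *dst, const char *src, size_t len,
                           rev_fn aligned_only)
{
    assert(dst != NULL && src != NULL && aligned_only != NULL);
    if (is_aligned16(src) && is_aligned16(dst)) {
        aligned_only(dst, src, len);
        return 1;
    }
    rev_bytewise(dst, src, len);
    return 0;
}
```

The check costs a couple of instructions, so it should not eat much of the gain on aligned strings.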
By the way: I revived this issue following requests by two (http://www.masm32.com/board/index.php?topic=17557.0) fellows (http://www.masm32.com/board/index.php?topic=17556.0) yesterday night :bg
Quote
By the way: I revived this issue following requests by two fellows yesterday night
they are probably oblivious to this thread even being related :bg
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
******** timings for unaligned strings, useMB=1
960 cycles for crt__strrev
315 cycles for RevString2b
286 cycles for RevString2c
304 cycles for RevString2d
319 cycles for RevString3
345 cycles for RevStr
402 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
946 cycles for crt__strrev
159 cycles for RevString2b
186 cycles for RevString2c
157 cycles for RevString2d
159 cycles for RevString3
216 cycles for RevStr
375 cycles for Masm32 szRev
174 cycles for RevLingo (GPF for non-aligned strings)
AMD 3.2 OC'd to 3.6
AMD Phenom(tm) II X4 955 Processor (SSE3)
******** timings for unaligned strings, useMB=1
251 cycles for RevString
222 cycles for RevString2a
205 cycles for RevString2b
219 cycles for RevString2c
205 cycles for RevString3
261 cycles for RevStr
299 cycles for Masm32 szRev
******** timings for 16-byte aligned strings:
122 cycles for RevString
103 cycles for RevString2a
83 cycles for RevString2b
86 cycles for RevString2c
83 cycles for RevString3
121 cycles for RevStr
285 cycles for Masm32 szRev
75 cycles for RevLingo (needs aligned strings)
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..
..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
Sizes:
64 RevString
64 RevString2a
67 RevString2b
67 RevString2c
67 RevString3
64 RevStr
64 szRev
110 RevLingo
--- ok ---
OK, thanks everybody :U
Now that we have a new super-fast algo: Any proposals where it could be applied? I don't have the faintest idea... ::)
i was poking through some SQL files the other day
they had stored all the URL's in reverse :P
Quote from: dedndave on October 15, 2011, 12:25:45 AM
i was poking through some SQL files the other day
they had stored all the URL's in reverse :P
You should PEEK, not POKE, Dave :naughty:
So ok, we'll sell them the algo for a modest figure, say: 100,000 bucks. Do you know the CEO?
they were firefox SQL files :P
not much profit, there
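One plausible reason for storing names in reverse (an assumption, the thread does not say why those files do it): a "ends with this domain" test on the original string becomes a "starts with" test on the reversed one, and prefix tests are what sorted indexes handle well. A small C sketch of the idea, with made-up helper names `reverse_copy` and `rev_has_prefix`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes the reverse of src into dst; dst needs strlen(src) + 1 bytes
   and must not overlap src. */
static void reverse_copy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    for (size_t i = 0; i < len; i++)
        dst[i] = src[len - 1 - i];
    dst[len] = '\0';
}

/* Returns 1 if the reversed key starts with the reversed suffix, i.e. if
   the original key ended with the original suffix. */
static int rev_has_prefix(const char *rev_key, const char *rev_suffix)
{
    return strncmp(rev_key, rev_suffix, strlen(rev_suffix)) == 0;
}
```

With `www.masm32.com` stored as `moc.23msam.www`, a search for everything under `.masm32.com` only has to compare the first eleven characters of each stored key.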