Reverse string timings

Started by jj2007, October 14, 2011, 07:19:33 AM

jj2007

Can I have some timings on more recent CPUs, please? Thanks.

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
306     cycles for RevString
345     cycles for RevString2a
219     cycles for RevString2b
221     cycles for RevString3
220     cycles for RevStr
375     cycles for szRev1
175     cycles for RevLingo (needs aligned strings)

Ficko

Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (SSE4)
107   cycles for RevString
103   cycles for RevString2a
103   cycles for RevString2b
107   cycles for RevString3
109   cycles for RevStr
200   cycles for szRev1
62   cycles for RevLingo (needs aligned strings)

jj2007

A small variant: the executable is assembled with MasmBasic, i.e. useMB = 1 in line 4. If you want to assemble it yourself, either set the switch to 0 or use the MasmBasic library.

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
******** timings for unaligned strings:
259     cycles for RevString
261     cycles for RevString2a
164     cycles for RevString2b
165     cycles for RevString3
222     cycles for RevStr
400     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
228     cycles for RevString
264     cycles for RevString2a
164     cycles for RevString2b
163     cycles for RevString3
223     cycles for RevStr
389     cycles for Masm32 szRev
177     cycles for RevLingo (needs aligned strings)

Bill Cravener

Hi JJ,

Quote
Intel(R) Pentium(R) D CPU 2.80GHz (SSE3)
******** timings for unaligned strings, useMB=1
219     cycles for RevString
228     cycles for RevString2a
161     cycles for RevString2b
158     cycles for RevString3
221     cycles for RevStr
371     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
214     cycles for RevString
243     cycles for RevString2a
157     cycles for RevString2b
157     cycles for RevString3
214     cycles for RevStr
373     cycles for Masm32 szRev
180     cycles for RevLingo (needs aligned strings)

..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..


Sizes:
64      RevString
64      RevString2a
67      RevString2b
67      RevString3
64      RevStr
64      szRev
110     RevLingo

--- ok ---

My MASM32 Examples.

"Prejudice does not arise from low intelligence it arises from conservative ideals to which people of low intelligence are drawn." ~ Isaidthat

hutch--


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
******** timings for unaligned strings, useMB=1
120     cycles for RevString
82      cycles for RevString2a
80      cycles for RevString2b
80      cycles for RevString3
127     cycles for RevStr
216     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
120     cycles for RevString
83      cycles for RevString2a
80      cycles for RevString2b
80      cycles for RevString3
128     cycles for RevStr
210     cycles for Masm32 szRev
63      cycles for RevLingo (needs aligned strings)

..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..


Sizes:
64      RevString
64      RevString2a
67      RevString2b
67      RevString3
64      RevStr
64      szRev
110     RevLingo

--- ok ---
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

sinsi


AMD Phenom(tm) II X6 1100T Processor (SSE3)
******** timings for unaligned strings, useMB=1
111     cycles for RevString
92      cycles for RevString2a
77      cycles for RevString2b
77      cycles for RevString3
108     cycles for RevStr
259     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
111     cycles for RevString
91      cycles for RevString2a
76      cycles for RevString2b
77      cycles for RevString3
108     cycles for RevStr
256     cycles for Masm32 szRev
67      cycles for RevLingo (needs aligned strings)

..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT
This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..


Sizes:
64      RevString
64      RevString2a
67      RevString2b
67      RevString3
64      RevStr
64      szRev
110     RevLingo

--- ok ---
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Thanks a lot, folks :thumbu
I attach one more version, with RevString2c 5% faster on a P4, but for unaligned strings only.

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
******** timings for unaligned strings, useMB=1
344     cycles for RevString
335     cycles for RevString2a
312     cycles for RevString2b
310     cycles for RevString2c
313     cycles for RevString3
345     cycles for RevStr
401     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
217     cycles for RevString
257     cycles for RevString2a
158     cycles for RevString2b
158     cycles for RevString2c
160     cycles for RevString3
216     cycles for RevStr
375     cycles for Masm32 szRev
172     cycles for RevLingo (needs aligned strings)

sinsi


AMD Phenom(tm) II X6 1100T Processor (SSE3)
******** timings for unaligned strings, useMB=1
238     cycles for RevString
209     cycles for RevString2a
192     cycles for RevString2b
204     cycles for RevString2c
193     cycles for RevString3
246     cycles for RevStr
283     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
116     cycles for RevString
95      cycles for RevString2a
81      cycles for RevString2b
81      cycles for RevString2c
81      cycles for RevString3
114     cycles for RevStr
268     cycles for Masm32 szRev
70      cycles for RevLingo (needs aligned strings)

Wow, what changed?

ToutEnMasm

Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
******** timings for unaligned strings, useMB=1
345     cycles for RevString
341     cycles for RevString2a
333     cycles for RevString2b
297     cycles for RevString2c
315     cycles for RevString3
333     cycles for RevStr
402     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
216     cycles for RevString
246     cycles for RevString2a
160     cycles for RevString2b
160     cycles for RevString2c
161     cycles for RevString3
218     cycles for RevStr
375     cycles for Masm32 szRev
177     cycles for RevLingo (needs aligned strings)

This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..
..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT


Sizes:
64      RevString
64      RevString2a
67      RevString2b
67      RevString2c
67      RevString3
64      RevStr
64      szRev
110     RevLingo

--- ok ---


jj2007

Quote from: sinsi on October 14, 2011, 12:40:18 PM
Wow, what changed?

Surprisingly, nothing for Dave's P4, but see ToutEnMasm's results for 2b/2c...

Your results are a lot slower because I changed the unaligned string from 16+4 to 16+3. It seems the AMD is very sensitive to alignment...

The Celeron gets slow for 16+1, 16+2, 16+3, but 16+4 is exactly as fast as 16+0.
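
For anyone who wants to reproduce the 16+3 versus 16+4 cases outside the testbed, here is a rough C sketch of how a string can be placed at a chosen offset past a 16-byte boundary. The helper names (`align16`, `place_at_offset`, `misalignment16`) are made up for illustration and are not taken from the attached source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round p up to the next 16-byte boundary. */
static char *align16(char *p)
{
    return (char *)(((uintptr_t)p + 15u) & ~(uintptr_t)15u);
}

/* Copy src into raw so that the copy starts `offset` bytes past a
   16-byte boundary (offset 3 gives "16+3", offset 4 gives "16+4").
   raw must have room for up to 15 bytes of padding, the offset,
   and the string including its terminating zero. */
static char *place_at_offset(char *raw, size_t raw_size,
                             const char *src, size_t offset)
{
    char *dst = align16(raw) + offset;
    assert((size_t)(dst - raw) + strlen(src) + 1 <= raw_size);
    strcpy(dst, src);
    return dst;
}

/* Address modulo 16; 0 means the string starts on a 16-byte boundary. */
static unsigned misalignment16(const char *p)
{
    return (unsigned)((uintptr_t)p & 15u);
}
```

Note that a string at offset 4 still starts on a dword boundary, while one at offset 3 does not. Whether that is what the Celeron reacts to is only a guess.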

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
******** timings for unaligned strings (16+3), useMB=1
188     cycles for RevString2a
201     cycles for RevString2b
197     cycles for RevString2c
199     cycles for RevString2d
202     cycles for RevString3
268     cycles for RevStr
267     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
104     cycles for RevString2a
105     cycles for RevString2b
111     cycles for RevString2c
104     cycles for RevString2d
105     cycles for RevString3
180     cycles for RevStr
260     cycles for Masm32 szRev
88      cycles for RevLingo (needs aligned strings)

Just for fun, here is a thread showing how it can be done in SQL. Hilarious... :green2

MichaelW

For what it's worth, I did a quick test of the CRT strrev function against the MASM32 szRev procedure. On my P3 I got these results for aligned 100-byte and 500-byte strings:

560 cycles, szRev
714 cycles, crt__strrev

2672 cycles, szRev
3311 cycles, crt__strrev
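
As a reference point for what the simplest possible algo looks like, here is a plain byte-by-byte in-place reversal in C. This is only a sketch: `rev_bytes` is a made-up name, and the code is neither the CRT's implementation nor the MASM32 szRev source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverse a zero-terminated string in place by swapping bytes from
   both ends towards the middle. Returns its argument. */
static char *rev_bytes(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len < 2)
        return s;               /* nothing to swap */
    char *lo = s;
    char *hi = s + len - 1;
    while (lo < hi) {
        char t = *lo;
        *lo++ = *hi;
        *hi-- = t;
    }
    return s;
}
```

It does one byte load and one byte store per character, so it does not care about alignment, but it also cannot benefit from it.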

eschew obfuscation

jj2007

Thanks, Michael. It is slow but remarkably immune to misalignment:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
******** timings for unaligned strings, useMB=1
687     cycles for crt__strrev
218     cycles for RevString2b
207     cycles for RevString2c
209     cycles for RevString2d
217     cycles for RevString3
278     cycles for RevStr
289     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
687     cycles for crt__strrev
110     cycles for RevString2b
116     cycles for RevString2c
112     cycles for RevString2d
110     cycles for RevString3
189     cycles for RevStr
273     cycles for Masm32 szRev
92      cycles for RevLingo (GPF for non-aligned strings)


By the way: I revived this issue following requests by two fellows yesterday night :bg

dedndave

Quote
By the way: I revived this issue following requests by two fellows yesterday night
they are probably oblivious to this thread even being related   :bg

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
******** timings for unaligned strings, useMB=1
960     cycles for crt__strrev
315     cycles for RevString2b
286     cycles for RevString2c
304     cycles for RevString2d
319     cycles for RevString3
345     cycles for RevStr
402     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
946     cycles for crt__strrev
159     cycles for RevString2b
186     cycles for RevString2c
157     cycles for RevString2d
159     cycles for RevString3
216     cycles for RevStr
375     cycles for Masm32 szRev
174     cycles for RevLingo (GPF for non-aligned strings)

Gunner

AMD 3.2 OC'd to 3.6

AMD Phenom(tm) II X4 955 Processor (SSE3)
******** timings for unaligned strings, useMB=1
251     cycles for RevString
222     cycles for RevString2a
205     cycles for RevString2b
219     cycles for RevString2c
205     cycles for RevString3
261     cycles for RevStr
299     cycles for Masm32 szRev

******** timings for 16-byte aligned strings:
122     cycles for RevString
103     cycles for RevString2a
83      cycles for RevString2b
86      cycles for RevString2c
83      cycles for RevString3
121     cycles for RevStr
285     cycles for Masm32 szRev
75      cycles for RevLingo (needs aligned strings)

This is a string, 100 characters long, that serves a variety of purposes, such as testing my algos..
..sogla ym gnitset sa hcus ,sesoprup fo yteirav a sevres taht ,gnol sretcarahc 001 ,gnirts a si sihT


Sizes:
64      RevString
64      RevString2a
67      RevString2b
67      RevString2c
67      RevString3
64      RevStr
64      szRev
110     RevLingo

--- ok ---
~Rob (Gunner)
- IE Zone Editor
- Gunners File Type Editor
http://www.gunnerinc.com