I am cooking some algos and would love to see timings for other processors, especially AMDs, Core Duo, P4.
Any volunteers? Just double-click the exe and post the output. PM me if you want the rather confused source :red
Thanxalot, jj
[attachment deleted by admin]
How about a P3:
(SSE1)
Source len=4096
5194 clocks, mode s1, DestA
5200 clocks, mode s1, DestB
5193 clocks, mode d1, DestA
5200 clocks, mode d1, DestB
6198 clocks, mode c1, DestA
6205 clocks, mode c1, DestB
12366 clocks, mode m1, DestA
12368 clocks, mode m1, DestB
Source len=128
213 clocks, mode s1, DestA
211 clocks, mode s1, DestB
213 clocks, mode d1, DestA
213 clocks, mode d1, DestB
222 clocks, mode c1, DestA
222 clocks, mode c1, DestB
407 clocks, mode m1, DestA
407 clocks, mode m1, DestB
Source len=16
71 clocks, mode s1, DestA
71 clocks, mode s1, DestB
74 clocks, mode d1, DestA
74 clocks, mode d1, DestB
38 clocks, mode c1, DestA
38 clocks, mode c1, DestB
--- OK ---
My results (returned to stock CPU/memory speeds for this test)
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
Source len=4096
4622 clocks, mode s1, DestA
4443 clocks, mode s1, DestB
4957 clocks, mode d1, DestA
4872 clocks, mode d1, DestB
8688 clocks, mode c1, DestA
8728 clocks, mode c1, DestB
20870 clocks, mode m1, DestA
27694 clocks, mode m1, DestB
Source len=128
107 clocks, mode s1, DestA
100 clocks, mode s1, DestB
103 clocks, mode d1, DestA
94 clocks, mode d1, DestB
323 clocks, mode c1, DestA
323 clocks, mode c1, DestB
926 clocks, mode m1, DestA
933 clocks, mode m1, DestB
Source len=16
104 clocks, mode s1, DestA
103 clocks, mode s1, DestB
105 clocks, mode d1, DestA
106 clocks, mode d1, DestB
53 clocks, mode c1, DestA
53 clocks, mode c1, DestB
--- OK ---
EDIT: What kind of computations does this do/test?
And are lower numbers better or worse?
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz (SSE4)
Source len=4096
3062 clocks, mode s1, DestA
3571 clocks, mode s1, DestB
2752 clocks, mode d1, DestA
2627 clocks, mode d1, DestB
4130 clocks, mode c1, DestA
4130 clocks, mode c1, DestB
8223 clocks, mode m1, DestA
8236 clocks, mode m1, DestB
Source len=128
123 clocks, mode s1, DestA
133 clocks, mode s1, DestB
132 clocks, mode d1, DestA
129 clocks, mode d1, DestB
134 clocks, mode c1, DestA
134 clocks, mode c1, DestB
278 clocks, mode m1, DestA
281 clocks, mode m1, DestB
Source len=16
55 clocks, mode s1, DestA
55 clocks, mode s1, DestB
60 clocks, mode d1, DestA
77 clocks, mode d1, DestB
22 clocks, mode c1, DestA
22 clocks, mode c1, DestB
--- OK ---
Quote from: BlackVortex on February 15, 2009, 10:35:46 AM
EDIT: What kind of computations does this do/test?
And are lower numbers better or worse?
It's an lstrcpy-type algo, and low numbers are better. The purpose is to see the difference between source/destination data alignment.
Michael: Thanks for the P3 test. It's good to know that the branch to the non-SSE2 algo worked ;-)
Neil: Thanks :thumbu
Here are my own results.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Source len=4096
2754 clocks, mode s1, DestA
3095 clocks, mode s1, DestB
2788 clocks, mode d1, DestA
2543 clocks, mode d1, DestB
5153 clocks, mode c1, DestA
5152 clocks, mode c1, DestB
8235 clocks, mode m1, DestA
8236 clocks, mode m1, DestB
Source len=128
126 clocks, mode s1, DestA
133 clocks, mode s1, DestB
139 clocks, mode d1, DestA
133 clocks, mode d1, DestB
167 clocks, mode c1, DestA
167 clocks, mode c1, DestB
284 clocks, mode m1, DestA
287 clocks, mode m1, DestB
Source len=16
57 clocks, mode s1, DestA
57 clocks, mode s1, DestB
68 clocks, mode d1, DestA
76 clocks, mode d1, DestB
27 clocks, mode c1, DestA
27 clocks, mode c1, DestB
jj, here are my results:
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9569 clocks, mode s1, DestA
9123 clocks, mode s1, DestB
11921 clocks, mode d1, DestA
8383 clocks, mode d1, DestB
4562 clocks, mode c1, DestA
4387 clocks, mode c1, DestB
8497 clocks, mode m1, DestA
8616 clocks, mode m1, DestB
Source len=128
409 clocks, mode s1, DestA
373 clocks, mode s1, DestB
482 clocks, mode d1, DestA
371 clocks, mode d1, DestB
164 clocks, mode c1, DestA
162 clocks, mode c1, DestB
288 clocks, mode m1, DestA
285 clocks, mode m1, DestB
Source len=16
142 clocks, mode s1, DestA
142 clocks, mode s1, DestB
177 clocks, mode d1, DestA
187 clocks, mode d1, DestB
29 clocks, mode c1, DestA
25 clocks, mode c1, DestB
--- OK ---
Quote from: rags on February 15, 2009, 11:39:29 AM
jj, here are my results:
Thanks, I am very disappointed. :dazzled:
Could you please give it another try? I want to see if at least the 16-byte aligned version brings some improvement on the P4...
[attachment deleted by admin]
Ok JJ, I ran each one twice for you.
testbed1, run 1:
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9551 clocks, mode s1, DestA
9101 clocks, mode s1, DestB
11894 clocks, mode d1, DestA
8358 clocks, mode d1, DestB
4440 clocks, mode c1, DestA
4316 clocks, mode c1, DestB
8688 clocks, mode m1, DestA
8579 clocks, mode m1, DestB
Source len=128
409 clocks, mode s1, DestA
397 clocks, mode s1, DestB
484 clocks, mode d1, DestA
380 clocks, mode d1, DestB
164 clocks, mode c1, DestA
161 clocks, mode c1, DestB
286 clocks, mode m1, DestA
298 clocks, mode m1, DestB
Source len=16
142 clocks, mode s1, DestA
116 clocks, mode s1, DestB
171 clocks, mode d1, DestA
179 clocks, mode d1, DestB
29 clocks, mode c1, DestA
27 clocks, mode c1, DestB
;---------------------------------------------------------------------
testbed1, run 2:
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9589 clocks, mode s1, DestA
9209 clocks, mode s1, DestB
11889 clocks, mode d1, DestA
8444 clocks, mode d1, DestB
4310 clocks, mode c1, DestA
4546 clocks, mode c1, DestB
8920 clocks, mode m1, DestA
8522 clocks, mode m1, DestB
Source len=128
413 clocks, mode s1, DestA
375 clocks, mode s1, DestB
486 clocks, mode d1, DestA
355 clocks, mode d1, DestB
195 clocks, mode c1, DestA
176 clocks, mode c1, DestB
284 clocks, mode m1, DestA
287 clocks, mode m1, DestB
Source len=16
142 clocks, mode s1, DestA
116 clocks, mode s1, DestB
163 clocks, mode d1, DestA
165 clocks, mode d1, DestB
29 clocks, mode c1, DestA
29 clocks, mode c1, DestB
;---------------------------------------------------------------------
testbed2, run 1:
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9562 clocks, mode s1, DestA
9159 clocks, mode s1, DestB
1611 clocks, mode s1, DestC
11937 clocks, mode d1, DestA
8450 clocks, mode d1, DestB
1630 clocks, mode d1, DestC
4456 clocks, mode c1, DestA
4189 clocks, mode c1, DestB
4612 clocks, mode c1, DestC
8942 clocks, mode m1, DestA
8879 clocks, mode m1, DestB
8764 clocks, mode m1, DestC
Source len=128
451 clocks, mode s1, DestA
380 clocks, mode s1, DestB
187 clocks, mode s1, DestC
484 clocks, mode d1, DestA
376 clocks, mode d1, DestB
234 clocks, mode d1, DestC
166 clocks, mode c1, DestA
162 clocks, mode c1, DestB
162 clocks, mode c1, DestC
289 clocks, mode m1, DestA
300 clocks, mode m1, DestB
343 clocks, mode m1, DestC
;---------------------------------------------------------------------
testbed2, run 2:
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9685 clocks, mode s1, DestA
9141 clocks, mode s1, DestB
1626 clocks, mode s1, DestC
12009 clocks, mode d1, DestA
8356 clocks, mode d1, DestB
1628 clocks, mode d1, DestC
4397 clocks, mode c1, DestA
4264 clocks, mode c1, DestB
4261 clocks, mode c1, DestC
8532 clocks, mode m1, DestA
8588 clocks, mode m1, DestB
8601 clocks, mode m1, DestC
Source len=128
450 clocks, mode s1, DestA
378 clocks, mode s1, DestB
191 clocks, mode s1, DestC
484 clocks, mode d1, DestA
374 clocks, mode d1, DestB
212 clocks, mode d1, DestC
166 clocks, mode c1, DestA
165 clocks, mode c1, DestB
162 clocks, mode c1, DestC
297 clocks, mode m1, DestA
346 clocks, mode m1, DestB
298 clocks, mode m1, DestC
;---------------------------------------------------------------------
Quote from: rags on February 15, 2009, 01:03:49 PM
Ok JJ, I ran each one twice for you.
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9562 clocks, mode s1, DestA
9159 clocks, mode s1, DestB
1611 clocks, mode s1, DestC
Thanxalot, that cheered me up again :bg
(DestC means source and destination are on 16-byte boundaries)
JJ,
Here are my results. A question: the results from a previous test in another posting listed my CPU as Itanium;
This one says Celeron. How come?
Quote
Intel(R) Celeron(R) CPU 1.70GHz (SSE2)
Source len=4096
9505 clocks, mode s1, DestA
9131 clocks, mode s1, DestB
12294 clocks, mode d1, DestA
8123 clocks, mode d1, DestB
4521 clocks, mode c1, DestA
4260 clocks, mode c1, DestB
8515 clocks, mode m1, DestA
8667 clocks, mode m1, DestB
Source len=128
435 clocks, mode s1, DestA
394 clocks, mode s1, DestB
494 clocks, mode d1, DestA
411 clocks, mode d1, DestB
162 clocks, mode c1, DestA
159 clocks, mode c1, DestB
286 clocks, mode m1, DestA
482 clocks, mode m1, DestB
Source len=16
146 clocks, mode s1, DestA
121 clocks, mode s1, DestB
174 clocks, mode d1, DestA
189 clocks, mode d1, DestB
29 clocks, mode c1, DestA
25 clocks, mode c1, DestB
--- OK ---
Paul
Quote from: PBrennick on February 15, 2009, 01:40:55 PM
JJ,
Here are my results. A question: the results from a previous test in another posting listed my CPU as Itanium;
This one says Celeron. How come?
Paul,
Thanks. The first version of the ShowCPU algo tried to identify processors by family and model. The newer one uses the brand string, which is more precise. Your Celeron behaves like a P4 - see rags' post above, after which I posted a second version as TestBed2.zip. The latter also gives timings for the ideal case where both source and destination are paragraph-aligned (i.e. on 16-byte boundaries).
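For readers wondering what "identify processors by family and model" involves: those numbers come packed into the EAX value returned by CPUID leaf 1. The sketch below only shows the bit layout (following Intel's documented rule for the extended fields); it is not the ShowCPU code, which was never posted. The function name is made up for illustration, and the two raw values are rebuilt by hand from the Family/Model/Step lines that appear in this thread.

```python
def decode_cpuid_signature(eax):
    """Split the EAX value of CPUID leaf 1 into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # Intel's rule: the extended family is added only when the base family
    # is 15; the extended model is prepended when the base family is 6 or 15.
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


# Assumed raw values, reconstructed from the reports in this thread
# (the thread itself never shows the raw register contents).
print(decode_cpuid_signature(0x543))  # Pentium MMX report: Family 5, Model 4, Step 3
print(decode_cpuid_signature(0x683))  # P3 report: Family 6, Model 8, Step 3
```

Family and model only narrow things down to a core generation, so turning them into a marketing name needs a lookup table that has to be kept up to date.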
You're welcome JJ :U
Hi,
Not sure if you want some older CPUs, but here goes. PIII
with Windows 2000, and Pentium MMX with Windows 98. The
PIII/P3 had some control characters that I replaced with the
equivalent text in an editor.
Regards,
Steve N.
This is your CPU:
Model 4
Family 5
Step 3
Manufacturer GenuineIntel
Description Intel P1 (1993+), MMX
Brand name
Source len=4096
13082 clocks, mode s1, DestA
12991 clocks, mode s1, DestB
12963 clocks, mode s1, DestC
12952 clocks, mode d1, DestA
12958 clocks, mode d1, DestB
12954 clocks, mode d1, DestC
12920 clocks, mode c1, DestA
12915 clocks, mode c1, DestB
12911 clocks, mode c1, DestC
51220 clocks, mode m1, DestA
51229 clocks, mode m1, DestB
51230 clocks, mode m1, DestC
Source len=128
468 clocks, mode s1, DestA
468 clocks, mode s1, DestB
467 clocks, mode s1, DestC
462 clocks, mode d1, DestA
458 clocks, mode d1, DestB
461 clocks, mode d1, DestC
424 clocks, mode c1, DestA
421 clocks, mode c1, DestB
422 clocks, mode c1, DestC
1655 clocks, mode m1, DestA
1678 clocks, mode m1, DestB
1673 clocks, mode m1, DestC
--- OK ---
This is your CPU:
Model 8
Family 6
Step 3
Manufacturer GenuineIntel
Description Intel P3 (2000+), SSE1
Brand name ^A^A^B^C
^A^A^B^C (SSE1)
Source len=4096
5219 clocks, mode s1, DestA
5216 clocks, mode s1, DestB
5236 clocks, mode s1, DestC
5218 clocks, mode d1, DestA
5228 clocks, mode d1, DestB
5217 clocks, mode d1, DestC
6245 clocks, mode c1, DestA
6233 clocks, mode c1, DestB
6249 clocks, mode c1, DestC
12441 clocks, mode m1, DestA
12430 clocks, mode m1, DestB
12429 clocks, mode m1, DestC
Source len=128
212 clocks, mode s1, DestA
212 clocks, mode s1, DestB
212 clocks, mode s1, DestC
214 clocks, mode d1, DestA
213 clocks, mode d1, DestB
214 clocks, mode d1, DestC
223 clocks, mode c1, DestA
223 clocks, mode c1, DestB
223 clocks, mode c1, DestC
408 clocks, mode m1, DestA
409 clocks, mode m1, DestB
410 clocks, mode m1, DestC
--- OK ---
JJ,
Thanx for the explanation, my CPU is, indeed, a Celeron, 1.70GHz. It actually clocks at 1.69GHz, though. The difference between the spec and the actual is so slight I doubt it has any significant impact on any testing I may choose to do. Do my results look okay to you?
Paul
Hi,
two more results :
AMD Athlon(tm) 64 FX-57 Processor (SSE3)
Source len=4096
1934 clocks, mode s1, DestA
1941 clocks, mode s1, DestB
1358 clocks, mode s1, DestC
2103 clocks, mode d1, DestA
2191 clocks, mode d1, DestB
1357 clocks, mode d1, DestC
3807 clocks, mode c1, DestA
3806 clocks, mode c1, DestB
3801 clocks, mode c1, DestC
12648 clocks, mode m1, DestA
12631 clocks, mode m1, DestB
12433 clocks, mode m1, DestC
Source len=128
96 clocks, mode s1, DestA
91 clocks, mode s1, DestB
80 clocks, mode s1, DestC
104 clocks, mode d1, DestA
95 clocks, mode d1, DestB
81 clocks, mode d1, DestC
143 clocks, mode c1, DestA
143 clocks, mode c1, DestB
143 clocks, mode c1, DestC
407 clocks, mode m1, DestA
413 clocks, mode m1, DestB
407 clocks, mode m1, DestC
--- OK ---
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (SSE4)
Source len=4096
3059 clocks, mode s1, DestA
3577 clocks, mode s1, DestB
1541 clocks, mode s1, DestC
2776 clocks, mode d1, DestA
2628 clocks, mode d1, DestB
1550 clocks, mode d1, DestC
3282 clocks, mode c1, DestA
3281 clocks, mode c1, DestB
3279 clocks, mode c1, DestC
8270 clocks, mode m1, DestA
8238 clocks, mode m1, DestB
8416 clocks, mode m1, DestC
Source len=128
123 clocks, mode s1, DestA
136 clocks, mode s1, DestB
92 clocks, mode s1, DestC
133 clocks, mode d1, DestA
131 clocks, mode d1, DestB
95 clocks, mode d1, DestC
113 clocks, mode c1, DestA
114 clocks, mode c1, DestB
113 clocks, mode c1, DestC
290 clocks, mode m1, DestA
293 clocks, mode m1, DestB
292 clocks, mode m1, DestC
--- OK ---
Regards
Ulli
Quote from: PBrennick on February 15, 2009, 03:42:44 PM
JJ,
Thanx for the explanation, my CPU is, indeed, a Celeron, 1.70GHz. It actually clocks at 1.69GHz, though. The difference between the spec and the actual is so slight I doubt it has any significant impact on any testing I may choose to do.
Probably not. Cycles shouldn't change anyway.
Quote
Do my results look okay to you?
Paul
They look almost identical to rags' P4. I suspect you would get the same dramatic factor 5 improvement for the DestC case (where source and destination are both 16-byte aligned).
For the curious: s1 and d1 are SSE2 algos, c1 stands for crt_strcpy, and m1 means Masm32 library szCopy ;-)
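To put a number on that "factor 5", here is a small calculation using only the clock counts rags posted for testbed2, run 1 (4096-byte source). The dictionary layout and the function name are just for illustration.

```python
# Clock counts copied from rags' testbed2 run 1 (Pentium 4 2.53GHz, source len=4096).
clocks = {
    "s1": {"DestA": 9562, "DestB": 9159, "DestC": 1611},
    "d1": {"DestA": 11937, "DestB": 8450, "DestC": 1630},
}


def slowdown_vs_aligned(mode_clocks):
    """Return how many times slower each misaligned case is than DestC."""
    aligned = mode_clocks["DestC"]
    return {dest: c / aligned for dest, c in mode_clocks.items() if dest != "DestC"}


for mode, per_dest in clocks.items():
    for dest, factor in slowdown_vs_aligned(per_dest).items():
        print(f"{mode} {dest}: {factor:.1f}x slower than DestC")
```

On those figures the misaligned cases come out between roughly 5.2 and 7.3 times slower than the fully aligned one, so "factor 5" is on the cautious side for that P4.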
Quote from: FORTRANS on February 15, 2009, 02:33:42 PM
Hi,
Not sure if you want some older CPUs, but here goes.
Thanks, Steve. Looks fine. By the way: How did you convince the exe to display the long version of the CPU description? I thought I had coded the short version only ;-)
From the latest assemble,
AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
Source len=4096
1677 clocks, mode s1, DestA
1693 clocks, mode s1, DestB
1371 clocks, mode s1, DestC
2246 clocks, mode d1, DestA
1902 clocks, mode d1, DestB
1367 clocks, mode d1, DestC
3863 clocks, mode c1, DestA
3841 clocks, mode c1, DestB
3850 clocks, mode c1, DestC
12520 clocks, mode m1, DestA
12623 clocks, mode m1, DestB
12503 clocks, mode m1, DestC
Source len=128
89 clocks, mode s1, DestA
90 clocks, mode s1, DestB
85 clocks, mode s1, DestC
104 clocks, mode d1, DestA
101 clocks, mode d1, DestB
86 clocks, mode d1, DestC
149 clocks, mode c1, DestA
150 clocks, mode c1, DestB
149 clocks, mode c1, DestC
409 clocks, mode m1, DestA
416 clocks, mode m1, DestB
421 clocks, mode m1, DestC
--- OK ---
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (SSE4)
Source len=4096
2942 clocks, mode s1, DestA
3687 clocks, mode s1, DestB
1447 clocks, mode s1, DestC
2801 clocks, mode d1, DestA
2705 clocks, mode d1, DestB
1442 clocks, mode d1, DestC
3115 clocks, mode c1, DestA
3113 clocks, mode c1, DestB
3114 clocks, mode c1, DestC
4155 clocks, mode m1, DestA
4157 clocks, mode m1, DestB
4159 clocks, mode m1, DestC
Source len=128
122 clocks, mode s1, DestA
132 clocks, mode s1, DestB
85 clocks, mode s1, DestC
136 clocks, mode d1, DestA
132 clocks, mode d1, DestB
94 clocks, mode d1, DestC
134 clocks, mode c1, DestA
135 clocks, mode c1, DestB
134 clocks, mode c1, DestC
169 clocks, mode m1, DestA
175 clocks, mode m1, DestB
168 clocks, mode m1, DestC
--- OK ---
Mode m1 doesn't seem to like AMD, does it?
You seem to like numbers jj...later this week I'll be building a 'new' dev box (p3 1000) - even more numbers for you :bg
Quote from: sinsi on February 16, 2009, 04:52:39 AM
Mode m1 doesn't seem to like AMD, does it?
Indeed, Mark's figures look incredibly slow for the Masm32lib szCopy algo. Among the standard ones, crt_strcpy (c1) is clearly the best - I threw lstrcpy out because it was too bad in all tests.
For the more curious, what are DestA etc.? There is a fair bit of difference in the numbers.
Quote from: sinsi on February 16, 2009, 07:37:23 AM
For the more curious, what are DestA etc.? There is a fair bit of difference in the numbers.
Different degrees of misalignment against a 16-byte boundary. SSE2 can work with non-aligned data, but it gets slow - so the algo checks whether aligning is possible; if yes, it goes for movaps etc.; if no, it has to decide whether to use movups for the source and movaps for the destination, or vice versa. The problem is that some processors are faster with source alignment, others with destination alignment...
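The decision described above can be modelled with plain address arithmetic. This is only a sketch of the idea, not the testbed source: the function names and the `prefer` knob are hypothetical, and "aligned"/"unaligned" stand in for movaps/movups. Both pointers can be aligned at the same time only when they differ by a multiple of 16 (the DestC case); otherwise one side has to take the unaligned access.

```python
ALIGN = 16  # SSE registers are 16 bytes wide


def misalignment(addr):
    """Bytes by which addr sits past the previous 16-byte boundary."""
    return addr % ALIGN


def plan_copy(src, dst, prefer="dst"):
    """Pick load/store flavours for a 16-bytes-at-a-time copy.

    Returns (head_bytes, load_kind, store_kind). head_bytes is the number of
    bytes to copy one by one before the chosen pointer reaches a boundary.
    'prefer' is a hypothetical tuning knob for the case where only one side
    can be aligned.
    """
    if (src - dst) % ALIGN == 0:
        # Same offset within a 16-byte block: both sides align together.
        return (-src) % ALIGN, "aligned", "aligned"
    if prefer == "src":
        return (-src) % ALIGN, "aligned", "unaligned"
    return (-dst) % ALIGN, "unaligned", "aligned"


# Made-up addresses: the first pair differs by n*16+4, the second by n*16+0.
print(plan_copy(0x1000, 0x2004))                # only one side can be aligned
print(plan_copy(0x1000, 0x2004, prefer="src"))  # same data, source favoured
print(plan_copy(0x1003, 0x2003))                # both align after a short head
```

Which value of `prefer` wins is exactly what the per-CPU timings in this thread are trying to find out.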
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Data (mis-)alignment:
diff src-DestA: n*16+4
diff src-DestB: n*16+12
diff src-DestC: n*16+0
Source len=4096
2762 clocks, mode s1, DestA
3096 clocks, mode s1, DestB
1488 clocks, mode s1, DestC
2776 clocks, mode d1, DestA
2534 clocks, mode d1, DestB
1501 clocks, mode d1, DestC
5160 clocks, mode c1, DestA
5186 clocks, mode c1, DestB
5634 clocks, mode c1, DestC
8278 clocks, mode m1, DestA
8308 clocks, mode m1, DestB
8299 clocks, mode m1, DestC
And one more for the really curious. A P4 is a P4... :dazzled: ??
Quote from: rags on February 15, 2009, 01:03:49 PM
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9562 clocks, mode s1, DestA
9159 clocks, mode s1, DestB
1611 clocks, mode s1, DestC
11937 clocks, mode d1, DestA
8450 clocks, mode d1, DestB
1630 clocks, mode d1, DestC
4456 clocks, mode c1, DestA = crt_strcpy
4189 clocks, mode c1, DestB
4612 clocks, mode c1, DestC
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
Source len=4096
8587 clocks, mode s1, DestA
9119 clocks, mode s1, DestB
3505 clocks, mode s1, DestC
3692 clocks, mode d1, DestA
4752 clocks, mode d1, DestB
2096 clocks, mode d1, DestC
9255 clocks, mode c1, DestA = crt_strcpy
8544 clocks, mode c1, DestB
5532 clocks, mode c1, DestC
JJ, I ran testbed2 again:
Intel(R) Pentium(R) 4 CPU 2.53GHz (SSE2)
Source len=4096
9628 clocks, mode s1, DestA
9091 clocks, mode s1, DestB
1617 clocks, mode s1, DestC
11898 clocks, mode d1, DestA
8331 clocks, mode d1, DestB
1619 clocks, mode d1, DestC
4535 clocks, mode c1, DestA
4485 clocks, mode c1, DestB
4156 clocks, mode c1, DestC
Could different amounts of onboard cache or system RAM account for the differences?
I'm not sure how much cache I have; I bought this P4 used from a friend.
I have 2 GB of system RAM.
Quote from: rags on February 16, 2009, 11:59:45 AM
JJ, I ran testbed2 again:
...
Could different amounts of onboard cache or system RAM account for the differences?
I'm not sure how much cache I have; I bought this P4 used from a friend.
I have 2 GB of system RAM.
Don't know what the exact reason is. It's interesting though that the "brand strings" for our processors are identical, while yours is an SSE2 part and mine is SSE3. Note also that crt_strcpy runs a lot faster on your (older) processor - as if Microsoft had optimised this algo for early P4s ...
EDIT: It seems you have a Northwood, while I have a Prescott P4 (Wiki (http://en.wikipedia.org/wiki/Pentium_4)):
Northwood
... A 2.4 GHz P4 was released in April 2002, and the bus speed increased from 400 MT/s to 533 MT/s for a 2.26 GHz, 2.4 GHz, and
2.53 GHz part in May, 2.66 GHz and 2.8 GHz parts in August
Prescott
On February 1, 2004, Intel introduced a new core codenamed "Prescott". ... Some programs benefitted from Prescott's
doubled cache and SSE3 instructions, whereas others were more crippled by its long, inefficient pipeline.
So the lesson is:
Don't rely on the CPUID brand string...
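What does tell the two P4s apart in the output is the "(SSE2)" vs "(SSE3)" tag, which comes from the feature bits of CPUID leaf 1 rather than from the brand string. A minimal sketch of decoding those bits follows; the bit positions are the documented ones, but the function name and the sample register values are synthetic, not captured from either machine.

```python
# Feature bit positions in the registers returned by CPUID leaf 1 (EAX=1).
EDX_FLAGS = {25: "SSE1", 26: "SSE2"}
ECX_FLAGS = {0: "SSE3", 9: "SSSE3", 19: "SSE4.1", 20: "SSE4.2"}


def sse_level(edx, ecx):
    """Return the highest SSE generation whose feature bit is set, or None."""
    level = None
    for bit, name in EDX_FLAGS.items():
        if edx & (1 << bit):
            level = name
    for bit, name in ECX_FLAGS.items():
        if ecx & (1 << bit):
            level = name
    return level


# Synthetic register values for two chips with the same brand string.
northwood_like = {"edx": (1 << 25) | (1 << 26), "ecx": 0}
prescott_like = {"edx": (1 << 25) | (1 << 26), "ecx": 1 << 0}
print(sse_level(**northwood_like))  # SSE2
print(sse_level(**prescott_like))   # SSE3
```

Even this only says which instructions exist, not how fast they run, so for picking the quickest copy routine the measured timings remain the real test.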
Good digging JJ. :)
Athlon Thunderbird 1170 MHz
---TestBed1---
AMD Athlon(tm) Processor
Source len=4096
5190 clocks, mode s1, DestA
5215 clocks, mode s1, DestB
5191 clocks, mode d1, DestA
5214 clocks, mode d1, DestB
5174 clocks, mode c1, DestA
5195 clocks, mode c1, DestB
16521 clocks, mode m1, DestA
16558 clocks, mode m1, DestB
Source len=128
200 clocks, mode s1, DestA
200 clocks, mode s1, DestB
201 clocks, mode d1, DestA
201 clocks, mode d1, DestB
184 clocks, mode c1, DestA
184 clocks, mode c1, DestB
535 clocks, mode m1, DestA
536 clocks, mode m1, DestB
Source len=16
43 clocks, mode s1, DestA
47 clocks, mode s1, DestB
45 clocks, mode d1, DestA
45 clocks, mode d1, DestB
30 clocks, mode c1, DestA
30 clocks, mode c1, DestB
--- OK ---
---TestBed2---
AMD Athlon(tm) Processor
Source len=4096
5193 clocks, mode s1, DestA
5210 clocks, mode s1, DestB
5201 clocks, mode s1, DestC
5209 clocks, mode d1, DestA
5190 clocks, mode d1, DestB
5213 clocks, mode d1, DestC
5175 clocks, mode c1, DestA
5197 clocks, mode c1, DestB
5175 clocks, mode c1, DestC
16557 clocks, mode m1, DestA
16543 clocks, mode m1, DestB
17011 clocks, mode m1, DestC
Source len=128
198 clocks, mode s1, DestA
198 clocks, mode s1, DestB
198 clocks, mode s1, DestC
201 clocks, mode d1, DestA
200 clocks, mode d1, DestB
200 clocks, mode d1, DestC
184 clocks, mode c1, DestA
184 clocks, mode c1, DestB
184 clocks, mode c1, DestC
541 clocks, mode m1, DestA
536 clocks, mode m1, DestB
542 clocks, mode m1, DestC
--- OK ---