
memory copy test piece.

Started by hutch--, January 02, 2007, 12:20:46 PM


hutch--

I was interested to see how the old REP MOVSD compared with a number of newer techniques using MMX and SSE, and the results are interesting. With the exception of an MMX memory copy, the old REP MOVSD is still faster than the rest on both PIV machines I have handy. The interesting part is that REP MOVSD compares well with the later code versions that use non-cached writes to avoid cache pollution.
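For readers without the attachment, here is a rough C sketch of what a copy with non-cached (non-temporal) writes looks like. It is not the code from the test piece: the function name stream_copy16 is made up for illustration, and it uses the SSE2 intrinsic _mm_stream_si128 (MOVNTDQ) rather than whatever instructions the attached routines used. It assumes a 16-byte aligned destination and a length that is a multiple of 16, and falls back to memcpy when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_STREAM 1
#endif

/* Hypothetical helper (not from the attachment): copy len bytes using
   non-temporal stores so the destination does not displace cached data.
   Assumes dst is 16-byte aligned and len is a multiple of 16. */
void stream_copy16(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15) == 0 && (len & 15) == 0);
#ifdef HAVE_SSE2_STREAM
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* unaligned 16-byte load */
        _mm_stream_si128(d + i, v);          /* store that bypasses the cache */
    }
    _mm_sfence();  /* make the streamed stores visible before returning */
#else
    memcpy(dst, src, len);  /* no SSE2 at compile time: plain copy */
#endif
}
```

Whether this beats REP MOVSD depends heavily on the CPU and on the buffer size relative to the cache, which is what the timings in this thread are probing.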

I would be interested to see if anyone has either a late-model AMD 64-bit box or a Core 2 Duo to try this test piece out on, as I don't have access to machines of this type yet.

Note that the algorithms are not complete, as they have no tail handling; the couple I had available that did have tail code have had those sections commented out.
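To illustrate what the missing tail code would do, here is a minimal C sketch. It assumes a copy that moves 4 bytes per step, as REP MOVSD does, and then finishes the 0 to 3 leftover bytes one at a time. The name copy_with_tail is hypothetical and this is not the commented-out code from the attachment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the attachment): copy len bytes in
   4-byte steps, then copy the remaining 0-3 "tail" bytes singly.
   The buffers must not overlap. */
void copy_with_tail(void *dst, const void *src, size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t dwords = len / 4;  /* whole 4-byte blocks */
    size_t tail = len % 4;    /* bytes left after the blocks */

    assert(dst != NULL && src != NULL);

    while (dwords--) {
        uint32_t tmp;
        memcpy(&tmp, s, 4);   /* alignment-safe 4-byte load */
        memcpy(d, &tmp, 4);   /* alignment-safe 4-byte store */
        s += 4;
        d += 4;
    }
    while (tail--) {
        *d++ = *s++;          /* finish the tail byte by byte */
    }
}
```

With the test buffers sized as a multiple of the block size, the tail loop never runs, which is presumably why leaving it out does not affect these timings.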

[attachment deleted by admin]
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

ramguru

Intel Core Duo T2500 (on my laptop)

1047 1 MS duration sseCopyO
562 1 MS duration regcopy
1172 1 MS duration ntwCopyD
422 1 MS duration ntwCopyQ
609 1 MS duration MemCopyD
1016 1 MS duration sseCopyO
547 1 MS duration regcopy
1187 1 MS duration ntwCopyD
422 1 MS duration ntwCopyQ
594 1 MS duration MemCopyD
1031 1 MS duration sseCopyO
547 1 MS duration regcopy
1188 1 MS duration ntwCopyD
421 1 MS duration ntwCopyQ
594 1 MS duration MemCopyD
1031 1 MS duration sseCopyO
547 1 MS duration regcopy
1188 1 MS duration ntwCopyD
422 1 MS duration ntwCopyQ
593 1 MS duration MemCopyD
1032 1 MS duration sseCopyO
547 1 MS duration regcopy
1171 1 MS duration ntwCopyD
438 1 MS duration ntwCopyQ
594 1 MS duration MemCopyD
1031 1 MS duration sseCopyO
547 1 MS duration regcopy
1187 1 MS duration ntwCopyD
407 1 MS duration ntwCopyQ
594 1 MS duration MemCopyD
1016 1 MS duration sseCopyO
546 1 MS duration regcopy
1188 1 MS duration ntwCopyD
422 1 MS duration ntwCopyQ
594 1 MS duration MemCopyD
1016 1 MS duration sseCopyO
562 1 MS duration regcopy
1172 1 MS duration ntwCopyD
438 1 MS duration ntwCopyQ
593 1 MS duration MemCopyD
00910030 esi
01120030 edi
1016 2 MS duration sseCopyO
563 2 MS duration regcopy
1187 2 MS duration ntwCopyD
391 2 MS duration ntwCopyQ
593 2 MS duration MemCopyD
1016 2 MS duration sseCopyO
563 2 MS duration regcopy
1187 2 MS duration ntwCopyD
391 2 MS duration ntwCopyQ
625 2 MS duration MemCopyD
1015 2 MS duration sseCopyO
578 2 MS duration regcopy
1172 2 MS duration ntwCopyD
391 2 MS duration ntwCopyQ
625 2 MS duration MemCopyD
1015 2 MS duration sseCopyO
563 2 MS duration regcopy
1172 2 MS duration ntwCopyD
391 2 MS duration ntwCopyQ
625 2 MS duration MemCopyD
1015 2 MS duration sseCopyO
579 2 MS duration regcopy
1187 2 MS duration ntwCopyD
406 2 MS duration ntwCopyQ
610 2 MS duration MemCopyD
1015 2 MS duration sseCopyO
563 2 MS duration regcopy
1203 2 MS duration ntwCopyD
406 2 MS duration ntwCopyQ
641 2 MS duration MemCopyD
1000 2 MS duration sseCopyO
579 2 MS duration regcopy
1187 2 MS duration ntwCopyD
406 2 MS duration ntwCopyQ
641 2 MS duration MemCopyD
1016 2 MS duration sseCopyO
547 2 MS duration regcopy
1203 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
625 2 MS duration MemCopyD
Press any key to continue ...

UlliN

AMD Athlon 64 FX-57, 2.8 GHz

938 1 MS duration sseCopyO
375 1 MS duration regcopy
609 1 MS duration ntwCopyD
406 1 MS duration ntwCopyQ
313 1 MS duration MemCopyD
937 1 MS duration sseCopyO
375 1 MS duration regcopy
594 1 MS duration ntwCopyD
391 1 MS duration ntwCopyQ
312 1 MS duration MemCopyD
938 1 MS duration sseCopyO
375 1 MS duration regcopy
578 1 MS duration ntwCopyD
406 1 MS duration ntwCopyQ
297 1 MS duration MemCopyD
953 1 MS duration sseCopyO
375 1 MS duration regcopy
594 1 MS duration ntwCopyD
391 1 MS duration ntwCopyQ
312 1 MS duration MemCopyD
953 1 MS duration sseCopyO
360 1 MS duration regcopy
593 1 MS duration ntwCopyD
407 1 MS duration ntwCopyQ
296 1 MS duration MemCopyD
938 1 MS duration sseCopyO
375 1 MS duration regcopy
594 1 MS duration ntwCopyD
406 1 MS duration ntwCopyQ
312 1 MS duration MemCopyD
938 1 MS duration sseCopyO
391 1 MS duration regcopy
609 1 MS duration ntwCopyD
406 1 MS duration ntwCopyQ
328 1 MS duration MemCopyD
969 1 MS duration sseCopyO
391 1 MS duration regcopy
593 1 MS duration ntwCopyD
407 1 MS duration ntwCopyQ
312 1 MS duration MemCopyD
00910030 esi
01120030 edi
1047 2 MS duration sseCopyO
391 2 MS duration regcopy
593 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
313 2 MS duration MemCopyD
1015 2 MS duration sseCopyO
375 2 MS duration regcopy
594 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
297 2 MS duration MemCopyD
1031 2 MS duration sseCopyO
375 2 MS duration regcopy
594 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
297 2 MS duration MemCopyD
1015 2 MS duration sseCopyO
391 2 MS duration regcopy
594 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
312 2 MS duration MemCopyD
1047 2 MS duration sseCopyO
375 2 MS duration regcopy
594 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
296 2 MS duration MemCopyD
1016 2 MS duration sseCopyO
375 2 MS duration regcopy
594 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
312 2 MS duration MemCopyD
1016 2 MS duration sseCopyO
375 2 MS duration regcopy
594 2 MS duration ntwCopyD
421 2 MS duration ntwCopyQ
313 2 MS duration MemCopyD
1094 2 MS duration sseCopyO
375 2 MS duration regcopy
593 2 MS duration ntwCopyD
422 2 MS duration ntwCopyQ
297 2 MS duration MemCopyD
Press any key to continue ...

hutch--

Thanks guys, most appreciated. Looks like the MMX version is faster on the Core Duo while REP MOVSD is faster on the Athlon 64.

Seb


890 1 MS duration sseCopyO
500 1 MS duration regcopy
813 1 MS duration ntwCopyD
547 1 MS duration ntwCopyQ
406 1 MS duration MemCopyD
859 1 MS duration sseCopyO
516 1 MS duration regcopy
797 1 MS duration ntwCopyD
531 1 MS duration ntwCopyQ
422 1 MS duration MemCopyD
859 1 MS duration sseCopyO
500 1 MS duration regcopy
797 1 MS duration ntwCopyD
532 1 MS duration ntwCopyQ
421 1 MS duration MemCopyD
860 1 MS duration sseCopyO
500 1 MS duration regcopy
797 1 MS duration ntwCopyD
547 1 MS duration ntwCopyQ
406 1 MS duration MemCopyD
859 1 MS duration sseCopyO
516 1 MS duration regcopy
797 1 MS duration ntwCopyD
547 1 MS duration ntwCopyQ
406 1 MS duration MemCopyD
859 1 MS duration sseCopyO
516 1 MS duration regcopy
797 1 MS duration ntwCopyD
531 1 MS duration ntwCopyQ
406 1 MS duration MemCopyD
875 1 MS duration sseCopyO
500 1 MS duration regcopy
813 1 MS duration ntwCopyD
531 1 MS duration ntwCopyQ
422 1 MS duration MemCopyD
875 1 MS duration sseCopyO
500 1 MS duration regcopy
812 1 MS duration ntwCopyD
547 1 MS duration ntwCopyQ
406 1 MS duration MemCopyD
00910030 esi
01120030 edi
860 2 MS duration sseCopyO
515 2 MS duration regcopy
797 2 MS duration ntwCopyD
563 2 MS duration ntwCopyQ
422 2 MS duration MemCopyD
859 2 MS duration sseCopyO
516 2 MS duration regcopy
797 2 MS duration ntwCopyD
578 2 MS duration ntwCopyQ
406 2 MS duration MemCopyD
859 2 MS duration sseCopyO
516 2 MS duration regcopy
797 2 MS duration ntwCopyD
562 2 MS duration ntwCopyQ
422 2 MS duration MemCopyD
875 2 MS duration sseCopyO
500 2 MS duration regcopy
813 2 MS duration ntwCopyD
562 2 MS duration ntwCopyQ
406 2 MS duration MemCopyD
875 2 MS duration sseCopyO
516 2 MS duration regcopy
797 2 MS duration ntwCopyD
578 2 MS duration ntwCopyQ
406 2 MS duration MemCopyD
860 2 MS duration sseCopyO
515 2 MS duration regcopy
797 2 MS duration ntwCopyD
563 2 MS duration ntwCopyQ
406 2 MS duration MemCopyD
875 2 MS duration sseCopyO
500 2 MS duration regcopy
797 2 MS duration ntwCopyD
578 2 MS duration ntwCopyQ
406 2 MS duration MemCopyD
860 2 MS duration sseCopyO
515 2 MS duration regcopy
797 2 MS duration ntwCopyD
563 2 MS duration ntwCopyQ
421 2 MS duration MemCopyD
Press any key to continue ...


AMD Athlon 64 X2 Dual Core 4400+.

j_groothu

Nice comparison! The MOVSD version wipes the floor with the other versions on my P4, as I guess it would have with yours, hutch--. That's one heavily optimised instruction. :eek

Jason

hutch--

jason,

What vintage is your PIV? I have two handy, a Northwood core 2.8 gig and a Prescott core 3.2 gig, and the MMX version is faster on both than REP MOVSD, but not by much.

j_groothu

It is an early Northwood at 2 GHz; CPU-Z says revision 'B0'. MMX looks much poorer (bad revision or broken chip maybe? I did get it for free :P)

Jason


4109 1 MS duration sseCopyO
2141 1 MS duration regcopy
6578 1 MS duration ntwCopyD
8484 1 MS duration ntwCopyQ
766 1 MS duration MemCopyD
4078 1 MS duration sseCopyO
2141 1 MS duration regcopy
6593 1 MS duration ntwCopyD
8891 1 MS duration ntwCopyQ
766 1 MS duration MemCopyD
4078 1 MS duration sseCopyO
2140 1 MS duration regcopy
6797 1 MS duration ntwCopyD
8328 1 MS duration ntwCopyQ
782 1 MS duration MemCopyD
4078 1 MS duration sseCopyO
2172 1 MS duration regcopy
6718 1 MS duration ntwCopyD
8594 1 MS duration ntwCopyQ
781 1 MS duration MemCopyD
4250 1 MS duration sseCopyO
2172 1 MS duration regcopy
6766 1 MS duration ntwCopyD
8312 1 MS duration ntwCopyQ
766 1 MS duration MemCopyD
4078 1 MS duration sseCopyO
2125 1 MS duration regcopy
6703 1 MS duration ntwCopyD
8828 1 MS duration ntwCopyQ
797 1 MS duration MemCopyD
4250 1 MS duration sseCopyO
2141 1 MS duration regcopy
6718 1 MS duration ntwCopyD
8297 1 MS duration ntwCopyQ
766 1 MS duration MemCopyD
4094 1 MS duration sseCopyO
2250 1 MS duration regcopy
6562 1 MS duration ntwCopyD
8500 1 MS duration ntwCopyQ
781 1 MS duration MemCopyD
00910030 esi
01120030 edi
5735 2 MS duration sseCopyO
2000 2 MS duration regcopy
6578 2 MS duration ntwCopyD
10172 2 MS duration ntwCopyQ
828 2 MS duration MemCopyD
5703 2 MS duration sseCopyO
2000 2 MS duration regcopy
6641 2 MS duration ntwCopyD
10968 2 MS duration ntwCopyQ
766 2 MS duration MemCopyD
5672 2 MS duration sseCopyO
2015 2 MS duration regcopy
6579 2 MS duration ntwCopyD
10250 2 MS duration ntwCopyQ
750 2 MS duration MemCopyD
5671 2 MS duration sseCopyO
2032 2 MS duration regcopy
6781 2 MS duration ntwCopyD
10703 2 MS duration ntwCopyQ
766 2 MS duration MemCopyD
5656 2 MS duration sseCopyO
2141 2 MS duration regcopy
6578 2 MS duration ntwCopyD
10156 2 MS duration ntwCopyQ
750 2 MS duration MemCopyD
5797 2 MS duration sseCopyO
2015 2 MS duration regcopy
6641 2 MS duration ntwCopyD
11047 2 MS duration ntwCopyQ
766 2 MS duration MemCopyD
5687 2 MS duration sseCopyO
2031 2 MS duration regcopy
6578 2 MS duration ntwCopyD
10219 2 MS duration ntwCopyQ
781 2 MS duration MemCopyD
5657 2 MS duration sseCopyO
2015 2 MS duration regcopy
6594 2 MS duration ntwCopyD
10656 2 MS duration ntwCopyQ
766 2 MS duration MemCopyD
Press any key to continue ...

hutch--

jason,

Thanks for running the test, interesting result. I own a 1.5 gig PIV but it now runs Linux as my Unix server, so I cannot test on it. Unusual difference in the MMX performance, but that chip may be a bit early for some of the SSE instructions used, as the MMX copy version uses a non-temporal write. The REP MOVSD version is clearly faster by a long way, but then it has been around with special circuitry for a long time as well.

Human

I must say two things. First, why is the test laid out so that everything repeats one after another 8 times? Isn't it better to calculate an average value? Second, the first test will always be useless because Windows never allocates memory instantly, only on the first page fault. So I always skip the first test: it is more like an initialization sequence, and it is a lot slower than the 2nd, 3rd, etc. Finally, it's best to set the task to real-time priority so the results are more accurate, since our task then gets the maximum time.
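The first two suggestions above (skip the first run, average the rest) could be sketched in C roughly like this. The names average_ms_skip_first, copy_job and run_copy_job are hypothetical and not part of the test piece. The sketch times with the standard clock() function, which reports CPU time; that is an assumption made for portability, not necessarily the timer the test piece uses.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*bench_fn)(void *ctx);

/* Hypothetical workload for illustration: one memcpy per call. */
struct copy_job {
    void *dst;
    const void *src;
    size_t len;
    int calls;  /* counts how many times the job has run */
};

void run_copy_job(void *ctx)
{
    struct copy_job *job = (struct copy_job *)ctx;
    memcpy(job->dst, job->src, job->len);
    job->calls++;
}

/* Run fn `runs` times, discard the first (warm-up) run, which pays for
   the initial page faults, and return the mean of the rest in ms.
   Returns 0.0 when there is nothing left to average. */
double average_ms_skip_first(bench_fn fn, void *ctx, int runs)
{
    double total_ms = 0.0;
    int timed = 0;

    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        fn(ctx);
        clock_t stop = clock();
        if (i == 0) {
            continue;  /* warm-up run: not counted */
        }
        total_ms += 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
        timed++;
    }
    return timed > 0 ? total_ms / timed : 0.0;
}
```

For the priority suggestion, the relevant Windows call is SetPriorityClass with REALTIME_PRIORITY_CLASS; it is left out of the sketch to keep it portable.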

hutch--

Everyone has a theory. That one does the job, as the differences are simply large enough. The repeat by 8 solves any problem of the first run being slower, as you can see the rest of the runs.