Memory movement

Started by asmfan, June 14, 2006, 11:19:03 AM

asmfan

I happened to come across this: www.sciencemark.org/bench/membench/intro.html (the link is already dead, alas).
It is a set of procedures for studying different ways of moving memory.
Russia is a weird place

Mark Jones

Neat Asmfan! Thanks for sharing. :U

Hmm... the MMX_blk_pref_4KB example has an erroneous "jnz fourkbcopyloop" jump. I'm looking into it, but haven't found the solution yet.
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Mark Jones

Still haven't found the problem. I made a RadASM benchmark app from the procedures, though. It's incomplete, and I'm going to be busy for the next few days, so feel free to tinker with it. :wink

AMD XP 2500+:

Processor: AMD Athlon(tm) XP 2500+

repmovsd_copy:      196 cycles
alu_reg_copy:       172 cycles
MMX_copy:           111 cycles
MMX_blk_pref_4KB:   ERROR
MMX_blk_pref_16KB:  541 cycles
MMXw3DNowPref_copy: 125 cycles
MMXwSSEPref_copy:   125 cycles
SSE_copy:           160 cycles
SSE_blk_prf_4KB:    365 cycles
SSE_blk_prf_16KB:   214 cycles
SSEwPref_copy:      168 cycles
SSE2_copy:          59 cycles
SSE2wPref_copy:     65 cycles

Press ENTER to exit...


[attachment deleted by admin]

stanhebben

On an AMD Athlon 64 3200+:

Processor: AMD Athlon(tm) 64 Processor 3200+

repmovsd_copy:      150 cycles
alu_reg_copy:       170 cycles
MMX_copy:           100 cycles
MMX_blk_pref_4KB:   ERROR
MMX_blk_pref_16KB:  264 cycles
MMXw3DNowPref_copy: 105 cycles
MMXwSSEPref_copy:   105 cycles
SSE_copy:           220 cycles
SSE_blk_prf_4KB:    263 cycles
SSE_blk_prf_16KB:   263 cycles
SSEwPref_copy:      220 cycles
SSE2_copy:          219 cycles
SSE2wPref_copy:     219 cycles
Press ENTER to exit...


Major differences between the XP and 64...

[edit] And as for the error: on line 61, sixteenkbcopyloop must be fourkbcopyloop

Ossa

AMD 64 3400+ Mobile:

repmovsd_copy:      150 cycles
alu_reg_copy:       170 cycles
MMX_copy:           100 cycles
MMX_blk_pref_4KB:   ERROR
MMX_blk_pref_16KB:  265 cycles
MMXw3DNowPref_copy: 105 cycles
MMXwSSEPref_copy:   105 cycles
SSE_copy:           221 cycles
SSE_blk_prf_4KB:    266 cycles
SSE_blk_prf_16KB:   265 cycles
SSEwPref_copy:      221 cycles
SSE2_copy:          219 cycles
SSE2wPref_copy:     219 cycles


I also discovered that on the first run after a period of inactivity, the MMX and SSE code runs slower (around 400 cycles in this case). I think this is because the mobile processor shuts down the MMX and SSE ALUs to save power. So a warning: when benchmarking, it might be a good idea to run an intensive piece of MMX/SSE code before the benchmark, just to "spin up" the processor to full speed.
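For anyone who wants to try the warm-up idea, here is a minimal sketch in C rather than MASM. It runs a few untimed passes first and then reports the fastest of several timed passes. The names `copy_fn`, `memcpy_copy` and `time_copy` are hypothetical and are not part of the attached benchmark, and `clock()` is only a coarse stand-in for the cycle counter the benchmark uses.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by the copy routines under test. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-in routine: a plain memcpy wrapper, used here only for illustration. */
void memcpy_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Run `warmup` untimed passes so the CPU is (hopefully) at full speed,
   then time `reps` batches of `inner` calls and return the fastest batch
   in clock() ticks. */
clock_t time_copy(copy_fn fn, void *dst, const void *src, size_t n,
                  int warmup, int reps, int inner)
{
    clock_t best = 0;
    int i, r;

    /* Untimed warm-up passes. */
    for (i = 0; i < warmup; i++)
        fn(dst, src, n);

    for (r = 0; r < reps; r++) {
        clock_t start = clock();
        clock_t elapsed;

        for (i = 0; i < inner; i++)
            fn(dst, src, n);
        elapsed = clock() - start;

        /* Keep the minimum: it is the batch least disturbed by other activity. */
        if (r == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Taking the minimum rather than the average is a design choice: a slow first batch (whatever its cause) then cannot skew the reported figure.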

Ossa
Website (very old): ossa.the-wot.co.uk

ramguru


Processor: Genuine Intel(R) CPU           T2500  @ 2.00GHz

repmovsd_copy:      576 cycles
alu_reg_copy:       372 cycles
MMX_copy:           192 cycles
MMX_blk_pref_4KB:   ERROR
MMX_blk_pref_16KB:  610 cycles
MMXw3DNowPref_copy: ERROR
;stops here

Hmm, I think my processor supports all of the SSE/SSE2/SSE3 instructions. Since I'm looking for a MemCopy replacement, I think I'll use alu_reg_copy.

hutch--

I get the same effect on both of the dev PIVs I have handy: they stop after "MMXw3DNowPref_copy: ERROR".


ScienceMark 512-byte memory copy timings 2006 by MarkJ.
Processor:               Intel(R) Pentium(R) 4 CPU 2.80GHz

repmovsd_copy:      292 cycles
alu_reg_copy:       280 cycles
MMX_copy:           184 cycles
MMX_blk_pref_4KB:   ERROR
MMX_blk_pref_16KB:  591 cycles
MMXw3DNowPref_copy: ERROR



ScienceMark 512-byte memory copy timings 2006 by MarkJ.
Processor:               Intel(R) Pentium(R) 4 CPU 3.20GHz

repmovsd_copy:      336 cycles
alu_reg_copy:       267 cycles
MMX_copy:           192 cycles
MMX_blk_pref_4KB:   ERROR
MMX_blk_pref_16KB:  637 cycles
MMXw3DNowPref_copy: ERROR
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

Here is a test piece that times an unrolled loop, reading 16 bytes each iteration, against REP MOVSD. The purpose of the test piece is to demonstrate threshold differences depending on the byte length being copied. REP MOVSD is known to be slow under 64 bytes, and it does not start to perform well until the count is over 1024 bytes. The test piece tests from 64 bytes to 16384 bytes, and the unrolled version is faster across all sizes, but the gap closes as the block to copy gets larger.
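The attachment is gone, so as a rough illustration of the idea only, here is a sketch in C of a copy loop unrolled to move 16 bytes per iteration as four 32-bit loads and stores. This is not the original MASM test piece; the name `reg_copy` and the byte-wise tail handling are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `bytes` bytes from src to dst, 16 bytes (four 32-bit words) per
   loop iteration. Any remainder of fewer than 16 bytes is copied one
   byte at a time. The buffers must not overlap. */
void reg_copy(uint32_t *dst, const uint32_t *src, size_t bytes)
{
    size_t blocks = bytes / 16;   /* number of full 16-byte blocks */
    size_t tail   = bytes % 16;   /* leftover bytes after the blocks */
    unsigned char *d8;
    const unsigned char *s8;

    while (blocks > 0) {
        /* Four loads followed by four stores, the "unrolled" body. */
        uint32_t a = src[0];
        uint32_t b = src[1];
        uint32_t c = src[2];
        uint32_t d = src[3];
        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;
        src += 4;
        dst += 4;
        blocks--;
    }

    /* Byte-wise tail. */
    d8 = (unsigned char *)dst;
    s8 = (const unsigned char *)src;
    while (tail > 0) {
        *d8++ = *s8++;
        tail--;
    }
}
```

An optimising compiler may well rewrite this loop on its own, so timings of the C version say little about the hand-written assembler one.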

It is possible that the built-in REP MOVSD will end up being faster on large blocks, but this becomes the realm of SSE2 or later, which allows non-cached writes that improve speed by reducing cache pollution.
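As a sketch of what a non-cached write looks like, the C function below uses the SSE2 intrinsic `_mm_stream_si128` (the MOVNTDQ instruction) when the compiler targets SSE2, and falls back to `memcpy` otherwise. The name `stream_copy` and the alignment handling are assumptions for illustration; whether streaming stores actually win depends on the block size and the processor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy n bytes using non-temporal (streaming) stores where SSE2 is
   available. The buffers must not overlap. */
void stream_copy(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Streaming stores need a 16-byte aligned destination, so copy
       single bytes until d is aligned. */
    while (n > 0 && ((uintptr_t)d & 15) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Main loop: unaligned 16-byte load, non-temporal 16-byte store. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }

    /* Make the streamed data globally visible before returning. */
    _mm_sfence();

    /* Remaining bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
#else
    /* No SSE2 at compile time: plain copy. */
    memcpy(dst, src, n);
#endif
}
```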

Timings on my older PIV are,


REP MOVSD timings

516 32 bytes
578 64 bytes
781 128 bytes
1672 256 bytes
2375 512 bytes
3734 1024 bytes
12172 4096 bytes
4969 16384 bytes

reg copy timings

172 32 bytes
250 64 bytes
375 128 bytes
656 256 bytes
1328 512 bytes
2516 1024 bytes
9578 4096 bytes
4109 16384 bytes
Press any key to continue ...


On the later one,


REP MOVSD timings

297 32 bytes
375 64 bytes
469 128 bytes
1359 256 bytes
1766 512 bytes
2546 1024 bytes
7360 4096 bytes
3750 16384 bytes

reg copy timings

125 32 bytes
172 64 bytes
265 128 bytes
438 256 bytes
984 512 bytes
1750 1024 bytes
6313 4096 bytes
3406 16384 bytes
Press any key to continue ...

[attachment deleted by admin]

ramguru

Say I want to resize an executable file. To increase or decrease the size of the file I use CreateFileMapping. Now, what is the best (or an existing) algorithm for moving the data without messing everything up, as in these situations:
[memcopyXXL, memaddr, memaddr+128, moveSize] OR
[memcopyXXL, memaddr, memaddr-128, moveSize]
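Assuming the two ranges in those calls overlap, which they will whenever moveSize is larger than 128, the usual answer is to pick the copy direction from the relative position of source and destination. That is what C's `memmove` does. Here is a minimal byte-wise sketch of the idea; `move_bytes` is a hypothetical name, and the argument order (destination first) follows `memmove`, not necessarily memcopyXXL.

```c
#include <stddef.h>

/* Move n bytes from src to dst, correct even when the ranges overlap.
   If dst is below src, copy forwards; if dst is above src, copy
   backwards so bytes are read before they are overwritten. */
void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    if (d < s) {
        /* Destination is lower: ascending copy is safe. */
        while (n > 0) {
            *d++ = *s++;
            n--;
        }
    } else if (d > s) {
        /* Destination is higher: start at the end and work down. */
        d += n;
        s += n;
        while (n > 0) {
            *--d = *--s;
            n--;
        }
    }
    /* d == s: nothing to do. */
    return dst;
}
```

For example, with `base` pointing at the mapped view, `move_bytes(base + 128, base, moveSize)` shifts the data up by 128 bytes and `move_bytes(base, base + 128, moveSize)` shifts it down. In real code, `memmove` from the C runtime is the simpler choice.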