C++ memcpy function...
Example:
include \masm32\include\masm32rt.inc
include _memcpy.asm

Main PROTO

.data
szSrc db 20h dup(1)
szDst db 20h dup(0)

.code
Start:
    Invoke Main
    Invoke ExitProcess,0

Main proc
    push 10
    push offset szSrc
    push offset szDst
    call _memcpy
    ret
Main endp

end Start
:bg
mov esi,offset szSrc
mov edi,offset szDst
mov ecx,sizeof szSrc
rep movsb
sure! :bg
But the code I posted might be useful in some special situations, as it was for me today :wink
You can use C++ functions... or you can just roll your own:
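.code
MemCopy PROC USES esi edi, Source:DWORD, Dest:DWORD, ln:DWORD
    cld                 ; copy forward (clear the direction flag)
    mov esi, Source
    mov edi, Dest
    mov ecx, ln
    shr ecx, 2          ; number of whole dwords
    rep movsd           ; copy 4 bytes at a time
    mov ecx, ln
    and ecx, 3          ; 0 to 3 leftover bytes
    rep movsb           ; copy the remainder
    ret
MemCopy ENDP
END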
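For readers more at home in C, here is a minimal sketch of the same two-phase idea as the MemCopy routine above: whole 4-byte words first, then the 0 to 3 leftover bytes. The name `mem_copy_sketch` is hypothetical and not from the thread. The inner `memcpy` calls are only there to make the 4-byte loads and stores safe for unaligned pointers. A compiler is free to turn this into something quite different from `rep movsd`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C counterpart of the MemCopy routine: copy whole 4-byte
   words first (the REP MOVSD phase), then the 0 to 3 leftover bytes
   (the REP MOVSB phase). The buffers must not overlap. */
static void mem_copy_sketch(void *dest, const void *source, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = source;
    size_t dwords = len >> 2;   /* same as: shr ecx, 2 */
    size_t tail   = len & 3;    /* same as: and ecx, 3 */

    while (dwords--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);   /* alignment-safe 4-byte load  */
        memcpy(d, &w, sizeof w);   /* alignment-safe 4-byte store */
        s += 4;
        d += 4;
    }
    while (tail--) {
        *d++ = *s++;               /* byte-by-byte remainder */
    }
}
```

Calling `mem_copy_sketch(dst, src, 10)` would copy two dwords and then two single bytes, which matches what the assembly version does for a length of 10.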
You are right about using C++ functions, but as I said, it's not just a matter of copying memory! I had to use exactly this code for a special case, and I shared it here in case someone finds it useful someday. That's all ;)
If speed is needed, the dynamically linked CRT is probably the best choice. That way your application will also pick up the speed gains from future instruction sets.
I have a reasonably simple view on memory copy: if it's a terabyte it matters, if it's a megabyte it doesn't. If you have the registers to spare, a simple REP MOVSB does the job in most instances.
mov esi, src
mov edi, dst
mov ecx, cnt
rep movsb
that's right
the real problem with trying to speed it up by using REP MOVSD is not dealing with the leftover byte count (count AND 3) at the end
it is the fact that strings are not usually both dword aligned - that negates the speed advantage
you have a 1 in 16 chance of going really fast :bg
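The "1 in 16" figure can be checked with a small C sketch. It assumes each pointer's offset within a dword (0 to 3) is equally likely, which is an assumption for illustration and not something measured in the thread. The helper names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 when both pointers sit on a 4-byte boundary
   (checked on the integer value of the address, as on mainstream platforms). */
static int both_dword_aligned(const void *src, const void *dst)
{
    return (((uintptr_t)src | (uintptr_t)dst) & 3u) == 0;
}

/* Tries all 4 x 4 = 16 combinations of byte offsets for source and
   destination and counts how many leave BOTH pointers dword aligned. */
static int count_aligned_combinations(void)
{
    static uint32_t src_words[2], dst_words[2];   /* 4-byte aligned storage */
    const unsigned char *src = (const unsigned char *)src_words;
    const unsigned char *dst = (const unsigned char *)dst_words;
    int hits = 0;

    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            hits += both_dword_aligned(src + i, dst + j);
        }
    }
    return hits;
}
```

Only the (0, 0) offset pair passes the check, so one combination out of sixteen gets the fully aligned fast path under that assumption.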
Quote from: dedndave on May 10, 2011, 03:36:09 AM
Dave,
In real life, big buffers are 8- or 16-byte aligned - and then rep movsd is almost unbeatable. Remember the Code location sensitivity of timings (http://www.masm32.com/board/index.php?topic=11454.msg87608#msg87608) thread...? Look for MemCo
1.
It looks exactly as if you disassembled the MS CRT library memcpy function with IDA.
And then invoked it as stdcall, although the function is clearly cdecl.
Yes, that's what I did :p