First I called ZeroMemory and the assembler said "undefined symbol", so I used RtlZeroMemory instead. That worked correctly, but CopyMemory didn't: even with the Rtl prefix, the assembler always says "undefined symbol". I know that all the paths are defined correctly, but I still don't know how to use CopyMemory in MASM32!
Hi SamLe, have you tried RtlCopyMemory? It looks like the equates for "CopyMemory" in \masm32\include\Windows.inc have been disabled. (There must be a good reason for this.)
If you just need to copy some memory, there are many simple (and not-so-simple) algorithms that will work. Try a search at the top-left of the page.
Note that "CopyMemory" appears to be defined in the GoASM v2.0 headers. (I cannot test this at the moment.)
Hello,
RtlZeroMemory is in kernel32
ZeroMemory equ < RtlZeroMemory>
Hi SamLe,
Also, have a look at the Masm32 library function MemCopy :
MemCopy
MemCopy proc public uses esi edi Source:PTR BYTE,Dest:PTR BYTE,ln:DWORD
Note here that the source and destination parameters are DWORD size addresses of the respective source and destination buffers.
Parameters
1. Source Source buffer address.
2. Dest Destination buffer address.
3. ln The number of bytes to copy.
Return Value
There is no return value.
Comments
The destination buffer must be at least the same size as the source buffer otherwise the procedure will write past the end of the destination buffer and cause a write page fault.
Example
invoke MemCopy,ADDR MySource,ADDR MyDest,ln
MemCopy copies the byte count of memory from a source buffer to a destination buffer.
Does this mean anything?
http://www.theregister.co.uk/2009/05/15/microsoft_banishes_memcpy/
... or in GeneSys:
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
CopyMemory proc pDest:dword, pSrc:dword, dwCount:dword
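mov edx,edi ; save edi (no stack frame, so no push)
mov ecx,[esp+3*4] ; dwCount
mov eax,esi ; save esi
mov edi,[esp+1*4] ; pDest
shr ecx,2 ; number of whole dwords
mov esi,[esp+2*4] ; pSrc
rep movsd ; copy the dwords
mov ecx,[esp+3*4] ; dwCount
and ecx,3 ; 0 to 3 trailing bytes
jnz @F
mov esi,eax ; restore esi
mov edi,edx ; restore edi
ret 3*4
@@: rep movsb ; copy the trailing bytes
mov esi,eax ; restore esi
mov edi,edx ; restore edi
ret 3*4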
CopyMemory endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF
Paul
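For anyone who reads C more easily than assembler, here is a rough model of what the GeneSys routine above does: copy count/4 dwords first, then the remaining 0 to 3 bytes. The function name is made up for illustration, and like the original it assumes the two buffers do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the routine above: whole dwords first,
   then the trailing bytes. Buffers must not overlap. */
void copy_dwords_then_bytes(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t dwords = count >> 2;   /* shr ecx,2 */
    size_t tail = count & 3;      /* and ecx,3 */

    while (dwords--) {            /* rep movsd */
        uint32_t tmp;
        memcpy(&tmp, s, sizeof tmp);   /* 4-byte move, no alignment assumption */
        memcpy(d, &tmp, sizeof tmp);
        s += 4;
        d += 4;
    }
    while (tail--) {              /* rep movsb */
        *d++ = *s++;
    }
}
```

Calling it with a count of 11, for example, moves two dwords and then three single bytes.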
CopyMemory is defined as follows in winbase.h:
#define CopyMemory RtlCopyMemory
So, just use RtlCopyMemory in MASM.
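A portable sketch of that macro chain in C, for illustration only. The first #define is the winbase.h line quoted above. The second is how I recall the 32-bit winnt.h defining RtlCopyMemory, so treat it as an assumption and check your own SDK headers. Both are defined locally here so the snippet compiles without windows.h, and copy_block is a made-up helper name.

```c
#include <string.h>

/* As quoted from winbase.h in the post above. */
#define CopyMemory RtlCopyMemory

/* Assumption: recalled from the 32-bit winnt.h, not checked against
   a specific SDK version. */
#define RtlCopyMemory(Destination, Source, Length) \
    memcpy((Destination), (Source), (Length))

/* Hypothetical helper: after preprocessing, the body is a plain memcpy call. */
void copy_block(void *dest, const void *src, size_t len)
{
    CopyMemory(dest, src, len);
}
```

If the headers really do resolve the name at compile time like this, the assembler never sees it, so on the MASM side you need either an equate or a symbol that the import library actually provides.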
Thanks so much, everybody, I got it. :U
Another one in the GeneSys project :
\GeneSys\GeneSysLib\copymem.asm
MoveMemory, which appears in various DLLs under various names, is a good substitute for CopyMemory. From Microsoft:
VOID MoveMemory (
PVOID Destination, // address of move destination
CONST VOID *Source, // address of block to move
DWORD Length // size, in bytes, of block to move
);
Parameters
Destination -- Points to the starting address of the destination of the move.
Source -- Points to the starting address of the block of memory to move.
Length -- Specifies the size, in bytes, of the block of memory to move.
This function has no return value.
The source and destination blocks may overlap.
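To show why the overlap guarantee matters, here is a small C sketch using the CRT memmove, which has the same "blocks may overlap" contract. The helper name shift_right is invented for the example.

```c
#include <string.h>

/* Hypothetical helper: shift the first `len` bytes of `buf` right by
   `shift` bytes inside the same buffer. Source and destination overlap,
   so memmove (not memcpy) is the standard C call defined for this case. */
void shift_right(char *buf, size_t len, size_t shift)
{
    memmove(buf + shift, buf, len);
}
```

With a 9-byte buffer holding "abcdef", shifting 6 bytes right by 2 gives "ababcdef". A copy that simply ran forward one byte at a time would overwrite source bytes before reading them.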
-----
or you can roll your own:
movememory proc uses esi edi to:dword,from:dword,amt:dword
mov ecx,amt ;returns ecx=-1 if from<to, else ecx=0.
mov esi,from
mov edi,to
cmp esi,edi
jc short work_downwards
rep movsb
jmp short movememory_ret
work_downwards:
dec ecx
js short movememory_ret
mov al,[esi+ecx]
mov [edi+ecx],al
jmp short work_downwards
movememory_ret:
ret
movememory endp
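The direction test in that routine is the whole trick, so here is a C model of it. The names are illustrative and this is only a sketch of the idea, not a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of the movememory idea above (illustrative names).
   If the source starts below the destination, copy from the last byte
   down so an overlapping tail is read before it is overwritten;
   otherwise copy upwards from the first byte. */
void move_bytes(void *to, const void *from, size_t amt)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    if ((uintptr_t)s < (uintptr_t)d) {      /* cmp esi,edi / jc work_downwards */
        while (amt--) {
            d[amt] = s[amt];                /* highest index first */
        }
    } else {
        for (size_t i = 0; i < amt; i++) {  /* rep movsb, direction flag clear */
            d[i] = s[i];
        }
    }
}
```

Either branch gives the right result when the blocks do not overlap. The choice only matters when they do.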
Judging from what I see in the WinNT.h from the PSDK, on 32-bit systems you are actually calling the CRT memmove or memcpy functions. The attachment contains a cycle-count test. Typical results on a P3:
228 cycles, RtlMoveMemory
229 cycles, memmove
229 cycles, memcpy
1301 cycles, movememory
228 cycles, RtlMoveMemory
229 cycles, memmove
229 cycles, memcpy
1301 cycles, movememory
228 cycles, RtlMoveMemory
228 cycles, memmove
229 cycles, memcpy
1312 cycles, movememory
Celeron M:
149 cycles, RtlMoveMemory
143 cycles, memmove
143 cycles, memcpy
537 cycles, movememory
After years of people playing with these, the API calls rank low for usefulness. They work, but they do nothing that hand-written code cannot do better, from a simple byte-level copy up to SSE2+ versions, with the special-case circuitry for REP MOVS? landing somewhere in the middle.
AMD Athlon 64 X2 DualCore 4600+ (overclocked up to 2676.0 MHz)
90 cycles, RtlMoveMemory
104 cycles, memmove
100 cycles, memcpy
794 cycles, movememory
89 cycles, RtlMoveMemory
90 cycles, memmove
90 cycles, memcpy
784 cycles, movememory
89 cycles, RtlMoveMemory
90 cycles, memmove
90 cycles, memcpy
784 cycles, movememory
89 cycles, RtlMoveMemory
90 cycles, memmove
90 cycles, memcpy
784 cycles, movememory
Press any key to exit...
Sempron 3100+
Pentium IV 3.2 GHz :
317 cycles, RtlMoveMemory
305 cycles, memmove
305 cycles, memcpy
840 cycles, movememory
317 cycles, RtlMoveMemory
305 cycles, memmove
305 cycles, memcpy
837 cycles, movememory
317 cycles, RtlMoveMemory
305 cycles, memmove
305 cycles, memcpy
837 cycles, movememory
It's better to make your own :P
CopyMemory proc uses esi edi ecx edx src:DWORD, dst:DWORD, NoOfBytesToBeMoved:DWORD
mov esi, src
mov edi, dst
mov ecx, NoOfBytesToBeMoved
.REPEAT
mov dl, byte ptr ds:[esi]
mov byte ptr ds:[edi], dl
inc edi
inc esi
.UNTILCXZ
ret
CopyMemory endp
Hi Germain,
Welcome on board. Your algo will work but it will be a lot slower than it should be. The REP MOVSD style of algo will be faster and it can be done at the byte level a lot faster. Using the UNTILCXZ is very slow and should be avoided.
Here is a faster byte level version.
bcopy proc src:DWORD,dst:DWORD,ln:DWORD
push esi
mov ecx, src
mov edx, dst
mov esi, ln
add ecx, esi
add edx, esi
neg esi
@@:
movzx eax, BYTE PTR [ecx+esi]
mov [edx+esi], al
add esi, 1
jnz @B
pop esi
ret
bcopy endp
Mixed DWORD/BYTE versions are faster again up to about 500 bytes. REP MOVSD/B is faster again over about 500 bytes. MMX versions are faster again and SSE versions are faster still.
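The negative-index trick in bcopy is easier to see in C. This is only a model with an invented name: both pointers are moved to one past the end, and a single index runs from -ln up to zero, so the loop needs no separate compare.

```c
#include <stddef.h>

/* C model of the bcopy loop above (illustrative name). */
void bcopy_model(const unsigned char *src, unsigned char *dst, size_t ln)
{
    const unsigned char *s_end = src + ln;   /* add ecx, esi */
    unsigned char *d_end = dst + ln;         /* add edx, esi */
    ptrdiff_t i = -(ptrdiff_t)ln;            /* neg esi */

    /* Unlike the assembly, which would wrap around when ln is 0,
       this model simply skips the loop in that case. */
    while (i != 0) {
        d_end[i] = s_end[i];                 /* movzx / mov */
        i++;                                 /* add esi, 1 / jnz @B */
    }
}
```

The point of the design is that the add that advances the index also sets the zero flag, so the jnz can use it directly.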
The MASM32 MemCopy is the shortest one. It uses rep movs?. I don't know about the speed.
RtlMoveMemory is fastest? ::)
90 cycles, RtlMoveMemory
99 cycles, memmove
102 cycles, memcpy
784 cycles, movememory
1303 cycles, Germain_CopyMemory
780 cycles, bcopy
Can someone post some faster code?
Thanks hutch :wink