Hi, does anyone know which library CopyMemory is in? A search of \masm32\include in the v9.00 release doesn't turn it up. :wink
Here's a general-purpose replacement. I was curious how the two fared in execution speed.
CopyMem PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
mov esi,src
mov edi,dst
mov ecx,leng
cld ; clear direction to copy forwards
@@: ; copy DWORDs
cmp ecx,4 ; until less than 4 bytes remain
jl @F
sub ecx,4
movsd ; copy DWORD & increment pointers
jmp @B
@@: ; then copy any remaining bytes
cmp ecx,0
je @F
sub ecx,1
movsb
jmp @B
@@:
ret
CopyMem endp
It is found in the PSDK.
WinBase.h:
#define CopyMemory RtlCopyMemory
WinNT.h:
#define RtlCopyMemory(Destination,Source,Length)
Paul
Aaah, thanks Paul. Can't seem to find RtlCopyMemory either, lol. :bg Here's another routine, untested, probably faster than the first:
CopyMem2 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
nop
mov esi,src
mov edi,dst
mov eax,leng
xor edx,edx ; divide length into dwords
mov ecx,4
div ecx
mov ecx,eax
cld ; clear direction to copy forwards
rep movsd ; copy DWORDs & increment pointers until ecx=0
@@: ; then copy any remaining bytes
test cl,cl
je @F
dec ecx
movsb
jmp @B
@@:
ret
CopyMem2 endp
EDIT: Oh yeah, this won't work for sizes that aren't a multiple of 4 bytes: rep movsd leaves ecx at zero, so the leftover-byte loop never runs. Well, maybe I'll plug away at it tomorrow. :bg
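For what it's worth, here is the intended split written out in C instead of MASM, as a rough sketch only: copy len / 4 DWORDs, then the len % 4 leftover bytes. The name copy_dwords_then_bytes is made up for this post. The point is that the leftover count has to come from the original length (the remainder that div leaves in edx), not from the counter after the DWORD loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of MASM32 or the PSDK. */
static void copy_dwords_then_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = len / 4;   /* quotient: what div leaves in eax */
    size_t tail   = len % 4;   /* remainder: what div leaves in edx */

    while (dwords--) {         /* the rep movsd part */
        uint32_t tmp;
        memcpy(&tmp, s, 4);    /* memcpy keeps unaligned access well-defined in C */
        memcpy(d, &tmp, 4);
        s += 4;
        d += 4;
    }
    while (tail--)             /* the movsb part */
        *d++ = *s++;
}
```

Called with a length of 11, this would copy two DWORDs and then three single bytes.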
I saw CopyMem in the PSDK, I think. Seems the includes could include more!
Paul
Mark,
Some of the direct memory functions can be found in ntoskrnl.exe, but calling them from that module cannot be considered safe across Windows versions. The function is declared in winbase.h, but I don't currently have a reference for which DLL or library it is in. There is a library for ntoskrnl in the Server 2003 SDK set.
CopyMemory (a.k.a. RtlCopyMemory) is an inline C++ function located in wdm.lib. You can use RtlMoveMemory too; it does the same job, and I always use it with no problems at all. The source operand is never changed on my system when using RtlMoveMemory. Anyway, you can use wdm.lib from the PoAsm package.
I almost forgot something. CopyMemory, RtlCopyMemory and memcpy are one and the SAME function, a C runtime one :)
So, even if it is not located in wdm.lib from the PoAsm package (I used it once from wdm.lib, but from VC6 as I remember), this is not an API function:
from winbase.h:
#define CopyMemory RtlCopyMemory
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))
A little tricky, isn't it? :bg
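To make the macro chain concrete, here is a small C sketch, assuming nothing beyond the two header lines quoted above. The #ifndef guards are my addition so it also compiles where the Windows headers are absent, and copy_block is a hypothetical wrapper, not an SDK name.

```c
#include <string.h>

/* Same shape as the SDK header lines quoted above; guarded so this also
   compiles on a system without the Windows headers. */
#ifndef RtlCopyMemory
#define RtlCopyMemory(Destination, Source, Length) \
    memcpy((Destination), (Source), (Length))
#endif
#ifndef CopyMemory
#define CopyMemory RtlCopyMemory
#endif

/* Hypothetical wrapper, only here to show what the macros expand to. */
static void copy_block(void *dst, const void *src, size_t len)
{
    CopyMemory(dst, src, len);   /* preprocesses to memcpy((dst), (src), (len)) */
}
```

With these definitions nothing named CopyMemory survives preprocessing, which would explain why searching the import libraries for it turns up nothing.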
Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
mov eax,leng
xor edx,edx ; divide length into dwords
mov ecx,4
div ecx
mov ecx,eax
??? Is it a joke? 'Cause shr eax,2 should do the job.
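A quick way to convince yourself, sketched in C with a made-up helper name (shift_matches_div): for unsigned lengths, shifting right by 2 gives the same quotient as dividing by 4, and masking with 3 gives the same remainder.

```c
#include <stdint.h>

/* Hypothetical check: returns 1 if shift/mask agree with divide/modulo
   for every unsigned length in [0, limit), otherwise 0. */
static int shift_matches_div(uint32_t limit)
{
    for (uint32_t n = 0; n < limit; n++) {
        if ((n >> 2) != n / 4) return 0;   /* shr eax,2  vs  quotient of div by 4 */
        if ((n & 3) != n % 4) return 0;    /* and eax,3  vs  remainder of div by 4 */
    }
    return 1;
}
```

This only holds as written for unsigned values, which is what a byte count is anyway.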
Here's some clever code to copy memory, for people interested in code that doesn't depend on the Intel string instructions (rep, movsd, movsb, etc.). It's quite fast.
ALIGN 16
memcopy PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
push ecx
push edx
push esi
push edi
mov eax,_size_
mov esi,_src_
mov edi,_dest_
mov ecx,eax
and ecx,11111111111111111111111111110000b
jz Label1
add esi,ecx
add edi,ecx
neg ecx
Label0: mov edx,DWORD PTR[esi+ecx]
mov DWORD PTR[edi+ecx],edx
mov edx,DWORD PTR[esi+ecx+4]
mov DWORD PTR[edi+ecx+4],edx
mov edx,DWORD PTR[esi+ecx+8]
mov DWORD PTR[edi+ecx+8],edx
mov edx,DWORD PTR[esi+ecx+12]
mov DWORD PTR[edi+ecx+12],edx
add ecx,16
jnz Label0
Label1: mov ecx,eax
and ecx,00000000000000000000000000001100b
jz Label3
add esi,ecx
add edi,ecx
neg ecx
Label2: mov edx,DWORD PTR [esi+ecx]
mov DWORD PTR [edi+ecx],edx
add ecx,4
jnz Label2
Label3: mov ecx,eax
and ecx,00000000000000000000000000000011b
jz Label5
add esi,ecx
add edi,ecx
neg ecx
Label4: mov dl,BYTE PTR [esi+ecx]
mov BYTE PTR [edi+ecx],dl
inc ecx
jnz Label4
Label5:
pop edi
pop esi
pop edx
pop ecx
ret
memcopy ENDP
Anyway, even if you're not interested in code that avoids the Intel-specific instructions, it's always interesting to see another approach.
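For anyone who reads C more easily than MASM, here is a rough rendering of the same three-stage split: 16-byte blocks, then leftover DWORDs, then leftover bytes. It is only a sketch. The names memcopy_sketch and copy_u32 are made up, and it uses plain advancing pointers instead of the negative-index trick in the assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one 32-bit word; memcpy keeps unaligned access well-defined in C. */
static void copy_u32(unsigned char *d, const unsigned char *s)
{
    uint32_t tmp;
    memcpy(&tmp, s, sizeof tmp);
    memcpy(d, &tmp, sizeof tmp);
}

/* Hypothetical C rendering of the three-stage split, forward copy only. */
static void memcopy_sketch(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t n;

    for (n = size & ~(size_t)15; n != 0; n -= 16) {  /* 16-byte blocks */
        copy_u32(d, s);
        copy_u32(d + 4, s + 4);
        copy_u32(d + 8, s + 8);
        copy_u32(d + 12, s + 12);
        s += 16;
        d += 16;
    }
    for (n = size & 12; n != 0; n -= 4) {            /* 0 to 3 leftover DWORDs */
        copy_u32(d, s);
        s += 4;
        d += 4;
    }
    for (n = size & 3; n != 0; n--)                  /* 0 to 3 leftover bytes */
        *d++ = *s++;
}
```

The three masks never overlap, so the three loops together copy exactly size bytes.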
Quote from: NightWare on March 25, 2006, 02:48:25 AM
Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
mov eax,leng
xor edx,edx ; divide length into dwords
mov ecx,4
div ecx
mov ecx,eax
??? Is it a joke? 'Cause shr eax,2 should do the job.
No it was not a joke. It was untested code typed into this forum in a hurry. The interesting part is how the timings do not change:
db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMem3 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
mov esi,src
mov edi,dst
mov ecx,leng
shr ecx,2 ; divide length into dwords
cld ; clear direction to copy forwards
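rep movsd ; copy DWORD & increment pointers until ecx=0
mov ecx,leng ; reload the length, since rep movsd leaves ecx=0
and ecx,3 ; 0-3 bytes remain after the DWORD copy
@@: ; then copy any remaining bytes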
cmp ecx,0
je @F
sub ecx,1
movsb
jmp @B
@@:
ret
CopyMem3 endp
Quote
Anyway, even if you're not interested in code that avoids the Intel-specific instructions, it's always interesting to see another approach.
Yes, in fact that's why we have a discussion forum here. :bg
Your routine is quite fast. Here are the results for all four routines, clocked at 32/16/8/4/3/2/1-byte read offsets. All tests pass on 64-byte memory lengths.
Quote from: AMD XP 2500+ / XP SP2
CopyMem1: 113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMem3: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMemNW1: 38, 38, 38, 38, 45, 43, 45 (esi,edi) mov dword
Press enter to exit...
I was able to tweak your routine a little to get even better performance on the AMD. It doesn't preserve the other GPRs, though.
; by NightWare for non-intel-dependent code
db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMemNW1 PROC dst:DWORD,src:DWORD, siz:DWORD
mov eax,siz
mov esi,src
mov edi,dst
mov ecx,eax
and ecx,11111111111111111111111111110000b
jz checkDWord
add esi,ecx
add edi,ecx
neg ecx
@@:
mov edx,DWORD PTR[esi+ecx]
mov DWORD PTR[edi+ecx],edx
mov edx,DWORD PTR[esi+ecx+4]
mov DWORD PTR[edi+ecx+4],edx
mov edx,DWORD PTR[esi+ecx+8]
mov DWORD PTR[edi+ecx+8],edx
mov edx,DWORD PTR[esi+ecx+12]
mov DWORD PTR[edi+ecx+12],edx
add ecx,16
jnz @B
checkDWord:
mov ecx,eax
and ecx,00000000000000000000000000001100b
jz checkByte
add esi,ecx
add edi,ecx
neg ecx
@@:
mov edx,DWORD PTR [esi+ecx]
mov DWORD PTR [edi+ecx],edx
add ecx,4
jnz @B
checkByte:
mov ecx,eax
and ecx,00000000000000000000000000000011b
jz done
add esi,ecx
add edi,ecx
neg ecx
@@:
mov dl,BYTE PTR [esi+ecx]
mov BYTE PTR [edi+ecx],dl
inc ecx
jnz @B
done:
sub esi,eax ; esi and edi have each advanced by siz in total;
sub edi,eax ; step them back to src and dst
ret
CopyMemNW1 ENDP
Quote from: AMD XP 2500+ / XP SP2
CopyMem1: 113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMem3: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMemNW1: 34, 34, 34, 34, 42, 38, 42 (esi,edi) mov dword
Press enter to exit...
EDIT: Corrected bug in branching. Whoops! :bg
OK, I thought a lot before posting this, but I don't see a reason not to, since I didn't see anything about it in the license.
Mark, this is the original CopyMemory function, straight from Microsoft's VS7. I only changed the name (memcpy). This is the function you were looking for at the beginning of this thread.
I didn't have time to time it, but you can if you like; I'd be interested to see the results.
[attachment deleted by admin]
Hi all,
Using the technique I posted previously, it's possible to produce a lot of memory algos:
ZeroMem (mov), MemFill (mov), MemXchg (mov*2), MemFilter (and/xor), MemFusion (or), MemAdd (add), etc.
It's quite easy to adapt, so I'm not going to post all those algos (I don't want to do all the work for you... and maybe someone will see other possibilities that I haven't).
shaka has posted a CopyMem algo (badly named, because it's a memmove/RtlMoveMemory-like algo: it takes care of a possible overlap).
So if you want to run a speed test you need something that does exactly the same job. That's why I'm posting my MemMove variant (in fact, I've never tested it, so don't blame me if it doesn't work correctly). It's just a bit more complicated than the code I posted previously...
ALIGN 16
MemMove PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
push ecx
push edx
push esi
push edi
mov eax,_size_
mov esi,_src_
mov edi,_dest_
cmp esi,edi
jb Label07
Label00: mov ecx,eax
and ecx,11111111111111111111111111110000b
jz Label02
add esi,ecx
add edi,ecx
neg ecx
Label01: mov edx,DWORD PTR[esi+ecx]
mov DWORD PTR[edi+ecx],edx
mov edx,DWORD PTR[esi+ecx+4]
mov DWORD PTR[edi+ecx+4],edx
mov edx,DWORD PTR[esi+ecx+8]
mov DWORD PTR[edi+ecx+8],edx
mov edx,DWORD PTR[esi+ecx+12]
mov DWORD PTR[edi+ecx+12],edx
add ecx,16
jnz Label01
Label02: mov ecx,eax
and ecx,00000000000000000000000000001100b
jz Label04
add esi,ecx
add edi,ecx
neg ecx
Label03: mov edx,DWORD PTR [esi+ecx]
mov DWORD PTR [edi+ecx],edx
add ecx,4
jnz Label03
Label04: mov ecx,eax
and ecx,00000000000000000000000000000011b
jz Label06
add esi,ecx
add edi,ecx
neg ecx
Label05: mov dl,BYTE PTR [esi+ecx]
mov BYTE PTR [edi+ecx],dl
inc ecx
jnz Label05
Label06:
pop edi
pop esi
pop edx
pop ecx
ret
Label07: mov ecx,edi
sub ecx,esi
cmp eax,ecx
jbe Label00
add esi,eax
add edi,eax
mov ecx,eax
and ecx,00000000000000000000000000000011b
jz Label09
sub esi,ecx
sub edi,ecx
Label08: dec ecx
mov dl,BYTE PTR [esi+ecx]
mov BYTE PTR [edi+ecx],dl
jnz Label08
Label09: mov ecx,eax
and ecx,00000000000000000000000000001100b
jz Label11
sub esi,ecx
sub edi,ecx
Label10: add ecx,-4
mov edx,DWORD PTR [esi+ecx]
mov DWORD PTR [edi+ecx],edx
jnz Label10
Label11: mov ecx,eax
and ecx,11111111111111111111111111110000b
jz Label13
sub esi,ecx
sub edi,ecx
Label12: add ecx,-16
mov edx,DWORD PTR[esi+ecx+12]
mov DWORD PTR[edi+ecx+12],edx
mov edx,DWORD PTR[esi+ecx+8]
mov DWORD PTR[edi+ecx+8],edx
mov edx,DWORD PTR[esi+ecx+4]
mov DWORD PTR[edi+ecx+4],edx
mov edx,DWORD PTR[esi+ecx]
mov DWORD PTR[edi+ecx],edx
jnz Label12
Label13:
pop edi
pop esi
pop edx
pop ecx
ret
MemMove ENDP
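The direction test at the top of MemMove (cmp esi,edi / jb, then the distance check at Label07) is the part that differs from a plain copy. Here is a byte-wise C sketch of just that decision, with the hypothetical name memmove_sketch. It leaves out the 16/4/1 unrolling entirely and is not a translation of the routine above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise sketch of the direction test only. */
static void memmove_sketch(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)dst;
    uintptr_t sa = (uintptr_t)src;

    if (sa < da && size > da - sa) {
        /* src is below dst and the ranges overlap: copy from the top down */
        while (size--)
            d[size] = s[size];
    } else {
        /* every other case is safe to copy from the bottom up */
        for (size_t i = 0; i < size; i++)
            d[i] = s[i];
    }
}
```

Copying top-down in the overlapping case means each source byte is read before the destination write can reach it.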
Mark, if you want to remove the useless PUSHes and POPs from the algo you changed, you can add:
sub esi,eax
sub edi,eax
at the end of the code and remove USES ESI,EDI... it's just a bit faster...
It is not badly named. I just cut out the preprocessor directives that make the difference, because they're the only difference between CopyMemory and memmove!
Thanks everyone.
Execution cycles, 64-byte memory copy, data read-aligned to 32/16/8/4/3/2/1:
Quote from: Athlon XP 2500+ / XP SP2
CopyMem1: 101, 101, 101, 101, 101, 101, 101 (esi,edi) movsd
CopyMem2: 88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMem3: 88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMemory: 53, 53, 53, 53, 63, 62, 63 (From VS7)
MemMoveNW1: 36, 36, 36, 36, 43, 41, 43 (esi,edi) mov dword
CopyMemNW1: 35, 35, 35, 35, 39, 37, 39 (esi,edi) mov dword
Thanks to you too, Mark, for that useful performance info! :thumbu
Yep, thanks... the thing I always suspected is confirmed... Microsoft coders are overpaid :wink
Am I wrong, or must the CopyMemNW1 source buffer be at least 16 bytes long to be GPF-safe?
As far as I can see, you are wrong :)
The proc checks the size:
...
Label00:mov ecx,eax
and ecx,11111111111111111111111111110000b ;if less than 16
jz Label02
...
Label02:mov ecx,eax
and ecx,00000000000000000000000000001100b ;if less than 4
jz Label04
...
Label04:mov ecx,eax
and ecx,00000000000000000000000000000011b ;if zero
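To see what those three AND masks do to a given size, here is a tiny C sketch. The struct size_split and the function split_size are made-up names for illustration only.

```c
#include <stddef.h>

/* Hypothetical struct and helper, just to make the three masks visible. */
struct size_split {
    size_t block_bytes;  /* multiple of 16, copied by the unrolled loop */
    size_t dword_bytes;  /* 0, 4, 8 or 12, copied one DWORD at a time   */
    size_t tail_bytes;   /* 0 to 3, copied one byte at a time           */
};

static struct size_split split_size(size_t size)
{
    struct size_split s;
    s.block_bytes = size & ~(size_t)15;  /* and ecx,...11110000b */
    s.dword_bytes = size & 12;           /* and ecx,...00001100b */
    s.tail_bytes  = size & 3;            /* and ecx,...00000011b */
    return s;
}
```

For a size of 7, for instance, this gives 0, 4 and 3, so the 16-byte loop is skipped and no read ever goes past the 7 bytes requested.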
Quote from: jdoe on March 28, 2006, 12:23:33 AM
Am I wrong
Sorry, but yep, the svin is right... :thumbu But you've involuntarily dug up something I forgot to say previously: my code is not only independent of Intel-specific instructions, it can also evolve. You can use 64-bit registers, or 128-bit SIMD registers, etc. There's no limit as long as you understand how the size has to be managed. I know I've opened a small Pandora's box, but it had to be done... :green2
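As an illustration of widening the unit, here is a C sketch of the same scheme with 64-bit words: the masks become ~31, 24 and 7 instead of ~15, 12 and 3. This is my own guess at how the idea scales, not code from the thread, and the names memcopy64_sketch and copy_u64 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one 64-bit word; memcpy keeps unaligned access well-defined in C. */
static void copy_u64(unsigned char *d, const unsigned char *s)
{
    uint64_t tmp;
    memcpy(&tmp, s, sizeof tmp);
    memcpy(d, &tmp, sizeof tmp);
}

/* Hypothetical widened variant: 32-byte blocks, then QWORDs, then bytes. */
static void memcopy64_sketch(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t n;

    for (n = size & ~(size_t)31; n != 0; n -= 32) {  /* four QWORDs per pass */
        copy_u64(d, s);
        copy_u64(d + 8, s + 8);
        copy_u64(d + 16, s + 16);
        copy_u64(d + 24, s + 24);
        s += 32;
        d += 32;
    }
    for (n = size & 24; n != 0; n -= 8) {            /* 0 to 3 leftover QWORDs */
        copy_u64(d, s);
        s += 8;
        d += 8;
    }
    for (n = size & 7; n != 0; n--)                  /* 0 to 7 leftover bytes */
        *d++ = *s++;
}
```

Whether this is actually faster depends on the CPU and compiler; nothing here has been timed.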
@NightWare: Sure, your memcopy function is great, but the "tweaked" one from Mark Jones, CopyMemNW1, is missing one jump to avoid a GPF... or maybe I need to sleep :lol
Can you tell me where the jump is missing? Each AND (the instruction that manages the size) is followed by a JZ (a jump that skips the following part), and the two mov instructions don't affect the flags, so I don't see where the problem is...
EDIT: OK, I see it... there was a missing label and the corresponding jump. He has certainly corrected the code since, otherwise it would crash. But you're right, there was a small error in the tweaked one posted by Mark.
Thanks, jdoe...
MARK!!! You shouldn't change the label names... or at least change them to something recognisable. The @@ method generally breeds errors...
Yep, the branch was lost accidentally! Nice catch, jdoe. :wink
Quote from: NightWare on March 28, 2006, 08:03:39 PM
MARK!!! You shouldn't change the label names... or at least change them to something recognisable. The @@ method generally breeds errors...
Well, when coded properly, they work properly. :toothy
I like to use anonymous jump labels where looping doesn't need an obvious label. Seems to make it clearer to understand. But everyone is different.