The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: Mark Jones on March 23, 2006, 10:14:03 PM

Title: CopyMemory API
Post by: Mark Jones on March 23, 2006, 10:14:03 PM
Hi, does anyone know which library CopyMemory is in? Searching \masm32\include in the v9.00 release doesn't find it. :wink

Here's a general-purpose replacement. I was curious how the two fared in execution speed.


CopyMem PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
    mov esi,src
    mov edi,dst
    mov ecx,leng
    cld                     ; clear direction to copy forwards
@@:                         ; copy DWORDs
    cmp ecx,4               ; until less than 4 bytes remain
    jl @F
    sub ecx,4
    movsd                   ; copy DWORD & increment pointers
    jmp @B
@@:                         ; then copy any remaining bytes
    cmp ecx,0
    je @F
    sub ecx,1
    movsb
    jmp @B
@@:
    ret
CopyMem endp
Title: Re: CopyMemory API
Post by: PBrennick on March 23, 2006, 11:06:14 PM
It is found in the PSDK headers:

WinBase.h
#define CopyMemory RtlCopyMemory

winNT.h
#define RtlCopyMemory(Destination,Source,Length)

Paul

Title: Re: CopyMemory API
Post by: Mark Jones on March 23, 2006, 11:20:00 PM
Aaah, thanks Paul. Can't seem to find RtlCopyMemory either, lol. :bg Here's another routine, untested, probably faster than the first:


CopyMem2 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
nop
    mov esi,src
    mov edi,dst
    mov eax,leng
    xor edx,edx             ; divide length into dwords
    mov ecx,4
    div ecx
    mov ecx,eax
    cld                     ; clear direction to copy forwards
    rep movsd               ; copy DWORDs & increment pointers until ecx=0
@@:                         ; then copy any remaining bytes
    test cl,cl
    je @F
    dec ecx
    movsb
    jmp @B
@@:
    ret
CopyMem2 endp


EDIT: Oh yeah, this won't work for sizes that aren't a multiple of 4 bytes: ecx is already 0 after rep movsd, so the byte loop never runs. Well, maybe I'll plug away at it tomorrow.  :bg
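
For readers following along in C, here is a minimal sketch of the dword-then-remainder split. The helper name copy_dwords_then_bytes is hypothetical and not from the thread. It illustrates why the remainder (len & 3) has to be computed on its own: the dword counter is already zero once the dword loop finishes, just as ecx is zero after rep movsd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy len bytes as
   (len >> 2) dwords followed by (len & 3) single bytes. */
static void copy_dwords_then_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = len >> 2;   /* the count "rep movsd" would run down to 0 */
    size_t tail   = len & 3;    /* leftover bytes, computed separately */

    assert(len == 0 || (dst != NULL && src != NULL));

    while (dwords--) {
        uint32_t tmp;
        memcpy(&tmp, s, 4);     /* alignment-safe 4-byte load */
        memcpy(d, &tmp, 4);     /* alignment-safe 4-byte store */
        s += 4;
        d += 4;
    }
    while (tail--)              /* 0 to 3 trailing bytes */
        *d++ = *s++;
}
```

With a 7-byte buffer this copies one dword and then three single bytes.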
Title: Re: CopyMemory API
Post by: PBrennick on March 23, 2006, 11:29:17 PM
I saw CopyMem in the PSDK, I think.  Seems the includes could include more!
Paul
Title: Re: CopyMemory API
Post by: hutch-- on March 24, 2006, 01:14:34 AM
Mark,

Some of the direct memory functions can be found in ntoskrnl.exe but they cannot be considered safe across Windows versions from that DLL. The function is declared in winbase.h but I don't currently have a reference to which DLL or library they are in. There is a library in the server 2003 sdk set for ntoskrnl.
Title: Re: CopyMemory API
Post by: Mincho Georgiev on March 24, 2006, 08:42:53 AM
CopyMemory (a.k.a. RtlCopyMemory) is an inline C++ function, located in wdm.lib. You can use RtlMoveMemory too; it does the same job, and I always use it with no problems at all. The source operand is never changed on my system when using RtlMoveMemory. Anyway, you can use wdm.lib from the PoAsm package.
Title: Re: CopyMemory API
Post by: Mincho Georgiev on March 24, 2006, 05:30:18 PM
I almost forgot something. CopyMemory, RtlCopyMemory and memcpy are one and the SAME function, a C runtime one :)
So, even if it is not located in wdm.lib from the PoAsm package (I used it once from wdm.lib, but from VC6 as I remember), this is not an API function:

from winbase.h and winnt.h:
#define CopyMemory RtlCopyMemory
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))

A little tricky, isn't it?  :bg
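
A small C sketch of that macro chain, for illustration only. The My-prefixed macro names and the copy_greeting function are hypothetical stand-ins, chosen so they cannot clash with the real SDK headers; the two #define lines mirror the ones quoted above.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for the two SDK #define lines (names are hypothetical). */
#define MyRtlCopyMemory(Destination, Source, Length) \
    memcpy((Destination), (Source), (Length))
#define MyCopyMemory MyRtlCopyMemory

/* Hypothetical example function: copies "hello" plus its terminator. */
static void copy_greeting(char *out, size_t out_size)
{
    static const char text[] = "hello";
    assert(out_size >= sizeof text);
    /* After preprocessing this line is a plain memcpy(out, text, sizeof text) call. */
    MyCopyMemory(out, text, sizeof text);
}
```

Because both names are preprocessor macros, nothing called CopyMemory ever reaches the linker, which is consistent with the name not showing up in the include files or import libraries.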
Title: Re: CopyMemory API
Post by: NightWare on March 25, 2006, 02:48:25 AM
Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
    mov eax,leng
    xor edx,edx             ; divide length into dwords
    mov ecx,4
    div ecx
    mov ecx,eax

??? is it a joke ? coz shr eax,2 should do the job

here is some clever code to copy memory, for people interested in code that doesn't depend on Intel-specific instructions (rep, movsd, movsb, etc...)... it's quite fast


ALIGN 16
memcopy PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
push ecx
push edx
push esi
push edi

mov eax,_size_
mov esi,_src_
mov edi,_dest_
mov ecx,eax
and ecx,11111111111111111111111111110000b
jz Label1
add esi,ecx
add edi,ecx
neg ecx
Label0: mov edx,DWORD PTR[esi+ecx]
mov DWORD PTR[edi+ecx],edx
mov edx,DWORD PTR[esi+ecx+4]
mov DWORD PTR[edi+ecx+4],edx
mov edx,DWORD PTR[esi+ecx+8]
mov DWORD PTR[edi+ecx+8],edx
mov edx,DWORD PTR[esi+ecx+12]
mov DWORD PTR[edi+ecx+12],edx
add ecx,16
jnz Label0
Label1: mov ecx,eax
and ecx,00000000000000000000000000001100b
jz Label3
add esi,ecx
add edi,ecx
neg ecx
Label2: mov edx,DWORD PTR [esi+ecx]
mov DWORD PTR [edi+ecx],edx
add ecx,4
jnz Label2
Label3: mov ecx,eax
and ecx,00000000000000000000000000000011b
jz Label5
add esi,ecx
add edi,ecx
neg ecx
Label4: mov dl,BYTE PTR [esi+ecx]
mov BYTE PTR [edi+ecx],dl
inc ecx
jnz Label4
Label5:
pop edi
pop esi
pop edx
pop ecx
ret
memcopy ENDP


anyway, even if you're not interested in code that avoids Intel-specific instructions, it's always interesting to see another approach
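
The negative-index trick used in the routine above can be sketched in C as follows. This is an illustration under assumptions, not a translation of the exact instruction sequence: the names memcopy_sketch and copy4 are hypothetical, and 4-byte moves go through memcpy so the sketch makes no alignment assumptions. Each phase advances both pointers past its part of the buffer and then walks a negative offset up to zero, so a single add both steps the offset and sets the loop-exit condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move 4 bytes without assuming alignment. */
static void copy4(unsigned char *d, const unsigned char *s)
{
    uint32_t tmp;
    memcpy(&tmp, s, 4);
    memcpy(d, &tmp, 4);
}

/* Hypothetical name; buffers must not overlap. */
static void memcopy_sketch(void *dest, const void *src, size_t size)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t part;
    ptrdiff_t i;

    part = size & ~(size_t)15;            /* whole 16-byte blocks */
    if (part) {
        s += part; d += part;             /* point just past this part */
        for (i = -(ptrdiff_t)part; i != 0; i += 16) {
            copy4(d + i,      s + i);
            copy4(d + i + 4,  s + i + 4);
            copy4(d + i + 8,  s + i + 8);
            copy4(d + i + 12, s + i + 12);
        }
    }
    part = size & 12;                     /* 0 to 3 leftover dwords */
    if (part) {
        s += part; d += part;
        for (i = -(ptrdiff_t)part; i != 0; i += 4)
            copy4(d + i, s + i);
    }
    part = size & 3;                      /* 0 to 3 leftover bytes */
    if (part) {
        s += part; d += part;
        for (i = -(ptrdiff_t)part; i != 0; i++)
            d[i] = s[i];
    }
}
```

A 39-byte copy, for example, runs two 16-byte iterations, one dword iteration and three byte iterations.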
Title: Re: CopyMemory API
Post by: Mark Jones on March 25, 2006, 08:27:47 PM
Quote from: NightWare on March 25, 2006, 02:48:25 AM
Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
    mov eax,leng
    xor edx,edx             ; divide length into dwords
    mov ecx,4
    div ecx
    mov ecx,eax

??? is it a joke ? coz shr eax,2 should do the job

No it was not a joke. It was untested code typed into this forum in a hurry. The interesting part is how the timings do not change:


    db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMem3 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
    mov esi,src
    mov edi,dst
    mov ecx,leng
    shr ecx,2               ; divide length into dwords
    cld                     ; clear direction to copy forwards
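    rep movsd               ; copy DWORD & increment pointers until ecx=0
    mov ecx,leng            ; rep movsd leaves ecx=0, so reload the length
    and ecx,3               ; and keep only the 0-3 leftover bytes
@@:                         ; then copy any remaining bytes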
    cmp ecx,0
    je @F
    sub ecx,1
    movsb
    jmp @B
@@:
    ret
CopyMem3 endp


Quote
anyway, even if you're not interested in code that avoids Intel-specific instructions, it's always interesting to see another approach

Yes, in fact that's why we have a discussion forum here. :bg

Your routine is quite fast. Here are the results for all of them, clocked on 32/16/8/4/3/2/1-byte read offsets. All tests pass on 64-byte memory lengths.

Quote from: AMD XP 2500+ / XP SP2
CopyMem1:   113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2:   89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMem3:   89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMemNW1: 38, 38, 38, 38, 45, 43, 45 (esi,edi) mov dword

Press enter to exit...
Title: Re: CopyMemory API
Post by: Mark Jones on March 25, 2006, 10:20:55 PM
I was able to tweak your routine a little to get even better performance on the AMD. It doesn't preserve the other GPRs, though.


    ; by NightWare for non-intel-dependent code
    db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMemNW1 PROC dst:DWORD,src:DWORD, siz:DWORD
    mov eax,siz
    mov esi,src
    mov edi,dst
    mov ecx,eax
    and ecx,11111111111111111111111111110000b
    jz checkDWord
    add esi,ecx
    add edi,ecx
    neg ecx
@@:
    mov edx,DWORD PTR[esi+ecx]
    mov DWORD PTR[edi+ecx],edx
    mov edx,DWORD PTR[esi+ecx+4]
    mov DWORD PTR[edi+ecx+4],edx
    mov edx,DWORD PTR[esi+ecx+8]
    mov DWORD PTR[edi+ecx+8],edx
    mov edx,DWORD PTR[esi+ecx+12]
    mov DWORD PTR[edi+ecx+12],edx
    add ecx,16
    jnz @B
checkDWord:
    mov ecx,eax
    and ecx,00000000000000000000000000001100b
    jz checkByte
    add esi,ecx
    add edi,ecx
    neg ecx
@@:
    mov edx,DWORD PTR [esi+ecx]
    mov DWORD PTR [edi+ecx],edx
    add ecx,4
    jnz @B
checkByte:
    mov ecx,eax
    and ecx,00000000000000000000000000000011b
    jz done
    add esi,ecx
    add edi,ecx
    neg ecx
@@:
    mov dl,BYTE PTR [esi+ecx]
    mov BYTE PTR [edi+ecx],dl
    inc ecx
    jnz @B
done:
    sub esi,eax
    sub edi,eax
    ret
CopyMemNW1 ENDP


Quote from: AMD XP 2500+ / XP SP2
CopyMem1:   113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2:   89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMem3:   89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMemNW1: 34, 34, 34, 34, 42, 38, 42 (esi,edi) mov dword

Press enter to exit...

EDIT: Corrected bug in branching. Whoops! :bg
Title: Re: CopyMemory API
Post by: Mincho Georgiev on March 25, 2006, 10:29:47 PM
OK, I thought a lot before posting this, but I don't see a reason not to, since I didn't see anything about it in the license.
Mark, this is the original CopyMemory function, straight from Microsoft's VS7. I only changed the name (memcpy). This is the function you were looking for at the beginning of this thread.
I didn't have time to time it, but you can do it if you like; it would be interesting for me to see the results.

[attachment deleted by admin]
Title: Re: CopyMemory API
Post by: NightWare on March 26, 2006, 08:49:15 PM
hi all,

by using the technique I posted previously, it's possible to produce a lot of memory algos

ZeroMem (mov), MemFill (mov), MemXchg (mov*2), MemFilter (and/xor), MemFusion (or), MemAdd (add), etc...

it's quite easy to adapt, so I'm not going to post all those algos (I don't want to do all the work for you... and maybe someone will see other possibilities I haven't seen)

shaka has posted a copymem algo (badly named, because it's a memmove/RtlMoveMemory-like algo: it takes care of possible overlap)
so if you want to run a speed test you need something that does exactly the same job; that's why I'm posting my MemMove variant (in fact, I've never tested it, so don't blame me if it doesn't work correctly). It's just a bit more complicated than the code I posted previously...


ALIGN 16
MemMove PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
push ecx
push edx
push esi
push edi

mov eax,_size_
mov esi,_src_
mov edi,_dest_
cmp esi,edi
jb Label07
Label00: mov ecx,eax
and ecx,11111111111111111111111111110000b
jz Label02
add esi,ecx
add edi,ecx
neg ecx
Label01: mov edx,DWORD PTR[esi+ecx]
mov DWORD PTR[edi+ecx],edx
mov edx,DWORD PTR[esi+ecx+4]
mov DWORD PTR[edi+ecx+4],edx
mov edx,DWORD PTR[esi+ecx+8]
mov DWORD PTR[edi+ecx+8],edx
mov edx,DWORD PTR[esi+ecx+12]
mov DWORD PTR[edi+ecx+12],edx
add ecx,16
jnz Label01
Label02: mov ecx,eax
and ecx,00000000000000000000000000001100b
jz Label04
add esi,ecx
add edi,ecx
neg ecx
Label03: mov edx,DWORD PTR [esi+ecx]
mov DWORD PTR [edi+ecx],edx
add ecx,4
jnz Label03
Label04: mov ecx,eax
and ecx,00000000000000000000000000000011b
jz Label06
add esi,ecx
add edi,ecx
neg ecx
Label05: mov dl,BYTE PTR [esi+ecx]
mov BYTE PTR [edi+ecx],dl
inc ecx
jnz Label05
Label06:
pop edi
pop esi
pop edx
pop ecx
ret

Label07: mov ecx,edi
sub ecx,esi
cmp eax,ecx
jbe Label00
add esi,eax
add edi,eax
mov ecx,eax
and ecx,00000000000000000000000000000011b
jz Label09
sub esi,ecx
sub edi,ecx
Label08: dec ecx
mov dl,BYTE PTR [esi+ecx]
mov BYTE PTR [edi+ecx],dl
jnz Label08
Label09: mov ecx,eax
and ecx,00000000000000000000000000001100b
jz Label11
sub esi,ecx
sub edi,ecx
Label10: add ecx,-4
mov edx,DWORD PTR [esi+ecx]
mov DWORD PTR [edi+ecx],edx
jnz Label10
Label11: mov ecx,eax
and ecx,11111111111111111111111111110000b
jz Label13
sub esi,ecx
sub edi,ecx
Label12: add ecx,-16
mov edx,DWORD PTR[esi+ecx+12]
mov DWORD PTR[edi+ecx+12],edx
mov edx,DWORD PTR[esi+ecx+8]
mov DWORD PTR[edi+ecx+8],edx
mov edx,DWORD PTR[esi+ecx+4]
mov DWORD PTR[edi+ecx+4],edx
mov edx,DWORD PTR[esi+ecx]
mov DWORD PTR[edi+ecx],edx
jnz Label12
Label13:
pop edi
pop esi
pop edx
pop ecx
ret
MemMove ENDP


Mark, if you want to remove the useless PUSHes and POPs in the algo you've changed, you can add:
sub esi,eax
sub edi,eax
at the end of the code and remove USES ESI EDI... it's just a bit faster...
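
The direction test at the top of MemMove (cmp esi,edi / jb Label07, then cmp eax,ecx / jbe Label00) can be sketched in C like this. The function names are hypothetical, and the copy loops are deliberately byte-wise to keep the sketch short; only the decision logic is meant to mirror the routine. A backward copy is needed only when the source starts below the destination and the size exceeds the distance between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when a forward copy would overwrite
   source bytes that have not been read yet. */
static int must_copy_backward(const void *dest, const void *src, size_t size)
{
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;
    return s < d && size > d - s;   /* src below dest AND the ranges overlap */
}

/* Hypothetical name; byte-wise for brevity. */
static void memmove_sketch(void *dest, const void *src, size_t size)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (must_copy_backward(dest, src, size)) {
        while (size--)              /* last byte first */
            d[size] = s[size];
    } else {
        for (size_t i = 0; i < size; i++)
            d[i] = s[i];            /* first byte first */
    }
}
```

When the source lies at or above the destination, a forward copy is always safe even if the ranges overlap, which is why only the one case branches to the backward path.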
Title: Re: CopyMemory API
Post by: Mincho Georgiev on March 26, 2006, 09:09:37 PM
It is not badly named. I just cut out the preprocessor directives that make the difference, because they're the only difference between CopyMemory and memmove!
Title: Re: CopyMemory API
Post by: Mark Jones on March 27, 2006, 05:58:39 AM
Thanks everyone.

Execution cycles, 64-byte memory copy, data read-aligned to 32/16/8/4/3/2/1:
Quote from: Athlon XP 2500+ / XP SP2
CopyMem1:   101, 101, 101, 101, 101, 101, 101 (esi,edi) movsd
CopyMem2:   88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMem3:   88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMemory: 53, 53, 53, 53, 63, 62, 63 (From VS7)
MemMoveNW1: 36, 36, 36, 36, 43, 41, 43 (esi,edi) mov dword
CopyMemNW1: 35, 35, 35, 35, 39, 37, 39 (esi,edi) mov dword
Title: Re: CopyMemory API
Post by: Mincho Georgiev on March 27, 2006, 02:55:00 PM
Thanks to you too, Mark, for that useful performance info!  :thumbu
Title: Re: CopyMemory API
Post by: NightWare on March 27, 2006, 10:13:29 PM
yep, thanks... the thing i always suspected is confirmed... microsoft coders are overpaid  :wink
Title: Re: CopyMemory API
Post by: jdoe on March 28, 2006, 12:23:33 AM
Am I wrong, or must the CopyMemNW1 source buffer be at least 16 bytes to be GPF-safe?
Title: Re: CopyMemory API
Post by: The Svin on March 28, 2006, 01:33:49 AM
As far as I can see, you are wrong :)
The proc is checking the size
...
Label00:mov ecx,eax
and ecx,11111111111111111111111111110000b ;if less than 16
jz Label02
...
Label02:mov ecx,eax
and ecx,00000000000000000000000000001100b ;if less than 4
jz Label04
...
Label04:mov ecx,eax
and ecx,00000000000000000000000000000011b ;if zero
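
The three masks can be checked with a few lines of C. This is only a sketch; the struct and function names are hypothetical. It shows that the masks split any size into three parts that add back up to the size, so a buffer shorter than 16 bytes simply gets a zero for the 16-byte part and that loop is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
struct size_split {
    size_t blocks16;   /* bytes handled by the 16-byte loop (multiple of 16) */
    size_t dwords;     /* bytes handled by the dword loop: 0, 4, 8 or 12 */
    size_t bytes;      /* bytes handled by the byte loop: 0 to 3 */
};

static struct size_split split_size(size_t size)
{
    struct size_split p;
    p.blocks16 = size & ~(size_t)15;   /* and ecx,...11110000b */
    p.dwords   = size & 12;            /* and ecx,...00001100b */
    p.bytes    = size & 3;             /* and ecx,...00000011b */
    assert(p.blocks16 + p.dwords + p.bytes == size);
    return p;
}
```

For a 10-byte copy, for instance, the parts are 0, 8 and 2, so nothing beyond the tenth byte is read or written.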
Title: Re: CopyMemory API
Post by: NightWare on March 28, 2006, 01:51:48 AM
Quote from: jdoe on March 28, 2006, 12:23:33 AM
Am I wrong

sorry but yep, The Svin is right... :thumbu  but you involuntarily dug up something I forgot to say previously... my code is not only free of Intel-specific instructions... the code is also extensible... you can use 64-bit registers, or 128-bit SIMD registers, etc... there is no limit as long as you understand how the size has to be managed... I know I've opened a small Pandora's box... but it had to be done...  :green2
Title: Re: CopyMemory API
Post by: jdoe on March 28, 2006, 02:17:17 AM
@NightWare:  Sure your memcopy function is great but the "tweaked" one from Mark Jones CopyMemNW1 is missing one jump to avoid GPF... or maybe I need to sleep  :lol

Title: Re: CopyMemory API
Post by: NightWare on March 28, 2006, 08:03:39 PM
can you tell me where the jump is missing? because each AND (the instruction that manages the size) is followed by a JZ (a jump that skips the following part)... (the 2 mov instructions don't affect the flags, so I don't see where the problem is)...

EDIT: OK, I see it... there was a missing label and the corresponding jump... he has certainly corrected the code, otherwise it would crash... but you are right, there was a small error in the tweaked one posted by Mark...
thanks jdoe...

MARK !!! you shouldn't change the names of the labels... or change them to something recognisable... the @ method is generally an error generator...
Title: Re: CopyMemory API
Post by: Mark Jones on March 28, 2006, 09:10:15 PM
Yep, the branch was lost accidentally! Nice catch, jdoe. :wink

Quote from: NightWare on March 28, 2006, 08:03:39 PM
MARK !!! you shouldn't change the names of the labels... or change them to something recognisable... the @ method is generally an error generator...

Well, when coded properly, they work properly. :toothy

I like to use anonymous jump labels where looping doesn't need an obvious label. Seems to make it clearer to understand. But everyone is different.