Where is CopyMemory?

Started by SamLe, May 28, 2009, 03:34:16 PM

Vortex

Pentium IV 3.2 GHz :


317 cycles, RtlMoveMemory
305 cycles, memmove
305 cycles, memcpy
840 cycles, movememory

317 cycles, RtlMoveMemory
305 cycles, memmove
305 cycles, memcpy
837 cycles, movememory

317 cycles, RtlMoveMemory
305 cycles, memmove
305 cycles, memcpy
837 cycles, movememory

Germain

It's better to make your own  :P

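CopyMemory proc uses esi edi ecx edx  src:DWORD, dst:DWORD,  NoOfBytesToBeMoved:DWORD
   
   mov esi, src
   mov edi, dst
   mov ecx, NoOfBytesToBeMoved      ; count must be a DWORD to load into ecx
   test ecx, ecx
   jz done                          ; zero bytes: skip the loop, or ecx would wrap
   
   .REPEAT
    mov dl, byte ptr ds:[esi]       ; read one byte from the source
    mov byte ptr ds:[edi], dl       ; write it to the destination
    inc edi
    inc esi
   .UNTILCXZ                        ; assembles to LOOP: dec ecx, repeat while ecx != 0

 done:
   ret
   
CopyMemory endp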

hutch--

Hi Germain,

Welcome on board. Your algo will work, but it will be a lot slower than it should be. A REP MOVSD style algo will be faster, and even a byte-level copy can be done a lot faster. .UNTILCXZ assembles to the LOOP instruction, which is very slow and should be avoided.

Here is a faster byte level version.


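bcopy proc src:DWORD,dst:DWORD,ln:DWORD

    push esi

    mov ecx, src
    mov edx, dst
    mov esi, ln
    add ecx, esi                    ; point ecx and edx at the end of each buffer
    add edx, esi
    neg esi                         ; esi = -ln, counts up towards zero
    jz done                         ; ln = 0: nothing to copy

  @@:
    movzx eax, BYTE PTR [ecx+esi]   ; read one byte, zero extended
    mov [edx+esi], al               ; write it at the same offset
    add esi, 1
    jnz @B                          ; loop until the index reaches zero

  done:
    pop esi

    ret

bcopy endp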


Mixed DWORD/BYTE versions are faster again up to about 500 bytes. REP MOVSD/B is faster again above about 500 bytes. MMX versions are faster again and SSE versions are faster still.
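
To illustrate what a mixed DWORD/BYTE version could look like, here is a minimal sketch. The name mcopy and its labels are hypothetical, it assumes the two buffers do not overlap, and it has not been timed against the routines in this thread.

```asm
; Sketch only: mcopy is a hypothetical name, not a MASM32 library routine.
; Assumes src and dst do not overlap.
mcopy proc src:DWORD, dst:DWORD, ln:DWORD

    push esi
    push edi

    mov esi, src
    mov edi, dst
    mov ecx, ln
    shr ecx, 2                      ; number of whole DWORDs
    jz do_tail                      ; fewer than 4 bytes in total

  dw_loop:
    mov eax, [esi]                  ; copy 4 bytes per iteration
    mov [edi], eax
    add esi, 4
    add edi, 4
    sub ecx, 1
    jnz dw_loop

  do_tail:
    mov ecx, ln
    and ecx, 3                      ; 0 to 3 leftover bytes
    jz copy_done

  by_loop:
    movzx eax, BYTE PTR [esi]       ; copy the tail one byte at a time
    mov [edi], al
    add esi, 1
    add edi, 1
    sub ecx, 1
    jnz by_loop

  copy_done:
    pop edi
    pop esi

    ret

mcopy endp
```

It would be called the same way as bcopy, for example invoke mcopy, ADDR source, ADDR dest, 100 where source and dest are your own buffers.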

Farabi

The MASM32 MemCopy is the shortest one. It uses rep movsd and rep movsb. I don't know about the speed.
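
For reference, a copy in the rep movsd / rep movsb style can be sketched as below. This is not the MASM32 library source; the name rcopy is hypothetical and the buffers are assumed not to overlap.

```asm
; Sketch only: rcopy is a hypothetical name, not the library MemCopy.
; Assumes src and dst do not overlap.
rcopy proc src:DWORD, dst:DWORD, ln:DWORD

    push esi
    push edi

    cld                             ; string instructions count upward
    mov esi, src
    mov edi, dst
    mov ecx, ln
    shr ecx, 2                      ; number of whole DWORDs
    rep movsd                       ; copy 4 bytes at a time

    mov ecx, ln
    and ecx, 3                      ; 0 to 3 leftover bytes
    rep movsb                       ; copy the tail

    pop edi
    pop esi

    ret

rcopy endp
```

A REP prefix with ecx = 0 copies nothing, so no separate zero-length check is needed here.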
KSS

Is RtlMoveMemory the fastest?  ::)
90 cycles, RtlMoveMemory
99 cycles, memmove
102 cycles, memcpy
784 cycles, movememory
1303 cycles, Germain_CopyMemory
780 cycles, bcopy


Can someone post faster code?

Germain

Quote from: hutch-- on January 18, 2010, 06:07:34 AM
Thanks hutch :wink