memcpy using the stack pointer register.

Started by Damos, February 05, 2009, 10:40:48 AM

Damos

What do you think of this:
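memcpystack proc dest,src,cnt

;cnt is a count of dwords (must be nonzero: .repeat/.until tests at the bottom)
;this function has a granularity of dwords
mov edx,src
mov eax,dest
sub eax,edx            ; eax = dest - src
sub eax,4              ; pop forms [esp+eax] after esp += 4, so bias by -4
mov ecx,cnt
xchg esp,edx           ; esp -> src, edx holds the real stack pointer (no push/call until restored)
.repeat
pop dword ptr[esp+eax] ; read [esp], esp += 4, write the dword to dest
dec ecx
.until zero?
xchg esp,edx           ; restore the real stack pointer before ret
ret

memcpystack endp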
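To see why the routine subtracts 4, here is a small C model of the loop (a sketch, not the author's code, written in C rather than MASM). Memory is modelled as an array of 32-bit words, `sp` plays the role of esp and `off` the role of eax, both measured in dwords rather than bytes. Like the real `pop dword ptr[esp+eax]`, each step reads at `sp`, advances `sp`, and only then forms the destination address, so the offset needs an extra -1 dword (the `sub eax,4`). The name `pop_copy_model` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of memcpystack: mem[] stands in for flat memory,
   indices are in dwords, sp plays esp and off plays eax. */
void pop_copy_model(uint32_t *mem, ptrdiff_t dest, ptrdiff_t src, size_t cnt)
{
    assert(cnt > 0);                 /* the asm loop tests at the bottom */
    ptrdiff_t off = dest - src - 1;  /* sub eax,edx / sub eax,4 (4 bytes = 1 dword) */
    ptrdiff_t sp = src;              /* xchg esp,edx: esp now points at src */
    do {
        uint32_t value = mem[sp];    /* pop reads [esp] ...                 */
        sp += 1;                     /* ... increments esp ...              */
        mem[sp + off] = value;       /* ... then addresses [esp+eax]        */
    } while (--cnt != 0);            /* dec ecx / .until zero?              */
}
```

On the first pass `sp + off` works out to `src + 1 + dest - src - 1 = dest`, and each later pass lands one dword further along.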

I'm not sure whether this is any faster than the usual loop:

@@:
mov eax,[esi+edx]   ; esi = src, edx = byte offset (starts at 0)
mov [edi+edx],eax   ; edi = dest
add edx,4
dec ecx             ; ecx = dword count
jnz @B
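The indexed loop shares one running byte offset (edx) between the source (esi) and the destination (edi), so a single add advances both. The same idea in C, as a sketch with a hypothetical name and the index counted in dwords rather than bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One shared index (edx in the asm) walks both src (esi) and dest (edi). */
void indexed_copy(uint32_t *dest, const uint32_t *src, size_t cnt)
{
    assert(cnt == 0 || (dest != NULL && src != NULL));
    for (size_t i = 0; i < cnt; i++)  /* ecx counts down in the asm; i counts up here */
        dest[i] = src[i];             /* mov eax,[esi+edx] / mov [edi+edx],eax */
}
```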

Nor am I sure how it compares with rep movsd, although I suspect it is faster than the string instructions, because within the movsd microcode the CPU would have to:
1. increase both edi and esi by 4
2. check the direction flag
3. do something with the es segment register (I don't know if this applies to 32-bit code)
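The three steps listed above can be spelled out as a C model of what `rep movsd` does architecturally. This is a sketch only: it says nothing about how fast real microcode performs these steps, and `struct regs` and `rep_movsd_model` are hypothetical names. Addresses are byte offsets into a flat memory array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file for the model; esi/edi are byte addresses
   into a flat memory array, df is the direction flag. */
struct regs { uint32_t esi, edi, ecx; bool df; };

void rep_movsd_model(uint8_t *mem, struct regs *r)
{
    assert(mem != NULL && r != NULL);
    while (r->ecx != 0) {                  /* rep: stop when ecx reaches 0 */
        uint32_t dword;
        memcpy(&dword, mem + r->esi, 4);   /* load the dword at [esi] */
        memcpy(mem + r->edi, &dword, 4);   /* store to es:[edi] (ES base is 0 in flat mode) */
        int32_t step = r->df ? -4 : 4;     /* check the direction flag */
        r->esi += (uint32_t)step;          /* advance both pointers (wraps like 32-bit regs) */
        r->edi += (uint32_t)step;
        r->ecx--;
    }
}
```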
I wouldn't be surprised if pop is a highly optimized instruction.
You could also do the same with push if you wanted to copy backwards.
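Copying backwards with push can be modelled in the same way: since push decrements esp and then stores to [esp], the stack pointer walks down the destination. This C sketch starts the "stack pointer" just past the end of the destination, works in dword indices, and reads the source through a fixed offset. It does not try to reproduce the exact address bias a real `push dword ptr[esp+...]` would need, and `push_copy_model` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: "esp" walks down dest, pre-decrementing like push;
   the source is read last-to-first through a fixed offset. */
void push_copy_model(uint32_t *mem, ptrdiff_t dest, ptrdiff_t src, size_t cnt)
{
    assert(mem != NULL);
    ptrdiff_t off = src - dest;               /* distance from a dest slot to its src slot */
    ptrdiff_t sp = dest + (ptrdiff_t)cnt;     /* "esp" starts just past the end of dest */
    while (cnt-- > 0) {
        uint32_t value = mem[sp - 1 + off];   /* source dword for the next slot down */
        sp -= 1;                              /* push: decrement esp ... */
        mem[sp] = value;                      /* ... then store at [esp] */
    }
}
```

Because it copies last-to-first, this direction also gives the right result when the buffers overlap with the destination above the source.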
I'm going to do some timings to see how fast it is.
I think this would be most useful when unrolling the inner loop, as pop dword ptr[esp+eax] encodes in only 3 bytes. That's got to be the shortest way of doing it, if not the quickest.
Let me know what you think.
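Unrolling repeats the loop body so the counter update and branch are paid once per group rather than once per dword; a short encoding like the 3-byte pop keeps such an unrolled body small. A C sketch of a 4x unroll with a tail loop for the leftover dwords (the factor 4 and the name `unrolled_copy` are arbitrary choices, and it uses ordinary loads and stores rather than pop):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

void unrolled_copy(uint32_t *dest, const uint32_t *src, size_t cnt)
{
    assert(cnt == 0 || (dest != NULL && src != NULL));
    size_t i = 0;
    for (; i + 4 <= cnt; i += 4) {   /* main body: four dwords per iteration */
        dest[i]     = src[i];
        dest[i + 1] = src[i + 1];
        dest[i + 2] = src[i + 2];
        dest[i + 3] = src[i + 3];
    }
    for (; i < cnt; i++)             /* tail: the remaining 0 to 3 dwords */
        dest[i] = src[i];
}
```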

Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction. - Albert Einstein

Damos

How WRONG was I! Look at this:
memcpystack proc dest,src,cnt

;cnt is a count of dwords
;this function has a granularity of dwords
mov edx,src
mov eax,dest
sub eax,edx
sub eax,4
mov ecx,cnt
xchg esp,edx
.repeat
pop dword ptr[esp+eax]
dec ecx
.until zero?
xchg esp,edx
ret

memcpystack endp
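memcpy1 proc uses ebx dest,src,cnt ; ebx is callee-saved in Win32

mov edx,src
mov ebx,dest
mov ecx,cnt
.repeat
mov eax,dword ptr[edx]  ; load a dword from src
mov dword ptr[ebx],eax  ; store it to dest
add edx,4               ; advance src
add ebx,4               ; advance dest
dec ecx
.until zero?
ret

memcpy1 endp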
memcpy2 proc uses ebx dest,src,cnt

mov edx,src
mov ebx,dest
mov ecx,cnt
.repeat
push dword ptr[edx]     ; memory-to-memory copy through the stack
pop dword ptr[ebx]
add edx,4
add ebx,4
dec ecx
.until zero?
ret

memcpy2 endp
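memcpy3 proc uses esi edi dest,src,cnt ; esi/edi are callee-saved in Win32

mov esi,src
mov edi,dest
mov ecx,cnt
rep movsd               ; assumes DF = 0 (the Win32 default), so it copies forward
ret

memcpy3 endp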
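Before timing, it helps to confirm that every variant actually copies what it should; a copy loop that never advances its destination pointer still runs without crashing while producing the wrong result. A minimal C check, assuming C stand-ins with the same (dest, src, cnt) shape as the procs above; `copy_is_correct`, `copy_forward` and `copy_stuck_dest` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature mirroring memcpyN proc dest,src,cnt (cnt in dwords). */
typedef void (*copy_fn)(uint32_t *dest, const uint32_t *src, size_t cnt);

/* Returns 1 if fn reproduces a reference memcpy of cnt dwords (cnt <= 256). */
int copy_is_correct(copy_fn fn, size_t cnt)
{
    uint32_t src[256], want[256], got[256];
    assert(cnt <= 256);
    for (size_t i = 0; i < 256; i++) {
        src[i] = (uint32_t)(i * 2654435761u);  /* odd multiplier: distinct values */
        want[i] = got[i] = 0xDEADBEEFu;        /* same fill in both outputs */
    }
    memcpy(want, src, cnt * sizeof src[0]);    /* reference result */
    fn(got, src, cnt);
    return memcmp(want, got, sizeof want) == 0;  /* also catches writes past cnt */
}

/* A plain forward copy. */
void copy_forward(uint32_t *dest, const uint32_t *src, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++)
        dest[i] = src[i];
}

/* Deliberately broken: the destination never advances. */
void copy_stuck_dest(uint32_t *dest, const uint32_t *src, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++)
        dest[0] = src[i];
}
```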


And the timings (printed from Gravity.asm, lines 682, 686, 690 and 694):

memcpystack  eax = 13903
memcpy1      eax = 4805
memcpy2      eax = 7995
memcpy3      eax = 2022

This clearly shows two things:
1. The stack instructions are SLOW!
2. String instructions are FAST!
I can't believe the difference: the string instructions are nearly 7 times faster (13903 / 2022 ≈ 6.9) than my original idea!
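To check the ratio, each of the four reported numbers can be divided by the rep movsd figure. The post does not state the unit, so only the ratios are meaningful here. A small C sketch with hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

/* Timings reported in the thread, in the order memcpystack, memcpy1, memcpy2, memcpy3. */
static const char *const names[] = { "memcpystack", "memcpy1", "memcpy2", "memcpy3" };
static const double timings[] = { 13903, 4805, 7995, 2022 };

/* How many times slower routine k is than rep movsd (memcpy3). */
double slowdown_vs_movsd(int k)
{
    assert(k >= 0 && k < 4);
    return timings[k] / timings[3];
}

void print_slowdowns(void)
{
    for (int k = 0; k < 4; k++)
        printf("%-12s %.2fx\n", names[k], slowdown_vs_movsd(k));
}
```

With these inputs the ratios come out at roughly 6.88, 2.38, 3.95 and 1.00.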

Mark Jones

This is why it is so important to time your code.

If you provide the source and a makeit.bat file, others here will generally compile and run it too, so you can get an idea of how it runs on other CPUs; AMD and Intel often have radically different timings for snippets like these.
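Along those lines, a portable harness makes it easy for others to rebuild and compare. A minimal C sketch that times a copy routine with the standard `clock()` function (this is not the harness used in the thread; `time_copy` and `simple_copy` are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*copy_fn)(uint32_t *dest, const uint32_t *src, size_t cnt);

/* Runs fn `reps` times over the same buffers and returns elapsed CPU seconds.
   clock() is coarse, so reps should be large enough for a measurable run. */
double time_copy(copy_fn fn, uint32_t *dest, const uint32_t *src, size_t cnt, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(dest, src, cnt);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* A plain forward copy to time. */
void simple_copy(uint32_t *dest, const uint32_t *src, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++)
        dest[i] = src[i];
}
```

Since `clock()` has coarse resolution, pick `reps` so each run lasts well over a few hundred milliseconds, and compare results from several machines, since AMD and Intel parts can rank the variants differently.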
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08