What do you think of this?
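memcpystack proc dest,src,cnt
;cnt is a count of dwords (must be > 0, or the loop runs 2^32 times)
;this function has a granularity of dwords
mov edx,src
mov eax,dest
sub eax,edx              ;eax = dest - src
sub eax,4                ;pop adds 4 to esp before it computes [esp+eax]
mov ecx,cnt
xchg esp,edx             ;esp -> src, edx keeps the real stack pointer (no push/call from here)
.repeat
pop dword ptr[esp+eax]   ;copy [esp] to [esp+4+eax]
dec ecx
.until zero?
xchg esp,edx             ;restore the real stack pointer
ret
memcpystack endp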
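Why the extra `sub eax,4`? Intel documents that when ESP is the base register of POP's memory destination, the effective address is computed after ESP has been incremented. The small C model below is a sketch, not the original routine. It treats memory as a byte array with 32-bit addresses and replays the loop one step at a time. `mem`, `load32`, `store32` and `pop_copy` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256u
static uint8_t mem[MEM_SIZE]; /* hypothetical flat "address space" */

static uint32_t load32(uint32_t addr) {
    uint32_t v;
    assert(addr <= MEM_SIZE - 4);
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}

static void store32(uint32_t addr, uint32_t v) {
    assert(addr <= MEM_SIZE - 4);
    memcpy(&mem[addr], &v, sizeof v);
}

/* Mirrors memcpystack: eax = dest - src - 4, esp = src, then cnt times
   pop dword ptr[esp+eax]. All arithmetic wraps mod 2^32 like the CPU's. */
static void pop_copy(uint32_t dest, uint32_t src, uint32_t cnt) {
    assert(cnt > 0);                           /* cnt == 0 would loop 2^32 times */
    uint32_t eax = (uint32_t)(dest - src - 4u);
    uint32_t esp = src;                        /* xchg esp,edx: esp -> source */
    do {
        uint32_t value = load32(esp);          /* pop reads [esp] ...         */
        esp += 4;                              /* ... adds 4 to esp ...       */
        store32((uint32_t)(esp + eax), value); /* ... then computes [esp+eax] */
    } while (--cnt != 0);                      /* dec ecx / .until zero?      */
}
```

The first write lands at src + 4 + (dest - src - 4) = dest. The wraparound also makes it work when dest is below src.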
I'm not sure if this is any faster than the usual:
xor edx,edx          ;edx = byte offset into both buffers
.repeat
mov eax,[esi+edx]
mov [edi+edx],eax
add edx,4
dec ecx
.until zero?
or than rep movsd.
I suspect it is faster than the string instructions, though, because the movsd microcode has to:
1. increase both edi and esi by 4
2. check the direction flag
3. do something with the es segment register (movsd always writes through es:edi, even in 32-bit code, but in the flat memory model es has a base of 0, so this may not cost anything)
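The visible bookkeeping behind that list can be spelled out in C. This is a model of what `rep movsd` does architecturally, not of its microcode, which Intel does not document, and it says nothing about cost. `struct regs`, `mem` and `rep_movsd` are hypothetical names, and the segment base is taken to be 0, as in flat-model code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256u
static uint8_t mem[MEM_SIZE]; /* hypothetical flat address space (segment base 0) */

struct regs {            /* only the state rep movsd reads or writes */
    uint32_t esi, edi, ecx;
    bool df;             /* direction flag */
};

static void rep_movsd(struct regs *r) {
    uint32_t step = r->df ? (uint32_t)-4 : 4u;  /* DF set: walk downwards */
    while (r->ecx != 0) {                       /* rep: repeat until ECX == 0 */
        uint32_t v;
        assert(r->esi <= MEM_SIZE - 4 && r->edi <= MEM_SIZE - 4);
        memcpy(&v, &mem[r->esi], sizeof v);     /* read the dword at [ESI] */
        memcpy(&mem[r->edi], &v, sizeof v);     /* write it to [EDI] */
        r->esi += step;                         /* advance both pointers */
        r->edi += step;
        r->ecx--;
    }
}
```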
I wouldn't be surprised if pop is quite an optimized instruction.
You could also do the same using push if you wanted to copy backwards.
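A backwards version with push could work the same way. This assumes, as Intel documents for PUSH, that a memory source address using ESP is computed before ESP is decremented. Under that assumption, ESP starts one dword past the end of the destination and the offset becomes src - dest - 4. The C model is a hypothetical sketch of that idea, and `push_copy_backwards` and its helpers are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256u
static uint8_t mem[MEM_SIZE]; /* hypothetical flat address space */

static uint32_t load32(uint32_t a) {
    uint32_t v;
    assert(a <= MEM_SIZE - 4);
    memcpy(&v, &mem[a], sizeof v);
    return v;
}

static void store32(uint32_t a, uint32_t v) {
    assert(a <= MEM_SIZE - 4);
    memcpy(&mem[a], &v, sizeof v);
}

/* Hypothetical loop: esp = dest + 4*cnt, eax = src - dest - 4,
   then cnt times push dword ptr[esp+eax]. */
static void push_copy_backwards(uint32_t dest, uint32_t src, uint32_t cnt) {
    assert(cnt > 0);
    uint32_t eax = (uint32_t)(src - dest - 4u);
    uint32_t esp = (uint32_t)(dest + 4u * cnt);      /* one dword past the end */
    do {
        uint32_t value = load32((uint32_t)(esp + eax)); /* address uses esp before the decrement */
        esp -= 4;                                       /* push subtracts 4 from esp ... */
        store32(esp, value);                            /* ... then writes to [esp] */
    } while (--cnt != 0);
}
```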
I'm going to do some timings to see how fast it is.
I think this could be put to better use when unrolling the inner loop, since pop dword ptr[esp+eax] encodes in only 3 bytes. That's got to be the shortest way of doing it, if not the quickest.
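Unrolling means doing several copies per loop test, so the dec/branch overhead is paid less often. In the assembly version, each copy would be a single pop dword ptr[esp+eax]. Here is a generic C sketch of 4x unrolling, with a tail loop for the leftovers. `copy_dwords_unrolled4` is a hypothetical name, and whether this beats rep movsd on a given CPU has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies cnt dwords four at a time, then copies the 0-3 left over. */
static void copy_dwords_unrolled4(uint32_t *dest, const uint32_t *src, size_t cnt) {
    assert(cnt == 0 || (dest != NULL && src != NULL));
    size_t i = 0;
    for (; i + 4 <= cnt; i += 4) {   /* one loop test per four copies */
        dest[i]     = src[i];
        dest[i + 1] = src[i + 1];
        dest[i + 2] = src[i + 2];
        dest[i + 3] = src[i + 3];
    }
    for (; i < cnt; i++)             /* tail: cnt % 4 remaining dwords */
        dest[i] = src[i];
}
```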
Let me know what you think.
How WRONG was I? Look at this:
memcpystack proc dest,src,cnt
;cnt is a count of dwords
;this function has a granularity of dwords
mov edx,src
mov eax,dest
sub eax,edx
sub eax,4
mov ecx,cnt
xchg esp,edx
.repeat
pop dword ptr[esp+eax]
dec ecx
.until zero?
xchg esp,edx
ret
memcpystack endp
memcpy1 proc uses ebx dest,src,cnt
mov edx,src
mov ebx,dest
mov ecx,cnt
.repeat
mov eax,dword ptr[edx]
mov dword ptr[ebx],eax
add edx,4
add ebx,4        ;advance the destination pointer
dec ecx
.until zero?
ret
memcpy1 endp
memcpy2 proc uses ebx dest,src,cnt
mov edx,src
mov ebx,dest
mov ecx,cnt
.repeat
push dword ptr[edx]   ;memory -> stack
pop dword ptr[ebx]    ;stack -> memory
add edx,4
add ebx,4
dec ecx
.until zero?
ret
memcpy2 endp
memcpy3 proc uses esi edi dest,src,cnt
mov esi,src
mov edi,dest
mov ecx,cnt
rep movsd        ;the Win32 calling convention guarantees DF is clear on entry
ret
memcpy3 endp
And the timings (memcpystack, memcpy1, memcpy2 and memcpy3 respectively):
memcpystack  eax = 13903 (Gravity.asm, 682)
memcpy1      eax = 4805 (Gravity.asm, 686)
memcpy2      eax = 7995 (Gravity.asm, 690)
memcpy3      eax = 2022 (Gravity.asm, 694)
This clearly shows 2 things:
1. The stack instructions are SLOW!
2. String instructions are FAST!
I can't believe the difference: the string instructions are almost 7 times faster than my original idea (2022 vs 13903)!
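For reference, here are the ratios worked out from the four posted counts. This assumes, as stated, that the counts are in the order the procedures were listed.

```c
#include <assert.h>
#include <stdio.h>

/* Counts as posted, in the order memcpystack, memcpy1, memcpy2, memcpy3. */
static const char *const names[4] = { "memcpystack", "memcpy1", "memcpy2", "memcpy3" };
static const double counts[4] = { 13903.0, 4805.0, 7995.0, 2022.0 };

/* How many times slower routine i was than rep movsd (memcpy3). */
static double ratio_vs_movsd(int i) {
    assert(i >= 0 && i < 4);
    return counts[i] / counts[3];
}

static void print_ratios(void) {
    for (int i = 0; i < 4; i++)
        printf("%-12s %6.0f  %.2fx\n", names[i], counts[i], ratio_vs_movsd(i));
}
```

Calling `print_ratios()` prints 6.88x for memcpystack, 2.38x for memcpy1, 3.95x for memcpy2 and 1.00x for memcpy3.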
This is why it is so important to time your code.
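The same habit in portable C: a minimal timing sketch using the standard `clock()` function. It checks that each copy routine actually produces the right result before timing it, since a broken copy can look fast. `copy_loop`, `copy_memcpy`, `time_copy`, the buffer size and the repeat count are all assumptions for illustration. Its results are not comparable with the eax counts above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define N_DWORDS 4096u   /* hypothetical buffer size */
#define REPEATS  10000u  /* hypothetical repeat count */

typedef void (*copy_fn)(uint32_t *dest, const uint32_t *src, size_t cnt);

static void copy_loop(uint32_t *dest, const uint32_t *src, size_t cnt) {
    for (size_t i = 0; i < cnt; i++)
        dest[i] = src[i];
}

static void copy_memcpy(uint32_t *dest, const uint32_t *src, size_t cnt) {
    memcpy(dest, src, cnt * sizeof *src);
}

static uint32_t src_buf[N_DWORDS], dst_buf[N_DWORDS];

/* Verifies fn once, then returns the processor time for REPEATS calls, in seconds. */
static double time_copy(copy_fn fn) {
    for (uint32_t i = 0; i < N_DWORDS; i++) src_buf[i] = i * 2654435761u;
    memset(dst_buf, 0, sizeof dst_buf);
    fn(dst_buf, src_buf, N_DWORDS);
    assert(memcmp(dst_buf, src_buf, sizeof src_buf) == 0); /* correctness first */
    clock_t start = clock();
    for (unsigned r = 0; r < REPEATS; r++)
        fn(dst_buf, src_buf, N_DWORDS);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

An optimizing compiler may turn `copy_loop` into a `memcpy` call or otherwise rewrite it, so check the generated code before reading much into the numbers.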
If you provide the source and a makeit.bat file, others here will generally build and run it too, so you can get an idea of how it runs on other CPUs (AMD and Intel often have radically different timings for snippets like these).