Thanks to JJ2007's excellent job of timing and exploring SIMD instructions, and to other tricks found on the forum,
I managed to come up with a universal string copy routine. :clap:
I am not claiming it is the fastest possible routine, but it does what a multipurpose string copy should do and is best suited to high-level language compilers.
It is "under-read" protected: it never reads before the start of the source string.
It works with unaligned strings.
It returns the destination address in EAX (needed most of the time).
It returns a pointer to the zero terminator in ECX (needed for append operations).
It requires 16 extra reserved bytes for strings created by "New"-type allocations, to avoid a memory access violation exception.
Enjoy it!
:bg
.686p
.model flat
.xmm
.code
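; Ustrcpy(Dest, Source): returns EAX = Dest, ECX -> terminating zero in Dest
; ===========================================================================
Ustrcpy proc near public
    push esi
    push edi
    mov esi, [esp+16]       ; ESI = Source
    mov edi, [esp+12]       ; EDI = Destination
    pxor xmm0, xmm0         ; XMM0 = 16 zero bytes for the comparison
    movups xmm2, [esi]      ; unaligned load of the first 16 source bytes
    movaps xmm1, xmm2       ; keep a copy to store later
    pcmpeqb xmm2, xmm0      ; 0FFh in every byte position that holds a zero
    pmovmskb ecx, xmm2      ; one mask bit per byte
    bsf ecx, ecx            ; index of the first zero byte; ZF = 1 if none
    jne Exit                ; terminator is within the first 16 bytes
    mov edx, esi
    and esi, -16            ; round Source down to a 16-byte boundary
    sub edx, esi            ; EDX = misalignment of Source (0..15)
    movups [edi], xmm1      ; store the first 16 bytes
    sub edi, edx            ; keep EDI in step with the aligned ESI
    jmp @F
C0: movlps qword ptr [edi], xmm1    ; movups [edi], xmm1 is shorter but slower
    movhps qword ptr [edi+8], xmm1
@@: lea esi, [esi+16]
    movaps xmm1, [esi]      ; aligned load of the next 16 bytes
    lea edi, [edi+16]
    pcmpeqb xmm0, xmm1      ; XMM0 stays all zero while no zero byte is found
    pmovmskb ecx, xmm0
    test ecx, ecx
    jz C0                   ; no terminator yet: store the block and continue
    bsf ecx, ecx            ; index of the terminator within this block
Exit: inc ecx               ; byte count including the zero terminator
    mov eax, ecx
    shr ecx, 2
    rep movsd               ; copy whole dwords
    xchg ecx, eax
    and ecx, 3
    rep movsb               ; copy the remaining 0..3 bytes
    dec edi                 ; EDI -> terminating zero in Dest
    mov ecx, edi
    mov eax, [esp+12]       ; EAX = Destination
    pop edi
    pop esi
    retn 8
Ustrcpy endp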
end
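For readers who do not have the SSE2 instructions in their head, here is a rough scalar C model of the terminator search in the routine above (PCMPEQB against a zeroed register, PMOVMSKB, then BSF). It is only a sketch: the function names zero_byte_mask16, lowest_set_bit and tail_bytes are made up for illustration and are not part of the posted routine.

```c
#include <assert.h>

/* Scalar stand-in for PCMPEQB (against zero) followed by PMOVMSKB:
   bit i of the result is set when block[i] == 0. */
static unsigned zero_byte_mask16(const unsigned char block[16])
{
    unsigned mask = 0;
    for (unsigned i = 0; i < 16; ++i)
        if (block[i] == 0)
            mask |= 1u << i;
    return mask;
}

/* Scalar stand-in for BSF: index of the lowest set bit.
   BSF leaves its result undefined for a zero source, so a
   non-zero mask is required here. */
static unsigned lowest_set_bit(unsigned mask)
{
    assert(mask != 0);
    unsigned index = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++index;
    }
    return index;
}

/* Number of bytes of this block that belong to the string,
   including the zero terminator (the value ECX holds after
   "inc ecx"). Returns 0 when the block has no terminator. */
static unsigned tail_bytes(const unsigned char block[16])
{
    unsigned mask = zero_byte_mask16(block);
    return mask ? lowest_set_bit(mask) + 1 : 0;
}
```

For a block that starts with "abc" followed by a zero byte, the mask has bit 3 as its lowest set bit, so four bytes remain to be copied.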
Quote from: Ficko on June 04, 2009, 02:37:54 PM
> Thanks to JJ2007's excellent job of timing and exploring SIMD instructions, and to other tricks found on the forum
Thanks, I feel honoured.
> It is "under-read" protected: it never reads before the start of the source string.
I don't understand what you mean. Can you give an example?
> It returns the destination address in EAX (needed most of the time).
Hmmm... interesting statement. I just checked my fattest source and found that I used lstrcpy 68 times but needed the destination address only once:
invoke lstrcpy, addr SmlBuf, ecx ; copy body.ext
mov ebx, eax
add ebx, len(ebx) ; end of body.ext in ebx
mov al, [ebx-1]
.if al==34
> It returns a pointer to the zero terminator in ECX (needed for append operations).
And it turns out a pointer to the zero terminator in ecx would have been the better choice:
invoke Ustrcpy, addr SmlBuf, ecx ; copy body.ext
mov al, [ecx-1]
.if al==34
:bg
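The same point shows up when several strings are appended one after another. Below is a small C sketch of the idea; copy_return_end and join3 are hypothetical names that only model the "pointer to the terminating zero" contract (what ECX holds), not the SSE code itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the contract: copy src to dest and return
   a pointer to the terminating zero written into dest. */
static char *copy_return_end(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    size_t length = strlen(src);
    memcpy(dest, src, length + 1);   /* +1 copies the terminator too */
    return dest + length;
}

/* Append three pieces without ever rescanning what was already
   copied: each call continues at the previous end pointer. */
static char *join3(char *dest, const char *a, const char *b, const char *c)
{
    char *end = copy_return_end(dest, a);
    end = copy_return_end(end, b);
    end = copy_return_end(end, c);
    return end;                      /* still points at the final zero */
}
```

With only the destination address returned, every append would first need a length scan to find the end of the buffer again.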
> It requires 16 extra reserved bytes for strings created by "New"-type allocations, to avoid a memory access violation exception.
You might rethink this condition. It hurts the claim of universality.
We need more members with your productive approach :U
OK, JJ2007,
I see you want some explanation. :lol
First, you probably missed my emphasis on "best fit for high level language compilers".
A compiler connects arbitrary user code with fixed subroutines collected in libraries.
In that case the usual approach would be to push destination, source and destination again,
call the routine, then pop the destination and print it. For example, if you have code like:
A$ = B$
PRINT A$,"is great!"
It could look like this:
Push addr A$
Push addr B$
Push addr A$
Call strcpy
Pop eax
Push addr "is great!"
Push eax
Call Print
With my "Ustrcpy" you don't need to push and pop the destination address. :8)
As for "Not read before the start of the source string":
I meant that it is important for a routine used by a compiler not to start reading the source string before it actually starts,
since the great variety of possible situations can cause unpredictable trouble.
But if you use such SSE routines in your own assembler code, you can achieve alignment by starting to read with "movaps" before the actual string starts
and get better performance; in that case you have full control over the code and can avoid trouble.
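To put numbers on the "under read", here is a small C sketch of the address arithmetic. The helper names are my own, chosen for illustration. align_down16 mirrors "and esi, -16", under_read_bytes is how many bytes in front of the string a routine would touch if it started with an aligned load, and first_aligned_load is where Ustrcpy's first "movaps" load actually lands after its initial unaligned "movups".

```c
#include <assert.h>
#include <stdint.h>

/* Start of the 16-byte block that contains addr (and esi, -16). */
static uintptr_t align_down16(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}

/* Bytes in front of the string that an aligned 16-byte load at
   align_down16(addr) would read as well: the "under read". */
static unsigned under_read_bytes(uintptr_t addr)
{
    return (unsigned)(addr - align_down16(addr));
}

/* Address of the first aligned load when the first 16 bytes are
   fetched with an unaligned load at addr instead. It is always
   above addr, so nothing before the string is read. */
static uintptr_t first_aligned_load(uintptr_t addr)
{
    uintptr_t next = align_down16(addr) + 16;
    assert(next > addr);
    return next;
}
```

For a string at address 1005h, an aligned first load would start at 1000h and read 5 bytes that do not belong to the string, while the unaligned-first approach continues with aligned loads from 1010h.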
Quote
> It requires 16 extra reserved bytes for strings created by "New"-type allocations, to avoid a memory access violation exception.
You might rethink this condition. It hurts the claim of universality.
It is common for compilers to reserve some extra bytes for safety.
Since we read 16 bytes at once, suppose you request space from HeapAlloc for a string that is supposed to be 7 bytes long:
you may get 8 bytes back from the system right at a page boundary, and then you have trouble. :bg
(I don't know the chances of that happening, maybe 1:100000, but it can still be a nasty hidden bug.)
By "universal" I meant that you can use it in a compiler because it is safe, or in your own assembler program if you keep the 16-byte safety margin in mind.
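The page-boundary case can be checked with a bit of arithmetic. The C sketch below assumes 4096-byte pages (the usual x86 page size); PAGE_BYTES, load16_crosses_page and overread_bytes are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumption: 4 KB pages */

/* Non-zero when a 16-byte load starting at addr also touches the
   following page, which may not be mapped. */
static int load16_crosses_page(uintptr_t addr)
{
    uintptr_t offset_in_page = addr & (PAGE_BYTES - 1);
    return offset_in_page > PAGE_BYTES - 16;
}

/* Bytes a 16-byte load reads beyond a block of block_size bytes
   when the load starts at the beginning of the block. */
static unsigned overread_bytes(unsigned block_size)
{
    assert(block_size > 0);
    return block_size < 16 ? 16 - block_size : 0;
}
```

An 8-byte block that ends exactly on a page boundary starts at offset 4088 in its page, so a 16-byte load from its start reads 8 bytes of the next page. With 16 reserved bytes after the string, that over-read stays inside memory the program owns.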
I hope that clears up a couple of things for you. :wink