Hi,
I often have to convert records from ANSI to ASCII, to upper/lower case, from EBCDIC to ANSI, to customized character sets, and so on.
Here's my working proc, which is faster than the corresponding MS APIs. The attachment includes the full source with predefined character tables and a timing example.
Hope you enjoy!
Ulli
CharTranslate proc uses ebx ecx edx edi esi ,
lpTranslateArray:dword, ; Pointer to TranslateTab
lpString:dword, ; Pointer to String to be converted
LenString:dword ; Number of chars to be converted
Local LastOffset:dword
Local LenTail:dword
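mov edi, lpString ; edi -> current position in the string
mov ecx, LenString
mov LastOffset, edi
mov LenTail, ecx
mov esi, lpTranslateArray ; esi -> translate table
and LenTail, 3 ; LenTail = LenString mod 4 (bytes left for the tail loop)
xor eax, eax ; clear eax..edx so al/bl/cl/dl can serve as table indexes
sub ecx, LenTail ; ecx = number of bytes handled four at a time
xor ebx, ebx
add LastOffset, ecx
xor ecx, ecx
xor edx, edx
sub LastOffset, 4 ; LastOffset -> last complete 4-byte group
.if LenString < 4 ; no complete group: go straight to the tail
jmp @1
.endif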
align 4
@@:
mov al, byte ptr[edi] ; load four source bytes
mov bl, byte ptr[edi + 1]
mov cl, byte ptr[edi + 2]
mov dl, byte ptr[edi + 3]
mov al, byte ptr [esi + eax] ; look each byte up in the table
mov bl, byte ptr [esi + ebx]
mov cl, byte ptr [esi + ecx]
mov dl, byte ptr [esi + edx]
mov byte ptr[edi] , al ; store the translated bytes in place
mov byte ptr[edi + 1], bl
mov byte ptr[edi + 2], cl
mov byte ptr[edi + 3], dl
.if edi < LastOffset ; more complete groups left?
add edi, 4
jmp @B
.endif
@1:
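mov edi, LastOffset
add edi, 4
add edi, LenTail ; edi -> one past the last byte of the string
mov ecx, LenTail
neg ecx ; ecx = -LenTail, counts up to zero
.while ecx != 0
mov al, byte ptr[edi + ecx] ; upper 24 bits of eax are still zero
mov al, byte ptr[esi + eax]
mov byte ptr[edi + ecx], al
add ecx, 1
.endw
xor eax, eax ; return 0
ret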
CharTranslate endp
Perhaps someone can improve it :-)
[attachment deleted by admin]
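For anyone who wants to check the proc's output against something simple, here is a minimal C model of the same idea: every byte of the string is replaced in place by the table entry it indexes, four bytes per pass and then the 0 to 3 leftover bytes. The names `build_upper_table` and `char_translate` are made up for this sketch, and the upper-case table is only an example covering plain ASCII letters; it is not one of the predefined tables from the attachment.

```c
#include <stddef.h>

/* Hypothetical helper: fill a 256-entry table that maps 'a'..'z' to
   'A'..'Z' and every other byte value to itself (ASCII letters only). */
void build_upper_table(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    for (int c = 'a'; c <= 'z'; c++)
        table[c] = (unsigned char)(c - 'a' + 'A');
}

/* C model of CharTranslate: replace each byte of str, in place, with
   table[byte]. The table must have 256 entries. */
void char_translate(const unsigned char table[256],
                    unsigned char *str, size_t len)
{
    size_t tail = len & 3;      /* LenTail in the assembly */
    size_t body = len - tail;   /* bytes handled four at a time */
    size_t i;

    /* Main loop: load four bytes, look them up, store them back. */
    for (i = 0; i < body; i += 4) {
        unsigned char a = str[i];
        unsigned char b = str[i + 1];
        unsigned char c = str[i + 2];
        unsigned char d = str[i + 3];
        str[i]     = table[a];
        str[i + 1] = table[b];
        str[i + 2] = table[c];
        str[i + 3] = table[d];
    }

    /* Tail loop: the remaining 0..3 bytes, one at a time. */
    for (; i < len; i++)
        str[i] = table[str[i]];
}
```

Running both the proc and this model over the same buffer with the same table and comparing the results byte for byte is a cheap way to catch mistakes when experimenting with the inner loop.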
Here is a quick play with the algo. I zero-extended the byte-sized reads to DWORDs and it came down in timing slightly on my older PIV.
@@:
movzx eax, byte ptr [edi]
movzx ebx, byte ptr [edi + 1]
movzx ecx, byte ptr [edi + 2]
movzx edx, byte ptr [edi + 3]
movzx eax, byte ptr [esi + eax]
movzx ebx, byte ptr [esi + ebx]
movzx ecx, byte ptr [esi + ecx]
movzx edx, byte ptr [esi + edx]
; mov al, byte ptr[edi]
; mov bl, byte ptr[edi + 1]
; mov cl, byte ptr[edi + 2]
; mov dl, byte ptr[edi + 3]
;
; mov al, byte ptr [esi + eax]
; mov bl, byte ptr [esi + ebx]
; mov cl, byte ptr [esi + ecx]
; mov dl, byte ptr [esi + edx]
mov byte ptr[edi] , al
mov byte ptr[edi + 1], bl
mov byte ptr[edi + 2], cl
mov byte ptr[edi + 3], dl
.if edi < LastOffset
add edi, 4
jmp @B
.endif
@1:
Hutch,
I thought that reading four consecutive bytes would cause congestion on the memory bus: locations within the same physical memory DWORD share the same electrical connections, so each access has to wait for the previous one to finish (how long depends on memory latency, which is usually much slower than the CPU clock).
Would it not therefore be slightly faster to do something like this?
movzx eax, word ptr [edi] ; only two reads
movzx ecx, word ptr [edi + 2]
xor ebx, ebx ; while waiting for the memory reads
xor edx, edx ; while waiting for the memory reads
xchg ah, bl
xchg ch, dl
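To make it easier to see where each byte ends up after that sequence, here is a small C model of it. This is only a sketch of the register contents, not a timing claim; `split_four_bytes` and the `regs` array (standing in for eax, ebx, ecx, edx) are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the two word loads followed by the xchg split.
   regs[0..3] stand for eax, ebx, ecx, edx after the sequence. */
void split_four_bytes(const unsigned char *p, uint32_t regs[4])
{
    /* movzx eax, word ptr [edi]  and  movzx ecx, word ptr [edi + 2].
       The words are assembled byte by byte in x86 (little-endian) order,
       so the model does not depend on the host's endianness. */
    uint32_t eax = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    uint32_t ecx = (uint32_t)p[2] | ((uint32_t)p[3] << 8);

    /* xor ebx, ebx  and  xor edx, edx */
    uint32_t ebx = 0;
    uint32_t edx = 0;

    /* xchg ah, bl : bl receives the old ah, ah receives the old bl (zero) */
    ebx = (eax >> 8) & 0xFF;
    eax &= 0xFF;

    /* xchg ch, dl : dl receives the old ch, ch receives the old dl (zero) */
    edx = (ecx >> 8) & 0xFF;
    ecx &= 0xFF;

    regs[0] = eax;  /* first source byte  */
    regs[1] = ebx;  /* second source byte */
    regs[2] = ecx;  /* third source byte  */
    regs[3] = edx;  /* fourth source byte */
}
```

So the four registers end up holding the same zero-extended byte values as the four separate byte loads, which means the table lookups and stores that follow can stay as they are.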
Ian,
I would give it a blast if you have time: put it into the code and see if it times faster. I just did the obvious shift from partial register writes to zero-extended writes.