
MASM32 SDK Description, downloads and other helpful links New Forum Link
masmforum WebSite

Fast CharacterTranslate

Started by UlliN, January 08, 2007, 11:58:55 AM



I often have to convert records from ANSI to ASCII, to upper/lower case, from EBCDIC to ANSI, to customized charsets, and so on.

Here's my working proc, faster than the corresponding MS APIs. The attachment includes the full source with predefined char tables and a timing example.

Hope you enjoy!

CharTranslate proc uses ebx ecx edx edi esi ,
           lpTranslateArray:dword,         ; Pointer to TranslateTab
           lpString:dword,                 ; Pointer to String to be converted
           LenString:dword                 ; Number of chars to be converted

    Local LastOffset:dword
    Local LenTail:dword
    mov edi,        lpString   
    mov ecx,        LenString   
    mov LastOffset, edi

    mov LenTail,    ecx
    mov esi,        lpTranslateArray

    and LenTail,    3
    xor eax,        eax 
    sub ecx,        LenTail
    xor ebx,        ebx
    add LastOffset, ecx

    xor ecx,        ecx
    xor edx,        edx   

    sub LastOffset, 4 

    .if LenString < 4
      jmp @1                     ; fewer than 4 chars: only the tail loop runs
    .endif

align 4
@@:

    mov   al, byte ptr[edi]   
    mov   bl, byte ptr[edi + 1]
    mov   cl, byte ptr[edi + 2]   
    mov   dl, byte ptr[edi + 3]

    mov   al, byte ptr [esi + eax]
    mov   bl, byte ptr [esi + ebx]
    mov   cl, byte ptr [esi + ecx]
    mov   dl, byte ptr [esi + edx]

    mov   byte ptr[edi]    , al
    mov   byte ptr[edi + 1], bl
    mov   byte ptr[edi + 2], cl   
    mov   byte ptr[edi + 3], dl

    .if edi < LastOffset
       add edi, 4
       jmp @B
    .endif

@1:
    mov edi, LastOffset
    add edi, 4
    add edi, LenTail
    mov ecx, LenTail
    neg ecx
    .while ecx != 0
       mov   al, byte ptr[edi + ecx]
       mov   al, byte ptr[esi + eax]
       mov   byte ptr[edi + ecx], al
       add ecx, 1
    .endw
    xor eax, eax
    ret
CharTranslate endp

Perhaps someone can improve it :-)

[attachment deleted by admin]
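
Since the attachment with the predefined tables is gone, here is a minimal sketch of how a translate table could be built and handed to the proc. None of this is from the original source: UpperTab, szText, BuildUpperTab and Demo are made-up names, and the table only covers plain 'a'..'z'. It assumes the usual MASM32 prologue (.386, .model flat, stdcall) and that CharTranslate is assembled in the same module.

```masm
; Sketch only. UpperTab, szText, BuildUpperTab and Demo are hypothetical names.
CharTranslate PROTO :DWORD, :DWORD, :DWORD

.data
szText    db "Hello, World", 0     ; 12 chars plus terminating zero

.data?
UpperTab  db 256 dup (?)           ; one output byte per possible input byte

.code
BuildUpperTab proc
    xor ecx, ecx
@@:
    mov byte ptr UpperTab[ecx], cl ; start with an identity table
    inc ecx
    cmp ecx, 256
    jb  @B

    mov ecx, 'a'
@@:
    mov al, cl
    sub al, 20h                    ; 'a' - 'A' = 20h
    mov byte ptr UpperTab[ecx], al ; lower-case letter maps to upper-case
    inc ecx
    cmp ecx, 'z'
    jbe @B
    ret
BuildUpperTab endp

Demo proc
    call   BuildUpperTab
    ; length is passed without the terminating zero
    invoke CharTranslate, addr UpperTab, addr szText, 12
    ret
Demo endp
```

Any other conversion (EBCDIC to ANSI, a custom charset) would only change how the 256 table bytes are filled; the call stays the same.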


Here is a quick play with the algo. I zero-extended the byte-sized reads to DWORDs, and the timing came down slightly on my older PIV.

    movzx eax, byte ptr [edi]   
    movzx ebx, byte ptr [edi + 1]
    movzx ecx, byte ptr [edi + 2]   
    movzx edx, byte ptr [edi + 3]

    movzx eax, byte ptr [esi + eax]
    movzx ebx, byte ptr [esi + ebx]
    movzx ecx, byte ptr [esi + ecx]
    movzx edx, byte ptr [esi + edx]

;     mov   al, byte ptr[edi]   
;     mov   bl, byte ptr[edi + 1]
;     mov   cl, byte ptr[edi + 2]   
;     mov   dl, byte ptr[edi + 3]
;     mov   al, byte ptr [esi + eax]
;     mov   bl, byte ptr [esi + ebx]
;     mov   cl, byte ptr [esi + ecx]
;     mov   dl, byte ptr [esi + edx]

    mov   byte ptr[edi]    , al
    mov   byte ptr[edi + 1], bl
    mov   byte ptr[edi + 2], cl   
    mov   byte ptr[edi + 3], dl

    .if edi < LastOffset
       add edi, 4
       jmp @B
    .endif
Download site for MASM32      New MASM Forum



I thought that reading from four consecutive bytes would cause congestion on the memory bus, as locations within the same physical memory DWORD will share the same electrical connections and must wait for each other to finish being accessed (dependent on memory latency, usually much slower than CPU clock).

Would it not be slightly faster, therefore, to do something like this?

    movzx eax, word ptr [edi]       ; only two reads
    movzx ecx, word ptr [edi + 2]
    xor   ebx, ebx                  ; while waiting for the memory reads
    xor   edx, edx                  ; while waiting for the memory reads
    xchg  ah, bl
    xchg  ch, dl
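
For anyone who wants to time this idea, here is a rough sketch of how the two word reads could be dropped into the 4-byte loop from the earlier posts. It is only the loop body, not a complete proc: it assumes esi, edi and LastOffset are set up exactly as in CharTranslate. It has not been timed. Register-to-register xchg and the writes to ah/ch have costs of their own, so it could come out either way.

```masm
; Sketch only: loop body for the two-word-read idea, untimed.
; Assumes esi = translate table, edi = string, LastOffset as in CharTranslate.
align 4
@@:
    movzx eax, word ptr [edi]       ; bytes 0 and 1 in one read
    movzx ecx, word ptr [edi + 2]   ; bytes 2 and 3 in one read
    xor   ebx, ebx
    xor   edx, edx
    xchg  ah, bl                    ; eax = byte 0, ebx = byte 1
    xchg  ch, dl                    ; ecx = byte 2, edx = byte 3

    movzx eax, byte ptr [esi + eax] ; table lookups, as before
    movzx ebx, byte ptr [esi + ebx]
    movzx ecx, byte ptr [esi + ecx]
    movzx edx, byte ptr [esi + edx]

    mov   byte ptr [edi],     al    ; write the translated bytes back
    mov   byte ptr [edi + 1], bl
    mov   byte ptr [edi + 2], cl
    mov   byte ptr [edi + 3], dl

    .if edi < LastOffset
       add edi, 4
       jmp @B
    .endif
```

The tail loop for the last 0 to 3 bytes stays unchanged, since eax still has its upper 24 bits clear after the final movzx.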



I would give it a blast: if you have time, put it into the code and see if it times faster. I just did the obvious shift from partial register writes to zero-extended writes.