The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: UlliN on January 08, 2007, 11:58:55 AM

Title: Fast CharacterTranslate
Post by: UlliN on January 08, 2007, 11:58:55 AM
Hi,
I often have to convert records from ANSI to ASCII, to upper/lower case, from EBCDIC to ANSI, to customized character sets, and so on.

Here's my working proc, faster than the corresponding MS APIs. The attachment includes full source with predefined character tables and a timing example.

Hope you enjoy!
Ulli




CharTranslate proc uses ebx ecx edx edi esi ,
           lpTranslateArray:dword,         ; Pointer to TranslateTab
           lpString:dword,                 ; Pointer to String to be converted
           LenString:dword                 ; Number of chars to be converted

    Local LastOffset:dword
    Local LenTail:dword
   
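    mov edi,        lpString        ; edi -> current position in the string
    mov ecx,        LenString
    mov LastOffset, edi

    mov LenTail,    ecx
    mov esi,        lpTranslateArray ; esi -> 256-byte translate table

    and LenTail,    3               ; LenTail = LenString mod 4
    xor eax,        eax             ; clear upper bytes: eax..edx serve as table indices
    sub ecx,        LenTail         ; ecx = length rounded down to a multiple of 4
    xor ebx,        ebx
    add LastOffset, ecx

    xor ecx,        ecx
    xor edx,        edx

    sub LastOffset, 4               ; LastOffset -> last full 4-byte block

    .if LenString < 4               ; no full block: go straight to the tail
      jmp @1
    .endif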

align 4

@@:
   
    mov   al, byte ptr[edi]   
    mov   bl, byte ptr[edi + 1]
    mov   cl, byte ptr[edi + 2]   
    mov   dl, byte ptr[edi + 3]

    mov   al, byte ptr [esi + eax]
    mov   bl, byte ptr [esi + ebx]
    mov   cl, byte ptr [esi + ecx]
    mov   dl, byte ptr [esi + edx]

    mov   byte ptr[edi]    , al
    mov   byte ptr[edi + 1], bl
    mov   byte ptr[edi + 2], cl   
    mov   byte ptr[edi + 3], dl

    .if edi < LastOffset
       add edi, 4
       jmp @B
    .endif
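@1:
    mov edi, LastOffset
    add edi, 4
    add edi, LenTail                ; edi -> one past the end of the string
    mov ecx, LenTail
    neg ecx                         ; ecx counts up from -LenTail to 0
    .while ecx != 0
       mov   al, byte ptr[edi + ecx]
       mov   al, byte ptr[esi + eax] ; upper bytes of eax are still zero
       mov   byte ptr[edi + ecx], al
       add ecx, 1
    .endw
    xor eax, eax                    ; return 0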
   
    ret
   
CharTranslate endp



Perhaps someone can improve it :-)
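
For anyone who wants to check a modified version against known-good output, here is a minimal C model of what the proc does: an in-place table lookup, four bytes per pass, with a byte-wise loop for the 0 to 3 leftover bytes. It is only a sketch. The names char_translate_ref and build_upper_table are made up for this example and are not part of the deleted attachment, and the upper-case table assumes an ASCII-compatible character set.

```c
#include <stddef.h>

/* Reference model of CharTranslate: replace every byte of buf with
   table[byte]. The table must hold 256 entries, one per byte value.
   Hypothetical name and signature, for illustration only. */
static void char_translate_ref(const unsigned char table[256],
                               unsigned char *buf, size_t len)
{
    size_t blocks = len & ~(size_t)3;   /* bytes covered by full 4-byte blocks */
    size_t i = 0;

    /* Main loop: four loads, four lookups, four stores per pass. */
    for (; i < blocks; i += 4) {
        unsigned char a = table[buf[i]];
        unsigned char b = table[buf[i + 1]];
        unsigned char c = table[buf[i + 2]];
        unsigned char d = table[buf[i + 3]];
        buf[i]     = a;
        buf[i + 1] = b;
        buf[i + 2] = c;
        buf[i + 3] = d;
    }

    /* Tail: the remaining len mod 4 bytes, one at a time. */
    for (; i < len; i++)
        buf[i] = table[buf[i]];
}

/* Hypothetical helper: identity table with a-z mapped to A-Z
   (assumes an ASCII-compatible character set). */
static void build_upper_table(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    for (int c = 'a'; c <= 'z'; c++)
        table[c] = (unsigned char)(c - 'a' + 'A');
}
```

Build a table once, run both the C model and the assembly proc over copies of the same buffer, and compare the results byte for byte. Lengths of 0 to 3 and lengths that are not multiples of 4 are the cases worth testing first.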


[attachment deleted by admin]
Title: Re: Fast CharacterTranslate
Post by: hutch-- on January 08, 2007, 02:29:14 PM
Here is a quick play with the algo. I zero-extended the byte-sized reads to DWORDs and the timing came down slightly on my older PIV.


@@:
   
    movzx eax, byte ptr [edi]   
    movzx ebx, byte ptr [edi + 1]
    movzx ecx, byte ptr [edi + 2]   
    movzx edx, byte ptr [edi + 3]

    movzx eax, byte ptr [esi + eax]
    movzx ebx, byte ptr [esi + ebx]
    movzx ecx, byte ptr [esi + ecx]
    movzx edx, byte ptr [esi + edx]

;     mov   al, byte ptr[edi]   
;     mov   bl, byte ptr[edi + 1]
;     mov   cl, byte ptr[edi + 2]   
;     mov   dl, byte ptr[edi + 3]
;
;     mov   al, byte ptr [esi + eax]
;     mov   bl, byte ptr [esi + ebx]
;     mov   cl, byte ptr [esi + ecx]
;     mov   dl, byte ptr [esi + edx]

    mov   byte ptr[edi]    , al
    mov   byte ptr[edi + 1], bl
    mov   byte ptr[edi + 2], cl   
    mov   byte ptr[edi + 3], dl

    .if edi < LastOffset
       add edi, 4
       jmp @B
    .endif
@1:
Title: Re: Fast CharacterTranslate
Post by: Ian_B on January 08, 2007, 06:20:11 PM
Hutch

I thought that reading from four consecutive bytes would cause congestion on the memory bus, as locations within the same physical memory DWORD will share the same electrical connections and must wait for each other to finish being accessed (dependent on memory latency, usually much slower than CPU clock).

Would it not therefore be slightly faster to do something like this?

    movzx eax, word ptr [edi]       ; only two reads
    movzx ecx, word ptr [edi + 2]
    xor   ebx, ebx                  ; while waiting for the memory reads
    xor   edx, edx                  ; while waiting for the memory reads
    xchg  ah, bl
    xchg  ch, dl
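
The same idea can be modelled in C to confirm that the split produces the right four indices. This is a sketch only: translate4_word_reads is a made-up name, and the code assumes a little-endian CPU (as x86 is), so the low byte of each 16-bit word is the byte at the lower address. It says nothing about whether the word reads are actually faster; that still has to be timed.

```c
#include <stdint.h>
#include <string.h>

/* Translate the four bytes at p using two 16-bit loads instead of
   four byte loads. Assumes a little-endian CPU.
   Hypothetical name, for illustration only. */
static void translate4_word_reads(const unsigned char table[256],
                                  unsigned char *p)
{
    uint16_t w0, w1;
    memcpy(&w0, p, 2);          /* bytes 0 and 1 in one read */
    memcpy(&w1, p + 2, 2);      /* bytes 2 and 3 in one read */

    unsigned a = w0 & 0xFFu;    /* byte 0: what eax holds after xchg ah, bl */
    unsigned b = w0 >> 8;       /* byte 1: what ebx holds after xchg ah, bl */
    unsigned c = w1 & 0xFFu;    /* byte 2: what ecx holds after xchg ch, dl */
    unsigned d = w1 >> 8;       /* byte 3: what edx holds after xchg ch, dl */

    p[0] = table[a];
    p[1] = table[b];
    p[2] = table[c];
    p[3] = table[d];
}
```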

Title: Re: Fast CharacterTranslate
Post by: hutch-- on January 08, 2007, 08:52:52 PM
Ian,

I would give it a blast if you have time: put it into the code and see if it times faster. I just did the obvious, shifting from partial register writes to zero-extended writes.
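
A rough way to compare variants is to run each one many times over the same buffer and measure the elapsed CPU time. Below is a minimal C sketch of that approach, not the timing example from the deleted attachment. translate_fn, translate_bytewise and time_variant are hypothetical names, and clock() is coarse, so the repeat count needs to be large enough for the differences to show.

```c
#include <stddef.h>
#include <time.h>

/* Common signature for the variants being compared (hypothetical). */
typedef void (*translate_fn)(const unsigned char *table,
                             unsigned char *buf, size_t len);

/* Baseline variant: one load, one lookup, one store per byte. */
static void translate_bytewise(const unsigned char *table,
                               unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}

/* Returns the CPU seconds spent in reps calls of fn on the same buffer. */
static double time_variant(translate_fn fn, const unsigned char *table,
                           unsigned char *buf, size_t len, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(table, buf, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_variant once per candidate with the same table, buffer size, and repeat count, and compare the returned values. Running each candidate several times and taking the lowest figure helps filter out noise from other processes.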