Best way to convert hexadecimal coded string to binary string?

Started by wht36, November 16, 2008, 04:42:00 AM

wht36

Hello, can anyone please help me optimise (for size or speed) my code below, which converts a hexadecimal-coded string into an ASCII string?

;Store e.g. D0a as the ASCII bytes CR LF
;In:  ESI-> start of hex phrase      Out: ESI-> first non-hex character
;In:  EDI-> where string is stored   Out: EDI-> just past the new string
Hex2str:
        push    edi
        db      3Ch     ;Opcode of CMP AL,imm8: swallows the STOSB below on entry
.sto:   stosb           ;Store 1 digit value
        lodsb
        sub     al,'0'  ;Reduce decimal digits to their binary values
        cmp     al,10   ;If AL is 0--9 it is a digit
        jb      .sto    ;so go store it
        and     al,0DFh ;Coerce letters to upper case
        sub     al,7    ;Reduce A--F range to 10--15
        cmp     al,10   ;Was input below the A--F range?
        jb      .pack   ;If so, stop scanning
        cmp     al,16   ;Carry if value is 10--15 (a valid A--F)
        jb      .sto    ;If no error, go store it
.pack:  dec     esi     ;Bump back to the character that stopped us
        mov     ecx,edi
        pop     edi     ;EDI->where first char will be stored
        sub     ecx,edi ;ECX = number of digits stored
        shr     ecx,1   ;ECX = number of digit pairs, carry if odd count
        adc     edi,0   ;Bump forward if odd number of hex codes
        jecxz   .ret    ;If no pairs to pack, quit
        push    esi
        mov     esi,edi
.chg:   lodsw           ;AL = high digit, AH = low digit
        shl     al,4
        add     al,ah   ;Combine two digits into one byte
        stosb
        loop    .chg
        pop     esi
.ret:   ret
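
For anyone who finds the two phases easier to follow in a higher-level form, here is a rough C model of what the routine above appears to do: store one digit value per byte, then pack the pairs in place. The names hex_value and hex2str_model are made up for illustration, and this is only a sketch of the behaviour, not a replacement for the assembly. Like the assembly, it assumes the destination has room for one byte per input digit.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not 0-9, A-F or a-f. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c &= 0xDF;                        /* fold lower case onto upper case */
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical model of Hex2str. Reads hex digits from src up to the first
 * non-hex character, writes the decoded bytes to dst and returns how many
 * bytes were written. *end (if not NULL) receives the address of the first
 * unconsumed character, like ESI on return.
 */
size_t hex2str_model(const char *src, uint8_t *dst, const char **end)
{
    size_t n = 0;
    int v;

    /* Phase 1: one digit value per byte, as the .sto loop does. */
    while ((v = hex_value((unsigned char)src[n])) >= 0)
        dst[n++] = (uint8_t)v;
    if (end) *end = src + n;

    /* Phase 2: pack pairs in place. With an odd digit count the first
       digit is skipped and stays as a byte of its own (the ADC EDI,0). */
    size_t odd = n & 1;
    size_t pairs = n >> 1;
    for (size_t i = 0; i < pairs; i++)
        dst[odd + i] = (uint8_t)((dst[odd + 2 * i] << 4) | dst[odd + 2 * i + 1]);

    return odd + pairs;
}
```

With the input "D0a" this model leaves the odd leading digit alone and produces the two bytes 0Dh 0Ah, which matches the comment at the top of the routine.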

ragdog

Hi

Try in masm32lib htodw or here in board hexdecode routine

Greets

wht36

Many thanks. I had a look at both routines; both seem to process certain non-hex characters as if they were hex digits (e.g. ":;<=>?" are treated as if they were the hex digits ABCDEF). The hexdecode routine also does not seem to support lower case.
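
To show how that kind of misclassification can arise, here is a small C sketch. It is a hypothetical naive decoder, not the actual htodw or hexdecode source: it subtracts without range-checking the input, so the six characters between '9' and 'A' land on 10 to 15. The function names are invented for the example.

```c
/* Hypothetical naive digit decoder (NOT the masm32 library code):
   it only subtracts and never checks that c is really a hex digit. */
int naive_hex_value(unsigned char c)
{
    int v = c - '0';           /* '0'..'9' -> 0..9, but ':'..'?' -> 10..15 */
    if (c >= 'A') v -= 7;      /* 'A'..'F' -> 10..15 */
    return v;
}

/* Strict version: returns -1 for anything that is not 0-9, A-F or a-f. */
int strict_hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}
```

The naive version returns 10 for both ':' and 'A', while the strict one rejects ':' and also accepts lower case.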

I suppose size can be reduced by 5 bytes (sacrificing speed) with the following code:
;Store e.g. D0a as the ASCII bytes CR LF
;In:  ESI-> start of hex phrase      Out: ESI-> first non-hex character
;In:  EDI-> where string is stored   Out: EDI-> just past the new string
Hex2str:
        push    edi
        db      3Ch     ;Opcode of CMP AL,imm8: swallows the STOSB below on entry
.sto:   stosb           ;Store 1 digit value
        lodsb
        sub     al,'0'  ;Reduce to zero
        and     al,0xDF ;Coerce to uppercase
        aam     16      ;Unpack A-F to AH
        daa             ;If AL>9 then AL+6
        aad     9       ;Pack A-F into AL
        cmp     al,16   ;Carry if value is 0--15
        jb      .sto    ;If no error, go store it
.pack:  dec     esi     ;Bump back to the character that stopped us
        mov     ecx,edi
        pop     edi     ;EDI->where first char will be stored
        sub     ecx,edi ;ECX = number of digits stored
        shr     ecx,1   ;ECX = number of digit pairs, carry if odd count
        adc     edi,0   ;Bump forward if odd number of hex codes
        jecxz   .ret    ;If no pairs to pack, quit
        push    esi
        mov     esi,edi
.chg:   lodsw
        shl     al,4
        add     al,ah
        stosb
        loop    .chg
        pop     esi
.ret:   ret


Unfortunately it also does not process input correctly: it treats characters such as @ as hex digits.

FORTRANS

Hello,

   My idea for converting Base 85 or UUdecode data uses the XLAT
instruction and a conversion table.  The bytes in the translation table
for the data hold their binary values, and characters to be ignored
(spaces, linefeeds, and so forth)  are set to minus one.  Illegal
characters are set to minus two.  I set the length before calling the
conversion routine.  Should work nicely for hexadecimal data as well.

   So the code gets a byte, translates it, and tests the sign.  If the
result is not negative it accumulates the data; if it is negative it
either skips the white space character or aborts on bad data.
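
A minimal C sketch of the table idea for hexadecimal input might look like the following. The table lookup stands in for XLAT. The names (hex_table, hex_table_init, hex_decode_table), the exact set of characters treated as white space, and the choice to report a dangling half byte as an error are all assumptions made for the example, not part of the description above.

```c
#include <stddef.h>
#include <stdint.h>

enum { HEX_SKIP = -1, HEX_BAD = -2 };   /* minus one: ignore, minus two: illegal */

static int8_t hex_table[256];

/* Fill the translation table once before decoding. */
void hex_table_init(void)
{
    for (int i = 0; i < 256; i++) hex_table[i] = HEX_BAD;
    for (int i = 0; i < 10; i++)  hex_table['0' + i] = (int8_t)i;
    for (int i = 0; i < 6; i++) {
        hex_table['A' + i] = (int8_t)(10 + i);
        hex_table['a' + i] = (int8_t)(10 + i);
    }
    /* Assumed white space set: space, tab, CR, LF. */
    hex_table[' ']  = HEX_SKIP;
    hex_table['\t'] = HEX_SKIP;
    hex_table['\r'] = HEX_SKIP;
    hex_table['\n'] = HEX_SKIP;
}

/* Decode len input characters into dst. Returns the number of bytes
   written, or -1 on an illegal character or a dangling half byte. */
long hex_decode_table(const char *src, size_t len, uint8_t *dst)
{
    long out = 0;
    int have_high = 0;
    uint8_t high = 0;

    for (size_t i = 0; i < len; i++) {
        int v = hex_table[(unsigned char)src[i]];   /* the XLAT step */
        if (v == HEX_SKIP) continue;                /* white space: ignore */
        if (v < 0) return -1;                       /* bad data: abort */
        if (have_high) {
            dst[out++] = (uint8_t)(high | v);       /* second digit: emit byte */
            have_high = 0;
        } else {
            high = (uint8_t)(v << 4);               /* first digit: remember it */
            have_high = 1;
        }
    }
    return have_high ? -1 : out;
}
```

The length is passed in by the caller, as in the description, so the input does not need a terminator.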

Regards,

Steve N.

wht36

Many thanks, I think xlat is the best solution speedwise and probably would be the preferred method in almost all settings.

Sizewise, the code below is the smallest I could come up with that is both case-insensitive and parses correctly:

;2 bytes less
;Store e.g. D0a as the ASCII bytes CR LF
;In:  ESI-> start of hex phrase      Out: ESI-> first non-hex character
;In:  EDI-> where string is stored   Out: EDI-> just past the new string
Hex2str:
        push    edi
        db      3Ch     ;Opcode of CMP AL,imm8: swallows the STOSB below on entry
.sto:   stosb           ;Store 1 digit value
        lodsb

        sub     al,':'  ;Is it a number?
        jb      .num    ;Anything below ':' goes to .num
        and     al,0xDF ;Coerce to uppercase
        sub     al,7    ;Is it a letter?
        jb      .pack   ;If below A, stop scanning
.num:   add     al,10   ;Convert to binary

        cmp     al,16   ;Carry if value is 0--15
        jb      .sto    ;Store if 0..9 A..F
.pack:  dec     esi     ;Bump back to the character that stopped us
        mov     ecx,edi
        pop     edi     ;EDI->where first char will be stored
        sub     ecx,edi ;ECX = number of digits stored
        shr     ecx,1   ;ECX = number of digit pairs, carry if odd count
        adc     edi,0   ;Bump forward if odd number of hex codes
        jecxz   .ret    ;If no pairs to pack, quit
        push    esi
        mov     esi,edi
.chg:   lodsw
        shl     al,4
        add     al,ah
        stosb
        loop    .chg
        pop     esi
.ret:   ret
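
Since there are only 256 possible input bytes, the digit test in this version can be checked exhaustively. Below is a rough C model of the SUB/AND/SUB/ADD sequence, using uint8_t to imitate the 8-bit wraparound of AL, together with a plain reference classifier to compare against. The function names are invented for the example, and the model is my reading of the listing rather than something assembled and traced.

```c
#include <stdint.h>

/* Model of the digit test: returns 0..15 for a hex digit and a value
   of 16 or more for anything else (what CMP AL,16 / JB .sto decides on). */
uint8_t classify_model(uint8_t al)
{
    uint8_t t = (uint8_t)(al - ':');    /* sub al,':' */
    if (al >= ':') {                    /* no borrow, so JB .num is not taken */
        t &= 0xDF;                      /* and al,0xDF */
        if (t < 7) return 0xFF;         /* sub al,7 borrows: JB .pack */
        t = (uint8_t)(t - 7);           /* sub al,7 */
    }
    return (uint8_t)(t + 10);           /* .num: add al,10 */
}

/* Straightforward reference: 0..15 for a hex digit, -1 otherwise. */
int classify_reference(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Count the byte values on which model and reference disagree. */
int count_mismatches(void)
{
    int bad = 0;
    for (int c = 0; c < 256; c++) {
        uint8_t m = classify_model((uint8_t)c);
        int got = (m < 16) ? (int)m : -1;
        if (got != classify_reference(c)) bad++;
    }
    return bad;
}
```

If the model is faithful to the listing, count_mismatches() coming back as zero supports the claim that this version accepts exactly 0-9, A-F and a-f.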

hutch--

If speed matters get rid of the following instructions,


.sto: stosb  ;Store 1 digit
lodsb
....
jecxz .ret ;If no digits left, quit
....
stosb
loop .chg


and replace them with ordinary MOV instructions through register pointers that you increment yourself. The instructions above are very slow.

wht36

Ah yes, the old loop and string instructions are quite slow. I still keep using them; I guess old habits die hard ;) although in this instance I used them for size optimisation. By the way, I've always wondered why they are slow: is it because their execution cannot be paired?

Speed optimisation has always been much more difficult for me than size optimisation. For example, is it faster to use mov eax,[hextable+eax] or xlat? What about aligned dword reads and writes? Are they possible in this instance?

Neil

Have a look at the stb & lob macros for replacing stosb & lodsb.

herge

 Hi wht36:

From C:\masm32\macros\macros.asm


    ; ----------------------
    ; fast lodsb replacement
    ; ----------------------
      lob MACRO
        mov al, [esi]
        inc esi
      ENDM

    ; ----------------------
    ; fast stosb replacement
    ; ----------------------
      stb MACRO
        mov [edi], al
        inc edi
      ENDM


Regards herge