This is too slow

Started by frktons, November 18, 2010, 03:10:21 AM

Antariy

Quote from: frktons on November 19, 2010, 09:34:42 PM
Quote from: Antariy on November 19, 2010, 09:33:10 PM
This will be fast, but will require ~4KB table of numbers.  :eek  :lol

I can afford this, and maybe more ....  :P

:P  :lol

frktons

After that experiment, I'd like to see what happens when MMX and XMM registers hold the data
before the formatted string is filled. I still have to find some SSE2/SSE3 opcodes that suit the task.

I'm not sure yet how to do it, but I have a vague intuition that it can be done in
a very effective way.  :lol

Probably by prefilling an XMM register with the separators, depending on the magnitude of the
number to format, and then filling the appropriate bytes with the digits extracted with "magic numbers" or
anything else fast enough.  ::)
Mind is like a parachute. You know what to do in order to use it :-)
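
The prefill idea in the post above can be tried in scalar form before any SSE code is written. The following is a minimal sketch in C, not the XMM version frktons has in mind: it starts from a template that already holds the separators and drops the digits in from the right. The name `format_grouped`, the comma separator and the 14-byte buffer size are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: formats n with comma separators into out
   (at least 14 bytes) and returns the length of the result. */
size_t format_grouped(uint32_t n, char *out)
{
    /* Template sized for the largest 32-bit value, 4,294,967,295. */
    char tmp[] = "0,000,000,000";
    int pos = 12;    /* index of the last character in the template */
    int first;       /* index of the most significant digit written */

    assert(out != NULL);
    do {
        if (tmp[pos] == ',')
            pos--;                        /* step over a prefilled separator */
        tmp[pos] = (char)('0' + n % 10u); /* drop the next digit into place */
        first = pos--;
        n /= 10u;
    } while (n != 0);

    /* Copy from the first significant digit, terminator included. */
    memcpy(out, tmp + first, (size_t)(14 - first));
    return (size_t)(13 - first);
}
```

The digit extraction here is a plain divide; in a tuned version it would be replaced by one of the reciprocal-multiply methods discussed in this thread.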

hutch--

For number conversions I have a faster signed DWORD version that was written by Paul Dixon. This may be useful for some of the tasks you have in mind. It also passes exhaustive testing over the full signed range.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 16

ltoa_ex proc LongVar:DWORD,answer:DWORD

  ; --------------------------------------------------------------------------------
  ; this algorithm was written by Paul Dixon and has been converted to MASM notation
  ; --------------------------------------------------------------------------------

    push esi
    push edi

    mov eax, [esp+4+8]          ; LongVar            ; get number
    mov ecx, [esp+8+8]          ; answer             ; get pointer to answer string
    jmp over

    align 16
    chartab:
      dd "00","10","20","30","40","50","60","70","80","90"
      dd "01","11","21","31","41","51","61","71","81","91"
      dd "02","12","22","32","42","52","62","72","82","92"
      dd "03","13","23","33","43","53","63","73","83","93"
      dd "04","14","24","34","44","54","64","74","84","94"
      dd "05","15","25","35","45","55","65","75","85","95"
      dd "06","16","26","36","46","56","66","76","86","96"
      dd "07","17","27","37","47","57","67","77","87","97"
      dd "08","18","28","38","48","58","68","78","88","98"
      dd "09","19","29","39","49","59","69","79","89","99"

  over:
    ; on entry eax=number to convert, ecx=pointer to answer buffer (minimum 12 bytes)
    ; on exit, eax,ecx,edx are undefined, all other registers are preserved.
    ; answer is in location pointed to by ecx on entry

  signed:
    ; do a signed DWORD to ASCII
    or eax,eax                          ; test sign
    jns udword                          ; if +ve, continue as for unsigned
    neg eax                             ; else, make number positive
    mov byte ptr [ecx],"-"              ; include the - sign
    add ecx, 1                          ; update the pointer

  udword:
    ; unsigned DWORD to ASCII
    mov esi,ecx                         ; get pointer to answer
    mov edi,eax                         ; save a copy of the number

    mov edx, 0D1B71759h                 ; =2^45\10000    13 bit extra shift
    mul edx                             ; gives 6 high digits in edx

    mov eax, 068DB9h                    ; =2^32\10000+1

    shr edx,13                          ; correct for multiplier offset used to give better accuracy
    jz skiphighdigits                   ; if zero then don't need to process the top 6 digits

    mov ecx,edx                         ; get a copy of high digits
    imul ecx,10000                      ; scale up high digits
    sub edi,ecx                         ; subtract high digits from original. EDI now = lower 4 digits

    mul edx                             ; get first 2 digits in edx
    mov ecx,100                         ; load ready for later

    jnc next1                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZeroSupressed                   ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS1                             ; continue with pairs of digits to the end

  next1:
    mul ecx                             ; get next 2 digits
    jnc next2                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS1a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS2                             ; continue with pairs of digits to the end

  next2:
    mul ecx                             ; get next 2 digits
    jnc short next3                     ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS2a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS3                             ; continue with pairs of digits to the end

  next3:

  skiphighdigits:
    mov eax,edi                         ; get lower 4 digits

    mov ecx,100

    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx
    jnc next4                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS3a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp  ZS4                            ; continue with pairs of digits to the end

    next4:
    mul ecx                             ; this is the last pair so don't suppress a single zero
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS4a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    mov byte ptr [esi+1],0              ; zero terminate string

    jmp  xit                            ; all done

  ZeroSupressed:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx
    add esi,2                           ; write them to answer

  ZS1:
    mul ecx                             ; get next 2 digits
    ZS1a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS2:
    mul ecx                             ; get next 2 digits
    ZS2a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS3:
    mov eax,edi                         ; get lower 4 digits
    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx                             ; edx= top pair
    ZS3a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write to answer
    add esi,2                           ; update pointer

  ZS4:
    mul ecx                             ; get final 2 digits
    ZS4a:
    mov edx,chartab[edx*4]              ; look them up
    mov [esi],dx                        ; write to answer

    mov byte ptr [esi+2],0              ; zero terminate string

  xit:
  sdwordend:

    pop edi
    pop esi

    ret 8

ltoa_ex endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php
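
For readers who want to follow the arithmetic in the listing above without tracing registers, here is a rough C sketch of the same idea. It is an illustration under assumptions, not a port of Paul Dixon's code: it handles the unsigned case only, the names `utoa_sketch` and `PAIRS` are made up, and the zero suppression is written as a loop instead of the unrolled jumps. The three constants are the ones in the listing. The number is split into its high six and low four digits with a reciprocal multiply, and each part is then treated as a 32.32 fixed-point fraction from which every multiply by 100 shifts out the next pair of digits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The two ASCII digits of every value 0..99, in print order. */
static const char PAIRS[] =
    "00010203040506070809" "10111213141516171819"
    "20212223242526272829" "30313233343536373839"
    "40414243444546474849" "50515253545556575859"
    "60616263646566676869" "70717273747576777879"
    "80818283848586878889" "90919293949596979899";

/* Writes n in decimal to buf (at least 11 bytes) and returns the length. */
size_t utoa_sketch(uint32_t n, char *buf)
{
    uint32_t pair[5];
    uint64_t t;
    char *p = buf;
    int i = 0;

    assert(buf != NULL);

    /* high = n / 10000: multiply by ceil(2^45 / 10000), shift right by 45 */
    uint32_t high = (uint32_t)(((uint64_t)n * 0xD1B71759u) >> 45);
    uint32_t low  = n - high * 10000u;            /* lower 4 digits */

    /* high * (2^32 / 10000 + 1): integer part is the first digit pair */
    t = (uint64_t)high * 0x68DB9u;
    pair[0] = (uint32_t)(t >> 32);
    t = (uint64_t)(uint32_t)t * 100u;             /* fraction * 100 */
    pair[1] = (uint32_t)(t >> 32);
    t = (uint64_t)(uint32_t)t * 100u;
    pair[2] = (uint32_t)(t >> 32);

    /* low * (2^32 / 100 + 1): same trick for the last two pairs */
    t = (uint64_t)low * 0x28F5C29u;
    pair[3] = (uint32_t)(t >> 32);
    t = (uint64_t)(uint32_t)t * 100u;
    pair[4] = (uint32_t)(t >> 32);

    /* Skip leading zero pairs, but always keep the last pair. */
    while (i < 4 && pair[i] == 0)
        i++;

    /* The first pair written drops its leading zero if it is below 10. */
    if (pair[i] < 10u) {
        *p++ = (char)('0' + pair[i]);
    } else {
        memcpy(p, &PAIRS[2u * pair[i]], 2);
        p += 2;
    }
    for (i++; i < 5; i++) {
        memcpy(p, &PAIRS[2u * pair[i]], 2);
        p += 2;
    }
    *p = '\0';
    return (size_t)(p - buf);
}
```

In the MASM listing the table entries read "10", "20" and so on; this appears to be because MASM stores a two-character `dd` constant with its bytes reversed, so the bytes land in memory in print order and `dh` holds the units digit.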

frktons

Thanks Hutch.

Is this version faster than udw2str, the one in m32lib?

; #########################################################################

    .386
    .model flat, stdcall  ; 32 bit memory model
    option casemap :none  ; case sensitive

  ; ---------------------------------------------------
  ; The original algorithm was written by comrade
  ; <comrade2k@hotmail.com>; http://www.comrade64.com/
  ;
  ;  It has been optimised by Alexander Yackubtchik
  ; ---------------------------------------------------

  ; udw2str

  ; Parameters
  ;     dwNumber - 32-bit double-word to be converted
  ;     pszString - null-terminated string (output)
  ; Result
  ;     None

    .code

; #########################################################################

udw2str proc dwNumber:DWORD, pszString:DWORD

    push ebx
    push esi
    push edi

    mov     eax, [dwNumber]
    mov     esi, [pszString]
    mov     edi, [pszString]
    mov ecx,429496730

  @@redo:
    mov ebx,eax
    mul ecx
    mov eax,edx
    lea edx,[edx*4+edx]
    add edx,edx
    sub ebx,edx
    add bl,'0'
    mov [esi],bl
    inc esi
    test    eax, eax
    jnz     @@redo
    jmp     @@chks

  @@invs:
    dec     esi
    mov     al, [edi]
    xchg    [esi], al
    mov     [edi], al
    inc     edi
  @@chks:
    cmp     edi, esi
    jb      @@invs

    pop edi
    pop esi
    pop ebx


    ret

udw2str endp

; #########################################################################

end


Frank
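
A side note on the constant in the udw2str listing above: 429496730 is 2^32/10 rounded up, and taking the high dword of the product is only an approximate divide by 10. Working the arithmetic through suggests the quotient comes out one too high once the input reaches 2^30 and ends in 9; the smallest such value is 1073741829. The C sketch below reproduces just that multiply so it can be checked in isolation. The function names are made up for the illustration, and the second variant (multiplier 0CCCCCCCDh, shift 35) is the widely used exact form, not something taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient as the udw2str loop computes it:
   high 32 bits of n * 429496730 (2^32 / 10 rounded up). */
uint32_t div10_mul32(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 429496730u) >> 32);
}

/* Variant with three more bits of precision:
   0xCCCCCCCD is 2^35 / 10 rounded up, followed by a shift of 35. */
uint32_t div10_mul35(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Convenience check: aborts if the 35-bit variant disagrees with n / 10. */
void check_div10(uint32_t n)
{
    assert(div10_mul35(n) == n / 10u);
}
```

With the 32-bit shift, 1073741829 gives 107374183 instead of 107374182, so the remainder computed afterwards wraps to -1 and the digit stored is not a digit. Whether that matters depends on the range of inputs the routine is expected to see.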

Antariy

Quote from: frktons on November 19, 2010, 10:14:03 PM
Is this version faster than udw2str, the one in m32lib?

Yes, Frank. Just look at

xchg    [esi], al


This drags the timings down so much that the rest of the code hardly matters.

frktons

Is that instruction so powerful? I didn't even suspect it.  :lol

Can you explain why this instruction is so important?

Antariy

I probably put that badly. I meant that it dragged the algo down to one of the SLOWEST. I should choose my words more precisely...

This instruction by itself causes a stall of 50-100 clocks. It is an atomic instruction, and the CPU waits for all pending transactions on the system bus before exchanging the values.
LOCK# is asserted implicitly.



Alex

frktons

Oh!!! Well. That matches what I knew about xchg: it is not an efficient instruction, and it is better to use
other solutions.  :U

Antariy

Quote from: frktons on November 19, 2010, 10:40:23 PM
Oh!!! Well. That matches what I knew about xchg: it is not an efficient instruction, and it is better to use
other solutions.  :U

Something like:

@@invs:
    dec     esi
    mov     al, [edi]
    mov     ah, [esi]
    mov    [edi], ah
    mov     [esi], al
    inc     edi
  @@chks:


But this does not make the algo faster than Paul's code :lol
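
The same reversal loop, written out in C as a minimal sketch: two loads into temporaries followed by two plain stores, with no exchange against memory. The function names are made up for the illustration, and `last` is assumed to point one past the final character, matching `esi` after the digit loop in udw2str.

```c
#include <assert.h>
#include <string.h>

/* Reverses the bytes in [first, last) in place. */
void reverse_bytes(char *first, char *last)
{
    assert(first != NULL && last != NULL && first <= last);
    while (first < last) {
        --last;              /* dec esi */
        char a = *first;     /* mov al, [edi] */
        char b = *last;      /* mov ah, [esi] */
        *first = b;          /* mov [edi], ah */
        *last = a;           /* mov [esi], al */
        ++first;             /* inc edi */
    }
}

/* Convenience wrapper for a zero-terminated string. */
void reverse_string(char *s)
{
    reverse_bytes(s, s + strlen(s));
}
```

For an odd length the middle byte is swapped with itself on the last pass, which is harmless and mirrors what the assembly loop does.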

frktons

Quote from: Antariy on November 19, 2010, 10:49:10 PM
Something like:

@@invs:
    dec     esi
    mov     al, [edi]
    mov     ah, [esi]
    xchg    [edi], ah
    mov     [esi], al
    inc     edi
  @@chks:


But this does not make the algo faster than Paul's code :lol


Isn't it possible to avoid xchg and use other, better instructions?

Antariy

Quote from: frktons on November 19, 2010, 10:52:13 PM
Isn't it possible to avoid xchg and use other, better instructions?

Pardon  :green2, I made the changes carelessly  :lol Look at the post again  :bg

frktons

:U

hutch--

I have just converted the same algo to unsigned. It's an algo that Paul Dixon wrote in PowerBASIC that I have converted to MASM notation. I removed the stack frame and ran it through exhaustive testing over the full unsigned range, 0 to -1.


;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

utoa_ex proc uvar:DWORD,pbuffer:DWORD

  ; --------------------------------------------------------------------------------
  ; this algorithm was written by Paul Dixon and has been converted to MASM notation
  ; --------------------------------------------------------------------------------

    mov eax, [esp+4]                ; uvar      : unsigned variable to convert
    mov ecx, [esp+8]                ; pbuffer   : pointer to result buffer

    push esi
    push edi

    jmp udword

  align 4
  chartab:
    dd "00","10","20","30","40","50","60","70","80","90"
    dd "01","11","21","31","41","51","61","71","81","91"
    dd "02","12","22","32","42","52","62","72","82","92"
    dd "03","13","23","33","43","53","63","73","83","93"
    dd "04","14","24","34","44","54","64","74","84","94"
    dd "05","15","25","35","45","55","65","75","85","95"
    dd "06","16","26","36","46","56","66","76","86","96"
    dd "07","17","27","37","47","57","67","77","87","97"
    dd "08","18","28","38","48","58","68","78","88","98"
    dd "09","19","29","39","49","59","69","79","89","99"

  udword:
    mov esi, ecx                    ; get pointer to answer
    mov edi, eax                    ; save a copy of the number

    mov edx, 0D1B71759h             ; =2^45\10000    13 bit extra shift
    mul edx                         ; gives 6 high digits in edx

    mov eax, 68DB9h                 ; =2^32\10000+1

    shr edx, 13                     ; correct for multiplier offset used to give better accuracy
    jz short skiphighdigits         ; if zero then don't need to process the top 6 digits

    mov ecx, edx                    ; get a copy of high digits
    imul ecx, 10000                 ; scale up high digits
    sub edi, ecx                    ; subtract high digits from original. EDI now = lower 4 digits

    mul edx                         ; get first 2 digits in edx
    mov ecx, 100                    ; load ready for later

    jnc short next1                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZeroSupressed         ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short ZS1                   ; continue with pairs of digits to the end

  next1:
    mul ecx                         ; get next 2 digits
    jnc short next2                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS1a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short ZS2                   ; continue with pairs of digits to the end

  next2:
    mul ecx                         ; get next 2 digits
    jnc short next3                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS2a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short ZS3                   ; continue with pairs of digits to the end

  next3:

  skiphighdigits:
    mov eax, edi                    ; get lower 4 digits

    mov ecx, 100

    mov edx, 28F5C29h               ; 2^32\100 +1
    mul edx
    jnc short next4                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS3a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short  ZS4                  ; continue with pairs of digits to the end

  next4:
    mul ecx                         ; this is the last pair so don't suppress a single zero
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS4a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    mov byte ptr [esi+1], 0         ; zero terminate string

    jmp short  sdwordend            ; all done

  ZeroSupressed:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx
    add esi, 2                      ; write them to answer

  ZS1:
    mul ecx                         ; get next 2 digits
  ZS1a:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx                   ; write them to answer
    add esi, 2

  ZS2:
    mul ecx                         ; get next 2 digits
  ZS2a:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx                   ; write them to answer
    add esi, 2

  ZS3:
    mov eax, edi                    ; get lower 4 digits
    mov edx, 28F5C29h               ; 2^32\100 +1
    mul edx                         ; edx= top pair
  ZS3a:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx                   ; write to answer
    add esi, 2                      ; update pointer

  ZS4:
    mul ecx                         ; get final 2 digits
  ZS4a:
    mov edx, chartab[edx*4]         ; look them up
    mov [esi], dx                   ; write to answer

    mov byte ptr [esi+2], 0         ; zero terminate string

  sdwordend:

    pop edi
    pop esi

    ret 8

utoa_ex endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

frktons

Quote from: hutch-- on November 19, 2010, 11:47:00 PM
I have just converted the same algo to unsigned. It's an algo that Paul Dixon wrote in PowerBASIC that I have converted to MASM notation. I removed the stack frame and ran it through exhaustive testing over the full unsigned range, 0 to -1.

Thanks Steve. I'm afraid I'll have to do the work of inserting and adapting it for the new testbed myself, won't I?  :lol

Well, if I come back from the weekend sane enough, I'll start converting it and testing it against the other algos.

This one lacks all the separator handling, so I have quite a lot of work to do.  :eek

Couldn't you mix this code and your formatting routine to make my life a bit easier?  :P

We'll see.  :P

hutch--

Frank,

The trick is to set up your testbed so it uses standard MASM algorithms, so you don't have to adapt them. Align each algo if it is not aligned itself. If you need to show the size in bytes (I don't personally care), put a label at either end and do the arithmetic at assembly time.

These algos and a number of others are for the masm32 library, so they must be presented in that form.