News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Notification on the "udw2str" algo as unsound.

Started by hutch--, November 24, 2010, 01:16:07 AM

Previous topic - Next topic

hutch--

Its an algo that was written and contributed about 10 years ago and I have never used it but while testing a range of similar algorithms recently, the "udw2str" algorithm fails over 1 gig and should not be used. It will be replaced with an equate to another algorithm that has been exhaustively tested over the unsigned DWORD range.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

wow - surprising it wasn't found sooner   :bg

MichaelW

It may be unsound, but it's still good enough for my financial calculations.

eschew obfuscation

hutch--

 :bg

Michael,

That just means you have not been working on your first billion yet.  :P

Seriously though, the algo is broken and I have 3 others that are both faster and exhaustively tested.

If you can be bothered, give this one a try.



;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

udw2str2 proc dwNumber:DWORD, pszString:DWORD

    mov eax, [esp+4]

    cmp eax, 9
    ja @F
    mov edx, 1
    jmp nxt
  @@:
    cmp eax, 99
    ja @F
    mov edx, 2
    jmp nxt
  @@:
    cmp eax, 999
    ja @F
    mov edx, 3
    jmp nxt
  @@:
    cmp eax, 9999
    ja @F
    mov edx, 4
    jmp nxt
  @@:
    cmp eax, 99999
    ja @F
    mov edx, 5
    jmp nxt
  @@:
    cmp eax, 999999
    ja @F
    mov edx, 6
    jmp nxt
  @@:
    cmp eax, 9999999
    ja @F
    mov edx, 7
    jmp nxt
  @@:
    cmp eax, 99999999
    ja @F
    mov edx, 8
    jmp nxt
  @@:
    cmp eax, 999999999
    ja @F
    mov edx, 9
    jmp nxt
  @@:
    mov edx, 10

  nxt:
    mov eax, [esp+4]            ; src
    mov ecx, [esp+8]            ; lpdest             ; buffer address

    push ebx
    push esi
    push edi

    mov esi, 0CCCCCCCDh         ; "magic number" multiplier for division by 10
    mov ebx, 10
    add ecx, edx                ; add required buffer length to start address
    mov BYTE PTR [ecx], 0       ; write the terminator
    sub ecx, 1                  ; decrement down to first write position

  @@:
    mul esi                     ; multiply by magic number
    shrd eax, edx, 3            ; binary fractional "remainder" back in EAX
    shr edx, 3                  ; EDX = quotient
    add eax, 1                  ; precaution against occasional "underflow"
    mov edi, edx                ; save current quotient for next division
    mul ebx                     ; x10 gets "decimal" remainder into EDX
    add dl, 30h                 ; convert remainder to ascii
    mov [ecx], dl               ; insert it into buffer
    sub ecx, 1                  ; decrement buffer location
    mov eax, edi                ; retrieve current quotient
    test eax, eax               ; test if done
    jz @F                       ; continue if not done

    mul esi                     ; multiply by magic number
    shrd eax, edx, 3            ; binary fractional "remainder" back in EAX
    shr edx, 3                  ; EDX = quotient
    add eax, 1                  ; precaution against occasional "underflow"
    mov edi, edx                ; save current quotient for next division
    mul ebx                     ; x10 gets "decimal" remainder into EDX
    add dl, 30h                 ; convert remainder to ascii
    mov [ecx], dl               ; insert it into buffer
    sub ecx, 1                  ; decrement buffer location
    mov eax, edi                ; retrieve current quotient
    test eax, eax               ; test if done
    jnz @B                      ; continue if not done

  @@:
    pop edi
    pop esi
    pop ebx

    mov eax, [esp+8]

    ret 8

udw2str2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

i can do my taxes with five digits - and that includes pennies   :lol


brethren

hutch, i tested it against mine...just a bit of fun :P

Quote
129 cycles for udw2str2
104 cycles for u32toa

use console, assemble and link to build the test

hutch--

Looks good,

Here is the timing on my Core2 quad.


99 cycles for udw2str2
98 cycles for u32toa
Press any key to continue ...


Now the big question, have you done the exhaustive test over the whole unsigned range ? What beat the old version as that it failed over 1 gig, the older version was about 30% faster once it was optimised.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

brethren

i tested it by using msvcrt ultoa and my algo to create 2 strings which i then strcmp'ed, starting at 0 and ending at 4294967295 (inclusive). my algos output matched ms ultoa over the full range so i'm certain it works for all unsigned values

hutch--

Just ran the same test and its correct over the entire range.  :U

Will have a play as this algo has room to make it faster.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

brethren

i'll be interested to see how much faster you can make it :U

MichaelW

I also tested against the CRT _ultoa function over the entire 32-bit range and found no differences in the returned strings. And I tested a non-null filled destination buffer and dumped the first 20 bytes of the buffer after the test to verify that the procedure is not writing to the buffer beyond the 11th byte.
eschew obfuscation

hutch--

This tweak is testing up OK over the full unsigned range. Remove the PUSHAD POPAD pair dropped the time, removed the stack frame, unrolled it by 2 then added the code to calculate the result length then removed the reverse string loop at the end. In the benchmark I am using time dropped from 250 ms to 187 ms.


;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

ALIGN 16
; u32toa by brethren
utoa2 PROC, value:DWORD, szbuf:DWORD

    push ebx
    push esi
    push edi

    mov eax, [esp+4][12]       ; value
    mov esi, [esp+8][12]       ; szbuf

    cmp eax, 9
    ja lb1
    add esi, 1
    jmp nxt
  lb1:
    cmp eax, 99
    ja lb2
    add esi, 2
    jmp nxt
  lb2:
    cmp eax, 999
    ja lb3
    add esi, 3
    jmp nxt
  lb3:
    cmp eax, 9999
    ja lb4
    add esi, 4
    jmp nxt
  lb4:
    cmp eax, 99999
    ja lb5
    add esi, 5
    jmp nxt
  lb5:
    cmp eax, 999999
    ja lb6
    add esi, 6
    jmp nxt
  lb6:
    cmp eax, 9999999
    ja lb7
    add esi, 7
    jmp nxt
  lb7:
    cmp eax, 99999999
    ja lb8
    add esi, 8
    jmp nxt
  lb8:
    cmp eax, 999999999
    ja lb9
    add esi, 9
    jmp nxt
  lb9:
    add esi, 10

  nxt:

    mov ebx, 0CCCCCCCDh
    mov BYTE PTR [esi], 0
    sub esi, 1

  stlp:
    mov ecx, eax
    mul ebx
    shr edx, 3
    mov eax, edx
    lea edx, [edx+edx*4]
    lea edx, [edx+edx]
    sub ecx, edx
    add cl, '0'
    mov [esi], cl
    sub esi, 1
    test eax, eax
    jbe lpqt

    mov ecx, eax
    mul ebx
    shr edx, 3
    mov eax, edx
    lea edx, [edx+edx*4]
    lea edx, [edx+edx]
    sub ecx, edx
    add cl, '0'
    mov [esi], cl
    sub esi, 1
    test eax, eax
    ja stlp

  lpqt:
    pop edi
    pop esi
    pop ebx

    ret 8

utoa2 ENDP

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

brethren

#13
a 20% speed increase when run on my pc :clap: btw i really like how you've eliminated the need for reversing the string :wink

Quote131 cycles for udw2str2
104 cycles for u32toa
81 cycles for utoa2

the same improvements could be made to my dwtoa routine as its virtually identical apart from the sign test

.code
dwtoa PROC, val:DWORD, buf:PTR BYTE
pushad
mov eax, val
mov ebx, 0CCCCCCCDh
mov esi, buf

test eax, eax
jge @F
mov BYTE PTR [esi], '-'
add esi, 1
neg eax
@@:
mov edi, esi
@@:
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
add edx, edx
sub ecx, edx
add cl, '0'
mov [esi], cl
add esi, 1
test eax, eax
ja @B

mov BYTE PTR [esi], NULL
sub esi, 1
@@:
mov al, [edi]
mov bl, [esi]
mov [edi], bl
mov [esi], al
add edi, 1
sub esi, 1
cmp edi, esi
jb @B

popad
ret
dwtoa ENDP

lingo

brethren,
The same but faster coz I started with binary search:  :wink
align 16
;u32toalin by lingo
db  11 Dup (0)
u32toalin  PROC, value:DWORD, szbuf:DWORD
pop     edx        
pop     eax           
cmp    eax, 99999             
pop     ecx       
jbe      L4
cmp     eax, 9999999       
push    edx
jbe     L2
cmp    eax, 999999999   
        push   ebx
ja       L8
cmp    eax, 99999999     
ja       L1
add    ecx, 8
push   esi   
jne     L9
L1 :
add    ecx, 9
push   esi
jne     L9
L2:
cmp    eax, 999999         
push   ebx
ja       L3
add    ecx, 6
push   esi
jne     L9
L3:
add    ecx, 7
push   esi
jne     L9
L4:
cmp    eax, 999             
push   edx       
jbe     L6
cmp    eax, 9999             
push   ebx
ja       L5
add    ecx, 4
push   esi
jne     L9
L5:
add    ecx, 5
push   esi
jne     L9
L6:
cmp    eax, 99            
push   ebx
jbe     L7
add    ecx, 3
push   esi
jne     L9
L8:
push   esi   
add    ecx, 10
L9:
mov    byte ptr [ecx], 0
mov    esi,  0CCCCCCCDh
@@:
    mov ebx, eax
    mul esi
    shr edx, 3
    sub ecx, 1
    mov eax, edx
    lea edx, [edx+edx*4-18h]
    lea edx, [edx+edx]
    sub ebx, edx
    mov [ecx], bl
    test eax, eax
    je   @f

    mov ebx, eax
    mul esi
    shr edx, 3
    sub ecx, 1
    mov eax, edx
    lea edx, [edx+edx*4-18h]
    lea edx, [edx+edx]
    sub ebx, edx
    mov [ecx], bl
    test eax, eax
    jne  @b
@@:
    pop esi
    mov eax, ecx
    pop ebx
    ret
L7:
    cmp     eax, 9            
    push    esi
    jbe     L10
    add     ecx, 2                     
    jne     L9
L10:
    add eax, 30h             
    pop esi
    mov [ecx],  ax
    pop ebx
    mov eax, ecx
    ret
u32toalin ENDP

and results:
76 cycles for udw2str2-Hutch
65 cycles for u32toalin-Lingo
Press any key to continue ...