It's an algo that was written and contributed about 10 years ago and I have never used it, but while testing a range of similar algorithms recently I found that the "udw2str" algorithm fails over 1 gig and should not be used. It will be replaced with an equate to another algorithm that has been exhaustively tested over the unsigned DWORD range.
wow - surprising it wasn't found sooner :bg
It may be unsound, but it's still good enough for my financial calculations.
:bg
Michael,
That just means you have not been working on your first billion yet. :P
Seriously though, the algo is broken and I have 3 others that are faster and exhaustively tested.
If you can be bothered, give this one a try.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
udw2str2 proc dwNumber:DWORD, pszString:DWORD
mov eax, [esp+4]
cmp eax, 9
ja @F
mov edx, 1
jmp nxt
@@:
cmp eax, 99
ja @F
mov edx, 2
jmp nxt
@@:
cmp eax, 999
ja @F
mov edx, 3
jmp nxt
@@:
cmp eax, 9999
ja @F
mov edx, 4
jmp nxt
@@:
cmp eax, 99999
ja @F
mov edx, 5
jmp nxt
@@:
cmp eax, 999999
ja @F
mov edx, 6
jmp nxt
@@:
cmp eax, 9999999
ja @F
mov edx, 7
jmp nxt
@@:
cmp eax, 99999999
ja @F
mov edx, 8
jmp nxt
@@:
cmp eax, 999999999
ja @F
mov edx, 9
jmp nxt
@@:
mov edx, 10
nxt:
mov eax, [esp+4] ; src
mov ecx, [esp+8] ; lpdest ; buffer address
push ebx
push esi
push edi
mov esi, 0CCCCCCCDh ; "magic number" multiplier for division by 10
mov ebx, 10
add ecx, edx ; add required buffer length to start address
mov BYTE PTR [ecx], 0 ; write the terminator
sub ecx, 1 ; decrement down to first write position
@@:
mul esi ; multiply by magic number
shrd eax, edx, 3 ; binary fractional "remainder" back in EAX
shr edx, 3 ; EDX = quotient
add eax, 1 ; precaution against occasional "underflow"
mov edi, edx ; save current quotient for next division
mul ebx ; x10 gets "decimal" remainder into EDX
add dl, 30h ; convert remainder to ascii
mov [ecx], dl ; insert it into buffer
sub ecx, 1 ; decrement buffer location
mov eax, edi ; retrieve current quotient
test eax, eax ; test if done
jz @F ; continue if not done
mul esi ; multiply by magic number
shrd eax, edx, 3 ; binary fractional "remainder" back in EAX
shr edx, 3 ; EDX = quotient
add eax, 1 ; precaution against occasional "underflow"
mov edi, edx ; save current quotient for next division
mul ebx ; x10 gets "decimal" remainder into EDX
add dl, 30h ; convert remainder to ascii
mov [ecx], dl ; insert it into buffer
sub ecx, 1 ; decrement buffer location
mov eax, edi ; retrieve current quotient
test eax, eax ; test if done
jnz @B ; continue if not done
@@:
pop edi
pop esi
pop ebx
mov eax, [esp+8]
ret 8
udw2str2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
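For anyone puzzled by the 0CCCCCCCDh "magic number": it is ceil(2^35 / 10), so an unsigned multiply followed by a 35-bit right shift (the MUL plus SHR EDX,3 pair in the listing above) gives an exact divide by 10 for every 32-bit value. A minimal C check of the trick (the function name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Reciprocal-multiply division by 10, as used by the MASM code above.
   0xCCCCCCCD == ceil(2^35 / 10), so (n * 0xCCCCCCCD) >> 35 == n / 10
   exactly, for every 32-bit unsigned n. */
static uint32_t div10_magic(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```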
i can do my taxes with five digits - and that includes pennies :lol
Quote from: dedndave on November 24, 2010, 02:38:19 AM
wow - surprising it wasn't found sooner :bg
lol, you must have been busy with the algo timings :lol
http://www.masm32.com/board/index.php?topic=14642.msg118452#msg118452
hutch, i tested it against mine...just a bit of fun :P
Quote
129 cycles for udw2str2
104 cycles for u32toa
use console, assemble and link to build the test
Looks good,
Here is the timing on my Core2 quad.
99 cycles for udw2str2
98 cycles for u32toa
Press any key to continue ...
Now the big question, have you done the exhaustive test over the whole unsigned range? What beat the old version was that it failed over 1 gig; the older version was about 30% faster once it was optimised.
i tested it by using msvcrt ultoa and my algo to create 2 strings which i then strcmp'ed, starting at 0 and ending at 4294967295 (inclusive). my algo's output matched ms ultoa over the full range so i'm certain it works for all unsigned values
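The exhaustive test described above is simple to reproduce. A portable C sketch of the same idea (names are mine, and snprintf stands in here for msvcrt's _ultoa as the reference converter):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* stand-in for the routine under test; the forum test compared the
   MASM routine's output against msvcrt ultoa the same way */
static char *my_u32toa(uint32_t n, char *buf)
{
    char tmp[11];
    int i = 0, j = 0;
    do { tmp[i++] = (char)('0' + n % 10); n /= 10; } while (n);
    while (i) buf[j++] = tmp[--i];
    buf[j] = '\0';
    return buf;
}

/* sweep a sub-range against the reference; the forum test swept the
   full range 0..4294967295 inclusive */
static int verify_range(uint32_t lo, uint32_t hi)
{
    char got[16], ref[16];
    for (uint32_t n = lo; ; n++) {
        snprintf(ref, sizeof ref, "%u", n);
        if (strcmp(my_u32toa(n, got), ref) != 0)
            return 0;                    /* mismatch found */
        if (n == hi)
            break;                       /* avoids wraparound at max */
    }
    return 1;
}
```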
Just ran the same test and it's correct over the entire range. :U
Will have a play as this algo has room to make it faster.
i'll be interested to see how much faster you can make it :U
I also tested against the CRT _ultoa function over the entire 32-bit range and found no differences in the returned strings. And I tested with a non-null-filled destination buffer and dumped the first 20 bytes of the buffer after the test to verify that the procedure is not writing to the buffer beyond the 11th byte.
This tweak is testing up OK over the full unsigned range. Removing the PUSHAD/POPAD pair dropped the time; I then removed the stack frame, unrolled it by 2, added the code to calculate the result length, and removed the reverse string loop at the end. In the benchmark I am using, the time dropped from 250 ms to 187 ms.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
ALIGN 16
; u32toa by brethren
utoa2 PROC, value:DWORD, szbuf:DWORD
push ebx
push esi
push edi
mov eax, [esp+4][12] ; value
mov esi, [esp+8][12] ; szbuf
cmp eax, 9
ja lb1
add esi, 1
jmp nxt
lb1:
cmp eax, 99
ja lb2
add esi, 2
jmp nxt
lb2:
cmp eax, 999
ja lb3
add esi, 3
jmp nxt
lb3:
cmp eax, 9999
ja lb4
add esi, 4
jmp nxt
lb4:
cmp eax, 99999
ja lb5
add esi, 5
jmp nxt
lb5:
cmp eax, 999999
ja lb6
add esi, 6
jmp nxt
lb6:
cmp eax, 9999999
ja lb7
add esi, 7
jmp nxt
lb7:
cmp eax, 99999999
ja lb8
add esi, 8
jmp nxt
lb8:
cmp eax, 999999999
ja lb9
add esi, 9
jmp nxt
lb9:
add esi, 10
nxt:
mov ebx, 0CCCCCCCDh
mov BYTE PTR [esi], 0
sub esi, 1
stlp:
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
lea edx, [edx+edx]
sub ecx, edx
add cl, '0'
mov [esi], cl
sub esi, 1
test eax, eax
jbe lpqt
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
lea edx, [edx+edx]
sub ecx, edx
add cl, '0'
mov [esi], cl
sub esi, 1
test eax, eax
ja stlp
lpqt:
pop edi
pop esi
pop ebx
ret 8
utoa2 ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
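The key change in utoa2 above is doing the digit count up front so the string can be written right to left from the terminator, which eliminates the reversal pass entirely. The same idea in C (a sketch with my own names, not the exact MASM logic):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* count digits first, write the terminator, then fill the buffer
   right-to-left - no reversal loop needed afterwards */
static char *u32toa_backfill(uint32_t n, char *buf)
{
    int len = 1;
    uint64_t p = 10;                       /* 64-bit so 10^10 doesn't wrap */
    while (n >= p) { len++; p *= 10; }     /* digit count: 1..10 */
    char *end = buf + len;
    *end = '\0';                           /* terminator goes in first */
    do { *--end = (char)('0' + n % 10); n /= 10; } while (n);
    return buf;
}
```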
a 20% speed increase when run on my pc :clap: btw i really like how you've eliminated the need for reversing the string :wink
Quote
131 cycles for udw2str2
104 cycles for u32toa
81 cycles for utoa2
the same improvements could be made to my dwtoa routine as it's virtually identical apart from the sign test
.code
dwtoa PROC, val:DWORD, buf:PTR BYTE
pushad
mov eax, val
mov ebx, 0CCCCCCCDh
mov esi, buf
test eax, eax
jge @F
mov BYTE PTR [esi], '-'
add esi, 1
neg eax
@@:
mov edi, esi
@@:
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
add edx, edx
sub ecx, edx
add cl, '0'
mov [esi], cl
add esi, 1
test eax, eax
ja @B
mov BYTE PTR [esi], NULL
sub esi, 1
@@:
mov al, [edi]
mov bl, [esi]
mov [edi], bl
mov [esi], al
add edi, 1
sub esi, 1
cmp edi, esi
jb @B
popad
ret
dwtoa ENDP
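For contrast with the backfill approach, the dwtoa above converts forward and then reverses in place, with a sign test first. A C sketch of that structure (my names; the negate handles INT_MIN correctly because the magnitude is kept unsigned):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* signed counterpart mirroring dwtoa: emit '-', negate, convert
   forward, then reverse in place (the step utoa2 avoids) */
static char *i32toa_reverse(int32_t v, char *buf)
{
    char *s = buf;
    uint32_t n = (uint32_t)v;
    if (v < 0) { *s++ = '-'; n = 0u - n; }   /* unsigned negate, safe for INT_MIN */
    char *d = s;
    do { *d++ = (char)('0' + n % 10); n /= 10; } while (n);
    *d = '\0';
    /* reverse the digits (not the sign) in place */
    for (char *e = d - 1; s < e; s++, e--) {
        char t = *s; *s = *e; *e = t;
    }
    return buf;
}
```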
brethren,
The same but faster coz I started with binary search: :wink
align 16
;u32toalin by lingo
db 11 Dup (0)
u32toalin PROC, value:DWORD, szbuf:DWORD
pop edx
pop eax
cmp eax, 99999
pop ecx
jbe L4
cmp eax, 9999999
push edx
jbe L2
cmp eax, 999999999
push ebx
ja L8
cmp eax, 99999999
ja L1
add ecx, 8
push esi
jne L9
L1 :
add ecx, 9
push esi
jne L9
L2:
cmp eax, 999999
push ebx
ja L3
add ecx, 6
push esi
jne L9
L3:
add ecx, 7
push esi
jne L9
L4:
cmp eax, 999
push edx
jbe L6
cmp eax, 9999
push ebx
ja L5
add ecx, 4
push esi
jne L9
L5:
add ecx, 5
push esi
jne L9
L6:
cmp eax, 99
push ebx
jbe L7
add ecx, 3
push esi
jne L9
L8:
push esi
add ecx, 10
L9:
mov byte ptr [ecx], 0
mov esi, 0CCCCCCCDh
@@:
mov ebx, eax
mul esi
shr edx, 3
sub ecx, 1
mov eax, edx
lea edx, [edx+edx*4-18h]
lea edx, [edx+edx]
sub ebx, edx
mov [ecx], bl
test eax, eax
je @f
mov ebx, eax
mul esi
shr edx, 3
sub ecx, 1
mov eax, edx
lea edx, [edx+edx*4-18h]
lea edx, [edx+edx]
sub ebx, edx
mov [ecx], bl
test eax, eax
jne @b
@@:
pop esi
mov eax, ecx
pop ebx
ret
L7:
cmp eax, 9
push esi
jbe L10
add ecx, 2
jne L9
L10:
add eax, 30h
pop esi
mov [ecx], ax
pop ebx
mov eax, ecx
ret
u32toalin ENDP
and results:
76 cycles for udw2str2-Hutch
65 cycles for u32toalin-Lingo
Press any key to continue ...
:bg
Use the later version posted in the Lab with this code then try string lengths from 1 to 10.
udw2str2 proc dwNumber:DWORD, pszString:DWORD
push ebx
push esi
mov eax, [esp+4][8] ; value
mov esi, [esp+8][8] ; szbuf
cmp eax, 9999
ja lb4
cmp eax, 999999
ja lb6
cmp eax, 9
ja lb1
add esi, 1
jmp nxt
lb1:
cmp eax, 99
ja lb2
add esi, 2
jmp nxt
lb2:
cmp eax, 999
ja lb3
add esi, 3
jmp nxt
lb3:
cmp eax, 9999
ja lb4
add esi, 4
jmp nxt
lb4:
cmp eax, 99999
ja lb5
add esi, 5
jmp nxt
lb5:
cmp eax, 999999
ja lb6
add esi, 6
jmp nxt
lb6:
cmp eax, 9999999
ja lb7
add esi, 7
jmp nxt
lb7:
cmp eax, 99999999
ja lb8
add esi, 8
jmp nxt
lb8:
cmp eax, 999999999
ja lb9
add esi, 9
jmp nxt
lb9:
add esi, 10
nxt:
mov ebx, 0CCCCCCCDh
mov BYTE PTR [esi], 0
sub esi, 1
stlp:
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
lea edx, [edx+edx-48]
sub ecx, edx
mov [esi], cl
sub esi, 1
test eax, eax
jbe lpqt
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
lea edx, [edx+edx-48]
sub ecx, edx
mov [esi], cl
sub esi, 1
test eax, eax
ja stlp
lpqt:
pop esi
pop ebx
ret 8
udw2str2 endp
wouldn't it make sense to test for longer values first ?
frequency of occurrence
32 bit integers converted to decimal strings
unsigned (number of characters = number of digits)
1 digit 10
2 digits 90
3 digits 900
4 digits 9000
5 digits 90000
6 digits 900000
7 digits 9000000
8 digits 90000000
9 digits 900000000
10 digits 3294967296
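The chart follows directly from the ranges: 9*10^(d-1) values need d digits, except the 1-digit bucket (which includes 0) and the 10-digit bucket, which is cut off at 2^32 rather than 10^10. A quick C check of the counts (helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* how many 32-bit unsigned values print with exactly d decimal digits */
static uint64_t count_with_digits(int d)
{
    uint64_t p10 = 1;
    for (int i = 1; i < d; i++) p10 *= 10;            /* 10^(d-1) */
    uint64_t lo = (d == 1) ? 0 : p10;                 /* first d-digit value */
    uint64_t hi = (d == 10) ? 4294967296ull           /* capped at 2^32 */
                            : p10 * 10;               /* one past the last */
    return hi - lo;
}
```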
:bg
Yes, but it was not any faster. I think Lingo has the right idea; middle branching on a basic tree layout made the first run of the algo faster but it does not seem to affect the sustain rate.
it would be faster overall !!!
if you wrote a test piece to take the time for each of the 4294967296 possible values,
added those times together, then divided by the same number, you'd see an improvement
(that's why i posted the frequency of occurrence chart)
yes - splitting it into groups reduces the number of CMPs/JMPs that are executed, i think
cmp eax,9999999
ja more_than_7_digits
cmp eax,9999
ja more_than_4_digits
;-------------------------------
cmp eax,999
ja have_4_digits
cmp eax,99
ja have_3_digits
cmp eax,9
ja have_2_digits
have_1_digit:
add esi,1
jmp nxt
have_2_digits:
add esi,2
jmp nxt
have_3_digits:
add esi,3
jmp nxt
have_4_digits:
add esi,4
jmp nxt
;-------------------------------
more_than_4_digits:
cmp eax,999999
ja have_7_digits
cmp eax,99999
ja have_6_digits
have_5_digit:
add esi,5
jmp nxt
have_6_digits:
add esi,6
jmp nxt
have_7_digits:
add esi,7
jmp nxt
;-------------------------------
more_than_7_digits:
cmp eax,999999999
ja have_10_digits
cmp eax,99999999
ja have_9_digits
have_8_digits:
add esi,8
jmp nxt
have_9_digits:
add esi,9
jmp nxt
have_10_digits:
add esi,10
;-------------------------------
nxt:
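The tree above never executes more than four compares for any input, versus up to nine in the linear chain, and the long values (which the frequency chart shows are the common case) take the fewest. The same structure as a C sketch (function name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* digit count via the compare tree above: at most 4 compares,
   with 8..10 digit values (the most frequent) resolved first */
static int digits_tree(uint32_t n)
{
    if (n > 9999999) {                    /* 8..10 digits */
        if (n > 999999999) return 10;
        return (n > 99999999) ? 9 : 8;
    }
    if (n > 9999) {                       /* 5..7 digits */
        if (n > 999999) return 7;
        return (n > 99999) ? 6 : 5;
    }
    if (n > 999) return 4;                /* 1..4 digits */
    if (n > 99)  return 3;
    return (n > 9) ? 2 : 1;
}
```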
in this case, i had 9 digits max
i needed to scale the value - not just adjust a pointer
it might be better if i used 32-bit instructions for EBX and ECX, though
i know - i know - i can unroll it, too :bg
;determine the number of digits in the first dword
;ECX = 0 or 1
;ESI = active index pointer (offset of highest order accumulator dword)
;EBP = stack frame base pointer
Ascii0: mov cl,9 ;ECX = 9
mov eax,[esi] ;load the first dword
mov bl,3 ;BL = 3
push eax ;save the first dword
;scale the first dword by 1,000 to count digits
;we may make 0, 1, or 2 passes on this loop
Ascii1: cmp eax,1000000 ;is it >= 1,000,000 ?
jae Ascii2 ;yes - determine mod 3 count
sub cl,bl ;no - subtract 3 from count
imul eax,eax,1000 ;scale it by 1,000
cmp cl,bl ;last 3 digits ?
ja Ascii1 ;no - until it is
Dave,
here is a version that does the calculations top down with branch prediction backwards and it's still no faster.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
ALIGN 16
; u32toa by brethren
utoa2 PROC, value:DWORD, szbuf:DWORD
push ebx
push esi
mov eax, [esp+4][8] ; value
mov esi, [esp+8][8] ; szbuf
cmp eax, 1000000000
jb lb9
add esi, 10
nxt:
mov ebx, 0CCCCCCCDh
mov BYTE PTR [esi], 0
sub esi, 1
align 4
stlp:
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
lea edx, [edx+edx-48]
sub ecx, edx
mov [esi], cl
sub esi, 1
test eax, eax
jbe lpqt
mov ecx, eax
mul ebx
shr edx, 3
mov eax, edx
lea edx, [edx+edx*4]
lea edx, [edx+edx-48]
sub ecx, edx
mov [esi], cl
sub esi, 1
test eax, eax
ja stlp
lpqt:
pop esi
pop ebx
ret 8
align 4
lb1:
add esi, 1
jmp nxt
lb2:
cmp eax, 10
jb lb1
add esi, 2
jmp nxt
lb3:
cmp eax, 100
jb lb2
add esi, 3
jmp nxt
lb4:
cmp eax, 1000
jb lb3
add esi, 4
jmp nxt
lb5:
cmp eax, 10000
jb lb4
add esi, 5
jmp nxt
lb6:
cmp eax, 100000
jb lb5
add esi, 6
jmp nxt
lb7:
cmp eax, 1000000
jb lb6
add esi, 7
jmp nxt
lb8:
cmp eax, 10000000
jb lb7
add esi, 8
jmp nxt
lb9:
cmp eax, 100000000
jb lb8
add esi, 9
jmp nxt
utoa2 ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
well - it's no faster on your i7 :P
and - testing it with the same value over and over isn't going to reveal anything
i think the tree here is a good plan
http://www.masm32.com/board/index.php?topic=15427.msg126379#msg126379
i have to get my head into your test code - lol
"Use the later version posted in the Lab..."
I did it and my algo is still faster... :wink
results are:
69 cycles for udw2str2-Hutch
65 cycles for u32toalin-Lingo
Press any key to continue ...
lingos test4 on AMD Turion(tm) 64 X2 Mobile Technology TL-52
71 cycles for udw2str2-Hutch
77 cycles for u32toalin-Lingo
Press any key to continue ...
test2
80 cycles for udw2str2-Hutch
68 cycles for u32toalin-Lingo
Press any key to continue ...
surprisingly the only difference in the code in u32toalin between lingo's test2 and test4 (where's test3 :P) is:
test4
lea edx, [edx+edx*4-24]
lea edx, [edx+edx]
test2
lea edx, [edx+edx*4]
lea edx, [edx+edx-30h]
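The two forms are arithmetically identical: test4 folds the ASCII bias into the first LEA as 2*(5q - 24), test2 into the second as 2*5q - 30h, and both come to 10q - 48, so the following SUB leaves the digit already converted to ASCII. A C check of the equivalence (function names are mine; the unsigned wraparound matches the 32-bit asm):

```c
#include <assert.h>
#include <stdint.h>

/* test4: lea edx,[edx+edx*4-18h] ; lea edx,[edx+edx]   -> 2*(5q-24) */
static uint32_t form_test4(uint32_t q) { uint32_t t = q + q * 4 - 24; return t + t; }

/* test2: lea edx,[edx+edx*4] ; lea edx,[edx+edx-30h]   -> 2*5q-48 */
static uint32_t form_test2(uint32_t q) { uint32_t t = q + q * 4; return t + t - 0x30; }
```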
Quote from: brethren on November 25, 2010, 05:04:41 PM
surprisingly the only difference in the code between lingo's test2 and test4 (where's test3 :P) is:
Same size, same cycle count on the Celeron M. But Hutch's algo runs at 94 cycles instead of 104.
:bg
Dave,
I mainly use the Core2 Quad, the i7 Win64 is for watching TV while I am writing code on the other box. :P
It runs fine and is slightly faster here and there than the Core2 quad but I confess I don't like Win7 much.
The last algo I posted that does the lead calculation top down should have the lowest overhead for 10 digit numbers as it just falls straight through into the conversion. I have my doubts about the benchmarks posted as they do not sweep the range down to small digit counts; that is why I am doing the timings with the dialog benchmark in the Lab.
I played with middle-range branching into the comparison list but it only seems to favour samples that are near that length, and each leading branch slows the rest down. I could not get the algo faster when sweeping different length ranges that way, so I gave up on the idea. I still think your logic of doing the calculation top down makes the most sense as the number counts are much higher for the large values.
even the Core2 Quad is fast on so many things
you need to run it on your ShitBox :bg (closely emulates what the rest of us can afford - lol)
i still think the tree in reply #18 is the way to fly, Hutch