News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

This is too slow

Started by frktons, November 18, 2010, 03:10:21 AM

Previous topic - Next topic

frktons

The conversion of an unsigned dword into a formatted string is usually done in two steps:

1] conversion from binary to string
2] format the string, in this case with thousand separators.

This process is quite slow compared to what could be done in a smarter way.

Using the usual way, with a MACRO and an API: ustrv$ + GetNumberFormat I have these results:


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 03:00 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │   103   │   46.629 │   46.320 │   46.300 │   46.293 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


I think it could be done in less than half this time. Feel free to post any suggestion to speed up things a little.

To have a standard test all the routines have to convert the same array of values:

    NumToTest        DWORD 0
                     DWORD 9
                     DWORD 10
                     DWORD 99
                     DWORD 100
                     DWORD 999
                     DWORD 1000
                     DWORD 9999
                     DWORD 10000
                     DWORD 99999
                     DWORD 100000
                     DWORD 999999
                     DWORD 1000000
                     DWORD 9999999
                     DWORD 10000000
                     DWORD 99999999
                     DWORD 100000000
                     DWORD 999999999
                     DWORD 1000000000
                     DWORD 4294967295


After that the measurements can be reliable, and inserted into the testbed.

There are 2 files to get used to the testbed:

Readme.txt  and  Info Screen displayed with the key when
the program shows the results.

This is the final release of the testbed. If you want to use it for any purpose, get used to it.

The sources, the info, the screens, everything is included in the zip file.

Enjoy it and post your results. Press C and paste the content of clipboard into the forum.
That's all.

Frank
Mind is like a parachute. You know what to do in order to use it :-)

GregL

Frank,

I usually use wsprintf to convert an unsigned dword to a string because it's easy to use and fast enough.  It won't be the fastest.

┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 04:30 GMT]─┐
│OS  : Microsoft Windows Vista Home Premium Edition, 32-bit Service Pack 2 (build 6002)  │
│CPU : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz with 2 logical core(s) with SSSE3      │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │   103   │   55.703 │   51.868 │   53.009 │   53.044 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 wsprintf                       │    43   │   10.163 │   10.177 │   10.223 │   10.204 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤



; ---------------------------------------------------------------------------
;  Algo #02  to test and code to manage the display of the results.
; ---------------------------------------------------------------------------

.data
; ---------------------------------------------------------------------------
;  put here the description of your algo to test. max 30 chars.
; ---------------------------------------------------------------------------
ALIGN 4
    NumToTest2       DWORD 0
                     DWORD 9
                     DWORD 10
                     DWORD 99
                     DWORD 100
                     DWORD 999
                     DWORD 1000
                     DWORD 9999
                     DWORD 10000
                     DWORD 99999
                     DWORD 100000
                     DWORD 999999
                     DWORD 1000000
                     DWORD 9999999
                     DWORD 10000000
                     DWORD 99999999
                     DWORD 100000000
                     DWORD 999999999
                     DWORD 1000000000
                     DWORD 4294967295

    NumFormat2       BYTE "%u",0
   
    Buffer2          BYTE 32 DUP(0)
   
; -------------------------<123456789012345678901234567890>------------------
    AlgoDesc2        BYTE  "wsprintf                      ",0
; ---------------------------------------------------------------------------

.code

align 4

    mov  AlgoSize, (EndAlgo2 - Algo2)

    jmp  Start2

Algo2:

align 4

AlgoN2 proc

; ----------------------------------------------------------------------
;  put here your code to test
; ----------------------------------------------------------------------


    mov ecx, 20
    lea eax, NumToTest2
  @@:
    push ecx
    push eax
    mov eax, [eax]
    INVOKE wsprintf, ADDR Buffer2, ADDR NumFormat2, eax
    pop eax
    add eax, 4
    pop ecx
    dec ecx
    jnz @B
   
    ret

; ----------------------------------------------------------------------
;  end point of algo to test
; ----------------------------------------------------------------------

AlgoN2 endp


GregL

Using udw2str from masm32 library.  No formatting options.

┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 05:02 GMT]─┐
│OS  : Microsoft Windows Vista Home Premium Edition, 32-bit Service Pack 2 (build 6002)  │
│CPU : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz with 2 logical core(s) with SSSE3      │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │   103   │   52.763 │   52.645 │   52.362 │   52.623 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str                        │    35   │    3.425 │    3.449 │    3.437 │    3.465 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤



; ---------------------------------------------------------------------------
;  Algo #02  to test and code to manage the display of the results.
; ---------------------------------------------------------------------------

.data
; ---------------------------------------------------------------------------
;  put here the description of your algo to test. max 30 chars.
; ---------------------------------------------------------------------------
ALIGN 4
    NumToTest2       DWORD 0
                     DWORD 9
                     DWORD 10
                     DWORD 99
                     DWORD 100
                     DWORD 999
                     DWORD 1000
                     DWORD 9999
                     DWORD 10000
                     DWORD 99999
                     DWORD 100000
                     DWORD 999999
                     DWORD 1000000
                     DWORD 9999999
                     DWORD 10000000
                     DWORD 99999999
                     DWORD 100000000
                     DWORD 999999999
                     DWORD 1000000000
                     DWORD 4294967295

   ;NumFormat2       BYTE "%u",0
   
    Buffer2          BYTE 32 DUP(0)
   
; -------------------------<123456789012345678901234567890>------------------
    AlgoDesc2        BYTE  "udw2str                      ",0
; ---------------------------------------------------------------------------

.code

align 4

    mov  AlgoSize, (EndAlgo2 - Algo2)

    jmp  Start2

Algo2:

align 4

AlgoN2 proc

; ----------------------------------------------------------------------
;  put here your code to test
; ----------------------------------------------------------------------


    mov ecx, 20
    lea eax, NumToTest2
  @@:
    push ecx
    push eax
    mov eax, [eax]
    INVOKE udw2str, eax, ADDR Buffer2
    pop eax
    add eax, 4
    pop ecx
    dec ecx
    jnz @B
   
    ret

; ----------------------------------------------------------------------
;  end point of algo to test
; ----------------------------------------------------------------------

AlgoN2 endp

frktons

Thanks Greg. These two example are only converting the binary into ASCII string,
both need the second step in order to have a fair comparison.

By the way, the second function looks far better than the MACRO and the C function. I didn't know it
existed at all.  :red

I'll add the second step and post the results.   :U

Frank   
Mind is like a parachute. You know what to do in order to use it :-)

dedndave

hiyas Frank
masm32\help\masmlib.chm    :U

frktons

Quote from: dedndave on November 18, 2010, 11:22:25 AM
hiyas Frank
masm32\help\masmlib.chm    :U

Yeah Dave, I had a look at it after Greg posted his results.  :thumbu
I thought the MACROS were all the stuff to do the conversions, but there are
also MASM functions. Faster than C equivalent, and smaller too I guess.
Mind is like a parachute. You know what to do in order to use it :-)

frktons

Strange results, less than expected:


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 11:55 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   45.734 │   45.647 │   45.622 │   45.484 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   45.346 │   45.243 │   45.188 │   45.282 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   51.642 │   51.644 │   51.640 │   51.642 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


Maybe I have made some mistake into translating the code Greg posted, attached the whole.


Mind is like a parachute. You know what to do in order to use it :-)

hutch--

Frank,

You will probably find that the API GetNumberFormat() is much slower than any number conversion algo and it will tend to level the times between different algos.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

frktons

Quote from: hutch-- on November 18, 2010, 12:07:27 PM
Frank,

You will probably find that the API GetNumberFormat() is much slower than any number conversion algo and it will tend to level the times between different algos.

Yes Steve. I guessed it, this is the reason I called the prog TwoInOne, I'm sure
merging together the two steps in one ASM PROC will give us much better results.

I'll Start with a chunk of code Clive posted some months ago when I was moving my first
steps into MASM world. I hope I'm now able to adapt it to run inside the Testbed and time it.


; EAX = 32-bit number
; ESI = string buffer for NUL terminated ASCII
; Uses ESI,EDI,EAX,ECX,EDX

    push 0 ; Mark stack end with NUL

divloop:

    mov  ecx,1000 ; Divide into 3 digit groups
    xor edx,edx ; Clear high order 32-bit for divide
    idiv ecx ; eax = edx:eax / ecx, edx = edx:eax % ecx

    mov edi,eax ; Save division result

    mov ecx,10 ; Subdivide in 10's
    mov eax,edx ; Get remainder

    or edi,edi ; Still number left, so at least 3 digits in remainder
    jnz digit000

    cmp eax,10 ; remainder has one digit
    jb  digit0

    cmp eax,100 ; remainder has two digits
    jb  digit00

digit000: ; 3 digits

    xor edx,edx ; Clear high order 32-bit for divide
    idiv ecx    ; eax = edx:eax / ecx, edx = edx:eax % ecx
    add edx,30h ; += '0'
    push edx ; Stack

digit00: ; 2 digits

    xor edx,edx ; Clear high order 32-bit for divide
    idiv ecx    ; eax = edx:eax / ecx, edx = edx:eax % ecx
    add edx,30h ; += '0'
    push edx    ; Stack

digit0: ; 1 digit

    xor edx,edx ; Clear high order 32-bit for divide
;   idiv ecx    ; eax = edx:eax / ecx, edx = edx:eax % ecx
    add edx,30h ; += '0'
    push edx    ; Stack

    mov eax,edi ; Recover remaining number

    or eax,eax ; Zero?
    jz poploop

    push  2Ch ; Comma added to groups of three digits
    jmp divloop

poploop:
    pop eax ; Recover next digit
    mov [esi],al ; Add to string
    inc esi
    or  eax,eax ; Was it a NUL?
    jnz poploop



and after I'll try something a bit more difficult, the reciprocal IMUL.  :eek

Frank
Mind is like a parachute. You know what to do in order to use it :-)

frktons

And with the help of Clive now I can affirm that:


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 18:56 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   45.182 │   45.237 │   45.148 │   45.400 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   44.971 │   45.064 │   45.037 │   45.224 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   52.046 │   51.991 │   52.041 │   51.994 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.187 │    3.185 │    3.150 │    3.185 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


That's a lot of gain already, and this is not the fastest around.  Maybe we can expect to go below 2000 with
the appropriate algo.

Frank
Mind is like a parachute. You know what to do in order to use it :-)

GregL

Quote from: frktonsboth need the second step in order to have a fair comparison.

Just what special format do you need an unsigned integer to be in?

Nevermind, I looked at Clive's code, you want commas (or whatever) between each group of three digits.  Why didn't you say that?

frktons

Quote from: GregL on November 18, 2010, 09:43:15 PM
Quoteboth need the second step in order to have a fair comparison.

Just what special format do you need an unsigned integer to be in?

unsigned integer = 4294967295
ASCII formatted =  4.294.967.295

This is the reason for using: GetNumberFormat
Mind is like a parachute. You know what to do in order to use it :-)

frktons

It looks like The bigger the size of the prog, the fastest it gets  :P


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 22:25 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   45.421 │   45.113 │   45.035 │   44.975 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   44.840 │   44.676 │   45.928 │   44.803 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   51.733 │   51.713 │   52.677 │   51.742 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.186 │    3.125 │    3.207 │    3.167 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    2.110 │    2.101 │    2.111 │    2.049 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


We are nearing the limits probably  :bg
Mind is like a parachute. You know what to do in order to use it :-)

frktons

Quote from: oex on November 18, 2010, 10:58:41 PM
Hey Frank, just a thought for your app.... Honestly I havent used the working version yet, the copy I downloaded was the initial test that didnt work but I was looking at the output and thinking that you could highlight or just highlight :lol the relevent copied info ie best results for forum posts....

I have noticed you were talking about clipboard copying so that should be easy?

I'm not sure I got what you mean. In the last release of the testbed, just the previous post of mine,
There is the [C] option to copy the results embedded in a couple of tags. You then just
paste the content of the clipboard into the forum and that's all.

Give it a try and let me know what you meant.
Mind is like a parachute. You know what to do in order to use it :-)

frktons

Quote from: GregL on November 18, 2010, 09:43:15 PM

Nevermind, I looked at Clive's code, you want commas (or whatever) between each group of three digits.  Why didn't you say that?

Sorry Greg, I thought it was clear from the code posted, but I was apparently wrong.  :P

Thanks for making that  point clear.  :U
Mind is like a parachute. You know what to do in order to use it :-)