The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: frktons on November 18, 2010, 03:10:21 AM

Title: This is too slow
Post by: frktons on November 18, 2010, 03:10:21 AM
The conversion of an unsigned dword into a formatted string is usually done in two steps:

1] conversion from binary to string
2] format the string, in this case with thousand separators.

This process is quite slow compared to what could be done in a smarter way.
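
As a rough illustration of the two steps above, here is a minimal C sketch (not the testbed code, and not the ustrv$/GetNumberFormat pair itself). The function names `group_digits` and `two_step_format` are made up for this example, and the standard `snprintf` stands in for the conversion step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Step 2: copy the digit string `src` to `dst`, inserting `sep` between
   groups of three digits counted from the right. */
void group_digits(const char *src, char *dst, char sep)
{
    size_t len = strlen(src);
    for (size_t i = 0; i < len; i++) {
        /* A separator goes in front of digit i when the number of digits
           still to copy is a non-zero multiple of three. */
        if (i != 0 && (len - i) % 3 == 0)
            *dst++ = sep;
        *dst++ = src[i];
    }
    *dst = '\0';
}

/* Steps 1 and 2 together. `dst` needs at least 14 bytes for a 32-bit
   value: "4.294.967.295" plus the terminator. */
void two_step_format(uint32_t value, char *dst, size_t cap, char sep)
{
    char digits[11];                    /* 10 digits + NUL */
    assert(cap >= 14);
    snprintf(digits, sizeof digits, "%lu", (unsigned long)value); /* step 1 */
    group_digits(digits, dst, sep);                               /* step 2 */
}
```

Calling `two_step_format(4294967295u, buf, sizeof buf, '.')` fills `buf` with `4.294.967.295`. The value is walked twice, once to produce the digits and once to regroup them, which is the overhead a merged routine tries to avoid.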

Using the usual way, with a MACRO and an API (ustrv$ + GetNumberFormat), I get these results:


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 03:00 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │   103   │   46.629 │   46.320 │   46.300 │   46.293 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


I think it could be done in less than half this time. Feel free to post any suggestions to speed things up a little.

To have a standard test all the routines have to convert the same array of values:

    NumToTest        DWORD 0
                     DWORD 9
                     DWORD 10
                     DWORD 99
                     DWORD 100
                     DWORD 999
                     DWORD 1000
                     DWORD 9999
                     DWORD 10000
                     DWORD 99999
                     DWORD 100000
                     DWORD 999999
                     DWORD 1000000
                     DWORD 9999999
                     DWORD 10000000
                     DWORD 99999999
                     DWORD 100000000
                     DWORD 999999999
                     DWORD 1000000000
                     DWORD 4294967295


With a common data set the measurements are comparable, and the routine can be inserted into the testbed.

There are 2 files to get used to the testbed:

Readme.txt  and  Info Screen displayed with the key when
the program shows the results.

This is the final release of the testbed. If you want to use it for any purpose, get used to it.

The sources, the info, the screens, everything is included in the zip file.

Enjoy it and post your results. Press [C] and paste the content of the clipboard into the forum.
That's all.

Frank
Title: Re: This is too slow
Post by: GregL on November 18, 2010, 04:12:18 AM
Frank,

I usually use wsprintf to convert an unsigned dword to a string because it's easy to use and fast enough.  It won't be the fastest.

┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 04:30 GMT]─┐
│OS  : Microsoft Windows Vista Home Premium Edition, 32-bit Service Pack 2 (build 6002)  │
│CPU : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz with 2 logical core(s) with SSSE3      │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │   103   │   55.703 │   51.868 │   53.009 │   53.044 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 wsprintf                       │    43   │   10.163 │   10.177 │   10.223 │   10.204 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤



; ---------------------------------------------------------------------------
;  Algo #02  to test and code to manage the display of the results.
; ---------------------------------------------------------------------------

.data
; ---------------------------------------------------------------------------
;  put here the description of your algo to test. max 30 chars.
; ---------------------------------------------------------------------------
ALIGN 4
    NumToTest2       DWORD 0
                     DWORD 9
                     DWORD 10
                     DWORD 99
                     DWORD 100
                     DWORD 999
                     DWORD 1000
                     DWORD 9999
                     DWORD 10000
                     DWORD 99999
                     DWORD 100000
                     DWORD 999999
                     DWORD 1000000
                     DWORD 9999999
                     DWORD 10000000
                     DWORD 99999999
                     DWORD 100000000
                     DWORD 999999999
                     DWORD 1000000000
                     DWORD 4294967295

    NumFormat2       BYTE "%u",0
   
    Buffer2          BYTE 32 DUP(0)
   
; -------------------------<123456789012345678901234567890>------------------
    AlgoDesc2        BYTE  "wsprintf                      ",0
; ---------------------------------------------------------------------------

.code

align 4

    mov  AlgoSize, (EndAlgo2 - Algo2)

    jmp  Start2

Algo2:

align 4

AlgoN2 proc

; ----------------------------------------------------------------------
;  put here your code to test
; ----------------------------------------------------------------------


    mov ecx, 20
    lea eax, NumToTest2
  @@:
    push ecx
    push eax
    mov eax, [eax]
    INVOKE wsprintf, ADDR Buffer2, ADDR NumFormat2, eax
    pop eax
    add eax, 4
    pop ecx
    dec ecx
    jnz @B
   
    ret

; ----------------------------------------------------------------------
;  end point of algo to test
; ----------------------------------------------------------------------

AlgoN2 endp

Title: Re: This is too slow
Post by: GregL on November 18, 2010, 05:05:33 AM
Using udw2str from masm32 library.  No formatting options.

┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 05:02 GMT]─┐
│OS  : Microsoft Windows Vista Home Premium Edition, 32-bit Service Pack 2 (build 6002)  │
│CPU : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz with 2 logical core(s) with SSSE3      │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │   103   │   52.763 │   52.645 │   52.362 │   52.623 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str                        │    35   │    3.425 │    3.449 │    3.437 │    3.465 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤



; ---------------------------------------------------------------------------
;  Algo #02  to test and code to manage the display of the results.
; ---------------------------------------------------------------------------

.data
; ---------------------------------------------------------------------------
;  put here the description of your algo to test. max 30 chars.
; ---------------------------------------------------------------------------
ALIGN 4
    NumToTest2       DWORD 0
                     DWORD 9
                     DWORD 10
                     DWORD 99
                     DWORD 100
                     DWORD 999
                     DWORD 1000
                     DWORD 9999
                     DWORD 10000
                     DWORD 99999
                     DWORD 100000
                     DWORD 999999
                     DWORD 1000000
                     DWORD 9999999
                     DWORD 10000000
                     DWORD 99999999
                     DWORD 100000000
                     DWORD 999999999
                     DWORD 1000000000
                     DWORD 4294967295

   ;NumFormat2       BYTE "%u",0
   
    Buffer2          BYTE 32 DUP(0)
   
; -------------------------<123456789012345678901234567890>------------------
    AlgoDesc2        BYTE  "udw2str                      ",0
; ---------------------------------------------------------------------------

.code

align 4

    mov  AlgoSize, (EndAlgo2 - Algo2)

    jmp  Start2

Algo2:

align 4

AlgoN2 proc

; ----------------------------------------------------------------------
;  put here your code to test
; ----------------------------------------------------------------------


    mov ecx, 20
    lea eax, NumToTest2
  @@:
    push ecx
    push eax
    mov eax, [eax]
    INVOKE udw2str, eax, ADDR Buffer2
    pop eax
    add eax, 4
    pop ecx
    dec ecx
    jnz @B
   
    ret

; ----------------------------------------------------------------------
;  end point of algo to test
; ----------------------------------------------------------------------

AlgoN2 endp
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 10:55:33 AM
Thanks Greg. These two examples only convert the binary value into an ASCII string;
both need the second step in order to have a fair comparison.

By the way, the second function looks far better than the MACRO and the C function. I didn't know it
existed at all.  :red

I'll add the second step and post the results.   :U

Frank   
Title: Re: This is too slow
Post by: dedndave on November 18, 2010, 11:22:25 AM
hiyas Frank
masm32\help\masmlib.chm    :U
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 11:25:44 AM
Quote from: dedndave on November 18, 2010, 11:22:25 AM
hiyas Frank
masm32\help\masmlib.chm    :U

Yeah Dave, I had a look at it after Greg posted his results.  :thumbu
I thought the MACROS were all there was for doing the conversions, but there are
also MASM functions. Faster than the C equivalent, and smaller too, I guess.
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 11:58:05 AM
Strange results, less of a gain than expected:


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 11:55 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   45.734 │   45.647 │   45.622 │   45.484 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   45.346 │   45.243 │   45.188 │   45.282 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   51.642 │   51.644 │   51.640 │   51.642 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


Maybe I made a mistake translating the code Greg posted; the whole thing is attached.


Title: Re: This is too slow
Post by: hutch-- on November 18, 2010, 12:07:27 PM
Frank,

You will probably find that the API GetNumberFormat() is much slower than any number conversion algo and it will tend to level the times between different algos.
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 12:36:03 PM
Quote from: hutch-- on November 18, 2010, 12:07:27 PM
Frank,

You will probably find that the API GetNumberFormat() is much slower than any number conversion algo and it will tend to level the times between different algos.

Yes Steve, I guessed as much. That is the reason I called the prog TwoInOne: I'm sure
merging the two steps into one ASM PROC will give us much better results.

I'll start with a chunk of code Clive posted some months ago, when I was taking my first
steps in the MASM world. I hope I'm now able to adapt it to run inside the testbed and time it.


; EAX = 32-bit number
; ESI = string buffer for NUL terminated ASCII
; Uses ESI,EDI,EAX,ECX,EDX

    push 0 ; Mark stack end with NUL

divloop:

    mov  ecx,1000 ; Divide into 3 digit groups
    xor edx,edx ; Clear high order 32-bit for divide
    idiv ecx ; eax = edx:eax / ecx, edx = edx:eax % ecx

    mov edi,eax ; Save division result

    mov ecx,10 ; Subdivide in 10's
    mov eax,edx ; Get remainder

    or edi,edi ; Still number left, so at least 3 digits in remainder
    jnz digit000

    cmp eax,10 ; remainder has one digit
    jb  digit0

    cmp eax,100 ; remainder has two digits
    jb  digit00

digit000: ; 3 digits

    xor edx,edx ; Clear high order 32-bit for divide
    idiv ecx    ; eax = edx:eax / ecx, edx = edx:eax % ecx
    add edx,30h ; += '0'
    push edx ; Stack

digit00: ; 2 digits

    xor edx,edx ; Clear high order 32-bit for divide
    idiv ecx    ; eax = edx:eax / ecx, edx = edx:eax % ecx
    add edx,30h ; += '0'
    push edx    ; Stack

digit0: ; 1 digit

    xor edx,edx ; Clear high order 32-bit for divide
    idiv ecx    ; eax = edx:eax / ecx, edx = edx:eax % ecx (without it edx stays 0 and the digit is always '0')
    add edx,30h ; += '0'
    push edx    ; Stack

    mov eax,edi ; Recover remaining number

    or eax,eax ; Zero?
    jz poploop

    push  2Ch ; Comma added to groups of three digits
    jmp divloop

poploop:
    pop eax ; Recover next digit
    mov [esi],al ; Add to string
    inc esi
    or  eax,eax ; Was it a NUL?
    jnz poploop



After that I'll try something a bit more difficult: the reciprocal IMUL.  :eek
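
For readers who want the logic of the listing above without the register juggling, here is a hedged C sketch of the same idea: split the value into groups with a divide by 1000, push the characters least significant first, then read them back in reverse. The name `format_by_thousands` and the local array standing in for the CPU stack are assumptions of this example, not part of Clive's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; formats `value` with `sep` between 3-digit groups.
   `dst` must hold at least 14 bytes ("4.294.967.295" plus the NUL). */
void format_by_thousands(uint32_t value, char *dst, size_t cap, char sep)
{
    char stack[16];
    char *sp = stack + sizeof stack;   /* empty stack, grows downward like ESP */

    assert(cap >= 14);
    *--sp = '\0';                      /* push 0: marks the end of the string */

    for (;;) {
        uint32_t rest  = value / 1000; /* what is left after this group */
        uint32_t group = value % 1000; /* current group, 0..999 */
        int ndigits;

        /* Inner groups are zero padded to three digits; the leading
           group gets only the digits it needs. */
        if (rest != 0 || group >= 100)
            ndigits = 3;
        else if (group >= 10)
            ndigits = 2;
        else
            ndigits = 1;

        while (ndigits-- > 0) {        /* push digits, least significant first */
            *--sp = (char)('0' + group % 10);
            group /= 10;
        }
        if (rest == 0)
            break;
        *--sp = sep;                   /* push the separator before the next group */
        value = rest;
    }

    /* "Pop" everything at once: the top of the stack is the first character. */
    memcpy(dst, sp, (size_t)(stack + sizeof stack - sp));
}
```

Because the digits come out in reverse order, the stack removes the need for a separate string-reversal pass; the cost is one divide by 1000 per group plus up to three divides by 10.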

Frank
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 06:58:18 PM
And with the help of Clive I can now confirm that:


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 18:56 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   45.182 │   45.237 │   45.148 │   45.400 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   44.971 │   45.064 │   45.037 │   45.224 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   52.046 │   51.991 │   52.041 │   51.994 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.187 │    3.185 │    3.150 │    3.185 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


That's a lot of gain already, and this is not the fastest around.  Maybe we can expect to go below 2000 with
the appropriate algo.

Frank
Title: Re: This is too slow
Post by: GregL on November 18, 2010, 09:43:15 PM
Quote from: frktons
both need the second step in order to have a fair comparison.

Just what special format do you need an unsigned integer to be in?

Nevermind, I looked at Clive's code, you want commas (or whatever) between each group of three digits.  Why didn't you say that?
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 09:48:25 PM
Quote from: GregL on November 18, 2010, 09:43:15 PM
Quote
both need the second step in order to have a fair comparison.

Just what special format do you need an unsigned integer to be in?

unsigned integer = 4294967295
ASCII formatted =  4.294.967.295

This is the reason for using: GetNumberFormat
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 10:28:18 PM
It looks like the bigger the prog, the faster it gets  :P


┌─────────────────────────────────────────────────────────────[18-Nov-2010 at 22:25 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   45.421 │   45.113 │   45.035 │   44.975 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   44.840 │   44.676 │   45.928 │   44.803 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   51.733 │   51.713 │   52.677 │   51.742 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.186 │    3.125 │    3.207 │    3.167 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    2.110 │    2.101 │    2.111 │    2.049 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


We are nearing the limits probably  :bg
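
The reciprocal IMUL routine itself is not shown in this thread, so the following is only a sketch of the general technique, not Clive's code: a division by a constant is replaced by a multiplication with a scaled reciprocal followed by a shift. The constants 0xCCCCCCCD (for 10) and 0x10624DD3 (for 1000) are the usual ones for 32-bit unsigned values; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* n / 10 via multiplication: 0xCCCCCCCD is ceil(2^35 / 10). */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* n / 1000 via multiplication: 0x10624DD3 is ceil(2^38 / 1000). */
uint32_t div1000(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0x10624DD3u) >> 38);
}

/* Compare against ordinary division on boundary values taken from the
   thread's test array. */
void check_reciprocals(void)
{
    static const uint32_t samples[] = {
        0u, 9u, 10u, 999u, 1000u, 99999u, 999999999u, 1000000000u, 4294967295u
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(div10(samples[i])   == samples[i] / 10);
        assert(div1000(samples[i]) == samples[i] / 1000);
    }
}
```

On x86 the 64-bit product corresponds to the EDX:EAX result of a 32-bit multiply, so the shift by 35 or 38 becomes a shift of EDX by 3 or 6. The remainder, when needed, is `n - q * 10` (or `n - q * 1000`).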
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 11:03:31 PM
Quote from: oex on November 18, 2010, 10:58:41 PM
Hey Frank, just a thought for your app.... Honestly I haven't used the working version yet, the copy I downloaded was the initial test that didn't work but I was looking at the output and thinking that you could highlight or just highlight :lol the relevant copied info ie best results for forum posts....

I have noticed you were talking about clipboard copying so that should be easy?

I'm not sure I got what you mean. In the last release of the testbed, in my previous post,
there is the [C] option to copy the results wrapped in a pair of tags. You then just
paste the content of the clipboard into the forum and that's all.

Give it a try and let me know what you meant.
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 11:05:24 PM
Quote from: GregL on November 18, 2010, 09:43:15 PM

Nevermind, I looked at Clive's code, you want commas (or whatever) between each group of three digits.  Why didn't you say that?

Sorry Greg, I thought it was clear from the code posted, but I was apparently wrong.  :P

Thanks for making that  point clear.  :U
Title: Re: This is too slow
Post by: hutch-- on November 18, 2010, 11:12:24 PM
Frank,

Here is a quick scruffy that may do the job for you. It could be tweaked a bit more but it should be reasonably fast.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    format_num_string PROTO :DWORD,:DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL buffer[64]:BYTE
    LOCAL pbuf  :DWORD

    mov pbuf, ptr$(buffer)

    fn format_num_string,"1234567890",pbuf
    print pbuf,13,10

    fn format_num_string,"123456789",pbuf
    print pbuf,13,10

    fn format_num_string,"12345678",pbuf
    print pbuf,13,10

    fn format_num_string,"1234567",pbuf
    print pbuf,13,10

    fn format_num_string,"123456",pbuf
    print pbuf,13,10

    fn format_num_string,"12345",pbuf
    print pbuf,13,10

    fn format_num_string,"1234",pbuf
    print pbuf,13,10

    fn format_num_string,"123",pbuf
    print pbuf,13,10

    fn format_num_string,"12",pbuf
    print pbuf,13,10

    fn format_num_string,"1",pbuf
    print pbuf,13,10


    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

format_num_string proc src:DWORD,dst:DWORD

    push ebx
    push esi
    push edi

  ; -----------------
  ; get source length
  ; -----------------
    mov ebx, src
    sub ebx, 1
  @@:
    add ebx, 1
    cmp BYTE PTR [ebx], 0
    jne @B
    sub ebx, src
  ; -----------------

    .data
  ; --------------------------------------------------
  ; store the initial spacing counter value in a table
  ; --------------------------------------------------
      align 4
      tbl1 dd 0,0,0,0,1,2,3,1,2,3,1,0

;     1=0 0
;     2=0 00
;     3=0 000
;     4=1 0000
;     5=2 00000
;     6=3 000000
;     7=1 0000000
;     8=2 00000000
;     9=3 000000000
;    10=1 0000000000

    .code

    mov ebx, [tbl1+ebx*4]

    mov esi, src
    mov edi, dst
    sub esi, 1

  stlp:
    add esi, 1
    movzx eax, BYTE PTR [esi]
    test eax, eax
    jz bye
    mov [edi], al
    add edi, 1

    sub ebx, 1                  ; dec the spacing counter
    jnz stlp                    ; loop back if its not zero

    cmp BYTE PTR [esi+1], 0     ; 1 byte look ahead
    je bye                      ; exit if char its zero terminator

    mov BYTE PTR [edi], ","     ; change the character here
    add edi, 1
    mov ebx, 3                  ; reset the spacing counter to 3

    jmp stlp

  bye:
    mov BYTE PTR [edi], 0

    pop edi
    pop esi
    pop ebx

    ret

format_num_string endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: This is too slow
Post by: oex on November 18, 2010, 11:13:48 PM
Quote from: frktons on November 18, 2010, 10:28:18 PM
│02 udw2str + GetNumberFormat      │    65   │   44.840 │   44.676 │   45.928 │   44.803 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   51.733 │   51.713 │   52.677 │   51.742 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.186 │    3.125 │    3.207 │    3.167 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    2.110 │    2.101 │    2.111 │    2.049 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤

I was thinking of something along the lines of the above, but I removed the post because code tags can't contain other tags, so the formatting is lost.
Title: Re: This is too slow
Post by: Antariy on November 18, 2010, 11:22:11 PM
Quote from: oex on November 18, 2010, 11:13:48 PM
Quote from: frktons on November 18, 2010, 10:28:18 PM
│02 udw2str + GetNumberFormat      │    65   │   44.840 │   44.676 │   45.928 │   44.803 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   51.733 │   51.713 │   52.677 │   51.742 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.186 │    3.125 │    3.207 │    3.167 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    2.110 │    2.101 │    2.111 │    2.049 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤

I was thinking of something along the lines of the above, but I removed the post because code tags can't contain other tags, so the formatting is lost.

That would require saving the previous clock counts, scanning them after all the tests have run, and then scanning the output string buffer to find the place where the [b][/b] tags need to be inserted. That would be relatively slow for nothing :P I guess Frank does not want to do work that people can do themselves  :lol
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 11:27:17 PM
Quote from: hutch-- on November 18, 2010, 11:12:24 PM
Frank,

Here is a quick scruffy that may do the job for you. It could be tweaked a bit more but it should be reasonably fast.


Thanks Steve.  :U

To see if it is reasonably fast and how fast it is, you should insert it into the testbed.  :bg

It should be easy enough for everybody here to do it. It is much harder for me to convert
all the code posted into a suitable form.  :(

Please everybody, why don't you start to use the testbed and get used to it? It is not that hard I guess.

Quote from: Antariy on November 18, 2010, 11:22:11 PM
I guess Frank does not want to do work that people can do themselves  :lol

Yeah !!!! that's the point my friend.  :U


Frank
Title: Re: This is too slow
Post by: hutch-- on November 18, 2010, 11:40:39 PM
 :bg

I am hard to get work out of.

Here is the same algo tidied up a bit with the stack frame removed and register usage reduced.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    .data
  ; --------------------------------------------------
  ; store the initial spacing counter value in a table
  ; --------------------------------------------------
      align 4
      tbl1 dd 0,0,0,0,1,2,3,1,2,3,1,0

;     1=0 0
;     2=0 00
;     3=0 000
;     4=1 0000
;     5=2 00000
;     6=3 000000
;     7=1 0000000
;     8=2 00000000
;     9=3 000000000
;    10=1 0000000000

    .code

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

format_num_string proc src:DWORD,dst:DWORD

  ; -----------------
  ; get source length
  ; -----------------
    mov ecx, [esp+4]
    sub ecx, 1
  @@:
    add ecx, 1
    cmp BYTE PTR [ecx], 0
    jne @B
    sub ecx, [esp+4]
  ; -----------------

    push esi

    mov ecx, [tbl1+ecx*4]       ; set the initial spacing from the table

    mov esi, [esp+4][4]
    mov edx, [esp+8][4]
    sub esi, 1

  stlp:
    add esi, 1
    movzx eax, BYTE PTR [esi]
    test eax, eax
    jz bye
    mov [edx], al
    add edx, 1
    sub ecx, 1                  ; dec the spacing counter
    jnz stlp                    ; loop back if its not zero

    cmp BYTE PTR [esi+1], 0     ; 1 byte look ahead
    je bye                      ; exit if char its zero terminator
    mov BYTE PTR [edx], ","     ; write the spacer. <<<<<< change the character here
    add edx, 1
    mov ecx, 3                  ; reset the spacing counter to 3
    jmp stlp

  bye:
    mov BYTE PTR [edx], 0       ; write terminator
    pop esi
    ret 8

format_num_string endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: This is too slow
Post by: frktons on November 18, 2010, 11:51:29 PM
To have a standard test all the routines have to convert the same array of values:

    NumToTest        DWORD 0
                     DWORD 9
                     DWORD 10
                     DWORD 99
                     DWORD 100
                     DWORD 999
                     DWORD 1000
                     DWORD 9999
                     DWORD 10000
                     DWORD 99999
                     DWORD 100000
                     DWORD 999999
                     DWORD 1000000
                     DWORD 9999999
                     DWORD 10000000
                     DWORD 99999999
                     DWORD 100000000
                     DWORD 999999999
                     DWORD 1000000000
                     DWORD 4294967295


With a common data set the measurements are comparable, and the routine can be inserted into the testbed.

There are 2 files to get used to the testbed:

Readme.txt  and  Info Screen displayed with the key when
the program shows the results.

This is the final release of the testbed. If you want to use it for any purpose, get used to it.

The sources, the info, the screens, everything is included in the zip file.

Enjoy it and post your results. Press [C] and paste the content of the clipboard into the forum.
That's all.

Frank
Title: Re: This is too slow
Post by: GregL on November 19, 2010, 12:04:53 AM
Frank,

I was just being dense, damn painkillers.  I wondered why the heck you were calling GetNumberFormat.  At least I used the testbed :bg

Also, some of us like commas and others like periods for the separators.  Use GetLocaleInfo with LCType flag set to LOCALE_STHOUSAND.
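
As a portable stand-in for that idea (this is the C standard library, not the Win32 GetLocaleInfo call), the separator can be read from the current locale with `localeconv()`. The function name `thousands_separator` and the fallback parameter are assumptions of this sketch.

```c
#include <assert.h>
#include <locale.h>

/* Return the first byte of the current C locale's thousands separator,
   or `fallback` when the locale defines none (as the "C" locale does). */
char thousands_separator(char fallback)
{
    const struct lconv *lc = localeconv();
    assert(lc != NULL);
    return lc->thousands_sep[0] != '\0' ? lc->thousands_sep[0] : fallback;
}
```

A program would typically call `setlocale(LC_ALL, "")` first to pick up the user's settings. Only the first byte is used here, so a locale whose separator is a multi-byte sequence would need more care.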

Title: Re: This is too slow
Post by: frktons on November 19, 2010, 12:14:17 AM
Quote from: GregL on November 19, 2010, 12:04:53 AM
Frank,

I was just being dense, damn painkillers.  I wondered why the heck you were calling GetNumberFormat.  At least I used the testbed :bg

Also, some of us like commas and others like periods for the separators.  Use GetLocaleInfo with LCType flag set to LOCALE_STHOUSAND.

I used GetNumberFormat to have the opportunity to make this thread and select the best code around to replace it.  :bg

I'm glad to know that you started to use it.  :U

Feel free to post the replacement code when you get rid of the painkillers, and I'll gladly add it. For the time being I'm quite tired and it is
night here. Tomorrow I'll have a look at it. By the way, the simplest solution is to replace this line yourself:


    Tsep        DD  ".",0   ;   used for thousand number separator - choose yours


with this


    Tsep        DD  ",",0   ;   used for thousand number separator - choose yours


As the comment says, choose yours.  :lol

Frank.
Title: Re: This is too slow
Post by: GregL on November 19, 2010, 12:19:27 AM
Frank,

The painkillers won't be going away any time soon. Have a good night.

Title: Re: This is too slow
Post by: frktons on November 19, 2010, 12:23:31 AM
Quote from: GregL on November 19, 2010, 12:19:27 AM
Frank,

The painkillers won't be going away any time soon. Have a good night.

Good night Greg, I'll stay some more time around. When I feel my eyes are closing, I'll go.  :P
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 12:27:51 AM
I guess Greg's suggestion is simple to implement and worthwhile: the Western and European separator formats are different.

And adding this code at the start of the testbed seems easy...

invoke GetLocaleInfo,LOCALE_USER_DEFAULT,LOCALE_STHOUSAND,offset Tsep,4


...But I get a non-breaking space as the thousands separator. That is right to some degree, but most users here use "." as the thousands separator.
And the main point is: in the OEM encoding the non-breaking space has a different code, so the value returned by GetLocaleInfo should be translated from ANSI to OEM, otherwise the separator looks like a letter and makes a mess.
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 12:33:36 AM
Quote from: Antariy on November 19, 2010, 12:27:51 AM
I guess Greg's suggestion is simple to implement and worthwhile: the Western and European separator formats are different.

And adding this code at the start of the testbed seems easy...

push eax
mov edx,esp
invoke GetLocaleInfo,LOCALE_USER_DEFAULT,LOCALE_STHOUSAND,edx,4
pop Tsep

...But I get a non-breaking space as the thousands separator. That is right to some degree, but most users here use "." as the thousands separator.
And the main point is: in the OEM encoding the non-breaking space has a different code, so the value returned by GetLocaleInfo should be translated from ANSI to OEM, otherwise the separator looks like a letter and makes a mess.

Alex, if you feel like it, please try it in the last posted release and see what you get; afterwards others can test it
and tell us if it works for different countries as well.

If you do, post the new package, or only the part you changed, and we'll give it a try.

Frank
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 12:44:44 AM
Quote from: frktons on November 19, 2010, 12:33:36 AM
Alex, if you feel like it, please try it in the last posted release and see what you get; afterwards others can test it
and tell us if it works for different countries as well.

Frank, if we do it that way, we need to translate ANSI to OEM to display things. Otherwise you can get this (as I did):

├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │  184а052 │  182а061 │  180а817 │  184а765 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │  181а947 │  180а363 │  181а235 │  181а365 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


CharToOem is a good way to do it.
I get

├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   74 845 │   73 199 │   72 696 │   71 789 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   71 465 │   72 461 │   72 418 │   72 388 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   90 419 │   90 686 │   89 978 │   89 189 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


For this code

mov ebx,offset Tsep
invoke GetLocaleInfo,LOCALE_USER_DEFAULT,LOCALE_STHOUSAND,ebx,4
invoke CharToOem,ebx,ebx




Alex
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 12:52:12 AM
Thanks Alex.

As I said before:

Quote
By the way, the simplest solution is to replace this line yourself:


    Tsep        DD  ".",0   ;   used for thousand number separator - choose yours


with this


    Tsep        DD  ",",0   ;   used for thousand number separator - choose yours


As the comment says, choose yours.  :lol

If somebody posts a working solution, I'll gladly insert it into testbed  :U
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 12:57:37 AM
Quote from: frktons on November 19, 2010, 12:52:12 AM
If somebody posts a working solution, I'll gladly insert it into testbed  :U


........
Main PROC
mov ebx,offset Tsep  ; THIS IS INSERTED
invoke GetLocaleInfo,LOCALE_USER_DEFAULT,LOCALE_STHOUSAND,ebx,4  ; THIS IS INSERTED
invoke CharToOem,ebx,ebx  ; THIS IS INSERTED

    mov RowInitialFile, One   
    mov RowFinalFile,   MaxRows   
    mov ColInitialFile, One   
    mov ColFinalFile,   MaxCols 
.........


:P



Alex
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:00:00 AM
Quote from: Antariy on November 19, 2010, 12:57:37 AM

........
Main PROC
invoke GetLocaleInfo,LOCALE_USER_DEFAULT,LOCALE_STHOUSAND,offset Tsep,4 ; THIS IS INSERTED
invoke CharToOem,offset Tsep,offset Tsep ; THIS IS INSERTED

    mov RowInitialFile, One   
    mov RowFinalFile,   MaxRows   
    mov ColInitialFile, One   
    mov ColFinalFile,   MaxCols 
.........


:P
Alex

The results you posted don't show point or comma separators, so what kind of display do we get
with these two added instructions?

Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 01:04:03 AM
Quote from: frktons on November 19, 2010, 01:00:00 AM
The results you posted don't show point or comma separators, so what kind of display do we get
with these two added instructions?

That is a question for MS: why do they think a non-breaking space is used here as the thousands separator? Probably they know better which kind of separator European people use.
This code should fetch the thousands separator and convert it to OEM. And it does: it returns the separator that the OS provides for the locale settings. This is not a bug in the code; the OS decides which char to return.



Alex
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:07:10 AM
Quote from: Antariy on November 19, 2010, 01:04:03 AM
That is a question for MS: why do they think a non-breaking space is used here as the thousands separator? Probably they know better which kind of separator European people use.
This code should fetch the thousands separator and convert it to OEM. And it does: it returns the separator that the OS provides for the locale settings. This is not a bug in the code; the OS decides which char to return.

Alex

On my pc I get:



┌─────────────────────────────────────────────────────────────[19-Nov-2010 at 01:07 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   45.524 │   45.230 │   45.316 │   45.084 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   45.406 │   45.070 │   45.114 │   45.248 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   52.217 │   52.330 │   52.267 │   52.196 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.006 │    3.008 │    3.000 │    3.006 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    1.966 │    1.984 │    1.998 │    1.951 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤

Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 01:09:50 AM
Quote from: frktons on November 19, 2010, 01:07:10 AM
On my pc I get:

And these are the right results for a European country  :wink
Title: Re: This is too slow
Post by: GregL on November 19, 2010, 01:10:33 AM
I get a comma.

Quote from: Alex
... but most of users use "." as thousands separator here.
Australia, Canada, U.S.A. and UK among others use a comma.
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:11:10 AM
Quote from: Antariy on November 19, 2010, 01:09:50 AM
And these are the right results for a European country  :wink

Now let's see what other countries get  :lol

Thanks Alex, always very helpful.  :U
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 01:11:26 AM

┌─────────────────────────────────────────────────────────────[19-Nov-2010 at 01:12 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Celeron(R) CPU 2.13GHz with 1 logical core(s) with SSE3                  │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   74 098 │   72 932 │   72 492 │   72 319 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   71 791 │   71 894 │   71 039 │   70 818 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   90 070 │   90 501 │   91 407 │   90 484 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    9 060 │    9 605 │    8 791 │    8 743 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    3 785 │    3 444 │    3 575 │    3 566 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤

Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:12:58 AM
Quote from: GregL on November 19, 2010, 01:10:33 AM
I get a comma.

Quote from: Alex
... but most of users use "." as thousands separator here.
Australia, Canada, U.S.A. and UK among others use a comma.


About half the population uses a comma and about half uses a point, more or less; now everybody should be happy  :lol
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 01:13:15 AM
Quote from: GregL on November 19, 2010, 01:10:33 AM
I get a comma.

Quote from: Alex
... but most of users use "." as thousands separator here.
Australia, Canada, U.S.A. and UK among others use a comma.

When I said "here" I meant here - in my country and in Europe.  :lol



Alex
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:13:40 AM
Quote from: Antariy on November 19, 2010, 01:11:26 AM

┌─────────────────────────────────────────────────────────────[19-Nov-2010 at 01:12 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Celeron(R) CPU 2.13GHz with 1 logical core(s) with SSE3                  │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   74 098 │   72 932 │   72 492 │   72 319 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   71 791 │   71 894 │   71 039 │   70 818 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   90 070 │   90 501 │   91 407 │   90 484 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    9 060 │    9 605 │    8 791 │    8 743 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    3 785 │    3 444 │    3 575 │    3 566 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤



Alex, do you use a space separator in your country?
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 01:17:10 AM
Quote from: frktons on November 19, 2010, 01:13:40 AM
Alex, do you use a space separator in your country?

Should be

Quote from: frktons on November 19, 2010, 01:13:40 AM
Alex, MS thinks that in your country you use a space separator.

:bg
Title: Re: This is too slow
Post by: GregL on November 19, 2010, 01:17:52 AM
Quote from: Alex
When I said "here" I meant here - in my country and in Europe.  :lol

Oh, I thought you meant users here in the forum. :lol

Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:18:56 AM
Quote from: Antariy on November 19, 2010, 01:17:10 AM
Quote from: frktons on November 19, 2010, 01:13:40 AM
Alex, do you use a space separator in your country?

Should be

Quote from: frktons on November 19, 2010, 01:13:40 AM
Alex, MS thinks that in your country you use a space separator.

:bg

:lol :lol :lol :dazzled: :dazzled: :dazzled: :lol :lol :lol :dazzled: :dazzled: :dazzled: :dazzled: :lol :lol :lol

Going to sleep now. Enjoy.

Frank
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 01:23:02 AM
Quote from: GregL on November 19, 2010, 01:17:52 AM
Oh, I thought you meant users here in the forum. :lol

:bg
Title: Re: This is too slow
Post by: dedndave on November 19, 2010, 03:36:03 AM
i am sure there is a function you can call to get the right separator for the user's country/code page   :P
Title: Re: This is too slow
Post by: dedndave on November 19, 2010, 03:43:01 AM
well - it is in the  registry

[HKEY_CURRENT_USER\Control Panel\International]
"sMonThousandSep"=","
Title: Re: This is too slow
Post by: oex on November 19, 2010, 03:50:43 AM
:bg


CountryThousandsSeperatorCode PROC USES esi

print "Hello"
mov esi, input("What Country Thousands Seperator Code Are You Looking For? ")
print "Whiz Bang Whir"
print "I'm sorry I dont have that Country Thousands Seperator Code"
mov esi, input("Please Enter your Country Thousands Seperator Code ")
print "Your Country Thousands Seperator Code is: "
print esi

ret

CountryThousandsSeperatorCode ENDP
Title: Re: This is too slow
Post by: dedndave on November 19, 2010, 03:53:58 AM
that'll work - lol

but, i was thinking of a little routine during init that reads the registry value and stores it   :P
then, the conversion routine can grab the stored value, or it can be passed as a parm
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 06:52:43 AM
Quote from: dedndave on November 19, 2010, 03:53:58 AM
that'll work - lol

but, i was thinking of a little routine during init that reads the registry value and stores it   :P
then, the conversion routine can grab the stored value, or it can be passed as a parm

This feature has already been implemented. Do you want to replace the current working one
with some other weird one?  :lol
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 08:40:50 AM
Having a look at Hutch's example, it looks like the code is using an "already converted
unsigned dword into a string":

   fn format_num_string,"1234567890",pbuf


and this is not the task we are trying to accomplish.
The task here is to convert an unsigned dword into an ASCII string with thousand separator.

So the starting point has to be an array of unsigned dword values, as stated in the first post.

Frank
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:13:15 PM
Just for fun I tried to implement Hutch's code, and guess what?
He got a good result, considering he is working with a two-step
algo:


┌─────────────────────────────────────────────────────────────[19-Nov-2010 at 13:09 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   44.782 │   45.197 │   44.310 │   44.255 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   43.805 │   45.017 │   43.724 │   43.839 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   51.995 │   51.742 │   50.983 │   50.973 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.029 │    3.007 │    3.035 │    3.016 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    2.004 │    1.987 │    1.954 │    1.989 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │    5.792 │    5.901 │    5.862 │    5.783 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


Congrats Hutch, if you try harder, you can get even better than that.  :P

Frank
Title: Re: This is too slow
Post by: hutch-- on November 19, 2010, 01:37:17 PM
 :bg

ustr$() is a MSVCRT function call. The "format_num_string" was designed to do just what it says.
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 01:44:21 PM
Quote from: hutch-- on November 19, 2010, 01:37:17 PM
:bg

ustr$() is a MSVCRT function call. The "format_num_string" was designed to do just what it says.

I think that combining together the udw2str code and the one you have used to
format the string, you could get results that are 30-40% faster than the combination of
ustrv$ and your formatting algo.  :U
Title: Re: This is too slow
Post by: hutch-- on November 19, 2010, 01:56:50 PM
I think you could combine the numeric conversion and the output formatting with some reasonable gains but the formatting algo is useful in its own right and may end up in a library. Depending on how you do the conversion you can save the length check as you should have the length from the conversion.
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 02:09:48 PM
Quote from: hutch-- on November 19, 2010, 01:56:50 PM
I think you could combine the numeric conversion and the output formatting with some reasonable gains but the formatting algo is useful in its own right and may end up in a library. Depending on how you do the conversion you can save the length check as you should have the length from the conversion.

Considering that udw2str uses a magic number:
    mov ecx,429496730

and it is .386 compatible, there is probably enough room for optimizing the conversion just using
MMX or XMM registers and SSE2 and upwards opcodes. An entire thousand separated unsigned dword
uses only 14 bytes, including the NULL terminator. And an XMM register can hold up to 16 bytes.
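
The magic number is 429496730 = ceil(2^32 / 10): multiplying by it and keeping the high dword of the product stands in for a division by 10. What follows is only a C sketch of that arithmetic (C rather than MASM so it can be compiled anywhere; the function names are made up for illustration and are not part of m32lib). Working through the rounding error, the high dword matches n / 10 for every value in the test array, but the first input where it is off by one is 1073741829. The wider constant 0CCCCCCCDh = ceil(2^35 / 10) with a 35-bit shift stays exact over the whole 32-bit range.

```c
#include <assert.h>
#include <stdint.h>

/* High dword of n * 429496730, i.e. what MUL ECX / MOV EAX,EDX leaves in EAX. */
static uint32_t div10_udw2str(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 429496730u) >> 32);
}

/* Wider reciprocal: ceil(2^35 / 10) = 0xCCCCCCCD, then shift right by 35. */
static uint32_t div10_exact(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Returns 1 when both reciprocals agree with a real division for n. */
static int div10_agree(uint32_t n)
{
    return div10_udw2str(n) == n / 10u && div10_exact(n) == n / 10u;
}

/* The larger values of the test array all come out right with either constant. */
static void div10_selfcheck(void)
{
    assert(div10_agree(999999999u));
    assert(div10_agree(1000000000u));
    assert(div10_agree(4294967295u));
}
```

So the 32-bit constant is fine for the benchmark values, but a routine meant for the full unsigned range would probably want the wider constant or a correction step.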

I have to think a lot about this simple task. Maybe the limits are still far from what we have got so far.

By the way, the tests started yesterday, there is a lot of time ahead.  :P



Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 09:20:33 PM
Quote from: dedndave on November 19, 2010, 03:36:03 AM
i am sure there is a function you can call to get the right separator for the user's country/code page   :P

Dave, do you make suggestions before checking them???

GetLocaleInfo returns the right separator, and IT GETS IT FROM THE REGISTRY. So you could also use the RegOpenKey, RegQueryValue and RegCloseKey APIs to do this.

I prefer to use one (one!) API for the same result :P



Alex
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 09:23:14 PM
Quote from: dedndave on November 19, 2010, 03:53:58 AM
but, i was thinking of a little routine during init that reads the registry value and stores it   :P
then, the conversion routine can grab the stored value, or it can be passed as a parm

The same: read the post above. Of course, making things harder is very interesting, though :P
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 09:30:15 PM
I'm tempted to try a combination of algos and see what I get.

First: a lookup table of initialized strings four bytes long, each holding a group of three digits and the separator.
Divide the number by 1000 and use the remainder as an index into the lookup table to get the
sequence of digits. Push the 4 bytes onto the stack and go on to the next division, checking if the
number is > 999 to perform the division by 1000, or use it directly as the table index.

This should be fast enough, I guess: at most 3 integer divisions, so IDIV. Pushing and popping
4 bytes at a time, and building the final formatted string.
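
The table idea can be sketched in C as follows. This is only an illustration under a few assumptions: the names group_tab, init_group_tab and u32_to_grouped are invented here, the table is filled at start-up instead of being declared as initialized data, the C division operator stands in for IDIV, and the groups are written right to left into a temporary buffer rather than pushed on the stack. The table is 1000 entries of 4 bytes, 4000 bytes in all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static char group_tab[1000][4];   /* separator + 3 digits per entry: 4000 bytes */

/* Fill every entry with the separator followed by three zero-padded digits. */
static void init_group_tab(char sep)
{
    for (int i = 0; i < 1000; i++) {
        group_tab[i][0] = sep;
        group_tab[i][1] = (char)('0' + i / 100);
        group_tab[i][2] = (char)('0' + (i / 10) % 10);
        group_tab[i][3] = (char)('0' + i % 10);
    }
}

/* Writes n with thousands separators into out, which must hold 14 bytes. */
static char *u32_to_grouped(uint32_t n, char *out)
{
    char tmp[16];
    char *p = tmp + sizeof tmp;

    assert(out != NULL);
    *--p = '\0';

    /* At most three divisions: each one peels off a full "sep + 3 digits" group. */
    while (n > 999) {
        uint32_t q = n / 1000;
        p -= 4;
        memcpy(p, group_tab[n - q * 1000], 4);
        n = q;
    }

    /* Leading group: skip the separator and any leading zeros. */
    int skip = (n >= 100) ? 1 : (n >= 10) ? 2 : 3;
    p -= 4 - skip;
    memcpy(p, group_tab[n] + skip, (size_t)(4 - skip));

    strcpy(out, p);
    return out;
}
```

After a single call to init_group_tab(',') the routine turns 4294967295 into 4,294,967,295; passing '.' to the initializer switches the separator without touching the conversion code.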

I'll try it in the next few days, as I've got enough time.

Frank
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 09:33:10 PM
Quote from: frktons on November 19, 2010, 09:30:15 PM
This should be fast enough, I guess: at most 3 integer divisions, so IDIV. Pushing and popping
4 bytes at a time, and building the final formatted string.

This will be fast, but will require ~4KB table of numbers.  :eek  :lol
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 09:34:42 PM
Quote from: Antariy on November 19, 2010, 09:33:10 PM
This will be fast, but will require ~4KB table of numbers.  :eek  :lol

I can afford this, and maybe more ....  :P
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 09:37:34 PM
Quote from: frktons on November 19, 2010, 09:34:42 PM
Quote from: Antariy on November 19, 2010, 09:33:10 PM
This will be fast, but will require ~4KB table of numbers.  :eek  :lol

I can afford this, and maybe more ....  :P

:P  :lol
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 09:40:45 PM
After that experiment, I'd like to try what happens using MMX and XMM registers to hold data
before filling the formatted string. I have to see some SSE2/3 opcodes that can suit the task.

Not sure at the moment how to do it, but I have a vague intuition something can be done in
a very effective way.  :lol

Probably prefilling an XMM register with the separators, depending on the magnitude of the
number to format, and afterwards filling the appropriate bytes with the digits extracted with "magic numbers" or
anything fast enough.  ::)
Title: Re: This is too slow
Post by: hutch-- on November 19, 2010, 10:02:18 PM
For number conversions I have a faster signed DWORD version that was written by Paul Dixon. This may be useful for some of the tasks you have in mind. It also passes exhaustive testing over the full signed range.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 16

ltoa_ex proc LongVar:DWORD,answer:DWORD

  ; --------------------------------------------------------------------------------
  ; this algorithm was written by Paul Dixon and has been converted to MASM notation
  ; --------------------------------------------------------------------------------

    push esi
    push edi

    mov eax, [esp+4+8]          ; LongVar            ; get number
    mov ecx, [esp+8+8]          ; answer             ; get pointer to answer string
    jmp over

    align 16
    chartab:
      dd "00","10","20","30","40","50","60","70","80","90"
      dd "01","11","21","31","41","51","61","71","81","91"
      dd "02","12","22","32","42","52","62","72","82","92"
      dd "03","13","23","33","43","53","63","73","83","93"
      dd "04","14","24","34","44","54","64","74","84","94"
      dd "05","15","25","35","45","55","65","75","85","95"
      dd "06","16","26","36","46","56","66","76","86","96"
      dd "07","17","27","37","47","57","67","77","87","97"
      dd "08","18","28","38","48","58","68","78","88","98"
      dd "09","19","29","39","49","59","69","79","89","99"

  over:
    ; on entry eax=number to convert, ecx=pointer to answer buffer (minimum 12 bytes)
    ; on exit, eax,ecx,edx are undefined, all other registers are preserved.
    ; answer is in location pointed to by ecx on entry

  signed:
    ; do a signed DWORD to ASCII
    or eax,eax                          ; test sign
    jns udword                          ; if +ve, continue as for unsigned
    neg eax                             ; else, make number positive
    mov byte ptr [ecx],"-"              ; include the - sign
    add ecx, 1                          ; update the pointer

  udword:
    ; unsigned DWORD to ASCII
    mov esi,ecx                         ; get pointer to answer
    mov edi,eax                         ; save a copy of the number

    mov edx, 0D1B71759h                 ; =2^45\10000    13 bit extra shift
    mul edx                             ; gives 6 high digits in edx

    mov eax, 068DB9h                    ; =2^32\10000+1

    shr edx,13                          ; correct for multiplier offset used to give better accuracy
    jz skiphighdigits                   ; if zero then don't need to process the top 6 digits

    mov ecx,edx                         ; get a copy of high digits
    imul ecx,10000                      ; scale up high digits
    sub edi,ecx                         ; subtract high digits from original. EDI now = lower 4 digits

    mul edx                             ; get first 2 digits in edx
    mov ecx,100                         ; load ready for later

    jnc next1                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZeroSupressed                   ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS1                             ; continue with pairs of digits to the end

  next1:
    mul ecx                             ; get next 2 digits
    jnc next2                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS1a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS2                             ; continue with pairs of digits to the end

  next2:
    mul ecx                             ; get next 2 digits
    jnc short next3                     ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS2a                            ; 2 digits, just continue with pairs of digits to the end
     
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS3                             ; continue with pairs of digits to the end

  next3:

  skiphighdigits:
    mov eax,edi                         ; get lower 4 digits

    mov ecx,100

    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx
    jnc next4                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS3a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp  ZS4                            ; continue with pairs of digits to the end

    next4:
    mul ecx                             ; this is the last pair so don't suppress a single zero
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS4a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    mov byte ptr [esi+1],0              ; zero terminate string

    jmp  xit                            ; all done

  ZeroSupressed:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx
    add esi,2                           ; write them to answer

  ZS1:
    mul ecx                             ; get next 2 digits
    ZS1a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS2:
    mul ecx                             ; get next 2 digits
    ZS2a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS3:
    mov eax,edi                         ; get lower 4 digits
    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx                             ; edx= top pair
    ZS3a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write to answer
    add esi,2                           ; update pointer

  ZS4:
    mul ecx                             ; get final 2 digits
    ZS4a:
    mov edx,chartab[edx*4]              ; look them up
    mov [esi],dx                        ; write to answer

    mov byte ptr [esi+2],0              ; zero terminate string

  xit:
  sdwordend:

    pop edi
    pop esi

    ret 8

ltoa_ex endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
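
The arithmetic behind the routine above can be followed more easily in a C sketch. It is an illustration only, not a port: the names pair_tab, put_pair and u32_to_10digits are invented here, it always writes ten zero-padded digits, and it leaves out the sign handling and the leading-zero suppression that ltoa_ex performs. The constants are the ones in the listing: 0D1B71759h with a 45-bit shift splits off n / 10000, and the products with 068DB9h and 028F5C29h are treated as fixed-point values whose high dword is the next pair of digits and whose low dword is the remaining fraction, multiplied by 100 for each further pair.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "00".."99" laid out back to back; plays the role of chartab. */
static const char pair_tab[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Copy one two-digit pair and return the advanced output pointer. */
static char *put_pair(char *p, uint32_t pair)
{
    assert(pair < 100u);
    memcpy(p, pair_tab + 2u * pair, 2);
    return p + 2;
}

/* Writes n as exactly 10 zero-padded digits plus NUL; out must hold 11 bytes. */
static void u32_to_10digits(uint32_t n, char *out)
{
    uint32_t hi = (uint32_t)(((uint64_t)n * 0xD1B71759u) >> 45); /* n / 10000 */
    uint32_t lo = n - hi * 10000u;                               /* n % 10000 */
    uint64_t f;

    f = (uint64_t)hi * 0x68DB9u;            /* high dword: first pair of hi   */
    out = put_pair(out, (uint32_t)(f >> 32));
    f = (f & 0xFFFFFFFFu) * 100u;           /* fraction * 100: next pair      */
    out = put_pair(out, (uint32_t)(f >> 32));
    f = (f & 0xFFFFFFFFu) * 100u;
    out = put_pair(out, (uint32_t)(f >> 32));

    f = (uint64_t)lo * 0x28F5C29u;          /* high dword: first pair of lo   */
    out = put_pair(out, (uint32_t)(f >> 32));
    f = (f & 0xFFFFFFFFu) * 100u;
    out = put_pair(out, (uint32_t)(f >> 32));
    *out = '\0';
}
```

No division instruction appears anywhere: five multiplications and five table lookups produce all ten digits, which is where the speed comes from.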
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 10:14:03 PM
Thanks Hutch.

Is this version faster than udw2str, the one in m32lib?

; #########################################################################

    .386
    .model flat, stdcall  ; 32 bit memory model
    option casemap :none  ; case sensitive

  ; ---------------------------------------------------
  ; The original algorithm was written by comrade
  ; <comrade2k@hotmail.com>; http://www.comrade64.com/
  ;
  ;  It has been optimised by Alexander Yackubtchik
  ; ---------------------------------------------------

  ; udw2str

  ; Parameters
  ;     dwNumber - 32-bit double-word to be converted
  ;     pszString - null-terminated string (output)
  ; Result
  ;     None

    .code

; #########################################################################

udw2str proc dwNumber:DWORD, pszString:DWORD

    push ebx
    push esi
    push edi

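    mov     eax, [dwNumber]             ; value to convert
    mov     esi, [pszString]            ; esi = write pointer
    mov     edi, [pszString]            ; edi = start of string, used for the reversal
    mov ecx,429496730                   ; 2^32/10 rounded up: reciprocal multiplier for 10

  @@redo:
    mov ebx,eax                         ; keep a copy of the current value
    mul ecx                             ; edx = high dword of product, taken as value/10
    mov eax,edx                         ; quotient becomes the value for the next pass
    lea edx,[edx*4+edx]                 ; edx = quotient*5
    add edx,edx                         ; edx = quotient*10
    sub ebx,edx                         ; ebx = value - quotient*10 = last digit
    add bl,'0'                          ; convert to ASCII
    mov [esi],bl                        ; store digits least significant first
    inc esi
    test    eax, eax
    jnz     @@redo                      ; repeat until the quotient is zero
    jmp     @@chks                      ; then reverse the string in place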

  @@invs:
    dec     esi
    mov     al, [edi]
    xchg    [esi], al
    mov     [edi], al
    inc     edi
  @@chks:
    cmp     edi, esi
    jb      @@invs

    pop edi
    pop esi
    pop ebx


    ret

udw2str endp

; #########################################################################

end


Frank
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 10:18:44 PM
Quote from: frktons on November 19, 2010, 10:14:03 PM
Is this version faster than the one in the m32lib, udw2str?

Yes, Frank. Just look at

xchg    [esi], al


This one drops the timings so much that the other code has no meaning.
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 10:24:46 PM
Quote from: Antariy on November 19, 2010, 10:18:44 PM
Quote from: frktons on November 19, 2010, 10:14:03 PM
Is this version faster than the one in the m32lib, udw2str?

Yes, Frank. Just look at

xchg    [esi], al


This one drops the timings so much that the other code has no meaning.


Is that instruction so powerful? I didn't even suspect it.  :lol

Can you explain why this instruction is so important?
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 10:30:05 PM
Quote from: frktons on November 19, 2010, 10:24:46 PM
Quote from: Antariy on November 19, 2010, 10:18:44 PM
Quote from: frktons on November 19, 2010, 10:14:03 PM
Is this version faster than the one in the m32lib, udw2str?

Yes, Frank. Just look at

xchg    [esi], al


This one drops the timings so much that the other code has no meaning.


Is that instruction so powerful? I didn't even suspect it.  :lol

Can you explain why this instruction is so important?

I probably said it wrong. I meant that it dropped the algo down to one of the SLOWEST. Oh... I should choose my words more precisely...

This instruction by itself causes a stall of 50-100 clocks. It is an atomic instruction, and the CPU waits for all pending transactions on the system bus before exchanging the values.
LOCK# is asserted implicitly.



Alex
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 10:40:23 PM
Quote from: Antariy on November 19, 2010, 10:30:05 PM

I probably said it wrong. I meant that it dropped the algo down to one of the SLOWEST. Oh... I should choose my words more precisely...

This instruction by itself causes a stall of 50-100 clocks. It is an atomic instruction, and the CPU waits for all pending transactions on the system bus before exchanging the values.
LOCK# is asserted implicitly.

Alex

Oh!!! Well. This is what I knew about xchg: that it is not an efficient mnemonic, and it is better to use
other solutions.  :U
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 10:49:10 PM
Quote from: frktons on November 19, 2010, 10:40:23 PM
Oh!!! Well. This is what I knew about xchg: that it is not an efficient mnemonic, and it is better to use
other solutions.  :U

Something like:

@@invs:
    dec     esi
    mov     al, [edi]
    mov     ah, [esi]
    mov     [edi], ah
    mov     [esi], al
    inc     edi
  @@chks:


But this does not make the algo faster than Paul's code :lol
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 10:52:13 PM
Quote from: Antariy on November 19, 2010, 10:49:10 PM
Something like:

@@invs:
    dec     esi
    mov     al, [edi]
    mov     ah, [esi]
    xchg    [edi], ah
    mov     [esi], al
    inc     edi
  @@chks:


But this does not make the algo faster than Paul's code :lol


Is it not possible to avoid xchg and to use other mnemonics, better ones I mean?
Title: Re: This is too slow
Post by: Antariy on November 19, 2010, 10:55:30 PM
Quote from: frktons on November 19, 2010, 10:52:13 PM
Is it not possible to avoid xchg and to use other mnemonics, better ones I mean?

Pardon  :green2, I made the changes carelessly  :lol Look at the post again  :bg
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 10:56:27 PM
Quote from: Antariy on November 19, 2010, 10:55:30 PM
Quote from: frktons on November 19, 2010, 10:52:13 PM
Is it not possible to avoid xchg and to use other mnemonics, better ones I mean?

Pardon  :green2, I made the changes carelessly  :lol Look at the post again  :bg

:U
Title: Re: This is too slow
Post by: hutch-- on November 19, 2010, 11:47:00 PM
I have just converted the same algo to unsigned. It's an algo that Paul Dixon wrote in PowerBASIC that I have converted to MASM notation. I removed the stack frame and ran it through exhaustive testing over the full unsigned range, 0 to -1.


;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

utoa_ex proc uvar:DWORD,pbuffer:DWORD

  ; --------------------------------------------------------------------------------
  ; this algorithm was written by Paul Dixon and has been converted to MASM notation
  ; --------------------------------------------------------------------------------

    mov eax, [esp+4]                ; uvar      : unsigned variable to convert
    mov ecx, [esp+8]                ; pbuffer   : pointer to result buffer

    push esi
    push edi

    jmp udword

  align 4
  chartab:
    dd "00","10","20","30","40","50","60","70","80","90"
    dd "01","11","21","31","41","51","61","71","81","91"
    dd "02","12","22","32","42","52","62","72","82","92"
    dd "03","13","23","33","43","53","63","73","83","93"
    dd "04","14","24","34","44","54","64","74","84","94"
    dd "05","15","25","35","45","55","65","75","85","95"
    dd "06","16","26","36","46","56","66","76","86","96"
    dd "07","17","27","37","47","57","67","77","87","97"
    dd "08","18","28","38","48","58","68","78","88","98"
    dd "09","19","29","39","49","59","69","79","89","99"

  udword:
    mov esi, ecx                    ; get pointer to answer
    mov edi, eax                    ; save a copy of the number

    mov edx, 0D1B71759h             ; =2^45\10000    13 bit extra shift
    mul edx                         ; gives 6 high digits in edx

    mov eax, 68DB9h                 ; =2^32\10000+1

    shr edx, 13                     ; correct for multiplier offset used to give better accuracy
    jz short skiphighdigits         ; if zero then don't need to process the top 6 digits

    mov ecx, edx                    ; get a copy of high digits
    imul ecx, 10000                 ; scale up high digits
    sub edi, ecx                    ; subtract high digits from original. EDI now = lower 4 digits

    mul edx                         ; get first 2 digits in edx
    mov ecx, 100                    ; load ready for later

    jnc short next1                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZeroSupressed         ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short ZS1                   ; continue with pairs of digits to the end

  next1:
    mul ecx                         ; get next 2 digits
    jnc short next2                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS1a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short ZS2                   ; continue with pairs of digits to the end

  next2:
    mul ecx                         ; get next 2 digits
    jnc short next3                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS2a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short ZS3                   ; continue with pairs of digits to the end

  next3:

  skiphighdigits:
    mov eax, edi                    ; get lower 4 digits

    mov ecx, 100

    mov edx, 28F5C29h               ; 2^32\100 +1
    mul edx
    jnc short next4                 ; if zero, suppress them by ignoring
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS3a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    inc esi                         ; update pointer by 1
    jmp short  ZS4                  ; continue with pairs of digits to the end

  next4:
    mul ecx                         ; this is the last pair so don't suppress a single zero
    cmp edx, 9                      ; 1 digit or 2?
    ja  short ZS4a                  ; 2 digits, just continue with pairs of digits to the end

    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dh                   ; but only write the 1 we need, suppress the leading zero
    mov byte ptr [esi+1], 0         ; zero terminate string

    jmp short  sdwordend            ; all done

  ZeroSupressed:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx                   ; write them to answer
    add esi, 2                      ; update pointer

  ZS1:
    mul ecx                         ; get next 2 digits
  ZS1a:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx                   ; write them to answer
    add esi, 2

  ZS2:
    mul ecx                         ; get next 2 digits
  ZS2a:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx                   ; write them to answer
    add esi, 2

  ZS3:
    mov eax, edi                    ; get lower 4 digits
    mov edx, 28F5C29h               ; 2^32\100 +1
    mul edx                         ; edx= top pair
  ZS3a:
    mov edx, chartab[edx*4]         ; look up 2 digits
    mov [esi], dx                   ; write to answer
    add esi, 2                      ; update pointer

  ZS4:
    mul ecx                         ; get final 2 digits
  ZS4a:
    mov edx, chartab[edx*4]         ; look them up
    mov [esi], dx                   ; write to answer

    mov byte ptr [esi+2], 0         ; zero terminate string

  sdwordend:

    pop edi
    pop esi

    ret 8

utoa_ex endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;  ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: This is too slow
Post by: frktons on November 19, 2010, 11:51:26 PM
Quote from: hutch-- on November 19, 2010, 11:47:00 PM
I have just converted the same algo to unsigned. It's an algo that Paul Dixon wrote in PowerBASIC that I have converted to MASM notation. I removed the stack frame and ran it through exhaustive testing over the full unsigned range, 0 to -1.




Thanks Steve. I'm afraid I'll have to do the work to insert and adapt it for the new testbed myself, won't I?  :lol

Well, if I come back from the weekend sane enough, I'll start to convert it and test it with the other algos.

This one lacks all the separator stuff, so I have quite a lot of work to do.  :eek

Couldn't you mix this code and your formatting routine to make my life a bit easier?  :P

We'll see.  :P
Title: Re: This is too slow
Post by: hutch-- on November 20, 2010, 12:31:35 AM
Frank,

The trick is to have your testbed so it uses standard MASM algorithms so you don't have to adapt them. Align each algo if it's not aligned itself. If you need to show the size in bytes (I don't personally care), then use a label at either end and do the arithmetic at assembly time.

These algos and a number of others are for the masm32 library so they must be presented in that form.
Title: Re: This is too slow
Post by: jj2007 on November 20, 2010, 12:40:02 AM
There is also the qword algo by drizz here (http://www.masm32.com/board/index.php?topic=9857.msg72422#msg72422).
Title: Re: This is too slow
Post by: frktons on November 20, 2010, 12:55:21 AM
Quote from: hutch-- on November 20, 2010, 12:31:35 AM
Frank,

The trick is to have your testbed so it uses standard MASM algorithms so you don't have to adapt them. Align each algo if it's not aligned itself. If you need to show the size in bytes (I don't personally care), then use a label at either end and do the arithmetic at assembly time.

These algos and a number of others are for the masm32 library so they must be presented in that form.

The code is not prepared for the test I'm doing. If I test an array of 16 unsigned dwords, the timings
relate to performing that task. This is the reason I have to adapt the code. Moreover, the task is to convert from
an unsigned dword to an ASCII string with thousands separators.
If a routine doesn't do that, but only part of the task, I need to fill the gap if I want to use it in the testbed.

It is a good exercise for me, I have to admit, but sometimes I'm just too tired, as I am now, and I'm going to sleep.

Maybe after the weekend I can undertake this new task. But it would be better if the code were not
just posted, but inserted into the testbed. It has its own structure, not difficult to grasp, and info on how to use it.

The size is calculated at assembly time through the labels inside which the code should be inserted.

There is no need to adapt or rewrite code, if you start with the template I have inserted into the zip.
Apparently not many have read anything about how to use it, or about the task to do in this test.

Thanks anyway for your contribution.

Quote from: jj2007 on November 20, 2010, 12:40:02 AM
There is also the qword algo by drizz here (http://www.masm32.com/board/index.php?topic=9857.msg72422#msg72422).

Thanks Jochen  :U


Frank


Title: Re: This is too slow
Post by: clive on November 20, 2010, 05:37:32 AM
┌─────────────────────────────────────────────────────────────[20-Nov-2010 at 05:36 GMT]─┐
│OS  : Microsoft Windows 7 Home Premium Edition, 64-bit (build 7600)                     │
│CPU : AMD Athlon(tm) II X2 215 Processor with 2 logical core(s) with SSE3               │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   53,977 │   43,026 │   41,538 │   41,453 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   42,022 │   41,842 │   41,732 │   50,379 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   59,343 │   66,041 │   48,354 │   56,633 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │   13,839 │   11,373 │   13,800 │   13,972 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    3,371 │    5,327 │    6,293 │    6,316 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │   12,321 │    6,406 │   12,503 │   11,744 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 09:49:37 PM
Clive, your algo has been adopted into the testbed to format the numbers  :U

The attached release should also correct the problems with the partial clipboard copy,
caused by incompatibilities between MASM 10, which I use, and versions older than MASM 9.

Try this and let me know.

IMPORTANT: if you use a machine that is not SSE2 capable, in the main prog, TestBed.asm,
set the SSE2 flag to OFF, otherwise you won't see the screens.

Frank
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 09:57:25 PM
It works. Results are copied with original EXE and with ML8 recompilation.
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:04:04 PM
Quote from: Antariy on November 21, 2010, 09:57:25 PM
It works. Results are copied with original EXE and with ML8 recompilation.


I put a mov BYTE PTR [esi], al
instead of mov [esi], al because I already experienced this problem
with other PROCs, when compiling with MASM older than ver. 9.

So it should work from MASM 6.15 up in this fashion.  :P
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:08:45 PM
Quote from: frktons on November 21, 2010, 10:04:04 PM
I put a mov DWORD PTR [esi], al
instead of mov [esi], al because I already experienced this problem
with other PROCs, when compiling with MASM older than ver. 9.

So it should work from MASM 6.15 up in this fashion.  :P

I have some doubts about that :P
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:09:17 PM
Quote from: Antariy on November 21, 2010, 10:08:45 PM
Quote from: frktons on November 21, 2010, 10:04:04 PM
I put a mov DWORD PTR [esi], al
instead of mov [esi], al because I already experienced this problem
with other PROCs, when compiling with MASM older than ver. 9.

So it should work from MASM 6.15 up in this fashion.  :P

I have some doubts about that :P


Why? Oh sure it should be Byte PTR

The program is correct, I wrote rubbish here.  :P
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:12:14 PM
with MOV [ESI],AL - the assembler knows the size is a byte
if you MOV [ESI],0 - the assembler does not know, so you need MOV BYTE PTR [ESI],0
it is the same for all versions of masm   :8)
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:13:12 PM
Quote from: dedndave on November 21, 2010, 10:12:14 PM
with AL, the assembler knows the size is a byte
if you MOV [ESI],0 - the assembler does not know, so you need MOV BYTE PTR [ESI],0

This happens with MASM from 8 and below; from 9 upwards it doesn't.  :P

Try this new version with your pc and tell me if the clipboard copy is still incomplete.
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:13:41 PM
Quote from: frktons on November 21, 2010, 10:09:17 PM
Quote from: Antariy on November 21, 2010, 10:08:45 PM
Quote from: frktons on November 21, 2010, 10:04:04 PM
I put a mov DWORD PTR [esi], al
instead of mov [esi], al because I already experienced this problem
with other PROCs, when compiling with MASM older than ver. 9.

So it should work from MASM 6.15 up in this fashion.  :P

I have some doubts about that :P


Why? Oh sure it should be Byte PTR

The program is correct, I wrote rubbish here.  :P


:lol :P
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:16:00 PM
i am sitting with dad - to give mom a break   :P
so, i am not on the same machine
i hate this keyboard - lol
in fact, i hate this computer
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:16:19 PM
Quote from: frktons on November 21, 2010, 10:13:12 PM
Quote from: dedndave on November 21, 2010, 10:12:14 PM
with AL, the assembler knows the size is a byte
if you MOV [ESI],0 - the assembler does not know, so you need MOV BYTE PTR [ESI],0

This happens with MASM from 8 and below; from 9 upwards it doesn't.  :P

Try this new version with your pc and tell me if the clipboard copy is still incomplete.

Frank, Dave is right about immediates - in any case you should specify the data size when you move an immediate to memory. The assembler doesn't know which "0" you are writing - byte, word or dword sized.
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:17:28 PM
This rel 1.51 should be able to run on old and new machines, provided that the user sets the SSE2 flag
to the correct state. ON = machine SSE2 capable, OFF = machine not SSE2 capable  :lol

Well I guess I was moving al to [esi] and it didn't work with older MASM versions.

Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:18:38 PM
Quote from: dedndave on November 21, 2010, 10:16:00 PM
i am sitting with dad - to give mom a break   :P
so, i am not on the same machine
i hate this keyboard - lol

Well, when you can, obviously.  :U
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:19:50 PM
Quote from: frktons on November 21, 2010, 10:17:28 PM
Well I guess I was moving al to [esi] and it didn't work with older MASM versions.

No, this *should* (should!!!) work. The size of the operand is known at assembly time.

Maybe Dave will test the new release, and we can see what is up. What do you say, Dave?
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:21:04 PM
Quote from: Antariy on November 21, 2010, 10:19:50 PM
Quote from: frktons on November 21, 2010, 10:17:28 PM
Well I guess I was moving al to [esi] and it didn't work with older MASM versions.

No, this *should* (should!!!) work. The size of the operand is known at assembly time.

Maybe Dave will test the new release, and we can see what is up. What do you say, Dave?


Alex you have MASM 6.15 on your machine. You can try it yourself.  :P
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:22:36 PM

┌─────────────────────────────────────────────────────────────[21-Nov-2010 at 22:21 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Pentium(R) 4 CPU 3.00GHz with 2 logical core(s) with SSE3                │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat     
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:23:06 PM
Quote from: dedndave on November 21, 2010, 10:22:36 PM

┌─────────────────────────────────────────────────────────────[21-Nov-2010 at 22:21 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Pentium(R) 4 CPU 3.00GHz with 2 logical core(s) with SSE3                │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat     

:lol :lol :lol :lol :lol :lol :lol :lol


; -------------------------------------------------------------------------
; The data read from the Screen Buffer is converted into DOS style.
; The New Buffer is filled with the content of the Screen Buffer, removing
; a char after each byte making it DOS compatible.
; -------------------------------------------------------------------------

ConvertToDOS PROC DestBuffer:DWORD, SourceBuffer:DWORD


    mov  esi, SourceBuffer
    mov  edi, DestBuffer

    mov  ecx, CharNumber
    xor  eax, eax

    xor  ebx, ebx

NextChar:

    mov eax, [esi]
    mov BYTE PTR [edi], al
    shr eax, N16
    mov BYTE PTR [edi+1], al

    add esi, Four
    add edi, Two

    dec ecx
    jnz NextChar

End_cycle:

    ret
   
ConvertToDOS ENDP



Do you see any problem here?

Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:24:33 PM
manually..
┌─────────────────────────────────────────────────────────────[21-Nov-2010 at 22:21 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Pentium(R) 4 CPU 3.00GHz with 2 logical core(s) with SSE3                │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   93,311 │   92,294 │   89,304 │   89,287 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   87,343 │   87,847 │   87,956 │   87,071 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │  106,150 │  106,907 │  106,463 │  120,352 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    8,638 │    8,480 │    8,414 │    8,112 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    3,516 │    3,481 │    3,530 │    3,516 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │   12,250 │   11,931 │   12,162 │   12,289 │
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:26:07 PM
Quote from: frktons on November 21, 2010, 10:21:04 PM
Alex you have MASM 6.15 on your machine. You can try it yourself.  :P

This is something that does not need proving :P It should work, and it does work :P
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:27:34 PM
Quote from: Antariy on November 21, 2010, 10:26:07 PM
Quote from: frktons on November 21, 2010, 10:21:04 PM
Alex you have MASM 6.15 on your machine. You can try it yourself.  :P

This is something that does not need proving :P It should work, and it does work :P


Not in Dave's machine.
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:28:37 PM
Quote from: frktons on November 21, 2010, 10:23:06 PM
Do you see any problem here?

No.

You should look for a runtime dependency: maybe somewhere you did not preserve a register, or you rely on a register holding the same/specific value, etc.
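
As a rough illustration of the register-preservation point: under the Win32 calling conventions a routine is expected to leave EBX, ESI, EDI and EBP unchanged, and MASM's USES clause generates the matching push/pop pairs. The routine below is hypothetical (FillBytes and its parameters are not from Frank's source) and assumes the usual `.model flat, stdcall` setup:

```masm
; Hypothetical helper, only to show USES; not part of the posted program.
FillBytes PROC USES edi pDest:DWORD, count:DWORD, value:DWORD

    mov  edi, pDest             ; EDI is modified here, so it is listed in USES
    mov  ecx, count             ; ECX, EAX and EDX may be freely clobbered
    mov  eax, value
    rep  stosb                  ; store AL into count bytes at pDest

    ret                         ; MASM emits "pop edi" before the real RET
FillBytes ENDP
```

A routine that changes EBX, ESI or EDI without saving them can appear to work on one machine and fail on another, depending on what the caller or the OS happened to keep in those registers.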
Title: Re: This is too slow
Post by: hutch-- on November 21, 2010, 10:28:57 PM
Frank,

The notation,


    mov [esi], al


has always worked in MASM. The guys are right that it's only if the data size is not known that you must specify its size. MASM has historically used fully specified Intel notation, but where it can determine the size from a register you can use abbreviated notation.

Either,


    mov BYTE PTR [esi], al
    ; or
    mov [esi], al


are valid as the size can be determined from the register AL.
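
A small sketch of where the size specifier becomes mandatory, for illustration only (the register and constants are arbitrary): when one operand is a register, the size comes from it; when the operands are a bare memory reference and an immediate, nothing implies a size and it has to be written out.

```masm
    mov  [esi], al              ; OK: byte size is taken from AL
    mov  [esi], eax             ; OK: dword size is taken from EAX
    mov  BYTE PTR [esi], 0      ; size required: neither operand implies one
    mov  WORD PTR [esi], 0      ; same address, but two bytes are written
;   mov  [esi], 0               ; rejected by MASM: operand size is unknown
```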
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:30:27 PM
Quote from: frktons on November 21, 2010, 10:27:34 PM
Quote from: Antariy on November 21, 2010, 10:26:07 PM
Quote from: frktons on November 21, 2010, 10:21:04 PM
Alex you have MASM 6.15 on your machine. You can try it yourself.  :P

This is something that does not need proving :P It should work, and it does work :P


Not in Dave's machine.

I used ML 6.15 just as a favour, since you asked, and everything is copied.
See my previous post for the probable reason.
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:31:36 PM
Probably there is something in this PROC that is not compatible with old MASM versions:

; -------------------------------------------------------------------------
; The text is extracted from the Screen Buffer and prepared to be copied
; into the Windows Clipboard with code tags enclosing it and with
; CR+LF at the end of each text line.
; -------------------------------------------------------------------------

CopyScreenText PROC


    lea  esi, TopCode
    lea  edi, TextResults

    movq mm7, qword ptr [esi]
    movq qword ptr [edi], mm7

    add  edi, Eight   
    lea  esi, SavedScreen

    mov  eax, AlgoRow
    imul eax, MaxCols

    mov  ecx, eax
   
    xor  eax, eax
    xor  ebx, ebx



NextCharText:

    mov eax, [esi]
   
    mov BYTE PTR [edi], al

    add esi, Four
    add edi, One
    inc ebx

    cmp ebx, MaxCols
    jl  GoOn

    mov BYTE PTR [edi], CR
    mov BYTE PTR [edi + 1], LF
    add edi, Two
    xor ebx, ebx

GoOn:

    dec ecx
    jnz NextCharText

SetNULL:

    lea  esi, BottomCode

    movq mm7, qword ptr [esi]
    movq qword ptr [edi], mm7

    add  edi, Eight   

    mov WORD PTR [edi], 00A0h;   CR + NULL


End_cycle:

    ret
   
CopyScreenText ENDP


Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:34:00 PM
that conclusion is not valid
i am not assembling it - i only execute it   :U
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:38:29 PM
Quote from: frktons on November 21, 2010, 10:31:36 PM

mov WORD PTR [edi], 00A0h;   CR + NULL


This code does not put an LF at the end of the string, Frank. 000Ah is LF+NULL.
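
To make the byte order explicit, here is a short sketch. x86 stores the low byte of a word first, so the two constants land in memory as shown in the comments. CR and LF are assumed to have the usual values 0Dh and 0Ah, as the names in the posted code suggest; the last two lines are just one possible way to write a full line ending plus terminator, not what the program does.

```masm
CR  EQU 0Dh                             ; assumed to match the program's constants
LF  EQU 0Ah

    mov  WORD PTR [edi], 00A0h          ; [edi] = 0A0h, [edi+1] = 00h  (the typo)
    mov  WORD PTR [edi], 000Ah          ; [edi] = LF,   [edi+1] = 00h  (LF + NULL)

    ; CR + LF + NULL, if a complete line ending is wanted before the terminator
    mov  WORD PTR [edi], (LF SHL 8) OR CR   ; [edi] = CR, [edi+1] = LF
    mov  BYTE PTR [edi+2], 0
```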
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:39:36 PM
Quote from: dedndave on November 21, 2010, 10:34:00 PM
that conclusion is not valid
i am not assembling it - i only execute it   :U

Dave, do you have any debugger, a 32-bit Windows debugger, on your current machine?
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:40:53 PM
dad's machine - i have nothing set up here   :'(
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:41:11 PM
Quote from: dedndave on November 21, 2010, 10:34:00 PM
that conclusion is not valid
i am not assembling it - i only execute it   :U

The strange thing is that somebody with Win XP SP2, like yours, is running it correctly.  ::)
Quote from: Antariy on November 21, 2010, 10:38:29 PM
Quote from: frktons on November 21, 2010, 10:31:36 PM

mov WORD PTR [edi], 00A0h;   CR + NULL


This code does not put an LF at the end of the string, Frank. 000Ah is LF+NULL.


Well, that's right. How does it work on so many machines?

I'm changing it, let's see if something changes...  :wink
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:42:31 PM
Quote from: dedndave on November 21, 2010, 10:40:53 PM
dad's machine - i have nothing set up here   :'(

I can post an old WinDbg, which can be freely distributed, but it is 625 KB in size, so it cannot be attached.
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:44:32 PM
typo !!
00A0h <> 000Ah
hmmmmm
that could mean that some machines do not like "á"
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:45:05 PM
Corrected the typo.  :P
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:45:20 PM
Quote from: frktons on November 21, 2010, 10:41:11 PM
Quote from: dedndave on November 21, 2010, 10:34:00 PM
that conclusion is not valid
i am not assembling it - i only execute it   :U

The strange thing is that somebody with win XP sp2, as yours, is running it correct.  ::)
Quote from: Antariy on November 21, 2010, 10:38:29 PM
Quote from: frktons on November 21, 2010, 10:31:36 PM

mov WORD PTR [edi], 00A0h;   CR + NULL


This code does not put an LF at the end of the string, Frank. 000Ah is LF+NULL.


Well, that's right. How does it work on so many machines?

I'm changing it, let's see if something change...  :wink


No, this is not the reason, Frank  :lol, I just drew your attention to it, but it is not the reason for the uncopied results.
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:48:02 PM

┌─────────────────────────────────────────────────────────────[21-Nov-2010 at 22:47 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Pentium(R) 4 CPU 3.00GHz with 2 logical core(s) with SSE3                │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat     
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:48:50 PM
Quote from: frktons on November 21, 2010, 10:45:05 PM
Corrected the typo.  :P

Also, GMEM_DDESHARE is not specified when allocating the memory for the clipboard buffer.

Just strange suggestions :P
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:49:56 PM
Dave, which OS is installed on dad's machine?
This is a serious question, answer without jokes, please :P
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:50:59 PM
Quote from: Antariy on November 21, 2010, 10:45:20 PM
No, this is not the reason, Frank  :lol, I just drew your attention to it, but it is not the reason for the uncopied results.

I don't know so far what else it could be. Your explanation is not clear enough for me.

Let's take one problem at a time and see what we get.
Quote from: Antariy on November 21, 2010, 10:48:50 PM
Quote from: frktons on November 21, 2010, 10:45:05 PM
Corrected the typo.  :P

Also GMEM_DDESHARE is not specified within allocation of memory for clipboard buffer.

Just strange suggestions :P


What should I do in your opinion?
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:52:15 PM
Quote from: Antariy on November 21, 2010, 10:49:56 PM
Dave, which OS is installed on dad's machine?
This is serious question, answer without jokes, please :P


the program says:

Microsoft Windows XP Professional Service Pack 2 (build 2600)
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:54:17 PM
Quote from: frktons on November 21, 2010, 10:52:15 PM
Quote from: Antariy on November 21, 2010, 10:49:56 PM
Dave, which OS is installed on dad's machine?
This is serious question, answer without jokes, please :P


the program says:

Microsoft Windows XP Professional Service Pack 2 (build 2600)

What time is it at your location? :P
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:55:34 PM
If Dave returns to the thread, I'll describe a way to find the culprit on his machine. Dave will be glad, because this approach will be familiar to him.

:bg
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 10:56:03 PM
Quote from: Antariy on November 21, 2010, 10:54:17 PM
Which time at your location :P

21-Nov-2010 at 22:55 GMT + 1
Quote from: Antariy on November 21, 2010, 10:55:34 PM
If Dave returns to the thread, I'll describe a way to find the culprit on his machine. Dave will be glad, because this approach will be familiar to him.

:bg

:lol :lol :lol :lol :lol :lol :lol :lol

Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:57:26 PM
that is correct
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 10:59:01 PM
the time is not correct - it is now 3:58 PM
the OS is correct

edit - my mistake - it is GMT time
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 10:59:12 PM
Dave, your OS (in this case) has an integrated debugger. It has a console interface and is very similar to DEBUG.EXE in handling.
Can you do the debugging? It will be simple for you, and will not require installing any software (or updates !!!).
Should I give the details?
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:00:38 PM
Quote from: dedndave on November 21, 2010, 10:59:01 PM
the time is not correct - it is now 3:58 PM
the OS is correct

We are talking about GMT not your country time.
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 11:02:45 PM
sorry
i am taking care of dad, and his dog (lol) and this keyboard is about to be thrown against the wall   :bdg
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 11:05:49 PM
Quote from: dedndave on November 21, 2010, 11:02:45 PM
sorry
i am taking care of dad, and his dog (lol) and this keyboard is about to be thrown against the wall   :bdg

So, you do not want to be dedicated and debug with standard tools?
You leave the battlefield at the most interesting moment?

:eek


http://www.masm32.com/board/index.php?topic=15365.msg125873#msg125873
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:13:13 PM
Quote from: Antariy on November 21, 2010, 11:05:49 PM
Quote from: dedndave on November 21, 2010, 11:02:45 PM
sorry
i am taking care of dad, and his dog (lol) and this keyboard is about to be thrown against the wall   :bdg

So, you do not want to be dedicated and debug with standard tools?
You leave the battlefield at the most interesting moment?

:eek


http://www.masm32.com/board/index.php?topic=15365.msg125873#msg125873


Well, Dave is not in the best environment for that debugging session. Better to do it
at another time/day, or leave the info he needs here, and when he's got time he'll see
what to do.   :wink
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 11:17:10 PM
i would not mind if it weren't for this keyboard
it misses half the letters i type - very annoying

also - just realized i am out of smokes   :dazzled:
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:21:00 PM
Quote from: Antariy on November 21, 2010, 11:17:21 PM
That should not take a long time.
And it is not clear - maybe he does not want to spend his time on this.


Alex, do you mind if Dave does the debugging session later or tomorrow?
It doesn't seem the right moment to me.  ::)
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 11:21:48 PM
i should be at home in about an hour - we can play, then   :U
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:25:15 PM
Quote from: dedndave on November 21, 2010, 11:21:48 PM
i should be at home in about an hour - we can play, then   :U

Great!  :U
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 11:26:38 PM
Quote from: frktons on November 21, 2010, 11:21:00 PM
Alex, do you mind if Dave does the debugging session later or tomorrow?
It doesn't seem the right moment to me.  ::)

:eek

I'm not insisting on anything. Just:
1. Dave is the first reporter of this issue.
2. Dave likes consoles.
3. My English does not let me tell the difference between his jokes and the truth  :lol
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:29:23 PM
Quote from: Antariy on November 21, 2010, 11:28:16 PM
Quote from: dedndave on November 21, 2010, 11:21:48 PM
i should be at home in about an hour - we can play, then   :U

No hurry, first destroy the bad keyboard  :lol


That's a good idea anyway.  :P
Title: Re: This is too slow
Post by: dedndave on November 21, 2010, 11:33:16 PM
i don't know what the problem is
new keyboard - new batteries - receiver is 10 inches away
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:34:51 PM
Quote from: dedndave on November 21, 2010, 11:33:16 PM
i don't know what the problem is
new keyboard - new batteries - receiver is 10 inches away
:lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol

Maybe it is too advanced for your dad's computer  :lol :lol :lol :lol :lol :lol
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 11:36:37 PM
Frank, change this:

invoke GlobalAlloc,GMEM_MOVEABLE,DWORD PTR Lenght


to this


invoke GlobalAlloc,GMEM_MOVEABLE or GMEM_DDESHARE,DWORD PTR Lenght


Just to make sure  :lol
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:40:03 PM
Quote from: Antariy on November 21, 2010, 11:36:37 PM
Frank, make this:

invoke GlobalAlloc,GMEM_MOVEABLE,DWORD PTR Lenght


to this


invoke GlobalAlloc,GMEM_MOVEABLE or GMEM_DDESHARE,DWORD PTR Lenght


Just for make sure  :lol


OK! Before posting the 100th release, I'll wait a few hours. Better to test it before posting.  :P
Title: Re: This is too slow
Post by: Antariy on November 21, 2010, 11:44:13 PM
Quote from: frktons on November 21, 2010, 11:40:03 PM
Quote from: Antariy on November 21, 2010, 11:36:37 PM
Frank, make this:

invoke GlobalAlloc,GMEM_MOVEABLE,DWORD PTR Lenght


to this


invoke GlobalAlloc,GMEM_MOVEABLE or GMEM_DDESHARE,DWORD PTR Lenght


Just for make sure  :lol


OK! Before posting the 100th release, I'll wait some hour. Better to test it before posting.  :P

I guess the problem is not the DDESHARE flag, but if you set it and post it now, we can see the results now :P
Title: Re: This is too slow
Post by: frktons on November 21, 2010, 11:50:23 PM
here it is.
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 12:16:28 AM
Alex you are running Win XP pro sp2, the same OS as Dave's one.
What could be so different between your machines? Oex has the same problem, same OS,
I really don't understand.  ::)
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 12:19:28 AM
Quote from: frktons on November 22, 2010, 12:16:28 AM
Alex you are running Win XP pro sp2, the same OS as Dave's one.
What could be so different between your machines? Oex has the same problem, same OS,

http://www.masm32.com/board/index.php?topic=15365.msg125844#msg125844 - the first thing it could be
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 12:24:38 AM
Quote from: Antariy on November 21, 2010, 10:28:37 PM
You should search for the runtime dependency: maybe somewhere you did not preserve some register,
or you rely that this register have the same/specific value, etc.

I don't think so. Otherwise bad results should appear on many more systems. They show up only
on a few machines, when people probably compile with old MASM versions. A corrupted register
should break the entire logic of the PROC.

Something else in my opinion is working bad here.  ::)

Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 12:33:56 AM
Quote from: frktons on November 22, 2010, 12:24:38 AM
I don't think so. Otherwise bad results should appear in many more systems. They show up only
in a few machines, when people probably compile with old MASM versions. A corrupted register
should break entire logic of PROC.

Bugs like this rely on the fact that many identical systems have the same layout of things. So it is quite possible that on most machines, for example, some register is zero after some operation, etc. On identical sub-builds of the system this would be something like a rule.

But this is only a first supposition, no more than that.
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 12:44:14 AM
Quote from: Antariy on November 22, 2010, 12:33:56 AM
Bugs like this rely on the fact that many identical systems have the same layout of things. So it is quite possible that on most machines, for example, some register is zero after some operation, etc. On identical sub-builds of the system this would be something like a rule.

But this is only a first supposition, no more than that.

Of course, but some bugs can depend on things you don't even suspect. Maybe an antivirus, or a system
program, or a corrupted dll. I hope it is only a program bug, because, if it is, we'll find it and correct it.
But if it is something related to OS, or dll, or any other thing, it could be very complex to get rid of it.

Let's see if the changes we've made have some effect. Otherwise a debug session can help find
where the problem is.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 12:48:30 AM
ok - it still has the problem
i am debugging it - i will figure it out   :U
it seems only logical to troubleshoot it on a system that exhibits the problem
it cannot be easily solved on a machine that doesn't   :bg
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 12:55:14 AM
Quote from: dedndave on November 22, 2010, 12:48:30 AM
ok - it still has the problem
i am debugging it - i will figure it out   :U
it seems only logical to troubleshoot it on a system that exhibits the problem
it cannot be easily solved on a machine that doesn't   :bg
You are right, indeed.  :U
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 12:57:58 AM
Frank, this is the culprit:


ProgData.inc:

AlgoDesc         BYTE  31 DUP (?)
AlgoDescSize     DWORD SIZEOF AlgoDesc

Algo1.asm:
AlgoDesc1        byte  "ustrv$ + GetNumberFormat      ",0 ; ZERO BYTE SHOULD BE REPLACED WITH A SPACE
...
invoke DisplayAt, dword ptr AlgoColDesc, dword ptr AlgoRow, addr AlgoDesc,
                     dword ptr AlgoDescSize


So you are displaying a binary zero on the screen, because SIZEOF includes the zero in the length of the string.
You can fix all those strings with:

AlgoDesc1        byte  "ustrv$ + GetNumberFormat       " ; one space added
or, if you want to have zero terminated string
AlgoDesc1        byte  "ustrv$ + GetNumberFormat       " ; one space added
db 0
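
A compact sketch of how SIZEOF behaves in the two layouts above, with hypothetical names (Desc1, Desc2 and the two length equates are not from the program). SIZEOF counts the initializers on the line where the label is defined, so a terminator on the same line is included and one on the following line is not:

```masm
.data
Desc1     BYTE "ustrv$ + GetNumberFormat      ", 0   ; 30 characters + terminator
Desc1Len  EQU  SIZEOF Desc1 - 1                      ; 30: terminator subtracted

Desc2     BYTE "ustrv$ + GetNumberFormat      "      ; 30 characters
          BYTE 0                                     ; terminator on its own line
Desc2Len  EQU  SIZEOF Desc2                          ; 30: only the labelled line counts
```

Either form gives a display length that stops before the zero byte, so no binary zero reaches the screen buffer.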




Alex
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:00:08 AM
i haven't completely solved the problem
but - i have isolated it to some degree
there appears to be a null byte at that point in the text data


something else, not related to this specific problem ...
ConvertToDOS PROC DestBuffer:DWORD, SourceBuffer:DWORD


   mov  esi, SourceBuffer
   mov  edi, DestBuffer

   mov  ecx, CharNumber  ;shouldn't this be NumCycles ?????? - we do 2 chars per pass
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:00:59 AM
he may have found it   :U
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:05:11 AM
Quote from: dedndave on November 22, 2010, 01:00:59 AM
he may have found it   :U

:P  :lol
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:07:21 AM
well - i count 6 spaces at the end of that string
in the pasted text, i only count 5   :P

let me see if i can find it
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:09:55 AM
Even better:

invoke lstrlen,addr AlgoDesc
invoke DisplayAt, dword ptr AlgoColDesc, dword ptr AlgoRow, addr AlgoDesc,
                     eax




Alex
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:13:50 AM
Quote from: dedndave on November 22, 2010, 01:07:21 AM
well - i count 6 spaces at the end of that string
in the pasted text, i only count 5   :P

let me see if i can find it

It would be better to calculate the string length dynamically, as I did in my version of the manager.

Of course, do the search  :U

But I guess I found the culprit, or the main reason at least.  :lol There is something strange in how the console treats the results on different systems: on my system, it seems, the zero byte is replaced with a space by the system; on yours it is left as is, and terminates the string.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:15:01 AM
this may not be the end solution, but it does solve the problem
should help find the bug
CopyScreenText PROC


   lea  esi, TopCode
   lea  edi, TextResults

   movq mm7, qword ptr [esi]
   movq qword ptr [edi], mm7

   add  edi, Eight  
   lea  esi, SavedScreen

   mov  eax, AlgoRow
   imul eax, MaxCols

   mov  ecx, eax
   
   xor  eax, eax
   xor  ebx, ebx

NextCharText:

   mov eax, [esi]
cmp al,0
jnz around1

mov al,20h

around1:
   mov BYTE PTR [edi], al
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 01:20:24 AM
Dave, did you try it? Does it work?
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:22:25 AM
yes - using the "C" command.....

┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 01:21 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Pentium(R) 4 CPU 3.00GHz with 2 logical core(s) with SSE3                │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   75,602 │   72,022 │   73,863 │   71,714 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   72,385 │   71,391 │   71,918 │   70,983 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   90,708 │   88,303 │   91,428 │   88,650 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    8,627 │    8,596 │    8,448 │    8,441 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    3,353 │    3,607 │    3,697 │    3,357 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │   11,877 │   11,919 │   12,173 │   12,370 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤

Title: Re: This is too slow
Post by: GregL on November 22, 2010, 01:23:24 AM
Another suggestion: the following code in TestBed.asm could cause problems. Why not just test for SSE2?


SSE2          EQU  ON ; If your CPU is SSE2 capable set this var to ON


Procedure(s) to test for SSE2.

;---------------------------------------
ChkSSE2 PROC USES ebx   ; CPUID overwrites EBX, which callers expect to be preserved
    ; returns: TRUE  (1) if SSE2 is supported or
    ;          FALSE (0) if SSE2 is not supported
    call ChkCPUID
    test eax, eax
    jz @F           ; CPUID not supported
    xor eax, eax
    cpuid
    test eax, eax
    jz @F           ; function 1 not supported
    mov eax, 1
    cpuid
    xor eax, eax    ; set up for return of FALSE
    bt edx, 26      ; SSE2 supported?
    jnc @f          ; return FALSE
    mov eax, 1      ; return TRUE
  @@:
    ret
ChkSSE2 ENDP
;---------------------------------------
ChkCPUID PROC USES ebx
   ; Return: True (1) if CPUID supported
   ;         False(0) if CPUID not supported
   pushfd
   pop     eax
   btc     eax, 21                ; check if CPUID bit can toggle
   push    eax
   popfd
   pushfd
   pop     ebx
   xor     ebx, eax
   xor     eax, eax               ; set up to return FALSE
   bt      ebx, 21
   jc      @F                     ; CPUID not supported, return FALSE
   mov     eax, 1                 ; CPUID supported, return TRUE
 @@:    
   ret
ChkCPUID ENDP
;---------------------------------------

Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:23:51 AM
Frank, have a look into AssignStr ;)
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 01:25:44 AM
OK, let's see if it works on my system as well  :P

Well you have posted 20 corrections. Now which one is the good one?
All secondary points can be solved later...  :P
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:26:37 AM
Quote from: GregL on November 22, 2010, 01:23:24 AM
Another suggestion, the following code in TestBed.asm could cause problems. Why not just test for SSE2.

My old manager handles that issue at runtime :lol
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:30:33 AM
while we are discussing SSE   :bg

in 2 places, the CopyScreenText routine uses
    movq mm7, qword ptr [esi]
    movq qword ptr [edi], mm7


is that necessary ?
i mean - sure it is a little faster, but they are executed only once
that code makes the program incompatible with a P3 machine
i am sure you have several other similar pieces of code in there

i would think it might be better to use non-SSE code, unless speed is an issue
that way, you have fewer pieces of code to write replacement routines for if SSE is not available
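
For a one-off 8-byte copy like these two, a plain integer version might look like the sketch below. It assumes EAX and EDX are free at that point; in the posted routine EAX is reloaded right afterwards, while EDX being unused is an assumption. Note also that `movq` with an mm register is an MMX instruction, and MMX code normally needs `emms` before any later FPU use, which the integer version avoids.

```masm
    ; copy 8 bytes from [esi] to [edi] using only general-purpose registers
    mov  eax, [esi]
    mov  edx, [esi+4]
    mov  [edi], eax
    mov  [edi+4], edx
```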
Title: Re: This is too slow
Post by: oex on November 22, 2010, 01:30:59 AM
I would recommend a global struct something like

CPUFeatures STRUCT
    HasMMX   BYTE ?
    HasSSE   BYTE ?
    HasSSE2  BYTE ?
    HasETC   BYTE ?                   ;<---- Awesome feature if you have got it :lol
CPUFeatures ENDS

Check for all features at once and then just check the global struct when needed....

Best always to code everything with basic x86 commands first and only optimise where needed at the end (Obviously I used Dave's post as a reference :lol)
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 01:34:37 AM
One thing at a time.  :P

1] implemented workaround proposed by Dave.
2] I'm going to implement check for SSE2 capable code, this will solve also:

Quote from: dedndave on November 22, 2010, 01:30:33 AM
while we are discussing SSE   :bg

in 2 places, the CopyScreenText routine uses
    movq mm7, qword ptr [esi]
    movq qword ptr [edi], mm7


is that necessary ?
i mean - sure it is a little faster, but they are executed only once
that code makes the program incompatible with a P3 machine
i am sure you have several other similar pieces of code in there

i would think it might be better to use non-SSE code, unless speed is an issue
that way, you have fewer pieces of code to write replacement routines for if SSE is not available

Because if the system doesn't support MMX, which is probably supported by the P3, it will run/compile alternative code.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:35:12 AM
i suggest this code
http://www.masm32.com/board/index.php?topic=15338.msg125149#msg125149
the value can be stored in a single dword (byte, actually)
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:36:44 AM
Dave, does this EXE run properly?

The problem is that there are many definitions of the string size, and they do not match the real sizes.



Alex
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 01:37:54 AM

┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 01:37 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   44.100 │   43.737 │   43.864 │   43.768 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   43.794 │   43.569 │   43.541 │   43.612 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   50.065 │   49.866 │   50.556 │   50.546 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.134 │    3.142 │    3.149 │    3.113 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    1.981 │    1.966 │    1.994 │    1.960 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │    5.576 │    5.640 │    5.607 │    5.592 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


On my pc almost everything works.  :lol
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:38:33 AM
no Alex - same problem

Frank - perhaps this is related to the CPU, itself
you have a newer one than i do   :'(
you don't have any SSE4 code in there, do you ?

also - i noticed in Hutch's thread that you have common controls disabled or something ???
that makes your machine different than everyone else's
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:42:14 AM
Quote from: dedndave on November 22, 2010, 01:38:33 AM
no Alex - same problem

The first row is not copied properly? I have changed only one test description.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:43:50 AM
the first row ?
the first algo data row is where it quits
see the previous posts with the short results - it looks the same
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:45:07 AM
Dave, how about this? The first row should show.
This is not a CPU issue, it is a mix-up of string lengths.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:47:35 AM
i know - but i was trying to give Frank something to think about, in terms of CPU/SSE   :bdg

yes - that one works, Alex

┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 01:46 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Pentium(R) 4 CPU 3.00GHz with 2 logical core(s) with SSE3                │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   74,853 │   76,643 │   73,079 │   77,197 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   71,043 │   75,563 │   70,802 │   86,603 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   92,975 │   96,372 │  100,538 │   90,121 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    8,812 │    9,840 │    9,729 │   11,084 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    3,566 │    3,369 │    3,184 │    3,952 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │   12,113 │   12,253 │   11,941 │   11,992 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤

Title: Re: This is too slow
Post by: frktons on November 22, 2010, 01:48:02 AM
Instead of the:

   SSE2    EQU  ON


I should do something like:

CALL ChkSSE2
.if eax
  SSE2  EQU  ON
.else
  SSE2  EQU  OFF
.endif

I don't actually know if this syntax is correct. Is it?  
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:49:55 AM
well - you are going to want to test for MMX, SSE, SSE2, SSE3   :bg
use the routine i posted earlier
store the result in a dword and use BT on that dword to see if a feature is present

or the "&"

if FeatureFlags & 20h
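
A minimal sketch of both tests, assuming the MASM32 runtime include is available. The flag layout is an assumption made for illustration: FeatureFlags and FEAT_SSE2 are hypothetical names, and bit 5 (mask 20h) is taken to mean "SSE2 present" only because of the 20h above. The real bit positions depend on the detection routine that fills the dword.

```masm
include \masm32\include\masm32rt.inc

FEAT_SSE2 EQU 20h                       ; assumption: mask 20h (bit 5) = SSE2 present

        .DATA?
FeatureFlags dd ?                       ; hypothetical: filled in by the detection routine

        .CODE
start:
        mov     FeatureFlags,FEAT_SSE2  ; stand-in for the real detection call

; variant 1: BT copies the selected bit into the carry flag
        bt      FeatureFlags,5
        jnc     NoFeature
        print   "BT: feature present",13,10
NoFeature:

; variant 2: the "&" operator of .IF assembles to a TEST instruction
        .if FeatureFlags & FEAT_SSE2
            print "&: feature present",13,10
        .endif
        exit
end start
```

Both variants read the same dword at run time, so one detection call at startup is enough for any number of later checks.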
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:50:49 AM
Quote from: dedndave on November 22, 2010, 01:47:35 AM
i know - but i was trying to give Frank something to think about, in terms of CPU/SSE   :bdg

yes - that one works, Alex

Thanks!  :U

I changed this in ProgData.inc:

AlgoDescSize     DWORD SIZEOF AlgoDesc-1


:lol

Try making this change in the original source, and tell me the results  :U
Title: Re: This is too slow
Post by: hutch-- on November 22, 2010, 01:53:37 AM
Looks good here Frank.


┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 01:52 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 3 (build 2600)                     │
│CPU : Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz with 4 logical core(s) with SSE4.1    │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   32,115 │   32,048 │   31,269 │   31,559 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   32,146 │   31,748 │   32,162 │   31,828 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   37,187 │   37,389 │   37,251 │   37,388 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    2,583 │    2,583 │    2,583 │    2,577 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    2,052 │    2,017 │    2,009 │    2,019 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│07                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│08                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│09                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│10                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│11                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│12                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│13                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│14                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│15                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│16                                │         │          │          │          │          │
├──────────────────────────────────┴─────────┴──────────┴──────────┴──────────┴──────────┤
│ Esc         Exit       Copy       Run       View       Save       Info       F1 Help   │
└────────────────────────────────────────────────────────────────────────────────────────┘
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:54:10 AM
Quote from: frktons on November 22, 2010, 01:48:02 AM
Instead of the:

   SSE2    EQU  ON


I should do something like:

CALL ChkSSE2
.if eax
  SSE2  EQU  ON
.else
  SSE2  EQU  OFF
.endif

I don't actually know if this syntax is correct. Is it?  

No, you cannot mix equates and code - equates are assembly-time stuff, evaluated before the program ever runs.


Quote from: dedndave on November 22, 2010, 01:49:55 AM
well - you are going to want to test for MMX, SSE, SSE2, SSE3   :bg

Dave, in the original "New TestBed" thread I already posted a fully functional, working manager for the algos, which excludes unsupported algos from testing at runtime. Its decision is based on the result returned by the CPUID code at the start of the program.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:54:15 AM
that's it Alex - you found it   :U

as for mixing code and equates - you can handle that as an assembly-time conditional
but, that isn't how i would do it - lol
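
The difference between the two kinds of test can be sketched as follows. This is only an illustration, not the TestBed's code: USE_SSE2 and HasSSE2 are made-up names, the MASM32 runtime include is assumed, and CPUID itself is assumed to be available on the machine.

```masm
include \masm32\include\masm32rt.inc
.586                            ; needed for the CPUID mnemonic

USE_SSE2 EQU 0                  ; hypothetical assembly-time switch, fixed at build time

        .DATA?
HasSSE2 dd ?                    ; hypothetical run-time flag

        .CODE
start:
; assembly-time conditional: only one branch is assembled into the EXE
IF USE_SSE2
        print   "built with the SSE2 path",13,10
ELSE
        print   "built with the compatible path",13,10
ENDIF

; run-time check: CPUID function 1 reports SSE2 in EDX bit 26
        push    ebx             ; CPUID overwrites EBX
        mov     eax,1
        cpuid
        pop     ebx
        shr     edx,26
        and     edx,1           ; keep only the SSE2 bit
        mov     HasSSE2,edx

; run-time conditional: both branches are in the EXE, the flag picks one
        .if HasSSE2
            print "this CPU reports SSE2",13,10
        .else
            print "this CPU does not report SSE2",13,10
        .endif
        exit
end start
```

The IF/ELSE/ENDIF block is resolved by the assembler, so changing it means rebuilding. The .if/.else/.endif block is ordinary code and can follow whatever the CPU reports.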
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 01:55:16 AM
Quote from: dedndave on November 22, 2010, 01:49:55 AM
well - you are going to want to test for MMX, SSE, SSE2, SSE3   :bg
use the routine i posted earlier
store the result in a dword and use BT on that dword to see if a feature is present

I think the routine for CPU detection has already everything I need. At least, I think
Alex, the routine is yours, what do you think?
Quote from: Antariy on November 22, 2010, 01:50:49 AM
Quote from: dedndave on November 22, 2010, 01:47:35 AM
i know - but i was trying to give Frank something to think about, in terms of CPU/SSE   :bdg

yes - that one works, Alex

Thanks!  :U

I changed this in ProgData.inc:

AlgoDescSize     DWORD SIZEOF AlgoDesc-1


:lol

Try making this change in the original source, and tell me the results  :U


This is logical as well.  :U I'll change it, but the routine already works with Dave's workaround.
Maybe this will make it not necessary to use the workaround. Better if Dave tests it.

Quote from: hutch-- on November 22, 2010, 01:53:37 AM
Looks good here Frank.

Yes Steve. Thanks. It has some problem with oldies.
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 01:57:54 AM
Quote from: dedndave on November 22, 2010, 01:54:15 AM
that's it Alex - you found it   :U

With your help  :U
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 01:58:43 AM
Frank
my work-around was merely a debugging tool used to help isolate the problem

Alex's fix is the right way to do it - and it works
remove my temporary code
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 02:02:44 AM
Quote from: dedndave on November 22, 2010, 01:58:43 AM
Frank
my work-around was merely a debugging tool used to help isolate the problem

Alex's fix is the right way to do it - and it works
remove my temporary code

Already done.  :P
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 02:03:10 AM
Quote from: frktons on November 22, 2010, 01:55:16 AM
I think the routine for CPU detection has already everything I need. At least, I think
Alex, the routine is yours, what do you think?

I have already said many times that my CPUID code returns the highest supported instruction set, and it can be used to decide which algos to execute. Moreover, that "ultra hidden" feature is used in my "Algos Manager" :P :lol

Just read the comments for the AxCPUid code.

Quote from: frktons on November 22, 2010, 01:55:16 AM
This is logical as well.  :U I'll change it, but the routine already works with Dave's workaround.

Dave's workaround is as straightforward as possible, and it shows that the culprit is the zero byte, as mentioned. But it is rather slow: it searches the string and replaces the nulls. Some time ago you didn't want to search for "Here can be your advertisement" for my manager  :lol

Quote from: frktons on November 22, 2010, 01:55:16 AM
Maybe this will make it not necessary to use the workaround. Better if Dave tests it.

He already tested it, and it works.
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 02:07:21 AM
OK guys. It was a nice debugging session. Now it is 3 o'clock in the morning. In a few hours
I have to go somewhere other than the virtual world. We'll carry on the good job another time.

Testbed is still work in progress. Many things will change along the way. And Alex Algo Manager
is still waiting to be implemented.  :P

Stay tuned. I'll be back.  :lol

Good night

Frank
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 02:09:05 AM
as to the CPU features.....

i am not just talking about the code in the algos
i am seeing MMX code in the TestBed procs

my suggestion is - you may want to re-think using any code in the testbed
program that prevents it from being used on older CPU's
it is nice to be able to run test algos on older machines

good nite Frank   :bg
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 02:10:50 AM
Quote from: frktons on November 22, 2010, 02:07:21 AM
Testbed is still work in progress. Many things will change along the way. And Alex Algo Manager
is still waiting to be implemented.  :P

Well, your testbed is updated so frequently that it is not possible to insert the Manager into each new release :P
Title: Re: This is too slow
Post by: GregL on November 22, 2010, 02:12:52 AM
I just threw those procedures out there so I wasn't making a suggestion without some code.  It doesn't matter to me which code is used.  I agree with Dave's last post too.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 02:13:20 AM
he'll get there, Alex   :bg

good work   :U
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 02:13:37 AM
Quote from: dedndave on November 22, 2010, 02:09:05 AM
my suggestion is - you may want to re-think using any code in the testbed
program that prevents it from being used on older CPU's
it is nice to be able to run test algos on older machines

MMX code can run on a 1997 P1 MMX, I guess - that is an old enough CPU :eek :lol

Initially Frank wanted to write the algos with SSE2, but I dissuaded him from that, with difficulty.  :lol
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 02:14:26 AM
yah, Greg - i wasn't aware that Alex already had code written
i am sure they will make it fly right   :P



Alex - yah - come to think of it, my oldest pentium machine is a P1-MMX (one of the first MMX, i guess)
it is suitable as a win 98 test machine - not really enough guts for XP
200 MHz - i run it at 225   :lol
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 02:20:15 AM
Quote from: dedndave on November 22, 2010, 02:14:26 AM
Alex - yah - come to think of it, my oldest pentium machine is a P1-MMX (one of the first MMX, i guess)
it is suitable as a win 98 test machine - not really enough guts for XP
200 MHz - i run it at 225   :lol

Yes, for XP a PII is eno-o-o-o-o-o-o-ough...  :bg

Good old machines, when the M/B had switches for the multiplier :wink
But I guess you overclock it via the system bus?
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 02:22:22 AM
jumpers
no book - i found them by reading the silkscreen on the m/b   :P
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 02:25:01 AM
Quote from: dedndave on November 22, 2010, 02:22:22 AM
no book - i found them by reading the silkscreen on the m/b   :P

:lol
Title: Re: This is too slow
Post by: GregL on November 22, 2010, 02:32:12 AM
I think MMX, SSE and SSE2 code is OK in the TestBed as long as you test for it and provide alternative code.  Which is what Frank was talking about doing.
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 02:38:20 AM
Quote from: GregL on November 22, 2010, 02:32:12 AM
I think MMX, SSE and SSE2 code is OK in the TestBed as long as you test for it and provide alternative code.  Which is what Frank was talking about doing.

That's right, of course!
But for something like the testbed, I guess it would be better to provide one code path; there is no need for different paths with runtime switching. That would tangle the testbed's code for no reason. Speed isn't critical in the testbed itself, which runs tests lasting millions of clocks.

That's why I suggested that Frank use MMX code instead of SSE2 code. He strongly wants to make the testbed very fast, and I suggest favoring better compatibility.  :lol
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 08:30:02 AM
Quote from: dedndave on November 22, 2010, 02:09:05 AM
as to the CPU features.....

i am not just talking about the code in the algos
i am seeing MMX code in the TestBed procs

my suggestion is - you may want to re-think using any code in the testbed
program that prevents it from being used on older CPU's
it is nice to be able to run test algos on older machines

good nite Frank   :bg

The Testbed is also my learning project. This is one of the reasons it changes so often  :P

The conditional assembly will be done the reverse way:

I start with the minimum CPU requirements, and if the user wants to switch to a faster
internal PROC, he has to change the switch/es, because he should know what machine
he is running. The default settings should be "old enough compatible".  :lol
And if they aren't, somebody will say so.   :P

Quote from: GregL on November 22, 2010, 02:32:12 AM
I think MMX, SSE and SSE2 code is OK in the TestBed as long as you test for it and provide alternative code. 
Which is what Frank was talking about doing.

Yes, this is my project. I'll learn many ways of doing the same thing, and give the prog
some flexibility. Alex is rightly proud of his flexible "Algo Manager", but I have not had time
to rearrange the code in order to use it. The future is ahead, by the way.

Quote from: Antariy on November 22, 2010, 02:38:20 AM
But for something like the testbed, I guess it would be better to provide one code path; there is no need for different paths with runtime switching. That would tangle the testbed's code for no reason. Speed isn't critical in the testbed itself, which runs tests lasting millions of clocks.

That's why I suggested that Frank use MMX code instead of SSE2 code.
He strongly wants to make the testbed very fast, and I suggest favoring better compatibility.  :lol

You are right, my friend. But I learn more by making more errors than fewer.  :lol
Otherwise I would only use old code. This way I can also use more recent code.

MMX is not that young, nobody has noticed it with all the tests done.
SSE2/3/4 are another thing. For those, better to have switches, in order to allow
people who want to try them, to be able to do it.

Because I'll be away most of the week, I leave you the latest version, corrected with your
help, that should be able to run on whatever [I'm exaggerating] PC with a Pentium and Windows
98 upwards. Sorry for not being able to make it DOS compatible and 286 standard  :lol :lol :lol :lol

Frank

Tested with Win XP SP3:

┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 11:29 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 3 (build 2600)                     │
│CPU : Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz with 2 logical core(s) with SSSE3      │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   35.571 │   35.407 │   35.333 │   35.281 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   34.130 │   34.297 │   34.346 │   34.246 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   41.227 │   41.215 │   41.189 │   41.197 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    2.952 │    2.963 │    2.990 │    3.030 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    1.931 │    1.975 │    1.932 │    2.012 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │    6.262 │    6.274 │    6.300 │    6.280 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤





Title: Re: This is too slow
Post by: ramguru on November 22, 2010, 01:47:56 PM
kinda strange, when I re-Run the benchmark, timings differ by 1 to 4 seconds, is this considered accurate ?
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 02:56:31 PM
Quote from: ramguru on November 22, 2010, 01:47:56 PM
kinda strange, when I re-Run the benchmark, timings differ by 1 to 4 seconds, is this considered accurate ?

What do you mean, 1-4 seconds? The time is measured in CPU cycles, which last on the order of a billionth of a second each.
If the tests differ by 1-4 cycles it is quite normal.
Title: Re: This is too slow
Post by: ramguru on November 22, 2010, 03:01:36 PM
sry, the dot confused me, I meant the number before dot 35.571
I guess that would be 1000 to 4000 cycles, the precision ..
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 03:07:10 PM
Quote from: ramguru on November 22, 2010, 03:01:36 PM
sry, the dot confused me, I meant the number before dot 35.571
I guess that would be 1000 to 4000 cycles, the precision ..

1,000 - 4,000 cycles are next to nothing, a matter of millionths of a second. Quite normal I guess.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 03:16:48 PM
no - not normal
it helps if you restrict execution to a single core during the tests
        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1

you may want to use GetProcessAffinityMask to restore it when an algo is not running

also - not sure which priority you are running with - HIGH_PRIORITY_CLASS usually works well
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 06:49:43 PM
Quote from: dedndave on November 22, 2010, 03:16:48 PM
no - not normal
it helps if you restrict execution to a single core during the tests
        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1

you may want to use GetProcessAffinityMask to restore it when an algo is not running

also - not sure which priority you are running with - HIGH_PRIORITY_CLASS usually works well

I'm using Michael's timing macros, and the program uses:

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS

I'm not sure about:

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1


You should know better than me what the default settings of Michael's macros are.

Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 07:04:55 PM
well - Michael's macros don't mess with the affinity mask
i can tell you this from experience   :P
my machine is known to jump around on timing numbers
i think it has something to do with Media Center
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 09:52:36 PM
Quote from: dedndave on November 22, 2010, 07:04:55 PM
well - Michael's macros don't mess with the affinity mask
i can tell you this from experience   :P
my machine is known to jump around on timing numbers
i think it has something to do with Media Center

What if I leave the default settings in Michael's macros?
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:00:31 PM
Quote from: frktons on November 22, 2010, 03:07:10 PM
Quote from: ramguru on November 22, 2010, 03:01:36 PM
sry, the dot confused me, I meant the number before dot 35.571
I guess that would be 1000 to 4000 cycles, the precision ..

1,000 - 4,000 cycles are next to nothing, a matter of millionths of a second. Quite normal I guess.

For 1000 cycles, a difference of 4 cycles is a deviation of 0.4%. Almost all instruments allow a deviation of +/- 3%. So the timings are quite stable  :lol
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:01:29 PM
Quote from: Antariy on November 22, 2010, 10:00:31 PM
Quote from: frktons on November 22, 2010, 03:07:10 PM
Quote from: ramguru on November 22, 2010, 03:01:36 PM
sry, the dot confused me, I meant the number before dot 35.571
I guess that would be 1000 to 4000 cycles, the precision ..

1,000 - 4,000 cycles are next to nothing, a matter of millionths of a second. Quite normal I guess.

For 1000 cycles, a difference of 4 cycles is a deviation of 0.4%. Almost all instruments allow a deviation of +/- 3%. So the timings are quite stable  :lol


I told them  :lol
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 10:04:30 PM
the affinity mask has nothing to do with Michael's macros   :bg
you are simply selecting a single core to run the test
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:05:06 PM
Quote from: frktons on November 22, 2010, 08:30:02 AM
Alex is rightly proud of his flexible "Algo Manager", but I had not time to rearrange the code in order to use it.

"Proud" is too strong a term :P

Well, if I incorporate the Manager into your latest release, and *if you use this tweaked release* for further development - that will not be hard.
It's just impossible to add the manager into each new release :P, so if you stop for a moment and accept the manager as "standard", things will be simpler. Since at the moment the Manager is not "standard" (I have inserted it manually), it is not simple to do this at each step of development.
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:11:50 PM
Frank, it would be better to implement selection of a single core for the tests.
If the testbed's thread is switched to another core after the first RDTSC, the timings will not be fair, because each core has its own clock counter.
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:15:11 PM
Quote from: Antariy on November 22, 2010, 10:05:06 PM
Well, if I incorporate the Manager into your latest release, and *if you use this tweaked release*
for further development - that will not be hard.
It's just impossible to add the manager into each new release :P, so if you stop for a moment
and accept the manager as "standard", things will be simpler. Since at the moment the Manager
is not "standard" (I have inserted it manually), it is not simple to do this at each step of development.

I'll be away from my PC for 4-5 days, so maybe this is the right moment to do it, if you want.
After that, I'll make new improvements using your Algo Manager as the standard. Once I have
the complete program, with all the algos [6 up to now] already transformed to work with
the Manager, it'll be easy to add other PROCs, options, and algos as well.  :U

Please leave the columns as they are now, don't change them  :eek

Quote from: Antariy on November 22, 2010, 10:11:50 PM
Frank, it would be better to implement selection of a single core for the tests.
If the testbed's thread is switched to another core after the first RDTSC, the timings will not be fair,
because each core has its own clock counter.

What and where should I insert the code?
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:24:15 PM
Quote from: frktons on November 22, 2010, 10:15:11 PM
I'll be away from my PC for 4-5 days, so maybe this is the right moment to do it, if you want.

Maybe :P

Quote from: frktons on November 22, 2010, 10:15:11 PM
with all the algos [6 up to now] already transformed to work with the Manager

My latest release (http://www.masm32.com/board/index.php?topic=14871.msg125063#msg125063) contains all 16 files, which are ready for algo inclusion and testing  :lol

Quote from: frktons on November 22, 2010, 10:15:11 PM
Please leave the columns as they are now, don't change them  :eek

I didn't change them in the previous insertion, what is up??? :eek

Quote from: frktons on November 22, 2010, 10:15:11 PM
What and where should I insert the code?

A few posts above, Dave posted code with "GetCurrentProcess" - that is it.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 10:29:30 PM
selecting a single core is simple
it is also a good idea to do this when reading CPUID values to identify a processor

a good idea to:
1) select a single core
2) ensure that CPUID is supported
3) ensure that RDTSC is supported

that way, you know that Michael's timing macro will work on the machine
i assume that Alex's code verifies that

        .DATA?

hProc   dd ?       ;current process handle
dwPMask dd ?       ;process affinity mask
dwSMask dd ?       ;system affinity mask

        .CODE

;------------------------------------------------------------------------------
;initialization code section
;------------------------------------------------------------------------------

;get and save current process handle

        INVOKE  GetCurrentProcess
        mov     hProc,eax

;get and save system and process affinity masks

        INVOKE  GetProcessAffinityMask,hProc,offset dwPMask,offset dwSMask

;------------------------------------------------------------------------------

;------------------------------------------------------------------------------
;run timing test code section
;------------------------------------------------------------------------------

;restrict execution to a single core (mask = 1 selects core 0)

        INVOKE  SetProcessAffinityMask,hProc,1

;
;timing test code goes here
;

;restore original process affinity mask

        INVOKE  SetProcessAffinityMask,hProc,dwPMask

;------------------------------------------------------------------------------
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:31:32 PM
This is the display my version produces:


┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 22:28 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   44.362 │   44.218 │   44.254 │   44.123 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   44.190 │   43.981 │   44.181 │   43.973 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   50.618 │   50.611 │   50.737 │   50.675 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.068 │    3.068 │    3.053 │    3.063 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    2.024 │    2.006 │    1.970 │    1.972 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │    5.563 │    5.623 │    5.565 │    5.539 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


This is your version:


┌─────────────────────────────────────────────────────────────[22 Nov 2010 at 22:29 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 Alex / MMX                     │  55     │ 5.415    │ 5.409    │ 5.411    │ 5.407    │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 Frank / SSE2                   │  45     │ 4.465    │ 4.464    │ 4.464    │ 4.464    │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 Here can be your advertisement │  0      │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Here can be your advertisement │  0      │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Here can be your advertisement │  0      │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Here can be your advertisement │  0      │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│07 Here can be your advertisement │  0      │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


look carefully and see what I mean.

I would like to know where to put the :


       INVOKE  GetCurrentProcess
       INVOKE  SetProcessAffinityMask,eax,1





Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:34:13 PM
Quote from: dedndave on November 22, 2010, 10:29:30 PM
that way, you know that Michael's timing macro will work on the machine
i assume that Alex's code verifies that

Yes, I check for the presence of CPUID. All the other code of the CPUid routine is i386.

About restoring the affinity - I'm not sure this needs to be done, since the program will exit. Just set the affinity as at the start of your piece of code, with no restoring after all the tests.
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:36:55 PM
This is the display my version produces:
...
This is your version:
...
look carefully and see what I mean.

Well, I haven't inserted algos for about a week - I just haven't seen that the Manager is appreciated or worth the effort - no feedback, no bothering  :green2

I would like to know where to put the :


At the very start of the program.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 10:38:17 PM
no - the program will not exit
it simply allows use of all cores during execution of the rest of the TestBed code

Michael's macros use RDTSC
if it is executed on a machine that does not support RDTSC, it will hang
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 10:39:10 PM
Frank - maybe you missed this post...

http://www.masm32.com/board/index.php?topic=15365.msg126090#msg126090
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 10:41:38 PM
RDTSC support may be verified by reading

CPUID with EAX = 1
EDX, bit 4
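
A rough sketch of that check in MASM, assuming the MASM32 SDK and a 32-bit build. HasRdtsc is a hypothetical name, not something from Michael's macros or from Alex's CPUID routine. It first tests whether the ID flag (EFLAGS bit 21) can be toggled, which is the usual way to see if CPUID exists at all, and only then reads CPUID function 1.

```masm
include \masm32\include\masm32rt.inc
.586                                ; needed so the assembler accepts cpuid

.code
; HasRdtsc (hypothetical helper): returns EAX = 1 if RDTSC is reported, else 0
HasRdtsc PROC USES ebx              ; cpuid overwrites ebx, so preserve it
        pushfd
        pop     eax                 ; eax = current EFLAGS
        mov     ecx, eax            ; keep a copy
        xor     eax, 200000h        ; flip the ID flag (bit 21)
        push    eax
        popfd
        pushfd
        pop     eax                 ; read EFLAGS back
        push    ecx
        popfd                       ; restore the original EFLAGS
        xor     eax, ecx
        and     eax, 200000h        ; did bit 21 actually change?
        jz      NoTsc               ; no: CPUID is not available
        mov     eax, 1
        cpuid                       ; feature flags come back in edx
        test    edx, 10h            ; bit 4 = time stamp counter
        jz      NoTsc
        mov     eax, 1
        ret
NoTsc:
        xor     eax, eax
        ret
HasRdtsc ENDP

start:
        call    HasRdtsc
        .if eax
            print "RDTSC is reported as supported",13,10
        .else
            print "RDTSC is not reported",13,10
        .endif
        exit
end start
```

A testbed could call something like this once at startup and refuse to run the timing macros when it returns 0, instead of executing an instruction the CPU does not have.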
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:41:50 PM
Quote from: dedndave on November 22, 2010, 10:38:17 PM
no - the program will not exit
it simply allows use of all cores during execution of the rest of the TestBed code

Michael's macros use RDTSC
if it is executed on a machine that does not support RDTSC, it will hang

The program will run forever? :P No, it will exit in the end, and the power of one core would be enough to run the post-testing code  :lol

I honestly hope that nobody will run the TestBed on an i486 machine :green2
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:42:53 PM
Quote from: Antariy on November 22, 2010, 10:36:55 PM
This is the display my version produces:
...
This is your version:
...
look carefully and see what I mean.

Well, I haven't inserted algos for about a week - I just haven't seen that the Manager is appreciated, or worth the effort - no feedback, no bothering  :green2

I would like to know where to put the :


        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1

At the very start of the program.


I told you that your manager will be the standard. I don't care about too much feed-back, and we had
already a lot of feed-back.
Dave is giving us a lot of feed-back, and others have done as well: oex, ramguru, Michaelw, clive, GregL
Hutch, jj2007 and so on.

I'll leave the Testbed as it is in your hands. Modify the Manager, and leave the columns as they are,
eliminate those rows about "advertisement", and make any optimization you want.

I'll take back control of Testbed in about 2 weeks. During this time do whatever you like with it.  :lol
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:43:03 PM
Quote from: dedndave on November 22, 2010, 10:41:38 PM
RDTSC support may be verified by reading

CPUID with EAX = 1
EDX, bit 4

Pentium has it, AFAIK
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:46:44 PM
I told you that your manager will be the standard. I don't care about too much feed-back, and we had
already a lot of feed-back...

...feedback not about manager :lol

I'll leave the Testbed as it is in your hands. Modify the Manager, and leave the columns as they are,
eliminate those rows about "advertisement", and make any optimization you want.

Why do you dislike the "advertisement" strings? They are very funny, in the style of the modern world :green2

Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:47:27 PM
Quote from: dedndave on November 22, 2010, 10:39:10 PM
Frank - maybe you missed this post...

http://www.masm32.com/board/index.php?topic=15365.msg126090#msg126090

I didn't. Read what I posted.  :U
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:50:02 PM
Quote from: Antariy on November 22, 2010, 10:46:44 PM
I told you that your manager will be the standard. I don't care about too much feed-back, and we had
already a lot of feed-back...

...feedback not about manager :lol

I'll leave the Testbed as it is in your hands. Modify the Manager, and leave the columns as they are,
eliminate those rows about "advertisement", and make any optimization you want.

Why do you dislike the "advertisement" strings? They are very funny, in the style of the modern world :green2

Ask Dave what he thinks about advertisement. I'm preparing the luggage, because tomorrow I have a flight
to catch. No Testbed development for about 2 weeks. Have a nice optimization trip with Dave. I think he
could be a good advisor.  :P But don't believe everything he says.  :lol
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:53:04 PM
Quote from: frktons on November 22, 2010, 10:50:02 PM
Ask Dave what he thinks about advertisement.

Dave, what do you think about the Manager's default description for the non-existent algos? Did you run it? Then you should have seen a picture just like on TV  :bg
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 10:54:31 PM
yes - i was waiting to hear...
"Oh, I wish I were an Oscar Meyer hot dog
for that is truly what I want to be......."


listen guys....
CPUID, RDTSC, affinity....
i am just giving my best knowledge and info
you can use it or ignore it   :bg

there are other ways to acquire more stable result numbers
but, if you are not interested in the basics, i know you won't be interested in those
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:56:18 PM
Quote from: dedndave on November 22, 2010, 10:54:31 PM
yes - i was waiting to hear...
"Oh, I wish I were an Oscar Meyer hot dog
for that is truly what I want to be......."

So, is it quite useful as a description, or is it annoying? The second is preferable, and if it is annoying, then so be it  :green2
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:56:33 PM
Quote from: dedndave on November 22, 2010, 10:54:31 PM
yes - i was waiting to hear...
"Oh, I wish I were an Oscar Meyer hot dog
for that is truly what I want to be......."

In other words he thinks you can throw that rubbish where you like, but not in the Testbed.  :lol
Title: Re: This is too slow
Post by: hutch-- on November 22, 2010, 10:56:38 PM
Frank,

Make sure you enjoy yourself while you are away.
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 10:58:03 PM
that is a song, or as we call it, a "jingle"
a jingle is a song from an advertisement
you have never heard this song ????   :P
watch the movie "Demolition Man"
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 10:58:52 PM
Quote from: dedndave on November 22, 2010, 10:54:31 PM
listen guys....
CPUID, RDTSC, affinity....
i am just giving my best knowledge and info
you can use it or ignore it   :bg

there are other ways to acquire more stable result numbers
but, if you are not interested in the basics, i know you won't be interested in those

We are interested, Dave. Just give Alex the time to understand, his English is not
like his ASM, you know?

Quote from: hutch-- on November 22, 2010, 10:56:38 PM
Frank,

Make sure you enjoy yourself while you are away.

I promise.  :P

Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 10:58:58 PM
Quote from: frktons on November 22, 2010, 10:56:33 PM
In other words he thinks you can throw that rubbish where you like, but not in the Testbed.  :lol

That's very interesting, well formatted for the testbed, funny and simple thing, not rubbish :(
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:01:21 PM
Quote from: Antariy on November 22, 2010, 10:58:58 PM
That's very interesting, well formatted for the testbed, funny and simple thing, not rubbish :(

Take care of respecting the columns, the numbers alignment and don't care too much about
what you think is funny to put into the display.  :U

You can have your own version with advertisement and songs. Not the Testbed that will
be used in the forum.
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:02:41 PM
Quote from: dedndave on November 22, 2010, 10:54:31 PM
listen guys....
CPUID, RDTSC, affinity....
i am just giving my best knowledge and info
you can use it or ignore it   :bg

Dave, I appreciate your suggestion about affinity, it is right :U Just read some of my previous posts :P
But I really guess nobody will run the testbed on an i486... So we can safely decide (and skip over the plain Pentium) to have the PMMX as the minimal test machine. Or not? :eek
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:04:00 PM
Quote from: Antariy on November 22, 2010, 11:02:41 PM
Quote from: dedndave on November 22, 2010, 10:54:31 PM
listen guys....
CPUID, RDTSC, affinity....
i am just giving my best knowledge and info
you can use it or ignore it   :bg

Dave, I appreciate your suggestion about affinity, it is right :U Just read some of my previous posts :P
But I really guess nobody will run the testbed on an i486... So we can safely decide (and skip over the plain Pentium) to have the PMMX as the minimal test machine. Or not? :eek


I agree  :U
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:04:22 PM
Quote from: frktons on November 22, 2010, 11:01:21 PM
You can have your own version with advertisement and songs.

Yeah! Playback of some MIDI or MP3 would be very nice :P
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:08:45 PM
Quote from: Antariy on November 22, 2010, 11:04:22 PM
Quote from: frktons on November 22, 2010, 11:01:21 PM
You can have your own version with advertisement and songs.

Yeah! Playback of some MIDI or MP3 would be very nice :P


We can implement a "P"lay key to choose a midi or mp3 to play while doing the tests, but the tests
will be a little bit out of precision then.  :lol
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:09:02 PM
Quote from: frktons on November 22, 2010, 10:58:52 PM
Quote from: hutch-- on November 22, 2010, 10:56:38 PM
Frank,

Make sure you enjoy yourself while you are away.

I promise.  :P

Don't pick off a small piece of the Colosseum as a souvenir :P :lol
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:10:10 PM
Quote from: frktons on November 22, 2010, 11:08:45 PM
We can implement a "P"lay key to choose a midi or mp3 to play while doing the tests, but the tests will be a little bit out of precision then.  :lol

Or just test the MP3 decoding algos only  :lol
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:10:57 PM
Quote from: Antariy on November 22, 2010, 11:09:02 PM
Don't pick off a small piece of the Colosseum as a souvenir :P :lol

Not really. I have some friends to meet, and it will be better than stones.  :wink


Quote from: Antariy on November 22, 2010, 11:10:10 PM
Quote from: frktons on November 22, 2010, 11:08:45 PM
We can implement a "P"lay key to choose a midi or mp3 to play while doing the tests, but the tests will be a little bit out of precision then.  :lol

Or just test the MP3 decoding algos only  :lol


That's good.  :U

Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:15:08 PM
Quote from: frktons on November 22, 2010, 11:10:57 PM

Not really. I have some friends to meet, and it will be better than stones.  :wink


Well, with the help of friends you can take a very big stone of the Colosseum as a souvenir then. :lol
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:21:20 PM
Quote from: Antariy on November 22, 2010, 11:15:08 PM
Well, with the help of friends you can take a very big stone of the Colosseum as a souvenir then. :lol

We can transport Colosseum to my town if we want.  :P
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:27:51 PM
Quote from: frktons on November 22, 2010, 11:21:20 PM
We can transport Colosseum to my town if we want.  :P

:lol
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:34:18 PM
By the way, I've already written on paper the code to highlight the best
performing algo row. I'm afraid I can't send the image to you because I don't have a scanner
at home.

When I'm back home, if you have inserted the Manager, I'll implement the highlight of the
code, assuming I'll be able to grasp your code and your comments in Cyrillic English  :lol
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:36:52 PM
Quote from: frktons on November 22, 2010, 11:34:18 PM
your comments in Cyrillic English  :lol

:green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:39:17 PM
Maybe it is easier if you write comments in Russian and I Google-translate them or ask a Russian
friend of mine to do it for me.  :dazzled:
Title: Re: This is too slow
Post by: dedndave on November 22, 2010, 11:48:08 PM
Jochen lives over there
someday, if i get to Italy, i may stop and visit him   :P

(http://easu.jrc.ec.europa.eu/eas/downloads/images/Staff_Photos_EAS_Images_jesinghaus.jpg)

http://easu.jrc.ec.europa.eu/eas/sipa/staff/index.htm
Title: Re: This is too slow
Post by: frktons on November 22, 2010, 11:55:10 PM
If I see around somebody who looks like Jochen I'll offer a coffee to him.  :bg
Title: Re: This is too slow
Post by: Antariy on November 22, 2010, 11:59:29 PM
Quote from: dedndave on November 22, 2010, 11:48:08 PM
Jochen lives over there
someday, if i get to Italy, i may stop and visit him   :P

Give Dave's photo, Dave's photo - scan of the public  :P
Title: Re: This is too slow
Post by: oex on November 22, 2010, 11:59:48 PM
:lol is he in charge of the website?

http://easu.jrc.ec.europa.eu/eas/sipa/staff/jochen.jesinghaus'at'jrc.ec.europa.eu
Title: Re: This is too slow
Post by: Antariy on November 23, 2010, 12:02:48 AM
Quote from: oex on November 22, 2010, 11:59:48 PM
:lol is he in charge of the website?

http://easu.jrc.ec.europa.eu/eas/sipa/staff/jochen.jesinghaus'at'jrc.ec.europa.eu

The link does not work :P

Give Dave's photo... :P
Title: Re: This is too slow
Post by: dedndave on November 23, 2010, 12:03:04 AM
that is the mail link
they do that to reduce spam   :U
Title: Re: This is too slow
Post by: dedndave on November 23, 2010, 12:05:33 AM
http://www.qrz.com/db/K7NL
click on the pic to enlarge - i assume no responsibility if it breaks your monitor

:P
Title: Re: This is too slow
Post by: oex on November 23, 2010, 12:08:18 AM
I think this is also Dave :lol

(http://www.hereford.tv/dave.png)
Title: Re: This is too slow
Post by: dedndave on November 23, 2010, 12:10:18 AM
yes - that is me as Count Dracula
as you can see, i now have a beard and glasses   :bg
(and thick eyebrows)
Title: Re: This is too slow
Post by: oex on November 23, 2010, 12:11:12 AM
.... and pointy ears :lol

Much better picture I think :lol.... I'm a little concerned as to what happened to your jaw though....

It's amazing the things that come back to 'haunt' you, isn't it :lol

I have no picture, fortunately. I am an AI entity lost in a repetitive loop on this forum :bg
Title: Re: This is too slow
Post by: frktons on November 23, 2010, 12:14:45 AM
Quote from: dedndave on November 23, 2010, 12:05:33 AM
http://www.qrz.com/db/K7NL
click on the pic to enlarge - i assume no responsibility if it breaks your monitor

:P

Quite scary indeed  :lol
Title: Re: This is too slow
Post by: Antariy on November 23, 2010, 12:15:40 AM
Quote from: oex on November 23, 2010, 12:08:18 AM
I think this is also Dave :lol

(http://www.hereford.tv/dave.png)

Is that Dave simulating Dracula, or is this Dracula simulating Dave???  :eek

Just kidding  :bg
Title: Re: This is too slow
Post by: dedndave on November 23, 2010, 12:58:15 AM
that is a Dracula emulation - not simulation   :lol
Title: Re: This is too slow
Post by: Antariy on November 23, 2010, 01:00:56 AM
Quote from: dedndave on November 23, 2010, 12:58:15 AM
that is a Dracula emulation - not simulation   :lol

:lol
Title: Re: This is too slow
Post by: dedndave on November 23, 2010, 01:03:06 AM
but, you are missing the other 2 Daves   :P

(http://img98.imageshack.us/img98/8356/monsterrap.jpg)


doesn't Zara make a kick-ass witch !!!

(http://www.nestreetriders.com/forum/images/smilies/threadjacked.gif)
Title: Re: This is too slow
Post by: Antariy on November 23, 2010, 01:08:47 AM
Quote from: dedndave on November 23, 2010, 01:03:06 AM
but, you are missing the other 2 Daves   :P

(http://img98.imageshack.us/img98/8356/monsterrap.jpg)


doesn't Zara make a kick-ass witch !!!

:U
Title: Re: This is too slow
Post by: Antariy on November 24, 2010, 01:30:18 AM
Algo Manager : The Return: http://www.masm32.com/board/index.php?topic=14871.msg126258#msg126258

:P

:bg
Title: Re: This is too slow
Post by: Antariy on November 24, 2010, 10:03:03 PM
Quote from: dedndave on November 23, 2010, 01:03:06 AM
(http://www.nestreetriders.com/forum/images/smilies/threadjacked.gif)

By the way - this funny image is not optimized yet. It can have 2 times smaller code size, really :P
Title: Re: This is too slow
Post by: dedndave on November 24, 2010, 10:26:28 PM
i think it was an animated gif that someone screwed up - lol

speaking of icons, here is one for the testbed program   :P

(http://img202.imageshack.us/img202/8241/tbicon.png)

what a handsome devil !
Title: Re: This is too slow
Post by: Antariy on November 24, 2010, 10:28:46 PM
Quote from: dedndave on November 24, 2010, 10:26:28 PM
i think it was an animated gif that someone screwed up - lol

speaking of icons, here is one for the testbed program   :P

(http://img202.imageshack.us/img202/8241/tbicon.png)

what a handsome devil

No, it is just a one-frame GIF file, which could be at least 2.14 times smaller with no difference visible to the human eye.

Icon is good enough, but too dark... Some brightness should be adjusted :P
Title: Re: This is too slow
Post by: Antariy on November 24, 2010, 10:30:40 PM
48x48 would be better :P
Title: Re: This is too slow
Post by: dedndave on November 24, 2010, 10:44:52 PM
(http://img256.imageshack.us/img256/3863/clock1296128.png)
Title: Re: This is too slow
Post by: Antariy on November 24, 2010, 10:48:34 PM
Quote from: dedndave on November 24, 2010, 10:44:52 PM
(http://img256.imageshack.us/img256/3863/clock1296128.png)

This is not Dave, apparently  :lol
Title: Re: This is too slow
Post by: dedndave on November 24, 2010, 10:49:24 PM
notice how the bg is transparent ?
Title: Re: This is too slow
Post by: Antariy on November 24, 2010, 10:53:57 PM
Quote from: dedndave on November 24, 2010, 10:49:24 PM
notice how the bg is transparent ?

Yes, I have noticed that it is not transparent. It is just that my current browser does not support alpha-channel transparency by default; it needs some tricks with JS to display transparency via the alpha channel. So the current image is not transparent in my current browser. I know the background should be transparent, and the clock has a shadow which is smoothed via the alpha channel, too.

P.S. Firefox supports the alpha channel, which is why you see the transparency :P