Replacement for atodw and atodw_ex test pieces.

Started by hutch--, July 31, 2010, 11:24:17 AM

lingo

"I swapped the short atou to the longer version, unrolled JJs algo by 8 and changed the algo order and got this result on the i7."

Thank you, Hutch, but you used my old file. Sorry...
The real code size of my algo is 180 bytes for atodL.
So please re-download the latest asc2bin_testbed1.zip file and test it again. Thanks. :wink

hutch--

I had used the new version but in any case here is another benchmark that includes your new unrolled version and Rockoon's unrolled version. Interestingly enough the i7 runs all of the full unrolls in the same timing. Your algo is faster on the 2 P4s but a bit slower on the Core2 quad.


i7 quad.
=======
172 atou
156 atodL
187 Axa2l
203 atodJJ
156 atou_ex
156 atou_rock
172 atou
156 atodL
187 Axa2l
203 atodJJ
156 atou_ex
156 atou_rock
171 atou
156 atodL
188 Axa2l
202 atodJJ
156 atou_ex
156 atou_rock
172 atou
156 atodL
187 Axa2l
203 atodJJ
156 atou_ex
156 atou_rock


171 ms average atou
156 ms average atodL
202 ms average atodJJ
156 ms average atou_ex
187 ms average Axa2l
156 ms average atou_rock
Press any key to continue ...

Core2 Quad.
==========
188 atou
203 atodL
203 Axa2l
219 atodJJ
218 atou_ex
188 atou_rock
187 atou
203 atodL
204 Axa2l
203 atodJJ
234 atou_ex
203 atou_rock
188 atou
203 atodL
219 Axa2l
218 atodJJ
203 atou_ex
219 atou_rock
188 atou
203 atodL
203 Axa2l
219 atodJJ
203 atou_ex
187 atou_rock

187 ms average atou
203 ms average atodL
214 ms average atodJJ
214 ms average atou_ex
207 ms average Axa2l
199 ms average atou_rock
Press any key to continue ...

Prescott P4.
===========
359 atou
297 atodL
344 Axa2l
500 atodJJ
359 atou_ex
329 atou_rock
375 atou
296 atodL
344 Axa2l
516 atodJJ
359 atou_ex
328 atou_rock
375 atou
297 atodL
344 Axa2l
484 atodJJ
360 atou_ex
328 atou_rock
375 atou
297 atodL
328 Axa2l
500 atodJJ
359 atou_ex
328 atou_rock

371 ms average atou
296 ms average atodL
500 ms average atodJJ
359 ms average atou_ex
340 ms average Axa2l
328 ms average atou_rock
Press any key to continue ...

Northwood P4.
============
625 atou
438 atodL
484 Axa2l
703 atodJJ
563 atou_ex
484 atou_rock
641 atou
453 atodL
516 Axa2l
703 atodJJ
562 atou_ex
469 atou_rock
641 atou
437 atodL
563 Axa2l
672 atodJJ
562 atou_ex
484 atou_rock
563 atou
453 atodL
531 Axa2l
703 atodJJ
563 atou_ex
484 atou_rock


617 ms average atou
445 ms average atodL
695 ms average atodJJ
562 ms average atou_ex
523 ms average Axa2l
480 ms average atou_rock
Press any key to continue ...

Antique Celeron.
===============
1102 atou
1092 atodL
1042 Axa2l
972 atodJJ
851 atou_ex
791 atou_rock
1112 atou
1092 atodL
1062 Axa2l
952 atodJJ
871 atou_ex
781 atou_rock
1092 atou
1122 atodL
1042 Axa2l
972 atodJJ
851 atou_ex
791 atou_rock
1082 atou
1112 atodL
1041 Axa2l
982 atodJJ
842 atou_ex
791 atou_rock


1097 ms average atou
1104 ms average atodL
969 ms average atodJJ
853 ms average atou_ex
1046 ms average Axa2l
788 ms average atou_rock
Press any key to continue ...


Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

lingo

"I had used the new version but"

Thank you, Hutch, but in your testing program you used the new version of my algo that was written for JJ's testing program, and that is wrong. Sorry...
I have a second version for your testing program; it is in my latest bm7.zip file.
Note: these are for Intel users only.

Soon I will post two similar algos (bmx.exe and asc2binx) for AMD users.

"I am pleased you are catching up instead of talking."
Are you pleased now?  :lol

Antariy

Hutch, here is a new version of my proc.
It seems that unrolling by 4 is just as good (on my machine).


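OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 16
    repeat 12                       ; 12 one-byte RETs as padding, presumably so that
    ret                             ; @mainloop lands on a 16-byte boundary
    endm
Axa2l proc lpszStr:DWORD
    ; int 3                         ; debug breakpoint, disabled
    mov edx,[esp+4]                 ; edx = lpszStr (no stack frame is built)
    mov eax,0                       ; eax = running result

    movzx ecx,byte ptr [edx]        ; first character
    add ecx,-30h                    ; ecx = char - '0'
    js @done                        ; any byte below '0' (including the null) ends the number

    add edx,1

  @mainloop:
    lea eax,[eax+eax*4]             ; eax = eax*5
    lea eax,[eax*2+ecx]             ; eax = eax*10 + digit
    movzx ecx,byte ptr [edx]
    add ecx,-30h
    js @done

    lea eax,[eax+eax*4]
    lea eax,[eax*2+ecx]
    movzx ecx,byte ptr [edx+1]
    add ecx,-30h
    js @done

    lea eax,[eax+eax*4]
    lea eax,[eax*2+ecx]
    movzx ecx,byte ptr [edx+2]
    add ecx,-30h
    js @done

    lea eax,[eax+eax*4]
    lea eax,[eax*2+ecx]
    movzx ecx,byte ptr [edx+3]
    add edx,4                       ; advance past the 4 bytes consumed in this pass
    add ecx,-30h                    ; done last so the flags belong to this add
    jns @mainloop

  @done:
    ret 4                           ; result in eax, pop the one DWORD argument
Axa2l endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef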




Alex
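
For readers who do not read MASM fluently, the posted loop can be modelled in C roughly as follows. This is only a sketch under assumptions: the name axa2l_model is made up here, and a C compiler is free to generate quite different code from the hand-written assembly, so it says nothing about timings. It does show the two ideas in the proc: multiplying by 10 as (x*5)*2, which is what the LEA pair does, and unrolling the digit loop by 4 with an early exit after each load.

```c
/* Hypothetical C model of the posted Axa2l loop (axa2l_model is an
   illustrative name, not part of the thread's test beds).
   Like the assembly, it stops on the first byte below '0' and does not
   reject bytes above '9' or check for overflow. */
unsigned axa2l_model(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned result = 0;
    int d = *p++ - '0';                 /* movzx + add ecx,-30h */

    if (d < 0)                          /* js @done */
        return result;

    for (;;) {
        result = result * 5 * 2 + (unsigned)d;   /* the two LEAs */
        d = p[0] - '0';
        if (d < 0) break;

        result = result * 5 * 2 + (unsigned)d;
        d = p[1] - '0';
        if (d < 0) break;

        result = result * 5 * 2 + (unsigned)d;
        d = p[2] - '0';
        if (d < 0) break;

        result = result * 5 * 2 + (unsigned)d;
        d = p[3] - '0';
        p += 4;                         /* add edx,4 */
        if (d < 0) break;               /* jns @mainloop */
    }
    return result;
}
```

With this model, axa2l_model("1234567") returns 1234567. Like the assembly, it keeps going on bytes above '9' (for example ':'), so the input has to be digits followed by a null or another byte below '0'.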

lingo

I reordered a few things in my algo and now it works well enough on AMD too. Please try it to get the final results. Thank you.
Note: Because Hutch's testing program is very sensitive to memory usage, please close all other programs and run the test at least 3 times, then take the best values. Thanks. :toothy
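
The "run it several times and take the best value" advice can be sketched in C as follows. This is a minimal illustration, not the code of any test bed in this thread: best_value, best_of_ms and sample_work are hypothetical names, and clock() reports processor time at a much coarser resolution than the cycle counters used in the asc2bin test beds.

```c
#include <time.h>

/* Hypothetical helper: smallest of n samples, or -1.0 if n <= 0. */
double best_value(const double *samples, int n)
{
    double best = -1.0;
    for (int i = 0; i < n; ++i)
        if (best < 0.0 || samples[i] < best)
            best = samples[i];
    return best;
}

/* Hypothetical stand-in for the algorithm being timed. */
void sample_work(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 100000u; ++i)
        sink += i;
}

/* Run work() `runs` times and return the shortest time in milliseconds,
   as measured by clock(). Returns -1.0 if runs <= 0. */
double best_of_ms(void (*work)(void), int runs)
{
    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        clock_t t0 = clock();
        work();
        clock_t t1 = clock();
        double ms = 1000.0 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

Taking the minimum rather than the average keeps a single disturbed run from skewing the figure: four samples of 172, 171, 171 and 234 ms average about 187 ms, while the best value is 171 ms.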



Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
C:\8>bmu6
172 atou
140 atodL
219 Axa2l
203 atodJJ
171 atou_ex
172 atou_rock
171 atou
141 atodL
218 Axa2l
203 atodJJ
172 atou_ex
156 atou_rock
171 atou
141 atodL
202 Axa2l
188 atodJJ
171 atou_ex
172 atou_rock
234 atou
140 atodL
219 Axa2l
202 atodJJ
188 atou_ex
156 atou_rock

187 ms average atou
140 ms average atodL
199 ms average atodJJ
175 ms average atou_ex
214 ms average Axa2l
164 ms average atou_rock
Press any key to continue ...

AMD Turion(tm) 64 Mobile Technology ML-30 (SSE3)
C:\8>bmu6
703 atou
437 atodL
812 Axa2l
672 atodJJ
532 atou_ex
484 atou_rock
703 atou
438 atodL
812 Axa2l
672 atodJJ
531 atou_ex
485 atou_rock
703 atou
437 atodL
813 Axa2l
672 atodJJ
531 atou_ex
484 atou_rock
703 atou
438 atodL
812 Axa2l
672 atodJJ
531 atou_ex
485 atou_rock

703 ms average atou
437 ms average atodL
672 ms average atodJJ
531 ms average atou_ex
812 ms average Axa2l
484 ms average atou_rock
Press any key to continue ...

C:\8>asc2bin_testbed2
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
158     cycles for 10*Lingo's replacement
184     cycles for 10*atou
175     cycles for 10*atou_rock
258     cycles for 10*atodJJ

159     cycles for 10*Lingo's replacement
186     cycles for 10*atou
177     cycles for 10*atou_rock
260     cycles for 10*atodJJ

Code size:
118      bytes for atodL
52       bytes for atou
158      bytes for atou_rock
24       bytes for atodJJ

--- ok ---

C:\8>asc2bin_testbed2
AMD Turion(tm) 64 Mobile Technology ML-30 (SSE3)
240     cycles for 10*Lingo's replacement
304     cycles for 10*atou
314     cycles for 10*atou_rock
312     cycles for 10*atodJJ

240     cycles for 10*Lingo's replacement
299     cycles for 10*atou
306     cycles for 10*atou_rock
293     cycles for 10*atodJJ

Code size:
118      bytes for atodL
52       bytes for atou
158      bytes for atou_rock
24       bytes for atodJJ

--- ok ---




Antariy

Lingo, you are using another version of my proc; see 2 posts above for the newer version. Thanks.


719 atou
641 atodL
656 Axa2l
937 atodJJ
672 atou_ex
625 atou_rock
688 atou
625 atodL
625 Axa2l
953 atodJJ
672 atou_ex
625 atou_rock
687 atou
625 atodL
641 Axa2l
906 atodJJ
672 atou_ex
609 atou_rock
688 atou
625 atodL
641 Axa2l
921 atodJJ
672 atou_ex
610 atou_rock


695 ms average atou
629 ms average atodL
929 ms average atodJJ
672 ms average atou_ex
640 ms average Axa2l
617 ms average atou_rock
Press any key to continue ...




Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
491     cycles for 10*Lingo's replacement
489     cycles for 10*atou
458     cycles for 10*atou_rock
882     cycles for 10*atodJJ

487     cycles for 10*Lingo's replacement
490     cycles for 10*atou
458     cycles for 10*atou_rock
888     cycles for 10*atodJJ

Code size:
118      bytes for atodL
52       bytes for atou
158      bytes for atou_rock
24       bytes for atodJJ

--- ok ---


These are the best values, for MY old proc, of course  :toothy



Alex

hutch--

Ignore this posting, I made a mess of the algos that should have been used.


Rockoon

AMD Phenom(tm) II X6 1055T Processor
1420 atodw library
406 atou_rock
452 Alex
344 Lingo
405 atou_ex
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

hutch--

Note there is something wrong with the results in terms of correctness testing in the last benchmark.

hutch--

You are right, read the above posts.  :dazzled:

Antariy

Hutch, see this new Axa2l algo.
This is a truly unrolled, data-controlled algo. It works in the OOP paradigm: "Data controls the execution process". And it runs slowly on my machine... So, one more proof of the slowness of OOP algos :)

But it is interesting. See and test it, please. It checks the string length and returns 0 in ecx if the length is wrong (longer than 10 bytes). If ecx is not 0, the length is OK.
I attach the archive. This is the old test bed, but you may copy the proc to your newest and improved test beds.
It would be interesting to see what timings it has on the newest CPUs...



Alex
P.S. This algo is just for fun :)
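
The attached proc is not shown in the thread, so the following C sketch only illustrates the contract described in the post: a conversion that also reports whether the string length was acceptable. The name atou_len_checked and the out-parameter ok (standing in for the ecx flag) are assumptions made for this example, and it does not reproduce the data-controlled dispatch of the real algo.

```c
#include <stddef.h>

/* Hypothetical sketch of the described contract.
   *ok is set to 0 if the string is longer than 10 bytes, nonzero otherwise.
   Assumes the string holds only decimal digits before the null; a 10-digit
   value above 4294967295 still wraps, the length check does not catch it. */
unsigned atou_len_checked(const char *s, int *ok)
{
    size_t len = 0;
    unsigned result = 0;

    /* Count bytes up to the null, but never look past the 11th byte. */
    while (len <= 10 && s[len] != '\0')
        ++len;

    *ok = (len <= 10);
    if (!*ok)
        return 0;

    for (size_t i = 0; i < len; ++i)
        result = result * 10u + (unsigned)(s[i] - '0');
    return result;
}
```

Here "4294967295" converts with ok set, while an 11-byte string such as "12345678901" returns 0 with ok cleared.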

Rockoon

It would be nice to have one where the size is known a priori, as well as one that terminates on any non-digit rather than on null.

I think in most common large string processing scenarios you are dealing with a large file that just isn't going to have nice convenient nulls in it, so you are either processing it as-is (size not known, but terminator is any non-digit) or have indexed the data already (size is known).
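
The two variants asked for here might look like this in C. Both functions are hypothetical sketches (atou_sized and atou_until_nondigit are names invented for this example), with no overflow checking. The second one uses a single unsigned compare, so it stops on bytes above '9' as well as below '0'; the sign test in the Axa2l proc posted earlier only catches bytes below '0'.

```c
#include <stddef.h>

/* Size known in advance: convert exactly n bytes, no terminator needed.
   Assumes all n bytes are decimal digits. */
unsigned atou_sized(const char *s, size_t n)
{
    unsigned result = 0;
    for (size_t i = 0; i < n; ++i)
        result = result * 10u + (unsigned)(s[i] - '0');
    return result;
}

/* Size unknown: stop at the first byte that is not '0'..'9'.
   If end is not NULL, *end receives the address of that byte.
   The caller must guarantee that such a byte exists inside the buffer. */
unsigned atou_until_nondigit(const char *s, const char **end)
{
    unsigned result = 0;
    unsigned d;

    /* (c - '0') as unsigned is <= 9 only for '0'..'9'. */
    while ((d = (unsigned)(unsigned char)*s - '0') <= 9u) {
        result = result * 10u + d;
        ++s;
    }
    if (end != NULL)
        *end = s;
    return result;
}
```

For example, atou_sized("12345,678", 5) gives 12345, and atou_until_nondigit("12:34", &end) gives 12 with end pointing at the ':'.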

hutch--

Rockoon,

If all of the data is in the correct form (i.e. it does not need to be parsed from other text) it's easy enough to tokenise it in one pass.
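
A one-pass tokenise-and-convert over a buffer of well-formed data could be sketched in C like this. It is an assumption-laden illustration rather than anything from the MASM32 library: parse_all_unsigned is a made-up name, every byte that is not a decimal digit is treated as a separator, and there is no overflow or sign handling.

```c
#include <stddef.h>

/* Hypothetical sketch: scan buf[0..len) once, convert every run of decimal
   digits and store the values in out[]. Stops when max_out values have been
   stored. Returns the number of values stored. */
size_t parse_all_unsigned(const char *buf, size_t len,
                          unsigned *out, size_t max_out)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len && count < max_out) {
        unsigned d = (unsigned)(unsigned char)buf[i] - '0';
        if (d > 9u) {               /* separator byte: skip it */
            ++i;
            continue;
        }

        unsigned value = 0;
        do {                        /* consume one run of digits */
            value = value * 10u + d;
            ++i;
            if (i >= len)
                break;
            d = (unsigned)(unsigned char)buf[i] - '0';
        } while (d <= 9u);

        out[count++] = value;
    }
    return count;
}
```

Because the length is passed in, the buffer does not need a null terminator, which matches the large-file case described earlier in the thread.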

jj2007

Quote from: Rockoon on August 13, 2010, 12:35:14 PM
It would be nice to have one where the size is known a priori, as well as one that terminates on any non-digit rather than on null.

"If all of the data is in the correct form..."

That is ambitious. For inspiration, I have attached a list of examples of valid number formats. MasmBasic reads them all correctly, but it took me a while :green

hutch--

JJ,

No doubt, but the parser is another animal, not just a conversion.