Looking for better "atol" algo.

Started by hutch--, August 12, 2010, 01:43:36 AM

hutch--

Here is the problem for Lingo: his algorithms best suit a well-known test piece that uses an interpreted result, and that is what makes it easier to manipulate. Where he comes unstuck is with real time testing where the trickery to "game" the test mechanism does not work.

Like it or lump it, qword's algo is simply faster on late-model hardware; Lingo's is faster than my unrolled version on an i7 but slower on a Core2 Quad.

Now there are enough programmers around here who can read assembler code to see through the attempt at propaganda. The test piece runs on a nominated processor core, it preloads each algo with the identical code, it runs the same inter-algorithm padding and it runs each algo on the same data so the only problem Lingo has is he cannot fiddle the results, qword's algo is simply faster and by a reasonable amount.

Now this may suit Lingo in trying to be a superstar with the test pieces but real time testing is to determine which algo is fastest in the type of context where it will be used.

Lingo is much better at fast code than fast talking, perhaps he should keep it that way.  :bg
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

lingo

Don't trust Hutch if he doesn't use the MichaelW-JJ testing program. It is a crude manipulation.


" it runs the same inter-algorithm padding and it runs each algo on the same data so the only problem Lingo has is he cannot fiddle the results, qword's algo is simply faster and by a reasonable amount."

qword's algo is simply slower with your new "testing" program than with your old "testing" program (everyone can see and compare the "times").

"Where he comes unstuck is with real time testing where the trickery to "game" the test mechanism does not work."

Your "test mechanism" is just garbage because no one (including you) can get two similar results for qword's algo... (see my files batol2.zip and batol3.zip)

"makes it easier to manipulate. "
It is just your bad emotions... will you be so kind as to explain to us how to do that?  :lol

hutch--

 :bg

Welcome to the real world. That is what real time testing is all about: instead of nice easy comparisons you get them with all of the irritations of real time, and this is the only comparison that matters.

Like it or lump it qword's algo is the fastest by a reasonable amount.

You may desire to be a superstar with benchmarks but what about something useful in real time ?

lingo

Don't trust Hutch if he doesn't use the MichaelW-JJ testing program. It is a crude manipulation.

"but what about something useful in real time ?"

What real/unreal time and blah, blah, blah... when you can't get two similar results for others' algos with your garbage?
I can't see an answer to my question: "will you be so kind as to explain to us how to manipulate MichaelW's program?"
...and why do you reject it?

Hutch,
No normal program is allowed into use without passing various kinds of tests.
So it would be better not to use your program until it has been tested, corrected and approved by people like MichaelW, A. Fog etc ...
Otherwise, I will continue to treat it as a manipulation technique with a stupid, hasty and weak defense at any price. :toothy

hutch--

 :bg

Yes yes, we all know you can type with BOLD TEXT but you lack the information behind what you assert. Interpreted test techniques that show your algos as faster than the rest may suit your purpose but they do not demonstrate how the algorithm runs in real time.

Why do timings vary in real time ? (a) Because computers vary in real time, that's why you run a specific set of timings in one test piece at one time. (b) Timings in ring3 vary depending on the OS task switching and core allocation.

Why do you preload the processor ? Because processors do not normally sit idling while your algo is waiting to run.

Why do you set the core on the processor ? So that the load on the core you are testing is isolated from task switching and variable processor work loads.

RE: Test methods.

The original test bed that Michael designed was intended to time small sections of code. It must make assumptions to do this and in part it does so by testing the bare framework first to time it then tests the algos and does the arithmetic to calculate the difference.
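A rough C sketch of the subtract-the-framework idea described above, for anyone who wants to see the arithmetic spelled out. This is not MichaelW's macro code (which is not shown in this thread); the names `empty_algo`, `simple_atou`, `time_loop`, `net_ticks` and `measure_net` are made up for illustration, and `clock()` is assumed to be an acceptable timer even though its granularity is coarse.

```c
#include <time.h>

typedef unsigned (*algo_fn)(const char *);

/* Volatile sink so the compiler cannot throw the timed loops away. */
static volatile unsigned sink;

/* Hypothetical "bare framework": same call overhead, no parsing work. */
unsigned empty_algo(const char *s) { return (unsigned char)s[0]; }

/* Hypothetical algorithm under test: plain decimal-to-unsigned loop. */
unsigned simple_atou(const char *s) {
    unsigned acc = 0;
    while (*s >= '0' && *s <= '9') {
        acc = acc * 10u + (unsigned)(*s - '0');
        s++;
    }
    return acc;
}

/* Time `reps` calls of fn through one fixed loop framework. */
clock_t time_loop(algo_fn fn, const char *s, long reps) {
    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        sink += fn(s);
    return clock() - start;
}

/* Gross time minus framework time, clamped at zero because timer
   jitter can make the gross figure smaller than the overhead. */
long net_ticks(clock_t gross, clock_t overhead) {
    long diff = (long)gross - (long)overhead;
    return diff < 0 ? 0 : diff;
}

/* Time the empty framework first, then the algo, then subtract. */
long measure_net(algo_fn fn, const char *s, long reps) {
    clock_t overhead = time_loop(empty_algo, s, reps);
    clock_t gross = time_loop(fn, s, reps);
    return net_ticks(gross, overhead);
}
```

Calling `measure_net(simple_atou, "4294967295", 10000000L)` would give the net tick count for that one string. The clamp in `net_ticks` is a choice made for this sketch; with short code sections the subtraction is where the granularity trouble tends to show up.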

When you feed complete algos through the technique you have granularity problems and you are making assumptions about machine cycles that have not been valid since the original i486.

Testing in real time has different problems: it has to be run long enough because the granularity is too coarse, and it is subject to real time fluctuation from task switching, OS loadings at a higher privilege level etc etc ....

You may aspire to be the master of UNREAL TIME but computers run code in REAL TIME so for all of the problems of testing in real time, the alternative is deluding yourself based on the theories that best suit what you are after.

Now remember to type a response using BOLD TEXT and wild assertions, don't waste your time trying to write an algo as fast as qword's fast version.  :P

lingo

"don't waste your time trying to write an algo as fast as qword's fast version"

I don't care about your opinion because you always talk about things that you don't understand.
For example: may I see just one of your fastest algos with real/unreal testing?  :lol
It is sad when someone understands that his time for speed optimization is over, but... c'est la vie. :(

Yes, qword's algo is very fast (similar to atou_rock by Rockoon but without the ebx register).
I was the first and certainly the last user of your "real time" garbage.  In my first file with your garbage (batol1.zip) my algo was different, with a time equal to qword's algo.
After some modifications the result changed for me (203/218) with your "old" garbage (see my file batol2.zip). Just don't be so emotional and read the previous posts again.
When you saw that, you modified your "old" garbage ("real" time) into your "new" garbage^2 ("super real time"^2) (in your testbed1.zip file) just to manipulate people with
very rough, heavily biased and distorted results. With your new garbage you can manipulate only some newbies without enough experience.
Hence, don't worry about my time because the results from your "old" garbage and from MichaelW-JJ's program are still enough for me. :lol


hutch--

 :bg

> Hence, don't worry about my time because the results from your "old" garbage and from MichaelW-JJ's program are still enough for me.

I am pleased that you are so easily pleased !

Now it's a shame that you cannot deliver in real time like qword did with his fast algo.  :lol

Now tell us, why are you still assuming the machine cycles mean anything, is it too complex to understand that scheduling is where the action is ?

lingo

"Now it's a shame that you cannot deliver in real time like qword did with his fast algo."

If I want, I can beat everyone, with or without your "testing" garbage and regardless of its current version.  :lol
But the problem is with you again. Hurry up and modify your "new" garbage^2 ("super real time"^2) into "hyper new garbage"^3 ("hyper real time"^3)
and continue to manipulate people.  :lol
For me it is easy to change a few lines and voila... new results: :lol
-------------
Timing result
-------------
312 atol
219 atolsL
265 atol_2
312 atol
218 atolsL
249 atol_2
312 atol
219 atolsL
250 atol_2
312 atol
219 atolsL
250 atol_2
Press any key to continue ...



hutch--

 :bg

Your modified test piece run on my i7.


-------------
Timing result
-------------
327 atol
250 atolsL
234 atol_2
296 atol
249 atolsL
234 atol_2
296 atol
249 atolsL
234 atol_2
297 atol
250 atolsL
234 atol_2
Press any key to continue ...


You are getting faster on the Core2 though.  :P

One day you will write code as fast as qword.  :bg

PSSSST, sorry I forgot the BOLD TEXT !

LATER : here is a double unrolled by 10 version for you to play with, try it out in your preferred test bed so we can see what you want to do with it.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

iapadd

align 16

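atol proc lpSrc:DWORD

    mov edx, [esp+4]                ; edx = lpSrc (no stack frame, so read the argument directly)
    movzx ecx, BYTE PTR [edx]       ; first character

    cmp ecx, "-"
    jne lb0                         ; no minus sign, take the unsigned path

    movzx ecx, BYTE PTR [edx+1]     ; first character after the "-"
    add edx, 1                      ; step past the "-"
    xor eax, eax                    ; accumulator = 0
    mov DWORD PTR [esp+4], 1        ; overwrites the argument slot, not read again in this listing
    sub ecx, 48                     ; ASCII to digit, carry is set for any byte below "0"
    jc  lbl2

  lbl1:
    lea eax, [eax+eax*4]            ; eax = eax * 5
    lea eax, [ecx+eax*2]            ; eax = digit + eax * 2, i.e. old eax * 10 + digit
    movzx ecx, BYTE PTR [edx+1]     ; next character
    sub ecx, 48
    jc  lbl2                        ; terminator (or anything below "0") ends the number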

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+2]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+3]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+4]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+5]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+6]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+7]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+8]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+9]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+10]
    sub ecx, 48
    jc  lbl2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+11]
    sub ecx, 48
    jc  lbl2

  lbl2:
    neg eax
    ret 4

; ****************************************************

  lb0:
    xor eax, eax
    sub ecx, 48
    jc  lb2

  lb1:
    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+1]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+2]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+3]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+4]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+5]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+6]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+7]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+8]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+9]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+10]
    sub ecx, 48
    jc  lb2

    lea eax, [eax+eax*4]
    lea eax, [ecx+eax*2]
    movzx ecx, BYTE PTR [edx+11]
    sub ecx, 48
    jc  lb2

  lb2:
    ret 4

atol endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
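For anyone following along who does not read assembler, here is a rough C sketch of what each unrolled step in the listing above computes. The names `lea_step` and `atol_sketch` are invented for this example. Two deliberate differences from the listing: it loops instead of unrolling 12 times, and it also stops on bytes above "9", where the listing only tests for bytes below "0".

```c
#include <stdint.h>

/* One accumulation step, written the way the two LEA instructions do it:
     lea eax, [eax+eax*4]   ->  acc * 5
     lea eax, [ecx+eax*2]   ->  digit + (acc * 5) * 2  ==  acc * 10 + digit */
uint32_t lea_step(uint32_t acc, uint32_t digit) {
    acc = acc + acc * 4;
    return digit + acc * 2;
}

/* Hypothetical C counterpart of the atol listing (not a drop-in replacement).
   Unsigned arithmetic is used so that overflow wraps like the 32-bit register. */
int32_t atol_sketch(const char *s) {
    int negative = (*s == '-');
    if (negative)
        s++;                        /* step past the "-" */

    uint32_t acc = 0;
    for (;;) {
        /* Same idea as "sub ecx, 48": a byte below '0' wraps to a huge value. */
        uint32_t digit = (uint32_t)(unsigned char)*s - '0';
        if (digit > 9)
            break;                  /* terminator or any non-digit ends the number */
        acc = lea_step(acc, digit);
        s++;
    }

    if (negative)
        acc = 0u - acc;             /* the "neg eax" at lbl2 */
    return (int32_t)acc;
}
```

With this sketch `atol_sketch("-42")` gives -42 and `lea_step(12, 3)` gives 123. Whether a C compiler turns the multiply back into the same LEA pair depends on the compiler and its settings.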

lingo

As I expected... it is further proof that your program is just garbage.
Without any change in the code, we see very big differences in the timings of your and qword's algos.
Wow, I forgot... it is due to the "super real time"....  :lol
Timing result
-------------
prev.  last
328    327   atol
             atolsL
266    234   atol_2
328    296   atol
             atolsL
265    234   atol_2
328    296   atol
             atolsL
266    234   atol_2
328    297   atol
             atolsL
266    234   atol_2
Press any key to continue ...


"LATER : here is a double unrolled by 10 version for you to play with, try it out in your preferred test bed so we can see what you want to do with it."

You use Antariy's code from the link, so I refuse to test it because I don't want to lose my sunny weekend to nonsense.
Bye. :toothy

dedndave

 :lol
"it's my ball, and i am taking it home with me - i don't wanna play anymore"

Rockoon

I don't understand why it's so hard to understand that the only real measure of importance is the real task.

I've argued against test benches already, but I'll try to narrow the focus just a bit. In the atou() performance testing bench, (A) the strings were always in L1, and in fact (B) were always the exact same strings in the exact same locations in memory. Still further, the last generation of the bench I have is even more worthless because (C) the strings were aligned by 16. Finally, (D) the bench assumes 10 digit strings are just as likely as 3 digit strings, and 2 digit strings aren't even tested for.

    dalg equ <16>                   ; change data alignment here

    .data
      align dalg
      itm1 db "4294967295",0
      align dalg
      itm2 db "3948593",0
      align dalg
      itm3 db "293",0
      align dalg
      itm4 db "0",0

The last thing real string work is, is nice, clean, neat-and-tidy stuff like this. I can swallow several of these considerations together, but not all four of them, and I will never accept (B).

At the very least, if the bench is to be even remotely realistic for a performance-critical scenario, it must actually have millions of numeric strings as input. This isn't debatable. There are cache effects, something that isn't sampled at all by the bench.
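To make that concrete, here is a minimal C sketch of the kind of input set argued for above: many numeric strings of mixed lengths, packed back to back so that nothing is aligned and, at a large enough count, not everything sits in L1. The function names, the LCG constants and the length distribution are assumptions for this example, not anything taken from the existing bench.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Small LCG so the data set is identical on every run (illustrative constants). */
uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill buf with `count` NUL-terminated decimal strings packed back to back.
   Each value is limited to a pseudo-randomly chosen 1..10 digits, so short
   and long strings are mixed. Returns the bytes used, or 0 if buf is too
   small. *expected_sum receives the sum of all generated values. */
size_t make_packed_numbers(char *buf, size_t cap, size_t count,
                           uint32_t seed, uint64_t *expected_sum) {
    static const uint32_t pow10_table[9] = {
        10u, 100u, 1000u, 10000u, 100000u,
        1000000u, 10000000u, 100000000u, 1000000000u
    };
    size_t used = 0;
    uint64_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned digits = 1u + (lcg_next(&seed) >> 16) % 10u;   /* 1..10 */
        uint32_t value = lcg_next(&seed);
        if (digits < 10u)
            value %= pow10_table[digits - 1u];  /* at most `digits` digits */

        if (cap - used < 11u)                   /* 10 digits + NUL */
            return 0;
        used += (size_t)snprintf(buf + used, cap - used, "%lu",
                                 (unsigned long)value) + 1u;
        sum += value;
    }
    *expected_sum = sum;
    return used;
}

/* Walk the packed buffer once, summing the parsed values. strtoul stands
   in for whichever algo is under test. */
uint64_t sum_packed(const char *buf, size_t count) {
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += strtoul(buf, NULL, 10);
        buf += strlen(buf) + 1u;
    }
    return sum;
}
```

For a bench run the buffer would be sized at 11 bytes per string and `count` set in the millions; comparing `sum_packed` against `expected_sum` checks that the algo under test still returns correct values on data it has not seen in the same place before.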
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

hutch--

 :bg

Lingo,

Now I am offended. I bothered to do the double unroll from the existing "atou" because both qword's version and yours were expanded in that manner, and the idea was to test algos of similar sizes, not long ones against short ones. Let's hope that failing to catch up to qword's code does not spoil your weekend.  :P

cork

This is better than a cat fight. Way better!

jj2007

Quote from: Rockoon on August 14, 2010, 03:37:09 PM
I don't understand why it's so hard to understand that the only real measure of importance is the real task.

You exaggerate a little bit. This is like saying "lab tests are useless, all substances must be tested with real human beings".

Quote
At the very least, if the bench is to be even remotely realistic for a performance-critical scenario, it must actually have millions of numeric strings as input. This isn't debatable. There are cache effects, something that isn't sampled at all by the bench.

That is feasible, but the only algo that returns the number of bytes scanned was posted here and discussed here. For all others, you would have to set up an array of pointers - not very realistic :bg
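A small C sketch of why the bytes-scanned return value matters here, under the assumption that the numbers sit in one buffer with a single non-digit byte between them and a non-digit byte at the very end. `parse_uint` and `sum_buffer` are hypothetical names, not the algo referred to above.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse an unsigned decimal number at s and report how many digit
   bytes were consumed, so the caller knows where the next item starts. */
uint32_t parse_uint(const char *s, size_t *consumed) {
    const char *p = s;
    uint32_t acc = 0;
    while ((unsigned char)(*p - '0') <= 9) {
        acc = acc * 10u + (uint32_t)(*p - '0');
        p++;
    }
    *consumed = (size_t)(p - s);
    return acc;
}

/* Walk a whole buffer with no array of pointers. The buffer must end
   with a non-digit byte (a NUL terminator is enough). */
uint64_t sum_buffer(const char *buf, size_t len) {
    uint64_t sum = 0;
    size_t pos = 0;
    while (pos < len) {
        size_t used;
        sum += parse_uint(buf + pos, &used);
        pos += used + 1u;           /* the digits plus one separator byte */
    }
    return sum;
}
```

Fed the four test strings from the bench laid end to end with their terminators, `sum_buffer` returns their total in one pass. An algo that only returns the value needs either a second scan for the terminator or the pointer array mentioned above.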