Looking for better "atol" algo.

Started by hutch--, August 12, 2010, 01:43:36 AM


hutch--

The range of atou algos has been good for unsigned conversion of ASCII data, but we need a fast one for signed conversion as well. This is the one I could find in my bits and pieces; any extra fast ones would be appreciated.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

lingo

 :toothy
Note: Because Hutch's testing program is very sensitive to memory usage, please close all other programs and run the test at least 3 times, then take the best values. Thanks, and don't trust Hutch or his "proven garbage" testing programs.  :U 
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
C:\8>batol2
343 atol
203 atolsLingo
218 atol_2qword
344 atol
202 atolsLingo
218 atol_2qword
343 atol
203 atolsLingo
218 atol_2qword
344 atol
202 atolsLingo
219 atol_2qword


343 ms average atol
202 ms average atolsLingo
218 ms average atol_2qWord
Press any key to continue ...

AMD Turion(tm) 64 Mobile Technology ML-30 (SSE3)
1047 atol
641 atolsLingo
766 atol_2qword
1000 atol
656 atolsLingo
797 atol_2qword
1078 atol
672 atolsLingo
781 atol_2qword
1047 atol
656 atolsLingo
750 atol_2qword


1043 ms average atol
656 ms average atolsLingo
773 ms average atol_2qWord
Press any key to continue ...


P.S.: qWord's algo is included.
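For anyone who wants to post-process the numbers instead of eyeballing them, here is a minimal C sketch of the two summaries mentioned above: the best (smallest) run and the average. The function names `best_ms` and `average_ms` are made up for illustration and are not part of any test bed in this thread. The truncating integer mean is an assumption; it happens to be consistent with the "ms average" lines printed in the listings.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest sample: the "best value" out of several runs. */
unsigned best_ms(const unsigned *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    unsigned best = samples[0];
    for (size_t i = 1; i < count; ++i)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Integer mean, truncated toward zero (assumed rounding mode). */
unsigned average_ms(const unsigned *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    unsigned long sum = 0;
    for (size_t i = 0; i < count; ++i)
        sum += samples[i];
    return (unsigned)(sum / count);
}
```

Fed with the four Turion atol samples (1047, 1000, 1078, 1047), `average_ms` gives 1043 and `best_ms` gives 1000, so the two summaries can differ noticeably on a noisy machine.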

ecube


1047 atol
734 atolsLingo
1156 atol
891 atolsLingo
1172 atol
734 atolsLingo
1047 atol
734 atolsLingo


1105 ms average atol
773 ms average atolsLingo
Press any key to continue ...

qWord

Very similar to lingo's:
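align 16
atol_2 proc pszSrc: ptr BYTE
    ; Note: the argument is read via ESP, so this presumably relies on
    ; OPTION PROLOGUE:NONE / OPTION EPILOGUE:NONE being in effect.

    mov edx,[esp+4]                     ; edx = pszSrc
    movzx ecx,BYTE ptr [edx]            ; first character
    test ecx,ecx
    jz @err                             ; empty string
    cmp ecx,'-'
    jz @2                               ; leading minus: negative path
    lea eax,[ecx-'0']                   ; eax = first digit
    cntr = 1
    ; up to 9 more digits, fully unrolled at assembly time
    REPEAT 9
        movzx ecx,BYTE ptr [edx+cntr]
        test ecx,ecx
        jnz @F
        ret 4                           ; terminator reached: return eax
      @@:
        lea eax,[eax*4+eax]             ; eax = eax*5
        lea eax,[eax*2+ecx-'0']         ; eax = eax*10 + digit
        cntr = cntr + 1
    ENDM
    ret 4

align 16
@2:
    movzx ecx,BYTE ptr [edx+1]          ; first character after '-'
    test ecx,ecx
    jz @fin2
    lea eax,[ecx-'0']                   ; eax = first digit
    cntr = 2
    REPEAT 9
        movzx ecx,BYTE ptr [edx+cntr]
        test ecx,ecx
        jz @fin2
        lea eax,[eax*4+eax]             ; eax = eax*5
        lea eax,[eax*2+ecx-'0']         ; eax = eax*10 + digit
        cntr = cntr + 1
    ENDM
;align 16
@fin2:
    neg eax                             ; apply the sign
    ret 4

align 16
@err:
    ret 4                               ; empty string: eax is returned unchanged

atol_2 endp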
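
For readers who do not follow the lea arithmetic at a glance, here is a rough C model of what the routine above computes. It is only a sketch: `atol_2_model` is a made-up name, the loop stands in for the assembly-time REPEAT expansion, and returning 0 for an empty string (or a lone '-') is an assumption, since the assembly leaves eax untouched in those cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of atol_2: no validation, at most 10 digits. */
int32_t atol_2_model(const char *s)
{
    assert(s != NULL);
    uint32_t v = 0;      /* models EAX; unsigned so wraparound is defined */
    int negative = 0;
    int i = 0;

    if (s[0] == '\0')
        return 0;        /* assumption: the assembly does not set EAX here */
    if (s[0] == '-') {
        negative = 1;
        i = 1;
    }
    /* First digit plus up to 9 more, like the REPEAT 9 block. */
    for (int n = 0; n < 10 && s[i] != '\0'; ++n, ++i) {
        v = v * 4 + v;                        /* lea eax,[eax*4+eax]      */
        v = v * 2 + (uint32_t)(s[i] - '0');   /* lea eax,[eax*2+ecx-'0']  */
    }
    if (negative)
        v = 0u - v;                           /* neg eax */
    return (int32_t)v;
}
```

The two lea steps together give v*10 + digit without a multiply instruction, which is the core of the routine; everything else is the sign check and the early exit on the terminator.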

qWord
FPU in a trice: SmplMath
It's that simple!

hutch--

I added qWord's algo and unrolled the first one I posted, plus a couple of minor tweaks to try to get the test bed more consistent. qWord's algo is clearly faster on this Core2 Quad; my unrolled version is about the same speed as Lingo's version, which was a lot faster than my original.


-------------
Timing result
-------------
328 atol
328 atolsL
266 atol_2
328 atol
328 atolsL
265 atol_2
328 atol
328 atolsL
266 atol_2
328 atol
328 atolsL
266 atol_2
Press any key to continue ...

jj2007

Celeron M:
906 atol
844 atolsL
734 atol_2

lingo

"A couple of minor tweaks to try and get the test bed more consistent"

or:
Don't trust Hutch. It is a rough manipulation.

Every time Hutch has a problem optimizing his algos, he "creates" a new "testing program" to "prove" that his program is faster.
With his latest testing-program masterpiece (testbed1.zip), the results are very rough, heavily biased and distorted.
With his older masterpiece (batol2.zip), the results are rough and biased too.
I included Hutch's new algo in his "old" masterpiece (batol3.zip), and for this reason alone the results of all algos are heavily distorted (see below).
Hutch had a similar problem in the previous thread, which he called a problem of "code placement".
Hence, for me all of Hutch's testing programs are still "garbage proven in practice", and I refuse to use them; if he includes my algos in them, don't trust him.
For me it is just manipulation to "prove" that his program is "faster".
I will continue to use the MichaelW-JJ testing program in parallel. Everyone who wants to be manipulated must use Hutch's testing programs; otherwise, please use the MichaelW-JJ testing program.
It is not perfect, but the results are not so rough and heavily biased. I'm sure most people here have the knowledge to find the truth too. :toothy
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
C:\8>batol3
297 atol
219 atolsLingo
234 atol_2qword
281 atol
219 atolsLingo
250 atol_2qword
281 atol
219 atolsLingo
250 atol_2qword
281 atol
219 atolsLingo
250 atol_2qword


285 ms average atol
219 ms average atolsLingo
246 ms average atol_2qWord
Press any key to continue ...

And the results from the MichaelW-JJ testing program:

C:\8>atolmj1
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
183     cycles for 10*atolsL
269     cycles for 10*atol_2
286     cycles for 10*atol

183     cycles for 10*atolsL
269     cycles for 10*atol_2
282     cycles for 10*atol

Code size:
339      bytes for atolsL
355      bytes for atol_2
135      bytes for atol

--- ok ---


hutch--

Lingo,

Now you are worrying me: the fastest algo on my Core2 box was qWord's, not mine.


328 atol
328 atolsL
266 atol_2


Now I am surprised that someone of your experience does not know about variations based on code OFFSET, processor load before tests, etc.

Here are the timings on my i7; qWord's algo is faster there as well.


-------------
Timing result
-------------
297 atol
266 atolsL
234 atol_2
296 atol
265 atolsL
234 atol_2
297 atol
265 atolsL
234 atol_2
296 atol
265 atolsL
266 atol_2
Press any key to continue ...

lingo

Don't trust Hutch if he doesn't use the MichaelW-JJ testing program. It is a rough manipulation. :toothy

hutch--

 :bg

Must be tough when you cannot "game" the test bed.  :bg

NightWare

 :lol ALL algos posted recently ARE manipulations !!! from everybody, because you unroll your algos !!! ask yourself some questions :

for 1 call (the most common use, if i remember well...) :
what sort of algos can obtain a benefit from the cache because of the number of iterations ?
what sort of algos CAN'T obtain a benefit from the cache because there are NO iterations ?
however your results with the Hutch/MichaelW test beds show the contrary (because they're all IN the cache due to the loop), etc... do you think it's normal ?

for several calls :
what sort of algo will eject other (previously stored) algos from the cache faster than the smaller version ?
what sort of algo will reduce the benefit of the cache (due to the use/size ratio) ?
what sort of algo has more chance to sit in 2 different 4kb pages than the smaller one (in terms of slowdown due to code placement, it's difficult to do better), etc...
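
To put a rough number on the 4kb page point, here is a small C sketch. It assumes 4096-byte pages and a routine entry aligned to 16 bytes (as with the `align 16` used in this thread); the names `spans_two_pages` and `straddling_offsets` are made up for illustration, and real placement depends on the linker, not on a uniform choice of offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size */

/* True if the byte range [start, start + size) touches two different pages. */
int spans_two_pages(uintptr_t start, size_t size)
{
    assert(size > 0);
    return (start / PAGE_BYTES) != ((start + size - 1) / PAGE_BYTES);
}

/* Of the 256 possible 16-byte-aligned start offsets inside one page,
   count how many make a routine of the given size cross the page end. */
unsigned straddling_offsets(size_t size)
{
    unsigned count = 0;
    for (uintptr_t start = 0; start < PAGE_BYTES; start += 16)
        if (spans_two_pages(start, size))
            ++count;
    return count;
}
```

With the code sizes from the listing earlier in the thread (339, 355 and 135 bytes), this counts 21, 22 and 8 straddling offsets out of 256, so under these assumptions the unrolled versions are between two and three times as likely to cross a page boundary as the small one.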

don't you think that it's time to think about your code ?
unrolling is an old optimisation technique, it was efficient years ago for old processors... BUT NOW ?

:bdg guys, remember one thing : if you're not able to evolve, you're condemned to disappear...

lingo

"Must be tough when you cannot "game" the test bed."

If anyone adds a new algo to the others in your "testing" programs, it becomes impossible to get the same results for the other algos.
Example: in my files batol2.zip and batol3.zip the results for my algo and for qWord's are heavily biased, hence invalid.
Everyone can test it to see who is right and who is wrong. :toothy


NightWare,

"ALL algos posted recently ARE manipulations !!! from everybody, because you unroll your algos !!"

Maybe I don't understand you well, but unrolling is a technique to improve the speed of your algo, and it was invented, and is promoted and recommended, by Intel (AMD) (see Optimization Reference Manual -> Loop Unrolling). :toothy
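
To make the rolled-versus-unrolled distinction concrete, here is a minimal C sketch of the two forms for an unsigned conversion. The names `atou_rolled`, `atou_unrolled` and `DIGIT_STEP` are illustrative only and do not come from any algo posted in this thread; the macro simply plays the role of one REPEAT iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled form: one small loop body, executed once per digit. */
uint32_t atou_rolled(const char *s)
{
    assert(s != NULL);
    uint32_t v = 0;
    while (*s != '\0')
        v = v * 10 + (uint32_t)(*s++ - '0');
    return v;
}

/* One digit step with early exit on the terminator. */
#define DIGIT_STEP(i) \
    do { \
        if (s[i] == '\0') return v; \
        v = v * 10 + (uint32_t)(s[i] - '0'); \
    } while (0)

/* Unrolled form: ten copies of the step, no loop counter, no back branch. */
uint32_t atou_unrolled(const char *s)
{
    assert(s != NULL);
    uint32_t v = 0;
    DIGIT_STEP(0); DIGIT_STEP(1); DIGIT_STEP(2); DIGIT_STEP(3); DIGIT_STEP(4);
    DIGIT_STEP(5); DIGIT_STEP(6); DIGIT_STEP(7); DIGIT_STEP(8); DIGIT_STEP(9);
    return v;
}
```

Both return the same value for inputs of up to ten digits. The unrolled form trades code size for the removal of loop overhead, which is the trade-off the two sides of this argument weigh differently.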

Rockoon

The thing is that the performance isn't critical *unless* there is opportunity for the code to be in the caches.

The opposite end of the spectrum isn't 'small+fast', it's simply 'smallest'.

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

lingo

"The thing is that the performance isnt critical *unless* there is opportunity for the code to be in the caches. "

We are not talking about this. We are talking about which testing program to use, Hutch's or MichaelW-JJ's! :toothy

dedndave

i have been wondering if a GUI application shouldn't be employed for some of these tests
maybe we need to stick MichaelW's timing algo into a GUI
just a thought   :P