The range of atou algos has been good for unsigned conversions of ASCII data, but we need a fast one for signed conversion as well. This is the one I could find among my bits and pieces; any extra fast ones would be appreciated.
:toothy
Note: Because Hutch's testing program is very sensitive to memory usage, please close all other programs and run the test at least 3 times. Take the best values. Thanks, and don't trust Hutch and his "proven garbage" testing programs. :U
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (SSE4)
C:\8>batol2
343 atol
203 atolsLingo
218 atol_2qword
344 atol
202 atolsLingo
218 atol_2qword
343 atol
203 atolsLingo
218 atol_2qword
344 atol
202 atolsLingo
219 atol_2qword
343 ms average atol
202 ms average atolsLingo
218 ms average atol_2qWord
Press any key to continue ...
AMD Turion(tm) 64 Mobile Technology ML-30 (SSE3)
1047 atol
641 atolsLingo
766 atol_2qword
1000 atol
656 atolsLingo
797 atol_2qword
1078 atol
672 atolsLingo
781 atol_2qword
1047 atol
656 atolsLingo
750 atol_2qword
1043 ms average atol
656 ms average atolsLingo
773 ms average atol_2qWord
Press any key to continue ...
P.S.: qWord's algo is included.
1047 atol
734 atolsLingo
1156 atol
891 atolsLingo
1172 atol
734 atolsLingo
1047 atol
734 atolsLingo
1105 ms average atol
773 ms average atolsLingo
Press any key to continue ...
Very similar to Lingo's:
align 16
atol_2 proc pszSrc: ptr BYTE
mov edx,[esp+4] ;pszSrc
movzx ecx,BYTE ptr [edx]
test ecx,ecx
jz @err
cmp ecx,'-'
jz @2
lea eax,[ecx-'0']
cntr = 1
REPEAT 9
movzx ecx,BYTE ptr [edx+cntr]
test ecx,ecx
@@: jnz @F
ret 4
@@:
lea eax,[eax*4+eax]
lea eax,[eax*2+ecx-'0']
cntr = cntr + 1
ENDM
ret 4
align 16
@2:
movzx ecx,BYTE ptr [edx+1]
test ecx,ecx
jz @fin2
lea eax,[ecx-'0']
cntr = 2
REPEAT 9
movzx ecx,BYTE ptr [edx+cntr]
test ecx,ecx
jz @fin2
lea eax,[eax*4+eax]
lea eax,[eax*2+ecx-'0']
cntr = cntr + 1
ENDM
;align 16
@fin2:
neg eax
ret 4
align 16
@err:
ret 4
atol_2 endp
qWord
I added qWord's algo and unrolled the first one I posted, plus a couple of minor tweaks to try to get the test bed more consistent. qWord's algo is clearly faster on this Core2 quad; my unrolled version is about the same speed as Lingo's, which was a lot faster than my original.
-------------
Timing result
-------------
328 atol
328 atolsL
266 atol_2
328 atol
328 atolsL
265 atol_2
328 atol
328 atolsL
266 atol_2
328 atol
328 atolsL
266 atol_2
Press any key to continue ...
Celeron M:
906 atol
844 atolsL
734 atol_2
"A couple of minor tweaks to try and get the test bed more consistent"
or
Don't trust Hutch. It is a crude manipulation.
Every time Hutch has a problem optimizing his algos, he "creates" a new "testing program" to "prove" that his program is faster.
With his latest testing-program masterpiece (testbed1.zip) the results are very rough, heavily biased and distorted.
With his older masterpiece (batol2.zip) the results are rough and biased too.
I included Hutch's new algo in his "old" masterpiece (batol3.zip), and for that reason alone the results of all the algos are heavily distorted. (see below)
Hutch had a similar problem in the previous thread, which he named the problem of "code placement".
Hence, for me all of Hutch's testing programs are still "proven-in-practice garbage"; I refuse to use them, and if he includes my algos in them, don't trust him.
For me it is just a manipulation to "prove" that his program is "faster".
I will continue to use the MichaelW-JJ testing program in parallel. Everyone who wants to be manipulated should use Hutch's testing programs; otherwise please use the MichaelW-JJ testing program.
It is not perfect, but the results are not so rough and heavily biased. I'm sure most of the people here have the knowledge to find the truth too. :toothy
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (SSE4)
C:\8>batol3
297 atol
219 atolsLingo
234 atol_2qword
281 atol
219 atolsLingo
250 atol_2qword
281 atol
219 atolsLingo
250 atol_2qword
281 atol
219 atolsLingo
250 atol_2qword
285 ms average atol
219 ms average atolsLingo
246 ms average atol_2qWord
Press any key to continue ...
And the results from the MichaelW-JJ testing program:
C:\8>atolmj1
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (SSE4)
183 cycles for 10*atolsL
269 cycles for 10*atol_2
286 cycles for 10*atol
183 cycles for 10*atolsL
269 cycles for 10*atol_2
282 cycles for 10*atol
Code size:
339 bytes for atolsL
355 bytes for atol_2
135 bytes for atol
--- ok ---
Lingo,
Now you are worrying me; the fastest algo on my Core2 box was qWord's, not mine.
328 atol
328 atolsL
266 atol_2
Now I am surprised that someone of your experience does not know about variations based on code OFFSET, processor loads before tests, etc.
Here are the timings on my i7; qWord's algo is faster there as well.
-------------
Timing result
-------------
297 atol
266 atolsL
234 atol_2
296 atol
265 atolsL
234 atol_2
297 atol
265 atolsL
234 atol_2
296 atol
265 atolsL
266 atol_2
Press any key to continue ...
Don't trust Hutch if he doesn't use the MichaelW-JJ testing program. It is a crude manipulation. :toothy
:bg
Must be tough when you cannot "game" the test bed. :bg
:lol ALL algos posted recently ARE manipulations !!! from everybody, because you unroll your algos !!! Ask yourselves some questions:
For 1 call (the most common use, if I remember well...):
What sort of algo can obtain a benefit from the cache because of the number of iterations?
What sort of algo CAN'T obtain a benefit from the cache because there are NO iterations?
However, your results with the Hutch/MichaelW test beds show the contrary (because they're all IN the cache due to the loop), etc... Do you think that's normal?
For several calls:
What sort of algo will evict (previously stored) other algos faster than the smaller version? What sort of algo will reduce the impact of the cache (due to the use/size ratio)? What sort of algo has more chance of being in 2 different 4kb pages than the smaller one (in terms of slowdown due to code placement, it's difficult to do better)? etc...
Don't you think that it's time to rethink your code?
Unrolling is an old optimisation technique; it was efficient years ago on old processors... BUT NOW?
:bdg Guys, remember one thing: if you're not able to evolve, you're condemned to disappear...
"Must be tough when you cannot "game" the test bed."
If anyone adds a new algo to the other algos in your "testing" programs, it becomes impossible to get the same results for the other algos.
Example: in my files batol2.zip and batol3.zip the results for my and qWord's algos are heavily biased, hence invalid.
Everyone can test it to see who is right/wrong. :toothy
NightWare,
"ALL algos posted recently ARE manipulations !!! from everybody, because you unroll your algos !!"
Maybe I don't understand you well, but unrolling is a technique to improve the speed of your algo, invented, promoted and recommended by Intel (and AMD) (see the Optimization Reference Manual -> Loop Unrolling). :toothy
The thing is that the performance isn't critical *unless* there is opportunity for the code to be in the caches.
The opposite end of the spectrum isn't 'small+fast', it's simply 'smallest'.
"The thing is that the performance isn't critical *unless* there is opportunity for the code to be in the caches."
We're not talking about that. We're talking about which testing program to use, Hutch's or the MichaelW-JJ one! :toothy
i have been wondering if a GUI application shouldn't be employed for some of these tests
maybe we need to stick MichaelW's timing algo into a GUI
just a thought :P
Here is the problem for Lingo: his algorithms best suit a well-known test piece that uses an interpreted result, and that is what makes it easier to manipulate. Where he comes unstuck is with real-time testing, where the trickery to "game" the test mechanism does not work.
Like it or lump it, qWord's algo is simply faster on late-model hardware; Lingo's is faster than my unrolled version on an i7 but slower on a Core2 quad.
Now, there are enough programmers around here who can read assembler code to see through the attempt at propaganda. The test piece runs on a nominated processor core, it preloads each algo with identical code, it uses the same inter-algorithm padding, and it runs each algo on the same data, so the only problem Lingo has is that he cannot fiddle the results: qWord's algo is simply faster, and by a reasonable amount.
Now, this may suit Lingo in trying to be a superstar with the test pieces, but real-time testing is meant to determine which algo is fastest in the type of context where it will be used.
Lingo is much better at fast code than fast talking; perhaps he should keep it that way. :bg
Don't trust Hutch if he doesn't use the MichaelW-JJ testing program. It is a crude manipulation.
"it runs the same inter-algorithm padding and it runs each algo on the same data so the only problem Lingo has is he cannot fiddle the results, qword's algo is simply faster and by a reasonable amount."
qWord's algo is simply slower with your new "testing" program than qWord's algo with your old "testing" program (everyone can see and compare the "times").
"Where he comes unstuck is with real time testing where the trickery to "game" the test mechanism does not work."
Your "test mechanism" is just garbage, because no one (including you) can get two similar results for qWord's algo... (see my files batol2.zip and batol3.zip)
"makes it easier to manipulate."
That is just your bad emotions... will you be so kind as to explain to us how to do that? :lol
:bg
Welcome to the real world; that is what real-time testing is all about. Instead of nice easy comparisons you get them with all of the irritations of real time, and this is the only comparison that matters.
Like it or lump it, qWord's algo is the fastest by a reasonable amount.
You may desire to be a superstar with benchmarks, but what about something useful in real time?
Don't trust Hutch if he doesn't use the MichaelW-JJ testing program. It is a crude manipulation.
"but what about something useful in real time ?"
What real/unreal time and blah, blah, blah... when you can't get two similar results for others' algos with your garbage?
I can't see an answer to my question: "will you be so kind as to explain to us how to manipulate MichaelW's program?"
...and why do you reject it?
Hutch,
Normal programs are not allowed to ship without passing different types of tests.
So it would be better not to use your program before it has been tested, corrected and approved by people like MichaelW, A. Fog, etc...
Otherwise, I will continue to treat it as a manipulation technique with a stupid, hasty and weak defense at any price. :toothy
:bg
Yes, yes, we all know you can type with BOLD TEXT, but you lack the information behind what you assert. Interpreted test techniques that show your algos as faster than the rest may suit your purpose, but they do not demonstrate how the algorithm runs in real time.
Why do timings vary in real time? (a) Because computers vary in real time; that is why you run a specific set of timings in one test piece at one time. (b) Timings in ring3 vary depending on OS task switching and core allocation.
Why do you preload the processor? Because processors do not normally sit idling while your algo is waiting to run.
Why do you set the core on the processor? So that the load on the core you are testing is isolated from task switching and variable processor workloads.
RE: Test methods.
The original test bed that Michael designed was intended to time small sections of code. It must make assumptions to do this, and in part it does so by first timing the bare framework, then timing the algos and doing the arithmetic to calculate the difference.
When you feed complete algos through the technique you have granularity problems, and you are making assumptions about machine cycles that have not been valid since the original i486.
Testing in real time has different problems: it has to run long enough because the granularity is too coarse, and it is subject to real-time fluctuation from task switching, OS loadings at a higher privilege level, etc.
You may aspire to be the master of UNREAL TIME, but computers run code in REAL TIME, so for all of the problems of testing in real time, the alternative is deluding yourself based on whatever theories best suit what you are after.
Now remember to type a response using BOLD TEXT and wild assertions, and don't waste your time trying to write an algo as fast as qWord's fast version. :P
"don't waste your time trying to write an algo as fast as qWord's fast version"
I don't care about your opinion because you always talk about things you don't understand.
For example: may I see just one of your fastest algos, with real/unreal testing? :lol
It is sad when someone understands that his time for speed optimization is over, but... c'est la vie. :(
Yes, qWord's algo is very fast (similar to atou_rock by Rockoon, but without the ebx register).
I was the first and certainly the last user of your "real time" garbage. In my first file with your garbage (batol1.zip) my algo was different, with a time equal to qWord's algo.
After some modifications the result changed in my favor (203/218) (see my file batol2.zip) with your "old" garbage. Just don't be so emotional, and read the previous posts again.
When you saw that, you modified your "old" garbage ("real time") into your "new" garbage^2 ("super real time"^2) (in your testbed1.zip file) just to manipulate people with very rough, heavily biased and distorted results. With your new garbage you can manipulate only some newbies without enough experience.
Hence, don't worry about my time, because the results from your "old" garbage and from the MichaelW-JJ program are still enough for me. :lol
:bg
> Hence, don't worry about my time because the results from your "old" garbage and from the MichaelW-JJ program are still enough for me.
I am pleased that you are so easily pleased!
Now it's a shame that you cannot deliver in real time like qWord did with his fast algo. :lol
Now tell us: why are you still assuming the machine cycles mean anything? Is it too complex to understand that scheduling is where the action is?
"Now it's a shame that you cannot deliver in real time like qWord did with his fast algo."
If I want, I can beat everyone, with or without your "testing" garbage and independently of its current version. :lol
But the problem is with you again. Hurry up and modify your "new" garbage^2 ("super real time"^2) into "hyper new garbage"^3 ("hyper real time"^3) and continue to manipulate people. :lol
For me it is easy to change some rows and voila... new results: :lol
-------------
Timing result
-------------
312 atol
219 atolsL
265 atol_2
312 atol
218 atolsL
249 atol_2
312 atol
219 atolsL
250 atol_2
312 atol
219 atolsL
250 atol_2
Press any key to continue ...
:bg
Your modified test piece run on my i7.
-------------
Timing result
-------------
327 atol
250 atolsL
234 atol_2
296 atol
249 atolsL
234 atol_2
296 atol
249 atolsL
234 atol_2
297 atol
250 atolsL
234 atol_2
Press any key to continue ...
You are getting faster on the Core2, though. :P
One day you will write code as fast as qWord. :bg
PSSSST, sorry I forgot the BOLD TEXT !
LATER: here is a double unrolled-by-10 version for you to play with; try it out in your preferred test bed so we can see what you want to do with it.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
iapadd
align 16
atol proc lpSrc:DWORD
mov edx, [esp+4]
movzx ecx, BYTE PTR [edx]
cmp ecx, "-"
jne lb0
movzx ecx, BYTE PTR [edx+1]
add edx, 1
xor eax, eax
mov DWORD PTR [esp+4], 1
sub ecx, 48
jc lbl2
lbl1:
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+1]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+2]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+3]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+4]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+5]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+6]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+7]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+8]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+9]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+10]
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+11]
sub ecx, 48
jc lbl2
lbl2:
neg eax
ret 4
; ****************************************************
lb0:
xor eax, eax
sub ecx, 48
jc lb2
lb1:
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+1]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+2]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+3]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+4]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+5]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+6]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+7]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+8]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+9]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+10]
sub ecx, 48
jc lb2
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2]
movzx ecx, BYTE PTR [edx+11]
sub ecx, 48
jc lb2
lb2:
ret 4
atol endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
As I await... here is further proof that your program is just garbage.
Without any change in the code, we see very big differences in your and qWord's algos.
Wow, I forgot... it is due to the "super real time"... :lol
Timing result
-------------
prev. last
328 327 atol
atolsL
266 234 atol_2
328 296 atol
atolsL
265 234 atol_2
328 296 atol
atolsL
266 234 atol_2
328 297 atol
atolsL
266 234 atol_2
Press any key to continue ...
"LATER: here is a double unrolled-by-10 version for you to play with; try it out in your preferred test bed so we can see what you want to do with it."
You used Antariy's code from this link (http://www.masm32.com/board/index.php?topic=14522.15), so I refuse to test it because I don't want to lose my sunny weekend on nonsense.
Bye. :toothy
:lol
"it's my ball, and i am taking it home with me - i don't wanna play anymore"
I don't understand why it's so hard to understand that the only real measure of importance is the real task.
I've argued against test benches already, but I'll try to narrow the focus just a bit. In the atou() performance-testing bench, (A) the strings were always in L1, and in fact (B) they were always the exact same strings in the exact same locations in memory. Still further, the last generation of the bench I have is even more worthless because (C) the strings were aligned by 16. Finally, (D) the bench assumes 10-digit strings are just as likely as 3-digit strings, and 2-digit strings aren't even tested for.
dalg equ <16> ; change data alignment here
.data
align dalg
itm1 db "4294967295",0
align dalg
itm2 db "3948593",0
align dalg
itm3 db "293",0
align dalg
itm4 db "0",0
String work is never nice, clean, neat-and-tidy stuff like this. I can swallow several of these considerations together, but not all four of them, and I will never accept (B).
At the very least, if the bench is to be even remotely realistic for a performance-critical scenario, it must actually have millions of numeric strings as input. This isn't debatable. There are cache effects, something that isn't sampled at all by the bench.
:bg
Lingo,
Now I am offended: I bothered to do the double unroll from the existing "atou" because both qWord's version and yours were expanded in that manner, and the idea was to test algos of similar sizes, not long ones against short ones. Let's hope that failing to catch up to qWord's code does not spoil your weekend. :P
This is better than a cat fight. Way better!
Quote from: Rockoon on August 14, 2010, 03:37:09 PMI dont understand why its so hard to understand that the only real measure of importance is real task.
You exaggerate a little. This is like saying "lab tests are useless; all substances must be tested on real human beings".
Quote
At the very least if the bench is to be even remotely realistic to a performance critical scenario, it must actually have millions of numeric strings as an input. This isnt debatable. There are cache effects, something that isnt sampled at all by the bench.
That is feasible, but the only algo that returns # of bytes scanned in was posted here (http://www.masm32.com/board/index.php?topic=14438.msg117055#msg117055) and discussed here (http://www.masm32.com/board/index.php?topic=14438.msg117108#msg117108). For all others, you would have to set up an array of pointers - not very realistic :bg
I agree with Rockoon to the extent that the final test of any code design is how well it performs a specific task, but this does not particularly help you when you are designing components to perform the task. The item being tested here is an algorithm for converting an ASCII representation of a signed number to a signed DWORD, and adding another range of variables specific to one particular task would remove the reference to any other particular task.
Now, test bed design is not without its problems, in that there must be some means of relating the test mechanism to the end-task usage. I have always thought that the test technique Michael developed, and that JJ has further refined, does something that is very hard to do any other way: it gives you the capacity to test small sections of code against other small sections of code for purposes of comparison. But I also know that, like any other technique, it has its limitations, and I have seen those limitations when the timings start to get very low.
I have a background in testing code in real time, as it more closely reflects how code is used in applications. Now, this introduces another set of problems, and getting consistent results is not without its difficulties either. Any ring3 application suffers from timing fluctuations due to higher-privilege-level interference; in a multitasking context and on a multicore processor, task and core switching introduces yet another range of variables; prior core loading is another variable; and, for reasons that I don't claim to have quantified all that well, code OFFSET, even when aligned, often affects the speed of a particular algorithm, which is evident when adding another algo to the test bed messes up the timings of the ones already in it.
The layout of the data is yet another variable, which produces one of two alternate conditions: linearly addressed data with all of its cache advantages, versus random memory access with its page-thrashing disadvantages, and the problem here is to differentiate between memory restrictions and the speed of the algorithm with the data passed to it.
What I have tried to do with the test bed types I have posted here is reduce the range of variables while retaining the real-time conditions: core selection, preloading the core, adjusting the inter-algorithm padding, spacing the core loading with timed pauses between algo tests, and letting each algo set its own alignment, as this is yet another variable from algo to algo.
What I have not tracked down is why some code changes timing depending on its OFFSET and the code placed both before and after it. Some algos are relatively insensitive to it, while others fluctuate very badly when other code is added to the test piece.
Wow, so much writing, and no one believes you and no one wants to try your "testing" garbage... :lol
"it's my ball, and i am taking it home with me - i don't wanna play anymore"
Try to recruit dedndave as a tester, since he has nothing to do now, and it seems his wife has left him because his computer is very old and doesn't do the job anymore... :lol
:bg
Aw,
I did not think you would ruin your weekend just because qWord writes fast algos. :P
lol lingo
Zara and i are stuck together like glue
i bet your wife has little use for you
let's face it - how badly does she need fast code ?
and.....no matter how fast your code is, you have the personality of a slug
:bg
Now come on Dave, don't be too hard on him; once his wife gets a new i7 she may even let him use it. Then he may be able to write algos as fast as qWord. :P
"let's face it - how badly does she need fast code ?"
When necessary I can write pretty slow and continuous code, still... :lol
"you have the personality of a slug"
I do not understand what that means, because my specialty is high-voltage generators and transformers rather than different species of snails. :lol
"Zara and i are stuck together like glue"
Really? Interesting. If she is on the forum continually, like you, then who will feed the family and the pets... and why did you replace her photo with this ugly slug or snail... sorry, but I'm not a specialist like you in this area... :lol
"once his wife gets a new i7 she may even let him use it.
I highly doubt it, because she is a database administrator, and I still hesitate because my car wants new winter tires too... :(
"Then he may be able to write algos as fast as qWord."
It would be better to ask qWord for his opinion rather than lying to yourself and others with your garbage. :lol
:bg
> I still hesitate because my car wants new winter tires too...
I think we all know that problem, so you are excused for not writing code as fast as qWord until your wife buys an i7 and lets you use it. :P
.....and your car has new tires
we all know how that can slow a person down
I have tried this one, and for me it is hard. I'm posting the code here so anybody can do it better (alignment, for example), but at least this is better compared to the original.
align 16
atol_m proc String:DWORD
pop eax
pop edx
push eax
xor ecx,ecx
xor eax,eax
xor al,[edx+ecx+0]
je @erro
xor eax,"-"
push eax
je @F
xor eax,0000001dh
@@:
xor cl,[ecx+edx+1]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor cl,cl
or cl,[ecx+edx+2]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor ecx,ecx
or cl,[ecx+edx+3]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor cl,cl
xor cl,[ecx+edx+4]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor ecx,ecx
xor cl,[ecx+edx+5]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor cl,cl
xor cl,[ecx+edx+6]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor ecx,ecx
xor cl,[ecx+edx+7]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor ecx,ecx
xor cl,[ecx+edx+8]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor ecx,ecx
xor cl,[ecx+edx+9]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
xor ecx,ecx
xor cl,[ecx+edx+10]
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
pop ecx
neg eax
ret
@@:
pop ecx
or ecx,ecx
ja @F
neg eax
@@:
@erro:
ret
nop
atol_m endp
I have a question: is the number "2147483648" a bug??? I ask because in these algos (including mine) it returns 80000000h, and this number is not possible.
regards.
For a signed dword the decimal value 2147483648 is outside the valid range, -2147483648 to 2147483647.
Quote from: MichaelW on August 19, 2010, 02:52:55 AM
For a signed dword the decimal value 2147483648 is outside the valid range, -2147483648 to 2147483647.
Thank you for the answer, Mr. MichaelW. Well...
zero == minus zero, OK
nothing == minus nothing == zero, OK (so this is the reason for the question)
In these algos, 2147483648 == -2147483648; these procedures suppose that this number has a sign, while it doesn't.
regards.
align 4
atol_m2 proc String:DWORD
pop ebx ;to jmp ebx
pop edx
movzx eax,byte ptr [edx]
cmp eax,"-"
je @neg
jb @F
movzx ecx,byte ptr [edx+1]
test ecx,ecx
je @ssub30
lea eax, [eax+eax*4] ;lea eax, [eax+eax*4-30h*5]
lea eax, [ecx+eax*2-30h*11] ;lea eax, [ecx+eax*2-30h]
movzx ecx, byte ptr [edx+2]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+3]
test ecx,ecx ;cmp ecx,?? slow results in my pc
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+4]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+5]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+6]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+7]
test ecx,ecx
je @F
lea eax,[eax+eax*4] ;lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h] ;lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+8] ;movzx ecx, byte ptr [edx+8]
test ecx,ecx ;jecxz @F ;is slow in my pc
je @F ;
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+9]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
@@:
jmp ebx ;dword ptr [esp-2*4]
@ssub30:
xor eax,30h ;lea eax,[eax-30h]
jmp ebx ;dword ptr [esp-2*4]
align 4
@neg:
movzx eax, byte ptr [edx+1]
test eax,eax
je @F
movzx ecx, byte ptr [edx+2]
test ecx,ecx
je @sub30
lea eax, [eax+eax*4]
lea eax, [ecx+eax*2-30h*11]
movzx ecx, byte ptr [edx+3]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+4]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+5]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+6]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+7]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+8]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+9]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
movzx ecx, byte ptr [edx+10]
test ecx,ecx
je @F
lea eax,[eax+eax*4]
lea eax,[ecx+eax*2-30h]
@@:
neg eax
@fim:
jmp ebx ;dword ptr [esp-2*4]
@sub30:
xor eax,30h ;lea eax,[eax-30h]
neg eax
jmp ebx ;dword ptr [esp-2*4]
atol_m2 endp