Replacement for atodw and atodw_ex test pieces.

Started by hutch--, July 31, 2010, 11:24:17 AM

hutch--

The originals of these algos were intended to convert ASCII/ANSI strings to an unsigned DWORD. Here is a test piece that compares the old atodw against two later versions: a short version, and a long version that is unrolled to pick up some speed.

The test piece is to determine relative speeds of the 3 algos. It tests the 3 algos for correctness then benchmarks them for comparison purposes.

These are the results on this Core2 quad.


atodw Version

4294967295
987
0
9876

Short Version

4294967295
987
0
9876

Long Version

4294967295
987
0
9876
-------
Timings
-------

Timing atodw version
312

Timing short version
266

Timing long version
109

Press any key to continue ...
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Twister

What are these measurements in? Clocks or (milli/nano)seconds?

hutch--

If you need such measurements, feel free to write your own. This benchmark is specifically written to run in real time and to reference the earlier version that will be replaced.

dedndave

prescott w/htt:
Timing atodw version
344

Timing short version
266

Timing long version
250

had to change because of throttle

hutch--

Dave,

It looks like the Prescott does not like the paired LEA instructions which is probably normal.
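
For readers unfamiliar with the idiom: the paired LEAs are presumably the usual multiply-by-ten step, where one LEA computes acc*5 and the second computes digit + (acc*5)*2. The C sketch below mirrors that arithmetic under that assumption. `step_dec` and `atou32_sketch` are hypothetical names, not the routines in the test piece, and the register choices in the comments are only illustrative.

```c
#include <stdint.h>

/* One digit step, written as the two address computations that a pair
   of LEAs would perform: acc*5, then digit + (acc*5)*2 = acc*10 + digit. */
static uint32_t step_dec(uint32_t acc, uint32_t digit)
{
    uint32_t t = acc + acc * 4;   /* e.g. lea eax, [eax+eax*4] */
    return digit + t * 2;         /* e.g. lea eax, [ecx+eax*2] */
}

/* Hypothetical short-loop converter: no validation, stops at the null. */
static uint32_t atou32_sketch(const char *s)
{
    uint32_t acc = 0;
    while (*s != '\0') {
        acc = step_dec(acc, (uint32_t)(*s - '0'));
        ++s;
    }
    return acc;
}
```

Calling `atou32_sketch("4294967295")` yields 4294967295, the largest value an unsigned DWORD can hold. The sketch does no input validation.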

MichaelW

Here are the timings for my P3:

Timing atodw version
2834

Timing short version
1482

Timing long version
981


I also tested all 32-bit values, using crt__ultoa to generate the strings, and found no problems.
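
A round-trip check of this kind can be sketched in C as follows. This is not the actual test code: `convert_under_test` is a hypothetical stand-in for the routine being checked, `snprintf` is used in place of crt__ultoa to generate the strings, and the `stride` parameter is an addition so that a sampled run finishes quickly (a stride of 1 walks all 32-bit values).

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the routine under test (hypothetical, plain multiply). */
static uint32_t convert_under_test(const char *s)
{
    uint32_t acc = 0;
    while (*s != '\0') {
        acc = acc * 10u + (uint32_t)(*s - '0');
        ++s;
    }
    return acc;
}

/* Format every stride-th value in [0, 2^32) as decimal text, convert it
   back, and count the values that do not survive the round trip. */
static uint64_t roundtrip_mismatches(uint32_t stride)
{
    char buf[16];
    uint64_t bad = 0;
    uint64_t v;

    if (stride == 0) {
        stride = 1;               /* avoid an endless loop */
    }
    for (v = 0; v <= UINT32_MAX; v += stride) {
        snprintf(buf, sizeof buf, "%lu", (unsigned long)v);
        if (convert_under_test(buf) != (uint32_t)v) {
            ++bad;
        }
    }
    return bad;
}
```

A return value of 0 from `roundtrip_mismatches(1)` would correspond to the "no problems" result reported here; with a larger stride the upper boundary 4294967295 should be checked separately, since the sampled walk may skip it.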
eschew obfuscation

dedndave

Quote from: hutch--
Dave,

It looks like the Prescott does not like the paired LEA instructions which is probably normal.

the long version still has a pay-off for non-P4 machines
at least it isn't slower than the short version on the P4   :bg

GregL

Core 2 Duo  2.0 GHz

atodw Version

4294967295
987
0
9876

Short Version

4294967295
987
0
9876

Long Version

4294967295
987
0
9876
-------
Timings
-------

Timing atodw version
500

Timing short version
390

Timing long version
172

Press any key to continue ...

rags

AMD Athlon II X2 215 2.7 GHz


atodw Version

4294967295
987
0
9876

Short Version

4294967295
987
0
9876

Long Version

4294967295
987
0
9876
-------
Timings
-------

Timing atodw version
296

Timing short version
187

Timing long version
171

God made Man, but the monkey applied the glue -DEVO

KeepingRealBusy

Hutch, or anyone, I have a question about libraries.

When you statically or dynamically link to a library, doesn't the entire library become mapped into your virtual address space? I mean, it has to be mapped, otherwise you could not "call" any mapped function, right? The loader does not load pieces of the library and skip the rest. It would be different if you had individual .obj files assembled and you linked only the ones that you needed.

If that is the case, is there any real reason to have two functions in the library that do the same thing? Since Hutch's sample has two functions that do the same conversion, and the unrolled one is the fastest, why use anything else? Why keep the short slow one in the library?

Hutch,

Several things about your sample.

1.    I tried 17 different ways to do this faster using the decade table (my own code in my own library matched this algo, two leas, process the string once). I could beat atodw, but not the other two. OBTW, this same method of two leas can be used for hex conversions once the character has been converted to a numeric 0-15 by using lea eax,[eax*4] and lea eax,[ecx+eax*4].

2.    You have no checking for invalid characters other than the null. If you checked and terminated the conversion when a non numeric was found, then the function could be used for processing a string containing multiple values, i.e., "1234 5678 9124", "12:30:05" "08/02/2010".

3.    In the long version, you return an error condition in ecx. What about returning the converted value in eax, returning the pointer to the character that stopped the scan in edx, and using the following encoding for ecx:

        -2    invalid numeric character > '9'
        -1    invalid numeric character < '0'
         0    valid conversion (edx points to the null)
        +1   valid conversion but more than 9 characters (00000000000001)
        +2   invalid conversion exceeded 32 bits (5000000000)

By returning the error flag and the terminating character pointer, the caller can determine whether to continue on to the next piece of the string.
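
As a rough illustration of that interface, here is a minimal C sketch, with a struct standing in for the three registers. The names `scan_u32` and `scan_result` are hypothetical, and the order in which the conditions are tested (overflow first, then the stopping character, then the digit count) is an assumption, since the proposal does not say which code wins when several apply.

```c
#include <stdint.h>

enum {
    SCAN_BAD_HIGH = -2,   /* stopped on a character > '9'          */
    SCAN_BAD_LOW  = -1,   /* stopped on a character < '0'          */
    SCAN_OK       =  0,   /* stopped on the null                   */
    SCAN_LONG     =  1,   /* valid, but more than 9 digits         */
    SCAN_OVERFLOW =  2    /* value does not fit in 32 bits         */
};

typedef struct {
    uint32_t    value;    /* would be returned in eax */
    const char *end;      /* would be returned in edx */
    int         status;   /* would be returned in ecx */
} scan_result;

static scan_result scan_u32(const char *s)
{
    scan_result r;
    uint64_t acc = 0;
    unsigned digits = 0;
    int overflow = 0;
    unsigned char c;

    while (*s >= '0' && *s <= '9') {
        acc = acc * 10u + (uint64_t)(*s - '0');
        if (acc > UINT32_MAX) {       /* clamp so acc itself never wraps */
            overflow = 1;
            acc = UINT32_MAX;
        }
        ++digits;
        ++s;
    }

    c = (unsigned char)*s;            /* the character that stopped the scan */
    r.value = (uint32_t)acc;
    r.end = s;
    if (overflow)          r.status = SCAN_OVERFLOW;
    else if (c != '\0')    r.status = (c > '9') ? SCAN_BAD_HIGH : SCAN_BAD_LOW;
    else if (digits > 9)   r.status = SCAN_LONG;
    else                   r.status = SCAN_OK;
    return r;
}
```

With this sketch, scanning "12:30:05" gives the value 12, a status of -2, and an end pointer at the ':' so the caller can step over it and scan again, while "08/02/2010" stops at the '/' with a status of -1.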

Dave.

frktons

Here is my test on a Core 2 Duo 2.4 GHz, 64-bit:

atodw Version

4294967295
987
0
9876

Short Version

4294967295
987
0
9876

Long Version

4294967295
987
0
9876
-------
Timings
-------

Timing atodw version
390

Timing short version
296

Timing long version
109

Press any key to continue ...



Mind is like a parachute. You know what to do in order to use it :-)

jj2007

Celeron M:
Timing atodw version
625

Timing short version
532

Timing long version
281

GregL

Quote from: KeepingRealBusy
When you statically or dynamically link to a library, doesn't the entire library become mapped into your virtual address space? I mean, it has to be mapped, otherwise you could not "call" any mapped function, right? The loader does not load pieces of the library and skip the rest. It would be different if you had individual .obj files assembled and you linked only the ones that you needed.

If the library is built from individual .obj files for each procedure, which is how the masm32 libraries are done,  then when you link you only get the individual procedures that you use mapped into your program space and none of the others.

KeepingRealBusy

Greg,

Thank you. That is good to know. What about DLLs, especially user and system?

GregL

If your program loads a DLL, you get the whole thing mapped into your program space.