Benchmark and test for htodw algos.

Started by hutch--, August 03, 2010, 07:05:52 AM


hutch--

Here is the later benchmark. It tests the algos on 1 million random hex strings of variable length. As with an earlier benchmark, run the batch file first to build the test file of hex numbers. Once it is built you can run BM without recreating the test file.
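
For anyone new to the thread, htodw converts a hex string to a 32-bit value. A minimal C sketch of that job is below. It is only a reference for what is being timed, not any of the benchmarked algos. The name htodw_ref is made up, and it assumes a NUL-terminated string of at most 8 valid hex digits with no error checking.

```c
#include <stdint.h>

/* Reference hex-string-to-DWORD conversion (illustrative only).
   Assumes valid hex digits; invalid characters give meaningless results. */
static uint32_t htodw_ref(const char *s)
{
    uint32_t value = 0;
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        uint32_t digit;
        if (c >= '0' && c <= '9')
            digit = (uint32_t)(c - '0');
        else
            digit = (uint32_t)((c | 0x20) - 'a' + 10); /* fold 'A'..'F' to 'a'..'f' */
        value = (value << 4) | digit;                  /* shift in one nibble */
    }
    return value;
}
```

Calling htodw_ref("1A2B") returns 0x1A2B. The fast versions in the benchmark get rid of the per-character branch in one way or another.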

Here are the times I am getting. Lingo's algo is slightly faster on the Core2 and i7, whereas Alex's long algo is clearly faster on 2 generations of P4 and the antique Celeron.


Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz
2055 MS library htodw average
750 MS Alex_Short average
629 MS lingo_htodw average
664 MS Alex_Long average
734 MS clive_htodw average

Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
2039 MS library htodw average
733 MS Alex_Short average
581 MS lingo_htodw average
592 MS Alex_Long average
733 MS clive_htodw average

Prescott Core P4
Genuine Intel(R) CPU 3.80GHz
3121 MS library htodw average
1218 MS Alex_Short average
1047 MS lingo_htodw average
984 MS Alex_Long average
1265 MS clive_htodw average

Northwood Core P4
Intel(R) Pentium(R) 4 CPU 2.80GHz
3511 MS library htodw average
1219 MS Alex_Short average
1140 MS lingo_htodw average
968 MS Alex_Long average
1355 MS clive_htodw average

Intel(R) Celeron(TM) CPU 1200MHz
8372 MS library htodw average
4737 MS Alex_Short average
4444 MS lingo_htodw average
4196 MS Alex_Long average
4977 MS clive_htodw average

Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

mineiro

Writing 1000000 HEX strings to file
..............................................
...................Done
Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz
1000000 = item count in file


3421 ms library htodw
1079 ms Alex_Long
1203 ms Alex_Short
1062 ms lingo_htodw
1172 ms clive_htodw


3438 ms library htodw
1093 ms Alex_Long
1235 ms Alex_Short
1063 ms lingo_htodw
1172 ms clive_htodw


3468 ms library htodw
1079 ms Alex_Long
1203 ms Alex_Short
1062 ms lingo_htodw
1172 ms clive_htodw


3422 ms library htodw
1094 ms Alex_Long
1250 ms Alex_Short
1062 ms lingo_htodw
1172 ms clive_htodw


3437 MS library htodw average
1222 MS Alex_Short average
1062 MS lingo_htodw average
1086 MS Alex_Long average
1172 MS clive_htodw average


Press any key to continue ...

lingo

mineiro, These results are invalid due to:
Invalid bm.exe file -> no .data section in it... :lol
Can't create new  bm.exe file from bm.asm ->error
"Assembling: bm.asm
bm.asm(106) : error A2006:undefined symbol : ltok"

Sorry...

Rockoon

Writing 1000000 HEX strings to file
................................................................................
...................Done
Cannot Identify x86 Processor
1000000 = item count in file


1653 ms library htodw
577 ms Alex_Long
702 ms Alex_Short
499 ms lingo_htodw
702 ms clive_htodw


1669 ms library htodw
562 ms Alex_Long
702 ms Alex_Short
515 ms lingo_htodw
702 ms clive_htodw


1653 ms library htodw
578 ms Alex_Long
702 ms Alex_Short
499 ms lingo_htodw
702 ms clive_htodw


1669 ms library htodw
577 ms Alex_Long
687 ms Alex_Short
515 ms lingo_htodw
702 ms clive_htodw


1661 MS library htodw average
698 MS Alex_Short average
507 MS lingo_htodw average
573 MS Alex_Long average
702 MS clive_htodw average


Press any key to continue ...

Again, what's up with the CPU detection algorithm?

It's an AMD Phenom II x6 1055T @ 3.36GHz
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

mineiro

Sorry about what I posted, Sr. lingo.
Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz

h2dt1.exe
516 atodw library
171 Alex short
110 Lingo long
109 Alex long
172 clive short

h2dt2.exe
515 atodw library
172 Alex short
94 Lingo long
125 Alex long
172 clive short

h2dt3.exe
516 atodw library
187 Alex short
94 Lingo long
109 Alex long
172 clive short

h2dt4.exe
484 atodw library
172 Alex short
203 Lingo long
110 Alex long
171 clive short
Press any key to continue ...

hutch--

 :bg

Poor Lingo, can't read the contents of an EXE file yet and does not have up to date libraries.


Section Table
-------------
01  .text    Virtual Address         00001000
             Virtual Size            00001454
             Raw Data Offset         00000400
             Raw Data Size           00001600
             Relocation Offset       00000000
             Relocation Count        0000
             Line Number Offset      00000000
             Line Number Count       0000
             Characteristics         60000020
                                     Code
                                     Executable
                                     Readable

02  .rdata   Virtual Address         00003000
             Virtual Size            00000210
             Raw Data Offset         00001A00
             Raw Data Size           00000400
             Relocation Offset       00000000
             Relocation Count        0000
             Line Number Offset      00000000
             Line Number Count       0000
             Characteristics         40000040
                                     Initialized Data
                                     Readable

03  .data    Virtual Address         00004000
             Virtual Size            00000318
             Raw Data Offset         00001E00
             Raw Data Size           00000400
             Relocation Offset       00000000
             Relocation Count        0000
             Line Number Offset      00000000
             Line Number Count       0000
             Characteristics         C0000040
                                     Initialized Data
                                     Readable
                                     Writeable


In case you have missed it, the DumpPE result shows the EXE file's .DATA section.

"ltok" has been part of the masm32 library for years.

Come on Lingo, you can do better than that.

Rockoon,

Sorry but I don't have a late AMD to test CPUID algos on.

jj2007

Quote from: lingo on August 06, 2010, 03:28:37 AM
"Intel(R) Celeron(R) CPU 2.13GHz
468 htodw JJ short (124 bytes)->wrong
703 Lingo long
Bravo, Jochen!
Thanks, Alex ...bla..bla..blah.."

It is a new attempt of the two liars to manipulate the people again, because JJ didn't include the creation time of his table... :lol

The table has to be created once, which costs a few nanoseconds. It has no influence on average timings, and that's the only thing that counts in real life. You have that strange belief that benchmarks are meant to win a prize for the fastest algo ever under the most peculiar constraints. Nope, they serve to improve code for libraries, and real life conditions determine the design of algo and benchmarks.
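
To illustrate the point about one-time table cost, here is a minimal C sketch of a table-driven conversion. It is not my actual algo and not anyone else's from the benchmark. The names hex_table, hex_table_init and htodw_table are hypothetical, and it assumes valid hex input of at most 8 digits.

```c
#include <stdint.h>

static uint8_t hex_table[256];   /* maps a character code to its 4-bit value */
static int hex_table_ready = 0;

/* Build the lookup table once: a few hundred byte stores in total. */
static void hex_table_init(void)
{
    int c;
    for (c = 0; c < 256; ++c)
        hex_table[c] = 0;
    for (c = '0'; c <= '9'; ++c)
        hex_table[c] = (uint8_t)(c - '0');
    for (c = 'a'; c <= 'f'; ++c)
        hex_table[c] = (uint8_t)(c - 'a' + 10);
    for (c = 'A'; c <= 'F'; ++c)
        hex_table[c] = (uint8_t)(c - 'A' + 10);
    hex_table_ready = 1;
}

/* Convert using the table; initialises it on the first call only. */
static uint32_t htodw_table(const char *s)
{
    uint32_t value = 0;
    if (!hex_table_ready)
        hex_table_init();
    for (; *s != '\0'; ++s)
        value = (value << 4) | hex_table[(unsigned char)*s];
    return value;
}
```

The setup runs exactly once, so spread over a million conversions its share of each call is negligible.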

P.S. Calling other members liars gives you the image of an immature person.

Rockoon

Quote from: hutch-- on August 06, 2010, 05:25:48 AM
Rockoon,

Sorry but I don't have a late AMD to test CPUID algos on.

I am not quite sure that I understand.

Are you not using the 48-byte processor name string reported by CPUID?

(CPUID functions 80000002h, 80000003h, and 80000004h)
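
As a rough sketch of how that 48-byte string is put together: each of the three functions returns 16 bytes in EAX, EBX, ECX and EDX, lowest byte first. The C below takes the twelve register values as plain integers so it runs anywhere. The name brand_from_regs is made up for illustration, and the CPUID calls themselves are left out.

```c
#include <stdint.h>

/* regs holds the 12 register values in the order EAX, EBX, ECX, EDX
   for function 80000002h, then 80000003h, then 80000004h.
   out receives the 48 name bytes plus a terminating NUL. */
static void brand_from_regs(const uint32_t regs[12], char out[49])
{
    int i, b;
    for (i = 0; i < 12; ++i)
        for (b = 0; b < 4; ++b)
            out[i * 4 + b] = (char)((regs[i] >> (8 * b)) & 0xFFu); /* low byte first */
    out[48] = '\0';
}
```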

hutch--

Rockoon,

I take your point but the code is near the end of the test piece. If I had access to a late AMD it would be easy to fix but it at least works on all of the Intel hardware I have available.

Rockoon

I see the problem. Your CPU detection code has a bug that will bite you in the ass on Intels as well (if not now, then in the future).

Specifically, you are testing the highest extended function number and only allowing string collection when it is exactly 4 or exactly 8. What you actually want to do is to collect the string whenever the highest extended function number is greater than or equal to 4 (because extended function 4 is the highest extended function number that you are calling)
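
The difference between the two tests can be sketched in C as follows. The "buggy" function is only my guess at the shape of the check being described, the function names are hypothetical, and 8000001Ah is just an illustrative value for a processor that reports more extended functions than 8.

```c
#include <stdint.h>

/* Guess at the problematic check: accepts only two specific values
   of the highest extended function number (low byte 4 or 8). */
static int brand_supported_buggy(uint32_t max_ext_leaf)
{
    uint32_t n = max_ext_leaf & 0xFFu;
    return n == 4 || n == 8;
}

/* Suggested check: functions 80000002h..80000004h are usable whenever
   the value CPUID returns in EAX for function 80000000h is at least 80000004h. */
static int brand_supported(uint32_t max_ext_leaf)
{
    return max_ext_leaf >= 0x80000004u;
}
```

With an input of 0x8000001A the first function returns 0 and the second returns 1, which would match a "Cannot Identify x86 Processor" result on a chip that does have the name string.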

For more information, check either Intel's or AMD's CPUID specifications.

From Intel's manual: http://www.intel.com/Assets/PDF/appnote/241618.pdf

Quote
2.2.1 Largest Extended Function # (Function 8000_0000h)

When EAX is initialized to a value of 8000_0000h, the CPUID instruction returns the
largest extended function number supported by the processor in register EAX.

It almost seems like you reverse engineered the magic values of 4 and 8 by looking at specific processor output, rather than checked the specs!

hutch--

 :bg

You could be right but the specs are all over the place like a mad woman's sewerage. I can test on Intel hardware but have no AMD machines to test with.

Rockoon

Quote from: hutch-- on August 06, 2010, 10:13:55 AM
:bg

You could be right but the specs are all over the place like a mad woman's sewerage. I can test on Intel hardware but have no AMD machines to test with.

I took the first specs I could find from both Intel and AMD and they agree. I'm not sure what source of information you are using. Maybe you should stop using them.


FORTRANS

Hi,

   FWIW a PIII.


G:\WORK>runme
Writing 1000000 HEX strings to file
................................................................................
...................Done
Pentium Pro, II or Celeron Processor
1000000 = item count in file


11787 ms library htodw
4757 ms Alex_Long
5598 ms Alex_Short
5187 ms lingo_htodw
5918 ms clive_htodw


11717 ms library htodw
4786 ms Alex_Long
5638 ms Alex_Short
5187 ms lingo_htodw
5919 ms clive_htodw


11697 ms library htodw
4737 ms Alex_Long
5599 ms Alex_Short
5188 ms lingo_htodw
5919 ms clive_htodw


11696 ms library htodw
4737 ms Alex_Long
5598 ms Alex_Short
5187 ms lingo_htodw
5918 ms clive_htodw


11724 MS library htodw average
5608 MS Alex_Short average
5187 MS lingo_htodw average
4754 MS Alex_Long average
5918 MS clive_htodw average


Press any key to continue ...


   Needs comma separators.  <g>

Regards,

Steve N.

lingo

"Poor Lingo, can't read the contents of an EXE file yet..."
For many years I have used HIEW32 and IDA rather than DumpPE, but I have no time or interest to investigate and use your "testing" program.

"...and does not have up to date libraries."
This is true, because for many years I have not used your ancient code libraries.
They are slow, with C-like code, without SSE, etc. In other words, they smell of old age... Sorry  :(
They are useful for newbies getting started, but advanced users have nothing to learn from them, which is why most of them use GoASM or other stuff.

JJ,

"The table has to be created once..."

By you, or by Hutch as a publisher of your algo? :lol
Because you don't post the file with it...

"...which costs a few nanoseconds"

How do you know, when you don't have a ready-to-use table in your file?
Why are these "few nanoseconds" not included in the "468 htodw JJ short" figure?

"It has no influence on average timings, and that's the only thing that counts in real life"
I'm sure that Hutch's "testing" program will have the same "problem" of "code placement" with it... :lol


"P.S. Calling other members liars gives you the image of an immature person."

I'm mature enough to know the thief is always a liar... :lol