Benchmark and test for htodw algos.

Started by hutch--, August 03, 2010, 07:05:52 AM

hutch--

 :bg

I am not sure what you are after here. I bothered to do the testing, then kept fiddling with the test piece so that your algo ran at its highest tested speed, but I have addressed the problems in doing so. Simply by adding another algo AFTER IT, the speed dropped by more than 50%, so I moved it around until it ran at its highest speed again.

Nothing like a test piece to prove the result. Include or exclude some of the other algos and it slows down by about half.


312 atodw library
110 Alex short
93 Lingo long
110 clive short
Press any key to continue ...


Now this is a timing difference of 47 ms versus 93 ms, about twice as slow, depending on code location.
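The "about twice as slow" figure follows directly from the two Lingo long timings quoted in this post (47 ms in the best placement, 93 ms in the run above):

```latex
\frac{93\ \text{ms}}{47\ \text{ms}} \approx 1.98, \qquad \frac{47\ \text{ms}}{93\ \text{ms}} \approx 0.505
```

So the throughput in the slow placement is roughly half of the best-case placement.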

Instead of flapping your mouth off at me, try addressing the problem: your code is fast, but it is sensitive to where it sits in the executable, and none of the others are.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

lingo

"Instead of flapping your mouth off at me, try addressing the problem"

The "problem", as I expected, lies in the fluctuations of the testing program.
1. I used Hutch's h2dtimings.zip testing program from the beginning of the thread.
2. In the 1st file, h2dt1.asm, I moved the lingo_htodw proc to the first position and ran h2dt1.exe 5 times.
3. In the 2nd file, h2dt2.asm, I moved the lingo_htodw proc to the 2nd position and ran h2dt2.exe 5 times.
4. In the 3rd file, h2dt3.asm, I moved the lingo_htodw proc to the 3rd position and ran h2dt3.exe 5 times.
5. In the 4th file, h2dt4.asm, I moved the lingo_htodw proc to the 4th position and ran h2dt4.exe 5 times.
Results:

lingo_htodw in 1st place -> h2dt1.exe
Results:
C:\5>h2dt1
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
109 Alex short
62 Lingo long
62 Alex long
94 clive short
Press any key to continue ...
C:\5>h2dt1
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
312 atodw library
110 Alex short
47 Lingo long
62 Alex long
110 clive short
Press any key to continue ...
C:\5>h2dt1
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
110 Alex short
62 Lingo long
63 Alex long
94 clive short
Press any key to continue ...
C:\5>h2dt1
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
312 atodw library
109 Alex short
47 Lingo long
62 Alex long
94 clive short
Press any key to continue ...
C:\5>h2dt1
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
108 Alex short
47 Lingo long
94 Alex long
94 clive short
Press any key to continue ...

End Results for 1st place:
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
108 Alex short
47 Lingo long
62 Alex long
94 clive short
Press any key to continue ...


lingo_htodw in 2nd place -> h2dt2.exe
Results:
C:\5>h2dt2
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
312 atodw library
110 Alex short
47 Lingo long
62 Alex long
110 clive short
Press any key to continue ...
C:\5>h2dt2
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
312 atodw library
109 Alex short
62 Lingo long
62 Alex long
94 clive short
Press any key to continue ...
C:\5>h2dt2
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
312 atodw library
109 Alex short
63 Lingo long
62 Alex long
93 clive short
Press any key to continue ...
C:\5>h2dt2
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
109 Alex short
62 Lingo long
63 Alex long
93 clive short
Press any key to continue ...
C:\5>h2dt2
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
281 atodw library
109 Alex short
47 Lingo long
62 Alex long
94 clive short
Press any key to continue ...

End Results for 2nd place:
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
281 atodw library
109 Alex short
47 Lingo long
62 Alex long
93 clive short
Press any key to continue ...

lingo_htodw in 3rd place -> h2dt3.exe
Results:
C:\5>h2dt3
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
312 atodw library
109 Alex short
62 Lingo long
62 Alex long
94 clive short
Press any key to continue ...
C:\5>h2dt3
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
281 atodw library
109 Alex short
63 Lingo long
62 Alex long
93 clive short
Press any key to continue ...
C:\5>h2dt3
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
109 Alex short
46 Lingo long
78 Alex long
93 clive short
Press any key to continue ...
C:\5>h2dt3
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
110 Alex short
47 Lingo long
63 Alex long
94 clive short
Press any key to continue ...
C:\5>h2dt3
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
110 Alex short
62 Lingo long
63 Alex long
94 clive short
Press any key to continue ...

End Results for 3rd place:
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
281 atodw library
109 Alex short
46 Lingo long
62 Alex long
93 clive short
Press any key to continue ...


lingo_htodw in 4th place -> h2dt4.exe
Results:
C:\5>h2dt4
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
297 atodw library
109 Alex short
63 Lingo long
62 Alex long
93 clive short
Press any key to continue ...
C:\5>h2dt4
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
312 atodw library
110 Alex short
47 Lingo long
62 Alex long
110 clive short
Press any key to continue ...
C:\5>h2dt4
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
343 atodw library
109 Alex short
47 Lingo long
63 Alex long
109 clive short
Press any key to continue ...
C:\5>h2dt4
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
297 atodw library
109 Alex short
63 Lingo long
62 Alex long
94 clive short
Press any key to continue ...
C:\5>h2dt4
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
109 Alex short
47 Lingo long
62 Alex long
109 clive short
Press any key to continue ...

End Results for 4th place:
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
296 atodw library
109 Alex short
47 Lingo long
62 Alex long
93 clive short
Press any key to continue ...

If someone wants to test my files, please run every exe file 5 times consecutively and take the best result for every algo. Thanks!
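Taking "the best result for every algo" amounts to keeping the per-algorithm minimum over the runs. The C helper below is a hypothetical sketch: the name `best_of_runs` and the flat, run-major array layout are assumptions, not anything from h2dt.zip. The minimum is a common choice because background activity and interrupts generally add time to a run rather than remove it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: times[r * nalgos + a] is the time in ms of
   algorithm a in run r. For every algorithm, best[a] receives the
   smallest time seen over all nruns runs. */
void best_of_runs(const int *times, size_t nruns, size_t nalgos, int *best)
{
    assert(nruns > 0);
    for (size_t a = 0; a < nalgos; a++) {
        best[a] = times[a];                      /* start with run 0 */
        for (size_t r = 1; r < nruns; r++) {
            int t = times[r * nalgos + a];
            if (t < best[a])
                best[a] = t;                     /* keep the fastest run */
        }
    }
}
```

The five h2dt1 runs posted above can be fed in with the columns in output order (atodw library, Alex short, Lingo long, Alex long, clive short). The helper then gives 296, 108, 47, 62, 94, which match the "End Results for 1st place" summary.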

hutch--

This does not help you after flapping your mouth off. I originally reported that your algo was the fastest but that it was subject to slowdowns depending on the placement of the code.

I posted a second test piece which is proof that your algo is unreliable in terms of timing due to code placement.

Here are your 5 consecutive runs with the SECOND test piece on the 3 GHz Core2 Quad I am using. Memory is 1333.


313 atodw library
109 Alex short
94 Lingo long
93 clive short
Press any key to continue ...
312 atodw library
109 Alex short
157 Lingo long
93 clive short
Press any key to continue ...
297 atodw library
109 Alex short
110 Lingo long
109 clive short
Press any key to continue ...
313 atodw library
109 Alex short
94 Lingo long
109 clive short
Press any key to continue ...
313 atodw library
109 Alex short
94 Lingo long
94 clive short
Press any key to continue ...

lingo

You used the test file from lingo_slow.zip rather than the files from h2dt.zip... Anybody else? :lol

hutch--

 :bg

Like it or lump it, your algo is unreliable due to code placement, and I posted the example to prove it.

lingo

"algo is unreliable due to code placement"

Send your invention to Intel quickly and collect a big prize :lol

ecube

Have you tried different MASM/JWasm versions, hutch? Maybe just the one you're using is bugged, because code not working consistently due to placement, beyond slight differences, seems like a serious assembler bug to me.

jj2007


ecube

Quote from: jj2007 on August 04, 2010, 06:52:24 PM
It has absolutely nothing to do with "assembler bugs". See the code location sensitivity thread.
Alignment and code location aren't the same thing. If it were a simple alignment issue, I'm sure hutch would have tweaked it by now.

jj2007


ecube

Quote from: jj2007 on August 04, 2010, 08:54:39 PM
You are the expert :toothy
You may post quite a bit of code, but that doesn't make you an expert either. What I'm saying is common sense: if I have 1000 different functions, all aligned by 16, and one happens to be Lingo's, which gives drastically different speed results simply by being the first function rather than the last, then that's a problem, and, as I said, it is not the same as a simple alignment issue.

jj2007

If you are not too tired, just read the sensitivity thread - it has very little to do with alignment.

Antariy

Lingo,
Quote
"Yes, it seems that Lingo's algo with the big look-up table is very dependent on alignment and placement in the file. It needs some "dancing with a tambourine" to make it work faster :)" by the asian lamer with archaic CPU.

"When the arguments run out, the insults begin." An English proverb, for your info.
So, you are not an adequate person who could say something interesting to human civilization, because you NEVER back up your thoughts with arguments.
You don't understand hardware, and you cannot write useful code. You can do something "great" only on the newest hardware, which tolerates your lamer's techniques; your code works fast only on the newest hardware, which can run your BLOATED algos satisfactorily (but with creaking :).

And, an epilogue:
Lingo, your Russian is as good as my English :P



Alex

Antariy


Hutch, remember your benchmark, which you made for the "String-to-DWORD conversion procs, and one Bug of atodw proc" thread (http://www.masm32.com/board/index.php?board=6;topic=14438.0)? That BM is in the bmk2.zip archive.
This BM contains a generator for the hex-text file that is used in real-world testing.

I think that, for truly fair results, the BM needs to be run on lines that are not all 8 bytes long.
I wrote a program similar to yours, which generates a text file with variable-length strings: from 1 byte to 8, cyclically.
I didn't change your testing algorithm; I only substituted my procs and changed the console output text. In the .BAT file I changed hextxt.exe to hextxt2.exe, to generate the variable-length strings file.
I also added code to print the size of all procs.
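The output format of hextxt2.exe is not shown in the thread, so the following is only a sketch of the idea. It assumes one uppercase hex string per line, terminated by '\n'. The function name `gen_hex_lines` and the digit source, a 32-bit linear congruential generator using the widely used Numerical Recipes constants, are illustrative assumptions and are not taken from hextxt2.exe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: writes count lines of uppercase hex digits into buf.
   Line i has (i % 8) + 1 digits, so lengths cycle 1, 2, ..., 8, 1, ...
   Each line ends with '\n' (assumed format) and buf is NUL-terminated.
   buf must have room for at least count * 9 + 1 bytes.
   Returns the number of characters written, excluding the NUL. */
size_t gen_hex_lines(char *buf, size_t count, uint32_t seed)
{
    static const char digits[] = "0123456789ABCDEF";
    char *p = buf;
    uint32_t x = seed;

    assert(buf != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t len = (i % 8) + 1;               /* 1..8 digits, cyclic */
        for (size_t k = 0; k < len; k++) {
            x = x * 1664525u + 1013904223u;     /* 32-bit LCG step */
            *p++ = digits[x >> 28];             /* top 4 bits -> one digit */
        }
        *p++ = '\n';
    }
    *p = '\0';
    return (size_t)(p - buf);
}
```

For 8 lines the lengths run from 1 through 8, so the buffer holds 36 digits plus 8 newlines, 44 characters in total.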

So, with variable-length strings, the results are different (tested on my PC):


1000000 = item count in file


1797 ms Lingo 1
1610 ms Lingo 2
1390 ms Alex Unrolled
1610 ms Alex Unrolled (AMD)


1765 ms Lingo 1
1594 ms Lingo 2
1406 ms Alex Unrolled
1594 ms Alex Unrolled (AMD)


1766 ms Lingo 1
1593 ms Lingo 2
1438 ms Alex Unrolled
1594 ms Alex Unrolled (AMD)


1765 ms Lingo 1
1610 ms Lingo 2
1421 ms Alex Unrolled
1610 ms Alex Unrolled (AMD)


1781 ms Lingo 1
1594 ms Lingo 2
1406 ms Alex Unrolled
1594 ms Alex Unrolled (AMD)


1781 ms Lingo 1
1594 ms Lingo 2
1437 ms Alex Unrolled
1594 ms Alex Unrolled (AMD)


1766 ms Lingo 1
1593 ms Lingo 2
1407 ms Alex Unrolled
1593 ms Alex Unrolled (AMD)


1766 ms Lingo 1
1609 ms Lingo 2
1422 ms Alex Unrolled
1594 ms Alex Unrolled (AMD)


1773 MS Lingo average
1599 MS Lingo2 average
1415 MS Alex average
1597 MS Alex (AMD) average

Size of code:
171      Lingo 1 proc
2076     Lingo 2 proc
396      Alex proc
396      Alex proc (AMD)


Press any key to continue ...
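The averages appear to be integer means of the eight runs, with the fractional part truncated. For example, the Lingo 2 runs sum to 12797 ms, and 12797 / 8 = 1599.625 is listed as 1599 rather than rounded to 1600. The following is a small C check of that reading. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean of n timings in ms, with the fractional
   part dropped by integer division (the rounding rule assumed in the text). */
unsigned mean_ms_truncated(const unsigned *t, size_t n)
{
    unsigned long sum = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return (unsigned)(sum / n);
}
```

Applied to the eight runs of each proc, it reproduces all four listed averages: 1773, 1599, 1415 and 1597 ms.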



Considering that these versions are universal, i.e. they can work with short-notated strings, all benchmarks need to be run on variable-length (short-notated) hex strings.
I also have an SSE2 version of the proc, which is limited to 8-byte strings only and is 67% faster than my unrolled version. But I don't use it in testing, because it is not universal and cannot run on CPUs older than the PIV.
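This is a plain C sketch of a "universal" conversion that accepts 1 to 8 hex digits, using a 256-entry look-up table of the kind mentioned earlier in the thread. It is neither Lingo's nor Alex's proc. The names `hexval`, `init_hexval` and `htodw_var` are hypothetical. The data-dependent loop exit is the part that a fixed 8-byte version, such as the SSE2 one described above, can avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch, not Lingo's or Alex's code.
   256-entry table: digit value for '0'-'9', 'A'-'F', 'a'-'f', 0xFF otherwise. */
static uint8_t hexval[256];

void init_hexval(void)
{
    for (int c = 0; c < 256; c++)
        hexval[c] = 0xFF;
    for (int d = 0; d < 10; d++)
        hexval['0' + d] = (uint8_t)d;
    for (int d = 0; d < 6; d++) {
        hexval['A' + d] = (uint8_t)(10 + d);
        hexval['a' + d] = (uint8_t)(10 + d);
    }
}

/* Converts a short-notated hex string (1 to 8 digits) to a DWORD.
   Stops at the first non-hex byte, e.g. the terminating zero or CR/LF,
   and never reads more than 8 digits. */
uint32_t htodw_var(const char *s)
{
    uint32_t v = 0;

    assert(s != NULL);
    for (int i = 0; i < 8; i++) {
        uint8_t d = hexval[(unsigned char)s[i]];
        if (d == 0xFF)
            break;                  /* end of this number */
        v = (v << 4) | d;           /* append one 4-bit digit */
    }
    return v;
}
```

Call `init_hexval()` once before converting. Because the loop stops at the first non-hex byte, a terminating zero or a CR/LF ends the number, so short strings such as "FF" or "abc" work without padding.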


My "AMD" version should run faster on AMD CPUs, but this is not guaranteed, because it was tested only on E^cube's and Dave's AMD CPUs.



So, a request to everyone: please run this benchmark if you have ~1 minute. This is a more real-world benchmark, and the idea for it belongs to Hutch.
I'm not trying to prove anything; this is just very interesting info about different hardware.



Alex

Antariy

Hutch's test
Quote from: hutch-- on August 04, 2010, 01:20:18 PM
Instead of flapping your mouth off at me, try addressing the problem, your code is fast but it is sensitive to where it is in the executable where none of the others are.

Results:

594 atodw library
250 Alex short
235 Lingo long
406 clive short
Press any key to continue ...


Wow! Lingo gains 6% in performance at the cost of a 3000% bigger code size! This is great, really! :bdg
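The 6% figure compares Lingo long (235 ms) with Alex short (250 ms) in the results above:

```latex
\frac{250\ \text{ms} - 235\ \text{ms}}{250\ \text{ms}} = \frac{15}{250} = 0.06 = 6\%
```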



Alex