Guys, :lol
I played with your ideas and algos for Dword2Hex from The Campus
and want to continue here.
Biterider,
Your Dword2Hex algo is nice but unfortunately it is not the fastest.
I improved it by a few clocks (see Dword2Hex1).
Jeff,
Your algo is easy to follow but slow...sorry
dsouza123,
Congratulations! Your algo with translation table is the fastest
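For anyone following along without the source, here is a rough C sketch of what a translation-table conversion can look like. It is not dsouza123's code: the 256-entry byte-to-two-characters table and the names `hex_pairs`, `init_hex_pairs` and `dword2hex_table` are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical table: entry b holds the two hex characters of byte b. */
static char hex_pairs[256][2];

/* Fill the table once before the first conversion. */
static void init_hex_pairs(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int b = 0; b < 256; ++b) {
        hex_pairs[b][0] = digits[b >> 4];   /* high nibble */
        hex_pairs[b][1] = digits[b & 0x0F]; /* low nibble  */
    }
}

/* Writes 8 hex characters plus a terminating NUL; out needs 9 bytes. */
static void dword2hex_table(uint32_t value, char *out)
{
    for (int i = 0; i < 4; ++i) {
        /* Take the bytes from most significant to least significant. */
        unsigned byte = (value >> (24 - 8 * i)) & 0xFFu;
        out[2 * i]     = hex_pairs[byte][0];
        out[2 * i + 1] = hex_pairs[byte][1];
    }
    out[8] = '\0';
}
```

After one call to `init_hex_pairs()`, `dword2hex_table(0x1C2B3A4B, buf)` with a 9-byte `buf` gives the string `1C2B3A4B`, the same value the tests print. The point of the table is that each byte costs one lookup instead of two separate nibble conversions.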
The times on my P4 3.6 GHz are:
Dword2Hex Tests:
Dword2Hex1: 24 clocks; Result: 1C2B3A4B
Dword2Hex : 27 clocks; Result: 1C2B3A4B
DwH->Jeff : 168 clocks; Result: 1C2B3A4B
D-dsouzaf : 19 clocks; Result: 1C2B3A4B
Press ENTER to exit...
Regards,
Lingo
[attachment deleted by admin]
Lingo,
If you have time, will you add the "dw2hex_ex" algo from the MASM32 library to the benchmark?
Hutch, :lol
The algo is the same but the coders are different.
Dword2Hex Tests:
Dword2Hex1: 24 clocks; Result: 1C2B3A4B
Dword2Hex : 27 clocks; Result: 1C2B3A4B
DwH->Jeff : 168 clocks; Result: 1C2B3A4B
D-dsouzaf : 19 clocks; Result: 1C2B3A4B
dw2hex_ex: 38 clocks; Result: 1C2B3A4B
Press ENTER to exit...
Regards,
Lingo
[attachment deleted by admin]
Lingo,
Thanks for adding the algo to the test. The version by dsouza is clearly faster and it will be very useful to benchmark against.
Hutch, :lol
"The version by dsouza is clearly faster..."
The idea is from dsouza and the code is mine.
Here is a faster version:
Dword2Hex Tests:
Dword2Hex1: 24 clocks; Result: 1C2B3A4B
Dword2Hex : 27 clocks; Result: 1C2B3A4B
DwH->Jeff : 168 clocks; Result: 1C2B3A4B
d2H-lingo : 16 clocks; Result: 1C2B3A4B
dw2hex_ex: 38 clocks; Result: 1C2B3A4B
Press ENTER to exit...
Regards,
Lingo
[attachment deleted by admin]
Just ran the new one.
Dword2Hex1: 15 clocks; Result: 1C2B3A4B
Dword2Hex : 16 clocks; Result: 1C2B3A4B
DwH->Jeff : 147 clocks; Result: 1C2B3A4B
d2H-lingo : 3 clocks; Result: 1C2B3A4B
dw2hex_ex : 20 clocks; Result: 1C2B3A4B
Two consecutive runs on a P3:
Dword2Hex1: 32 clocks; Result: 1C2B3A4B
Dword2Hex : 32 clocks; Result: 1C2B3A4B
DwH->Jeff : 90 clocks; Result: 1C2B3A4B
d2H-lingo : 32 clocks; Result: 1C2B3A4B
dw2hex_ex : 26 clocks; Result: 1C2B3A4B
Dword2Hex1: 32 clocks; Result: 1C2B3A4B
Dword2Hex : 32 clocks; Result: 1C2B3A4B
DwH->Jeff : 90 clocks; Result: 1C2B3A4B
d2H-lingo : 32 clocks; Result: 1C2B3A4B
dw2hex_ex : 26 clocks; Result: 1C2B3A4B
I was getting strange timings with the published benchmark so I wrote a different one. I rewrote the dw2hex_ex to try out a few ideas and included the fast one by lingo, one that I think was written by BiteRider and the new one I have been playing with.
These are the results I have on 2 machines. One is a 2.8 GHz PIV, the other is an AMD Sempron 2.4. The AMD handles the SHR or ROR instruction better than the PIV, which has a substantial stall after the shift or rotate instruction.
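To make the shift/rotate remark concrete, here is a small C sketch of a table-free conversion that rotates the value left by 4 bits per digit, so every output character depends on a rotate. It only illustrates the general technique, not any of the benchmarked routines; `rol32` and `dword2hex_rotate` are made-up names.

```c
#include <stdint.h>

/* Rotate left by n bits; n must be in the range 1..31. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Writes 8 hex characters plus a terminating NUL; out needs 9 bytes. */
static void dword2hex_rotate(uint32_t value, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 8; ++i) {
        value = rol32(value, 4);        /* bring the next nibble to the low bits */
        out[i] = digits[value & 0x0F];  /* map that nibble to its hex character  */
    }
    out[8] = '\0';
}
```

Because there are eight rotates in a dependent chain, a CPU that is slow on shifts and rotates pays for it on every digit, which is one plausible reason the same routine can rank differently on the PIV and the AMD.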
PIV result
1578 hutch
1297 lingo
2828 BiteRider
1562 hutch
1281 lingo
2844 BiteRider
1563 hutch
1281 lingo
2859 BiteRider
1563 hutch
1281 lingo
2875 BiteRider
1563 hutch
1281 lingo
2797 BiteRider
1562 hutch
1281 lingo
2844 BiteRider
1563 hutch
1281 lingo
2812 BiteRider
1563 hutch
1250 lingo
2859 BiteRider
-------
Results
-------
1564 hutch average
1279 lingo average
2839 BiteRider average
AMD Sempron result
3375 hutch
3063 lingo
3312 BiteRider
3375 hutch
3063 lingo
3328 BiteRider
3375 hutch
3078 lingo
3328 BiteRider
3375 hutch
3063 lingo
3312 BiteRider
3375 hutch
3063 lingo
3328 BiteRider
3375 hutch
3062 lingo
3329 BiteRider
3375 hutch
3062 lingo
3328 BiteRider
3375 hutch
3063 lingo
3328 BiteRider
-------
Results
-------
3375 hutch average
3064 lingo average
3324 BiteRider average
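The averages in the Results blocks match an integer mean of the eight passes with the fraction dropped. That is inferred from the posted numbers, not from the benchmark source, so treat the C sketch below as an assumption about how the figure is produced; `average_truncated` is a hypothetical name.

```c
#include <stddef.h>

/* Integer mean of count samples, fraction discarded; returns 0 for count == 0. */
static unsigned long average_truncated(const unsigned long *samples, size_t count)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += samples[i];
    }
    return count ? sum / count : 0;
}
```

Feeding it the eight PIV "hutch" timings (1578, 1562, 1563, 1563, 1563, 1562, 1563, 1563) gives 12517 / 8 = 1564, the figure shown in the PIV Results block.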
[attachment deleted by admin]
On a P3:
11609 hutch average
15546 lingo average
12544 BiteRider average
Thanks Hutch, :lol
On my P4 3.6 GHz with WinXP SP2:
1906 hutch
1469 lingo
2781 BiteRider
1906 hutch
1469 lingo
2766 BiteRider
1906 hutch
1469 lingo
2735 BiteRider
1922 hutch
1468 lingo
2750 BiteRider
1906 hutch
1485 lingo
2765 BiteRider
1906 hutch
1469 lingo
2750 BiteRider
1906 hutch
1485 lingo
2812 BiteRider
1906 hutch
1485 lingo
2781 BiteRider
-------
Results
-------
1908 hutch average
1474 lingo average
2767 BiteRider average
Regards,
Lingo
On my AMD Sempron 2200+, with XP Pro SP2:
4282 hutch
3906 lingo
4203 BiteRider
4281 hutch
3906 lingo
4235 BiteRider
4281 hutch
3890 lingo
4266 BiteRider
4281 hutch
3891 lingo
4234 BiteRider
4297 hutch
3891 lingo
4234 BiteRider
4281 hutch
3891 lingo
4234 BiteRider
4297 hutch
3891 lingo
4234 BiteRider
4297 hutch
3891 lingo
4234 BiteRider
-------
Results
-------
4287 hutch average
3894 lingo average
4234 BiteRider average
I have kept playing with this algo and have a version that is a bit faster than the last attempt. Lingo's version is still about 1.5% faster on my PIV but this version is faster than Lingo's on the test AMD Sempron I have available by about 2 - 3%.
These are the results on the two test machines.
PIV result
1329 hutch
1296 lingo
1313 hutch
1297 lingo
1312 hutch
1282 lingo
1312 hutch
1297 lingo
1312 hutch
1297 lingo
1313 hutch
1297 lingo
1312 hutch
1297 lingo
1313 hutch
1312 lingo
-------
Results
-------
1314 hutch average
1296 lingo average
AMD Sempron 2.4
3266 hutch
3438 lingo
3250 hutch
3437 lingo
3250 hutch
3438 lingo
3250 hutch
3437 lingo
3250 hutch
3438 lingo
3250 hutch
3437 lingo
3250 hutch
3438 lingo
3250 hutch
3437 lingo
-------
Results
-------
3252 hutch average
3437 lingo average
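To turn two averages into a "faster by N%" figure, one common convention is to take the gap relative to the slower time. The C helper below is a sketch under that assumption (other conventions divide by the faster time); `percent_faster` is a made-up name.

```c
/* Percentage by which the faster time undercuts the slower one. */
static double percent_faster(double slower, double faster)
{
    return (slower - faster) / slower * 100.0;
}
```

On the averages listed above for this particular run, `percent_faster(1314, 1296)` comes to roughly 1.4% in Lingo's favour on the PIV, and `percent_faster(3437, 3252)` comes to roughly 5.4% in favour of the new version on the Sempron.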
[attachment deleted by admin]
Hutch,
I have the same variant but it is slower on my box.
It runs faster if I work with data in the .data segment
rather than with data in the stack segment.
I started your test in DOS mode and
received different results on my P4 3.6 GHz with XP Pro SP2 :lol
C:\TEMPA>bench.exe
1765 hutch
1500 lingo
1672 hutch
1484 lingo
1657 hutch
1547 lingo
1734 hutch
1468 lingo
1750 hutch
1469 lingo
1656 hutch
1500 lingo
1719 hutch
1500 lingo
1656 hutch
1516 lingo
-------
Results
-------
1701 hutch average
1498 lingo average
Press any key to continue ...