The new testbed has been restructured and partly rewritten.
Now it manages up to 16 algos and uses a screen with 40 rows and 90 columns.
All the algos, descriptions, data and procs are in include files.
Please test it on your machines and post the partial screens like the following:
┌────────────────────────────────────────────────────────────────────────────────────────┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│         Algorithm notes          │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 Alex / MMX - PUNPCKLBW MOVQ    │    64   │   3.751  │   3.746  │   3.748  │   3.743  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 Frank / 486 - MOV-BSWAP        │    43   │  10.702  │  10.708  │  10.703  │  10.712  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 Frank / XMM PUNPCKLBW MOVDQA   │    45   │   2.347  │   2.347  │   2.348  │   2.350  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Alex / MMX - PUNPCKLBW MOVNTQ  │    64   │   7.141  │   7.211  │   7.208  │   7.140  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Frank / 386 - MOV-SHIFT        │    42   │  10.353  │  10.161  │  10.291  │  10.370  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
Frank
The only problem I see is the 0 logical cores.
┌────────────────────────────────────────────────────────────────────────────────────────┐
│OS  : Microsoft Windows 2000 Professional Service Pack 4 (build 2195)                   │
│CPU : Pentium III with 0 logical core(s) with SSE1                                      │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│         Algorithm notes          │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 Alex / MMX - PUNPCKLBW MOVQ    │    64   │  17.532  │  17.551  │  17.560  │  17.558  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 Frank / 486 - MOV-BSWAP        │    43   │  19.390  │  19.380  │  19.383  │  19.377  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 Frank / XMM PUNPCKLBW MOVDQA   │    45   │   1.591  │   1.598  │   1.591  │   1.592  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Alex / MMX - PUNPCKLBW MOVNTQ  │    64   │  10.154  │  10.152  │  10.153  │  10.152  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Frank / 386 - MOV-SHIFT        │    42   │  18.874  │  18.893  │  18.886  │  18.955  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
Quote from: MichaelW on November 14, 2010, 12:11:44 AM
The only problem I see is the 0 logical cores.
That is because this field is not implemented on Intel CPUs earlier than the PIV. I used bits 16...23 of EBX from CPUID with EAX=1 to get this value, but at the time of the PIII, EBX from CPUID EAX=1 was "undefined".
This can be fixed, of course, but it is funny enough :lol
Frank, find this place in the sources (search for "shr ebx,16"):
mov eax,1
cpuid
shr ebx,16
and ebx,255
and add these two lines after
and ebx,255:
sete al ; AL = 1 if the count field was zero
add bl,al ; so a count of 0 becomes 1
Alex
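For anyone who wants to check the logic outside the testbed, here is a minimal C sketch of the same fix. It assumes the caller already has the EBX value returned by CPUID with EAX=1; the function name logical_count_from_ebx is hypothetical and not part of the testbed sources.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the testbed sources.
   ebx is the value CPUID returned in EBX for EAX=1. */
static unsigned logical_count_from_ebx(uint32_t ebx)
{
    /* shr ebx,16 / and ebx,255: keep bits 16..23 only */
    unsigned count = (ebx >> 16) & 0xFFu;

    /* sete al / add bl,al: a zero field (old CPUs) becomes 1 */
    count += (count == 0);

    return count;
}
```

With this sketch an EBX of 0x00020800 gives 2, and an EBX whose bits 16..23 are all zero gives 1.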
OK Alex.
Changed and posted here.
The results MichaelW posted seem a bit strange:
01 Alex / MMX - PUNPCKLBW MOVQ │ 64 │ 17.532 │ 17.551 │ 17.560 │ 17.558
and
03 Frank / XMM PUNPCKLBW MOVDQA │ 45 │ 1.591 │ 1.598 │ 1.591 │ 1.592
::)
Frank
Quote from: frktons on November 14, 2010, 01:39:17 AM
OK Alex.
Changed and posted here.
The results MichaelW posted seem a bit strange:
01 Alex / MMX - PUNPCKLBW MOVQ │ 64 │ 17.532 │ 17.551 │ 17.560 │ 17.558
and
03 Frank / XMM PUNPCKLBW MOVDQA │ 45 │ 1.591 │ 1.598 │ 1.591 │ 1.592
::)
Your XMM code is SSE2, so it does not work on a PIII. But the PIII does not generate #UD for some SSE2 instructions (including the ones used in your proc); it treats them as MMX. So the results just show nothing meaningful.
Frank, your new archive does not contain the sources :P
00402E17 C1EB10 shr ebx,10h
00402E1A 81E3FF000000 and ebx,0FFh
00402E20 0F94C0 sete al
00402E23 02D8 add bl,al
00402E25 895C2404 mov [esp+4],ebx
Alex
Quote from: Antariy on November 14, 2010, 02:01:09 AM
Your XMM code is SSE2, so it does not work on a PIII. But the PIII does not generate #UD for some SSE2 instructions (including the ones used in your proc); it treats them as MMX. So the results just show nothing meaningful.
Frank, your new archive does not contain the sources :P
00402E17 C1EB10 shr ebx,10h
00402E1A 81E3FF000000 and ebx,0FFh
00402E20 0F94C0 sete al
00402E23 02D8 add bl,al
00402E25 895C2404 mov [esp+4],ebx
Alex
I only posted the modified pieces, not everything. Just overwrite the previous files in the folder
and all the stuff will be updated. :U
Frank, you put ProgData.inc into the archive instead of ProgProc.inc.
Quote from: Antariy on November 14, 2010, 02:07:25 AM
Frank, you put ProgData.inc into the archive instead of ProgProc.inc.
The few neurons still ON didn't realize it. :dazzled:
Here is ProgProc.inc
For the 0 logical cores problem, not sure if you want to do it but you could alternatively use GetProcessAffinityMask
yah - that is the easy way
seeing as you should call that function anyways to get the system mask
(i also like to select core 0 while reading CPUID to ensure the results all come from the same place)
you can use the system affinity mask to get total logical cores
then, examine the HTT bit in 0_1:EDX[28]
if they have hyper-threading, divide the total logical cores by 2 to find physical cores
otherwise, the bits counted in the system affinity mask represent the number of physical cores
i am having a look at your Info page, Frank
give me some time :P
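The affinity-mask approach described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the mask is passed in as a plain 64-bit value (on Windows it would come from GetProcessAffinityMask), the HTT flag is taken from bit 28 of EDX for CPUID EAX=1 as in Intel's documentation, and the names count_mask_bits and physical_cores are hypothetical. Note that the HTT flag only says the package may hold more than one logical processor, so halving is the heuristic from the post, not a guaranteed physical count.

```c
#include <stdint.h>

/* Count the set bits in an affinity mask: one bit per logical processor. */
static unsigned count_mask_bits(uint64_t mask)
{
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper.
   system_mask: system affinity mask (assumed to be supplied by the OS).
   edx_leaf1:   EDX returned by CPUID with EAX=1; HTT flag is bit 28. */
static unsigned physical_cores(uint64_t system_mask, uint32_t edx_leaf1)
{
    unsigned logical = count_mask_bits(system_mask);
    unsigned htt = (edx_leaf1 >> 28) & 1u;

    /* Heuristic from the post: halve the logical count when HTT is set. */
    unsigned cores = htt ? logical / 2 : logical;

    return cores ? cores : 1;   /* never report 0 cores */
}
```

For example, a mask of 0xF with the HTT bit set yields 2, while a mask of 0x3 with the bit clear also yields 2.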
Quote from: dedndave on November 14, 2010, 07:19:11 PM
yah - that is the easy way
Nothing can be simpler than using the CPU's native info with the help of a couple of instructions :P
Quote from: Slugsnack on November 14, 2010, 06:35:27 PM
For the 0 logical cores problem, not sure if you want to do it but you could alternatively use GetProcessAffinityMask
0 cores is not a problem: that feature just does not exist on old CPUs. If we get zero as the core count, it is a meaningless value because the feature is not implemented, so we can just set the count to 1, which is what the code does currently.
Quote from: Antariy on November 14, 2010, 09:37:21 PM
Quote from: Slugsnack on November 14, 2010, 06:35:27 PM
For the 0 logical cores problem, not sure if you want to do it but you could alternatively use GetProcessAffinityMask
0 cores is not a problem: that feature just does not exist on old CPUs. If we get zero as the core count, it is a meaningless value because the feature is not implemented, so we can just set the count to 1, which is what the code does currently.
Hi Alex, can we keep the two lines added after and ebx,255:
sete al
add bl,al
to display "1 core", or are you thinking about something else?
Frank
Quote from: frktons on November 14, 2010, 10:29:24 PM
Quote from: Antariy on November 14, 2010, 09:37:21 PM
Quote from: Slugsnack on November 14, 2010, 06:35:27 PM
For the 0 logical cores problem, not sure if you want to do it but you could alternatively use GetProcessAffinityMask
0 cores is not a problem: that feature just does not exist on old CPUs. If we get zero as the core count, it is a meaningless value because the feature is not implemented, so we can just set the count to 1, which is what the code does currently.
Hi Alex, can we keep the two lines added after and ebx,255:
sete al
add bl,al
to display "1 core", or are you thinking about something else?
I am 99.99% sure that is the simplest possible way. Dave's way is much harder (and slower :P), and it does not guarantee correct results (because the count returned by the CPU is the real count, while the value returned by the OS is whatever the OS implements :P).
My current solution is: if bits 16...23 of EBX are zero (on old CPUs), the core count is set to 1. That is the simplest way, and it is reliable enough :P
Quote from: Antariy on November 14, 2010, 10:40:14 PM
I am 99.99% sure that is the simplest possible way. Dave's way is much harder (and slower :P), and it does not guarantee correct results (because the count returned by the CPU is the real count, while the value returned by the OS is whatever the OS implements :P).
My current solution is: if bits 16...23 of EBX are zero (on old CPUs), the core count is set to 1. That is the simplest way, and it is reliable enough :P
OK :U
We keep the code as it is :P
Frank,
The last version is much easier to read with the .DATA section info in a separate file. One more suggestion: where you have dynamic code (procedures, instructions etc ...) in a file, use the ASM extension, not INC. "Include" extensions are generally used for prototypes, data and similar. This way you routinely know what type of code is where.
This is the result from the last version.
┌────────────────────────────────────────────────────────────────────────────────────────┐
│OS  : Microsoft Windows XP Professional Service Pack 3 (build 2600)                     │
│CPU : Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz with 4 logical core(s) with SSE4.1    │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│         Algorithm notes          │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 Alex / MMX - PUNPCKLBW MOVQ    │    64   │   3.750  │   3.748  │   3.750  │   3.752  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 Frank / 486 - MOV-BSWAP        │    43   │  10.702  │  10.701  │  10.703  │  10.700  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 Frank / XMM PUNPCKLBW MOVDQA   │    45   │   1.847  │   1.842  │   1.842  │   1.847  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Alex / MMX - PUNPCKLBW MOVNTQ  │    64   │   6.108  │   6.120  │   6.102  │   6.087  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Frank / 386 - MOV-SHIFT        │    42   │  10.022  │  10.011  │  10.019  │  10.010  │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│07                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│08                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│09                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│10                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│11                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│12                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│13                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│14                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│15                                │         │          │          │          │          │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│16                                │         │          │          │          │          │
├──────────────────────────────────┴─────────┴──────────┴──────────┴──────────┴──────────┤
│ Esc Exit Copy Run View Save Info F1 Help                                               │
└────────────────────────────────────────────────────────────────────────────────────────┘
Thanks Steve, the results look good :U
Your suggestion is a wise one, I'm going to implement it in the next version. :bg
Frank
Frank,
I have a suggestion for you. It's been many years since I laid out a text mode screen, but I still remember the basics of using the ASCII framing characters. Something that should not be a big deal to do is a dynamic layout control, so that you adjust the number of test slots based on the number of tests you want to perform.
You would basically do it by building a series of components: a top bar, a space bar, a top info bar with downward branches, a space info bar with vertical dividers, a bottom info bar with matching upward branches and, for the very bottom of the display, a bottom bar.
You store each as a string, then construct the entire text mode window from these components in memory. Then you display it in one console write so it's fast.
Text placement is done after by locating the text insertion position.
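A rough C sketch of that component approach, for illustration only. It assumes plain ASCII framing characters, a single column, and a made-up inner width of 20 instead of the real 88; FRAME_WIDTH, add_bar, build_frame and place_text are all hypothetical names, not testbed code.

```c
#include <stddef.h>
#include <string.h>

#define FRAME_WIDTH 20   /* hypothetical inner width, far narrower than the real screen */

/* Append one component such as "+----+" or "|    |" plus a newline.
   Returns the new length of the text in buf. */
static size_t add_bar(char *buf, size_t len, char edge, char fill)
{
    buf[len++] = edge;
    memset(buf + len, fill, FRAME_WIDTH);
    len += FRAME_WIDTH;
    buf[len++] = edge;
    buf[len++] = '\n';
    return len;
}

/* Build the whole frame for `slots` test rows in memory.
   buf must hold at least (2 * slots + 1) * (FRAME_WIDTH + 3) + 1 bytes. */
static size_t build_frame(char *buf, int slots)
{
    size_t len = add_bar(buf, 0, '+', '-');     /* top bar */
    for (int i = 0; i < slots; i++) {
        len = add_bar(buf, len, '|', ' ');      /* space bar: one test slot */
        len = add_bar(buf, len, '+', '-');      /* divider, or bottom bar on the last pass */
    }
    buf[len] = '\0';
    return len;
}

/* Text placement after the frame is built: copy text into slot `slot`
   (0-based, caller keeps it in range) starting at column `col`,
   clipped so the right edge of the frame is never overwritten. */
static void place_text(char *buf, int slot, int col, const char *text)
{
    if (col < 0 || col >= FRAME_WIDTH)
        return;
    size_t line = (size_t)(1 + 2 * slot);       /* line index of the slot row */
    size_t n = strlen(text);
    if ((size_t)col + n > FRAME_WIDTH)
        n = FRAME_WIDTH - (size_t)col;
    memcpy(buf + line * (FRAME_WIDTH + 3) + 1 + (size_t)col, text, n);
}
```

The finished buffer can then go to the console in one call, for example a single fwrite of the returned length, which is the "one console write" part of the idea.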
Quote from: hutch-- on November 15, 2010, 11:10:11 PM
Frank,
I have a suggestion for you. It's been many years since I laid out a text mode screen, but I still remember the basics of using the ASCII framing characters. Something that should not be a big deal to do is a dynamic layout control, so that you adjust the number of test slots based on the number of tests you want to perform.
You would basically do it by building a series of components: a top bar, a space bar, a top info bar with downward branches, a space info bar with vertical dividers, a bottom info bar with matching upward branches and, for the very bottom of the display, a bottom bar.
You store each as a string, then construct the entire text mode window from these components in memory. Then you display it in one console write so it's fast.
Text placement is done after by locating the text insertion position.
I had some thoughts about this way of designing the grid, adapting it to the number of algos to test,
but I decided to leave this part for the future. :P
When I feel like it, I'll probably code it. I also need the necessary free time. We'll see. :U
Frank
Please go to "http://www.masm32.com/board/index.php?topic=14871.msg125063#msg125063" (the main thread) :P
and test my variation of the algos manager, with its way of adding algos and tuning the testbed.