The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: hutch-- on April 28, 2010, 03:11:26 AM

Title: Intel specific instruction set recognition.
Post by: hutch-- on April 28, 2010, 03:11:26 AM
I have given up on trying to write code for AMD hardware as I have no way of testing it. The attached file is the final test piece for the original task of producing a library to test for SSE support up to the current SSE4.2. The code works on i7, Core series and PIV hardware, and from the testing done by some of our members it appears to work on much older hardware back to the early Pentium series processors, so the libraries do what they were originally intended to do.

If anyone is interested in writing AMD specific code to do the same task I would be happy to add it to the collection but I just don't have the hardware to design and test on.
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 28, 2010, 04:42:53 AM
as i get time, i am working on a routine that just returns bits in EAX, Hutch - no strings
i think i have a feasible work-around for Cyrix and NexGen

in the low word of EAX (i.e. AX):

00 MMX
01 MMX+
02 SSE
03 SSE2
04 SSE3
05 SSSE3
06 SSE4a
07 SSE4.1 Intel
08 SSE4.1 AMD
09 SSE4.2
10 3DNow!
11 3DNow!+
12 RDTSC supported
13 486+
14 CPUID supported
15 OS supports SetProcessAffinityMask

each feature bit is set only if all processor cores in the system support that feature

the upper word will be used for basic ID of Core 0 - family/model/brand index/L2 cache/vendor ID number
i hafta cram stuff in there, so i haven't defined the bits yet

i had this partially written when i lost my other drive   :'(
thought i better re-do it while it is still somewhat fresh in my head
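
A minimal sketch of how a caller might consume a bit-packed result like this, assuming the bit positions listed above. The constant names and the GetCpuBits stub are hypothetical: the real routine was not posted, so the stub just returns a fixed sample value.

```masm
    include \masm32\include\masm32rt.inc

    ; Hypothetical names for a few of the bit positions listed above
    CPUF_MMX   equ 1 shl 0
    CPUF_SSE   equ 1 shl 2
    CPUF_SSE2  equ 1 shl 3
    CPUF_CPUID equ 1 shl 14

    .code

; Stand-in for the real routine (not shown in the thread).
; It pretends the CPU reported CPUID, MMX, SSE and SSE2.
GetCpuBits proc
    mov eax, CPUF_CPUID or CPUF_MMX or CPUF_SSE or CPUF_SSE2
    ret
GetCpuBits endp

start:
    call GetCpuBits

    test eax, CPUF_SSE2         ; ZF is set when the SSE2 bit is clear
    jz no_sse2
    print "SSE2 YES",13,10
    jmp finished
no_sse2:
    print "SSE2 NO",13,10
finished:
    inkey
    exit

end start
```

Each feature test is a single TEST plus a conditional jump, which is the bit masking discussed in the replies that follow.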
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 28, 2010, 04:46:52 AM
Dave,

Stick it into a structure so it's easier to use, no bit masking in the evaluation, just 0 or 1.

Pseudo code.

.if struct.sse == 1
  Show YES
.else
  Show NO
.endif
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 28, 2010, 04:48:58 AM
Quote: no bit masking in the evaluation, just 0 or 1.

not sure i understand what you mean, Hutch
(i understand the structure part   :P )

oh - you mean they don't want to test bits to find if a feature is supported ?
Title: Re: Intel specific instruction set recognition.
Post by: jj2007 on April 28, 2010, 06:34:25 AM
You could use RECORD/FIELD also...
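
A rough sketch of what the RECORD approach could look like, using a hypothetical four-bit subset of the flags. The record and field names are made up for illustration. MASM lists record fields from the high bit down, so the last field named lands in bit 0.

```masm
    include \masm32\include\masm32rt.inc

    ; Fields run from the high bit down, so the LAST field is bit 0.
    ; Hypothetical subset: cfCPUID = bit 3, cfSSE2 = bit 2, cfSSE = bit 1, cfMMX = bit 0
    CPUFEAT RECORD cfCPUID:1, cfSSE2:1, cfSSE:1, cfMMX:1

    .code

start:
    ; sample value with SSE2 left clear, standing in for a real detection result
    mov eax, MASK cfCPUID or MASK cfSSE or MASK cfMMX

    test eax, MASK cfSSE        ; MASK yields the bit mask of a record field
    jz no_sse
    print "SSE YES",13,10
    jmp finished
no_sse:
    print "SSE NO",13,10
finished:
    inkey
    exit

end start
```

The assembler works out the masks from the field list, so adding a field does not mean renumbering constants by hand, but the caller still tests bits.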
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 28, 2010, 06:37:14 AM
i think he wants it to be easy enough for a PB programmer   :P
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 28, 2010, 07:40:15 AM
 :bg

No, as a matter of fact I mean write it as re-usable code that does not have to do the bit masking for each application that may need it. Passing a structure to the procedure that does the testing is fast, easy and convenient. Using a structure has another advantage: you can add to the structure without breaking existing code. Adding support for MMXXXXX and SSSSSSE 123456 is just an extra couple of structure members.
Title: Re: Intel specific instruction set recognition.
Post by: Vortex on April 28, 2010, 05:08:47 PM
Here is my log file :

Vendor String = GenuineIntel
CPU String    = Intel(R) Pentium(R) 4 CPU 3.20GHz

SSE4.2     NO
SSE4.1     NO
SSE3       YES
SSE2       YES
SSE        YES
MMX        YES
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 28, 2010, 09:02:40 PM
well - i was trying to keep it simple, is all
if i go with a structure, i may as well provide the strings
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 29, 2010, 02:48:54 AM
Dave,

Strings are fine for the vendor and processor ID, but for instruction sets numeric options are a better bet and easier to use. If an app needs to test whether it can use 3DNow or SSE4.1, it just has to test against a structure member.
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 29, 2010, 02:56:23 AM
is it ok if it is a byte ?
or would you prefer dwords ?
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 29, 2010, 05:50:22 AM
Dave, an architecture something like this to make it easy to track and easy for an end user.


    DAVE_CPU_INFO STRUCT
      SSE5  dd ?
      SSE4a dd ?
      SSE42 dd ?
      SSE41 dd ?
      MMXX  dd ?              ; tested by the caller below
      ; etc etc ....
    DAVE_CPU_INFO ENDS

In caller

    LOCAL dci   :DAVE_CPU_INFO

    invoke DaveCpuInto,ADDR dci

    .if dci.SSE4a == 1
      print "SSE4a is supported",13,10
    .endif

    .if dci.MMXX == 1
      print "Extended MMX is supported",13,10
    .endif

  ; etc etc ....



DaveCpuInto proc pStruct:DWORD

    push esi

    mov esi, pStruct

  ; produce each result then write it to each required structure member

    .if eax == 1
      mov (DAVE_CPU_INFO PTR [esi]).SSE4a, 1
    .else
      mov (DAVE_CPU_INFO PTR [esi]).SSE4a, 0
    .endif

  ; etc etc ....

    pop esi
    ret

DaveCpuInto endp
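
As a hedged illustration of the "produce each result" step, here is a sketch that fills a structure from CPUID leaf 1 on Intel hardware. The structure, its member names and GetCpuFeatures are hypothetical, not the code from the attached library. It assumes CPUID exists and supports leaf 1, and it does not check whether the OS saves SSE state.

```masm
    include \masm32\include\masm32rt.inc
    .586                        ; masm32rt.inc selects .486, CPUID needs .586 or later

    ; Hypothetical structure, one dword per feature, each 0 or 1
    CPU_FEATURES STRUCT
      hasMMX   dd ?
      hasSSE   dd ?
      hasSSE2  dd ?
      hasSSE3  dd ?
      hasSSSE3 dd ?
      hasSSE41 dd ?
      hasSSE42 dd ?
    CPU_FEATURES ENDS

    .code

GetCpuFeatures proc uses ebx esi pInfo:DWORD

    mov esi, pInfo
    mov eax, 1                  ; leaf 1 returns feature flags in EDX and ECX
    cpuid
    xor eax, eax                ; SETC writes AL only, so clear the rest once

    bt edx, 23                  ; EDX bit 23 = MMX
    setc al
    mov (CPU_FEATURES PTR [esi]).hasMMX, eax

    bt edx, 25                  ; EDX bit 25 = SSE
    setc al
    mov (CPU_FEATURES PTR [esi]).hasSSE, eax

    bt edx, 26                  ; EDX bit 26 = SSE2
    setc al
    mov (CPU_FEATURES PTR [esi]).hasSSE2, eax

    bt ecx, 0                   ; ECX bit 0 = SSE3
    setc al
    mov (CPU_FEATURES PTR [esi]).hasSSE3, eax

    bt ecx, 9                   ; ECX bit 9 = SSSE3
    setc al
    mov (CPU_FEATURES PTR [esi]).hasSSSE3, eax

    bt ecx, 19                  ; ECX bit 19 = SSE4.1
    setc al
    mov (CPU_FEATURES PTR [esi]).hasSSE41, eax

    bt ecx, 20                  ; ECX bit 20 = SSE4.2
    setc al
    mov (CPU_FEATURES PTR [esi]).hasSSE42, eax

    ret

GetCpuFeatures endp

main proc
    LOCAL cf:CPU_FEATURES

    invoke GetCpuFeatures, ADDR cf

    .if cf.hasSSE2 == 1
      print "SSE2 YES",13,10
    .else
      print "SSE2 NO",13,10
    .endif

    ret
main endp

start:
    call main
    inkey
    exit

end start
```

The bit masking happens once inside the procedure, and callers only ever compare a member against 0 or 1.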
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 29, 2010, 03:41:59 PM
i can make a structure   :bg

here is what i am getting at
if we are going to have a dword value, we may as well put the string in there
if the first byte - or, for that matter the first 4 bytes, are 0 it means the feature is not supported
it won't take that much more room to make the following strings:

"MMX",0
"MMX+",0
"SSE",0
"SSE2",0
"SSE3",0
"SSSE3",0
"SSE4.1",0
"SSE4.2",0
"SSE4a",0
"3DNow!",0
"3DNow!+",0

in the structure definition, we can give them a dword size
if they want the string - they don't usually access it by bytes - they access it by address
they can grab the address of the dword to display the string
if they grab the dword, any non-zero value means the feature is supported
it doesn't take much more code to have
mov dword ptr SomeStructure.SomeValue,1
mov dword ptr SomeStructure.SomeValue,"SSE3"
(that isn't the exact code, but you get the point)
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 29, 2010, 09:27:23 PM
You could do it that way but then the end user has to string match each result to determine what instruction set an app can use.


.if SSE3 == 1
  call SSEfunc
.else
  call IntegerAlternative
.endif
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 29, 2010, 09:32:56 PM
oh - they can't test for "SSE3 not equal to 0" ?

or
.if SSE3 == 0
  call IntegerAlternative
.else
  call SSEfunc
.endif

we really need to help those PB&J guys   :bg
when they say "1's and 0's", they really mean it
i am glad this isn't 64-bit
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 29, 2010, 09:42:30 PM
 :bg


! and rax, 0000000000000000000000000000000000000000000000000000000000000001b
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 30, 2010, 07:21:44 AM
Dave,

Would you give this a whirl on your 3 gig Prescott with XP? It's a logical core count that works OK on both quads but reports 1 core on the PIV, as it has Win2000 with hyperthreading disabled. If I have it right, on an OS that enables hyperthreading it should show 2 logical cores.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    call logical_core_count
    print str$(eax)," logical cores",13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

logical_core_count proc

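    push ebx                  ; cpuid overwrites ebx, which callers expect preserved

    mov eax, 4                ; leaf 4: deterministic cache parameters
    mov ecx, 0                ; subleaf 0
    cpuid

    shr eax, 26               ; EAX[31:26] = core IDs per package, minus 1
    add eax, 1                ; correct for zero base

    pop ebx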

    ret

logical_core_count endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 30, 2010, 11:51:54 AM
after the include, i had to add...

.586

so it would allow CPUID

it reports...

1 logical cores

enumerating the cores is a little bit involved - lol
the easy way is to grab GetProcessAffinityMask and count SystemAffinityMask bits   :P
but, that doesn't tell you cores per package
at the moment, i have my other drive removed to do something else - later today, i will find some code....
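
For reference, a minimal sketch of the affinity-mask approach mentioned above. The procedure name is hypothetical. It counts what the OS exposes to a 32-bit process, so it says nothing about cores per package.

```masm
    include \masm32\include\masm32rt.inc

    .code

; Returns in EAX the number of bits set in the system affinity mask,
; or 0 if GetProcessAffinityMask fails.
count_os_processors proc
    LOCAL hProc:DWORD
    LOCAL procMask:DWORD
    LOCAL sysMask:DWORD

    invoke GetCurrentProcess
    mov hProc, eax              ; keep the handle out of EAX, INVOKE uses EAX for ADDR

    invoke GetProcessAffinityMask, hProc, ADDR procMask, ADDR sysMask
    test eax, eax
    jz finished                 ; API failed, EAX is already 0

    mov edx, sysMask
    xor eax, eax
  next_bit:
    shr edx, 1                  ; shift the lowest bit into the carry flag
    adc eax, 0                  ; add the carry to the running count
    test edx, edx
    jnz next_bit                ; stop once no set bits remain
  finished:
    ret

count_os_processors endp

start:
    call count_os_processors
    print str$(eax)," processors in the system affinity mask",13,10
    inkey
    exit

end start
```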
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on April 30, 2010, 12:48:05 PM
Dave,

The thing I need that does not appear all that easy to find is the physical core count. I have climbed through CPUID documentation and there is little there of any use to do this. I had a preference to get it direct from the processor rather than the OS but getting the physical core count is useful in high performance multi-thread code.
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on April 30, 2010, 01:07:02 PM
yah - that is where it gets a little tricky   :bg

first - only Intel chips have hyper-threading
if you go to (near) the end of their CPUID documentation, there is a paragraph that explains it

for any other manufacturer, the physical core count is the logical core count
Title: Re: Intel specific instruction set recognition.
Post by: Siekmanski on May 03, 2010, 09:15:03 AM
I had the same with my netbook dualcore....
1 instead of 2 processors.


mov    eax,1
cpuid
shr    ebx,16
and    ebx,255
mov    dwProcessorCount,ebx


Now it shows 2 processors
Title: Re: Intel specific instruction set recognition.
Post by: dedndave on May 03, 2010, 09:46:06 AM
it depends on how the programmer that wrote the routine defines processor cores
i have a similar processor - a prescott
it is 1 package with 1 physical processor, but 2 logical cores
this is intel's HTT - hyperthreading technology
it uses a single processor core to run 2 threads at the same time
well - not exactly the same time - lol
i think it involves something like running one thread on the negative clock edges and another on the positive clock edges
that is probably a bit simplified, but you get the idea
Title: Re: Intel specific instruction set recognition.
Post by: hutch-- on May 03, 2010, 01:39:23 PM
The mechanism is pretty straightforward, Intel hyperthreading was a single core with faster task switching. The improvement is in the order of 30% from the data I have read. The Core2 Quad has 4 physical cores but no hyperthreading, where the i7 Quad has 4 cores with hyperthreading, which gives it a logical core count of 8. I will have to test the algo on the PIV after I enable hyperthreading in the BIOS. Win2000 does not run it properly and reports 2 processors, but it may resolve the difference.
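
A hedged sketch that puts the two CPUID values from this thread side by side on Intel hardware: EBX[23:16] of leaf 1 (logical processor IDs per package) and EAX[31:26] of leaf 4 plus 1 (core IDs per package). The procedure and variable names are hypothetical. Intel documents both fields as maximum addressable ID counts, so treat the results as upper bounds rather than exact counts of enabled cores or threads.

```masm
    include \masm32\include\masm32rt.inc
    .586                        ; CPUID is not accepted under the .486 default

    .code

; Out: EAX = logical processor IDs per package (leaf 1, 1 if HTT flag is clear)
;      EDX = core IDs per package (leaf 4, 1 if leaf 4 is not available)
package_counts proc uses ebx esi edi
    LOCAL maxLeaf:DWORD

    xor eax, eax
    cpuid                       ; leaf 0: EAX = highest standard leaf
    mov maxLeaf, eax

    mov esi, 1                  ; default logical IDs per package
    mov edi, 1                  ; default core IDs per package

    mov eax, 1
    cpuid
    bt edx, 28                  ; HTT flag, EBX[23:16] is only valid when set
    jnc no_htt
    shr ebx, 16
    and ebx, 255
    mov esi, ebx
  no_htt:
    cmp maxLeaf, 4
    jb no_leaf4
    mov eax, 4
    xor ecx, ecx                ; subleaf 0
    cpuid
    shr eax, 26                 ; EAX[31:26] = core IDs per package, minus 1
    inc eax
    mov edi, eax
  no_leaf4:
    mov eax, esi
    mov edx, edi
    ret

package_counts endp

main proc
    LOCAL nLogical:DWORD
    LOCAL nCores:DWORD

    call package_counts
    mov nLogical, eax
    mov nCores, edx

    print str$(nLogical)," logical IDs per package",13,10
    print str$(nCores)," core IDs per package",13,10
    ret
main endp

start:
    call main
    inkey
    exit

end start
```

When the first number is larger than the second, the package has more than one logical ID per core, which is what a hyperthreaded part such as the i7 Quad described above would be expected to show.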
Title: Re: Intel specific instruction set recognition.
Post by: clive on May 03, 2010, 02:41:31 PM
While I think bi-phased and/or multi-phased clocks are probably used within the design (like double clocked ALU), the hyperthreading is using standard resources to dispatch as many instructions to as many available execution units as possible in a common clock domain. The system has additional register banks to hold the multiple contexts (visible registers, flags, etc), but they share a pool of registers used by the renaming/retire unit, and feeding the execution units. The threads share the cache(s), bus, instruction fetching, write buffers, branch predictions, etc.

There is arbitration between the instruction streams so one stream does not hog all the resources (the sort of ping-pong Dave implies), but one stream is unlikely to saturate all the execution units within the device. The execution units have pipelined stages so a new function can typically be initiated on each unit at every cycle. If one stream stalls with a dependency, the other takes up the slack. If both are going full bore there is going to be some fighting/sharing of resources.

Using hyperthreading can slow down tasks as the available resources have not been doubled, and must be shared by the threads. There are however more execution units than can be saturated by a single thread.