clock speed

Started by allynm, August 04, 2009, 01:58:04 AM


dedndave

#15
i agree with the priority setting
and - if you want to measure the actual clock frequency, that measurement SHOULD be affected if the CPU is throttled
however, isn't the performance counter based on the tsc ?
system time should be linear, no matter what the clock frequency is
i think Sleep is also linear (1 ms per tick)
also, if you are going to use RDTSC, you should restrict the affinity to a single core

1) use GetProcessAffinityMask, GetPriorityClass, and GetThreadPriority to save the initial settings
2) use SetProcessAffinityMask to select core 0
3) use pushfd/popfd to test bit 21 of the eflags register to verify the cpu supports cpuid
4) test cpuid function 1, edx bit 4 to verify the cpu supports rdtsc
5) use SetPriorityClass and SetThreadPriority as Edgar suggested to elevate priority
6) use cpuid function 0 to serialize instructions
7) use rdtsc to get an initial clock count (save values on the stack)
8) use Sleep to consume some specific amount of linear time
9) use cpuid function 0 to serialize instructions (this one may not be required, as Sleep may do it for you)
10) use rdtsc to get a terminal clock count (save values on the stack)
11) set thread and process priority back to initial settings
12) set affinity back to initial setting
13) recall the values from the stack
14) calculate frequency = (terminal count - initial count) / (Sleep time in seconds)

EDIT - of course, there will be some errors due to measurement overhead
this can be overcome by repeating steps 5-11 with a different Sleep value
then, use the delta values to calculate actual frequency

if we use a shorter period for Sleep in the first measurement and a longer one in the second measurement:
CI1 = initial count, pass 1
CT1 = terminal count, pass 1
CI2 = initial count, pass 2
CT2 = terminal count, pass 2
T1 = Sleep interval for pass 1 (in milliseconds)
T2 = Sleep interval for pass 2 (in milliseconds)

FMHz = ( (CT2 - CI2) - (CT1 - CI1) ) / ( 1000 * (T2 - T1) )
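
As a quick check of the two-pass formula, here is a minimal Python sketch. Python is used only for the arithmetic; the function name and the sample numbers are made up for illustration. It assumes both passes carry the same fixed overhead in clock counts, so the overhead cancels when the two deltas are subtracted.

```python
def two_pass_mhz(ci1, ct1, t1_ms, ci2, ct2, t2_ms):
    """Estimate the clock frequency in MHz from two timed passes.

    ci*/ct* are the initial/terminal TSC counts of each pass and
    t*_ms are the Sleep intervals in milliseconds.
    """
    if t2_ms == t1_ms:
        raise ValueError("the two Sleep intervals must differ")
    # A constant per-pass overhead drops out of this difference.
    delta_counts = (ct2 - ci2) - (ct1 - ci1)
    # counts per millisecond is kHz, so divide by 1000 to get MHz
    return delta_counts / (1000 * (t2_ms - t1_ms))


# Hypothetical numbers: a 3000 MHz clock plus 50,000 counts of overhead per pass.
overhead = 50_000
pass1 = 3_000_000 * 100 + overhead    # counts elapsed over a 100 ms Sleep
pass2 = 3_000_000 * 1000 + overhead   # counts elapsed over a 1000 ms Sleep
print(two_pass_mhz(0, pass1, 100, 0, pass2, 1000))   # 3000.0
```

A single pass over 100 ms with the same made-up overhead would read slightly high; the difference of the two passes does not.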

donkey

Hi Dave,

Can't remember why I put the Sleep in. I think I was having trouble with some piece of software or other under Win2K, and giving up a processor slice solved the problem. Since the TSC is directly related to the CPU frequency (ticks), I use the high-performance timer to measure 1 second and count the TSC ticks in that second (Sleep is not reliable). It is fairly accurate and seems to return good results across a wide range of processors, including dual cores. For example, the AMD QL-60 in my laptop returns 1902.56 MHz, which is in line with AMD's test software. Of course, for multiprocessor systems it would make sense to use SetProcessAffinityMask to select the processor being tested, otherwise the results could get weird.
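
The arithmetic behind that approach can be sketched as follows. This is only an illustration in Python, not the attached program: the function name is hypothetical, and the counter readings are passed in as plain numbers, standing in for values that would come from RDTSC, QueryPerformanceCounter and QueryPerformanceFrequency.

```python
def tsc_mhz(tsc_start, tsc_end, qpc_start, qpc_end, qpc_frequency):
    """Convert a TSC delta to MHz using the performance counter as time base.

    qpc_frequency is the performance counter rate in counts per second.
    """
    elapsed_seconds = (qpc_end - qpc_start) / qpc_frequency
    if elapsed_seconds <= 0:
        raise ValueError("the performance counter did not advance")
    return (tsc_end - tsc_start) / elapsed_seconds / 1e6


# Hypothetical readings: a counter running at 3,579,545 counts per second,
# sampled exactly one second apart, with 1,902,560,000 TSC ticks in between.
print(tsc_mhz(0, 1_902_560_000, 0, 3_579_545, 3_579_545))
```

Because the elapsed time is measured rather than assumed, the wait does not have to be exactly one second for the result to hold.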

AFAIK QueryPerformanceCounter does not use the TSC, at least that's the impression you get from Intel's articles

http://software.intel.com/en-us/articles/measure-code-sections-using-the-enhanced-timer

See the section "Enhanced Timer"
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

dedndave

#17
ahhhh yes, i see now, Edgar
and, you use QueryPerformanceFrequency to account for any multiplication factor

i have a 3.0 GHz prescott
if i am not mistaken, the CpuZ program accesses the actual frequency via ring 0
it installs a hidden device driver and enumerates many hardware devices, so i imagine it has ring 0 privilege
my machine reports 2999.9 to 3000.3 MHz with that program

source and program attached
i would say they are pretty close...

Total System Processor Cores: 2
CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2
3000010440 Hz

also, i do see the reading vary, which is to be expected
if it was the same every time, i would say you are measuring multipliers

nice job Edgar   :U

EDIT - added version 2 - continually updates the freq display until a key is pressed

[attachment deleted by admin]

dedndave

added version 2 to the previous post

MichaelW

Mark,

I was assuming that you were trying to get the maximum rated clock frequency from the data returned by the CPUID instruction for the Intel processors starting with the Pentium 4. The code that I posted was intended to do that, but because I don't have a suitable processor I can't test it.

In the Application Note 485 document, example 10-5 is a real-mode application that measures the clock frequency. Since under Windows RDTSC can be executed at any privilege level, the basic method will work fine from a normal ring 3 application. The example I posted here takes care of most of the details, including verifying that the CPUID and RDTSC instructions are supported, but it still might misbehave on a multi-core processor. This example is about as simple as I could make it:

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke Sleep, 2000    ; allow time for everything to settle
   
    rdtsc                 ; get TSC value
    push edx              ; preserve high-order dword of start count
    push eax              ; preserve low-order dword of start count
    invoke Sleep, 1000    ; wait one second
    rdtsc                 ; get TSC value
    pop ecx               ; recover low-order dword of start count
    sub eax, ecx          ; subtract from low-order dword of end count
    pop ecx               ; recover high-order dword of start count
    sbb edx, ecx          ; subtract from high-order dword of end count
   
    ; EDX:EAX now has the elapsed clock cycles for a one-second period.
   
    mov ecx, 1000000      ; load divisor 
    div ecx               ; divide EDX:EAX by ECX to get MHz
   
    print ustr$(eax)," MHz",13,10

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
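
For anyone puzzled by the sub/sbb pair and the final div, the following Python sketch imitates the same 32-bit arithmetic. It is an emulation for illustration only; the function name and the sample counts are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def elapsed_mhz(start_lo, start_hi, end_lo, end_hi):
    """Mimic sub/sbb on EDX:EAX followed by div ecx with ECX = 1000000."""
    # sub eax, ecx : subtract the low dwords and remember the borrow
    low = end_lo - start_lo
    borrow = 1 if low < 0 else 0
    low &= MASK32
    # sbb edx, ecx : subtract the high dwords together with that borrow
    high = (end_hi - start_hi - borrow) & MASK32
    elapsed = (high << 32) | low
    # div ecx : the quotient has to fit in 32 bits, otherwise the CPU
    # raises a divide error instead of returning a result
    quotient = elapsed // 1_000_000
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return quotient


# Hypothetical start count whose low dword is about to wrap around.
start = 0x00000005_FFFFFF00
end = start + 3_000_010_440            # ticks elapsed during the one-second Sleep
print(elapsed_mhz(start & MASK32, start >> 32, end & MASK32, end >> 32))   # 3000
```

The borrow is what keeps the result correct when the low dword wraps between the two RDTSC reads.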

eschew obfuscation

donkey

Hi MichaelW

My issue with Sleep to provide the baseline 1 second timer is that it is not entirely reliable, for example the following q&d test:

TESTER:
rdtsc
push eax
invoke Sleep,100
rdtsc
pop ecx
sub eax,ecx
PrintDec(eax)

rdtsc
push eax
invoke Sleep,100
rdtsc
pop ecx
sub eax,ecx
PrintDec(eax)

rdtsc
push eax
invoke Sleep,100
rdtsc
pop ecx
sub eax,ecx
PrintDec(eax)

rdtsc
push eax
invoke Sleep,100
rdtsc
pop ecx
sub eax,ecx
PrintDec(eax)

rdtsc
push eax
invoke Sleep,100
rdtsc
pop ecx
sub eax,ecx
PrintDec(eax)


Returns the following:

Line 123: eax = 204162018
Line 131: eax = 205620024
Line 139: eax = 205415514
Line 147: eax = 206248354
Line 155: eax = 205761414
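
To put a number on the variation, the five readings above can be compared in a few lines of Python. This is a small illustrative calculation on the printed values, not part of the test program.

```python
# The five eax values printed by the test above (TSC counts per Sleep,100).
readings = [204162018, 205620024, 205415514, 206248354, 205761414]

mean = sum(readings) / len(readings)
spread = max(readings) - min(readings)
relative_spread = spread / mean

print(f"spread: {spread} counts, {relative_spread:.2%} of the mean")
```

The gap between the smallest and largest reading comes to about 1% of the mean.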


There are some pretty large discrepancies there that will throw off any results.

MichaelW

Hi donkey,

I agree that sleep is not the best way to delay for one second, but running on my P3 the code above will almost always return 504 MHz, where code that uses the high-resolution performance counter and runs at HIGH_PRIORITY_CLASS will return 503.52... MHz. I was mainly shooting for simple and easy to understand.

donkey

Quote from: MichaelW on August 05, 2009, 04:43:05 AM
Hi donkey,

I agree that sleep is not the best way to delay for one second, but running on my P3 the code above will almost always return 504 MHz, where code that uses the high-resolution performance counter and runs at HIGH_PRIORITY_CLASS will return 503.52... MHz. I was mainly shooting for simple and easy to understand.

Hi MichaelW,

Of course you're absolutely right, in most cases the simple and easy way is the best. For this function a close approximation is good enough, since its only real use (as far as I can see) would be to adjust an execution path for run-time optimization or to check whether a system meets specifications before installing software. In both those cases having an exact clock speed is not as important as disposing of the test quickly.

hutch--

The trick, as usual, is to run the test for longer if higher levels of accuracy are required; try 10 seconds.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

only so much resolution is actually required, once the testing is done
when benchmarking computers, it takes roughly a 7 or 8% improvement to make a "noticeable change"

percent   result

  <1      can be difficult to measure
   3      barely perceivable
   5      barely noticeable
   7      noticeable

so displaying more than 4 to 6 digits really isn't needed
of course, to test the measurement method, more digits are nice to have

GregL

#25
MichaelW,

I modified your (first) program so that it worked correctly for me.  You said you couldn't test it so I did.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data

        name$       db 64 dup(0)
        multiplier$ db 8  dup(0)
        maxfreq$    db 8  dup(0)
        maxfreq     REAL8 ?

    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    mov eax, 80000000h
    cpuid
    .IF eax & 80000000h

        .IF eax >= 80000004h

            print "Processor Brand String Supported",13,10

            mov eax, 80000002h
            cpuid
            mov DWORD PTR name$,    eax
            mov DWORD PTR name$+4,  ebx
            mov DWORD PTR name$+8,  ecx
            mov DWORD PTR name$+12, edx
            mov eax, 80000003h
            cpuid
            mov DWORD PTR name$+16, eax
            mov DWORD PTR name$+20, ebx
            mov DWORD PTR name$+24, ecx
            mov DWORD PTR name$+28, edx
            mov eax, 80000004h
            cpuid
            mov DWORD PTR name$+32, eax
            mov DWORD PTR name$+36, ebx
            mov DWORD PTR name$+40, ecx
            mov DWORD PTR name$+44, edx

            mov DWORD PTR maxfreq$, ecx
            mov DWORD PTR multiplier$, edx

            print ADDR name$,13,10,13,10
       
            invoke crt_printf, cfm$("maxfreq = %s %s\n\n"), ADDR maxfreq$, ADDR multiplier$

            invoke crt_atof, ADDR maxfreq$
            fstp maxfreq

            invoke crt_printf, cfm$("maxfreq = %1.3f %s\n\n"), maxfreq, ADDR multiplier$
           
        .ENDIF

    .ELSE

        print "Processor Brand String Not Supported",13,10

    .ENDIF

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


The output is:

Processor Brand String Supported
              Intel(R) Pentium(R) D CPU 3.20GHz

maxfreq = 3.20 GHz

maxfreq = 3.200 GHz

Press any key to exit...
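
The program above relies on the frequency sitting in the last two dwords of the brand string. A position-independent way to pull the number out of the text is sketched below in Python, purely as an illustration of the parsing step. The function name is hypothetical, and the sketch assumes the string ends with a number followed by MHz, GHz or THz, as in the output shown.

```python
import re


def brand_string_mhz(brand):
    """Return the frequency named in a brand string, in MHz, or None."""
    # Look for a number (optionally with a decimal point) followed by a unit.
    match = re.search(r"(\d+(?:\.\d+)?)\s*([MGT]Hz)", brand)
    if match is None:
        return None
    value = float(match.group(1))
    scale = {"MHz": 1.0, "GHz": 1000.0, "THz": 1_000_000.0}[match.group(2)]
    return value * scale


print(brand_string_mhz("              Intel(R) Pentium(R) D CPU 3.20GHz"))
print(brand_string_mhz("some processor without a frequency"))   # None
```

Searching the whole string avoids depending on whether the text is left- or right-justified within the 48 bytes.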



GregL

dedndave and donkey,

Your program works fine.  :thumbu

CPU Frequency by Edgar (Donkey) and Dave (DednDave)

Total System Processor Cores: 2
CPU 0: Intel(R) Pentium(R) D CPU 3.20GHz MMX SSE3 Cores: 2

CPU Frequency: 3196.869680  MHz

Press any key to exit


MichaelW

Thanks Greg,

So the maximum qualified frequency field included a decimal point. In the Intel document that I was using the flowchart showed a decimal but the wrong number of digits, and the sample brand string showed:

ECX = 30303531H   "0051"
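
The reversed look of that sample comes from byte order: the register value is written high byte first, while the string is stored low byte first in memory. A short Python sketch (illustrative only, with a hypothetical helper name) shows the same dword read both ways.

```python
def register_to_text(value):
    """Decode a 32-bit register value as the 4 ASCII bytes it holds in memory."""
    # x86 stores the low-order byte first, so use little-endian order.
    return value.to_bytes(4, "little").decode("ascii")


ecx = 0x30303531
print(register_to_text(ecx))                      # 1500
print(ecx.to_bytes(4, "big").decode("ascii"))     # 0051
```

Read in memory order the sample field is "1500", which has no decimal point, unlike the "3.20" Greg's processor returned.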

allynm

Hi everyone -

Had to do some university stuff today and yesterday and sort of lost control of where I was in the thread. 

I think I have finally got clear on MichaelW's code and what it does.  I am sorry it took me so long.  I had to go back in to the Intel PDF that describes RDTSC and compare with MichaelW's code and I also took account of what several others had contributed, especially Bogdan.  I think I got the hang of what's going on.  I ended up writing my own buggy version of what MichaelW did.  My problem was partly I couldn't figure out where in blazes the Maxfreq data was in the Brandstring info.  I couldn't figure why there were all these repeated movs and calls. 

To quote dedndave's recent memorable statement:  just learned another instruction...or something like that.

Thanks,
Mark Allyn