Accurate CPU clock speed procedure

Started by MichaelW, February 19, 2006, 08:35:59 PM

MichaelW

The attachment contains a test app for a procedure that returns the CPU clock speed in MHz. I specifically coded it to return a consistent and, hopefully, accurate value. On my system the run-to-run variation is only about 0.001 MHz. The CPU is a 500 MHz P3. The Intel Processor Frequency ID utility shows 500 MHz, the AMD CPUID app 504 MHz, dxdiag "~503" MHz, and this app 503.52 MHz.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .586
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call CpuClockSpeed
    .IF eax
      push  eax
      push  eax
      fstp  QWORD PTR[esp]
      pop   eax
      pop   edx
      invoke crt_printf,chr$("%.2f MHz%c"),edx::eax,10
    .ENDIF 

    inkey "Press any key to exit..."
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; This proc determines the CPU clock speed in MHz by counting TSC
; cycles over a one-second interval timed with the high-resolution
; performance counter. If the processor supports CPUID and RDTSC
; and the system supports a high-resolution performance counter,
; the clock speed is left on the FPU stack in ST(0) and the return
; value is non-zero. Otherwise, the return value is zero.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

CpuClockSpeed proc uses edi esi

    LOCAL pcFreq  :QWORD
    LOCAL pcCount :QWORD

    ;-----------------------------------------------------------
    ; CPUID supported if can set/clear ID flag (EFLAGS bit 21).
    ;-----------------------------------------------------------

    pushfd
    pop   edx
    pushfd
    pop   eax
    xor   eax, 200000h  ; flip ID flag
    push  eax
    popfd
    pushfd
    pop   eax
    xor   eax, edx
    jz    fail

    ;------------------------------------------------
    ; TSC supported if CPUID function 1 returns with
    ; bit 4 of EDX set.
    ;------------------------------------------------

    mov   eax, 1
    cpuid
    and   edx, 10h
    jz    fail

    invoke QueryPerformanceFrequency, ADDR pcFreq
    or    eax, eax
    jz    fail
    ;pushad
    ;invoke crt_printf,chr$("pcFreq:%I64d%c"),pcFreq,10
    ;popad

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    ;----------------------------------------------------
    ; Sync with performance counter and get start count.
    ;----------------------------------------------------

    invoke QueryPerformanceCounter, ADDR pcCount
    mov   edi, DWORD PTR pcCount
  @@:
    invoke QueryPerformanceCounter, ADDR pcCount
    cmp   edi, DWORD PTR pcCount
    je    @B

    rdtsc
    push  edx
    push  eax

    ;-----------------------------------------
    ; Calc terminal count for 1 second delay.
    ;-----------------------------------------

    mov   edi, DWORD PTR pcCount
    mov   esi, DWORD PTR pcCount + 4
    add   edi, DWORD PTR pcFreq   
    adc   esi, DWORD PTR pcFreq + 4

    ;---------------------------------------------
    ; Loop until PC count exceeds terminal count.
    ;
    ; Cannot check low-order dword for equality
    ; because PC cannot be depended on to always
    ; increment count by one.
    ;---------------------------------------------
  @@: 
    invoke QueryPerformanceCounter, ADDR pcCount
    cmp   DWORD PTR pcCount+4, esi
    jb    @B                ; high dword still below terminal count
    ja    @F                ; high dword already past it, so stop waiting
    cmp   DWORD PTR pcCount, edi
    jb    @B   
  @@:

    rdtsc
   
    pop   ecx
    sub   eax, ecx
    pop   ecx
    sbb   edx, ecx

    push  edx
    push  eax
    finit
    fild  QWORD PTR[esp]
    fld8  1000000.0
    fdiv

    add   esp, 8    ; Not necessary here, but still a good practice.

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

    return 1

  fail:

    return 0

CpuClockSpeed endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start



[attachment deleted by admin]
eschew obfuscation

sluggy

Muhahaha, I ran it lots of times and got a different speed every time.

But that isn't your fault: all the latest CPUs have automatic speed stepping built in, so they only go faster when they have to. And measuring in MHz brings up the whole argument that AMD has been having for the last x number of years. But all that aside, you have written a nice piece of code, and I know I will find a use for it in the future.


dioxin

Michael,
   Be careful with such timing routines. All you're actually measuring is the product of the clock multipliers (in your case 422), since the CPU and the high performance counter have the same base crystal.

   If that crystal is out by 1% then your figure won't change since both the CPU clock and the high performance timer clock will shift by 1% in the same direction.

   Instead, you need to find a different timebase such as the RTC which has its own crystal and contains bits that change every 1 second.

   Attached is a version I wrote years ago to do this. It's in BASIC, but all the important parts are inline ASM.
   It was intended to run in DOS and Win98.

Paul.
PS you can download the DOS EXE file from : http://www.axol-electronics.com/cpuspeed.exe

[attachment deleted by admin]

MichaelW

Thanks Paul. If I boot my system from a Windows 9x boot diskette and run your app with a 10 second measurement period, after 40 tests the average is ~503.4 MHz, and 503.52 MHz (503,522,560 Hz) after 200 tests, with an oscillation of ~0.02 MHz around this value thereafter. After converting my code to a DOS app that uses the RTC to time the test period, running off a Windows 9x boot diskette I got 503.24 MHz on the single run I tried. After converting my code so I could use an external timer, starting and stopping the test manually, running under Windows 2000 I got 503.7 and 503.4 MHz on the two runs I tried. So I am now confident that on my system under Windows 2000 the measured clock speed is at least reasonably accurate.

I understand your point about the CPU and the timer both using the same frequency reference. But if this is so, and assuming that the clock generator is accurately synthesizing the timer and FSB frequencies, and that the CPU is accurately scaling up the FSB frequency, then it seems to me that I should get something closer to 500 MHz. Also, for Windows 9x I recall the PC frequency being reported as something close to 1,193,182 Hz, but for Windows 2000 and XP it is reported as 3,579,545 Hz, which is 3X the Windows 9x value and 3X the system timer input frequency. The PC count seems to update every 6.4 counts, which corresponds to a counter frequency of about 560 kHz. That might be doable with the system timer, but not with the RTC. Perhaps Microsoft is somehow combining the system timer with the RTC.

eschew obfuscation

skywalker

Quote from: MichaelW on February 19, 2006, 08:35:59 PM
The attachment contains a test app for a procedure that returns the CPU clock speed in MHz. I specifically coded it to return a consistent and, hopefully, accurate value. On my system the run-to-run variation is only about 0.001 MHz. The CPU is a 500 MHz P3. The Intel Processor Frequency ID utility shows 500 MHz, the AMD CPUID app 504 MHz, dxdiag "~503" MHz, and this app 503.52 MHz.



Good piece of code.
It's showing a consistent 448.88 MHz on my 448 MHz system.


dioxin

Michael,
   The usual way to derive the clock frequencies is (or at least it used to be) to start with the NTSC colour subcarrier crystal as it was the cheapest and most widely available crystal at the time PCs came on the market. The colour subcarrier frequency is 3,579,545Hz for an NTSC TV.
   This frequency is derived from a crystal running at 4x that frequency to allow for quadrature signals to be produced giving
4 x 3,579,545Hz = 14,318,180Hz, the crystal timebase for the PC.

14,318,180Hz / 4 gives 3,579,545, the colour subcarrier frequency and also the high performance counter frequency you quoted.

Divide this by 3 and you get the PIT timer frequency of 1,193,181.666Hz which the PIT divides by 65,536 to give the more familiar 18.2Hz timer interrupt. But on WinXP machines the PIT loads a smaller value to give the timebase for the OS of either 64Hz or 1000Hz depending on circumstances and I have heard of 10ms being used.

The "33MHz" PCI bus clock is derived from the PIT clock: it's 28 x 1,193,181.6666Hz = 33.4090867MHz, and you can get at that one easily to measure it and check.

The CPU FSB is PCI clk x 4, 5 or 6, giving a selection of FSBs of 133.6363, 167.045 or 200.4545MHz.

The CPU clk is the FSB clk x one of a large range of integer and half integer values such as 5x, 5.5x, 6x, 6.5x, 7x, 7.5x.. etc.

At this point everything works on my old PC: my FSB is 4x and the CPU is 3x, giving 400.90904MHz, and I can measure it with my cpuspeed code at 400.90925MHz, an error of 0.5ppm. It looks like my crystals are very well matched! I'd expect an error of up to 100ppm, but I know my RTC crystal is well tuned, as the RTC gains only a second a month, so it's good to about 0.3ppm.
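
The derivation chain above can be checked with a few FPU instructions. This is only a sketch of the arithmetic, assuming the divider and multiplier values quoted in this post (crystal / 12 for the PIT clock, then x28, x4 and x3); the variable names are made up for the illustration and nothing here reads real hardware.

```masm
    include \masm32\include\masm32rt.inc
    .586
    .data
        xtal    REAL8 14318180.0    ; crystal timebase in Hz (from the post)
        pitDiv  REAL8 12.0          ; /4 (colour subcarrier) then /3 (PIT clock)
        pciMul  REAL8 28.0          ; PIT clock -> "33MHz" PCI clock
        fsbMul  REAL8 4.0           ; PCI clock -> FSB (4x on the PC described)
        cpuMul  REAL8 3.0           ; FSB -> CPU clock (3x on the PC described)
        pitHz   REAL8 0.0
        cpuHz   REAL8 0.0
    .code
start:
    finit
    fld   xtal
    fdiv  pitDiv                    ; ST(0) = PIT clock in Hz
    fst   pitHz
    fmul  pciMul                    ; ST(0) = PCI clock
    fmul  fsbMul                    ; ST(0) = FSB clock
    fmul  cpuMul                    ; ST(0) = CPU clock
    fstp  cpuHz

    invoke crt_printf, chr$("PIT clock: %.3f Hz%c"), pitHz, 10
    invoke crt_printf, chr$("CPU clock: %.0f Hz%c"), cpuHz, 10

    inkey "Press any key to exit..."
    exit
end start
```

The CPU figure should come out at about 400,909,040 Hz, which matches the 400.90904MHz quoted above, and the PIT figure at about 1,193,181.667 Hz.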


Now, if we take your accurate, long term CPU measurement of 503,522,560 Hz and divide it by the PIT timer reference of 1,193,181.6666Hz

503,522,560 Hz/1,193,181.6666= 421.99991  so it looks like you have a total multiplier of 422, but I can't see why!
I'd have expected PCI clk x 15=501MHz which is a total multiplier of 420 which fits with all the other frequencies.

I hope that helps throw some light on the situation.. but it's not as clear as it used to be 5 years ago!

<<The PC count seems to update every 6.4 counts>>

I'm not sure which counts you refer to here.

Paul.

MichaelW

Quote
14,318,180Hz / 4 gives 3,579,545, the colour subcarrier frequency and also the high performance counter frequency you quoted.

Thanks, I tried to derive this relationship, but apparently I did not try hard enough. This explains how the PC frequency could be derived, but how could this actually be done using normal PC hardware? AFAIK it cannot be done with the PIT, or at least not using a single timer channel.

Quote
<<The PC count seems to update every 6.4 counts>>

I'm not sure which counts you refer to here.

The PC output value does not update after each cycle at the stated PC frequency. Instead, it updates, on average, about every 6.4 cycles, as determined by this app:

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .586
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        pcCount   dq 0
        prevCount dd 0
        rvals     dd 10100 dup(0)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    xor   ebx, ebx
    .WHILE ebx < 100
      .REPEAT
        invoke QueryPerformanceCounter, ADDR pcCount
        mov   esi, DWORD PTR pcCount
      .UNTIL esi != prevCount
      mov   eax, esi
      sub   eax, prevCount
      mov   prevCount, esi
      mov   [rvals+ebx*4], eax
      inc   ebx
    .ENDW

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

    xor   ebx, ebx
    .WHILE ebx < 100
      print ustr$([rvals+ebx*4]),13,10
      inc   ebx
    .ENDW

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    xor   ebx, ebx
    .WHILE ebx < 10100
      .REPEAT
        invoke QueryPerformanceCounter, ADDR pcCount
        mov   esi, DWORD PTR pcCount
      .UNTIL esi != prevCount
      mov   eax, esi
      sub   eax, prevCount
      mov   prevCount, esi
      .IF eax < 9
        mov   [rvals+ebx*4], eax
        inc   ebx
      .ENDIF
    .ENDW

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

    xor   eax, eax
    mov   ebx, 100
    .WHILE ebx < 10100
      add   eax, [rvals+ebx*4]
      inc   ebx
    .ENDW
    print ustr$(eax),13,10

    inkey "Press any key to exit..."
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start


Why Microsoft decided to have a stated counter frequency of 3,579,545 Hz but only update the counter output value every 6-7 cycles is beyond me.
eschew obfuscation

dioxin

Michael,
   Just a guess, but if the PIT (or the bit of silicon which now does the job of the PIT) is still accessed according to ISA bus timing for compatibility with old software, then 6-7 cycles is about how long it would take to read the registers.

   As for deriving the other CLKs, the PIT isn't used itself, but the same timebase used by the PIT is also used by the motherboard chipset to derive the higher frequencies needed.


Paul.

Petroizki

This is a CPU speed proc I wrote. I made my own because the other versions I have found so far kind of "freeze" the computer for a while, and I truly hate that. It works just fine with my AMD64 3000+ and 1.4GHz Athlon, but I have no idea what it might display on low-end computers.

To get longer counting, set the 'FREQ_DIVIDE_POWER_OF_2' to 3 or even 2, but 4 seems to work just fine for me. When set to '4' the CPU speed timing takes about 1/2^4 seconds (63ms).

Note, that it does not check the cpuid flag for rdtsc.

.586
.model flat, stdcall
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\kernel32.inc
include \masm32\include\msvcrt.inc

includelib \masm32\lib\masm32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib

include \masm32\macros\macros.asm
.data
DIVIDOR REAL4 1000000.0
.data?
SPEED dq ?
.code
GetCPUSpeed proc uses ebx edi esi
LOCAL qwCycles:QWORD, qwTimer:QWORD
LOCAL dwPriority:DWORD, hProcess:HANDLE

FREQ_DIVIDE_POWER_OF_2 EQU <4>

lea ebx, [qwTimer]

invoke GetCurrentProcess
mov hProcess, eax
invoke GetPriorityClass, eax
mov dwPriority, eax
invoke SetPriorityClass, hProcess, HIGH_PRIORITY_CLASS

invoke QueryPerformanceFrequency, ebx
test eax, eax
jz @no_timer

mov esi, dword ptr [ebx + 4]
mov edi, dword ptr [ebx]

mov eax, esi
shr edi, FREQ_DIVIDE_POWER_OF_2
shl eax, 32-FREQ_DIVIDE_POWER_OF_2
shr esi, FREQ_DIVIDE_POWER_OF_2
or edi, eax

push ebx
rdtsc
mov dword ptr [qwCycles], eax
mov dword ptr [qwCycles + 4], edx

call QueryPerformanceCounter
add edi, dword ptr [ebx]
adc esi, dword ptr [ebx + 4]

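@@: invoke QueryPerformanceCounter, ebx
cmp esi, dword ptr [ebx + 4]
jb @F ; counter high dword is past the terminal count
ja @B ; counter high dword is still below it, keep waiting
cmp edi, dword ptr [ebx]
jnb @B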

@@: rdtsc

sub eax, dword ptr [qwCycles]
sbb edx, dword ptr [qwCycles + 4]

mov ecx, eax
shl edx, FREQ_DIVIDE_POWER_OF_2
shr ecx, 32-FREQ_DIVIDE_POWER_OF_2
shl eax, FREQ_DIVIDE_POWER_OF_2
or edx, ecx

mov edi, eax
mov esi, edx

invoke SetPriorityClass, hProcess, dwPriority

mov eax, edi
mov edx, esi

@no_timer:
ret
GetCPUSpeed endp

start:
print chr$("CPU Speed: ")
invoke GetCPUSpeed

mov dword ptr [SPEED], eax
mov dword ptr [SPEED + 4], edx

fild qword ptr [SPEED]
fdiv dword ptr [DIVIDOR]

fstp qword ptr [SPEED]

mov eax, dword ptr [SPEED]
mov edx, dword ptr [SPEED + 4]

invoke crt_printf,chr$("%.2f MHz%c"),edx::eax,10 

inkey chr$(13,10,"Press any key to exit...")

ret
end start


dl: http://personal.inet.fi/atk/partsu/speed.zip

PBrennick

Petroizki,
I am concerned by your cpuspeed utility.  It continually returns 500 without any variations, which would be the first thing for me to doubt.  On a busy system there MUST be some variation.  Anyhow, Michael's program returns a value that is in the 900s and varies by as much as 25.  This is very close to what I have, which is an AMD Athlon 1GHz.

Am I supposed to play with FREQ_DIVIDE_POWER_OF_2 as you were mentioning or is there something else you would like me to try?

Paul
The GeneSys Project is available from:
The Repository or My crappy website

Petroizki

Quote from: PBrennick on April 06, 2006, 08:50:04 AM
Petroizki,
I am concerned by your cpuspeed utility.  It continually returns 500 without any variations, which would be the first thing for me to doubt.  On a busy system there MUST be some variation.  Anyhow, Michael's program returns a value that is in the 900s and varies by as much as 25.  This is very close to what I have, which is an AMD Athlon 1GHz.

Am I supposed to play with FREQ_DIVIDE_POWER_OF_2 as you were mentioning or is there something else you would like me to try?

Paul

The example rounds the clock speed to an integer; I changed it to show 2 decimals.

Do you mean that the example returns 500MHz on a 1GHz comp?

FREQ_DIVIDE_POWER_OF_2 is used as the divisor to shorten the time needed to get the CPU speed. The measurement takes 1/2^FREQ_DIVIDE_POWER_OF_2 seconds, so a smaller number gives a longer, and probably more accurate, CPU speed timing.
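
The power-of-two scaling can be sketched on its own. This is a minimal illustration and not the posted proc: it uses SHRD/SHLD for the 64-bit shifts instead of the separate shift-and-OR sequence, and both input values are examples (the 3,579,545 Hz counter frequency mentioned earlier in the thread and a made-up cycle count for a 1/16 second interval).

```masm
    include \masm32\include\masm32rt.inc
    .586

FREQ_DIVIDE_POWER_OF_2 EQU 4        ; measure for 1/2^4 of a second

    .code
start:
    ; edx:eax = example counter frequency in counts per second
    mov   eax, 3579545
    xor   edx, edx
    ; 64-bit shift right: number of counts in 1/16 second
    shrd  eax, edx, FREQ_DIVIDE_POWER_OF_2
    shr   edx, FREQ_DIVIDE_POWER_OF_2
    print ustr$(eax),13,10          ; low dword of the shortened interval

    ; edx:eax = example TSC cycles counted during that 1/16 second
    mov   eax, 31470160
    xor   edx, edx
    ; 64-bit shift left: scale the cycle count back up to one second
    shld  edx, eax, FREQ_DIVIDE_POWER_OF_2
    shl   eax, FREQ_DIVIDE_POWER_OF_2
    print ustr$(eax),13,10          ; low dword of cycles per second

    inkey "Press any key to exit..."
    exit
end start
```

With these example inputs the two printed values are 223721 and 503522560. Only the low dword is printed, which is enough here because both results fit in 32 bits.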

MichaelW

Petroizki,

On my P3-500 system your version runs in ~64ms and returns 503.54 or 503.55. If I change FREQ_DIVIDE_POWER_OF_2 to 2, it runs in ~255ms and returns a consistent 503.53. For reference, my version runs in one second and returns a consistent 503.52 (I display only two decimal digits because there is some variation in the third).

BTW I had to add an "option casemap:none" before I could assemble your code.

Paul,

Isn't your processor a mobile Athlon with PowerNow! Technology? If it is then a variation in clock speed would be normal. I wonder if there is some simple method of temporarily forcing the processor to run at its maximum speed.


eschew obfuscation

Mark Jones

AMD XP 2500+ (1.84GHz): 1837.57MHz +/- 0.03MHz

EDIT: Try setting the process priority class to REALTIME_PRIORITY_CLASS for the duration of the clocking. It might provide more accurate results.
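
A minimal sketch of that suggestion, assuming the measurement itself is one of the procedures posted earlier in the thread (it is only marked by a comment here). REALTIME_PRIORITY_CLASS is a process priority class passed to SetPriorityClass; the hProc and oldPri names are made up for the illustration, and the original class is saved and restored rather than assumed to be NORMAL_PRIORITY_CLASS.

```masm
    include \masm32\include\masm32rt.inc
    .data?
        hProc   dd ?                ; pseudo-handle of the current process
        oldPri  dd ?                ; priority class in effect before the test
    .code
start:
    invoke GetCurrentProcess
    mov   hProc, eax
    invoke GetPriorityClass, hProc
    mov   oldPri, eax

    invoke SetPriorityClass, hProc, REALTIME_PRIORITY_CLASS

    ; ... the timed clock-speed measurement would go here ...

    invoke SetPriorityClass, hProc, oldPri

    inkey "Press any key to exit..."
    exit
end start
```

Keep the realtime section as short as possible: a process at this class can starve the rest of the system while it spins in a polling loop.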
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

asmfan

To tell you the truth, all CPU frequency measurement procedures are based on counting cycles over some period of time: frequency = num_of_cycles per second.
All you need is to measure some time interval, get the number of CPU cycles that elapsed, and divide cycles by time to get Hz. Divide by 1000000 and you have MHz.
Use the CPU timing macros and any internal clock.
Russia is a weird place

PBrennick

Michael,
I expect variations, that is what I said in my post.

Quote
On a busy system there MUST be some variation.

I think your clock speed program is running in an acceptable manner, and you are right about the PowerNow thing.  Thank you for the nice utility.

Petroizki,
Your program only returns one value, no matter how often I run it and so, at least on my machine, does not seem to be working correctly.

Here are the results from my last test runs:
Quote
Running Michael's program:
877.90 MHz
921.33 MHz
863.21 MHz
901.21 MHz
925.57 MHz
875.64 MHz
920.84 MHz
926.96 MHz
858.86 MHz
928.83 MHz

Running Petroizki's Program:
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz

I hope this clears up any confusion.  Can someone tell me why Petroizki's program is doing what it is doing?

Paul
The GeneSys Project is available from:
The Repository or My crappy website