The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: MichaelW on February 19, 2006, 08:35:59 PM

Title: Accurate CPU clock speed procedure
Post by: MichaelW on February 19, 2006, 08:35:59 PM
The attachment contains a test app for a procedure that returns the CPU clock speed in MHz. I specifically coded it to return a consistent and, hopefully, accurate value. On my system the run to run variation is only about .001 MHz. The CPU is a 500 MHz P3. The Intel Processor Frequency ID utility shows 500 MHz, the AMD CPUID app 504 MHz, dxdiag "~503" MHz, and this app 503.52 MHz.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .586
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call CpuClockSpeed
    .IF eax
      push  eax
      push  eax
      fstp  QWORD PTR[esp]
      pop   eax
      pop   edx
      invoke crt_printf,chr$("%.2f MHz%c"),edx::eax,10
    .ENDIF 

    inkey "Press any key to exit..."
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; This proc determines the CPU clock speed in MHz by counting TSC
; cycles over a one-second interval timed with the high-resolution
; performance counter. If the processor supports CPUID and RDTSC
; and the system supports a high-resolution performance counter,
; the clock speed is left on the FPU stack in ST(0) and the return
; value is non-zero. Otherwise, the return value is zero.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

CpuClockSpeed proc uses ebx edi esi     ; CPUID clobbers EBX, so preserve it

    LOCAL pcFreq  :QWORD
    LOCAL pcCount :QWORD

    ;-----------------------------------------------------------
    ; CPUID supported if can set/clear ID flag (EFLAGS bit 21).
    ;-----------------------------------------------------------

    pushfd
    pop   edx
    pushfd
    pop   eax
    xor   eax, 200000h  ; flip ID flag
    push  eax
    popfd
    pushfd
    pop   eax
    xor   eax, edx
    jz    fail

    ;------------------------------------------------
    ; TSC supported if CPUID function 1 returns with
    ; bit 4 of EDX set.
    ;------------------------------------------------

    mov   eax, 1
    cpuid
    and   edx, 10h
    jz    fail

    invoke QueryPerformanceFrequency, ADDR pcFreq
    or    eax, eax
    jz    fail
    ;pushad
    ;invoke crt_printf,chr$("pcFreq:%I64d%c"),pcFreq,10
    ;popad

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    ;----------------------------------------------------
    ; Sync with performance counter and get start count.
    ;----------------------------------------------------

    invoke QueryPerformanceCounter, ADDR pcCount
    mov   edi, DWORD PTR pcCount
  @@:
    invoke QueryPerformanceCounter, ADDR pcCount
    cmp   edi, DWORD PTR pcCount
    je    @B

    rdtsc
    push  edx
    push  eax

    ;-----------------------------------------
    ; Calc terminal count for 1 second delay.
    ;-----------------------------------------

    mov   edi, DWORD PTR pcCount
    mov   esi, DWORD PTR pcCount + 4
    add   edi, DWORD PTR pcFreq   
    adc   esi, DWORD PTR pcFreq + 4

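    ;---------------------------------------------
    ; Loop until PC count reaches terminal count,
    ; using an unsigned 64-bit compare.
    ;
    ; Cannot check low-order dword for equality
    ; because PC cannot be depended on to always
    ; increment count by one. Likewise the high
    ; dword is tested with JB/JA, not JNE, so an
    ; overshoot past a carry cannot hang the loop.
    ;---------------------------------------------
  @@: 
    invoke QueryPerformanceCounter, ADDR pcCount
    cmp   DWORD PTR pcCount+4, esi
    jb    @B        ; high dword below terminal, keep waiting
    ja    @F        ; high dword above terminal, done
    cmp   DWORD PTR pcCount, edi
    jb    @B        ; high dwords equal, compare low dwords
  @@: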

    rdtsc
   
    pop   ecx
    sub   eax, ecx
    pop   ecx
    sbb   edx, ecx

    push  edx
    push  eax
    finit
    fild  QWORD PTR[esp]
    fld8  1000000.0
    fdiv

    add   esp, 8    ; Rebalance the stack: the epilogue pops the USES registers relative to ESP.

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

    return 1

  fail:

    return 0

CpuClockSpeed endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start



[attachment deleted by admin]
Title: Re: Accurate CPU clock speed procedure
Post by: sluggy on February 20, 2006, 11:54:43 AM
Muhahaha  :bdg I ran it lots of times, and got a different speed every time  :toothy

But that isn't your fault - all the latest cpus have auto stepping built in, so they only go faster when they have to. And measuring in MHz brings up the whole argument that AMD has been having for the last x number of years. But all that aside, you have written a nice piece of code, and I know I will find a use for it in the future  :U

Title: Re: Accurate CPU clock speed procedure
Post by: dioxin on February 20, 2006, 12:42:14 PM
Michael,
   be careful with such timing routines: all you're actually measuring is the product of the clock multipliers (in your case 422), since the CPU and the high performance counter have the same base crystal.

   If that crystal is out by 1% then your figure won't change since both the CPU clock and the high performance timer clock will shift by 1% in the same direction.
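
A minimal sketch of that cancellation, in C++. The struct, the function names, the CPU multiplier of 35 and the timer divider of 4 are illustrative assumptions, not values taken from any particular board:

```cpp
#include <cassert>

// Hypothetical model: both clocks are derived from one crystal.
struct Clocks {
    double cpuHz;    // true CPU clock
    double timerHz;  // true rate of the timer used as the time reference
};

Clocks deriveClocks(double crystalHz, double cpuMultiplier, double timerDivider) {
    assert(timerDivider > 0.0);
    return { crystalHz * cpuMultiplier, crystalHz / timerDivider };
}

// What a TSC-versus-timer measurement reports: the cycles counted while
// the timer advances by its *nominal* frequency worth of ticks.
double measuredCpuHz(const Clocks& c, double nominalTimerHz) {
    double windowSeconds = nominalTimerHz / c.timerHz; // real length of "one second"
    return c.cpuHz * windowSeconds;
}
```

With a crystal that runs 1% fast, the true CPU clock rises by 1%, but the "one second" window shrinks by the same factor, so the reported figure stays at multiplier x divider x nominal timer frequency.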

   Instead, you need to find a different timebase such as the RTC which has its own crystal and contains bits that change every 1 second.

   Attached is a version I wrote years ago to do this, it's in BASIC but all the important parts are inline ASM.
   It was intended to run in DOS and Win98.

Paul.
PS you can download the DOS EXE file from : http://www.axol-electronics.com/cpuspeed.exe

[attachment deleted by admin]
Title: Re: Accurate CPU clock speed procedure
Post by: MichaelW on February 21, 2006, 12:07:45 PM
Thanks Paul. If I boot my system from a Windows 9x boot diskette and run your app with a 10 second measurement period, after 40 tests the average is ~503.4 MHz, and 503.52 MHz (503,522,560 Hz) after 200 tests, with an oscillation of ~0.02 MHz around this value thereafter. After converting my code to a DOS app that uses the RTC to time the test period, running off a Windows 9x boot diskette I got 503.24 MHz on the single run I tried. After converting my code so I could use an external timer, starting and stopping the test manually, running under Windows 2000 I got 503.7 and 503.4 MHz on the two runs I tried. So I am now confident that on my system under Windows 2000 the measured clock speed is at least reasonably accurate.

I understand your point about the CPU and the timer both using the same frequency reference. But if this is so, and assuming that the clock generator is accurately synthesizing the timer and FSB frequencies, and that the CPU is accurately scaling up the FSB frequency, then it seems to me that I should get something closer to 500 MHz. Also, for Windows 9x I recall the PC frequency being reported as something close to 1,193,182 Hz, but for Windows 2000 and XP it is reported as 3,579,545 Hz, 3X the Windows 9x value, and 3X the system timer input frequency. The PC count seems to update every 6.4 counts, which corresponds to a counter frequency of about 560 kHz, which might be doable with the system timer, but not with the RTC. Perhaps Microsoft is somehow combining the system timer with the RTC.
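
The 560 kHz figure is just the stated frequency divided by the average step between distinct readings. A small C++ sketch of that arithmetic (the function names and the sample steps are made up for illustration):

```cpp
#include <cassert>
#include <vector>

// Average step between successive distinct counter readings.
double averageIncrement(const std::vector<unsigned>& deltas) {
    assert(!deltas.empty());
    double sum = 0.0;
    for (unsigned d : deltas) sum += d;
    return sum / static_cast<double>(deltas.size());
}

// Rate at which the reported count actually changes.
double effectiveUpdateHz(double statedHz, double avgIncrement) {
    assert(avgIncrement > 0.0);
    return statedHz / avgIncrement;
}
```

For a stated 3,579,545 Hz and an average step of 6.4 counts this gives roughly 559 kHz.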

Title: Re: Accurate CPU clock speed procedure
Post by: skywalker on February 21, 2006, 01:48:46 PM
Quote from: MichaelW on February 19, 2006, 08:35:59 PM
The attachment contains a test app for a procedure that returns the CPU clock speed in MHz. I specifically coded it to return a consistent and, hopefully, accurate value. On my system the run to run variation is only about .001 MHz. The CPU is a 500 MHz P3. The Intel Processor Frequency ID utility shows 500 MHz, the AMD CPUID app 504 MHz, dxdiag "~503" MHz, and this app 503.52 MHz.



Good piece of code.
It's showing a consistent 448.88 MHz on my 448 MHz system.

Title: Re: Accurate CPU clock speed procedure
Post by: dioxin on February 21, 2006, 09:18:51 PM
Michael,
   The usual way to derive the clock frequencies is (or at least it used to be) to start with the NTSC colour subcarrier crystal as it was the cheapest and most widely available crystal at the time PCs came on the market. The colour subcarrier frequency is 3,579,545Hz for an NTSC TV.
   This frequency is derived from a crystal running at 4x that frequency to allow for quadrature signals to be produced giving
4 x 3,579,545Hz = 14,318,180Hz, the crystal timebase for the PC.

14,318,180Hz / 4 gives 3,579,545, the colour subcarrier frequency and also the high performance counter frequency you quoted.

Divide this by 3 and you get the PIT timer frequency of 1,193,181.666Hz which the PIT divides by 65,536 to give the more familiar 18.2Hz timer interrupt. But on WinXP machines the PIT loads a smaller value to give the timebase for the OS of either 64Hz or 1000Hz depending on circumstances and I have heard of 10ms being used.

The "33MHz" PCI bus clock is derived from the PIT clock; it's 28 x 1,193,181.6666Hz = 33.4090867MHz and you can get at that one easily to measure it and check.

The CPU FSB is PCI clk x4, x5 or x6 to give a selection of FSBs of 133.6363, 167.045 or 200.4545MHz

The CPU clk is the FSB clk x one of a large range of integer and half integer values such as 5x, 5.5x, 6x, 6.5x, 7x, 7.5x.. etc.

At this point everything works on my old PC, my FSB is 4x and the CPU is 3x giving 400.90904MHz and I can measure it with my cpuspeed code at 400.90925MHz, an error of 0.5ppm. It looks like my crystals are very well matched! I'd expect an error of up to 100ppm but I know my RTC crystal is well tuned as the RTC gains only a second a month so it's good to about 0.3ppm.
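
The divider chain described above can be written out as a short C++ sketch. The function names are illustrative; the constants are the ones quoted in this post:

```cpp
#include <cassert>

const double kCrystalHz = 14318180.0;  // 4 x NTSC colour subcarrier

double subcarrierHz()     { return kCrystalHz / 4.0; }      // 3,579,545 Hz
double pitHz()            { return subcarrierHz() / 3.0; }  // ~1,193,181.67 Hz
double timerInterruptHz() { return pitHz() / 65536.0; }     // ~18.2 Hz
double pciHz()            { return pitHz() * 28.0; }        // ~33.409 MHz

// FSB = PCI clock x 4, 5 or 6; CPU = FSB x an integer or half-integer multiplier.
double cpuHz(double fsbMultiplier, double cpuMultiplier) {
    assert(fsbMultiplier > 0.0 && cpuMultiplier > 0.0);
    return pciHz() * fsbMultiplier * cpuMultiplier;
}

// Deviation of a measured frequency from the expected one, in parts per million.
double errorPpm(double measuredHz, double expectedHz) {
    assert(expectedHz > 0.0);
    return (measuredHz - expectedHz) / expectedHz * 1e6;
}
```

With an FSB multiplier of 4 and a CPU multiplier of 3, `cpuHz` gives 400,909,040 Hz, and `errorPpm(400909250.0, cpuHz(4.0, 3.0))` comes out at about 0.5 ppm.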


Now, if we take your accurate, long term CPU measurement of 503,522,560 Hz and divide it by the PIT timer reference of 1,193,181.6666Hz

503,522,560 Hz/1,193,181.6666= 421.99991  so it looks like you have a total multiplier of 422, but I can't see why!
I'd have expected PCI clk x 15=501MHz which is a total multiplier of 420 which fits with all the other frequencies.
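
As a sketch of that division in C++ (the function names are illustrative), the total multiplier is the measured CPU frequency divided by the PIT reference, rounded to the nearest integer:

```cpp
#include <cassert>
#include <cmath>

// Ratio of the measured CPU clock to the PIT reference clock.
double totalMultiplier(double measuredCpuHz, double pitHz) {
    assert(pitHz > 0.0);
    return measuredCpuHz / pitHz;
}

// Nearest whole multiplier for a given ratio.
long nearestMultiplier(double ratio) {
    return std::lround(ratio);
}
```

Using 14,318,180 / 12 Hz as the PIT reference and 503,522,560 Hz as the measurement, the ratio is just under 422, whereas 28 x 15 would give 420.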

I hope that helps throw some light on the situation.. but it's not as clear as it used to be 5 years ago!

<<The PC count seems to update every 6.4 counts>>

I'm not sure which counts you refer to here.

Paul.
Title: Re: Accurate CPU clock speed procedure
Post by: MichaelW on February 21, 2006, 10:59:30 PM
Quote
14,318,180Hz / 4 gives 3,579,545, the colour subcarrier frequency and also the high performance counter frequency you quoted.

Thanks, I tried to derive this relationship, but apparently I did not try hard enough. This explains how the PC frequency could be derived, but how could this actually be done using normal PC hardware? AFAIK it cannot be done with the PIT, or at least not using a single timer channel.

Quote
<<The PC count seems to update every 6.4 counts>>

I'm not sure which counts you refer to here.

The PC output value does not update after each cycle at the stated PC frequency. Instead, it updates, on average, about every 6.4 cycles, as determined by this app:

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .586
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        pcCount   dq 0
        prevCount dd 0
        rvals     dd 10100 dup(0)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    xor   ebx, ebx
    .WHILE ebx < 100
      .REPEAT
        invoke QueryPerformanceCounter, ADDR pcCount
        mov   esi, DWORD PTR pcCount
      .UNTIL esi != prevCount
      mov   eax, esi
      sub   eax, prevCount
      mov   prevCount, esi
      mov   [rvals+ebx*4], eax
      inc   ebx
    .ENDW

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

    xor   ebx, ebx
    .WHILE ebx < 100
      print ustr$([rvals+ebx*4]),13,10
      inc   ebx
    .ENDW

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    xor   ebx, ebx
    .WHILE ebx < 10100
      .REPEAT
        invoke QueryPerformanceCounter, ADDR pcCount
        mov   esi, DWORD PTR pcCount
      .UNTIL esi != prevCount
      mov   eax, esi
      sub   eax, prevCount
      mov   prevCount, esi
      .IF eax < 9
        mov   [rvals+ebx*4], eax
        inc   ebx
      .ENDIF
    .ENDW

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

    xor   eax, eax
    mov   ebx, 100
    .WHILE ebx < 10100
      add   eax, [rvals+ebx*4]
      inc   ebx
    .ENDW
    print ustr$(eax),13,10

    inkey "Press any key to exit..."
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start


Why Microsoft decided to have a stated counter frequency of 3,579,545 Hz but only update the counter output value every 6-7 cycles is beyond me.
Title: Re: Accurate CPU clock speed procedure
Post by: dioxin on February 22, 2006, 12:08:42 AM
Michael,
   just a guess, but if the PIT (or the bit of silicon which now does the job of the PIT) is still accessed according to ISA bus timing for compatibility with old software then 6-7 cycles is about how long it would take to read the registers.

   As for deriving the other CLKs, the PIT isn't used itself, but the same timebase used by the PIT is also used by the motherboard chipset to derive the higher frequencies needed.


Paul.
Title: Re: Accurate CPU clock speed procedure
Post by: Petroizki on April 06, 2006, 07:45:28 AM
This is a CPU speed proc made by me. I made my own, because the other versions I have found so far kinda "freeze" the computer for a while, and I truly hate that. It works just fine with my AMD64 3000+ and 1.4GHz Athlon, but no idea what it might display on slow-end comps.

To get longer counting, set the 'FREQ_DIVIDE_POWER_OF_2' to 3 or even 2, but 4 seems to work just fine for me. When set to '4' the CPU speed timing takes about 1/2^4 seconds (63ms).

Note that it does not check the CPUID feature flag for RDTSC.

.586
.model flat, stdcall
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\kernel32.inc
include \masm32\include\msvcrt.inc

includelib \masm32\lib\masm32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib

include \masm32\macros\macros.asm
.data
DIVIDOR REAL4 1000000.0
.data?
SPEED dq ?
.code
GetCPUSpeed proc uses ebx edi esi
LOCAL qwCycles:QWORD, qwTimer:QWORD
LOCAL dwPriority:DWORD, hProcess:HANDLE

FREQ_DIVIDE_POWER_OF_2 EQU <4>

lea ebx, [qwTimer]

invoke GetCurrentProcess
mov hProcess, eax
invoke GetPriorityClass, eax
mov dwPriority, eax
invoke SetPriorityClass, hProcess, HIGH_PRIORITY_CLASS

invoke QueryPerformanceFrequency, ebx
test eax, eax
jz @no_timer

mov esi, dword ptr [ebx + 4]
mov edi, dword ptr [ebx]

mov eax, esi
shr edi, FREQ_DIVIDE_POWER_OF_2
shl eax, 32-FREQ_DIVIDE_POWER_OF_2
shr esi, FREQ_DIVIDE_POWER_OF_2
or edi, eax

push ebx
rdtsc
mov dword ptr [qwCycles], eax
mov dword ptr [qwCycles + 4], edx

call QueryPerformanceCounter
add edi, dword ptr [ebx]
adc esi, dword ptr [ebx + 4]

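@@: invoke QueryPerformanceCounter, ebx
cmp esi, dword ptr [ebx + 4]
jb @F   ; terminal high dword below current: done
ja @B   ; terminal high dword above current: keep waiting
cmp edi, dword ptr [ebx]
jnb @B  ; high dwords equal: compare low dwords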

@@: rdtsc

sub eax, dword ptr [qwCycles]
sbb edx, dword ptr [qwCycles + 4]

mov ecx, eax
shl edx, FREQ_DIVIDE_POWER_OF_2
shr ecx, 32-FREQ_DIVIDE_POWER_OF_2
shl eax, FREQ_DIVIDE_POWER_OF_2
or edx, ecx

mov edi, eax
mov esi, edx

invoke SetPriorityClass, hProcess, dwPriority

mov eax, edi
mov edx, esi

@no_timer:
ret
GetCPUSpeed endp

start:
print chr$("CPU Speed: ")
invoke GetCPUSpeed

mov dword ptr [SPEED], eax
mov dword ptr [SPEED + 4], edx

fild qword ptr [SPEED]
fdiv dword ptr [DIVIDOR]

fstp qword ptr [SPEED]

mov eax, dword ptr [SPEED]
mov edx, dword ptr [SPEED + 4]

invoke crt_printf,chr$("%.2f MHz%c"),edx::eax,10 

inkey chr$(13,10,"Press any key to exit...")

ret
end start


dl: http://personal.inet.fi/atk/partsu/speed.zip
Title: Re: Accurate CPU clock speed procedure
Post by: PBrennick on April 06, 2006, 08:50:04 AM
Petroizki,
I am concerned by your cpuspeed utility.  It continually returns 500 without any variations which would be the first thing for me to doubt.  On a busy system there MUST be some variation.  Anyhow, Michael's program returns a value that is in the 900s and varies by as much as 25.  This is very close to what I have which is AMD Athlon 1GHz.

Am I supposed to play with FREQ_DIVIDE_POWER_OF_2 as you were mentioning or is there something else you would like me to try?

Paul
Title: Re: Accurate CPU clock speed procedure
Post by: Petroizki on April 06, 2006, 09:36:53 AM
Quote from: PBrennick on April 06, 2006, 08:50:04 AM
Petroizki,
I am concerned by your cpuspeed utility.  It continually returns 500 without any variations which would be the first thing for me to doubt.  On a busy system there MUST be some variation.  Anyhow, Michael's program returns a value that is in the 900s and varies by as much as 25.  This is very close to what I have which is AMD Athlon 1GHz.

Am I supposed to play with FREQ_DIVIDE_POWER_OF_2 as you were mentioning or is there something else you would like me to try?

Paul

The example rounds the clock speed to integer, I changed it to show 2 decimals.

Do you mean that the example returns 500MHz on a 1GHz comp?

The FREQ_DIVIDE_POWER_OF_2 is used as the dividor to shorten the time used to get the CPU speed. The time used to get the speed is '1/2^FREQ_DIVIDE_POWER_OF_2 seconds'. So by using smaller number you would get longer, and probably more accurate CPU speed timing.
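
The scaling can be sketched in C++ as follows. The function names are illustrative and the real procedure does the same shifts on 32-bit register pairs:

```cpp
#include <cassert>
#include <cstdint>

// Performance-counter ticks to wait: 1/2^power of a second.
std::uint64_t windowTicks(std::uint64_t pcFreq, unsigned power) {
    assert(power < 64);
    return pcFreq >> power;
}

// Scale the cycles counted in that window back up to cycles per second.
std::uint64_t cyclesPerSecond(std::uint64_t cyclesInWindow, unsigned power) {
    assert(power < 64);
    return cyclesInWindow << power;
}

// Nominal length of the measurement window in milliseconds.
double windowMilliseconds(unsigned power) {
    assert(power < 64);
    return 1000.0 / static_cast<double>(std::uint64_t(1) << power);
}
```

With a power of 4 the window is 62.5 ms, and 31,470,000 cycles counted in it scale to 503,520,000 cycles per second. Each step down in the power doubles the window length.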
Title: Re: Accurate CPU clock speed procedure
Post by: MichaelW on April 06, 2006, 02:11:07 PM
Petroizki,

On my P3-500 system your version runs in ~64ms and returns 503.54 or 503.55. If I change FREQ_DIVIDE_POWER_OF_2 to 2, it runs in ~255ms and returns a consistent 503.53. For reference, my version runs in one second and returns a consistent 503.52 (I display only two decimal digits because there is some variation in the third).

BTW I had to add an "option casemap:none" before I could assemble your code.

Paul,

Isn't your processor a mobile Athlon with PowerNow! Technology? If it is then a variation in clock speed would be normal. I wonder if there is some simple method of temporarily forcing the processor to run at its maximum speed.


Title: Re: Accurate CPU clock speed procedure
Post by: Mark Jones on April 06, 2006, 03:10:10 PM
AMD XP 2500+ (1.84GHz): 1837.57MHz +/- 0.03MHz

EDIT: Try setting the process priority class to REALTIME_PRIORITY_CLASS for the duration of the clocking. Might provide more accurate results.
Title: Re: Accurate CPU clock speed procedure
Post by: asmfan on April 06, 2006, 04:51:53 PM
To tell you the truth, all CPU frequency measuring procedures are based on counting cycles over some time interval: frequency = num_of_cycles per second.
All you need is to measure some time interval, count the CPU cycles that elapsed during it, and finally divide cycles by time to get the Hz. Divide by 1000000 and you have MHz.
Use CPU timing macros and any internal clock.
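
A minimal C++ sketch of that formula, assuming the cycle counts come from RDTSC and the time reference is a tick counter with a known frequency (the function and parameter names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// CPU clock in MHz from two cycle-counter readings and two readings of a
// reference timer that ticks refFreq times per second.
double cpuMHz(std::uint64_t tscStart, std::uint64_t tscEnd,
              std::uint64_t refStart, std::uint64_t refEnd,
              std::uint64_t refFreq) {
    assert(refEnd > refStart && refFreq > 0);
    double seconds = static_cast<double>(refEnd - refStart)
                   / static_cast<double>(refFreq);
    // Unsigned subtraction stays correct if the cycle counter wrapped once.
    double cycles = static_cast<double>(tscEnd - tscStart);
    return cycles / seconds / 1e6;
}
```

For example, 503,520,000 cycles over 3,579,545 ticks of a 3,579,545 Hz reference works out to 503.52 MHz.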
Title: Re: Accurate CPU clock speed procedure
Post by: PBrennick on April 06, 2006, 06:18:57 PM
Michael,
I expect variations, that is what I said in my post.

Quote
On a busy system there MUST be some variation.

I think your clock speed program is running in an acceptable manner, and you are right about the PowerNow thing.  Thank you for the nice utility.

Petroizki,
Your program only returns one value, no matter how often I run it and so, at least on my machine, does not seem to be working correctly.

Here are the results from my last test runs:
Quote
Running Michael's program:
877.90 MHz
921.33 MHz
863.21 MHz
901.21 MHz
925.57 MHz
875.64 MHz
920.84 MHz
926.96 MHz
858.86 MHz
928.83 MHz

Running Petroizki's Program:
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz

I hope this clears up any confusion.  Can someone tell me why Petroizki's program is doing what it is doing?

Paul
Title: Re: Accurate CPU clock speed procedure
Post by: skywalker on April 06, 2006, 08:09:00 PM
Quote from: Petroizki on April 06, 2006, 09:36:53 AM
Quote from: PBrennick on April 06, 2006, 08:50:04 AM


Paul
The example rounds the clock speed to integer, I changed it to show 2 decimals.

Do you mean that the example returns 500MHz on a 1GHz comp?

The FREQ_DIVIDE_POWER_OF_2 is used as the dividor to shorten the time used to get the CPU speed. The time used to get the speed is '1/2^FREQ_DIVIDE_POWER_OF_2 seconds'. So by using smaller number you would get longer, and probably more accurate CPU speed timing.

I get numbers from 448.90 - 449.00 MHz on a 448 MHz system.
Title: Re: Accurate CPU clock speed procedure
Post by: PBrennick on April 06, 2006, 09:25:37 PM
Thanks Andy,
Those are pretty much 'on the money,' others are getting normal results as well so there is probably something wrong with my system.  Why one program works and another does not is driving me crazy.

Paul
Title: Re: Accurate CPU clock speed procedure
Post by: skywalker on April 06, 2006, 09:32:29 PM
Quote from: PBrennick on April 06, 2006, 09:25:37 PM
Thanks Andy,
Those are pretty much 'on the money,' others are getting normal results as well so there is probably something wrong with my system.  Why one program works and another does not is driving me crazy.

Paul

I know how you feel. Fixin to post some new findings on my highly analyzed crypt code. :-)
Title: Re: Accurate CPU clock speed procedure
Post by: PBrennick on April 06, 2006, 11:18:48 PM
Andy,
You know that if we can help, we will.  I hope you have made some progress.

Paul
Title: Re: Accurate CPU clock speed procedure
Post by: ecube on April 03, 2007, 08:51:11 AM
Some C++ code that is very accurate and doesn't freeze your system. Maybe someone can have a go at converting it? I don't know how :\



//Count CPU Cycles
static inline unsigned __int64 cyclecount()
{
unsigned int i, j;
__asm
{
rdtsc
mov i, edx;
mov j, eax;
}
return ((unsigned __int64)i << 32) + (unsigned __int64)j;
}



//Get CPU Speed
void get_cpu(char *szBuffer)
{
const unsigned __int64 ui64StartCycle = cyclecount();
unsigned __int64 speed;

Sleep(1000);
speed = ((cyclecount() - ui64StartCycle) / 1000000);
sprintf(szBuffer, "cpu: %I64uMHZ", speed); // %I64u matches unsigned __int64 (MSVC)
return;
}


The following one I tested a lot:

// asm for cpuspeed() (used for counting cpu cycles)
#pragma warning( disable : 4035 )
inline unsigned __int64 GetCycleCount(void)
{
_asm {
_emit 0x0F;
_emit 0x31;
}
}
#pragma warning( default : 4035 )

// cpu speed function
unsigned __int64 GetCPUSpeed(void)
{
unsigned __int64 startcycle, speed, num, num2;

do {
startcycle = GetCycleCount();
Sleep(1000);
speed = ((GetCycleCount()-startcycle)/1000000);
} while (speed > 1000000);

num = speed % 100;
num2 = 100;
if (num < 80) num2 = 75;
if (num < 71) num2 = 66;
if (num < 55) num2 = 50;
if (num < 38) num2 = 33;
if (num < 30) num2 = 25;
if (num < 10) num2 = 0;
speed = (speed-num)+num2;

return (speed);
}