The attachment contains a test app for a procedure that returns the CPU clock speed in MHz. I specifically coded it to return a consistent and, hopefully, accurate value. On my system the run to run variation is only about .001 MHz. The CPU is a 500 MHz P3. The Intel Processor Frequency ID utility shows 500 MHz, the AMD CPUID app 504 MHz, dxdiag "~503" MHz, and this app 503.52 MHz.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.586
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
call CpuClockSpeed
.IF eax
push eax
push eax
fstp QWORD PTR[esp]
pop eax
pop edx
invoke crt_printf,chr$("%.2f MHz%c"),edx::eax,10
.ENDIF
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; This proc determines the CPU clock speed in MHz by counting TSC
; cycles over a one-second interval timed with the high-resolution
; performance counter. If the processor supports CPUID and RDTSC
; and the system supports a high-resolution performance counter,
; the clock speed is left on the FPU stack in ST(0) and the return
; value is non-zero. Otherwise, the return value is zero.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
CpuClockSpeed proc uses edi esi
LOCAL pcFreq :QWORD
LOCAL pcCount :QWORD
;-----------------------------------------------------------
; CPUID supported if can set/clear ID flag (EFLAGS bit 21).
;-----------------------------------------------------------
pushfd
pop edx
pushfd
pop eax
xor eax, 200000h ; flip ID flag
push eax
popfd
pushfd
pop eax
xor eax, edx
jz fail
;------------------------------------------------
; TSC supported if CPUID function 1 returns with
; bit 4 of EDX set.
;------------------------------------------------
mov eax, 1
cpuid
and edx, 10h
jz fail
invoke QueryPerformanceFrequency, ADDR pcFreq
or eax, eax
jz fail
;pushad
;invoke crt_printf,chr$("pcFreq:%I64d%c"),pcFreq,10
;popad
invoke GetCurrentProcess
invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS
;----------------------------------------------------
; Sync with performance counter and get start count.
;----------------------------------------------------
invoke QueryPerformanceCounter, ADDR pcCount
mov edi, DWORD PTR pcCount
@@:
invoke QueryPerformanceCounter, ADDR pcCount
cmp edi, DWORD PTR pcCount
je @B
rdtsc
push edx
push eax
;-----------------------------------------
; Calc terminal count for 1 second delay.
;-----------------------------------------
mov edi, DWORD PTR pcCount
mov esi, DWORD PTR pcCount + 4
add edi, DWORD PTR pcFreq
adc esi, DWORD PTR pcFreq + 4
;---------------------------------------------
; Loop until PC count exceeds terminal count.
;
; Cannot check low-order dword for equality
; because PC cannot be depended on to always
; increment count by one.
;---------------------------------------------
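@@:
invoke QueryPerformanceCounter, ADDR pcCount
cmp DWORD PTR pcCount+4, esi
jb @B ; high dword still below target: keep waiting
ja @F ; high dword already past target: done
cmp DWORD PTR pcCount, edi
jb @B ; high dwords equal: low dwords decide
@@: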
rdtsc
pop ecx
sub eax, ecx
pop ecx
sbb edx, ecx
push edx
push eax
finit
fild QWORD PTR[esp]
fld8 1000000.0
fdiv
add esp, 8 ; Required: the USES epilogue pops esi/edi before LEAVE restores esp.
invoke GetCurrentProcess
invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS
return 1
fail:
return 0
CpuClockSpeed endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
[attachment deleted by admin]
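The wait loop in the listing above has to decide when a 64-bit count, held as two dwords, has reached a 64-bit terminal count. A minimal C++ sketch of that comparison follows; the helper names are hypothetical and not part of the original app.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true once the 64-bit count cur_hi:cur_lo has reached
// the 64-bit target tgt_hi:tgt_lo, using only 32-bit comparisons.
bool reached_target(std::uint32_t cur_hi, std::uint32_t cur_lo,
                    std::uint32_t tgt_hi, std::uint32_t tgt_lo)
{
    if (cur_hi != tgt_hi)
        return cur_hi > tgt_hi;  // high dwords differ: they decide the result
    return cur_lo >= tgt_lo;     // high dwords equal: low dwords decide
}

// Cross-check one case against a native 64-bit comparison.
void check_against_64bit(std::uint64_t cur, std::uint64_t tgt)
{
    bool split = reached_target(static_cast<std::uint32_t>(cur >> 32),
                                static_cast<std::uint32_t>(cur),
                                static_cast<std::uint32_t>(tgt >> 32),
                                static_cast<std::uint32_t>(tgt));
    assert(split == (cur >= tgt));
}
```

Comparing the low dwords for equality alone would not be safe, because the counter can step by more than one between reads.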
Muhahaha! I ran it lots of times, and got a different speed every time.
But that isn't your fault: all the latest CPUs have automatic frequency stepping built in, so they only go faster when they have to. And measuring in MHz brings up the whole argument that AMD has been having for the last x number of years. But all that aside, you have written a nice piece of code, and I know I will find a use for it in the future.
Michael,
Be careful with such timing routines. All you're actually measuring is the product of the clock multipliers (in your case 422), since the CPU and the high-performance counter share the same base crystal.
If that crystal is out by 1% then your figure won't change, since both the CPU clock and the high-performance timer clock will shift by 1% in the same direction.
Instead, you need to find a different timebase, such as the RTC, which has its own crystal and contains bits that change once per second.
Attached is a version I wrote years ago to do this. It's in BASIC, but all the important parts are inline ASM.
It was intended to run in DOS and Win98.
Paul.
PS you can download the DOS EXE file from : http://www.axol-electronics.com/cpuspeed.exe
[attachment deleted by admin]
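Paul's point about a shared crystal can be checked with a little arithmetic. The sketch below is only a model, not code from either attachment: it assumes the CPU clock and the performance counter are both fixed ratios of one crystal, and the function and parameter names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical model of a measurement timed by a counter that shares the
// CPU's crystal.
//   crystal_hz    - actual crystal frequency (possibly off-nominal)
//   cpu_mult      - CPU clock = crystal_hz * cpu_mult
//   pc_div        - performance counter clock = crystal_hz / pc_div
//   nominal_pc_hz - counter frequency the OS reports (nominal, not actual)
// Returns the CPU frequency in Hz that the measuring program would report.
double measured_cpu_hz(double crystal_hz, double cpu_mult,
                       double pc_div, double nominal_pc_hz)
{
    assert(crystal_hz > 0.0 && pc_div > 0.0 && nominal_pc_hz > 0.0);
    double actual_cpu_hz = crystal_hz * cpu_mult;
    double actual_pc_hz  = crystal_hz / pc_div;
    // The program waits nominal_pc_hz counter ticks, believing that is 1 s.
    double real_seconds = nominal_pc_hz / actual_pc_hz;
    // TSC cycles counted in that window are reported as the speed in Hz.
    return actual_cpu_hz * real_seconds;
}
```

In this model the crystal frequency cancels out, so a crystal that is 1% fast yields the same reported figure as a perfect one. An independent timebase such as the RTC does not cancel this way.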
Thanks Paul. If I boot my system from a Windows 9x boot diskette and run your app with a 10 second measurement period, after 40 tests the average is ~503.4 MHz, and 503.52 MHz (503,522,560 Hz) after 200 tests, with an oscillation of ~0.02 MHz around this value thereafter. After converting my code to a DOS app that uses the RTC to time the test period, running off a Windows 9x boot diskette I got 503.24 MHz on the single run I tried. After converting my code so I could use an external timer, starting and stopping the test manually, running under Windows 2000 I got 503.7 and 503.4 MHz on the two runs I tried. So I am now confident that on my system under Windows 2000 the measured clock speed is at least reasonably accurate.
I understand your point about the CPU and the timer both using the same frequency reference. But if this is so, and assuming that the clock generator is accurately synthesizing the timer and FSB frequencies, and that the CPU is accurately scaling up the FSB frequency, then it seems to me that I should get something closer to 500 MHz. Also, for Windows 9x I recall the performance counter (PC) frequency being reported as something close to 1,193,182 Hz, but for Windows 2000 and XP it is reported as 3,579,545 Hz, 3X the Windows 9x value and 3X the system timer input frequency. The PC count seems to update every 6.4 counts, which corresponds to a counter frequency of about 560 kHz, which might be doable with the system timer, but not with the RTC. Perhaps Microsoft is somehow combining the system timer with the RTC.
Quote from: MichaelW on February 19, 2006, 08:35:59 PM
The attachment contains a test app for a procedure that returns the CPU clock speed in MHz. I specifically coded it to return a consistent and, hopefully, accurate value. On my system the run to run variation is only about .001 MHz. The CPU is a 500 MHz P3. The Intel Processor Frequency ID utility shows 500 MHz, the AMD CPUID app 504 MHz, dxdiag "~503" MHz, and this app 503.52 MHz.
Good piece of code.
It's showing a consistent 448.88 MHz on my 448 MHz system.
Michael,
The usual way to derive the clock frequencies is (or at least it used to be) to start with the NTSC colour subcarrier crystal as it was the cheapest and most widely available crystal at the time PCs came on the market. The colour subcarrier frequency is 3,579,545Hz for an NTSC TV.
This frequency is derived from a crystal running at 4x that frequency to allow for quadrature signals to be produced giving
4 x 3,579,545Hz = 14,318,180Hz, the crystal timebase for the PC.
14,318,180Hz / 4 gives 3,579,545, the colour subcarrier frequency and also the high performance counter frequency you quoted.
Divide this by 3 and you get the PIT input frequency of 1,193,181.666Hz, which the PIT divides by 65,536 to give the more familiar 18.2Hz timer interrupt. But on WinXP machines the PIT is loaded with a smaller divisor to give an OS timebase of either 64Hz or 1000Hz depending on circumstances, and I have heard of 10ms being used.
The "33MHz" PCI bus clock is derived from the PIT clock: it's 28 x 1,193,181.6666Hz = 33.4090867MHz, and you can get at that one easily to measure it and check.
The CPU FSB is PCI clk x 4, 5 or 6, giving a selection of FSBs of 133.6363, 167.045 or 200.4545MHz.
The CPU clk is the FSB clk x one of a large range of integer and half-integer values such as 5x, 5.5x, 6x, 6.5x, 7x, 7.5x, etc.
At this point everything works on my old PC: my FSB is 4x and the CPU is 3x, giving 400.90904MHz, and I can measure it with my cpuspeed code at 400.90925MHz, an error of 0.5ppm. It looks like my crystals are very well matched! I'd expect an error of up to 100ppm, but I know my RTC crystal is well tuned, as the RTC gains only a second a month, so it's good to about 0.4ppm.
Now, if we take your accurate, long term CPU measurement of 503,522,560 Hz and divide it by the PIT timer reference of 1,193,181.6666Hz
503,522,560 Hz/1,193,181.6666= 421.99991 so it looks like you have a total multiplier of 422, but I can't see why!
I'd have expected PCI clk x 15=501MHz which is a total multiplier of 420 which fits with all the other frequencies.
I hope that helps throw some light on the situation.. but it's not as clear as it used to be 5 years ago!
<<The PC count seems to update every 6.4 counts>>
I'm not sure which counts you refer to here.
Paul.
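The divider chain Paul describes can be written out as plain arithmetic. This is only a sketch of the figures quoted in his post; the struct and function names are invented for illustration, and real chipsets may derive their clocks differently.

```cpp
#include <cassert>

// Frequencies along the chain described above, all in Hz.
struct PcClocks {
    double subcarrier_hz;  // crystal / 4 (NTSC colour subcarrier)
    double pit_hz;         // subcarrier / 3 (PIT input clock)
    double timer_tick_hz;  // PIT input / 65,536 (classic timer interrupt)
    double pci_hz;         // PIT input x 28 (the "33MHz" PCI clock)
    double fsb_hz;         // PCI x 4, 5 or 6
    double cpu_hz;         // FSB x CPU multiplier
};

PcClocks derive_clocks(double crystal_hz, int fsb_mult, double cpu_mult)
{
    assert(crystal_hz > 0.0 && fsb_mult > 0 && cpu_mult > 0.0);
    PcClocks c;
    c.subcarrier_hz = crystal_hz / 4.0;
    c.pit_hz        = c.subcarrier_hz / 3.0;
    c.timer_tick_hz = c.pit_hz / 65536.0;
    c.pci_hz        = c.pit_hz * 28.0;
    c.fsb_hz        = c.pci_hz * fsb_mult;
    c.cpu_hz        = c.fsb_hz * cpu_mult;
    return c;
}

// Total multiplier implied by a measured CPU frequency.
double total_multiplier(double cpu_hz, double pit_hz)
{
    assert(pit_hz > 0.0);
    return cpu_hz / pit_hz;
}

// Relative error of a measurement in parts per million.
double ppm_error(double measured_hz, double expected_hz)
{
    assert(expected_hz > 0.0);
    return (measured_hz - expected_hz) / expected_hz * 1.0e6;
}
```

With a 14,318,180 Hz crystal, an FSB multiplier of 4 and a CPU multiplier of 3, `derive_clocks` gives a CPU clock of about 400.909 MHz, and `total_multiplier(503522560.0, 14318180.0 / 12.0)` comes out just under 422, matching the figures in the post.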
Quote
14,318,180Hz / 4 gives 3,579,545, the colour subcarrier frequency and also the high performance counter frequency you quoted.
Thanks, I tried to derive this relationship, but apparently I did not try hard enough. This explains how the PC frequency could be derived, but how could this actually be done using normal PC hardware? AFAIK it cannot be done with the PIT, or at least not using a single timer channel.
Quote
<<The PC count seems to update every 6.4 counts>>
I'm not sure which counts you refer to here.
The PC output value does not update after each cycle at the stated PC frequency. Instead, it updates, on average, about every 6.4 cycles, as determined by this app:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.586
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
pcCount dq 0
prevCount dd 0
rvals dd 10100 dup(0)
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
invoke GetCurrentProcess
invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS
xor ebx, ebx
.WHILE ebx < 100
.REPEAT
invoke QueryPerformanceCounter, ADDR pcCount
mov esi, DWORD PTR pcCount
.UNTIL esi != prevCount
mov eax, esi
sub eax, prevCount
mov prevCount, esi
mov [rvals+ebx*4], eax
inc ebx
.ENDW
invoke GetCurrentProcess
invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS
xor ebx, ebx
.WHILE ebx < 100
print ustr$([rvals+ebx*4]),13,10
inc ebx
.ENDW
invoke GetCurrentProcess
invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS
xor ebx, ebx
.WHILE ebx < 10100
.REPEAT
invoke QueryPerformanceCounter, ADDR pcCount
mov esi, DWORD PTR pcCount
.UNTIL esi != prevCount
mov eax, esi
sub eax, prevCount
mov prevCount, esi
.IF eax < 9
mov [rvals+ebx*4], eax
inc ebx
.ENDIF
.ENDW
invoke GetCurrentProcess
invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS
xor eax, eax
mov ebx, 100
.WHILE ebx < 10100
add eax, [rvals+ebx*4]
inc ebx
.ENDW
print ustr$(eax),13,10
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Why Microsoft decided to have a stated counter frequency of 3,579,545 Hz but only update the counter output value every 6-7 cycles is beyond me.
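For readers who do not want to trace the MASM, here is a rough C++ sketch of the same averaging idea: collect the steps between successive distinct counter readings, discard large jumps, and average the rest. The function names and the sample values are illustrative assumptions, not output from the app.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean of the counter steps that are below `limit`. Larger steps are
// treated as interruptions and ignored, as the app does with ".IF eax < 9".
double mean_small_delta(const std::vector<std::uint32_t>& deltas,
                        std::uint32_t limit)
{
    std::uint64_t sum = 0;
    std::size_t kept = 0;
    for (std::uint32_t d : deltas) {
        if (d < limit) {
            sum += d;
            ++kept;
        }
    }
    assert(kept > 0);  // at least one step must survive the filter
    return static_cast<double>(sum) / static_cast<double>(kept);
}

// Rate at which the counter output actually changes, given the stated
// frequency and the average step size.
double effective_update_hz(double stated_hz, double counts_per_update)
{
    assert(counts_per_update > 0.0);
    return stated_hz / counts_per_update;
}
```

A mix of steps of 6 and 7 averaging 6.4, applied to the stated 3,579,545 Hz, gives an effective update rate of roughly 559 kHz, which is where the "about 560 kHz" figure earlier in the thread comes from.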
Michael,
just a guess, but if the PIT (or the bit of silicon which now does the job of the PIT) is still accessed according to ISA bus timing for compatibility with old software then 6-7 cycles is about how long it would take to read the registers.
As for deriving the other CLKs, the PIT isn't used itself, but the same timebase used by the PIT is also used by the motherboard chipset to derive the higher frequencies needed.
Paul.
This is a CPU speed proc I wrote. I made my own because the other versions I have found so far kind of "freeze" the computer for a while, and I truly hate that. It works just fine with my AMD64 3000+ and 1.4GHz Athlon, but I have no idea what it might display on low-end machines.
To measure over a longer interval, set 'FREQ_DIVIDE_POWER_OF_2' to 3 or even 2, but 4 seems to work just fine for me. When set to 4, the CPU speed timing takes about 1/2^4 seconds (62.5ms).
Note that it does not check the CPUID feature flag for RDTSC support.
.586
.model flat, stdcall
option casemap:none ; case-sensitive symbols; needed for windows.inc to assemble
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\kernel32.inc
include \masm32\include\msvcrt.inc
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib
include \masm32\macros\macros.asm
.data
DIVIDOR REAL4 1000000.0
.data?
SPEED dq ?
.code
GetCPUSpeed proc uses ebx edi esi
LOCAL qwCycles:QWORD, qwTimer:QWORD
LOCAL dwPriority:DWORD, hProcess:HANDLE
FREQ_DIVIDE_POWER_OF_2 EQU <4>
lea ebx, [qwTimer]
invoke GetCurrentProcess
mov hProcess, eax
invoke GetPriorityClass, eax
mov dwPriority, eax
invoke SetPriorityClass, hProcess, HIGH_PRIORITY_CLASS
invoke QueryPerformanceFrequency, ebx
test eax, eax
jz @no_timer
mov esi, dword ptr [ebx + 4]
mov edi, dword ptr [ebx]
mov eax, esi
shr edi, FREQ_DIVIDE_POWER_OF_2
shl eax, 32-FREQ_DIVIDE_POWER_OF_2
shr esi, FREQ_DIVIDE_POWER_OF_2
or edi, eax
push ebx
rdtsc
mov dword ptr [qwCycles], eax
mov dword ptr [qwCycles + 4], edx
call QueryPerformanceCounter
add edi, dword ptr [ebx]
adc esi, dword ptr [ebx + 4]
@@: invoke QueryPerformanceCounter, ebx
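cmp esi, dword ptr [ebx + 4]
jb @F ; target high dword below current: done
ja @B ; target high dword above current: keep waiting
cmp edi, dword ptr [ebx]
jnb @B ; high dwords equal: wait until current low dword passes target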
@@: rdtsc
sub eax, dword ptr [qwCycles]
sbb edx, dword ptr [qwCycles + 4]
mov ecx, eax
shl edx, FREQ_DIVIDE_POWER_OF_2
shr ecx, 32-FREQ_DIVIDE_POWER_OF_2
shl eax, FREQ_DIVIDE_POWER_OF_2
or edx, ecx
mov edi, eax
mov esi, edx
invoke SetPriorityClass, hProcess, dwPriority
mov eax, edi
mov edx, esi
@no_timer:
ret
GetCPUSpeed endp
start:
print chr$("CPU Speed: ")
invoke GetCPUSpeed
mov dword ptr [SPEED], eax
mov dword ptr [SPEED + 4], edx
fild qword ptr [SPEED]
fdiv dword ptr [DIVIDOR]
fstp qword ptr [SPEED]
mov eax, dword ptr [SPEED]
mov edx, dword ptr [SPEED + 4]
invoke crt_printf,chr$("%.2f MHz%c"),edx::eax,10
inkey chr$(13,10,"Press any key to exit...")
ret
end start
dl: http://personal.inet.fi/atk/partsu/speed.zip
Petroizki,
I am concerned by your cpuspeed utility. It continually returns 500 without any variation, which is the first thing that would make me doubt it. On a busy system there MUST be some variation. Anyhow, Michael's program returns a value that is in the 900s and varies by as much as 25. This is very close to what I have, which is an AMD Athlon 1GHz.
Am I supposed to play with FREQ_DIVIDE_POWER_OF_2 as you were mentioning or is there something else you would like me to try?
Paul
Quote from: PBrennick on April 06, 2006, 08:50:04 AM
Petroizki,
I am concerned by your cpuspeed utility. It continually returns 500 without any variation, which is the first thing that would make me doubt it. On a busy system there MUST be some variation. Anyhow, Michael's program returns a value that is in the 900s and varies by as much as 25. This is very close to what I have, which is an AMD Athlon 1GHz.
Am I supposed to play with FREQ_DIVIDE_POWER_OF_2 as you were mentioning or is there something else you would like me to try?
Paul
The example rounds the clock speed to integer, I changed it to show 2 decimals.
Do you mean that the example returns 500MHz on a 1GHz comp?
The FREQ_DIVIDE_POWER_OF_2 is used as the dividor to shorten the time used to get the CPU speed. The time used to get the speed is '1/2^FREQ_DIVIDE_POWER_OF_2 seconds'. So by using smaller number you would get longer, and probably more accurate CPU speed timing.
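The power-of-two trick amounts to two shifts: shift the counter frequency right to get a shorter wait, then shift the measured cycle count left by the same amount to scale it back to one second. A small C++ sketch of that arithmetic, with hypothetical function names, might look like this:

```cpp
#include <cassert>
#include <cstdint>

// Performance counter ticks to wait for a window of 1/2^n seconds.
// The shift discards the low n bits, so the window is very slightly short.
std::uint64_t ticks_to_wait(std::uint64_t pc_freq, unsigned n)
{
    assert(n < 32);
    return pc_freq >> n;
}

// Scale a cycle count taken over 1/2^n seconds up to cycles per second.
std::uint64_t scale_cycles_to_hz(std::uint64_t cycles, unsigned n)
{
    assert(n < 32);
    return cycles << n;
}

// Length of the measurement window in milliseconds.
double window_ms(unsigned n)
{
    assert(n < 32);
    return 1000.0 / static_cast<double>(std::uint64_t{1} << n);
}
```

With n = 4 and a 3,579,545 Hz counter the wait is 223,721 ticks, or 62.5 ms nominally; with n = 2 it is 250 ms. A longer window averages over more cycles, at the cost of a longer pause.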
Petroizki,
On my P3-500 system your version runs in ~64ms and returns 503.54 or 503.55. If I change FREQ_DIVIDE_POWER_OF_2 to 2, it runs in ~255ms and returns a consistent 503.53. For reference, my version runs in one second and returns a consistent 503.52 (I display only two decimal digits because there is some variation in the third).
BTW I had to add an "option casemap:none" before I could assemble your code.
Paul,
Isn't your processor a mobile Athlon with PowerNow! Technology? If it is then a variation in clock speed would be normal. I wonder if there is some simple method of temporarily forcing the processor to run at its maximum speed.
AMD XP 2500+ (1.84GHz): 1837.57MHz +/- 0.03MHz
EDIT: Try setting the process priority class to REALTIME_PRIORITY_CLASS for the duration of the clocking. Might provide more accurate results.
To tell you the truth, all CPU frequency measurement procedures are based on counting cycles over some period of time: frequency = number of cycles per second.
All you need to do is time some interval, count the CPU cycles that elapse during it, and divide cycles by time to get Hz. Divide by 1,000,000 and you have MHz.
Use the CPU timing macros and any internal clock.
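As a concrete version of that recipe, here is a minimal C++ sketch that turns two cycle-counter readings and two timer readings into MHz. The function name and parameters are assumptions for illustration; obtaining the readings (for example with RDTSC and a performance counter) is left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// MHz = (cycles elapsed) / (seconds elapsed) / 1,000,000
//   tsc_start, tsc_end - cycle counter readings around the interval
//   pc_start, pc_end   - timer readings around the same interval
//   pc_freq            - timer frequency in ticks per second
double cpu_mhz(std::uint64_t tsc_start, std::uint64_t tsc_end,
               std::uint64_t pc_start, std::uint64_t pc_end,
               std::uint64_t pc_freq)
{
    assert(pc_end > pc_start && pc_freq > 0);
    double seconds = static_cast<double>(pc_end - pc_start) /
                     static_cast<double>(pc_freq);
    double cycles = static_cast<double>(tsc_end - tsc_start);
    return cycles / seconds / 1.0e6;
}
```

The interval does not have to be exactly one second, since the elapsed time is divided out.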
Michael,
I expect variations, that is what I said in my post.
Quote
On a busy system there MUST be some variation.
I think your clock speed program is running in an acceptable manner, and you are right about the PowerNow thing. Thank you for the nice utility.
Petroizki,
Your program only returns one value, no matter how often I run it and so, at least on my machine, does not seem to be working correctly.
Here are the results from my last test runs:
Quote
Running Michael's program:
877.90 MHz
921.33 MHz
863.21 MHz
901.21 MHz
925.57 MHz
875.64 MHz
920.84 MHz
926.96 MHz
858.86 MHz
928.83 MHz
Running Petroizki's Program:
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
CPU Speed: 500MHz
I hope this clears up any confusion. Can someone tell me why Petroizki's program is doing what it is doing?
Paul
Quote from: Petroizki on April 06, 2006, 09:36:53 AM
Quote from: PBrennick on April 06, 2006, 08:50:04 AM
Paul
The example rounds the clock speed to integer, I changed it to show 2 decimals.
Do you mean that the example returns 500MHz on a 1GHz comp?
The FREQ_DIVIDE_POWER_OF_2 is used as the dividor to shorten the time used to get the CPU speed. The time used to get the speed is '1/2^FREQ_DIVIDE_POWER_OF_2 seconds'. So by using smaller number you would get longer, and probably more accurate CPU speed timing.
I get numbers from 448.90 - 449.00 MHz on a 448 MHz system.
Thanks Andy,
Those are pretty much 'on the money,' others are getting normal results as well so there is probably something wrong with my system. Why one program works and another does not is driving me crazy.
Paul
Quote from: PBrennick on April 06, 2006, 09:25:37 PM
Thanks Andy,
Those are pretty much 'on the money,' others are getting normal results as well so there is probably something wrong with my system. Why one program works and another does not is driving me crazy.
Paul
I know how you feel. Fixin to post some new findings on my high analyzed crypt code. :-)
Andy,
You know that if we can help, we will. I hope you have made some progress.
Paul
Here is some C++ code that I find very accurate and that doesn't freeze your system. Maybe someone can have a go at converting it? I don't know how.
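// Requires MSVC targeting 32-bit x86 (inline __asm and the __int64 type).
#include <windows.h> // Sleep
#include <stdio.h> // sprintf
//Count CPU Cycles
static inline unsigned __int64 cyclecount()
{
unsigned int i, j;
__asm
{
rdtsc
mov i, edx; // high dword of the time-stamp counter
mov j, eax; // low dword
}
return ((unsigned __int64)i << 32) + (unsigned __int64)j;
}
//Get CPU Speed
void get_cpu(char *szBuffer)
{
const unsigned __int64 ui64StartCycle = cyclecount();
unsigned __int64 speed;
Sleep(1000); // nominally one second; Sleep is only as precise as the system timer tick
speed = ((cyclecount() - ui64StartCycle) / 1000000);
sprintf(szBuffer, "cpu: %I64uMHZ", speed); // %I64u matches unsigned __int64 in MSVC; %d does not
return;
}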
I have tested the following one a lot:
// asm for cpuspeed() (used for counting cpu cycles)
#pragma warning( disable : 4035 )
inline unsigned __int64 GetCycleCount(void)
{
_asm {
_emit 0x0F;
_emit 0x31;
}
}
#pragma warning( default : 4035 )
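// cpu speed function: returns the clock speed in MHz, snapped to a common nominal value
unsigned __int64 GetCPUSpeed(void)
{
unsigned __int64 startcycle, speed, num, num2;
do {
startcycle = GetCycleCount();
Sleep(1000); // nominally one second
speed = ((GetCycleCount()-startcycle)/1000000); // cycles per second -> MHz
} while (speed > 1000000); // retry if the result is implausibly large
// Snap the last two digits to 00, 25, 33, 50, 66, 75, or the next hundred.
num = speed % 100;
num2 = 100;
if (num < 80) num2 = 75;
if (num < 71) num2 = 66;
if (num < 55) num2 = 50;
if (num < 38) num2 = 33;
if (num < 30) num2 = 25;
if (num < 10) num2 = 0;
speed = (speed-num)+num2;
return (speed);
}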