Just to make sure I'm not f**king this up: I call QueryPerformanceFrequency to get the counts per second of the clock, then call QueryPerformanceCounter before and after the code, subtract the first count from the second, and divide by the frequency to get the execution time of the code?
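Written out in C, the arithmetic described above would look roughly like this. It is a sketch that assumes the two counts and the frequency have already been read. `int64_t` stands in for the `LARGE_INTEGER` the Win32 calls fill in, so it builds anywhere, and `elapsed_seconds` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed time between two counter readings, in seconds.
   start and stop are raw counts (QueryPerformanceCounter in the real code),
   freq is counts per second (QueryPerformanceFrequency). */
double elapsed_seconds(int64_t start, int64_t stop, int64_t freq)
{
    assert(freq > 0);
    /* Subtract the integer counts first, then convert: the difference is
       usually far smaller than the raw counts, so converting it to double
       loses nothing. */
    return (double)(stop - start) / (double)freq;
}
```

For example, `elapsed_seconds(t1, t2, freq)` with `t2 - t1 == freq` gives exactly 1.0 second.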
Also, does anybody have any input on how accurate this is? Since the counter is external to the CPU itself (right?), I assume the measured interval includes the time taken by all the other processes that are running, plus all the task switches and protected-mode overhead... but I've been known to be wrong about such things.
And if anybody has any suggestions on my procedure to print the decimal value of a DWORD, I'm always open to constructive (or even abusive) criticism.
As always, you guys are the best.
alan
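.data?
Freq  DWORD ?
Time1 DWORD ?
Time2 DWORD ?
.code
start:
    invoke QueryPerformanceFrequency, addr Freq   ; counts per second
    invoke QueryPerformanceCounter, addr Time1    ; count before the test code
    REPEAT 1000                                   ; assemble 1000 copies of the two MOVs
        mov ax, 1
        mov bx, 1
    ENDM
    invoke QueryPerformanceCounter, addr Time2    ; count after the test code
    invoke Write_Decimal, Freq
    invoke Write_Decimal, Time1
    invoke Write_Decimal, Time2
    ; (Time2 - Time1) / Freq = Execution time of MOV statements?
    invoke ExitProcess, 0

; Convert DecimalByte to decimal digits and show them in a message box.
; DecimalString and MsgCap are not shown in this excerpt.
Write_Decimal proc DecimalByte:DWORD
    xor ecx, ecx                ; ecx = number of digits
    mov eax, DecimalByte
    mov esi, 10                 ; divisor
    .WHILE (eax > 0)
        xor edx, edx            ; divide edx:eax by 10
        div esi
        push edx                ; remainder = next digit, least significant first
        inc ecx
    .ENDW
    .WHILE (ecx != 0)
        pop edx                 ; most significant digit comes off the stack first
        add dx, 30h             ; convert to ASCII '0'..'9'
        cmp dx, 03ah
        jl printbyte
        add dx, 07h             ; 'A'..'F' (only reached for bases above 10)
printbyte:
        lea ebx, DecimalString
        add ebx, 10
        sub ebx, ecx            ; digits fill DecimalString[10 - count .. 9]
        mov [ebx], dl
        dec ecx
    .ENDW
    invoke MessageBox, 0, addr DecimalString, addr MsgCap, 0
    ret
Write_Decimal endp
end start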
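A few things stand out in Write_Decimal. A value of 0 produces no digits at all, because the first .WHILE never runs. The digits are right-aligned in DecimalString, so the leading positions keep whatever the buffer held before (including digits from an earlier, longer number), and if those bytes happen to be zero, MessageBox stops at the first one and shows nothing. The `add dx, 07h` branch can only matter for bases above 10. And esi and ebx are changed without being saved, although the usual Win32 register convention expects a procedure to preserve ebx, esi and edi (MASM's `uses ebx esi` on the proc line takes care of that). The C sketch below shows the same divide-by-10 idea with those points handled; `u32_to_decimal` is a hypothetical name, not part of MASM32 or the Win32 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert an unsigned 32-bit value to a NUL-terminated decimal string.
   buf must hold at least 11 bytes: up to 10 digits (4294967295) plus '\0'. */
char *u32_to_decimal(uint32_t value, char *buf, size_t bufsize)
{
    char tmp[10];               /* digits, least significant first */
    size_t n = 0;

    assert(bufsize >= 11);
    /* do/while so that 0 still produces the single digit "0" */
    do {
        tmp[n++] = (char)('0' + value % 10);   /* remainder is the lowest digit */
        value /= 10;
    } while (value != 0);

    /* Copy the digits back in reverse, left-aligned, then terminate so
       nothing from a previous call is left in the string. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

Writing left-aligned and terminating after the last digit means the same buffer can be reused for numbers of any length.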
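The high-performance counter functions only operate on qwords (LONGLONG & LARGE_INTEGER). QueryPerformanceFrequency and QueryPerformanceCounter each write 8 bytes, so Freq, Time1 and Time2 need to be QWORDs (dq). As DWORDs, each call also overwrites the variable that follows.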
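To make the overwrite concrete, here is a small C simulation (not Win32 code) of the `.data?` layout in the question: three consecutive DWORDs plus one extra slot standing in for whatever memory follows Time2. `store_qword` is a hypothetical helper that mimics the 8-byte store the API performs, and the sketch assumes a little-endian machine such as x86, where the low dword is stored first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Slot indices mirroring the question's .data? section. PAST_END stands in
   for whatever memory happens to follow Time2. */
enum { FREQ, TIME1, TIME2, PAST_END };

typedef struct {
    uint32_t dw[4];             /* four consecutive DWORDs */
} dword_vars;

/* Mimic an 8-byte count being stored at the address of a DWORD slot:
   the high half spills into the next slot. */
void store_qword(dword_vars *v, int slot, uint64_t count)
{
    assert(slot >= FREQ && slot <= TIME2);   /* keep the write inside dw[] */
    memcpy(&v->dw[slot], &count, sizeof count);
}
```

Replaying the question's sequence (frequency, then two counts) leaves each variable holding only the low dword of its value, while the high dword of the last count lands in the memory after Time2.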
AFAIK the counter is external to the CPU, so the counter itself should be relatively accurate. But when using it under Windows to time an event, you cannot completely avoid the effects of multitasking. The following app uses the performance counter to time a fixed (processor-dependent) delay loop, collecting a total of 500 start/stop intervals. It then calculates and displays the mean, minimum, and maximum of the intervals. Experimenting with this, I was able to determine the following:
1. The duration of the timed event is by far the most important determiner of timing accuracy.
2. The duration of the timed event must be several million times the period of the counter before the mean and the maximum will converge within a few percent.
3. The duration of the timed event must be several hundred times the period of the counter before the mean and the minimum will converge within a few percent.
So if the duration of the timed event is several million times the period of the counter, you can get a reasonably accurate time in a single timing. And if the duration of the timed event is several thousand times the period of the counter, you can get a reasonably accurate time by running several trials and discarding all but the lowest time.
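The "run several trials and keep the lowest time" approach can be sketched in C as follows, assuming the start and stop counts of each trial have been collected into two arrays (`best_of_trials` is a hypothetical name). Keeping only the minimum discards trials that were stretched by task switches or interrupts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the shortest of n timed trials, in seconds.
   start[i] and stop[i] are the raw counts for trial i; freq is counts/second. */
double best_of_trials(const int64_t *start, const int64_t *stop,
                      size_t n, int64_t freq)
{
    assert(n > 0 && freq > 0);
    int64_t best = stop[0] - start[0];
    for (size_t i = 1; i < n; i++) {
        int64_t d = stop[i] - start[i];
        if (d < best)
            best = d;           /* least-disturbed trial so far */
    }
    return (double)best / (double)freq;
}
```

For scale: at the 3579545 Hz rate mentioned at the end of this post, one counter period is about 0.28 µs, so an event lasting several thousand periods takes on the order of a millisecond.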
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.686
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
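.data
pcfreq  dq 0                    ; counter frequency, counts per second
counts  dq 1000 dup (0)         ; 500 start/stop pairs of 64-bit counts
min     REAL8 10.0              ; shortest interval, seconds (starts high)
max     REAL8 0.0               ; longest interval, seconds
total   REAL8 0.0               ; sum of all intervals, seconds
mean    REAL8 0.0
pctlow  REAL8 0.0               ; (mean - min) / mean * 100
pcthigh REAL8 0.0               ; (max - mean) / mean * 100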
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke GetCurrentProcess
    ;invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS
    invoke QueryPerformanceFrequency, addr pcfreq
    invoke crt_printf, chr$("pcfreq = %I64u Hz%c"), pcfreq, 10
    xor ebx, ebx                        ; ebx = byte offset of the current pair
    .REPEAT
        mov edi, offset counts
        invoke Sleep, 0                 ; start a new timeslice
        add edi, ebx
        invoke QueryPerformanceCounter, edi     ; start count
        call delay
        add edi, 8
        invoke QueryPerformanceCounter, edi     ; stop count
        add ebx, 16
    .UNTIL ebx > 8000 - 16              ; 500 pairs
    finit
    mov esi, offset counts
    xor ebx, ebx
    fld total                           ; st=total
    .REPEAT
        fild qword ptr[esi+ebx+8]       ; st=count2,st(1)=total
        fild qword ptr[esi+ebx]         ; st=count1,st(1)=count2,st(2)=total
        fsubp st(1), st                 ; st=st(1)-st=pccycles
        fild pcfreq                     ; st=pcfreq,st(1)=pccycles,st(2)=total
        fdivp st(1), st                 ; st=st(1)/st=exetime,st(1)=total
        fld min                         ; st=min,st(1)=exetime,st(2)=total
        fcomip st, st(1)                ; st=exetime,st(1)=total
        jb @F                           ; jump if min < exetime
        fst min                         ; set new min
@@:
        fld max                         ; st=max,st(1)=exetime,st(2)=total
        fcomip st, st(1)                ; st=exetime,st(1)=total
        ja @F                           ; jump if max > exetime
        fst max                         ; set new max
@@:
        faddp st(1), st                 ; st=st+st(1)=total
        add ebx, 16
    .UNTIL ebx > 8000 - 16
    fwait
    fst total                           ; st=total
    fld FP8(500.0)                      ; st=500,st(1)=total
    fdivp st(1), st                     ; st=st(1)/st=total/500
    fstp mean
    fld mean                            ; st=mean
    fld min                             ; st=min, st(1)=mean
    fsubr st, st(1)                     ; st=mean-min, st(1)=mean
    fdivrp st(1), st                    ; st=(mean-min)/mean
    fld FP8(100.0)                      ; st=100,st(1)=(mean-min)/mean
    fmulp st(1), st                     ; st=((mean-min)/mean)*100
    fstp pctlow
    fld mean                            ; st=mean
    fld max                             ; st=max, st(1)=mean
    fsub st, st(1)                      ; st=max-mean, st(1)=mean
    fdivrp st(1), st                    ; st=(max-mean)/mean
    fld FP8(100.0)                      ; st=100,st(1)=(max-mean)/mean
    fmulp st(1), st                     ; st=((max-mean)/mean)*100
    fstp pcthigh
    invoke crt_printf, chr$("total = %f seconds%c"), total, 10
    invoke crt_printf, chr$("mean = %f seconds%c"), mean, 10
    invoke crt_printf, chr$("min = %f seconds%c"), min, 10
    invoke crt_printf, chr$("max = %f seconds%c"), max, 10
    invoke crt_printf, chr$("percent low = %f%c"), pctlow, 10
    invoke crt_printf, chr$("percent high = %f%c"), pcthigh, 10
    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS
    mov eax, input(13,10,"Press enter to exit...")
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
delay proc
    mov eax, 1000000                    ; fixed iteration count; duration depends on the processor
@@:
    sub eax, 1
    jnz @B
    ret
delay endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
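For readers who find the FPU stack comments hard to follow, the statistics part of the program corresponds roughly to this C sketch (`compute_stats` and `interval_stats` are hypothetical names). `counts` is assumed to hold the start/stop pairs interleaved, the same way the asm stores them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double total, mean, min, max;   /* seconds */
    double pct_low, pct_high;       /* distance of min and max from the mean, percent */
} interval_stats;

/* counts = start0, stop0, start1, stop1, ... (n pairs); freq = counts/second */
interval_stats compute_stats(const int64_t *counts, size_t n, int64_t freq)
{
    interval_stats s = {0};
    assert(n > 0 && freq > 0);
    for (size_t i = 0; i < n; i++) {
        double t = (double)(counts[2 * i + 1] - counts[2 * i]) / (double)freq;
        if (i == 0 || t < s.min) s.min = t;
        if (i == 0 || t > s.max) s.max = t;
        s.total += t;
    }
    s.mean = s.total / (double)n;
    s.pct_low = (s.mean - s.min) / s.mean * 100.0;
    s.pct_high = (s.max - s.mean) / s.mean * 100.0;
    return s;
}
```

Unlike the asm, which seeds min with 10.0 and max with 0.0, this sketch seeds both from the first interval, so it does not assume every interval is shorter than 10 seconds.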
BTW, on my Windows 2000 system, where pcfreq = 3579545 Hz, the low-order 32 bits of the qword count overflow after only about 20 minutes.
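The 20-minute figure follows from 2^32 / 3579545 ≈ 1199.9 seconds. A small C check of that arithmetic (`seconds_per_low_dword_wrap` is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* How long the low-order 32 bits of a counter running at freq Hz take to
   wrap around once: 2^32 counts divided by counts per second. */
double seconds_per_low_dword_wrap(uint64_t freq)
{
    assert(freq > 0);
    return (double)(UINT64_C(1) << 32) / (double)freq;
}
```

With `freq = 3579545` this gives just under 1200 seconds, i.e. about 20 minutes.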