High Precision Timing

Started by redskull, October 24, 2005, 03:32:24 AM

redskull

Just to make sure I'm not f**king this up: I run QueryPerformanceFrequency to get the clock's counts per second, then run QueryPerformanceCounter before and after the code, subtract the first reading from the second, and divide by the frequency to get the execution time of the code?
Also, does anybody have any input on how accurate this is? Since the counter is external to the CPU itself (right?), I assume the measured time includes all the other running processes, the task switches, and the protected mode overhead... but I've been known to be wrong about such things.
And if anybody has any suggestions on my procedure to print a decimal value of a DWORD, I'm always open to constructive (or even abusive) criticism.

As always, you guys are the best
alan

.data?
Freq    DWORD ?
Time1   DWORD ?
Time2   DWORD ?
.code
start:

 
invoke QueryPerformanceFrequency, addr Freq

invoke QueryPerformanceCounter, addr Time1
REPEAT 1000
mov ax, 1
mov bx, 1
ENDM
invoke QueryPerformanceCounter, addr Time2

invoke Write_Decimal, Freq
invoke Write_Decimal, Time1
invoke Write_Decimal, Time2
; (Time2 - Time1) / Freq = Execution time of MOV statements?

invoke ExitProcess, 0   

Write_Decimal proc DecimalByte:DWORD
xor ecx, ecx
mov eax, DecimalByte
mov esi, 10
.WHILE (eax > 0)
  XOR edx, edx
  div esi
  push edx
  inc ecx
.ENDW
.WHILE (ecx != 0)
  POP edx
  add dx, 30h
  cmp dx, 03ah
  jl printbyte
   add dx, 07h
  printbyte:
  lea ebx, DecimalString
  add ebx, 10
  sub ebx, ecx
  mov [ebx], dl
  dec ecx
.ENDW
  invoke MessageBox, 0, addr DecimalString, addr MsgCap,0
ret
Write_Decimal endp

end start
Strange women, lying in ponds, distributing swords, is no basis for a system of government

hitchhikr

High performance counter only operates on qwords (LONGLONG & LARGE_INTEGER).
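
Both QueryPerformanceFrequency and QueryPerformanceCounter write 8 bytes through the pointer they are given, so with DWORD variables each call spills into whatever is stored next. A minimal sketch of the same test with qword storage follows. It assumes the MASM32 runtime include (masm32rt.inc) and a console build so that crt_printf has somewhere to print; the name elapsed is illustrative and not from the original post.

```masm
    include \masm32\include\masm32rt.inc

    .data?
      Freq    dq ?                ; counts per second (64-bit)
      Time1   dq ?                ; counter before the timed code
      Time2   dq ?                ; counter after the timed code
      elapsed REAL8 ?             ; hypothetical name: result in seconds
    .code
start:
    invoke QueryPerformanceFrequency, addr Freq

    invoke QueryPerformanceCounter, addr Time1
    REPEAT 1000
      mov ax, 1
      mov bx, 1
    ENDM
    invoke QueryPerformanceCounter, addr Time2

    finit
    fild  Time2                   ; st=Time2
    fild  Time1                   ; st=Time1, st(1)=Time2
    fsubp st(1), st               ; st=Time2-Time1 (counter ticks)
    fild  Freq                    ; st=Freq, st(1)=ticks
    fdivp st(1), st               ; st=ticks/Freq (seconds)
    fstp  elapsed

    invoke crt_printf, chr$("elapsed = %.9f seconds%c"), elapsed, 10
    exit
end start
```

Doing the subtraction and division on the FPU avoids a 64-bit integer divide. A run this short is only a handful of counter ticks, so the printed value is a rough figure at best.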

MichaelW

AFAIK the counter is external to the CPU, so the counter itself should be relatively accurate. But in using it under Windows to time some event you cannot completely avoid the effects of multitasking. The following app uses the performance counter to time a fixed (processor dependent) delay loop, collecting a total of 500 start/stop intervals. It then calculates and displays the mean, minimum, and maximum for the intervals. Experimenting with this I was able to determine the following:
1. The duration of the timed event is by far the most important determinant of timing accuracy.
2. The duration of the timed event must be several million times the period of the counter before the mean and the maximum will converge within a few percent.
3. The duration of the timed delay must be several hundred times the period of the counter before the mean and the minimum will converge within a few percent.

So if the duration of the timed event is several million times the period of the counter, you can get a reasonably accurate time in a single timing. And if the duration of the timed event is several thousand times the period of the counter, you can get a reasonably accurate time by running several trials and discarding all but the lowest time.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      pcfreq  dq 0
      counts  dq 1000 dup (0)
      min     REAL8 10.0
      max     REAL8 0.0
      total   REAL8 0.0
      mean    REAL8 0.0
      pctlow  REAL8 0.0
      pcthigh REAL8 0.0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke GetCurrentProcess
    ;invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    invoke QueryPerformanceFrequency, addr pcfreq
    invoke crt_printf, chr$("pcfreq = %I64u Hz%c"), pcfreq, 10

    xor   ebx, ebx
    .REPEAT
      mov   edi, offset counts
      invoke Sleep, 0             ; start a new timeslice
      add   edi, ebx
      invoke QueryPerformanceCounter, edi
      call  delay
      add   edi, 8
      invoke QueryPerformanceCounter, edi
      add   ebx, 16
    .UNTIL ebx > 8000 - 16

    finit
    mov   esi, offset counts
    xor   ebx, ebx
    fld   total                   ; st=total
    .REPEAT
      fild  qword ptr[esi+ebx+8]  ; st=count2,st(1)=total
      fild  qword ptr[esi+ebx]    ; st=count1,st(1)=count2,st(2)=total
      fsubp st(1), st             ; st=st(1)-st=pccycles
      fild  pcfreq                ; st=pcfreq,st(1)=pccycles,st(2)=total
      fdivp st(1), st             ; st=st(1)/st=exetime,st(1)=total
      fld   min                   ; st=min,st(1)=exetime,st(2)=total
      fcomip st, st(1)            ; st=exetime,st(1)=total
      jb    @F                    ; jump if min < exetime
      fst   min                   ; set new min
    @@:
      fld   max                   ; st=max,st(1)=exetime,st(2)=total
      fcomip st, st(1)            ; st=exetime,st(1)=total
      ja    @F                    ; jump if max > exetime
      fst   max                   ; set new max
    @@:     
      faddp st(1), st             ; st=st+st(1)=total
      add   ebx, 16
    .UNTIL ebx > 8000 - 16
    fwait
    fst   total                   ; st=total
    fld   FP8(500.0)              ; st=500,st(1)=total
    fdivp st(1), st               ; st=total/500
    fstp  mean

    fld   mean                    ; st=mean
    fld   min                     ; st=min, st(1)=mean
    fsubr st, st(1)               ; st=mean-min, st(1)=mean
    fdivrp st(1), st              ; st=(mean-min)/mean
    fld   FP8(100.0)              ; st=100,st(1)=(mean-min)/mean
    fmulp st(1), st               ; st=((mean-min)/mean)*100
    fstp  pctlow

    fld   mean                    ; st=mean
    fld   max                     ; st=max, st(1)=mean
    fsub  st, st(1)               ; st=max-mean, st(1)=mean
    fdivrp st(1), st              ; st=(max-mean)/mean
    fld   FP8(100.0)              ; st=100,st(1)=(max-mean)/mean
    fmulp st(1), st               ; st=((max-mean)/mean)*100
    fstp  pcthigh

    invoke crt_printf, chr$("total = %f seconds%c"), total, 10
    invoke crt_printf, chr$("mean = %f seconds%c"), mean, 10
    invoke crt_printf, chr$("min = %f seconds%c"), min, 10
    invoke crt_printf, chr$("max = %f seconds%c"), max, 10
    invoke crt_printf, chr$("percent low = %f%c"), pctlow, 10
    invoke crt_printf, chr$("percent high = %f%c"), pcthigh, 10

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS
   
    mov   eax, input(13,10,"Press enter to exit...")
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
delay proc
    mov   eax, 1000000
  @@:   
    sub   eax, 1
    jnz   @B
    ret
delay endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
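
The "run several trials and discard all but the lowest time" approach described above could be sketched as follows. This is a minimal illustration and not part of the app above: it assumes the same masm32rt.inc environment and console build, reuses the delay loop as the timed event, and the names TRIALS, t1, t2, best and seconds are hypothetical.

```masm
    include \masm32\include\masm32rt.inc

    TRIALS equ 10                 ; assumed trial count, adjust to taste

    .data?
      pcfreq  dq ?
      t1      dq ?
      t2      dq ?
      best    dq ?                ; lowest tick count seen so far
      seconds REAL8 ?
    .code
start:
    invoke QueryPerformanceFrequency, addr pcfreq
    mov   dword ptr [best], -1    ; start best at the largest unsigned qword
    mov   dword ptr [best+4], -1
    mov   ebx, TRIALS             ; ebx is preserved across API calls
  nexttrial:
    invoke Sleep, 0               ; start a new timeslice
    invoke QueryPerformanceCounter, addr t1
    call  delay
    invoke QueryPerformanceCounter, addr t2
    mov   eax, dword ptr [t2]
    mov   edx, dword ptr [t2+4]
    sub   eax, dword ptr [t1]
    sbb   edx, dword ptr [t1+4]   ; edx:eax = t2 - t1 in counter ticks
    cmp   edx, dword ptr [best+4] ; unsigned 64-bit compare, high dword first
    ja    keepbest
    jb    newbest
    cmp   eax, dword ptr [best]
    jae   keepbest
  newbest:
    mov   dword ptr [best], eax   ; this trial is the new minimum
    mov   dword ptr [best+4], edx
  keepbest:
    dec   ebx
    jnz   nexttrial

    finit
    fild  best                    ; st=best ticks
    fild  pcfreq                  ; st=pcfreq, st(1)=best ticks
    fdivp st(1), st               ; st=best/pcfreq (seconds)
    fstp  seconds
    invoke crt_printf, chr$("min of %d trials = %f seconds%c"), TRIALS, seconds, 10
    exit

delay proc
    mov   eax, 1000000
  @@:
    sub   eax, 1
    jnz   @B
    ret
delay endp
end start
```

Keeping the minimum in integer ticks and converting once at the end means only one FPU division is needed, and the comparison is exact.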


BTW, on my Windows 2000 system, for which pcfreq = 3579545 Hz, it would take only 20 minutes for the counter to overflow the low-order 32 bits of a qword.
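
The 20-minute figure can be checked with a line of arithmetic, taking the 3579545 Hz frequency quoted above and treating the low-order dword as an unsigned 32-bit count:

```latex
\[
t_{\text{overflow}}
  = \frac{2^{32}\ \text{counts}}{3\,579\,545\ \text{counts/s}}
  = \frac{4\,294\,967\,296}{3\,579\,545}\ \text{s}
  \approx 1199.9\ \text{s}
  \approx 20.0\ \text{min}
\]
```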
eschew obfuscation

jj2007