performance

Started by loki_dre, April 26, 2008, 08:14:14 AM

loki_dre

fn MessageBox,0,str$(eax),str$(ebx),MB_OK    <<<<<<<<<========was reporting delay time

donkey

Quote from: loki_dre on April 27, 2008, 08:25:20 PM
fn MessageBox,0,str$(eax),str$(ebx),MB_OK    <<<<<<<<<========was reporting delay time


The function pauses for user input, so a time delay measured there is meaningless.
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

loki_dre

Apparently not.
Since the program is very small (4 KB), looping with that message box to report the delay will cause Windows to re-allocate memory if the user waits too long to press OK.
I have to press Enter/Space very quickly to get the message box to report lower performance times.
In my case the difference was a program cycle of 40 times per second versus 20 times per second.


loki_dre

btw .....if windows copies & runs all programs from HD to memory why don't you have the option to delete a program from the harddrive while it is running....you can do it with editing text files etc.....

donkey

I have no idea what you're talking about here. Are you saying that Windows will copy your program to the page file if you wait too long to respond to a modal dialog (a message box is a modal dialog)? If that's the case, it's not true; copying to and from the page file is something Windows does when it runs low on physical memory. So if your program is being moved to the page file, then you are running low on memory; it has nothing to do with any delay in responding to a message box. Here's an explanation of the page file process...

Quote from: PC911 © Copyright 1998-2008. All rights reserved
To execute a program in Windows, it first needs to be loaded into memory (RAM). Windows lets you run multiple programs simultaneously and chances are that they won't all fit into memory at the same time. For that purpose, Windows uses what is called Virtual Memory to simulate RAM, pretending it has more memory than what is actually built into the PC. It does this by moving data from real memory to a special file on the hard drive, called the swap file in Windows 95/98 or page file in Windows NT. This, in effect, allows Windows to address more memory than the amount of physical RAM installed. Without it, we would not be able to run Windows on machines with limited RAM. For example, think back to when Windows 95 first came out, the average computer had 8 to 16 MB of RAM. It would not have been possible to run Win95 and applications without using virtual memory. Program code and data are moved in pages (memory allocated in 4K or 16K segments within a 64K page frame) from physical memory to the swap file. As the information is needed by a process, it is paged back into physical memory on demand and, if necessary, Windows may page other code or data to the swap file in its place.

loki_dre

I could not tell you what it does exactly... I don't work for Microsoft, and their source code is not released to the public.

But I can tell you that looping with
        invoke write_disk_file, addr performanceFileName, str$(eax), 3
or looping with
        fn MessageBox,0,str$(eax),str$(eax),MB_OK
        and pressing OK rapidly
gives me a smaller number than looping with
        fn MessageBox,0,str$(eax),str$(eax),MB_OK
        and pressing OK every second or so...



loki_dre


include \masm32\include\masm32rt.inc

LOOP_VAL             EQU     9999*3;
;CONSTANTS
.data
        performanceFileName   db      "performance.txt", 0       
.data? 
        ;PERFORMANCE TIME VARIABLES
        CPU_Time    DWORD   ?
        CPU_Time1   QWORD   ?
        CPU_Time2   QWORD   ?
        CPU_Time3   QWORD   ?
.const
.code
start:
               
_MainLoop:
        ;invoke GetTickCount ; Milliseconds since system start (max of 49.7days)
        ;mov CPU_Time,eax
        ;invoke QueryPerformanceFrequency,XXXX      ;<<<<<<=======3,579,545cycle/sec
        invoke QueryPerformanceCounter, addr CPU_Time1
        mov eax, DWORD PTR [CPU_Time1+0]   ; Low DWORD of start count
        mov edx, DWORD PTR [CPU_Time1+4]   ; High DWORD of start count
        mov DWORD PTR [CPU_Time3+0], eax   ; Save start count in CPU_Time3 (low DWORD)
        mov DWORD PTR [CPU_Time3+4], edx   ; Save start count in CPU_Time3 (high DWORD)


            ;ONLY DILATE IN ONE DIRECTION (BACKWARDS)
            Repeat LOOP_VAL
                mov eax,100
                mov edx,1
                mov ebx,10000
                div ebx
            endM
           
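        ;DONE PROCESSING
        invoke QueryPerformanceCounter, addr CPU_Time2
        invoke QueryPerformanceFrequency,addr CPU_Time1   ; CPU_Time1 now holds counts per second
        ; Subtract QWORDs: CPU_Time2 (end) - CPU_Time3 (start) = elapsed counts in edx:eax
        mov eax, DWORD PTR [CPU_Time2+0]   ; Low DWORD of end count
        mov edx, DWORD PTR [CPU_Time2+4]   ; High DWORD of end count
        sub eax, DWORD PTR [CPU_Time3+0]   ; Subtract low DWORD of start count
        sbb edx, DWORD PTR [CPU_Time3+4]   ; Subtract high DWORD of start count with borrow
        mov ebx, eax                       ; ebx = low DWORD of elapsed counts
        mov ecx, edx                       ; ecx = high DWORD of elapsed counts
        mov eax, DWORD PTR [CPU_Time1+0]
        mov edx, DWORD PTR [CPU_Time1+4]
        div ebx                            ; eax = frequency / elapsed = passes per second
        mov ebx,ecx                        ; High DWORD of elapsed counts, shown as the caption
        fn MessageBox,0,str$(eax),str$(ebx),MB_OK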
        ; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««


jmp _MainLoop

end start

loki_dre

With the above code I get approx. 600 program cycles per second if I press Enter slowly and 1200 if I press it rapidly.

MichaelW

I'm having problems understanding the purpose of your code. The point of timing code is normally to compare the execution times between algorithms and/or implementations, to determine which executes faster. For this purpose, execution times that include a highly variable user response time are of little use. For your code the measured time is almost entirely the user response time. On my relatively slow system the REPEAT loop actually executes in around 2.4ms, and the shortest possible user response time is many times this. This example compares the execution time for two substantially different implementations of the Sieve of Eratosthenes algorithm.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      pcFreq  dq 0
      pcCount dq 0
      msCount dd 0
      total1  dd 0
      total2  dd 0
      pMem    dd 0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

; -------------------------------------------------
; This code is a somewhat optimized implementation
; of the Sieve of Eratosthenes algorithm.
; -------------------------------------------------

Sieve proc uses ebx esi pFlags:DWORD, nFlags:DWORD
    mov esi, pFlags
    fild nFlags
    fsqrt
    push ebx
    fistp DWORD PTR [esp]
    pop ebx
    mov ecx, 1
  outer:
    add ecx, 1
    cmp ecx, ebx
    ja  finished
    cmp BYTE PTR [esi+ecx], 0
    jne outer
    mov edx, ecx
    shl edx, 1
  inner:
    mov BYTE PTR [esi+edx], 1
    add edx, ecx
    cmp edx, nFlags
    jna inner
    jmp outer
  finished:
    ret
Sieve endp

; ------------------------------------------------------
; This code is an adaption of a Microsoft MASM example.
; ------------------------------------------------------

Sieve_ms proc uses ebx p:DWORD, sz:DWORD
    mov edx, p
    push 2
    pop eax
  iloop:
    mov ecx, eax
    shl ecx, 1
  jloop:
    mov ebx, sz
    cmp ecx, ebx
    ja @F
    mov BYTE PTR [edx+ecx], 1
    add ecx, eax
    jmp jloop
  @@:
    inc eax
    shr ebx, 1
    cmp eax, ebx
    jb iloop
    ret
Sieve_ms ENDP

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    N EQU 15485863

    invoke Sleep, 3000

    REPEAT 4

      ; ------------------------------------------------
      ; Allocate an array of byte flags large enough to
      ; represent the first million primes. The array
      ; should be zeroed before each test, and an easy
      ; way to do this is just to free the array and
      ; reallocate it.
      ; ------------------------------------------------

      mov pMem, alloc( N )

      ; ------------------------------------------
      ; Flag the first million primes, timing the
      ; process with GetTickCount.
      ; ------------------------------------------

      invoke GetTickCount
      push eax
      invoke Sieve, pMem, N
      invoke GetTickCount
      pop ebx
      sub eax, ebx
      add total1, eax
      print ustr$(eax), "ms", 9

      ; ------------------------------------------
      ; Flag the first million primes, timing the
      ; process with the High-Resolution Timer.
      ; ------------------------------------------

      invoke QueryPerformanceFrequency, ADDR pcFreq
      invoke QueryPerformanceCounter, ADDR pcCount
      push DWORD PTR pcCount+4
      push DWORD PTR pcCount
      invoke Sieve, pMem, N
      invoke QueryPerformanceCounter, ADDR pcCount
      pop ecx
      sub DWORD PTR pcCount, ecx
      pop ecx
      sbb DWORD PTR pcCount+4, ecx

      fild pcCount
      fild pcFreq
      fdiv                    ; pcCount / pcFreq = seconds
      mov  msCount, 1000
      fild msCount
      fmul                    ; seconds * 1000 = milliseconds
      fistp msCount
      mov eax, msCount
      add total2, eax
      print ustr$(eax), "ms", 13, 10

      free( pMem )

    ENDM

    print "GetTickCount average "
    shr total1, 2
    print ustr$(total1), "ms", 13, 10

    print "High-Resolution Timer average "
    shr total2, 2
    print ustr$(total2), "ms", 13, 10, 13, 10

    ; pMem was already freed by the last pass of the REPEAT block above
    mov pMem, alloc( N )

    ; ---------------------------------------
    ; The Microsoft code is very slow, so to
    ; save time do the timing only once.
    ; ---------------------------------------

    ; ------------------------------------------
    ; Flag the first million primes, timing the
    ; process with GetTickCount.
    ; ------------------------------------------

    invoke GetTickCount
    push eax
    invoke Sieve_ms, pMem, N
    invoke GetTickCount
    pop ebx
    sub eax, ebx
    print ustr$(eax), "ms", 9

    ; ------------------------------------------
    ; Flag the first million primes, timing the
    ; process with the High-Resolution Timer.
    ; ------------------------------------------

    invoke QueryPerformanceFrequency, ADDR pcFreq
    invoke QueryPerformanceCounter, ADDR pcCount
    push DWORD PTR pcCount+4
    push DWORD PTR pcCount
    invoke Sieve_ms, pMem, N
    invoke QueryPerformanceCounter, ADDR pcCount
    pop ecx
    sub DWORD PTR pcCount, ecx
    pop ecx
    sbb DWORD PTR pcCount+4, ecx

    fild pcCount
    fild pcFreq
    fdiv                    ; pcCount / pcFreq = seconds
    mov  msCount, 1000
    fild msCount
    fmul                    ; seconds * 1000 = milliseconds
    fistp msCount

    print ustr$(msCount), "ms", 13, 10

    free( pMem )

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


Typical results on my P3:

3075ms  3083ms
3065ms  3064ms
3095ms  3068ms
3084ms  3060ms
GetTickCount average 3079ms
High-Resolution Timer average 3079ms

28651ms 28564ms

eschew obfuscation

Tedd

Loki's code appears to be doing the following:

@@:
  t1 = perfCount()
  t3 = t1
  test_code()
  t2 = perfCount()
  t1 = perfFreq()
  diff = t2 - t3
  Msgbox( str(t1/LOW_DWORD(diff)), str(HIGH_DWORD(diff)) )
  jmp @B

The measured time isn't including the delay caused by response to the messagebox, directly.

So, why should creating a messagebox cause a slowdown? Simple - creating any kind of dialog means loading various DLLs into the process memory space, initialising structures, etc. This is particularly noticeable for the first call to any of the common dialogs or controls. But wait, that comes after the measurement, so it shouldn't be included. True, but once the dialog is destroyed, it's cleaned up - that goes on in the background. So the next loop measures a longer time because the OS is trying to do other things at the same time. By responding fast enough, it may be that things are still in cache/buffers, so their loading is faster next time around, reducing the delays, etc.
The messagebox itself doesn't change the run time of the code. However, the method you're using to measure the time assumes your code is the only thing running - at all. This isn't too bad for a short time period, but the OS tries to multi-task, and that means it will do other things and they will interrupt your timing. By creating and destroying dialogs, you're forcing it to do other things, and thus messing up your own timing. If you want to measure code accurately, do as little else as possible.
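
As a rough sketch of the "do as little else as possible" advice, the following times the same filler code several times, keeps the smallest elapsed count, and prints a single result at the end instead of opening a message box on every pass. RUNS, tStart, tEnd and best are names made up for this illustration, and only the low DWORD of the elapsed count is kept, which assumes each run is far shorter than 2^32 counts.

```masm
include \masm32\include\masm32rt.inc

RUNS     EQU 10                 ; hypothetical number of timed runs
LOOP_VAL EQU 9999*3             ; same filler size as the original post

.data?
    tStart  QWORD ?             ; counter value before the code under test
    tEnd    QWORD ?             ; counter value after the code under test
    best    DWORD ?             ; smallest elapsed count seen so far

.code
start:
    mov best, 0FFFFFFFFh        ; start with the largest unsigned value
    mov esi, RUNS               ; esi survives the API calls
  next_run:
    invoke QueryPerformanceCounter, ADDR tStart

    ; ---- code under test ----
    REPEAT LOOP_VAL
        mov eax, 100
        mov edx, 1
        mov ebx, 10000
        div ebx
    ENDM
    ; -------------------------

    invoke QueryPerformanceCounter, ADDR tEnd
    mov eax, DWORD PTR tEnd
    sub eax, DWORD PTR tStart   ; low DWORD of (tEnd - tStart)
    cmp eax, best
    jae @F                      ; keep the old value unless this run was shorter
    mov best, eax
  @@:
    dec esi
    jnz next_run

    ; report once, after all timing is finished
    print "best run: "
    print ustr$(best), " counts", 13, 10
    inkey "Press any key to exit..."
    exit
end start
```

Taking the minimum rather than the average is a design choice here: interruptions by the OS can only make a run longer, so the shortest run is usually the one least disturbed by other activity.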


Quote
btw .....if windows copies & runs all programs from HD to memory why don't you have the option to delete a program from the harddrive while it is running....you can do it with editing text files etc.....
Only the required sections are copied directly into memory. Small programs will usually fit their whole working set in memory in one go, but larger programs won't. The exe is kept locked (meaning you can't delete it) in case other parts/sections need to be loaded from it.
When editing text files, it depends on the editor - notepad loads the whole file into memory and no longer needs the file, but it was never really meant for editing large files. Not all editors do that.
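
The lock on a running exe can be observed with a small test along these lines. This is only a sketch: exePath is a name invented for the example, and the program asks Windows to delete its own file, so it is best tried on a throwaway copy. While the program is running the call is expected to fail, and the error code is printed as reported by GetLastError.

```masm
include \masm32\include\masm32rt.inc

.data?
    exePath db MAX_PATH dup(?)  ; hypothetical buffer for this program's own path

.code
start:
    ; NULL module handle = the executable of the current process
    invoke GetModuleFileName, NULL, ADDR exePath, MAX_PATH
    invoke DeleteFile, ADDR exePath
    .if eax == 0
        invoke GetLastError
        mov ebx, eax            ; print overwrites eax, so keep the code in ebx
        print "DeleteFile failed, error "
        print ustr$(ebx), 13, 10
    .else
        print "DeleteFile succeeded", 13, 10
    .endif
    inkey "Press any key to exit..."
    exit
end start
```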
No snowflake in an avalanche feels responsible.

loki_dre

Thanx guys that helps clear things up

Quote
I'm having problems understanding the purpose of your code. The point of timing code is normally to compare the execution times between algorithms and/or implementations

I'm basically writing code to do image processing on BMP files (detecting objects in a picture). The purpose of measuring my performance right now is to determine the average/max number of frames per second (FPS) that I can process. My target is 60 FPS (a typical screen refresh rate). The code still needs some more work right now, and I probably have some HW and SW changes to make.
Most high-speed/high-resolution cameras that you can buy and interface to easily have better support for Windows than for other OSes, so I'm developing it on Windows XP.
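
For a frames-per-second figure, one possible approach is to time a whole batch of frames and divide once at the end, using the same FPU pattern as MichaelW's listing. The sketch below is only an illustration: ProcessFrame is a placeholder standing in for the real image code, and FRAMES, tStart, tEnd, nFrames and fps are made-up names.

```masm
include \masm32\include\masm32rt.inc

FRAMES EQU 100                  ; hypothetical number of frames per measurement

.data?
    pcFreq  QWORD ?             ; counts per second
    tStart  QWORD ?             ; counter value before the batch
    tEnd    QWORD ?             ; counter value after the batch, then elapsed counts
    nFrames DWORD ?
    fps     DWORD ?

.code
ProcessFrame proc               ; placeholder work only, not real image processing
    mov ecx, 100000
  @@:
    dec ecx
    jnz @B
    ret
ProcessFrame endp

start:
    invoke QueryPerformanceFrequency, ADDR pcFreq
    invoke QueryPerformanceCounter, ADDR tStart
    mov esi, FRAMES             ; esi survives the calls below
  @@:
    call ProcessFrame
    dec esi
    jnz @B
    invoke QueryPerformanceCounter, ADDR tEnd

    ; elapsed = tEnd - tStart (64-bit subtract, result left in tEnd)
    mov eax, DWORD PTR tStart
    mov edx, DWORD PTR tStart+4
    sub DWORD PTR tEnd, eax
    sbb DWORD PTR tEnd+4, edx

    ; fps = FRAMES * pcFreq / elapsed, done on the FPU so no DIV can overflow
    mov nFrames, FRAMES
    fild nFrames
    fild pcFreq
    fmul                        ; FRAMES * counts per second
    fild tEnd
    fdiv                        ; divided by elapsed counts
    fistp fps                   ; rounded to an integer

    print "average "
    print ustr$(fps), " frames per second", 13, 10
    inkey "Press any key to exit..."
    exit
end start
```

Comparing the printed figure against the 60 FPS target then only needs one measurement per batch, with no dialog or file I/O inside the timed region.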

I've been looking at buying a quad-core PC, but I'm not quite sure whether it will give a significant increase in performance. I mostly have for loops with some math, setx, and jxx, and I am typically looking at surrounding pixels, etc.
If anyone has any HW or SW tips, they would be greatly appreciated.