performance

Started by loki_dre, April 26, 2008, 08:14:14 AM

loki_dre

My program seems to have different run times each time I run it with the same input file.
Sometimes it is 15ms, and sometimes it is 30ms or more.
I am not starting or stopping any other programs between runs (i.e. the system is relatively idle).

Is it normal to get different run times like this?
Is there anything I can do to minimize this time and keep it consistent on a relatively idle PC?
      I'm new to MASM... is there something in my code that should be changed (some sort of common mistake made by beginners)?
      Or are there any settings in Windows that I can use to maximize performance (I used "start /high program.exe", and killed every process in the Windows Task Manager that I possibly could)?

PS:
I would like to avoid the use of Windows Embedded... Anyone got any tips or tricks?

donkey

Windows is a multitasking OS. It switches between processes based on several factors, but mainly the priority assigned to each process. When switching, it performs a context switch, which saves the machine state for your process and loads the machine state for another, then gives that process its time slice. When it has run through all the processes (not really, but it's easier to explain this way) it gets back to yours and continues execution. So, depending on when the context switches take place, you can get different run times for exactly the same program with the same data. Also, since you are using a file, the hard disk may be in use or seek times may differ from run to run; for example, the indexing service may be reading the drive when the app starts for one run and be idle for another. Virtual memory can also play a part: one run of the process may have enough free memory to be completely resident while another is partially in the swap file. There could be a lot of other reasons that fall lower on the probability ladder as well...

You can set the process priority from within your program if you like, but it is not something to be done lightly...

invoke SetPriorityClass,[hProcess], REALTIME_PRIORITY_CLASS
invoke SetThreadPriority,[hThread], THREAD_PRIORITY_TIME_CRITICAL


Donkey
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

loki_dre

Do I have to do that for each function?
i.e.
EXAMPLE proc
     invoke SetPriorityClass,[hProcess], REALTIME_PRIORITY_CLASS
     invoke SetThreadPriority,[hThread], THREAD_PRIORITY_TIME_CRITICAL
     ret
EXAMPLE endp

Or is just once after "start:" sufficient?

I run my program at the command prompt with "start /high masmIP.exe", but it didn't really seem to have any effect on a relatively idle PC. Any idea why?


MichaelW

The effective resolution of the value returned by GetTickCount is no better than 10ms. Timing a period requires two calls to GetTickCount, and since you don't know where in the timer cycle the calls occur, each timed period has an uncertainty of at least plus or minus 20ms. You can cut the uncertainty in half by synchronizing with GetTickCount before you start timing. Either way, to get meaningful times the timed period must be at least several seconds. The High-Resolution Timer has an effective resolution of several microseconds, so it can be used to get meaningful times for periods down to perhaps 100ms. Below that you need to loop your code to get the period up to something reasonable, and then divide the total time by the number of loops, or measure your execution times in processor clock cycles. Boosting the process/thread priority will help reduce the number of context switches that occur during the timing period, and may improve the accuracy/consistency of the results, but using REALTIME_PRIORITY_CLASS with buggy code can cause Windows to crash.
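To make the resolution problem concrete, here is a minimal sketch in portable C rather than MASM. It does not call GetTickCount; it only models a hypothetical counter that advances in whole 10ms steps, and the names TICK_MS, coarse_ticks and measured_ms are made up for the illustration.

```c
#include <stdint.h>

/* Assumed tick length of the modelled counter, in milliseconds.
   This is an illustration value, not a property read from Windows. */
#define TICK_MS 10u

/* Value a coarse counter would report at true time t_ms:
   the time rounded down to a whole number of ticks. */
uint32_t coarse_ticks(uint32_t t_ms)
{
    return (t_ms / TICK_MS) * TICK_MS;
}

/* Elapsed time the coarse counter would report for work that really
   starts at start_ms and really lasts duration_ms. */
uint32_t measured_ms(uint32_t start_ms, uint32_t duration_ms)
{
    return coarse_ticks(start_ms + duration_ms) - coarse_ticks(start_ms);
}
```

Under these assumptions measured_ms(0, 15) gives 10 and measured_ms(5, 15) gives 20, so the same 15ms of work reads differently depending only on where in the tick it started. For a 3000ms period the same error of a tick or so is a fraction of a percent, which is why the timed period needs to be long.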

If you examine the timing methods used in the Laboratory you will basically see two schools of thought, one that favors GetTickCount with no synchronization, no priority boost, and many loops, and one that favors counting clock cycles, with a smaller number of loops and a priority boost. Within limits, either method will work.

eschew obfuscation

hutch--

loki,

Michael is correct here. Different timing methods test different things, and it's worth understanding what each method is useful for. If you download Michael's timer code you will find it very useful for timing short sequences of instructions, as it is designed to perform that function among others.

If you use GetTickCount you need to be aware of its limitations, and one of them is that its resolution is poor at short time intervals. To start to get reliable timings you need to set up the test with enough data to run for a quarter to half a second before the timing error falls below a couple of percentage points.

Where the timer code is used to time small instruction sequences, which is very useful when designing the inner guts of an algorithm, the GetTickCount method, when run for half a second or so, tests real time, which is also very useful.

To further tailor this technique to what you are testing: if it's an algo that handles a very large amount of data, like some search or sorting algorithms, you feed it large data to get the speed of the main algorithm without taking much notice of how fast it starts or finishes. At the other end, if it's a very short algo where its start and finish speed is important, you tend to feed it much smaller data but with a much higher loop count.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

loki_dre

thanks guys


hhmmmm....QueryPerformanceCounter requires the use of a 64-bit variable.......and as a result a 64-bit register would be useful
how can I create a 64-bit variable?
        CPU_Time1   QWORD   ?         ;<<<<<<=========is that correct?
how can I access the 64-bit registers?

MichaelW

Timers.asm, available here, includes a pair of macros that use the High-Resolution Timer. The code is a little more complex than absolutely necessary, because it attempts to eliminate the effects of the loop overhead by timing an empty reference loop and then subtracting that time from the total time, so the result reasonably represents the execution time for the code being tested.
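The reference-loop idea can be sketched roughly as follows. This is not the Timers.asm code: it is portable C, it uses the standard clock() function (processor time) as a stand-in for the High-Resolution Timer, and code_under_test, time_loop and net_seconds_per_call are hypothetical names.

```c
#include <time.h>

/* Hypothetical stand-in for the code being timed. */
static volatile unsigned long sink;
void code_under_test(void)
{
    sink += 1;
}

/* Processor time, in seconds, spent running `loops` iterations.
   With call_code == 0 the body does no work: this is the reference loop. */
double time_loop(unsigned long loops, int call_code)
{
    volatile unsigned long i;  /* volatile keeps the empty loop from being removed */
    clock_t start = clock();
    for (i = 0; i < loops; i++) {
        if (call_code)
            code_under_test();
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time per call with the loop overhead subtracted. loops must be > 0. */
double net_seconds_per_call(unsigned long loops)
{
    double reference = time_loop(loops, 0);  /* loop overhead only */
    double total = time_loop(loops, 1);      /* overhead plus the tested code */
    double net = total - reference;
    if (net < 0.0)
        net = 0.0;  /* timing noise can push the difference below zero */
    return net / (double)loops;
}
```

Something like net_seconds_per_call(1000000UL) gives a per-call estimate. The loop count has to be large enough that the total runs well past the resolution of whatever timer is used.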

donkey

Quote from: loki_dre on April 26, 2008, 10:58:50 AM
thanks guys


hhmmmm....QueryPerformanceCounter requires the use of a 64-bit variable.......and as a result a 64-bit register would be useful
how can I create a 64-bit variable?
        CPU_Time1   QWORD   ?         ;<<<<<<=========is that correct?
how can I access the 64-bit registers?


There are no 64-bit GP registers in a 32-bit machine. The value is split over two registers, with EAX containing the low DWORD and EDX containing the high DWORD.

loki_dre

hmmmm.....I found an alternate answer on another post & am now getting more consistent results:
    mov eax, DWORD PTR [QW1+0]   ; Low DWORD of QW1
    mov edx, DWORD PTR [QW1+4]   ; High DWORD of QW1
    sub eax, DWORD PTR [QW2+0]   ; Low DWORD of QW2
    sbb edx, DWORD PTR [QW2+4]   ; High DWORD of QW2
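For anyone following along, here is a small C sketch of what the sub/sbb pair above computes. It is only a model of the two instructions, not Windows code; qword_parts, sub64_by_halves and join64 are names invented for the illustration.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as in the EDX:EAX pair. */
typedef struct {
    uint32_t lo;  /* low DWORD, at offset +0 */
    uint32_t hi;  /* high DWORD, at offset +4 */
} qword_parts;

/* Computes a - b one half at a time, the way sub followed by sbb does. */
qword_parts sub64_by_halves(qword_parts a, qword_parts b)
{
    qword_parts r;
    uint32_t borrow = (a.lo < b.lo) ? 1u : 0u;  /* carry flag left by sub */
    r.lo = a.lo - b.lo;                         /* sub eax, low DWORD */
    r.hi = a.hi - b.hi - borrow;                /* sbb edx, high DWORD */
    return r;
}

/* Joins the halves into one 64-bit number so the result can be checked. */
uint64_t join64(qword_parts p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

The borrow is the whole point of sbb: subtracting 1 from 0x200000000 has to take one from the high DWORD, giving 0x1FFFFFFFF. A plain sub on the high halves would miss that.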

but doesn't MMX mean the processor has a couple 64-bit registers?


I also noticed that my program seems to run faster on the first loop immediately after I compile it...........anyone know why this is?

donkey

GP stands for General Purpose. The MMX registers are not used to return values from the API.

loki_dre

any thoughts on why I get better performance if I compile first and run immediately after?
I created a bat file called compile with the following code to compile & run:
cls
del masmIP.exe
del masmIP.obj
\masm32\bin\ml /c /Zd /coff masmIP.asm
\masm32\bin\Link /SUBSYSTEM:WINDOWS masmIP.obj
start /high masmIP.exe


MichaelW

I can't see any reason why code would execute faster immediately after it was compiled. The program would load faster if it were in the disk cache, and immediately after the exe was compiled and linked it would be in the cache. I think the main reason is likely to be the command line:

start /high masmIP.exe

This starts the program in the high priority class. Depending on the program, this could cause a significant increase in performance.

loki_dre

when you run a small program, is it loaded into memory (RAM) and then run... or would it read instructions off the hard drive?

is there any way I could load my program into memory (RAM) and then run it? (assuming it is not done automatically)

donkey

Quote from: loki_dre on April 27, 2008, 05:32:47 PM
when you run a small program, is it loaded into memory (RAM) and then run... or would it read instructions off the hard drive?

is there any way I could load my program into memory (RAM) and then run it? (assuming it is not done automatically)


All programs are run from memory. I think you should read some Randall Hyde about now. You seem to have a complete lack of knowledge about computer architecture and the very basics of how computers work. Though we are more than happy to help you, this is not a classroom, and you should maybe try to research a few things yourself. You should definitely not be playing with priority classes without at least an idea of how pre-emptive multitasking operates and the consequences of modifying a process's priority.

loki_dre


it seems that the following command:
fn MessageBox,0,str$(eax),str$(ebx),MB_OK

was the problem... since it delays (waits for user input), Windows must have removed the program from memory/cache.