News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

An idea for realtime profiling

Started by akalenuk, October 26, 2006, 02:53:13 PM

Previous topic - Next topic

akalenuk

I want to share an idea and discuss its realisation.
Two weeks ago I got a problem with a profiling issue. There was a complex project, and it was enormously slow and it was completely unknown what exactly slowed it down. So I took some profiler and ran that project with a profiler. It became even slower and hardly passed initialisation stage. And of course realtime profiling was completely impossible. Then I thought of writing my own simple tool to search for bottlenecks in my own projects.
Most of the time I write code under PC using VC++, MASM32 and Delphi. These three support WinAPI, so I decided to manage things that way:
I recieve a window handler of a special external program, like: 1234578
To reknow the time some of my code uses, I send a message to my profiler: SendMessage,12345678,WM_USER,0,1 before the code
and SendMessage,12345678,WM_USER,0,2 - after it. The external program calculates the time between this two messages and stores it in a counter. I've made ten counters in my realisation, and so far it is way enough. It takes some time to print a result in a window, and this skrews the total result significantly, so I made an option to turn this output of for a while.

Well, there are some more features in my realisation, but the matter I want to discuss the most is an idea itself.

Here's the source and .exe file. http://perpholenta.narod.ru/profilersrc.zip

So, what do you think?

PBrennick

akalenuk,
Just to be clear, I am sure you realize that running ANY profiler at the same time as your App will involve timeslicing so it is expected that it would cause your App to slow down in its initialization. An App that is ALREADY slow would make the use of a profiler problematic.

As far as testing your profiler goes, I must be doing something wrong as the log file contains all zeroes. Can you give a more complete explanation of the steps involved? I started the profiler and pressed the 'W' button. Then I started my editor and then pressed the 'F' button, saved the logfile and then closed everything.
Paul
The GeneSys Project is available from:
The Repository or My crappy website

akalenuk

Quote from: PBrennick on October 26, 2006, 03:27:29 PM
akalenuk,
Just to be clear, I am sure you realize that running ANY profiler at the same time as your App will involve timeslicing so it is expected that it would cause your App to slow down in its initialization. An App that is ALREADY slow would make the use of a profiler problematic.
Yes, but it is possible to reduce the impact of profiler running. "GetTickCount" and couple of movs can't take much time, right?

Quote from: PBrennick on October 26, 2006, 03:27:29 PM
As far as testing your profiler goes, I must be doing something wrong as the log file contains all zeroes.
Everything's fine. You have not filled any of counters, so you have nothing in output. It is possible result for a profiler.

Quote from: PBrennick on October 26, 2006, 03:27:29 PM
Can you give a more complete explanation of the steps involved? I started the profiler and pressed the 'W' button. Then I started my editor and then pressed the 'F' button, saved the logfile and then closed everything.
Let's take an example. It was used to profile a 3D graphic engine. Th engine has certain groups of code, like physics, script processing, scene object parameters computation and rendering. There also is a main loop, in which these groups are somehow being used. Let it be something like:


void Main_loop(){
  Do_physics();
  Do_computation();
  Do_scripting();
  Do_rendering();
}


In order to use my realisation you need to run "profiler.exe" and instantly obtain it's window handler in clipboard. Then you may paste it somewhere and store in a constant, or juct copy/paste along with other strings.


static const HWND dHWND = (HWND)3146324;

void Main_loop(){
  Do_physics();
  Do_computation();
  Do_scripting();
  Do_rendering();
}


Then just put propriet SendMessages before and after each call you want to know about.


static const HWND dHWND = (HWND)3146324;

void Main_loop(){
  SendMessage(pHWND,WM_USER,1,1);
  Do_physics();
  SendMessage(pHWND,WM_USER,1,2);

  SendMessage(pHWND,WM_USER,2,1);
  Do_computation();
  SendMessage(pHWND,WM_USER,2,2);

  SendMessage(pHWND,WM_USER,2,1);
  Do_scripting();
  SendMessage(pHWND,WM_USER,2,2);

  SendMessage(pHWND,WM_USER,3,1);
  Do_rendering();
  SendMessage(pHWND,WM_USER,3,2);
}


Then by running the code and let this loop go for a while you should recieve something like this in a profiler window:


27-10-2006 14-3
Profile results:

0: 0
1: 12345
2: 6543
3: 2343
4: 543322
5: 0
6: 0
7: 0
8: 0
9: 0


So the bottleneck is in rendering part, as soon as it takes 543322 milliseconds in total.
You can also count percentage for each of the counters. This is [%] button for.
[w] button is to recieve a window handler of a profiler.

You can also do this:

void Main_loop(){
  SendMessage(pHWND,WM_USER,0,1);
  Do_physics();
  Do_computation();
  Do_scripting();
  SendMessage(pHWND,WM_USER,3,1);
  Do_rendering();
  SendMessage(pHWND,WM_USER,3,2);
  SendMessage(pHWND,WM_USER,0,2);
}

...and you'll have total time in counter #0. This counter is also used to count percentage as a total. If it is void - then it will be filled automaticly by the sum of other counters.
You can measure time of any part, not only of a function or a segment of code. You can use one counter from multiple places in your code to sum their time. And it will work realtime so you can see how your program reacts on other events.

And I put one more feature. It's counter increment:
SendMessage(pHWND,WM_USER,0,3); - this will add 1 to the counter #0 each time being posted.

Profiler window title may be used to display information too:
{char s[256];sprintf(s,"c: %d",name[key]);SetWindowText(pHWND,s);}

Well, what do you think?

PBrennick

I will take a look at it over the weekend. I love the concept. I did to figure all that out in assembly as it is the only language I use. Doesn't look too difficult, though.

I will get back to you.
Paul
The GeneSys Project is available from:
The Repository or My crappy website

Mark_Larson

  I actually did something similar to this a few years ago.  I did it without messages.  I simply had a function I called around the functions I wanted to time.  I had a big array of all the start times of all the functions I wanted to time.  And I simply subtracted the end times. 

so looking at the sample code you provided, it would look like this:


// DON"T NEED THIS ANYMORE static const HWND dHWND = (HWND)3146324;

void Main_loop(){
  Do_physics();
  Do_computation();
  Do_scripting();
  Do_rendering();
}


  and here's the way it would look in the code.  I didn't have to pass in the a 1 or a 2 for starting timing and stopping timing, since I had seperate functions for each.  The only parameter I pass in is which timer I need to save it in ( which offset in the array of start times).


// DONT NEED THIS ANYMORE static const HWND dHWND = (HWND)3146324;

void Main_loop(){
  StartTimer(1);
  Do_physics();
  StopTimer(1);

  StartTimer(2);
  Do_computation();
  StopTimer(2);

  StartTimer(3);
  Do_scripting();
  StopTimer(3);

  StartTimer(4);
  Do_rendering();
  StopTimer(4);
}


  StartTimer() would basically do the RDTSC, and save the result in the array of start times at the offset you passed in when you called the function.  StopTimer() would do RDTSC and subtract it from the proper start time array offset that you passed into the function.  And save the result in the start time array for later printing.  So I only had 1 array.

  I like it better as a solution because I don't have to run some other program ( the profiler) and gets it's window handle.  Second, I don't have to send a message to the other program, which could have some delay in when the message is delivered, and thus affect how accurate the timing results are.

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

ic2

Do anyone remember MAVERICK ( his altar  was a colly or a wolf )  at the win32community.  Look up his threads about PROFILER. ( I could have miss-spelled his name but you got to know the original MAVERICK) ...  It will be your best shot ...

He wrote a true PROFILER in 100% asm and had many interesting replies.  He was so serious about it for a LONG LONG time, but he stop posting because there was not enough interested people and he got disappointed behind  that.  donkey or Man from Mars  and many others know where he or the thread might still be living.  Search the board above for starters.  It could have been lost because of hacking back in the day.

Good luck

akalenuk

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  I actually did something similar to this a few years ago.  I did it without messages.  I simply had a function I called around the functions I wanted to time.  I had a big array of all the start times of all the functions I wanted to time.  And I simply subtracted the end times. 
Well, yes. It is offten even embedded in 3d engines, as a debug function. But you can't just port debug code in every project you do. And you newer know when you'll really need that profiling.

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
so looking at the sample code you provided, it would look like this:


// DON"T NEED THIS ANYMORE static const HWND dHWND = (HWND)3146324;

void Main_loop(){
  Do_physics();
  Do_computation();
  Do_scripting();
  Do_rendering();
}


Not really. It would also include dll import (if you made your routines external), or the code for time measuring, storing, calculation and visualisation. Isn't it logical to use a one single program with the similar code, then to copy your code in every program, that need it?

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  StartTimer() would basically do the RDTSC, and save the result in the array of start times at the offset you passed in when you called the function.  StopTimer() would do RDTSC and subtract it from the proper start time array offset that you passed into the function.  And save the result in the start time array for later printing.  So I only had 1 array.
How come? In my example "Main_loop()" is a looping function, so it would allways store a time of one last execution. This would not bring you the wiev how the program works in a time. And what if it calls two rendering issues: one before physics and one after? I should keep it in separate cells and then sum them with my own hands?

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  I like it better as a solution because I don't have to run some other program ( the profiler) and gets it's window handle.  Second, I don't have to send a message to the other program, which could have some delay in when the message is delivered, and thus affect how accurate the timing results are.

But how do you plan to visualise the results? You'd need the separate window anyway and still you will send this window messages, and for Windows there's no big difference what is the window you send messages to.

Your solution is better if the project you want to profile has it's own graphics interface, that is faster then Windows has. It is great if you do time monitoring constantly. But, as for me, I would rather have a small quick application to do some debuging routines I use the most, so I can use it any time in any project.

That week I wrote a tool for generating toon shading table (Delphi), a plugin for 3DMAX(VS .NET), I was still working on 3D engine (VC++) and, of course, I couldn't help playing with new MASM32 release. And I used my profiler in mostly any case i wanted to:
1. Find the bottleneck.
2. Control execution process.
3. Output an internal variable on the screen.

As example, my plugin for 3DMAX doesn't have it's own window at all, but if I still need to see what is going on when it is running. So I ought to use external window.

External application and internal routines is allways the tradeoff in speed-precision and generallity, you just have to pick what is importrant to you right now, and for me it is importrant to allways have an oppotunity to see into my programs running.

Mark_Larson

  Akalenuk,

  I don't think you understood my solution, because you made a number of statements below that just don't make sense.

Quote from: akalenuk on October 28, 2006, 03:15:59 PM
Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  I actually did something similar to this a few years ago.  I did it without messages.  I simply had a function I called around the functions I wanted to time.  I had a big array of all the start times of all the functions I wanted to time.  And I simply subtracted the end times. 
Well, yes. It is offten even embedded in 3d engines, as a debug function. But you can't just port debug code in every project you do. And you newer know when you'll really need that profiling.

This has nothing to do with debug functions.  It has nothing to do with porting debug code.  And you last statement makes it sound like my solution doesn't profile, and it does, very well.  I've used it quite a bit under both VC++ and MASM.  Under MASM I use macros.  Under VC++ I use inline procedures.  Have you looked at Michaelw's macros to time functions?  Mine work the say way as his, except you save the data to an array of start times.  You can check his here:

http://www.masm32.com/board/index.php?topic=770.0


Quote from: akalenuk on October 28, 2006, 03:15:59 PM
Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
so looking at the sample code you provided, it would look like this:


// DON"T NEED THIS ANYMORE static const HWND dHWND = (HWND)3146324;

void Main_loop(){
  Do_physics();
  Do_computation();
  Do_scripting();
  Do_rendering();
}


Not really. It would also include dll import (if you made your routines external), or the code for time measuring, storing, calculation and visualisation. Isn't it logical to use a one single program with the similar code, then to copy your code in every program, that need it?

DLL import?  that doesn't make sense either.  I am not using DLLs for my profiling code.  For the VC++ part I have a seperate .C file I add to my project that has the StartTimer() and StopTimer() code.  For the MASM part I have a .inc file I include into my file to use the macros.  I don't have to "copy" the code into every program, because I simply add it to my project under VC++ or do an INCLUDE In MASM, which is trivial.  You are making it harder than it needs to be.



Quote from: akalenuk on October 28, 2006, 03:15:59 PM
Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  StartTimer() would basically do the RDTSC, and save the result in the array of start times at the offset you passed in when you called the function.  StopTimer() would do RDTSC and subtract it from the proper start time array offset that you passed into the function.  And save the result in the start time array for later printing.  So I only had 1 array.
How come? In my example "Main_loop()" is a looping function, so it would allways store a time of one last execution. This would not bring you the wiev how the program works in a time. And what if it calls two rendering issues: one before physics and one after? I should keep it in separate cells and then sum them with my own hands?

I have no idea what you are talking about here.  My code correctly times each function that I add StartTimer()/StopTimer() around.  And correctly prints out the results for all the functions it times.  It has even less overhead than your code, and doesn't require any seperate program to print timings.  I've done a lot of testing of it when I wrote it several years ago. 


  And I have support for the visualization in the same .C file that StartTimer() and StopTimer() are in.  And the same thing in the macro file.

  And you can't get messages to send instantly.  That means their is overhead between when you send the message in your code and when the external profiler gets the message and does a RDTSC.  That means you will always end up with the timing code not being able to time it exactly.  Try doing it 2 ways.  Write one routine that you time by sending messages.  And then use Michael's or other macro based timers to see how fast it is.  I bet your timers will always show a varying time based on how long it takes to get the message to the profiler.  Where as Michael's macros produce much more consist and accurate timings.

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

akalenuk

Quote from: Mark_Larson on October 28, 2006, 04:58:41 PM
Well, yes. It is offten even embedded in 3d engines, as a debug function. But you can't just port debug code in every project you do. And you newer know when you'll really need that profiling.
Quote
This has nothing to do with debug functions.  It has nothing to do with porting debug code.  And you last statement makes it sound like my solution doesn't profile, and it does, very well. 
Quote
By "debug functions" I mean that code that does not go along with a release. And embedded profiling is one of these things the end user don't want much. And by "you newer know when you'll really need that profiling" I do not mean that your solution doesn't profile. I mean that you may want to profile something in unusual enviroment, and to do so you'll have to port your code into VB, Object pascal, rosasm or something.

Quote from: Mark_Larson on October 28, 2006, 04:58:41 PM
I've used it quite a bit under both VC++ and MASM.  Under MASM I use macros.  Under VC++ I use inline procedures. 
That's nice. You do have more accurate results. But in VC++ and MASM only. Or am I wrong?

Quote from: Mark_Larson on October 28, 2006, 04:58:41 PM
Have you looked at Michaelw's macros to time functions?  Mine work the say way as his, except you save the data to an array of start times.  You can check his here:

http://www.masm32.com/board/index.php?topic=770.0
Thanks, I'll look it through as soon as I finish this post.

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
Quote
Not really. It would also include dll import (if you made your routines external), or the code for time measuring, storing, calculation and visualisation. Isn't it logical to use a one single program with the similar code, then to copy your code in every program, that need it?

DLL import?  that doesn't make sense either.  I am not using DLLs for my profiling code. 
And that is why I wrote: "...or the code for time measuring, storing, calculation and visualisation." And I would sure prefer dll import as it would make it independent from development enviroment.

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
For the VC++ part I have a seperate .C file I add to my project that has the StartTimer() and StopTimer() code.  For the MASM part I have a .inc file I include into my file to use the macros. 

I don't have to "copy" the code into every program, because I simply add it to my project under VC++ or do an INCLUDE In MASM, which is trivial.  You are making it harder than it needs to be.
Ok you don't copy code, "include" does :-)

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
I have no idea what you are talking about here.  My code correctly times each function that I add StartTimer()/StopTimer() around.
Yes, but if it has it's own brunching you'll get different results each time the function runs. And time collection will give you a statistics, that's much more importrant.

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  And I have support for the visualization in the same .C file that StartTimer() and StopTimer() are in.  And the same thing in the macro file.
And how exactly do you do the visualisation?

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  And you can't get messages to send instantly.  That means their is overhead between when you send the message in your code and when the external profiler gets the message and does a RDTSC.  That means you will always end up with the timing code not being able to time it exactly.
Yes. And that is why I'm talking about trade off. It's portability or precision.

Quote from: Mark_Larson on October 27, 2006, 06:16:54 PM
  Try doing it 2 ways.  Write one routine that you time by sending messages.  And then use Michael's or other macro based timers to see how fast it is.  I bet your timers will always show a varying time based on how long it takes to get the message to the profiler.  Where as Michael's macros produce much more consist and accurate timings.
Why not? And I don't need exact measurement to find out that rendering wastes 80% of my time :-)

Once again: your solution is more accurate, and mine is portable. I can use it instantly with any DE, any language and any project, as soon as WinAPI is available. And I don't need much precision. While profiling big engine I even used realtime visualisation, that togather with IntegerToStr conversion sure consume a lot of time. Still the results remained acuurate enough to see what makes the whole project slow.

You don't need much accuracy to find the bottleneck, or esle it is not a bottleneck :-)

sluggy

akalenuk,
i'm going to throw my 2c in here as well. It seems like you haven't fully thought through the profiling issue and how to do it.

Profiling does two things:
- times individual functions or units of code
- provides a trace so you can determine program flow

In the case of your slow application, it is important to also get a trace, as you may find that you are calling functions unnecessarily, or calling them more times than you thought. Putting traces on selected functions only may hide this issue from you, you may be looking in the wrong place for the issue(s) (from experience i have found that bottlenecks are not always where you expect them to be).

The other important thing is that i believe you are taking the wrong approach to the design of your profiling. An app that is being profiled should not know about it, there should be no profiling code within the app being tested. Instead, your profiler should be able to attach to the module being profiled, thus achieving your aim of being portable. There are some good resources on how profiling works in the .Net world, and these can give you some good ideas on what you should do with your profiler. In simple terms, the .Net JITer hooks the start and end of each function, and the .Net profiler can use those hooks to do timing and tracing. Your profiler could do the same - hook the start and end of each function.


akalenuk

Quote from: sluggy on October 29, 2006, 11:19:06 PM
i'm going to throw my 2c in here as well. It seems like you haven't fully thought through the profiling issue and how to do it.

Profiling does two things:
- times individual functions or units of code
I can do this.

Quote from: sluggy on October 29, 2006, 11:19:06 PM
- provides a trace so you can determine program flow
And this.

Quote from: sluggy on October 29, 2006, 11:19:06 PM
In the case of your slow application, it is important to also get a trace, as you may find that you are calling functions unnecessarily, or calling them more times than you thought. Putting traces on selected functions only may hide this issue from you, you may be looking in the wrong place for the issue(s) (from experience i have found that bottlenecks are not always where you expect them to be).
Well, there's no point in profiling, if the bottleneck is in the place you expect it to be. And yes, you're right about putting traces on selected functions.

Quote from: sluggy on October 29, 2006, 11:19:06 PM
The other important thing is that i believe you are taking the wrong approach to the design of your profiling. An app that is being profiled should not know about it, there should be no profiling code within the app being tested.
Who said that? I have one CPU, though code execution is always linear. No profiling code - is an abstraction: there is always profiling code, and it's better for me to see it, than not to know what is going on in my code.

Quote from: sluggy on October 29, 2006, 11:19:06 PM
Instead, your profiler should be able to attach to the module being profiled, thus achieving your aim of being portable.
I have couple of profilers that do this way, and I have no intention to copy them. This makes profiling slow and unefficient. I want to know all and only what I do whant to know about my code. And the best way to do so - is to mark what interests me whith my own hands.

Quote from: sluggy on October 29, 2006, 11:19:06 PM
There are some good resources on how profiling works in the .Net world, and these can give you some good ideas on that you should do with your profiler.
I know well what to do with my profiler - use it for my own good :-)

Quote from: sluggy on October 29, 2006, 11:19:06 PM
In simple terms, the .Net JITer hooks the start and end of each function, and the .Net profiler can use those hooks to do timing and tracing. Your profiler could do the same - hook the start and end of each function.
Yes! And hooking is WAY faster than messaging!
No, i'd rather not.

I think, that the point of this thread is not how to do "right" profilers, but how to profile in a simple and fast way. Anyway, thanks for your concern.

Mark_Larson

Quote from: sluggy on October 29, 2006, 11:19:06 PM
akalenuk,
i'm going to throw my 2c in here as well. It seems like you haven't fully thought through the profiling issue and how to do it.

  That's why I stopped posting on the thread ;)  He wasn't getting the points I was trying to make :)

Case in point.  Here's two replies he made to your comments about the program being profiled shouldn't know about the profiling APP.  Good luck :)


Quote from: sluggy on October 29, 2006, 11:19:06 PM
The other important thing is that i believe you are taking the wrong approach to the design of your profiling. An app that is being profiled should not know about it, there should be no profiling code within the app being tested.
Who said that? I have one CPU, though code execution is always linear. No profiling code - is an abstraction: there is always profiling code, and it's better for me to see it, than not to know what is going on in my code.

Quote from: sluggy on October 29, 2006, 11:19:06 PM
Instead, your profiler should be able to attach to the module being profiled, thus achieving your aim of being portable.
I have couple of profilers that do this way, and I have no intention to copy them. This makes profiling slow and unefficient. I want to know all and only what I do whant to know about my code. And the best way to do so - is to mark what interests me whith my own hands.


BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Mark Jones

Ever use something like this?

http://developer.amd.com/cawin.jsp

If you have an AMD processor, this is the ultimate profiler.
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08