Profiler Tool

A tool for measuring the efficiency of a running program.

Examples


Discussion of Profilers

What are good profilers? Most profilers I've used aren't very good. I've resorted to using something akin to the BinaryChop.

Like everything, it depends on what language you use and what tools are available for it. I have used OptimizeIt and found it invaluable, but the same observations probably apply to any reasonably good profiler.

Unlike the BinaryChop, a profiler doesn't much interfere with the patterns of execution in your program, and therefore measures real performance data, or at least a much better approximation thereof.

A profiler's primary mission is to automate the techniques you can carry out by hand: instrument your program to gather performance data at precisely identified points, then collect a very large number of samples to ensure maximal accuracy. As with every such tool, once the repetitive aspects of the task are automated you still have to invest some effort in analyzing and interpreting the data, deriving from it some testable conclusions as to where the problem lies, devising, implementing and verifying a solution.

The best profiler I have used by far is Shark (formerly Shikari, a much better name) on MacOsx. It will profile any code, no special build required. It reports at source or assembly level and will even point you to possible improvements in your code. One more reason why Mac OSX is a great platform for developers. -- hjm


I have had good success with the DelphiLanguage version of AQTime from http://www.automatedqa.com/. They also make versions for .Net and other languages. -- JohnRusk


MatLab has a pretty good profiler. An article about it http://www.mathworks.com/company/newsletter/may03/profiler.shtml gives the surprising result that the initial bottleneck (in the original code of the example MandelbrotSet program) was not inside the inner loop where I (and most other people) would expect it.


See also SimpleProfiler


The problem is that we programmers think of profiling for performance as measuring, in hopes that by measuring enough pieces of the program we can find the parts that need optimizing.

BUT, would you do that if you could just be told exactly where the problems are?

In fact, almost all "bottlenecks" are either call instructions or non-call instructions, and the time they cost is exactly the time they are on the stack. Where they are easily seen.

Suppose a "bottleneck", when removed, could save 50%, i.e. 50% of the time it is on the stack. Suppose you randomly pause the program and examine the stack. You have a 50% chance of seeing it, right under your nose. If you do this 4 times (for example) the chance you will see it 0 times is 1/16, 1 time: 1/4, 2 times: 3/8, 3 times: 1/4, 4 times: 1/16. So the chance you will see it 2 or more times is 11/16. That's pretty good odds of seeing it enough times to know it's a problem. If you do it more times, the odds get much better.

Another way to look at it. Suppose you take 4 samples of the stack, and you see a line of code on 2 samples. So what does it cost? (It's a beta distribution.) Given those samples, what it costs, on average, is (2+1)/(4+2) = 50%. Sure it's not exact, but you know precisely where it is. Would you turn down an average 50% savings just because it's approximate?

So - all those folks who say "measure measure measure"? Give this a try. You'll be surprised how effective it is. -- MikeDunlavey

I'm confused. As far as I can tell, you are measuring, yet it sounds as if you are proposing this as an alternative to measuring.

There's a qualitative difference. Consider the extreme of an infinite loop. Since it is infinite (or very long running), you only need to interrupt it once, and you know the problem is on the stack. Just single-step until you get to it, and you'll see the problem. You're not measuring the amount of time anything takes for the purpose of narrowing down where it is. The simple fact that it takes time is what exposes it, down to the exact line and instruction. OK, an infinite loop is a "bottleneck" that is active 100% of the time, so a single sample is guaranteed to hit it. Pick a percent - any bottleneck has to have a percent. That is the chance that any single pause will show it. The strategy is, take samples of the call stack until something you can fix shows up on 2 samples, the reason being that if something shows up on only 1 sample, that doesn't mean anything (unless you know a-priori that there's a big bottleneck). What's more, when you pause it, you can not only see where the PC is (or if it's in I/O), you can see the entire "why-chain" of why it's there. A "bottleneck" can be defined as unnecessary work, and the only way you know it's unnecessary is if you know why it's doing it, which is what the stack sample (and other state info) tells you.

There are profilers that get close to this. (I'm thinking of Zoom by RotateRight?.com.) They take stack samples on wall-clock time and, for each line of code or instruction appearing on one or more stack samples, they tell you the percent of samples it appeared on. In real big software, this can find mid-stack calls that account for significant time. That's good. What it doesn't tell you is the reason why that call is being executed. That info is encoded in each sample, but it doesn't let you see the raw samples, so you're left doing some guesswork.

What's more, they still fall in the category of encouraging people to take at least 100s of samples for the purpose of "measurement precision", seemingly unaware that the real goal is "location precision", for which you do not need a large number of samples. You do need to understand fully what each sample tells you. Sure you get a measurement out of it, but it's not particularly precise, nor does it need to be. Sorry to say so much, but I hope you can see the emphasis is completely different.

I'm not alone in touting this. I'm just the most passionate / outspoken, because I'm so tired of people not being aware of it. It's quite a topic on StackOverflow. Here are some links (I'm not sure how to put links in here more briefly, and you're welcome to clean up. I know I've imposed on your site.)

http://stackoverflow.com/questions/890222/analyzing-code-for-efficiency/893272#893272 (first comment)

http://stackoverflow.com/questions/4832642/when-is-optimization-premature/4832698#4832698 (last paragraph)

http://stackoverflow.com/questions/266373/one-could-use-a-profiler-but-why-not-just-halt-the-program/317160#317160 (in Java)

http://stackoverflow.com/questions/2473666/tips-for-optimizing-c-net-programs/2474118#2474118

http://stackoverflow.com/questions/2624667/whats-a-very-easy-c-profiler-vc/2624725#2624725

http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024 (fifth comment (from Emtucifor))

http://stackoverflow.com/questions/4998142/profiling-a-ruby-app-that-hits-100-cpu-and-never-ends/4998164#4998164 (the comments)


CategorySoftwareTool, CategoryOptimization


EditText of this page (last edited March 3, 2011) or FindPage with title or text search