michael@0: michael@0: michael@0: michael@0: The Jprof Profiler michael@0: michael@0: michael@0:
michael@0:

The Jprof Profiler

michael@0: michael@0: jim_nance@yahoo.com

michael@0: Recent (4/2011) updates Randell Jesup (see bugzilla for contact info) michael@0: michael@0:


michael@0: michael@0: Introduction | Operation | michael@0: Setup | Usage | michael@0: Interpretation michael@0: michael@0:
michael@0:
michael@0: michael@0:

Introduction

michael@0: michael@0: Jprof is a profiling tool. I am writing it because I need to find out michael@0: where mozilla is spending its time, and there do not seem to be any michael@0: profilers for Linux that can handle threads and/or shared libraries. michael@0: This code is based heavily on Kipp Hickman's leaky. michael@0: michael@0:

Operation

michael@0: michael@0: Jprof operates by installing a timer which periodically interrupts mozilla. michael@0: When this timer goes off, the jprof code inside mozilla walks the function call michael@0: stack to determine which code was executing and saves the results into the michael@0: jprof-log and jprof-map files. By collecting a large michael@0: number of these call stacks, it is possible to deduce where mozilla is spending michael@0: its time. michael@0: michael@0:

Setup

michael@0: michael@0:

Configure your mozilla with jprof support by adding michael@0: --enable-jprof to your configure options (eg adding michael@0: ac_add_options --enable-jprof to your .mozconfig) and michael@0: making sure that you do not have the michael@0: --enable-strip configure option set -- jprof needs symbols to michael@0: operate. On many architectures with GCC, you'll need to add michael@0: --enable-optimize="-O3 -fno-omit-frame-pointer" or the michael@0: equivalent to ensure frame pointer generation in the compiler you're using.

michael@0: michael@0:

Finally, build mozilla with your new configuration. Now you can run jprof.

michael@0: michael@0:

Usage

michael@0:
 jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]
michael@0: Options: michael@0: michael@0: The behavior of jprof is determined by the value of the JPROF_FLAGS environment michael@0: variable. This environment variable can be composed of several substrings michael@0: which have the following meanings: michael@0: michael@0: michael@0:

Starting and stopping jprof from JavaScript

michael@0:

michael@0: A build with jprof enabled adds four functions to the Window object:

michael@0: JProfStartProfiling() and JProfStopProfiling(): When used with JP_DEFER, these michael@0: allow one to start and stop the timer just around whatever critical section is michael@0: being profiled.

michael@0: JProfClearCircular() and JProfSaveCircular(): michael@0: These clear the circular buffer and save the buffer (without stopping), respectively.

michael@0: michael@0:

Examples of JPROF_FLAGS usage

michael@0: michael@0: michael@0:

Pausing profiles

michael@0: michael@0:

jprof can be paused at any time by sending a SIGUSR1 to mozilla (kill michael@0: -USR1). This will cause the timer signals to stop and jprof-map to be michael@0: written, but it will not close jprof-log. Combining SIGUSR1 with the JP_DEFER michael@0: option allows profiling of one sequence of actions by starting the timer right michael@0: before starting the actions and stopping the timer right afterward. michael@0: michael@0:

After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO), michael@0: depending on the mode) can be used to continue writing data to the same michael@0: output. michael@0: michael@0:

SIGUSR2 will cause the circular buffer to be cleared, if it's in use. michael@0: This is useful right before running a test when you're using a large, michael@0: continuous circular buffer, or programmatically at the start of an action michael@0: which might take too long (JProfClearCircular()). michael@0: michael@0:

Looking at the results

michael@0: michael@0: Now that we have jprof-log and jprof-map files, we michael@0: can use the jprof executable is used to turn them into readable output. To do michael@0: this jprof needs the name of the mozilla binary and the log file. It deduces michael@0: the name of the map file: michael@0: michael@0:
michael@0:   ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html
michael@0: 
michael@0: michael@0: This will generate the file tmp.html which you should view in a michael@0: web browser. michael@0: michael@0:
michael@0:   ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log*
michael@0: 
michael@0: michael@0: This will generate a set of files in /tmp for each process. michael@0: michael@0: michael@0:

Interpretation

michael@0: michael@0: michael@0: The Jprof output is split into a flat portion and a hierarchical portion. michael@0: There are links to each section at the top of the page. It is typically michael@0: easier to analyze the profile by starting with the flat output and following michael@0: the links contained in the flat output up to the hierarchical output. michael@0: michael@0:

Flat output

michael@0: michael@0: The flat portion of the profile indicates which functions were executing michael@0: when the timer was going off. It is displayed as a list of functions names michael@0: on the right and the number of times that function was interrupted on the michael@0: left. The list is sorted by decreasing interrupt count. For example: michael@0: michael@0:
michael@0: Total hit count: 151603
michael@0: Count %Total  Function Name
michael@0: 
michael@0: 8806   5.8     __libc_poll
michael@0: 2254   1.5     __i686.get_pc_thunk.bx
michael@0: 2053   1.4     _int_malloc
michael@0: 1777   1.2     nsStyleContext::GetStyleData(nsStyleStructID)
michael@0: 1600   1.1     __libc_malloc
michael@0: 1552   1.0     nsCOMPtr_base::~nsCOMPtr_base()
michael@0: 
michael@0: michael@0: This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside nsStyleContext::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor. michael@0: michael@0:

michael@0: In general, the functions with the highest count are the functions which michael@0: are taking the most time. michael@0: michael@0:

michael@0: The function names are linked to the entry for that function in the michael@0: hierarchical profile, which is described in the next section. michael@0: michael@0:

Hierarchical output

michael@0: michael@0: The hierarchical output is divided up into sections, with each section michael@0: corresponding to one function. A typical section looks something like michael@0: this: michael@0: michael@0:
michael@0:  index  Count         Hits      Function Name
michael@0:                            545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*)
michael@0:                            100 (8.5%)  nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&)
michael@0:  72870      4 (0.3%)       645 (54.9%) nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int)
michael@0:                            545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*)
michael@0:                             83 (7.1%)  nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*)
michael@0:                              9 (0.8%)  nsLineLayout::BeginLineReflow(int, int, int, int, int, int)
michael@0:                              1 (0.1%)  nsTextFrame::GetType() const
michael@0:                              1 (0.1%)  nsLineLayout::RelativePositionFrames(nsOverflowAreas&)
michael@0:                              1 (0.1%)  __i686.get_pc_thunk.bx
michael@0:                              1 (0.1%)  PL_ArenaAllocate
michael@0: 
michael@0: michael@0: The information this block tells us is: michael@0: michael@0: michael@0: michael@0: michael@0: The rest of this section explains how to read this information off from the jprof output. michael@0: michael@0:

This block corresponds to the function nsBlockFrame::DoReflowInlineFrames, which is michael@0: therefore bolded and not a link. The name of this function is preceded by michael@0: five numbers which have the following meaning. The number on the left (72870) michael@0: is the index number, and is not important. The next number (4) and the michael@0: percentage following (0.3%) are the number michael@0: of times this function was interrupted by the timer and the percentage of michael@0: the total hits that is. The last number pair ("645 (54.9%)") michael@0: are the number of times this function was in the call stack when the timer went michael@0: off. That is, the timer went off while we were in code that was ultimately michael@0: called from nsBlockFrame::DoReflowInlineFrames. michael@0:

For our example we can see that our function was in the call stack for michael@0: 645 interrupt ticks, but we were only the function that was running when michael@0: the interrupt arrived 4 times. michael@0:

michael@0: The functions listed above the line for nsBlockFrame::DoReflowInlineFrames are its michael@0: callers. The numbers to the left of these function names are the numbers of michael@0: times these functions were in the call stack as callers of michael@0: nsBlockFrame::DoReflowInlineFrames. In our example, we were called 545 times by michael@0: nsBlockFrame::ReflowInlineFrames and 100 times by michael@0: nsBlockFrame::ReflowDirtyLines. michael@0:

michael@0: The functions listed below the line for nsBlockFrame::DoReflowInlineFrames are its michael@0: callees. The numbers to the left of the function names are the numbers of michael@0: times these functions were in the callstack as callees of michael@0: nsBlockFrame::DoReflowInlineFrames and the corresponding percentages. In our example, of the 645 profiler hits under nsBlockFrame::DoReflowInlineFrames 545 were under nsBlockFrame::ReflowInlineFrame, 83 were under nsBlockFrame::PlaceLine, and so forth.

michael@0: michael@0: NOTE: If there are loops of execution or recursion, the numbers will michael@0: not add up and percentages can exceed 100%. If a function directly calls michael@0: itself "(self)" will be appended to the line, but indirect recursion will michael@0: not be marked. michael@0: michael@0:

Bugs

michael@0: The current build of Jprof has only been tested under Ubuntu 8.04 LTS, but michael@0: should work under any fairly modern linux distribution using GCC/GLIBC. michael@0: Please update this document with any known compatibilities/incompatibilities. michael@0:

michael@0: If you get an error:

Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed! michael@0:

that means you've hit a timing hole in the version of glibc you're michael@0: running. See Redhat bug 4578. michael@0: michael@0: michael@0: michael@0: