tools/jprof/README.html

Wed, 31 Dec 2014 06:09:35 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
date
Wed, 31 Dec 2014 06:09:35 +0100
changeset 0
6474c204b198
permissions
-rw-r--r--

Cloned upstream origin tor-browser at tor-browser-31.3.0esr-4.5-1-build1
revision ID fc1c9ff7c1b2defdbc039f12214767608f46423f for hacking purpose.

     1 <!-- This Source Code Form is subject to the terms of the Mozilla Public
     2    - License, v. 2.0. If a copy of the MPL was not distributed with this
     3    - file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
     5 <html>
     6 <head><title>The Jprof Profiler</title></head>
     8 <body bgcolor="#FFFFFF" text="#000000"
     9       link="#0000EE" vlink="#551A8B" alink="#FF0000">
    10 <center>
    11 <h1>The Jprof Profiler</h1>
    12 <font size="-1">
    13 <a href="mailto:jim_nance%yahoo.com">jim_nance@yahoo.com</a><p>
    14 Recent (4/2011) updates Randell Jesup (see bugzilla for contact info)
    15 </font>
    16 <hr>
    18 <a href="#introduction">Introduction</a> | <a href="#operation">Operation</a> |
    19 <a href="#setup">Setup</a> | <a href="#usage">Usage</a> |
    20 <a href="#interpretation">Interpretation</a>
    22 </center>
    23 <hr>
    25 <h3><a name="introduction">Introduction</a></h3>
    27 Jprof is a profiling tool.  I am writing it because I need to find out
    28 where mozilla is spending its time, and there do not seem to be any
    29 profilers for Linux that can handle threads and/or shared libraries.
    30 This code is based heavily on Kipp Hickman's leaky.
    32 <h3><a name="operation">Operation</a></h3>
    34 Jprof operates by installing a timer which periodically interrupts mozilla.
    35 When this timer goes off, the jprof code inside mozilla walks the function call
    36 stack to determine which code was executing and saves the results into the
    37 <code>jprof-log</code> and <code>jprof-map</code> files.  By collecting a large
    38 number of these call stacks, it is possible to deduce where mozilla is spending
    39 its time.
    41 <h3><a name="setup">Setup</a></h3>
    43 <p>Configure your mozilla with jprof support by adding
    44 <code>--enable-jprof</code> to your configure options (eg adding
    45 <code>ac_add_options --enable-jprof</code> to your <code>.mozconfig</code>) and
    46 making sure that you do <strong>not</strong> have the
    47 <code>--enable-strip</code> configure option set -- jprof needs symbols to
    48 operate.  On many architectures with GCC, you'll need to add
    49 <code>--enable-optimize="-O3 -fno-omit-frame-pointer"</code> or the
    50 equivalent to ensure frame pointer generation in the compiler you're using.</p>
    52 <p>Finally, build mozilla with your new configuration.  Now you can run jprof.</p>
    54 <h3><a name="usage">Usage</a></h3>
    55 <pre> jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]</pre>
    56 Options:
    57 <ul>
    58   <li><b>-s depth</b> : Limit depth looked at from captured stack
    59       frames</li>
    60   <li><b>-v</b> : Output some information about the symbols, memory map, etc.</li>
    61   <li><b>-t or --threads</b> : Group output according to thread.  May require external
    62       LD_PRELOAD library to help force sampling of spawned threads; jprof
    63        may capture the main thread only.  See <a
    64        href="http://sam.zoy.org/writings/programming/gprof.html">gprof-helper</a>;
    65        it may need adaption for jprof.</li>
    66   <li><b>--only-thread id</b> : Only output data for thread 'id'</li>
    67   <li><b>-e exclusion</b> : Allows excluding specific stack frames</li>
    68   <li><b>-i inclusion</b> : Allows including specific stack frames</li>
    69   <li><b>--last</b> : Only process data from the last 'section' of sampling
    70       (starting at the last PROF)</li>
    71   <li><b>--start N</b> : Start processing data at 'section' N </li>
    72   <li><b>--end N</b> : Stop processing data at 'section' N </li>
    73   <li><b>--output-dir dir</b> : Store generated .html files in the given directory </li>
    74 </ul>
    75 The behavior of jprof is determined by the value of the JPROF_FLAGS environment
    76 variable.  This environment variable can be composed of several substrings
    77 which have the following meanings:
    78 <ul>
    79     <li> <b>JP_START</b> : Install the signal handler, and start sending the
    80     timer signals.
    82     <li> <b>JP_DEFER</b> : Install the signal handler, but don't start sending
    83     the timer signals.  The user must start the signals by sending the first
    84     one (with <code>kill -PROF</code>, or with <code>kill -ALRM</code> if
    85     JP_REALTIME is used, or with <code>kill -POLL</code> (also known as <code>kill -IO</code>) if JP_RTC_HZ is used).
    87     <li> <b>JP_FIRST=x</b> : Wait x seconds before starting the timer
    89     <li> <b>JP_PERIOD=y</b> : Set timer to interrupt every y seconds.  Only
    90     values of y greater than or equal to 0.001 are supported.  Default is
    91     0.050 (50ms).
    93     <li> <b>JP_REALTIME</b> : Do the profiling in intervals of real time rather
    94     than intervals of time used by the mozilla process (and the kernel
    95     when doing work for mozilla).  This could probably lead to weird
    96     results (you'll see whatever runs when mozilla is waiting for events),
    97     but is needed to see time spent in the X server.
    99     <li> <b>JP_RTC_HZ=freq</b> : This option, only available on Linux if the
   100     kernel is built with RTC support, makes jprof use the RTC timer instead of
   101     using its own timer.  This option, like JP_REALTIME, uses intervals of real
   102     time.  This option overrides JP_PERIOD.  <code>freq</code> is the frequency
   103     at which the timer should fire, measured in Hz. It must be a power of 2.
   104     The maximal frequency allowed by the kernel can be changed by writing to
   105     <code>/proc/sys/dev/rtc/max-user-freq</code>; the maximum value it can be
   106     set to is 8192.  Note that <code>/dev/rtc</code> will need to be readable
   107     by the Firefox process; making that file world-readable is a simple way to
   108     accomplish that.
   110     <li> <b>JP_CIRCULAR=size</b> : This tells jprof to store samples in a
   111     circular buffer of the given size, which then will be saved (appended)
   112     to disk when SIGUSR1 is received or JProfStopProfiling is done.  If the
   113     buffer overflows, the oldest entries will be evicted until there's
   114     space for the new entry.<p>
   116     SIGUSR2 will cause the circular buffer to be cleared.
   118     <li> <b>JP_FILENAME=basefilename</b> : This is the filename used for
   119     saving the log files to; the default is "jprof-log".  If Electrolysis
   120     is used, each process after the first will have the process ID
   121     added ("jprof-log-3212");
   123 </ul>
   125 <h4>Starting and stopping jprof from JavaScript</h4>
   126 <p>
   127 A build with jprof enabled adds four functions to the Window object:<p>
   128 <code>JProfStartProfiling()</code> and <code>JProfStopProfiling()</code>:  When used with JP_DEFER, these
   129 allow one to start and stop the timer just around whatever critical section is
   130 being profiled.</p><p>
   131 <code>JProfClearCircular()</code> and <code>JProfSaveCircular()</code>:
   132 These clear the circular buffer and save the buffer (without stopping), respectively.</p>
   134 <h4>Examples of JPROF_FLAGS usage</h4>
   135 <ul>
   137   <li>To make the timer start firing 3 seconds after the program is started and
   138   fire every 25 milliseconds of program time use:
   139     <pre>
   140         setenv JPROF_FLAGS "JP_START JP_FIRST=3 JP_PERIOD=0.025" </pre>
   142   <li>To make the timer start on your signal and fire every 1 millisecond of
   143   program time use:  
   144     <pre>
   145         setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.001" </pre>
   147   <li>To make the timer start on your signal and fire every 10 milliseconds of
   148   wall-clock time use:  
   149     <pre>
   150         setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.010 JP_REALTIME" </pre>
   152   <li>To make the timer start on your signal and fire at 8192 Hz in wall-clock
   153   time use:
   154     <pre>
   155         setenv JPROF_FLAGS "JP_DEFER JP_RTC_HZ=8192" </pre>
   157   <li>To make the timer start on JProfStartProfiling() and run continously
   158       with a 1ms sample rate until told to stop, then save the last 1MB of
   159       data:
   160     <pre>
   161         setenv JPROF_FLAGS "JP_DEFER JP_CIRCULAR=1048576 JP_PERIOD=0.001" </pre>
   163 </ul>
   165 <h4>Pausing profiles</h4>
   167 <P>jprof can be paused at any time by sending a SIGUSR1 to mozilla (<code>kill
   168 -USR1</code>).  This will cause the timer signals to stop and jprof-map to be
   169 written, but it will not close jprof-log.  Combining SIGUSR1 with the JP_DEFER
   170 option allows profiling of one sequence of actions by starting the timer right
   171 before starting the actions and stopping the timer right afterward.
   173 <P>After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO),
   174 depending on the mode) can be used to continue writing data to the same
   175 output.
   177 <P>SIGUSR2 will cause the circular buffer to be cleared, if it's in use.
   178 This is useful right before running a test when you're using a large,
   179 continuous circular buffer, or programmatically at the start of an action
   180 which might take too long (JProfClearCircular()).
   182 <h4>Looking at the results</h4>
   184 Now that we have <code>jprof-log</code> and <code>jprof-map</code> files, we
   185 can use the jprof executable is used to turn them into readable output.  To do
   186 this jprof needs the name of the mozilla binary and the log file.  It deduces
   187 the name of the map file:
   189 <pre>
   190   ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html
   191 </pre>
   193 This will generate the file <code>tmp.html</code> which you should view in a
   194 web browser.
   196 <pre>
   197   ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log*
   198 </pre>
   200 This will generate a set of files in /tmp for each process.
   203 <h3><a name="interpretation">Interpretation</a></h3>
   206 The Jprof output is split into a flat portion and a hierarchical portion.
   207 There are links to each section at the top of the page.  It is typically
   208 easier to analyze the profile by starting with the flat output and following
   209 the links contained in the flat output up to the hierarchical output.
   211 <h4><a name="flat">Flat output</a></h3>
   213 The flat portion of the profile indicates which functions were executing
   214 when the timer was going off.  It is displayed as a list of functions names
   215 on the right and the number of times that function was interrupted on the
   216 left.  The list is sorted by decreasing interrupt count.  For example:
   218 <blockquote> <pre>
   219 Total hit count: 151603
   220 Count %Total  Function Name
   222 <a href="#23081">8806   5.8     __libc_poll</a>
   223 <a href="#40008">2254   1.5     __i686.get_pc_thunk.bx</a>
   224 <a href="#21390">2053   1.4     _int_malloc</a>
   225 <a href="#49013">1777   1.2     nsStyleContext::GetStyleData(nsStyleStructID)</a>
   226 <a href="#21380">1600   1.1     __libc_malloc</a>
   227 <a href="#603">1552   1.0     nsCOMPtr_base::~nsCOMPtr_base()</a>
   228 </pre> </blockquote>
   230 This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside nsStyleContext::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor.
   232 <p>
   233 In general, the functions with the highest count are the functions which
   234 are taking the most time.
   236 <P>
   237 The function names are linked to the entry for that function in the
   238 hierarchical profile, which is described in the next section.
   240 <h4><a name="hier">Hierarchical output</a></h4>
   242 The hierarchical output is divided up into sections, with each section
   243 corresponding to one function.  A typical section looks something like
   244 this:
   246 <blockquote><pre>
   247  index  Count         Hits      Function Name
   248                       <A href="#72871">     545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*)</A>
   249                       <A href="#72873">     100 (8.5%)  nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&)</A>
   250  72870      4 (0.3%)  <a name=72870>     645 (54.9%)</a> <b>nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int)</b>
   251                       <A href="#72821">     545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*)</A>
   252                       <A href="#72853">      83 (7.1%)  nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*)</A>
   253                       <A href="#74150">       9 (0.8%)  nsLineLayout::BeginLineReflow(int, int, int, int, int, int)</A>
   254                       <A href="#74897">       1 (0.1%)  nsTextFrame::GetType() const</A>
   255                       <A href="#74131">       1 (0.1%)  nsLineLayout::RelativePositionFrames(nsOverflowAreas&)</A>
   256                       <A href="#58320">       1 (0.1%)  __i686.get_pc_thunk.bx</A>
   257                       <A href="#53077">       1 (0.1%)  PL_ArenaAllocate</A>
   258 </pre></blockquote>
   260 The information this block tells us is:
   262 <ul>
   263 <li>There were 4 profiler hits <em>in</em> <code>nsBlockFrame::DoReflowInlineFrames</code>
   264 <li>There were 645 profiler hits <em>in or under</em> <code>nsBlockFrame::DoReflowInlineFrames</code>.  Of these:
   265 <ul>
   266   <li>545 were in or under <code>nsBlockFrame::ReflowInlineFrame</code>
   267   <li>83 were in or under <code>nsBlockFrame::PlaceLine</code>
   268   <li>9 were in or under <code>nsLineLayout::BeginLineReflow</code>
   269   <li>1 was in or under <code>nsTextFrame::GetType</code>
   270   <li>1 was in or under <code>nsLineLayout::RelativePositionFrames</code>
   271   <li>1 was in or under <code>__i686.get_pc_thunk.bx</code>
   272   <li>1 was in or under <code>PL_ArenaAllocate</code>
   273 </ul>
   274 <li>Of these 645 calls into <code>nsBlockFrame::DoReflowInlineFrames</code>:
   275 <ul>
   276   <li>545 came from <code>nsBlockFrame::ReflowInlineFrames</code>
   277   <li>100 came from <code>nsBlockFrame::ReflowDirtyLines</code>
   278 </ul>
   279 </ul>
   282 The rest of this section explains how to read this information off from the jprof output.
   284 <p>This block corresponds to the function <code>nsBlockFrame::DoReflowInlineFrames</code>, which is
   285 therefore bolded and not a link.  The name of this function is preceded by
   286 five numbers which have the following meaning.  The number on the left (72870)
   287 is the index number, and is not important.  The next number (4) and the
   288 percentage following (0.3%) are the number
   289 of times this function was interrupted by the timer and the percentage of
   290 the total hits that is.  The last number pair ("645 (54.9%)")
   291 are the number of times this function was in the call stack when the timer went
   292 off.  That is, the timer went off while we were in code that was ultimately
   293 called from <code>nsBlockFrame::DoReflowInlineFrames</code>.
   294 <p>For our example we can see that our function was in the call stack for
   295 645 interrupt ticks, but we were only the function that was running when
   296 the interrupt arrived 4 times.
   297 <P>
   298 The functions listed above the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its
   299 callers.  The numbers to the left of these function names are the numbers of
   300 times these functions were in the call stack as callers of
   301 <code>nsBlockFrame::DoReflowInlineFrames</code>.  In our example, we were called 545 times by
   302 <code>nsBlockFrame::ReflowInlineFrames</code> and 100 times by
   303 <code>nsBlockFrame::ReflowDirtyLines</code>.
   304 <P>
   305 The functions listed below the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its
   306 callees.  The numbers to the left of the function names are the numbers of
   307 times these functions were in the callstack as callees of
   308 <code>nsBlockFrame::DoReflowInlineFrames</code> and the corresponding percentages. In our example, of the 645 profiler hits under <code>nsBlockFrame::DoReflowInlineFrames</code> 545 were under <code>nsBlockFrame::ReflowInlineFrame</code>, 83 were under <code>nsBlockFrame::PlaceLine</code>, and so forth.<p>
   310 <b>NOTE:</b> If there are loops of execution or recursion, the numbers will
   311 not add up and percentages can exceed 100%.  If a function directly calls
   312 itself "(self)" will be appended to the line, but indirect recursion will
   313 not be marked.
   315 <h3>Bugs</h3>
   316 The current build of Jprof has only been tested under Ubuntu 8.04 LTS, but
   317 should work under any fairly modern linux distribution using GCC/GLIBC.
   318 Please update this document with any known compatibilities/incompatibilities.
   319 <p>
   320 If you get an error:<p><code>Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
   321 </code><p>that means you've hit a timing hole in the version of glibc you're
   322 running.  See <a
   323 href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=4578">Redhat bug 4578</a>.
   324 <!-- <h3>Update</h3>
   325 <ul>
   326 </ul>
   327 -->
   329 </body>
   330 </html>

mercurial