tools/jprof/README.html

changeset 0
6474c204b198
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/tools/jprof/README.html	Wed Dec 31 06:09:35 2014 +0100
     1.3 @@ -0,0 +1,330 @@
     1.4 +<!-- This Source Code Form is subject to the terms of the Mozilla Public
     1.5 +   - License, v. 2.0. If a copy of the MPL was not distributed with this
     1.6 +   - file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
     1.7 +
     1.8 +<html>
     1.9 +<head><title>The Jprof Profiler</title></head>
    1.10 +
    1.11 +<body bgcolor="#FFFFFF" text="#000000"
    1.12 +      link="#0000EE" vlink="#551A8B" alink="#FF0000">
    1.13 +<center>
    1.14 +<h1>The Jprof Profiler</h1>
    1.15 +<font size="-1">
    1.16 +<a href="mailto:jim_nance%yahoo.com">jim_nance@yahoo.com</a><p>
    1.17 +Recent (4/2011) updates Randell Jesup (see bugzilla for contact info)
    1.18 +</font>
    1.19 +<hr>
    1.20 +
    1.21 +<a href="#introduction">Introduction</a> | <a href="#operation">Operation</a> |
    1.22 +<a href="#setup">Setup</a> | <a href="#usage">Usage</a> |
    1.23 +<a href="#interpretation">Interpretation</a>
    1.24 +
    1.25 +</center>
    1.26 +<hr>
    1.27 +
    1.28 +<h3><a name="introduction">Introduction</a></h3>
    1.29 +
    1.30 +Jprof is a profiling tool.  I am writing it because I need to find out
    1.31 +where mozilla is spending its time, and there do not seem to be any
    1.32 +profilers for Linux that can handle threads and/or shared libraries.
    1.33 +This code is based heavily on Kipp Hickman's leaky.
    1.34 +
    1.35 +<h3><a name="operation">Operation</a></h3>
    1.36 +
    1.37 +Jprof operates by installing a timer which periodically interrupts mozilla.
    1.38 +When this timer goes off, the jprof code inside mozilla walks the function call
    1.39 +stack to determine which code was executing and saves the results into the
    1.40 +<code>jprof-log</code> and <code>jprof-map</code> files.  By collecting a large
    1.41 +number of these call stacks, it is possible to deduce where mozilla is spending
    1.42 +its time.
    1.43 +
    1.44 +<h3><a name="setup">Setup</a></h3>
    1.45 +
    1.46 +<p>Configure your mozilla with jprof support by adding
    1.47 +<code>--enable-jprof</code> to your configure options (eg adding
    1.48 +<code>ac_add_options --enable-jprof</code> to your <code>.mozconfig</code>) and
    1.49 +making sure that you do <strong>not</strong> have the
    1.50 +<code>--enable-strip</code> configure option set -- jprof needs symbols to
    1.51 +operate.  On many architectures with GCC, you'll need to add
    1.52 +<code>--enable-optimize="-O3 -fno-omit-frame-pointer"</code> or the
    1.53 +equivalent to ensure frame pointer generation in the compiler you're using.</p>
    1.54 +
    1.55 +<p>Finally, build mozilla with your new configuration.  Now you can run jprof.</p>
    1.56 +
    1.57 +<h3><a name="usage">Usage</a></h3>
    1.58 +<pre> jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]</pre>
    1.59 +Options:
    1.60 +<ul>
    1.61 +  <li><b>-s depth</b> : Limit depth looked at from captured stack
    1.62 +      frames</li>
    1.63 +  <li><b>-v</b> : Output some information about the symbols, memory map, etc.</li>
    1.64 +  <li><b>-t or --threads</b> : Group output according to thread.  May require external
    1.65 +      LD_PRELOAD library to help force sampling of spawned threads; jprof
    1.66 +       may capture the main thread only.  See <a
    1.67 +       href="http://sam.zoy.org/writings/programming/gprof.html">gprof-helper</a>;
    1.68 +       it may need adaption for jprof.</li>
    1.69 +  <li><b>--only-thread id</b> : Only output data for thread 'id'</li>
    1.70 +  <li><b>-e exclusion</b> : Allows excluding specific stack frames</li>
    1.71 +  <li><b>-i inclusion</b> : Allows including specific stack frames</li>
    1.72 +  <li><b>--last</b> : Only process data from the last 'section' of sampling
    1.73 +      (starting at the last PROF)</li>
    1.74 +  <li><b>--start N</b> : Start processing data at 'section' N </li>
    1.75 +  <li><b>--end N</b> : Stop processing data at 'section' N </li>
    1.76 +  <li><b>--output-dir dir</b> : Store generated .html files in the given directory </li>
    1.77 +</ul>
    1.78 +The behavior of jprof is determined by the value of the JPROF_FLAGS environment
    1.79 +variable.  This environment variable can be composed of several substrings
    1.80 +which have the following meanings:
    1.81 +<ul>
    1.82 +    <li> <b>JP_START</b> : Install the signal handler, and start sending the
    1.83 +    timer signals.
    1.84 +    
    1.85 +    <li> <b>JP_DEFER</b> : Install the signal handler, but don't start sending
    1.86 +    the timer signals.  The user must start the signals by sending the first
    1.87 +    one (with <code>kill -PROF</code>, or with <code>kill -ALRM</code> if
    1.88 +    JP_REALTIME is used, or with <code>kill -POLL</code> (also known as <code>kill -IO</code>) if JP_RTC_HZ is used).
    1.89 +
    1.90 +    <li> <b>JP_FIRST=x</b> : Wait x seconds before starting the timer
    1.91 +
    1.92 +    <li> <b>JP_PERIOD=y</b> : Set timer to interrupt every y seconds.  Only
    1.93 +    values of y greater than or equal to 0.001 are supported.  Default is
    1.94 +    0.050 (50ms).
    1.95 +    
    1.96 +    <li> <b>JP_REALTIME</b> : Do the profiling in intervals of real time rather
    1.97 +    than intervals of time used by the mozilla process (and the kernel
    1.98 +    when doing work for mozilla).  This could probably lead to weird
    1.99 +    results (you'll see whatever runs when mozilla is waiting for events),
   1.100 +    but is needed to see time spent in the X server.
   1.101 +
   1.102 +    <li> <b>JP_RTC_HZ=freq</b> : This option, only available on Linux if the
   1.103 +    kernel is built with RTC support, makes jprof use the RTC timer instead of
   1.104 +    using its own timer.  This option, like JP_REALTIME, uses intervals of real
   1.105 +    time.  This option overrides JP_PERIOD.  <code>freq</code> is the frequency
   1.106 +    at which the timer should fire, measured in Hz. It must be a power of 2.
   1.107 +    The maximal frequency allowed by the kernel can be changed by writing to
   1.108 +    <code>/proc/sys/dev/rtc/max-user-freq</code>; the maximum value it can be
   1.109 +    set to is 8192.  Note that <code>/dev/rtc</code> will need to be readable
   1.110 +    by the Firefox process; making that file world-readable is a simple way to
   1.111 +    accomplish that.
   1.112 +
   1.113 +    <li> <b>JP_CIRCULAR=size</b> : This tells jprof to store samples in a
   1.114 +    circular buffer of the given size, which then will be saved (appended)
   1.115 +    to disk when SIGUSR1 is received or JProfStopProfiling is done.  If the
   1.116 +    buffer overflows, the oldest entries will be evicted until there's
   1.117 +    space for the new entry.<p>
   1.118 +	 
   1.119 +    SIGUSR2 will cause the circular buffer to be cleared.
   1.120 +
   1.121 +    <li> <b>JP_FILENAME=basefilename</b> : This is the filename used for
   1.122 +    saving the log files to; the default is "jprof-log".  If Electrolysis
   1.123 +    is used, each process after the first will have the process ID
   1.124 +    added ("jprof-log-3212");
   1.125 +
   1.126 +</ul>
   1.127 +
   1.128 +<h4>Starting and stopping jprof from JavaScript</h4>
   1.129 +<p>
   1.130 +A build with jprof enabled adds four functions to the Window object:<p>
   1.131 +<code>JProfStartProfiling()</code> and <code>JProfStopProfiling()</code>:  When used with JP_DEFER, these
   1.132 +allow one to start and stop the timer just around whatever critical section is
   1.133 +being profiled.</p><p>
   1.134 +<code>JProfClearCircular()</code> and <code>JProfSaveCircular()</code>:
   1.135 +These clear the circular buffer and save the buffer (without stopping), respectively.</p>
   1.136 +
   1.137 +<h4>Examples of JPROF_FLAGS usage</h4>
   1.138 +<ul>
   1.139 +
   1.140 +  <li>To make the timer start firing 3 seconds after the program is started and
   1.141 +  fire every 25 milliseconds of program time use:
   1.142 +    <pre>
   1.143 +        setenv JPROF_FLAGS "JP_START JP_FIRST=3 JP_PERIOD=0.025" </pre>
   1.144 +
   1.145 +  <li>To make the timer start on your signal and fire every 1 millisecond of
   1.146 +  program time use:  
   1.147 +    <pre>
   1.148 +        setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.001" </pre>
   1.149 +
   1.150 +  <li>To make the timer start on your signal and fire every 10 milliseconds of
   1.151 +  wall-clock time use:  
   1.152 +    <pre>
   1.153 +        setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.010 JP_REALTIME" </pre>
   1.154 +
   1.155 +  <li>To make the timer start on your signal and fire at 8192 Hz in wall-clock
   1.156 +  time use:
   1.157 +    <pre>
   1.158 +        setenv JPROF_FLAGS "JP_DEFER JP_RTC_HZ=8192" </pre>
   1.159 +  
   1.160 +  <li>To make the timer start on JProfStartProfiling() and run continously
   1.161 +      with a 1ms sample rate until told to stop, then save the last 1MB of
   1.162 +      data:
   1.163 +    <pre>
   1.164 +        setenv JPROF_FLAGS "JP_DEFER JP_CIRCULAR=1048576 JP_PERIOD=0.001" </pre>
   1.165 +
   1.166 +</ul>
   1.167 +
   1.168 +<h4>Pausing profiles</h4>
   1.169 +
   1.170 +<P>jprof can be paused at any time by sending a SIGUSR1 to mozilla (<code>kill
   1.171 +-USR1</code>).  This will cause the timer signals to stop and jprof-map to be
   1.172 +written, but it will not close jprof-log.  Combining SIGUSR1 with the JP_DEFER
   1.173 +option allows profiling of one sequence of actions by starting the timer right
   1.174 +before starting the actions and stopping the timer right afterward.
   1.175 +
   1.176 +<P>After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO),
   1.177 +depending on the mode) can be used to continue writing data to the same
   1.178 +output.
   1.179 +
   1.180 +<P>SIGUSR2 will cause the circular buffer to be cleared, if it's in use.
   1.181 +This is useful right before running a test when you're using a large,
   1.182 +continuous circular buffer, or programmatically at the start of an action
   1.183 +which might take too long (JProfClearCircular()).
   1.184 +
   1.185 +<h4>Looking at the results</h4>
   1.186 +
   1.187 +Now that we have <code>jprof-log</code> and <code>jprof-map</code> files, we
   1.188 +can use the jprof executable is used to turn them into readable output.  To do
   1.189 +this jprof needs the name of the mozilla binary and the log file.  It deduces
   1.190 +the name of the map file:
   1.191 +
   1.192 +<pre>
   1.193 +  ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html
   1.194 +</pre>
   1.195 +
   1.196 +This will generate the file <code>tmp.html</code> which you should view in a
   1.197 +web browser.
   1.198 +
   1.199 +<pre>
   1.200 +  ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log*
   1.201 +</pre>
   1.202 +
   1.203 +This will generate a set of files in /tmp for each process.
   1.204 +
   1.205 +
   1.206 +<h3><a name="interpretation">Interpretation</a></h3>
   1.207 +
   1.208 +
   1.209 +The Jprof output is split into a flat portion and a hierarchical portion.
   1.210 +There are links to each section at the top of the page.  It is typically
   1.211 +easier to analyze the profile by starting with the flat output and following
   1.212 +the links contained in the flat output up to the hierarchical output.
   1.213 +
   1.214 +<h4><a name="flat">Flat output</a></h3>
   1.215 +
   1.216 +The flat portion of the profile indicates which functions were executing
   1.217 +when the timer was going off.  It is displayed as a list of functions names
   1.218 +on the right and the number of times that function was interrupted on the
   1.219 +left.  The list is sorted by decreasing interrupt count.  For example:
   1.220 +
   1.221 +<blockquote> <pre>
   1.222 +Total hit count: 151603
   1.223 +Count %Total  Function Name
   1.224 +
   1.225 +<a href="#23081">8806   5.8     __libc_poll</a>
   1.226 +<a href="#40008">2254   1.5     __i686.get_pc_thunk.bx</a>
   1.227 +<a href="#21390">2053   1.4     _int_malloc</a>
   1.228 +<a href="#49013">1777   1.2     nsStyleContext::GetStyleData(nsStyleStructID)</a>
   1.229 +<a href="#21380">1600   1.1     __libc_malloc</a>
   1.230 +<a href="#603">1552   1.0     nsCOMPtr_base::~nsCOMPtr_base()</a>
   1.231 +</pre> </blockquote>
   1.232 +
   1.233 +This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside nsStyleContext::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor.
   1.234 +
   1.235 +<p>
   1.236 +In general, the functions with the highest count are the functions which
   1.237 +are taking the most time.
   1.238 +
   1.239 +<P>
   1.240 +The function names are linked to the entry for that function in the
   1.241 +hierarchical profile, which is described in the next section.
   1.242 +
   1.243 +<h4><a name="hier">Hierarchical output</a></h4>
   1.244 +
   1.245 +The hierarchical output is divided up into sections, with each section
   1.246 +corresponding to one function.  A typical section looks something like
   1.247 +this:
   1.248 +
   1.249 +<blockquote><pre>
   1.250 + index  Count         Hits      Function Name
   1.251 +                      <A href="#72871">     545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*)</A>
   1.252 +                      <A href="#72873">     100 (8.5%)  nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&)</A>
   1.253 + 72870      4 (0.3%)  <a name=72870>     645 (54.9%)</a> <b>nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int)</b>
   1.254 +                      <A href="#72821">     545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*)</A>
   1.255 +                      <A href="#72853">      83 (7.1%)  nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*)</A>
   1.256 +                      <A href="#74150">       9 (0.8%)  nsLineLayout::BeginLineReflow(int, int, int, int, int, int)</A>
   1.257 +                      <A href="#74897">       1 (0.1%)  nsTextFrame::GetType() const</A>
   1.258 +                      <A href="#74131">       1 (0.1%)  nsLineLayout::RelativePositionFrames(nsOverflowAreas&)</A>
   1.259 +                      <A href="#58320">       1 (0.1%)  __i686.get_pc_thunk.bx</A>
   1.260 +                      <A href="#53077">       1 (0.1%)  PL_ArenaAllocate</A>
   1.261 +</pre></blockquote>
   1.262 +
   1.263 +The information this block tells us is:
   1.264 +
   1.265 +<ul>
   1.266 +<li>There were 4 profiler hits <em>in</em> <code>nsBlockFrame::DoReflowInlineFrames</code>
   1.267 +<li>There were 645 profiler hits <em>in or under</em> <code>nsBlockFrame::DoReflowInlineFrames</code>.  Of these:
   1.268 +<ul>
   1.269 +  <li>545 were in or under <code>nsBlockFrame::ReflowInlineFrame</code>
   1.270 +  <li>83 were in or under <code>nsBlockFrame::PlaceLine</code>
   1.271 +  <li>9 were in or under <code>nsLineLayout::BeginLineReflow</code>
   1.272 +  <li>1 was in or under <code>nsTextFrame::GetType</code>
   1.273 +  <li>1 was in or under <code>nsLineLayout::RelativePositionFrames</code>
   1.274 +  <li>1 was in or under <code>__i686.get_pc_thunk.bx</code>
   1.275 +  <li>1 was in or under <code>PL_ArenaAllocate</code>
   1.276 +</ul>
   1.277 +<li>Of these 645 calls into <code>nsBlockFrame::DoReflowInlineFrames</code>:
   1.278 +<ul>
   1.279 +  <li>545 came from <code>nsBlockFrame::ReflowInlineFrames</code>
   1.280 +  <li>100 came from <code>nsBlockFrame::ReflowDirtyLines</code>
   1.281 +</ul>
   1.282 +</ul>
   1.283 +
   1.284 +
   1.285 +The rest of this section explains how to read this information off from the jprof output.
   1.286 +
   1.287 +<p>This block corresponds to the function <code>nsBlockFrame::DoReflowInlineFrames</code>, which is
   1.288 +therefore bolded and not a link.  The name of this function is preceded by
   1.289 +five numbers which have the following meaning.  The number on the left (72870)
   1.290 +is the index number, and is not important.  The next number (4) and the
   1.291 +percentage following (0.3%) are the number
   1.292 +of times this function was interrupted by the timer and the percentage of
   1.293 +the total hits that is.  The last number pair ("645 (54.9%)")
   1.294 +are the number of times this function was in the call stack when the timer went
   1.295 +off.  That is, the timer went off while we were in code that was ultimately
   1.296 +called from <code>nsBlockFrame::DoReflowInlineFrames</code>.
   1.297 +<p>For our example we can see that our function was in the call stack for
   1.298 +645 interrupt ticks, but we were only the function that was running when
   1.299 +the interrupt arrived 4 times.
   1.300 +<P>
   1.301 +The functions listed above the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its
   1.302 +callers.  The numbers to the left of these function names are the numbers of
   1.303 +times these functions were in the call stack as callers of
   1.304 +<code>nsBlockFrame::DoReflowInlineFrames</code>.  In our example, we were called 545 times by
   1.305 +<code>nsBlockFrame::ReflowInlineFrames</code> and 100 times by
   1.306 +<code>nsBlockFrame::ReflowDirtyLines</code>.
   1.307 +<P>
   1.308 +The functions listed below the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its
   1.309 +callees.  The numbers to the left of the function names are the numbers of
   1.310 +times these functions were in the callstack as callees of
   1.311 +<code>nsBlockFrame::DoReflowInlineFrames</code> and the corresponding percentages. In our example, of the 645 profiler hits under <code>nsBlockFrame::DoReflowInlineFrames</code> 545 were under <code>nsBlockFrame::ReflowInlineFrame</code>, 83 were under <code>nsBlockFrame::PlaceLine</code>, and so forth.<p>
   1.312 +
   1.313 +<b>NOTE:</b> If there are loops of execution or recursion, the numbers will
   1.314 +not add up and percentages can exceed 100%.  If a function directly calls
   1.315 +itself "(self)" will be appended to the line, but indirect recursion will
   1.316 +not be marked.
   1.317 +
   1.318 +<h3>Bugs</h3>
   1.319 +The current build of Jprof has only been tested under Ubuntu 8.04 LTS, but
   1.320 +should work under any fairly modern linux distribution using GCC/GLIBC.
   1.321 +Please update this document with any known compatibilities/incompatibilities.
   1.322 +<p>
   1.323 +If you get an error:<p><code>Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
   1.324 +</code><p>that means you've hit a timing hole in the version of glibc you're
   1.325 +running.  See <a
   1.326 +href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=4578">Redhat bug 4578</a>.
   1.327 +<!-- <h3>Update</h3>
   1.328 +<ul>
   1.329 +</ul>
   1.330 +-->
   1.331 +
   1.332 +</body>
   1.333 +</html>

mercurial