tools/jprof/README.html

Wed, 31 Dec 2014 06:09:35 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
date
Wed, 31 Dec 2014 06:09:35 +0100
changeset 0
6474c204b198
permissions
-rw-r--r--

Cloned upstream origin tor-browser at tor-browser-31.3.0esr-4.5-1-build1
revision ID fc1c9ff7c1b2defdbc039f12214767608f46423f for hacking purpose.

michael@0 1 <!-- This Source Code Form is subject to the terms of the Mozilla Public
michael@0 2 - License, v. 2.0. If a copy of the MPL was not distributed with this
michael@0 3 - file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
michael@0 4
michael@0 5 <html>
michael@0 6 <head><title>The Jprof Profiler</title></head>
michael@0 7
michael@0 8 <body bgcolor="#FFFFFF" text="#000000"
michael@0 9 link="#0000EE" vlink="#551A8B" alink="#FF0000">
michael@0 10 <center>
michael@0 11 <h1>The Jprof Profiler</h1>
michael@0 12 <font size="-1">
michael@0 13 <a href="mailto:jim_nance%yahoo.com">jim_nance@yahoo.com</a><p>
michael@0 14 Recent (4/2011) updates Randell Jesup (see bugzilla for contact info)
michael@0 15 </font>
michael@0 16 <hr>
michael@0 17
michael@0 18 <a href="#introduction">Introduction</a> | <a href="#operation">Operation</a> |
michael@0 19 <a href="#setup">Setup</a> | <a href="#usage">Usage</a> |
michael@0 20 <a href="#interpretation">Interpretation</a>
michael@0 21
michael@0 22 </center>
michael@0 23 <hr>
michael@0 24
michael@0 25 <h3><a name="introduction">Introduction</a></h3>
michael@0 26
michael@0 27 Jprof is a profiling tool. I am writing it because I need to find out
michael@0 28 where mozilla is spending its time, and there do not seem to be any
michael@0 29 profilers for Linux that can handle threads and/or shared libraries.
michael@0 30 This code is based heavily on Kipp Hickman's leaky.
michael@0 31
michael@0 32 <h3><a name="operation">Operation</a></h3>
michael@0 33
michael@0 34 Jprof operates by installing a timer which periodically interrupts mozilla.
michael@0 35 When this timer goes off, the jprof code inside mozilla walks the function call
michael@0 36 stack to determine which code was executing and saves the results into the
michael@0 37 <code>jprof-log</code> and <code>jprof-map</code> files. By collecting a large
michael@0 38 number of these call stacks, it is possible to deduce where mozilla is spending
michael@0 39 its time.
michael@0 40
michael@0 41 <h3><a name="setup">Setup</a></h3>
michael@0 42
michael@0 43 <p>Configure your mozilla with jprof support by adding
michael@0 44 <code>--enable-jprof</code> to your configure options (eg adding
michael@0 45 <code>ac_add_options --enable-jprof</code> to your <code>.mozconfig</code>) and
michael@0 46 making sure that you do <strong>not</strong> have the
michael@0 47 <code>--enable-strip</code> configure option set -- jprof needs symbols to
michael@0 48 operate. On many architectures with GCC, you'll need to add
michael@0 49 <code>--enable-optimize="-O3 -fno-omit-frame-pointer"</code> or the
michael@0 50 equivalent to ensure frame pointer generation in the compiler you're using.</p>
michael@0 51
michael@0 52 <p>Finally, build mozilla with your new configuration. Now you can run jprof.</p>
michael@0 53
michael@0 54 <h3><a name="usage">Usage</a></h3>
michael@0 55 <pre> jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]</pre>
michael@0 56 Options:
michael@0 57 <ul>
michael@0 58 <li><b>-s depth</b> : Limit depth looked at from captured stack
michael@0 59 frames</li>
michael@0 60 <li><b>-v</b> : Output some information about the symbols, memory map, etc.</li>
michael@0 61 <li><b>-t or --threads</b> : Group output according to thread. May require external
michael@0 62 LD_PRELOAD library to help force sampling of spawned threads; jprof
michael@0 63 may capture the main thread only. See <a
michael@0 64 href="http://sam.zoy.org/writings/programming/gprof.html">gprof-helper</a>;
michael@0 65 it may need adaption for jprof.</li>
michael@0 66 <li><b>--only-thread id</b> : Only output data for thread 'id'</li>
michael@0 67 <li><b>-e exclusion</b> : Allows excluding specific stack frames</li>
michael@0 68 <li><b>-i inclusion</b> : Allows including specific stack frames</li>
michael@0 69 <li><b>--last</b> : Only process data from the last 'section' of sampling
michael@0 70 (starting at the last PROF)</li>
michael@0 71 <li><b>--start N</b> : Start processing data at 'section' N </li>
michael@0 72 <li><b>--end N</b> : Stop processing data at 'section' N </li>
michael@0 73 <li><b>--output-dir dir</b> : Store generated .html files in the given directory </li>
michael@0 74 </ul>
michael@0 75 The behavior of jprof is determined by the value of the JPROF_FLAGS environment
michael@0 76 variable. This environment variable can be composed of several substrings
michael@0 77 which have the following meanings:
michael@0 78 <ul>
michael@0 79 <li> <b>JP_START</b> : Install the signal handler, and start sending the
michael@0 80 timer signals.
michael@0 81
michael@0 82 <li> <b>JP_DEFER</b> : Install the signal handler, but don't start sending
michael@0 83 the timer signals. The user must start the signals by sending the first
michael@0 84 one (with <code>kill -PROF</code>, or with <code>kill -ALRM</code> if
michael@0 85 JP_REALTIME is used, or with <code>kill -POLL</code> (also known as <code>kill -IO</code>) if JP_RTC_HZ is used).
michael@0 86
michael@0 87 <li> <b>JP_FIRST=x</b> : Wait x seconds before starting the timer
michael@0 88
michael@0 89 <li> <b>JP_PERIOD=y</b> : Set timer to interrupt every y seconds. Only
michael@0 90 values of y greater than or equal to 0.001 are supported. Default is
michael@0 91 0.050 (50ms).
michael@0 92
michael@0 93 <li> <b>JP_REALTIME</b> : Do the profiling in intervals of real time rather
michael@0 94 than intervals of time used by the mozilla process (and the kernel
michael@0 95 when doing work for mozilla). This could probably lead to weird
michael@0 96 results (you'll see whatever runs when mozilla is waiting for events),
michael@0 97 but is needed to see time spent in the X server.
michael@0 98
michael@0 99 <li> <b>JP_RTC_HZ=freq</b> : This option, only available on Linux if the
michael@0 100 kernel is built with RTC support, makes jprof use the RTC timer instead of
michael@0 101 using its own timer. This option, like JP_REALTIME, uses intervals of real
michael@0 102 time. This option overrides JP_PERIOD. <code>freq</code> is the frequency
michael@0 103 at which the timer should fire, measured in Hz. It must be a power of 2.
michael@0 104 The maximal frequency allowed by the kernel can be changed by writing to
michael@0 105 <code>/proc/sys/dev/rtc/max-user-freq</code>; the maximum value it can be
michael@0 106 set to is 8192. Note that <code>/dev/rtc</code> will need to be readable
michael@0 107 by the Firefox process; making that file world-readable is a simple way to
michael@0 108 accomplish that.
michael@0 109
michael@0 110 <li> <b>JP_CIRCULAR=size</b> : This tells jprof to store samples in a
michael@0 111 circular buffer of the given size, which then will be saved (appended)
michael@0 112 to disk when SIGUSR1 is received or JProfStopProfiling is done. If the
michael@0 113 buffer overflows, the oldest entries will be evicted until there's
michael@0 114 space for the new entry.<p>
michael@0 115
michael@0 116 SIGUSR2 will cause the circular buffer to be cleared.
michael@0 117
michael@0 118 <li> <b>JP_FILENAME=basefilename</b> : This is the filename used for
michael@0 119 saving the log files to; the default is "jprof-log". If Electrolysis
michael@0 120 is used, each process after the first will have the process ID
michael@0 121 added ("jprof-log-3212");
michael@0 122
michael@0 123 </ul>
michael@0 124
michael@0 125 <h4>Starting and stopping jprof from JavaScript</h4>
michael@0 126 <p>
michael@0 127 A build with jprof enabled adds four functions to the Window object:<p>
michael@0 128 <code>JProfStartProfiling()</code> and <code>JProfStopProfiling()</code>: When used with JP_DEFER, these
michael@0 129 allow one to start and stop the timer just around whatever critical section is
michael@0 130 being profiled.</p><p>
michael@0 131 <code>JProfClearCircular()</code> and <code>JProfSaveCircular()</code>:
michael@0 132 These clear the circular buffer and save the buffer (without stopping), respectively.</p>
michael@0 133
michael@0 134 <h4>Examples of JPROF_FLAGS usage</h4>
michael@0 135 <ul>
michael@0 136
michael@0 137 <li>To make the timer start firing 3 seconds after the program is started and
michael@0 138 fire every 25 milliseconds of program time use:
michael@0 139 <pre>
michael@0 140 setenv JPROF_FLAGS "JP_START JP_FIRST=3 JP_PERIOD=0.025" </pre>
michael@0 141
michael@0 142 <li>To make the timer start on your signal and fire every 1 millisecond of
michael@0 143 program time use:
michael@0 144 <pre>
michael@0 145 setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.001" </pre>
michael@0 146
michael@0 147 <li>To make the timer start on your signal and fire every 10 milliseconds of
michael@0 148 wall-clock time use:
michael@0 149 <pre>
michael@0 150 setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.010 JP_REALTIME" </pre>
michael@0 151
michael@0 152 <li>To make the timer start on your signal and fire at 8192 Hz in wall-clock
michael@0 153 time use:
michael@0 154 <pre>
michael@0 155 setenv JPROF_FLAGS "JP_DEFER JP_RTC_HZ=8192" </pre>
michael@0 156
michael@0 157 <li>To make the timer start on JProfStartProfiling() and run continously
michael@0 158 with a 1ms sample rate until told to stop, then save the last 1MB of
michael@0 159 data:
michael@0 160 <pre>
michael@0 161 setenv JPROF_FLAGS "JP_DEFER JP_CIRCULAR=1048576 JP_PERIOD=0.001" </pre>
michael@0 162
michael@0 163 </ul>
michael@0 164
michael@0 165 <h4>Pausing profiles</h4>
michael@0 166
michael@0 167 <P>jprof can be paused at any time by sending a SIGUSR1 to mozilla (<code>kill
michael@0 168 -USR1</code>). This will cause the timer signals to stop and jprof-map to be
michael@0 169 written, but it will not close jprof-log. Combining SIGUSR1 with the JP_DEFER
michael@0 170 option allows profiling of one sequence of actions by starting the timer right
michael@0 171 before starting the actions and stopping the timer right afterward.
michael@0 172
michael@0 173 <P>After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO),
michael@0 174 depending on the mode) can be used to continue writing data to the same
michael@0 175 output.
michael@0 176
michael@0 177 <P>SIGUSR2 will cause the circular buffer to be cleared, if it's in use.
michael@0 178 This is useful right before running a test when you're using a large,
michael@0 179 continuous circular buffer, or programmatically at the start of an action
michael@0 180 which might take too long (JProfClearCircular()).
michael@0 181
michael@0 182 <h4>Looking at the results</h4>
michael@0 183
michael@0 184 Now that we have <code>jprof-log</code> and <code>jprof-map</code> files, we
michael@0 185 can use the jprof executable is used to turn them into readable output. To do
michael@0 186 this jprof needs the name of the mozilla binary and the log file. It deduces
michael@0 187 the name of the map file:
michael@0 188
michael@0 189 <pre>
michael@0 190 ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html
michael@0 191 </pre>
michael@0 192
michael@0 193 This will generate the file <code>tmp.html</code> which you should view in a
michael@0 194 web browser.
michael@0 195
michael@0 196 <pre>
michael@0 197 ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log*
michael@0 198 </pre>
michael@0 199
michael@0 200 This will generate a set of files in /tmp for each process.
michael@0 201
michael@0 202
michael@0 203 <h3><a name="interpretation">Interpretation</a></h3>
michael@0 204
michael@0 205
michael@0 206 The Jprof output is split into a flat portion and a hierarchical portion.
michael@0 207 There are links to each section at the top of the page. It is typically
michael@0 208 easier to analyze the profile by starting with the flat output and following
michael@0 209 the links contained in the flat output up to the hierarchical output.
michael@0 210
michael@0 211 <h4><a name="flat">Flat output</a></h3>
michael@0 212
michael@0 213 The flat portion of the profile indicates which functions were executing
michael@0 214 when the timer was going off. It is displayed as a list of functions names
michael@0 215 on the right and the number of times that function was interrupted on the
michael@0 216 left. The list is sorted by decreasing interrupt count. For example:
michael@0 217
michael@0 218 <blockquote> <pre>
michael@0 219 Total hit count: 151603
michael@0 220 Count %Total Function Name
michael@0 221
michael@0 222 <a href="#23081">8806 5.8 __libc_poll</a>
michael@0 223 <a href="#40008">2254 1.5 __i686.get_pc_thunk.bx</a>
michael@0 224 <a href="#21390">2053 1.4 _int_malloc</a>
michael@0 225 <a href="#49013">1777 1.2 nsStyleContext::GetStyleData(nsStyleStructID)</a>
michael@0 226 <a href="#21380">1600 1.1 __libc_malloc</a>
michael@0 227 <a href="#603">1552 1.0 nsCOMPtr_base::~nsCOMPtr_base()</a>
michael@0 228 </pre> </blockquote>
michael@0 229
michael@0 230 This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside nsStyleContext::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor.
michael@0 231
michael@0 232 <p>
michael@0 233 In general, the functions with the highest count are the functions which
michael@0 234 are taking the most time.
michael@0 235
michael@0 236 <P>
michael@0 237 The function names are linked to the entry for that function in the
michael@0 238 hierarchical profile, which is described in the next section.
michael@0 239
michael@0 240 <h4><a name="hier">Hierarchical output</a></h4>
michael@0 241
michael@0 242 The hierarchical output is divided up into sections, with each section
michael@0 243 corresponding to one function. A typical section looks something like
michael@0 244 this:
michael@0 245
michael@0 246 <blockquote><pre>
michael@0 247 index Count Hits Function Name
michael@0 248 <A href="#72871"> 545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*)</A>
michael@0 249 <A href="#72873"> 100 (8.5%) nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&)</A>
michael@0 250 72870 4 (0.3%) <a name=72870> 645 (54.9%)</a> <b>nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int)</b>
michael@0 251 <A href="#72821"> 545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*)</A>
michael@0 252 <A href="#72853"> 83 (7.1%) nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*)</A>
michael@0 253 <A href="#74150"> 9 (0.8%) nsLineLayout::BeginLineReflow(int, int, int, int, int, int)</A>
michael@0 254 <A href="#74897"> 1 (0.1%) nsTextFrame::GetType() const</A>
michael@0 255 <A href="#74131"> 1 (0.1%) nsLineLayout::RelativePositionFrames(nsOverflowAreas&)</A>
michael@0 256 <A href="#58320"> 1 (0.1%) __i686.get_pc_thunk.bx</A>
michael@0 257 <A href="#53077"> 1 (0.1%) PL_ArenaAllocate</A>
michael@0 258 </pre></blockquote>
michael@0 259
michael@0 260 The information this block tells us is:
michael@0 261
michael@0 262 <ul>
michael@0 263 <li>There were 4 profiler hits <em>in</em> <code>nsBlockFrame::DoReflowInlineFrames</code>
michael@0 264 <li>There were 645 profiler hits <em>in or under</em> <code>nsBlockFrame::DoReflowInlineFrames</code>. Of these:
michael@0 265 <ul>
michael@0 266 <li>545 were in or under <code>nsBlockFrame::ReflowInlineFrame</code>
michael@0 267 <li>83 were in or under <code>nsBlockFrame::PlaceLine</code>
michael@0 268 <li>9 were in or under <code>nsLineLayout::BeginLineReflow</code>
michael@0 269 <li>1 was in or under <code>nsTextFrame::GetType</code>
michael@0 270 <li>1 was in or under <code>nsLineLayout::RelativePositionFrames</code>
michael@0 271 <li>1 was in or under <code>__i686.get_pc_thunk.bx</code>
michael@0 272 <li>1 was in or under <code>PL_ArenaAllocate</code>
michael@0 273 </ul>
michael@0 274 <li>Of these 645 calls into <code>nsBlockFrame::DoReflowInlineFrames</code>:
michael@0 275 <ul>
michael@0 276 <li>545 came from <code>nsBlockFrame::ReflowInlineFrames</code>
michael@0 277 <li>100 came from <code>nsBlockFrame::ReflowDirtyLines</code>
michael@0 278 </ul>
michael@0 279 </ul>
michael@0 280
michael@0 281
michael@0 282 The rest of this section explains how to read this information off from the jprof output.
michael@0 283
michael@0 284 <p>This block corresponds to the function <code>nsBlockFrame::DoReflowInlineFrames</code>, which is
michael@0 285 therefore bolded and not a link. The name of this function is preceded by
michael@0 286 five numbers which have the following meaning. The number on the left (72870)
michael@0 287 is the index number, and is not important. The next number (4) and the
michael@0 288 percentage following (0.3%) are the number
michael@0 289 of times this function was interrupted by the timer and the percentage of
michael@0 290 the total hits that is. The last number pair ("645 (54.9%)")
michael@0 291 are the number of times this function was in the call stack when the timer went
michael@0 292 off. That is, the timer went off while we were in code that was ultimately
michael@0 293 called from <code>nsBlockFrame::DoReflowInlineFrames</code>.
michael@0 294 <p>For our example we can see that our function was in the call stack for
michael@0 295 645 interrupt ticks, but we were only the function that was running when
michael@0 296 the interrupt arrived 4 times.
michael@0 297 <P>
michael@0 298 The functions listed above the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its
michael@0 299 callers. The numbers to the left of these function names are the numbers of
michael@0 300 times these functions were in the call stack as callers of
michael@0 301 <code>nsBlockFrame::DoReflowInlineFrames</code>. In our example, we were called 545 times by
michael@0 302 <code>nsBlockFrame::ReflowInlineFrames</code> and 100 times by
michael@0 303 <code>nsBlockFrame::ReflowDirtyLines</code>.
michael@0 304 <P>
michael@0 305 The functions listed below the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its
michael@0 306 callees. The numbers to the left of the function names are the numbers of
michael@0 307 times these functions were in the callstack as callees of
michael@0 308 <code>nsBlockFrame::DoReflowInlineFrames</code> and the corresponding percentages. In our example, of the 645 profiler hits under <code>nsBlockFrame::DoReflowInlineFrames</code> 545 were under <code>nsBlockFrame::ReflowInlineFrame</code>, 83 were under <code>nsBlockFrame::PlaceLine</code>, and so forth.<p>
michael@0 309
michael@0 310 <b>NOTE:</b> If there are loops of execution or recursion, the numbers will
michael@0 311 not add up and percentages can exceed 100%. If a function directly calls
michael@0 312 itself "(self)" will be appended to the line, but indirect recursion will
michael@0 313 not be marked.
michael@0 314
michael@0 315 <h3>Bugs</h3>
michael@0 316 The current build of Jprof has only been tested under Ubuntu 8.04 LTS, but
michael@0 317 should work under any fairly modern linux distribution using GCC/GLIBC.
michael@0 318 Please update this document with any known compatibilities/incompatibilities.
michael@0 319 <p>
michael@0 320 If you get an error:<p><code>Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
michael@0 321 </code><p>that means you've hit a timing hole in the version of glibc you're
michael@0 322 running. See <a
michael@0 323 href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=4578">Redhat bug 4578</a>.
michael@0 324 <!-- <h3>Update</h3>
michael@0 325 <ul>
michael@0 326 </ul>
michael@0 327 -->
michael@0 328
michael@0 329 </body>
michael@0 330 </html>

mercurial