1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/tools/jprof/README.html Wed Dec 31 06:09:35 2014 +0100 1.3 @@ -0,0 +1,330 @@ 1.4 +<!-- This Source Code Form is subject to the terms of the Mozilla Public 1.5 + - License, v. 2.0. If a copy of the MPL was not distributed with this 1.6 + - file, You can obtain one at http://mozilla.org/MPL/2.0/. --> 1.7 + 1.8 +<html> 1.9 +<head><title>The Jprof Profiler</title></head> 1.10 + 1.11 +<body bgcolor="#FFFFFF" text="#000000" 1.12 + link="#0000EE" vlink="#551A8B" alink="#FF0000"> 1.13 +<center> 1.14 +<h1>The Jprof Profiler</h1> 1.15 +<font size="-1"> 1.16 +<a href="mailto:jim_nance%yahoo.com">jim_nance@yahoo.com</a><p> 1.17 +Recent (4/2011) updates Randell Jesup (see bugzilla for contact info) 1.18 +</font> 1.19 +<hr> 1.20 + 1.21 +<a href="#introduction">Introduction</a> | <a href="#operation">Operation</a> | 1.22 +<a href="#setup">Setup</a> | <a href="#usage">Usage</a> | 1.23 +<a href="#interpretation">Interpretation</a> 1.24 + 1.25 +</center> 1.26 +<hr> 1.27 + 1.28 +<h3><a name="introduction">Introduction</a></h3> 1.29 + 1.30 +Jprof is a profiling tool. I am writing it because I need to find out 1.31 +where mozilla is spending its time, and there do not seem to be any 1.32 +profilers for Linux that can handle threads and/or shared libraries. 1.33 +This code is based heavily on Kipp Hickman's leaky. 1.34 + 1.35 +<h3><a name="operation">Operation</a></h3> 1.36 + 1.37 +Jprof operates by installing a timer which periodically interrupts mozilla. 1.38 +When this timer goes off, the jprof code inside mozilla walks the function call 1.39 +stack to determine which code was executing and saves the results into the 1.40 +<code>jprof-log</code> and <code>jprof-map</code> files. By collecting a large 1.41 +number of these call stacks, it is possible to deduce where mozilla is spending 1.42 +its time. 1.43 + 1.44 +<h3><a name="setup">Setup</a></h3> 1.45 + 1.46 +<p>Configure your mozilla with jprof support by adding 1.47 +<code>--enable-jprof</code> to your configure options (eg adding 1.48 +<code>ac_add_options --enable-jprof</code> to your <code>.mozconfig</code>) and 1.49 +making sure that you do <strong>not</strong> have the 1.50 +<code>--enable-strip</code> configure option set -- jprof needs symbols to 1.51 +operate. On many architectures with GCC, you'll need to add 1.52 +<code>--enable-optimize="-O3 -fno-omit-frame-pointer"</code> or the 1.53 +equivalent to ensure frame pointer generation in the compiler you're using.</p> 1.54 + 1.55 +<p>Finally, build mozilla with your new configuration. Now you can run jprof.</p> 1.56 + 1.57 +<h3><a name="usage">Usage</a></h3> 1.58 +<pre> jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]</pre> 1.59 +Options: 1.60 +<ul> 1.61 + <li><b>-s depth</b> : Limit depth looked at from captured stack 1.62 + frames</li> 1.63 + <li><b>-v</b> : Output some information about the symbols, memory map, etc.</li> 1.64 + <li><b>-t or --threads</b> : Group output according to thread. May require external 1.65 + LD_PRELOAD library to help force sampling of spawned threads; jprof 1.66 + may capture the main thread only. See <a 1.67 + href="http://sam.zoy.org/writings/programming/gprof.html">gprof-helper</a>; 1.68 + it may need adaption for jprof.</li> 1.69 + <li><b>--only-thread id</b> : Only output data for thread 'id'</li> 1.70 + <li><b>-e exclusion</b> : Allows excluding specific stack frames</li> 1.71 + <li><b>-i inclusion</b> : Allows including specific stack frames</li> 1.72 + <li><b>--last</b> : Only process data from the last 'section' of sampling 1.73 + (starting at the last PROF)</li> 1.74 + <li><b>--start N</b> : Start processing data at 'section' N </li> 1.75 + <li><b>--end N</b> : Stop processing data at 'section' N </li> 1.76 + <li><b>--output-dir dir</b> : Store generated .html files in the given directory </li> 1.77 +</ul> 1.78 +The behavior of jprof is determined by the value of the JPROF_FLAGS environment 1.79 +variable. This environment variable can be composed of several substrings 1.80 +which have the following meanings: 1.81 +<ul> 1.82 + <li> <b>JP_START</b> : Install the signal handler, and start sending the 1.83 + timer signals. 1.84 + 1.85 + <li> <b>JP_DEFER</b> : Install the signal handler, but don't start sending 1.86 + the timer signals. The user must start the signals by sending the first 1.87 + one (with <code>kill -PROF</code>, or with <code>kill -ALRM</code> if 1.88 + JP_REALTIME is used, or with <code>kill -POLL</code> (also known as <code>kill -IO</code>) if JP_RTC_HZ is used). 1.89 + 1.90 + <li> <b>JP_FIRST=x</b> : Wait x seconds before starting the timer 1.91 + 1.92 + <li> <b>JP_PERIOD=y</b> : Set timer to interrupt every y seconds. Only 1.93 + values of y greater than or equal to 0.001 are supported. Default is 1.94 + 0.050 (50ms). 1.95 + 1.96 + <li> <b>JP_REALTIME</b> : Do the profiling in intervals of real time rather 1.97 + than intervals of time used by the mozilla process (and the kernel 1.98 + when doing work for mozilla). This could probably lead to weird 1.99 + results (you'll see whatever runs when mozilla is waiting for events), 1.100 + but is needed to see time spent in the X server. 1.101 + 1.102 + <li> <b>JP_RTC_HZ=freq</b> : This option, only available on Linux if the 1.103 + kernel is built with RTC support, makes jprof use the RTC timer instead of 1.104 + using its own timer. This option, like JP_REALTIME, uses intervals of real 1.105 + time. This option overrides JP_PERIOD. <code>freq</code> is the frequency 1.106 + at which the timer should fire, measured in Hz. It must be a power of 2. 1.107 + The maximal frequency allowed by the kernel can be changed by writing to 1.108 + <code>/proc/sys/dev/rtc/max-user-freq</code>; the maximum value it can be 1.109 + set to is 8192. Note that <code>/dev/rtc</code> will need to be readable 1.110 + by the Firefox process; making that file world-readable is a simple way to 1.111 + accomplish that. 1.112 + 1.113 + <li> <b>JP_CIRCULAR=size</b> : This tells jprof to store samples in a 1.114 + circular buffer of the given size, which then will be saved (appended) 1.115 + to disk when SIGUSR1 is received or JProfStopProfiling is done. If the 1.116 + buffer overflows, the oldest entries will be evicted until there's 1.117 + space for the new entry.<p> 1.118 + 1.119 + SIGUSR2 will cause the circular buffer to be cleared. 1.120 + 1.121 + <li> <b>JP_FILENAME=basefilename</b> : This is the filename used for 1.122 + saving the log files to; the default is "jprof-log". If Electrolysis 1.123 + is used, each process after the first will have the process ID 1.124 + added ("jprof-log-3212"); 1.125 + 1.126 +</ul> 1.127 + 1.128 +<h4>Starting and stopping jprof from JavaScript</h4> 1.129 +<p> 1.130 +A build with jprof enabled adds four functions to the Window object:<p> 1.131 +<code>JProfStartProfiling()</code> and <code>JProfStopProfiling()</code>: When used with JP_DEFER, these 1.132 +allow one to start and stop the timer just around whatever critical section is 1.133 +being profiled.</p><p> 1.134 +<code>JProfClearCircular()</code> and <code>JProfSaveCircular()</code>: 1.135 +These clear the circular buffer and save the buffer (without stopping), respectively.</p> 1.136 + 1.137 +<h4>Examples of JPROF_FLAGS usage</h4> 1.138 +<ul> 1.139 + 1.140 + <li>To make the timer start firing 3 seconds after the program is started and 1.141 + fire every 25 milliseconds of program time use: 1.142 + <pre> 1.143 + setenv JPROF_FLAGS "JP_START JP_FIRST=3 JP_PERIOD=0.025" </pre> 1.144 + 1.145 + <li>To make the timer start on your signal and fire every 1 millisecond of 1.146 + program time use: 1.147 + <pre> 1.148 + setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.001" </pre> 1.149 + 1.150 + <li>To make the timer start on your signal and fire every 10 milliseconds of 1.151 + wall-clock time use: 1.152 + <pre> 1.153 + setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.010 JP_REALTIME" </pre> 1.154 + 1.155 + <li>To make the timer start on your signal and fire at 8192 Hz in wall-clock 1.156 + time use: 1.157 + <pre> 1.158 + setenv JPROF_FLAGS "JP_DEFER JP_RTC_HZ=8192" </pre> 1.159 + 1.160 + <li>To make the timer start on JProfStartProfiling() and run continously 1.161 + with a 1ms sample rate until told to stop, then save the last 1MB of 1.162 + data: 1.163 + <pre> 1.164 + setenv JPROF_FLAGS "JP_DEFER JP_CIRCULAR=1048576 JP_PERIOD=0.001" </pre> 1.165 + 1.166 +</ul> 1.167 + 1.168 +<h4>Pausing profiles</h4> 1.169 + 1.170 +<P>jprof can be paused at any time by sending a SIGUSR1 to mozilla (<code>kill 1.171 +-USR1</code>). This will cause the timer signals to stop and jprof-map to be 1.172 +written, but it will not close jprof-log. Combining SIGUSR1 with the JP_DEFER 1.173 +option allows profiling of one sequence of actions by starting the timer right 1.174 +before starting the actions and stopping the timer right afterward. 1.175 + 1.176 +<P>After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO), 1.177 +depending on the mode) can be used to continue writing data to the same 1.178 +output. 1.179 + 1.180 +<P>SIGUSR2 will cause the circular buffer to be cleared, if it's in use. 1.181 +This is useful right before running a test when you're using a large, 1.182 +continuous circular buffer, or programmatically at the start of an action 1.183 +which might take too long (JProfClearCircular()). 1.184 + 1.185 +<h4>Looking at the results</h4> 1.186 + 1.187 +Now that we have <code>jprof-log</code> and <code>jprof-map</code> files, we 1.188 +can use the jprof executable is used to turn them into readable output. To do 1.189 +this jprof needs the name of the mozilla binary and the log file. It deduces 1.190 +the name of the map file: 1.191 + 1.192 +<pre> 1.193 + ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html 1.194 +</pre> 1.195 + 1.196 +This will generate the file <code>tmp.html</code> which you should view in a 1.197 +web browser. 1.198 + 1.199 +<pre> 1.200 + ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log* 1.201 +</pre> 1.202 + 1.203 +This will generate a set of files in /tmp for each process. 1.204 + 1.205 + 1.206 +<h3><a name="interpretation">Interpretation</a></h3> 1.207 + 1.208 + 1.209 +The Jprof output is split into a flat portion and a hierarchical portion. 1.210 +There are links to each section at the top of the page. It is typically 1.211 +easier to analyze the profile by starting with the flat output and following 1.212 +the links contained in the flat output up to the hierarchical output. 1.213 + 1.214 +<h4><a name="flat">Flat output</a></h3> 1.215 + 1.216 +The flat portion of the profile indicates which functions were executing 1.217 +when the timer was going off. It is displayed as a list of functions names 1.218 +on the right and the number of times that function was interrupted on the 1.219 +left. The list is sorted by decreasing interrupt count. For example: 1.220 + 1.221 +<blockquote> <pre> 1.222 +Total hit count: 151603 1.223 +Count %Total Function Name 1.224 + 1.225 +<a href="#23081">8806 5.8 __libc_poll</a> 1.226 +<a href="#40008">2254 1.5 __i686.get_pc_thunk.bx</a> 1.227 +<a href="#21390">2053 1.4 _int_malloc</a> 1.228 +<a href="#49013">1777 1.2 nsStyleContext::GetStyleData(nsStyleStructID)</a> 1.229 +<a href="#21380">1600 1.1 __libc_malloc</a> 1.230 +<a href="#603">1552 1.0 nsCOMPtr_base::~nsCOMPtr_base()</a> 1.231 +</pre> </blockquote> 1.232 + 1.233 +This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside nsStyleContext::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor. 1.234 + 1.235 +<p> 1.236 +In general, the functions with the highest count are the functions which 1.237 +are taking the most time. 1.238 + 1.239 +<P> 1.240 +The function names are linked to the entry for that function in the 1.241 +hierarchical profile, which is described in the next section. 1.242 + 1.243 +<h4><a name="hier">Hierarchical output</a></h4> 1.244 + 1.245 +The hierarchical output is divided up into sections, with each section 1.246 +corresponding to one function. A typical section looks something like 1.247 +this: 1.248 + 1.249 +<blockquote><pre> 1.250 + index Count Hits Function Name 1.251 + <A href="#72871"> 545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*)</A> 1.252 + <A href="#72873"> 100 (8.5%) nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&)</A> 1.253 + 72870 4 (0.3%) <a name=72870> 645 (54.9%)</a> <b>nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int)</b> 1.254 + <A href="#72821"> 545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*)</A> 1.255 + <A href="#72853"> 83 (7.1%) nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*)</A> 1.256 + <A href="#74150"> 9 (0.8%) nsLineLayout::BeginLineReflow(int, int, int, int, int, int)</A> 1.257 + <A href="#74897"> 1 (0.1%) nsTextFrame::GetType() const</A> 1.258 + <A href="#74131"> 1 (0.1%) nsLineLayout::RelativePositionFrames(nsOverflowAreas&)</A> 1.259 + <A href="#58320"> 1 (0.1%) __i686.get_pc_thunk.bx</A> 1.260 + <A href="#53077"> 1 (0.1%) PL_ArenaAllocate</A> 1.261 +</pre></blockquote> 1.262 + 1.263 +The information this block tells us is: 1.264 + 1.265 +<ul> 1.266 +<li>There were 4 profiler hits <em>in</em> <code>nsBlockFrame::DoReflowInlineFrames</code> 1.267 +<li>There were 645 profiler hits <em>in or under</em> <code>nsBlockFrame::DoReflowInlineFrames</code>. Of these: 1.268 +<ul> 1.269 + <li>545 were in or under <code>nsBlockFrame::ReflowInlineFrame</code> 1.270 + <li>83 were in or under <code>nsBlockFrame::PlaceLine</code> 1.271 + <li>9 were in or under <code>nsLineLayout::BeginLineReflow</code> 1.272 + <li>1 was in or under <code>nsTextFrame::GetType</code> 1.273 + <li>1 was in or under <code>nsLineLayout::RelativePositionFrames</code> 1.274 + <li>1 was in or under <code>__i686.get_pc_thunk.bx</code> 1.275 + <li>1 was in or under <code>PL_ArenaAllocate</code> 1.276 +</ul> 1.277 +<li>Of these 645 calls into <code>nsBlockFrame::DoReflowInlineFrames</code>: 1.278 +<ul> 1.279 + <li>545 came from <code>nsBlockFrame::ReflowInlineFrames</code> 1.280 + <li>100 came from <code>nsBlockFrame::ReflowDirtyLines</code> 1.281 +</ul> 1.282 +</ul> 1.283 + 1.284 + 1.285 +The rest of this section explains how to read this information off from the jprof output. 1.286 + 1.287 +<p>This block corresponds to the function <code>nsBlockFrame::DoReflowInlineFrames</code>, which is 1.288 +therefore bolded and not a link. The name of this function is preceded by 1.289 +five numbers which have the following meaning. The number on the left (72870) 1.290 +is the index number, and is not important. The next number (4) and the 1.291 +percentage following (0.3%) are the number 1.292 +of times this function was interrupted by the timer and the percentage of 1.293 +the total hits that is. The last number pair ("645 (54.9%)") 1.294 +are the number of times this function was in the call stack when the timer went 1.295 +off. That is, the timer went off while we were in code that was ultimately 1.296 +called from <code>nsBlockFrame::DoReflowInlineFrames</code>. 1.297 +<p>For our example we can see that our function was in the call stack for 1.298 +645 interrupt ticks, but we were only the function that was running when 1.299 +the interrupt arrived 4 times. 1.300 +<P> 1.301 +The functions listed above the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its 1.302 +callers. The numbers to the left of these function names are the numbers of 1.303 +times these functions were in the call stack as callers of 1.304 +<code>nsBlockFrame::DoReflowInlineFrames</code>. In our example, we were called 545 times by 1.305 +<code>nsBlockFrame::ReflowInlineFrames</code> and 100 times by 1.306 +<code>nsBlockFrame::ReflowDirtyLines</code>. 1.307 +<P> 1.308 +The functions listed below the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its 1.309 +callees. The numbers to the left of the function names are the numbers of 1.310 +times these functions were in the callstack as callees of 1.311 +<code>nsBlockFrame::DoReflowInlineFrames</code> and the corresponding percentages. In our example, of the 645 profiler hits under <code>nsBlockFrame::DoReflowInlineFrames</code> 545 were under <code>nsBlockFrame::ReflowInlineFrame</code>, 83 were under <code>nsBlockFrame::PlaceLine</code>, and so forth.<p> 1.312 + 1.313 +<b>NOTE:</b> If there are loops of execution or recursion, the numbers will 1.314 +not add up and percentages can exceed 100%. If a function directly calls 1.315 +itself "(self)" will be appended to the line, but indirect recursion will 1.316 +not be marked. 1.317 + 1.318 +<h3>Bugs</h3> 1.319 +The current build of Jprof has only been tested under Ubuntu 8.04 LTS, but 1.320 +should work under any fairly modern linux distribution using GCC/GLIBC. 1.321 +Please update this document with any known compatibilities/incompatibilities. 1.322 +<p> 1.323 +If you get an error:<p><code>Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed! 1.324 +</code><p>that means you've hit a timing hole in the version of glibc you're 1.325 +running. See <a 1.326 +href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=4578">Redhat bug 4578</a>. 1.327 +<!-- <h3>Update</h3> 1.328 +<ul> 1.329 +</ul> 1.330 +--> 1.331 + 1.332 +</body> 1.333 +</html>