|
1 <!-- This Source Code Form is subject to the terms of the Mozilla Public |
|
2 - License, v. 2.0. If a copy of the MPL was not distributed with this |
|
3 - file, You can obtain one at http://mozilla.org/MPL/2.0/. --> |
|
4 |
|
5 <html> |
|
6 <head><title>The Jprof Profiler</title></head> |
|
7 |
|
8 <body bgcolor="#FFFFFF" text="#000000" |
|
9 link="#0000EE" vlink="#551A8B" alink="#FF0000"> |
|
10 <center> |
|
11 <h1>The Jprof Profiler</h1> |
|
12 <font size="-1"> |
|
13 <a href="mailto:jim_nance%yahoo.com">jim_nance@yahoo.com</a><p> |
|
14 Recent (4/2011) updates Randell Jesup (see bugzilla for contact info) |
|
15 </font> |
|
16 <hr> |
|
17 |
|
18 <a href="#introduction">Introduction</a> | <a href="#operation">Operation</a> | |
|
19 <a href="#setup">Setup</a> | <a href="#usage">Usage</a> | |
|
20 <a href="#interpretation">Interpretation</a> |
|
21 |
|
22 </center> |
|
23 <hr> |
|
24 |
|
25 <h3><a name="introduction">Introduction</a></h3> |
|
26 |
|
27 Jprof is a profiling tool. I am writing it because I need to find out |
|
28 where mozilla is spending its time, and there do not seem to be any |
|
29 profilers for Linux that can handle threads and/or shared libraries. |
|
30 This code is based heavily on Kipp Hickman's leaky. |
|
31 |
|
32 <h3><a name="operation">Operation</a></h3> |
|
33 |
|
34 Jprof operates by installing a timer which periodically interrupts mozilla. |
|
35 When this timer goes off, the jprof code inside mozilla walks the function call |
|
36 stack to determine which code was executing and saves the results into the |
|
37 <code>jprof-log</code> and <code>jprof-map</code> files. By collecting a large |
|
38 number of these call stacks, it is possible to deduce where mozilla is spending |
|
39 its time. |
|
40 |
|
41 <h3><a name="setup">Setup</a></h3> |
|
42 |
|
43 <p>Configure your mozilla with jprof support by adding |
|
44 <code>--enable-jprof</code> to your configure options (eg adding |
|
45 <code>ac_add_options --enable-jprof</code> to your <code>.mozconfig</code>) and |
|
46 making sure that you do <strong>not</strong> have the |
|
47 <code>--enable-strip</code> configure option set -- jprof needs symbols to |
|
48 operate. On many architectures with GCC, you'll need to add |
|
49 <code>--enable-optimize="-O3 -fno-omit-frame-pointer"</code> or the |
|
50 equivalent to ensure frame pointer generation in the compiler you're using.</p> |
|
51 |
|
52 <p>Finally, build mozilla with your new configuration. Now you can run jprof.</p> |
|
53 |
|
54 <h3><a name="usage">Usage</a></h3> |
|
55 <pre> jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]</pre> |
|
56 Options: |
|
57 <ul> |
|
58 <li><b>-s depth</b> : Limit depth looked at from captured stack |
|
59 frames</li> |
|
60 <li><b>-v</b> : Output some information about the symbols, memory map, etc.</li> |
|
61 <li><b>-t or --threads</b> : Group output according to thread. May require external |
|
62 LD_PRELOAD library to help force sampling of spawned threads; jprof |
|
63 may capture the main thread only. See <a |
|
64 href="http://sam.zoy.org/writings/programming/gprof.html">gprof-helper</a>; |
|
65 it may need adaption for jprof.</li> |
|
66 <li><b>--only-thread id</b> : Only output data for thread 'id'</li> |
|
67 <li><b>-e exclusion</b> : Allows excluding specific stack frames</li> |
|
68 <li><b>-i inclusion</b> : Allows including specific stack frames</li> |
|
69 <li><b>--last</b> : Only process data from the last 'section' of sampling |
|
70 (starting at the last PROF)</li> |
|
71 <li><b>--start N</b> : Start processing data at 'section' N </li> |
|
72 <li><b>--end N</b> : Stop processing data at 'section' N </li> |
|
73 <li><b>--output-dir dir</b> : Store generated .html files in the given directory </li> |
|
74 </ul> |
|
75 The behavior of jprof is determined by the value of the JPROF_FLAGS environment |
|
76 variable. This environment variable can be composed of several substrings |
|
77 which have the following meanings: |
|
78 <ul> |
|
79 <li> <b>JP_START</b> : Install the signal handler, and start sending the |
|
80 timer signals. |
|
81 |
|
82 <li> <b>JP_DEFER</b> : Install the signal handler, but don't start sending |
|
83 the timer signals. The user must start the signals by sending the first |
|
84 one (with <code>kill -PROF</code>, or with <code>kill -ALRM</code> if |
|
85 JP_REALTIME is used, or with <code>kill -POLL</code> (also known as <code>kill -IO</code>) if JP_RTC_HZ is used). |
|
86 |
|
87 <li> <b>JP_FIRST=x</b> : Wait x seconds before starting the timer |
|
88 |
|
89 <li> <b>JP_PERIOD=y</b> : Set timer to interrupt every y seconds. Only |
|
90 values of y greater than or equal to 0.001 are supported. Default is |
|
91 0.050 (50ms). |
|
92 |
|
93 <li> <b>JP_REALTIME</b> : Do the profiling in intervals of real time rather |
|
94 than intervals of time used by the mozilla process (and the kernel |
|
95 when doing work for mozilla). This could probably lead to weird |
|
96 results (you'll see whatever runs when mozilla is waiting for events), |
|
97 but is needed to see time spent in the X server. |
|
98 |
|
99 <li> <b>JP_RTC_HZ=freq</b> : This option, only available on Linux if the |
|
100 kernel is built with RTC support, makes jprof use the RTC timer instead of |
|
101 using its own timer. This option, like JP_REALTIME, uses intervals of real |
|
102 time. This option overrides JP_PERIOD. <code>freq</code> is the frequency |
|
103 at which the timer should fire, measured in Hz. It must be a power of 2. |
|
104 The maximal frequency allowed by the kernel can be changed by writing to |
|
105 <code>/proc/sys/dev/rtc/max-user-freq</code>; the maximum value it can be |
|
106 set to is 8192. Note that <code>/dev/rtc</code> will need to be readable |
|
107 by the Firefox process; making that file world-readable is a simple way to |
|
108 accomplish that. |
|
109 |
|
110 <li> <b>JP_CIRCULAR=size</b> : This tells jprof to store samples in a |
|
111 circular buffer of the given size, which then will be saved (appended) |
|
112 to disk when SIGUSR1 is received or JProfStopProfiling is done. If the |
|
113 buffer overflows, the oldest entries will be evicted until there's |
|
114 space for the new entry.<p> |
|
115 |
|
116 SIGUSR2 will cause the circular buffer to be cleared. |
|
117 |
|
118 <li> <b>JP_FILENAME=basefilename</b> : This is the filename used for |
|
119 saving the log files to; the default is "jprof-log". If Electrolysis |
|
120 is used, each process after the first will have the process ID |
|
121 added ("jprof-log-3212"); |
|
122 |
|
123 </ul> |
|
124 |
|
125 <h4>Starting and stopping jprof from JavaScript</h4> |
|
126 <p> |
|
127 A build with jprof enabled adds four functions to the Window object:<p> |
|
128 <code>JProfStartProfiling()</code> and <code>JProfStopProfiling()</code>: When used with JP_DEFER, these |
|
129 allow one to start and stop the timer just around whatever critical section is |
|
130 being profiled.</p><p> |
|
131 <code>JProfClearCircular()</code> and <code>JProfSaveCircular()</code>: |
|
132 These clear the circular buffer and save the buffer (without stopping), respectively.</p> |
|
133 |
|
134 <h4>Examples of JPROF_FLAGS usage</h4> |
|
135 <ul> |
|
136 |
|
137 <li>To make the timer start firing 3 seconds after the program is started and |
|
138 fire every 25 milliseconds of program time use: |
|
139 <pre> |
|
140 setenv JPROF_FLAGS "JP_START JP_FIRST=3 JP_PERIOD=0.025" </pre> |
|
141 |
|
142 <li>To make the timer start on your signal and fire every 1 millisecond of |
|
143 program time use: |
|
144 <pre> |
|
145 setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.001" </pre> |
|
146 |
|
147 <li>To make the timer start on your signal and fire every 10 milliseconds of |
|
148 wall-clock time use: |
|
149 <pre> |
|
150 setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.010 JP_REALTIME" </pre> |
|
151 |
|
152 <li>To make the timer start on your signal and fire at 8192 Hz in wall-clock |
|
153 time use: |
|
154 <pre> |
|
155 setenv JPROF_FLAGS "JP_DEFER JP_RTC_HZ=8192" </pre> |
|
156 |
|
157 <li>To make the timer start on JProfStartProfiling() and run continously |
|
158 with a 1ms sample rate until told to stop, then save the last 1MB of |
|
159 data: |
|
160 <pre> |
|
161 setenv JPROF_FLAGS "JP_DEFER JP_CIRCULAR=1048576 JP_PERIOD=0.001" </pre> |
|
162 |
|
163 </ul> |
|
164 |
|
165 <h4>Pausing profiles</h4> |
|
166 |
|
167 <P>jprof can be paused at any time by sending a SIGUSR1 to mozilla (<code>kill |
|
168 -USR1</code>). This will cause the timer signals to stop and jprof-map to be |
|
169 written, but it will not close jprof-log. Combining SIGUSR1 with the JP_DEFER |
|
170 option allows profiling of one sequence of actions by starting the timer right |
|
171 before starting the actions and stopping the timer right afterward. |
|
172 |
|
173 <P>After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO), |
|
174 depending on the mode) can be used to continue writing data to the same |
|
175 output. |
|
176 |
|
177 <P>SIGUSR2 will cause the circular buffer to be cleared, if it's in use. |
|
178 This is useful right before running a test when you're using a large, |
|
179 continuous circular buffer, or programmatically at the start of an action |
|
180 which might take too long (JProfClearCircular()). |
|
181 |
|
182 <h4>Looking at the results</h4> |
|
183 |
|
184 Now that we have <code>jprof-log</code> and <code>jprof-map</code> files, we |
|
185 can use the jprof executable is used to turn them into readable output. To do |
|
186 this jprof needs the name of the mozilla binary and the log file. It deduces |
|
187 the name of the map file: |
|
188 |
|
189 <pre> |
|
190 ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html |
|
191 </pre> |
|
192 |
|
193 This will generate the file <code>tmp.html</code> which you should view in a |
|
194 web browser. |
|
195 |
|
196 <pre> |
|
197 ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log* |
|
198 </pre> |
|
199 |
|
200 This will generate a set of files in /tmp for each process. |
|
201 |
|
202 |
|
203 <h3><a name="interpretation">Interpretation</a></h3> |
|
204 |
|
205 |
|
206 The Jprof output is split into a flat portion and a hierarchical portion. |
|
207 There are links to each section at the top of the page. It is typically |
|
208 easier to analyze the profile by starting with the flat output and following |
|
209 the links contained in the flat output up to the hierarchical output. |
|
210 |
|
211 <h4><a name="flat">Flat output</a></h3> |
|
212 |
|
213 The flat portion of the profile indicates which functions were executing |
|
214 when the timer was going off. It is displayed as a list of functions names |
|
215 on the right and the number of times that function was interrupted on the |
|
216 left. The list is sorted by decreasing interrupt count. For example: |
|
217 |
|
218 <blockquote> <pre> |
|
219 Total hit count: 151603 |
|
220 Count %Total Function Name |
|
221 |
|
222 <a href="#23081">8806 5.8 __libc_poll</a> |
|
223 <a href="#40008">2254 1.5 __i686.get_pc_thunk.bx</a> |
|
224 <a href="#21390">2053 1.4 _int_malloc</a> |
|
225 <a href="#49013">1777 1.2 nsStyleContext::GetStyleData(nsStyleStructID)</a> |
|
226 <a href="#21380">1600 1.1 __libc_malloc</a> |
|
227 <a href="#603">1552 1.0 nsCOMPtr_base::~nsCOMPtr_base()</a> |
|
228 </pre> </blockquote> |
|
229 |
|
230 This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside nsStyleContext::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor. |
|
231 |
|
232 <p> |
|
233 In general, the functions with the highest count are the functions which |
|
234 are taking the most time. |
|
235 |
|
236 <P> |
|
237 The function names are linked to the entry for that function in the |
|
238 hierarchical profile, which is described in the next section. |
|
239 |
|
240 <h4><a name="hier">Hierarchical output</a></h4> |
|
241 |
|
242 The hierarchical output is divided up into sections, with each section |
|
243 corresponding to one function. A typical section looks something like |
|
244 this: |
|
245 |
|
246 <blockquote><pre> |
|
247 index Count Hits Function Name |
|
248 <A href="#72871"> 545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*)</A> |
|
249 <A href="#72873"> 100 (8.5%) nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&)</A> |
|
250 72870 4 (0.3%) <a name=72870> 645 (54.9%)</a> <b>nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int)</b> |
|
251 <A href="#72821"> 545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*)</A> |
|
252 <A href="#72853"> 83 (7.1%) nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*)</A> |
|
253 <A href="#74150"> 9 (0.8%) nsLineLayout::BeginLineReflow(int, int, int, int, int, int)</A> |
|
254 <A href="#74897"> 1 (0.1%) nsTextFrame::GetType() const</A> |
|
255 <A href="#74131"> 1 (0.1%) nsLineLayout::RelativePositionFrames(nsOverflowAreas&)</A> |
|
256 <A href="#58320"> 1 (0.1%) __i686.get_pc_thunk.bx</A> |
|
257 <A href="#53077"> 1 (0.1%) PL_ArenaAllocate</A> |
|
258 </pre></blockquote> |
|
259 |
|
260 The information this block tells us is: |
|
261 |
|
262 <ul> |
|
263 <li>There were 4 profiler hits <em>in</em> <code>nsBlockFrame::DoReflowInlineFrames</code> |
|
264 <li>There were 645 profiler hits <em>in or under</em> <code>nsBlockFrame::DoReflowInlineFrames</code>. Of these: |
|
265 <ul> |
|
266 <li>545 were in or under <code>nsBlockFrame::ReflowInlineFrame</code> |
|
267 <li>83 were in or under <code>nsBlockFrame::PlaceLine</code> |
|
268 <li>9 were in or under <code>nsLineLayout::BeginLineReflow</code> |
|
269 <li>1 was in or under <code>nsTextFrame::GetType</code> |
|
270 <li>1 was in or under <code>nsLineLayout::RelativePositionFrames</code> |
|
271 <li>1 was in or under <code>__i686.get_pc_thunk.bx</code> |
|
272 <li>1 was in or under <code>PL_ArenaAllocate</code> |
|
273 </ul> |
|
274 <li>Of these 645 calls into <code>nsBlockFrame::DoReflowInlineFrames</code>: |
|
275 <ul> |
|
276 <li>545 came from <code>nsBlockFrame::ReflowInlineFrames</code> |
|
277 <li>100 came from <code>nsBlockFrame::ReflowDirtyLines</code> |
|
278 </ul> |
|
279 </ul> |
|
280 |
|
281 |
|
282 The rest of this section explains how to read this information off from the jprof output. |
|
283 |
|
284 <p>This block corresponds to the function <code>nsBlockFrame::DoReflowInlineFrames</code>, which is |
|
285 therefore bolded and not a link. The name of this function is preceded by |
|
286 five numbers which have the following meaning. The number on the left (72870) |
|
287 is the index number, and is not important. The next number (4) and the |
|
288 percentage following (0.3%) are the number |
|
289 of times this function was interrupted by the timer and the percentage of |
|
290 the total hits that is. The last number pair ("645 (54.9%)") |
|
291 are the number of times this function was in the call stack when the timer went |
|
292 off. That is, the timer went off while we were in code that was ultimately |
|
293 called from <code>nsBlockFrame::DoReflowInlineFrames</code>. |
|
294 <p>For our example we can see that our function was in the call stack for |
|
295 645 interrupt ticks, but we were only the function that was running when |
|
296 the interrupt arrived 4 times. |
|
297 <P> |
|
298 The functions listed above the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its |
|
299 callers. The numbers to the left of these function names are the numbers of |
|
300 times these functions were in the call stack as callers of |
|
301 <code>nsBlockFrame::DoReflowInlineFrames</code>. In our example, we were called 545 times by |
|
302 <code>nsBlockFrame::ReflowInlineFrames</code> and 100 times by |
|
303 <code>nsBlockFrame::ReflowDirtyLines</code>. |
|
304 <P> |
|
305 The functions listed below the line for <code>nsBlockFrame::DoReflowInlineFrames</code> are its |
|
306 callees. The numbers to the left of the function names are the numbers of |
|
307 times these functions were in the callstack as callees of |
|
308 <code>nsBlockFrame::DoReflowInlineFrames</code> and the corresponding percentages. In our example, of the 645 profiler hits under <code>nsBlockFrame::DoReflowInlineFrames</code> 545 were under <code>nsBlockFrame::ReflowInlineFrame</code>, 83 were under <code>nsBlockFrame::PlaceLine</code>, and so forth.<p> |
|
309 |
|
310 <b>NOTE:</b> If there are loops of execution or recursion, the numbers will |
|
311 not add up and percentages can exceed 100%. If a function directly calls |
|
312 itself "(self)" will be appended to the line, but indirect recursion will |
|
313 not be marked. |
|
314 |
|
315 <h3>Bugs</h3> |
|
316 The current build of Jprof has only been tested under Ubuntu 8.04 LTS, but |
|
317 should work under any fairly modern linux distribution using GCC/GLIBC. |
|
318 Please update this document with any known compatibilities/incompatibilities. |
|
319 <p> |
|
320 If you get an error:<p><code>Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed! |
|
321 </code><p>that means you've hit a timing hole in the version of glibc you're |
|
322 running. See <a |
|
323 href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=4578">Redhat bug 4578</a>. |
|
324 <!-- <h3>Update</h3> |
|
325 <ul> |
|
326 </ul> |
|
327 --> |
|
328 |
|
329 </body> |
|
330 </html> |