michael@0: michael@0: michael@0: michael@0:
michael@0: Recent (4/2011) updates Randell Jesup (see bugzilla for contact info) michael@0:
michael@0:jprof-log
and jprof-map
files. By collecting a large
michael@0: number of these call stacks, it is possible to deduce where mozilla is spending
michael@0: its time.
michael@0:
michael@0: Configure your mozilla with jprof support by adding
michael@0: --enable-jprof
to your configure options (eg adding
michael@0: ac_add_options --enable-jprof
to your .mozconfig
) and
michael@0: making sure that you do not have the
michael@0: --enable-strip
configure option set -- jprof needs symbols to
michael@0: operate. On many architectures with GCC, you'll need to add
michael@0: --enable-optimize="-O3 -fno-omit-frame-pointer"
or the
michael@0: equivalent to ensure frame pointer generation in the compiler you're using.
Finally, build mozilla with your new configuration. Now you can run jprof.
michael@0: michael@0:jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]michael@0: Options: michael@0:
kill -PROF
, or with kill -ALRM
if
michael@0: JP_REALTIME is used, or with kill -POLL
(also known as kill -IO
) if JP_RTC_HZ is used).
michael@0:
michael@0: freq
is the frequency
michael@0: at which the timer should fire, measured in Hz. It must be a power of 2.
michael@0: The maximal frequency allowed by the kernel can be changed by writing to
michael@0: /proc/sys/dev/rtc/max-user-freq
; the maximum value it can be
michael@0: set to is 8192. Note that /dev/rtc
will need to be readable
michael@0: by the Firefox process; making that file world-readable is a simple way to
michael@0: accomplish that.
michael@0:
michael@0: michael@0: michael@0: SIGUSR2 will cause the circular buffer to be cleared. michael@0: michael@0:
michael@0: A build with jprof enabled adds four functions to the Window object:
michael@0: JProfStartProfiling()
and JProfStopProfiling()
: When used with JP_DEFER, these
michael@0: allow one to start and stop the timer just around whatever critical section is
michael@0: being profiled.
michael@0: JProfClearCircular()
and JProfSaveCircular()
:
michael@0: These clear the circular buffer and save the buffer (without stopping), respectively.
michael@0: setenv JPROF_FLAGS "JP_START JP_FIRST=3 JP_PERIOD=0.025"michael@0: michael@0:
michael@0: setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.001"michael@0: michael@0:
michael@0: setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.010 JP_REALTIME"michael@0: michael@0:
michael@0: setenv JPROF_FLAGS "JP_DEFER JP_RTC_HZ=8192"michael@0: michael@0:
michael@0: setenv JPROF_FLAGS "JP_DEFER JP_CIRCULAR=1048576 JP_PERIOD=0.001"michael@0: michael@0:
jprof can be paused at any time by sending a SIGUSR1 to mozilla (kill
michael@0: -USR1
). This will cause the timer signals to stop and jprof-map to be
michael@0: written, but it will not close jprof-log. Combining SIGUSR1 with the JP_DEFER
michael@0: option allows profiling of one sequence of actions by starting the timer right
michael@0: before starting the actions and stopping the timer right afterward.
michael@0:
michael@0:
After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO), michael@0: depending on the mode) can be used to continue writing data to the same michael@0: output. michael@0: michael@0:
SIGUSR2 will cause the circular buffer to be cleared, if it's in use. michael@0: This is useful right before running a test when you're using a large, michael@0: continuous circular buffer, or programmatically at the start of an action michael@0: which might take too long (JProfClearCircular()). michael@0: michael@0:
jprof-log
and jprof-map
files, we
michael@0: can use the jprof executable is used to turn them into readable output. To do
michael@0: this jprof needs the name of the mozilla binary and the log file. It deduces
michael@0: the name of the map file:
michael@0:
michael@0: michael@0: ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html michael@0:michael@0: michael@0: This will generate the file
tmp.html
which you should view in a
michael@0: web browser.
michael@0:
michael@0: michael@0: ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log* michael@0:michael@0: michael@0: This will generate a set of files in /tmp for each process. michael@0: michael@0: michael@0:
michael@0: michael@0: This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside nsStyleContext::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor. michael@0: michael@0:michael@0: Total hit count: 151603 michael@0: Count %Total Function Name michael@0: michael@0: 8806 5.8 __libc_poll michael@0: 2254 1.5 __i686.get_pc_thunk.bx michael@0: 2053 1.4 _int_malloc michael@0: 1777 1.2 nsStyleContext::GetStyleData(nsStyleStructID) michael@0: 1600 1.1 __libc_malloc michael@0: 1552 1.0 nsCOMPtr_base::~nsCOMPtr_base() michael@0:
michael@0: In general, the functions with the highest count are the functions which michael@0: are taking the most time. michael@0: michael@0:
michael@0: The function names are linked to the entry for that function in the michael@0: hierarchical profile, which is described in the next section. michael@0: michael@0:
michael@0: michael@0: The information this block tells us is: michael@0: michael@0:michael@0: index Count Hits Function Name michael@0: 545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*) michael@0: 100 (8.5%) nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&) michael@0: 72870 4 (0.3%) 645 (54.9%) nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int) michael@0: 545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*) michael@0: 83 (7.1%) nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*) michael@0: 9 (0.8%) nsLineLayout::BeginLineReflow(int, int, int, int, int, int) michael@0: 1 (0.1%) nsTextFrame::GetType() const michael@0: 1 (0.1%) nsLineLayout::RelativePositionFrames(nsOverflowAreas&) michael@0: 1 (0.1%) __i686.get_pc_thunk.bx michael@0: 1 (0.1%) PL_ArenaAllocate michael@0:
nsBlockFrame::DoReflowInlineFrames
michael@0: nsBlockFrame::DoReflowInlineFrames
. Of these:
michael@0: nsBlockFrame::ReflowInlineFrame
michael@0: nsBlockFrame::PlaceLine
michael@0: nsLineLayout::BeginLineReflow
michael@0: nsTextFrame::GetType
michael@0: nsLineLayout::RelativePositionFrames
michael@0: __i686.get_pc_thunk.bx
michael@0: PL_ArenaAllocate
michael@0: nsBlockFrame::DoReflowInlineFrames
:
michael@0: nsBlockFrame::ReflowInlineFrames
michael@0: nsBlockFrame::ReflowDirtyLines
michael@0: This block corresponds to the function nsBlockFrame::DoReflowInlineFrames
, which is
michael@0: therefore bolded and not a link. The name of this function is preceded by
michael@0: five numbers which have the following meaning. The number on the left (72870)
michael@0: is the index number, and is not important. The next number (4) and the
michael@0: percentage following (0.3%) are the number
michael@0: of times this function was interrupted by the timer and the percentage of
michael@0: the total hits that is. The last number pair ("645 (54.9%)")
michael@0: are the number of times this function was in the call stack when the timer went
michael@0: off. That is, the timer went off while we were in code that was ultimately
michael@0: called from nsBlockFrame::DoReflowInlineFrames
.
michael@0:
For our example we can see that our function was in the call stack for michael@0: 645 interrupt ticks, but we were only the function that was running when michael@0: the interrupt arrived 4 times. michael@0:
michael@0: The functions listed above the line for nsBlockFrame::DoReflowInlineFrames
are its
michael@0: callers. The numbers to the left of these function names are the numbers of
michael@0: times these functions were in the call stack as callers of
michael@0: nsBlockFrame::DoReflowInlineFrames
. In our example, we were called 545 times by
michael@0: nsBlockFrame::ReflowInlineFrames
and 100 times by
michael@0: nsBlockFrame::ReflowDirtyLines
.
michael@0:
michael@0: The functions listed below the line for nsBlockFrame::DoReflowInlineFrames
are its
michael@0: callees. The numbers to the left of the function names are the numbers of
michael@0: times these functions were in the callstack as callees of
michael@0: nsBlockFrame::DoReflowInlineFrames
and the corresponding percentages. In our example, of the 645 profiler hits under nsBlockFrame::DoReflowInlineFrames
545 were under nsBlockFrame::ReflowInlineFrame
, 83 were under nsBlockFrame::PlaceLine
, and so forth.
michael@0: michael@0: NOTE: If there are loops of execution or recursion, the numbers will michael@0: not add up and percentages can exceed 100%. If a function directly calls michael@0: itself "(self)" will be appended to the line, but indirect recursion will michael@0: not be marked. michael@0: michael@0:
michael@0: If you get an error:
Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
michael@0:
that means you've hit a timing hole in the version of glibc you're michael@0: running. See Redhat bug 4578. michael@0: michael@0: michael@0: michael@0: