toolkit/components/crashes/docs/crash-events.rst

Sat, 03 Jan 2015 20:18:00 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
date
Sat, 03 Jan 2015 20:18:00 +0100
branch
TOR_BUG_3246
changeset 7
129ffea94266
permissions
-rw-r--r--

Conditionally enable double key logic according to:
private browsing mode or privacy.thirdparty.isolate preference and
implement in GetCookieStringCommon and FindCookie where it counts...
With some reservations of how to convince FindCookie users to test
condition and pass a nullptr when disabling double key logic.

michael@0 1 ============
michael@0 2 Crash Events
michael@0 3 ============
michael@0 4
michael@0 5 **Crash Events** refers to a special subsystem of Gecko that aims to capture
michael@0 6 events of interest related to process crashing and hanging.
michael@0 7
michael@0 8 When an event worthy of recording occurs, a file containing that event's
michael@0 9 information is written to a well-defined location on the filesystem. The Gecko
michael@0 10 process periodically scans for produced files and consolidates information
michael@0 11 into a more unified and efficient backend store.
michael@0 12
michael@0 13 Crash Event Files
michael@0 14 =================
michael@0 15
michael@0 16 When a crash-related event occurs, a file describing that event is written
michael@0 17 to a well-defined directory. That directory is likely in the directory of
michael@0 18 the currently-active profile. However, if a profile is not yet active in
michael@0 19 the Gecko process, that directory likely resides in the user's *app data*
michael@0 20 directory (*UAppData* from the directory service).
michael@0 21
michael@0 22 The filename of the event file is not relevant. However, producers need
michael@0 23 to choose a filename intelligently to avoid name collisions and race
michael@0 24 conditions. Since file locking is potentially dangerous at crash time,
michael@0 25 the convention of generating a UUID and using it as a filename has been
michael@0 26 adopted.
michael@0 27
michael@0 28 File Format
michael@0 29 -----------
michael@0 30
michael@0 31 All crash event files share the same high-level file format. The format
michael@0 32 consists of the following fields delimited by a UNIX newline (*\n*)
michael@0 33 character:
michael@0 34
michael@0 35 * String event name (valid UTF-8, but likely ASCII)
michael@0 36 * String representation of integer seconds since UNIX epoch
michael@0 37 * Payload
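As an illustration of the layout above, a producer could be sketched roughly as follows. This is a minimal sketch, not the actual Gecko or crash reporter code: the function name ``write_event_file`` and its parameters (``events_dir``, ``now``) are hypothetical, and the real producers must also respect the crash-time constraints described under Design Considerations.

```python
import os
import time
import uuid


def write_event_file(events_dir, event_name, payload, now=None):
    """Write one crash event file and return its path (hypothetical helper)."""
    # Field 2 is the integer number of seconds since the UNIX epoch.
    timestamp = int(time.time() if now is None else now)
    # Three fields: event name, timestamp, payload, joined by UNIX newlines.
    content = "\n".join([event_name, str(timestamp), payload])
    # A fresh UUID as the filename avoids collisions without file locking.
    path = os.path.join(events_dir, str(uuid.uuid4()))
    # Mode "x" fails if the file already exists instead of overwriting it;
    # newline="\n" keeps UNIX newlines on every platform.
    with open(path, "x", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
    return path
```

Calling ``write_event_file(some_dir, "crash.main.1", crash_id)`` would produce a file whose first line is the event name, whose second line is the timestamp, and whose remainder is the payload.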
michael@0 38
michael@0 39 The payload is event specific and may contain UNIX newline characters.
michael@0 40 The recommended method for parsing is to split on the first two UNIX
michael@0 41 newlines only, yielding the 3 fields, and then dispatch to an
michael@0 42 event-specific parser based on the event name.
michael@0 43
michael@0 44 If an unknown event type is encountered, the event can safely be ignored
michael@0 45 until later. This helps ensure that application downgrades (potentially
michael@0 46 due to elevated crash rate) don't result in data loss.
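The parsing and dispatch rules above can be sketched as follows. This is only an illustrative sketch: the names ``parse_event``, ``parse_crash_id``, and ``PARSERS`` are hypothetical, and the returned tuple shape is an assumption rather than the format used by the real consumer.

```python
def parse_crash_id(payload):
    """Parser for events whose payload is just a crash ID string."""
    return {"crash_id": payload}


# Hypothetical dispatch table keyed by the first line of the event file.
PARSERS = {
    "crash.main.1": parse_crash_id,
    "crash.plugin.1": parse_crash_id,
    "hang.plugin.1": parse_crash_id,
}


def parse_event(text):
    """Return (name, seconds, parsed_payload), or None for unknown events."""
    # Split on the first two newlines only, so that any newlines inside
    # the payload are preserved.
    parts = text.split("\n", 2)
    if len(parts) != 3:
        raise ValueError("expected event name, timestamp, and payload")
    name, seconds, payload = parts
    parser = PARSERS.get(name)
    if parser is None:
        # Unknown event type: signal the caller to leave the file alone
        # so that a newer build can process it later.
        return None
    return name, int(seconds), parser(payload)
```

Returning ``None`` rather than raising for an unknown event name is one way to let the caller skip the file without deleting it.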
michael@0 47
michael@0 48 The format and semantics of each event type are meant to be constant once
michael@0 49 that event type is committed to the main Firefox repository. If new metadata
michael@0 50 needs to be captured or the meaning of data captured in an event changes,
michael@0 51 that change should be expressed through the invention of a new event type.
michael@0 52 For this reason, event names are highly recommended to contain a version.
michael@0 53 For example, instead of a *Gecko process crashed* event, we prefer a
michael@0 54 *Gecko process crashed v1* event.
michael@0 55
michael@0 56 Event Types
michael@0 57 -----------
michael@0 58
michael@0 59 Each subsection documents the different types of crash events that may be
michael@0 60 produced. Each section name corresponds to the first line of the crash
michael@0 61 event file.
michael@0 62
michael@0 63 crash.main.1
michael@0 64 ^^^^^^^^^^^^
michael@0 65
michael@0 66 This event is produced when the main process crashes.
michael@0 67
michael@0 68 The payload of this event is the string crash ID, very likely a UUID.
michael@0 69 There should be ``UUID.dmp`` and ``UUID.extra`` files on disk, saved by
michael@0 70 Breakpad.
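For illustration, the relationship between the crash ID and the Breakpad files could be expressed as below. This is a sketch under assumptions: the helper name ``breakpad_files`` is hypothetical, and the directory holding the dump files (``dump_dir``) is not specified by this document, so it is taken as a parameter.

```python
import os


def breakpad_files(dump_dir, crash_id):
    """Return the expected (.dmp, .extra) paths for a crash ID."""
    # dump_dir is an assumption: this document does not say where
    # Breakpad saves these files.
    return (
        os.path.join(dump_dir, crash_id + ".dmp"),
        os.path.join(dump_dir, crash_id + ".extra"),
    )
```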
michael@0 71
michael@0 72 crash.plugin.1
michael@0 73 ^^^^^^^^^^^^^^
michael@0 74
michael@0 75 This event is produced when a plugin process crashes.
michael@0 76
michael@0 77 The payload is identical to ``crash.main.1``'s.
michael@0 78
michael@0 79 hang.plugin.1
michael@0 80 ^^^^^^^^^^^^^
michael@0 81
michael@0 82 This event is produced when a plugin process hangs.
michael@0 83
michael@0 84 The payload is identical to ``crash.main.1``'s.
michael@0 85
michael@0 86 Aggregated Event Log
michael@0 87 ====================
michael@0 88
michael@0 89 Crash events are aggregated together into a unified event *log*. Currently,
michael@0 90 this *log* is really a JSON file. However, this is an implementation detail
michael@0 91 and it could change at any time. The interface to crash data provided by
michael@0 92 the JavaScript API is the only supported interface.
michael@0 93
michael@0 94 Design Considerations
michael@0 95 =====================
michael@0 96
michael@0 97 There are many considerations influencing the design of this subsystem.
michael@0 98 We attempt to document them in this section.
michael@0 99
michael@0 100 Decoupling of Event Files from Final Data Structure
michael@0 101 ---------------------------------------------------
michael@0 102
michael@0 103 While it is certainly possible for the Gecko process to write directly to
michael@0 104 the final data structure on disk, there is an intentional decoupling between
michael@0 105 the production of events and their transition into final storage. In the
michael@0 106 same vein, the choice to have events written to multiple files by producers
michael@0 107 is deliberate.
michael@0 108
michael@0 109 Some recorded events are written immediately after a process crash. This is
michael@0 110 a very uncertain time for the host system. There is a high likelihood the
michael@0 111 system is in an exceptional state, such as memory exhaustion. Therefore, any
michael@0 112 action taken after crashing needs to be very deliberate about what it does.
michael@0 113 Excessive memory allocation and certain system calls may cause the system
michael@0 114 to crash again or the machine's condition to worsen. This means that the act
michael@0 115 of recording a crash event must be very lightweight. Writing a new file from
michael@0 116 nothing is very lightweight. This is one reason we write separate files.
michael@0 117
michael@0 118 Another reason we write separate files is that if the main Gecko process
michael@0 119 itself crashes (as opposed to, say, a plugin process), the crash reporter (not
michael@0 120 Gecko) is running and the crash reporter needs to handle the writing of the
michael@0 121 event info. If this writing is involved (say loading, parsing, updating, and
michael@0 122 reserializing back to disk), this logic would need to be implemented in both
michael@0 123 Gecko and the crash reporter or would need to be implemented in such a way
michael@0 124 that both could use it. Neither of these is very practical from a software
michael@0 125 lifecycle management perspective. It's much easier to have separate processes
michael@0 126 write a simple file and to let a single implementation do all the complex
michael@0 127 work.
michael@0 128
michael@0 129 Idempotent Event Processing
michael@0 130 ===========================
michael@0 131
michael@0 132 Processing of event files has been designed such that the result is
michael@0 133 idempotent regardless of what order those files are processed in. This is
michael@0 134 not only a good design decision, but it is arguably necessary. While event
michael@0 135 files are processed in order by file mtime, filesystem times may not have
michael@0 136 the resolution required for proper sorting. Therefore, processing order is
michael@0 137 merely an optimistic assumption.
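One way to obtain this property is to key each record by its crash ID, so that applying the same set of events in any order, or applying an event twice, yields the same store. The sketch below illustrates the idea only; the names ``apply_event`` and ``process_all`` and the record layout are hypothetical, not the actual aggregation code.

```python
def apply_event(store, name, timestamp, crash_id):
    """Record an event keyed by crash ID; re-applying it changes nothing."""
    # setdefault only inserts when the crash ID is not yet present, so a
    # duplicate or replayed event file has no further effect.
    store.setdefault(crash_id, {"type": name, "timestamp": timestamp})
    return store


def process_all(events):
    """Fold an iterable of (name, timestamp, crash_id) tuples into a store."""
    store = {}
    for name, timestamp, crash_id in events:
        apply_event(store, name, timestamp, crash_id)
    return store
```

Because the result depends only on the set of crash IDs seen, sorting by mtime becomes an optimization rather than a correctness requirement.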
michael@0 138
michael@0 139 Aggregated Storage Format
michael@0 140 =========================
michael@0 141
michael@0 142 Crash events are aggregated into a unified data structure on disk. That data
michael@0 143 structure is currently LZ4-compressed JSON and is represented by a single file.
michael@0 144
michael@0 145 The choice of a single JSON file was initially driven by time and complexity
michael@0 146 concerns. Before changing the format or adding significant amounts of new
michael@0 147 data, some considerations must be taken into account.
michael@0 148
michael@0 149 First, in well-behaving installs, crash data should be minimal. Crashes and
michael@0 150 hangs will be rare and thus the size of the crash data should remain small
michael@0 151 over time.
michael@0 152
michael@0 153 The choice of a single JSON file has larger implications as the amount of
michael@0 154 crash data grows. As new data is accumulated, we need to read and write
michael@0 155 an entire file to make small updates. LZ4 compression helps reduce I/O.
michael@0 156 However, there is still a potential for unbounded file growth, so we
michael@0 157 establish a limit for the maximum age of records. Anything older than that limit is
michael@0 158 pruned. We also establish a daily limit on the number of crashes we will
michael@0 159 store. All crashes beyond the first N in a day have no payload and are
michael@0 160 only recorded by the presence of a count. This count ensures we can
michael@0 161 distinguish between ``N`` and ``100 * N``, which are very different
michael@0 162 values!
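The two limits described above (a maximum record age and a daily cap with a running count) could be sketched like this. It is a simplified illustration under stated assumptions: the function names, the per-day bucket layout, the use of UTC days, and the parameters ``daily_limit`` and ``max_age_days`` are all hypothetical and do not reflect the real store's schema or values.

```python
import datetime

UTC = datetime.timezone.utc


def day_key(timestamp):
    """Map seconds since the epoch to an ISO date string in UTC."""
    return datetime.datetime.fromtimestamp(timestamp, UTC).date().isoformat()


def add_crash(store, crash_id, timestamp, daily_limit):
    """Count every crash, but keep details only for the first N per day."""
    bucket = store.setdefault(day_key(timestamp), {"count": 0, "crashes": {}})
    if crash_id in bucket["crashes"]:
        return  # already stored with details; do not count it twice
    # A crash beyond the limit leaves no ID behind, so this sketch cannot
    # detect a replay of such a crash.
    bucket["count"] += 1
    if len(bucket["crashes"]) < daily_limit:
        bucket["crashes"][crash_id] = timestamp


def prune(store, now, max_age_days):
    """Drop whole days older than the maximum age."""
    cutoff_time = datetime.datetime.fromtimestamp(now, UTC)
    cutoff = (cutoff_time - datetime.timedelta(days=max_age_days)).date().isoformat()
    # ISO date strings sort chronologically, so string comparison suffices.
    for day in [d for d in store if d < cutoff]:
        del store[day]
```

With ``daily_limit=2``, five crashes on one day leave two detailed records and a count of five, which preserves the distinction between ``N`` and many multiples of ``N``.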
