Sat, 03 Jan 2015 20:18:00 +0100
Conditionally enable double key logic according to:
private browsing mode or privacy.thirdparty.isolate preference and
implement in GetCookieStringCommon and FindCookie where it counts...
With some reservations of how to convince FindCookie users to test
condition and pass a nullptr when disabling double key logic.
michael@0 | 1 | ============ |
michael@0 | 2 | Crash Events |
michael@0 | 3 | ============ |
michael@0 | 4 | |
michael@0 | 5 | **Crash Events** refers to a special subsystem of Gecko that aims to capture |
michael@0 | 6 | events of interest related to process crashing and hanging. |
michael@0 | 7 | |
michael@0 | 8 | When an event worthy of recording occurs, a file containing that event's |
michael@0 | 9 | information is written to a well-defined location on the filesystem. The Gecko |
michael@0 | 10 | process periodically scans for produced files and consolidates information |
michael@0 | 11 | into a more unified and efficient backend store. |
michael@0 | 12 | |
michael@0 | 13 | Crash Event Files |
michael@0 | 14 | ================= |
michael@0 | 15 | |
michael@0 | 16 | When a crash-related event occurs, a file describing that event is written |
michael@0 | 17 | to a well-defined directory. That directory is likely in the directory of |
michael@0 | 18 | the currently-active profile. However, if a profile is not yet active in |
michael@0 | 19 | the Gecko process, that directory likely resides in the user's *app data* |
michael@0 | 20 | directory (*UAppData* from the directory service). |
michael@0 | 21 | |
michael@0 | 22 | The filename of the event file is not relevant. However, producers need |
michael@0 | 23 | to choose a filename intelligently to avoid name collisions and race |
michael@0 | 24 | conditions. Since file locking is potentially dangerous at crash time, |
michael@0 | 25 | the convention of generating a UUID and using it as a filename has been |
michael@0 | 26 | adopted. |
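
A minimal sketch of the UUID filename convention, for illustration only. The function name ``write_event_file`` and its parameters are hypothetical and not part of Gecko. The three-line layout it writes is the one described under File Format.

```python
import os
import time
import uuid


def write_event_file(events_dir, event_name, payload, now=None):
    """Write one crash event file into events_dir and return its path."""
    timestamp = int(time.time() if now is None else now)
    # A random UUID as the filename avoids collisions without file locking.
    path = os.path.join(events_dir, str(uuid.uuid4()))
    data = "\n".join([event_name, str(timestamp), payload]).encode("utf-8")
    # Mode "xb" creates a new file and fails if the name already exists.
    with open(path, "xb") as fh:
        fh.write(data)
    return path
```

Creating a fresh file in one step keeps the work done at crash time small, which matches the lightweight-write goal discussed under Design Considerations.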
michael@0 | 27 | |
michael@0 | 28 | File Format |
michael@0 | 29 | ----------- |
michael@0 | 30 | |
michael@0 | 31 | All crash event files share the same high-level file format. The format |
michael@0 | 32 | consists of the following fields delimited by a UNIX newline (*\n*) |
michael@0 | 33 | character: |
michael@0 | 34 | |
michael@0 | 35 | * String event name (valid UTF-8, but likely ASCII) |
michael@0 | 36 | * String representation of integer seconds since UNIX epoch |
michael@0 | 37 | * Payload |
michael@0 | 38 | |
michael@0 | 39 | The payload is event specific and may contain UNIX newline characters. |
michael@0 | 40 | The recommended method for parsing is to split on the first 2 UNIX |
michael@0 | 41 | newlines and then dispatch to an event-specific parser based on the |
michael@0 | 42 | event name. |
michael@0 | 43 | |
michael@0 | 44 | If an unknown event type is encountered, the event can safely be ignored |
michael@0 | 45 | until later. This helps ensure that application downgrades (potentially |
michael@0 | 46 | due to elevated crash rate) don't result in data loss. |
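
The parsing approach above can be sketched as follows. This is an illustration under assumptions, not the Gecko implementation: ``parse_event``, ``parse_crash_id``, ``PARSERS`` and ``handle_event`` are hypothetical names. The split stops after the first two newlines, so three fields come back and any newlines inside the payload are preserved.

```python
def parse_event(data):
    """Split raw event-file bytes into (name, seconds, payload)."""
    text = data.decode("utf-8")
    # maxsplit=2 yields at most three fields; payload newlines stay intact.
    name, seconds, payload = text.split("\n", 2)
    return name, int(seconds), payload


def parse_crash_id(payload):
    """Hypothetical parser for events whose payload is a crash ID."""
    return {"crash_id": payload.strip()}


# Hypothetical dispatch table keyed by event name (first line of the file).
PARSERS = {
    "crash.main.1": parse_crash_id,
    "crash.plugin.1": parse_crash_id,
    "hang.plugin.1": parse_crash_id,
}


def handle_event(data):
    """Return (name, seconds, parsed payload), or None for unknown events."""
    name, seconds, payload = parse_event(data)
    parser = PARSERS.get(name)
    if parser is None:
        # Unknown event type: skip it for now so the file can be kept.
        return None
    return name, seconds, parser(payload)
```

Returning ``None`` for an unknown name, rather than raising or deleting the file, is one way to leave the event available for a later version that understands it.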
michael@0 | 47 | |
michael@0 | 48 | The format and semantics of each event type are meant to be constant once |
michael@0 | 49 | that event type is committed to the main Firefox repository. If new metadata |
michael@0 | 50 | needs to be captured or the meaning of data captured in an event changes, |
michael@0 | 51 | that change should be expressed through the invention of a new event type. |
michael@0 | 52 | For this reason, it is highly recommended that event names contain a version. |
michael@0 | 53 | For example, instead of a *Gecko process crashed* event, we prefer a *Gecko |
michael@0 | 54 | process crashed v1* event. |
michael@0 | 55 | |
michael@0 | 56 | Event Types |
michael@0 | 57 | ----------- |
michael@0 | 58 | |
michael@0 | 59 | Each subsection documents one type of crash event that may be |
michael@0 | 60 | produced. Each subsection name corresponds to the first line of the crash |
michael@0 | 61 | event file. |
michael@0 | 62 | |
michael@0 | 63 | crash.main.1 |
michael@0 | 64 | ^^^^^^^^^^^^ |
michael@0 | 65 | |
michael@0 | 66 | This event is produced when the main process crashes. |
michael@0 | 67 | |
michael@0 | 68 | The payload of this event is the string crash ID, very likely a UUID. |
michael@0 | 69 | There should be ``UUID.dmp`` and ``UUID.extra`` files on disk, saved by |
michael@0 | 70 | Breakpad. |
michael@0 | 71 | |
michael@0 | 72 | crash.plugin.1 |
michael@0 | 73 | ^^^^^^^^^^^^^^ |
michael@0 | 74 | |
michael@0 | 75 | This event is produced when a plugin process crashes. |
michael@0 | 76 | |
michael@0 | 77 | The payload is identical to ``crash.main.1``'s. |
michael@0 | 78 | |
michael@0 | 79 | hang.plugin.1 |
michael@0 | 80 | ^^^^^^^^^^^^^ |
michael@0 | 81 | |
michael@0 | 82 | This event is produced when a plugin process hangs. |
michael@0 | 83 | |
michael@0 | 84 | The payload is identical to ``crash.main.1``'s. |
michael@0 | 85 | |
michael@0 | 86 | Aggregated Event Log |
michael@0 | 87 | ==================== |
michael@0 | 88 | |
michael@0 | 89 | Crash events are aggregated together into a unified event *log*. Currently, |
michael@0 | 90 | this *log* is really a JSON file. However, this is an implementation detail |
michael@0 | 91 | and it could change at any time. The interface to crash data provided by |
michael@0 | 92 | the JavaScript API is the only supported interface. |
michael@0 | 93 | |
michael@0 | 94 | Design Considerations |
michael@0 | 95 | ===================== |
michael@0 | 96 | |
michael@0 | 97 | There are many considerations influencing the design of this subsystem. |
michael@0 | 98 | We attempt to document them in this section. |
michael@0 | 99 | |
michael@0 | 100 | Decoupling of Event Files from Final Data Structure |
michael@0 | 101 | --------------------------------------------------- |
michael@0 | 102 | |
michael@0 | 103 | While it is certainly possible for the Gecko process to write directly to |
michael@0 | 104 | the final data structure on disk, there is an intentional decoupling between |
michael@0 | 105 | the production of events and their transition into final storage. In the |
michael@0 | 106 | same vein, the choice to have events written to multiple files by producers |
michael@0 | 107 | is deliberate. |
michael@0 | 108 | |
michael@0 | 109 | Some recorded events are written immediately after a process crash. This is |
michael@0 | 110 | a very uncertain time for the host system. There is a high likelihood the |
michael@0 | 111 | system is in an exceptional state, such as memory exhaustion. Therefore, any |
michael@0 | 112 | action taken after crashing needs to be very deliberate about what it does. |
michael@0 | 113 | Excessive memory allocation and certain system calls may cause the system |
michael@0 | 114 | to crash again or the machine's condition to worsen. This means that the act |
michael@0 | 115 | of recording a crash event must be very lightweight. Writing a new file from |
michael@0 | 116 | nothing is very lightweight. This is one reason we write separate files. |
michael@0 | 117 | |
michael@0 | 118 | Another reason we write separate files is that if the main Gecko process |
michael@0 | 119 | itself crashes (as opposed to say a plugin process), the crash reporter (not |
michael@0 | 120 | Gecko) is running and the crash reporter needs to handle the writing of the |
michael@0 | 121 | event info. If this writing is involved (say loading, parsing, updating, and |
michael@0 | 122 | reserializing back to disk), this logic would need to be implemented in both |
michael@0 | 123 | Gecko and the crash reporter or would need to be implemented in such a way |
michael@0 | 124 | that both could use it. Neither of these is very practical from a software |
michael@0 | 125 | lifecycle management perspective. It's much easier to have separate processes |
michael@0 | 126 | write a simple file and to let a single implementation do all the complex |
michael@0 | 127 | work. |
michael@0 | 128 | |
michael@0 | 129 | Idempotent Event Processing |
michael@0 | 130 | =========================== |
michael@0 | 131 | |
michael@0 | 132 | Processing of event files has been designed such that the result is |
michael@0 | 133 | idempotent regardless of what order those files are processed in. This is |
michael@0 | 134 | not only a good design decision, but it is arguably necessary. While event |
michael@0 | 135 | files are processed in order by file mtime, filesystem times may not have |
michael@0 | 136 | the resolution required for proper sorting. Therefore, processing order is |
michael@0 | 137 | merely an optimistic assumption. |
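
The text does not say how this property is achieved. One possible way, shown here purely as a sketch, is to key the store by crash ID so that each event always writes the same record no matter when or how often it is applied. The function ``apply_event`` and the record layout are assumptions made for illustration.

```python
import itertools


def apply_event(store, name, seconds, crash_id):
    """Record one event in store, keyed by crash ID (hypothetical layout)."""
    # Writing a fixed record under a fixed key makes repeats harmless.
    store[crash_id] = {"type": name, "time": seconds}
    return store


def process(events):
    """Apply a sequence of (name, seconds, crash_id) events to a new store."""
    store = {}
    for name, seconds, crash_id in events:
        apply_event(store, name, seconds, crash_id)
    return store


events = [
    ("crash.main.1", 100, "id-a"),
    ("crash.plugin.1", 100, "id-b"),  # same timestamp as id-a
    ("hang.plugin.1", 101, "id-c"),
]

# Every ordering, and applying the events twice, gives the same store.
results = [process(list(p) + list(p)) for p in itertools.permutations(events)]
all_equal = all(r == results[0] for r in results)
```

With this layout, two files that share the same mtime can be processed in either order without affecting the result.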
michael@0 | 138 | |
michael@0 | 139 | Aggregated Storage Format |
michael@0 | 140 | ========================= |
michael@0 | 141 | |
michael@0 | 142 | Crash events are aggregated into a unified data structure on disk. That data |
michael@0 | 143 | structure is currently LZ4-compressed JSON and is represented by a single file. |
michael@0 | 144 | |
michael@0 | 145 | The choice of a single JSON file was initially driven by time and complexity |
michael@0 | 146 | concerns. Before changing the format or adding significant amounts of new |
michael@0 | 147 | data, some considerations must be taken into account. |
michael@0 | 148 | |
michael@0 | 149 | First, in well-behaving installs, crash data should be minimal. Crashes and |
michael@0 | 150 | hangs will be rare and thus the size of the crash data should remain small |
michael@0 | 151 | over time. |
michael@0 | 152 | |
michael@0 | 153 | The choice of a single JSON file has larger implications as the amount of |
michael@0 | 154 | crash data grows. As new data is accumulated, we need to read and write |
michael@0 | 155 | an entire file to make small updates. LZ4 compression helps reduce I/O. |
michael@0 | 156 | However, there is a potential for unbounded file growth. To bound it, we establish a |
michael@0 | 157 | limit for the max age of records. Anything older than that limit is |
michael@0 | 158 | pruned. We also establish a daily limit on the number of crashes we will |
michael@0 | 159 | store. All crashes beyond the first N in a day have no payload and are |
michael@0 | 160 | only recorded by the presence of a count. This count ensures we can |
michael@0 | 161 | distinguish between ``N`` and ``100 * N``, which are very different |
michael@0 | 162 | values! |
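
The two limits described above can be sketched as follows. The function ``aggregate``, the parameters ``max_age_days`` and ``max_per_day``, and the per-day record layout are hypothetical; the actual limits and storage layout are not stated here.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def aggregate(events, now, max_age_days, max_per_day):
    """Group (seconds, crash_id) events by day, applying both limits."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    days = defaultdict(lambda: {"count": 0, "crashes": []})
    for seconds, crash_id in sorted(events):
        if seconds < cutoff:
            continue  # older than the max age: pruned
        day = days[seconds // SECONDS_PER_DAY]
        # Every crash is counted, even when its payload is dropped.
        day["count"] += 1
        if len(day["crashes"]) < max_per_day:
            day["crashes"].append(crash_id)
    return dict(days)
```

Keeping ``count`` separate from the stored payloads is what lets a reader tell a day with ``N`` crashes apart from one with ``100 * N``, even though only the first ``N`` payloads are retained.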