Wed, 31 Dec 2014 07:53:36 +0100
Correct small whitespace inconsistency, lost while renaming variables.
michael@0 | 1 | .. _healthreport_identifiers: |
michael@0 | 2 | |
michael@0 | 3 | =========== |
michael@0 | 4 | Identifiers |
michael@0 | 5 | =========== |
michael@0 | 6 | |
michael@0 | 7 | Firefox Health Report records some identifiers to keep track of clients |
michael@0 | 8 | and uploaded documents. |
michael@0 | 9 | |
michael@0 | 10 | Identifier Types |
michael@0 | 11 | ================ |
michael@0 | 12 | |
michael@0 | 13 | Document/Upload IDs |
michael@0 | 14 | ------------------- |
michael@0 | 15 | |
michael@0 | 16 | A random UUID called the *Document ID* or *Upload ID* is generated when the FHR |
michael@0 | 17 | client creates or uploads a new document. |
michael@0 | 18 | |
michael@0 | 19 | When clients generate a new *Document ID*, they persist this ID to disk |
michael@0 | 20 | **before** the upload attempt. |
michael@0 | 21 | |
michael@0 | 22 | As part of the upload, the client sends all old *Document IDs* to the server |
michael@0 | 23 | and asks the server to delete them. For well-behaved clients, the server |
michael@0 | 24 | thus holds a single record per client, with a randomly-changing *Document ID*. |
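
The Document ID lifecycle described above can be sketched as follows. This is a minimal illustration, not the actual FHR client code: the JSON state file, the function names, and the request fields (``id``, ``delete_ids``, ``document``) are all hypothetical, and forgetting old IDs only after a confirmed upload is an assumption the text does not spell out.

```python
import json
import uuid
from pathlib import Path


def prepare_upload(state_path: Path, document: dict) -> dict:
    """Build an upload request, persisting the new Document ID first."""
    # Load previously persisted Document IDs (hypothetical on-disk format).
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"document_ids": []}
    old_ids = list(state["document_ids"])

    # Generate a fresh random UUID for this document.
    new_id = str(uuid.uuid4())

    # Persist the new ID *before* the upload attempt, so that an ID the
    # server may have stored is never lost if the client crashes mid-upload.
    state["document_ids"] = old_ids + [new_id]
    state_path.write_text(json.dumps(state))

    # The request carries the new ID plus every old ID to be deleted.
    return {"id": new_id, "delete_ids": old_ids, "document": document}


def record_upload_success(state_path: Path, request: dict) -> None:
    """Assumed step: once the upload is confirmed, keep only the new ID."""
    state_path.write_text(json.dumps({"document_ids": [request["id"]]}))
```

Because the ID is written to disk before the upload, the ID of an attempt whose outcome is unknown still appears in ``delete_ids`` on the next attempt, which is what lets the server converge on one record per client.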
michael@0 | 25 | |
michael@0 | 26 | Client IDs |
michael@0 | 27 | ---------- |
michael@0 | 28 | |
michael@0 | 29 | A *Client ID* is an identifier that **attempts** to uniquely identify an |
michael@0 | 30 | individual FHR client. Please note the emphasis on *attempts* in that last |
michael@0 | 31 | sentence: *Client IDs* do not guarantee uniqueness. |
michael@0 | 32 | |
michael@0 | 33 | The *Client ID* is generated when the client first runs or as needed. |
michael@0 | 34 | |
michael@0 | 35 | The *Client ID* is transferred to the server as part of every upload. The |
michael@0 | 36 | server is thus able to associate multiple document uploads with a single |
michael@0 | 37 | *Client ID*. |
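
As a rough sketch of the two halves of this behavior: the client generates a Client ID when none is stored yet, and the server groups uploads by the ID they carry. The state file layout, the function names, and the ``client_id`` / ``document_id`` field names are assumptions made for illustration, and the ID is taken to be a random UUID as in version 1; none of this is the real FHR implementation.

```python
import json
import uuid
from collections import defaultdict
from pathlib import Path


def get_client_id(state_path: Path) -> str:
    """Return the stored Client ID, generating one if none exists yet."""
    if state_path.exists():
        stored = json.loads(state_path.read_text()).get("client_id")
        if stored:
            return stored
    # First run (or the ID is missing): generate a random UUID and persist it.
    client_id = str(uuid.uuid4())
    state_path.write_text(json.dumps({"client_id": client_id}))
    return client_id


def group_by_client(uploads: list[dict]) -> dict[str, list[str]]:
    """Server side: map each Client ID to the Document IDs uploaded under it."""
    groups = defaultdict(list)
    for upload in uploads:
        groups[upload["client_id"]].append(upload["document_id"])
    return dict(groups)
```

In this sketch the ID lives in a file alongside the rest of the client's state, so two copies of that state would report the same Client ID, which is one reason uniqueness cannot be guaranteed.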
michael@0 | 38 | |
michael@0 | 39 | Client ID Versions |
michael@0 | 40 | ^^^^^^^^^^^^^^^^^^ |
michael@0 | 41 | |
michael@0 | 42 | The semantics for how a *Client ID* is generated are versioned. |
michael@0 | 43 | |
michael@0 | 44 | Version 1 |
michael@0 | 45 |    The *Client ID* is a randomly-generated UUID. |
michael@0 | 46 | |
michael@0 | 47 | History of Identifiers |
michael@0 | 48 | ====================== |
michael@0 | 49 | |
michael@0 | 50 | In the beginning, there were just *Document IDs*. The thinking was that clients |
michael@0 | 51 | would clean up after themselves and leave at most one active document on the |
michael@0 | 52 | server. |
michael@0 | 53 | |
michael@0 | 54 | Unfortunately, this did not work out. Brute-force analysis to deduplicate |
michael@0 | 55 | records on the server revealed a number of interesting patterns. |
michael@0 | 56 | |
michael@0 | 57 | Orphaning |
michael@0 | 58 |    Clients would upload a new payload without deleting the old payload. |
michael@0 | 59 | |
michael@0 | 60 | Divergent records |
michael@0 | 61 |    Records would share data up to a certain date and then the data would |
michael@0 | 62 |    almost completely diverge. This appears to be indicative of profile |
michael@0 | 63 |    copying. |
michael@0 | 64 | |
michael@0 | 65 | Rollback |
michael@0 | 66 |    Records would share data up to a certain date. Each record in this set |
michael@0 | 67 |    would contain data for a day or two but no extra data. This could be |
michael@0 | 68 |    explained by filesystem rollback on the client. |
michael@0 | 69 | |
michael@0 | 70 | A significant percentage of the records on the server belonged to |
michael@0 | 71 | misbehaving clients. Identifying these records was extremely |
michael@0 | 72 | resource-intensive and error-prone. These records were undermining the ability |
michael@0 | 73 | to use Firefox Health Report data. |
michael@0 | 74 | |
michael@0 | 75 | Thus, the *Client ID* was born. The intent of the *Client ID* was to |
michael@0 | 76 | uniquely identify clients so the extreme effort required and the |
michael@0 | 77 | questionable reliability of deduplicating server data would become |
michael@0 | 78 | problems of the past. |
michael@0 | 79 | |
michael@0 | 80 | The *Client ID* was originally a randomly-generated UUID (version 1). This |
michael@0 | 81 | allowed detection of orphaning and rollback. However, these version 1 |
michael@0 | 82 | *Client IDs* were still susceptible to use on multiple profiles and |
michael@0 | 83 | machines if the profile was copied. |