Wed, 31 Dec 2014 07:53:36 +0100
.. _healthreport_identifiers:

===========
Identifiers
===========

Firefox Health Report records some identifiers to keep track of clients
and uploaded documents.

Identifier Types
================

Document/Upload IDs
-------------------

A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
client creates or uploads a new document.

When a client generates a new *Document ID*, it persists this ID to disk
**before** the upload attempt.

As part of the upload, the client sends all old *Document IDs* to the server
and asks the server to delete them. For well-behaved clients, the server
therefore has a single record for each client, with a randomly changing
*Document ID*.
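
The Document ID lifecycle described above can be sketched roughly as follows.
This is a minimal illustration in Python, not the FHR client code: the
``DocumentIdStore`` class, the JSON file layout, and the ``send`` callback are
all hypothetical names introduced for the example, and the handling of a
failed upload is an assumption.

```python
import json
import os
import uuid


class DocumentIdStore:
    """Hypothetical on-disk list of Document IDs this client has generated."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return json.load(f)

    def save(self, ids):
        with open(self.path, "w") as f:
            json.dump(ids, f)


def upload_document(store, payload, send):
    """Upload payload under a fresh Document ID and request deletion of old ones.

    `send(new_id, payload, delete_ids)` is a hypothetical transport callback
    that returns True when the server accepted the upload.
    """
    old_ids = store.load()
    new_id = str(uuid.uuid4())

    # Persist the new ID *before* the upload attempt, as the text requires.
    store.save(old_ids + [new_id])

    if send(new_id, payload, old_ids):
        # The server was asked to delete every old ID; only the new one remains.
        store.save([new_id])
        return new_id

    # Assumption: on failure keep every ID, so a later upload can still ask
    # the server to delete all of them.
    return None
```

With this shape, the delete list sent on one upload is exactly the set of IDs
persisted by earlier uploads, which is what keeps a well-behaved client down
to a single server record.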

Client IDs
----------

A *Client ID* is an identifier that **attempts** to uniquely identify an
individual FHR client. Note the emphasis on *attempts*: *Client IDs* do not
guarantee uniqueness.

The *Client ID* is generated when the client first runs or as needed.

The *Client ID* is transferred to the server as part of every upload. The
server is thus able to associate multiple document uploads with a single
*Client ID*.
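
As an illustration of what this association makes possible, the sketch below
groups uploads by *Client ID* and lists the clients that hold more than one
document. The ``(client_id, document_id)`` record shape and both function
names are hypothetical; the actual server-side storage is not described here.
Because *Client IDs* do not guarantee uniqueness, such a grouping is a hint
rather than proof that the uploads came from one client.

```python
from collections import defaultdict


def group_by_client(uploads):
    """Map each Client ID to the Document IDs uploaded under it.

    `uploads` is an iterable of (client_id, document_id) pairs; this record
    shape is hypothetical.
    """
    by_client = defaultdict(list)
    for client_id, document_id in uploads:
        by_client[client_id].append(document_id)
    return dict(by_client)


def clients_with_multiple_documents(uploads):
    """Return the Client IDs that have more than one document on record."""
    return sorted(
        client_id
        for client_id, docs in group_by_client(uploads).items()
        if len(docs) > 1
    )
```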

Client ID Versions
^^^^^^^^^^^^^^^^^^

The semantics for how a *Client ID* is generated are versioned.

Version 1
   The *Client ID* is a randomly-generated UUID.
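
A minimal sketch of versioned generation, assuming the version number is kept
alongside the ID (the function name and the returned tuple are hypothetical).
Note that the *Client ID* version is unrelated to the UUID version: a randomly
generated UUID is a version 4 UUID in RFC 4122 terms, which Python produces
with ``uuid.uuid4()``.

```python
import uuid

CLIENT_ID_VERSION = 1  # the only Client ID version described in this document


def generate_client_id(version=CLIENT_ID_VERSION):
    """Return (version, client_id) for the requested generation semantics."""
    if version == 1:
        # Client ID version 1: a randomly generated UUID (RFC 4122 version 4).
        return version, str(uuid.uuid4())
    raise ValueError("unknown Client ID version: %r" % (version,))
```

Dispatching on the version leaves room for later generation schemes without
changing how version 1 IDs are produced.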

History of Identifiers
======================

In the beginning, there were just *Document IDs*. The thinking was that
clients would clean up after themselves and leave at most one active document
on the server.

Unfortunately, this did not work out. Brute-force analysis to deduplicate
records on the server revealed a number of interesting patterns.

Orphaning
   Clients would upload a new payload while not deleting the old payload.

Divergent records
   Records would share data up to a certain date and then the data would
   almost completely diverge. This appears to be indicative of profile
   copying.

Rollback
   Records would share data up to a certain date. Each record in this set
   would contain data for a day or two but no extra data. This could be
   explained by filesystem rollback on the client.
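
To make the three patterns concrete, the sketch below labels a pair of records
by comparing their per-day data. It is an illustration only, not the analysis
that was actually run: the record shape (a date-sorted list of
``(date, data)`` tuples), the label names, and the two-day rollback threshold
are all assumptions.

```python
def shared_prefix_length(days_a, days_b):
    """Count the leading days on which two records hold identical data.

    Each argument is a list of (date, data) tuples sorted by date; this
    record shape is hypothetical.
    """
    count = 0
    for a, b in zip(days_a, days_b):
        if a != b:
            break
        count += 1
    return count


def classify_pair(days_a, days_b, rollback_max_days=2):
    """Label a pair of records with one of the patterns described above."""
    shared = shared_prefix_length(days_a, days_b)
    if shared == 0:
        return "unrelated"
    extra_a = len(days_a) - shared
    extra_b = len(days_b) - shared
    if extra_a == 0 and extra_b == 0:
        return "identical"
    if extra_a == 0 or extra_b == 0:
        # One record is an earlier snapshot of the other: an old payload
        # that was never deleted.
        return "orphan"
    # Rollback: each record has only a day or two past the shared history.
    if extra_a <= rollback_max_days and extra_b <= rollback_max_days:
        return "rollback"
    # Otherwise the records keep growing apart, as with a copied profile.
    return "divergent"
```

Comparing every pair of records this way is quadratic in the number of
records, which gives a sense of why this kind of analysis is expensive at
server scale.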

A significant percentage of the records on the server belonged to
misbehaving clients. Identifying these records was extremely
resource-intensive and error-prone. These records were undermining the
ability to use Firefox Health Report data.

Thus, the *Client ID* was born. The intent of the *Client ID* was to
uniquely identify clients so that the extreme effort and questionable
reliability of deduplicating server data would become problems of the past.

The *Client ID* was originally a randomly-generated UUID (version 1). This
allowed detection of orphaning and rollback. However, these version 1
*Client IDs* were still susceptible to use on multiple profiles and
machines if the profile was copied.