.. _healthreport_identifiers:

===========
Identifiers
===========

Firefox Health Report records some identifiers to keep track of clients
and uploaded documents.

Identifier Types
================

Document/Upload IDs
-------------------

A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
client creates or uploads a new document.

When clients generate a new *Document ID*, they persist this ID to disk
**before** the upload attempt.

As part of the upload, the client sends all old *Document IDs* to the server
and asks the server to delete them. In well-behaved clients, the server
has a single record for each client with a randomly changing *Document ID*.

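
The upload sequence described above can be sketched roughly as follows. This
is an illustration only, not the actual FHR implementation: the JSON state
file, the ``load_ids``, ``save_ids`` and ``rotate_document_id`` helpers, and
the ``upload`` callback are hypothetical names introduced for this example.

.. code-block:: python

   import json
   import uuid
   from pathlib import Path


   def load_ids(state_path):
       """Return the Document IDs persisted so far (hypothetical layout)."""
       path = Path(state_path)
       if not path.exists():
           return []
       return json.loads(path.read_text())["document_ids"]


   def save_ids(state_path, ids):
       """Persist the list of known Document IDs to disk."""
       Path(state_path).write_text(json.dumps({"document_ids": ids}))


   def rotate_document_id(state_path, upload):
       """Upload under a fresh Document ID and request deletion of old ones.

       ``upload(new_id, old_ids)`` is a hypothetical callback returning True
       when the server accepted the document and the deletion request.
       """
       old_ids = load_ids(state_path)
       new_id = str(uuid.uuid4())

       # Persist the new ID *before* attempting the upload, as described.
       save_ids(state_path, old_ids + [new_id])

       if upload(new_id, old_ids):
           # The old documents are gone; only the new ID needs tracking.
           save_ids(state_path, [new_id])
       return new_id

In this sketch a failed upload leaves every ID on disk, so the next attempt
asks the server to delete all of them.
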

Client IDs
----------

A *Client ID* is an identifier that **attempts** to uniquely identify an
individual FHR client. Please note the emphasis on *attempts* in that last
sentence: *Client IDs* do not guarantee uniqueness.

The *Client ID* is generated when the client first runs or as needed.

The *Client ID* is transferred to the server as part of every upload. The
server is thus able to associate multiple document uploads with a single
*Client ID*.

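
Because every upload carries the *Client ID*, the server can group documents
by it. The following is a minimal sketch, assuming each stored upload is a
dictionary with hypothetical ``client_id`` and ``document_id`` keys. The real
server-side storage format is not described here, and the ID values are
placeholders rather than real UUIDs.

.. code-block:: python

   from collections import defaultdict


   def group_by_client(uploads):
       """Map each Client ID to the Document IDs uploaded under it."""
       groups = defaultdict(list)
       for upload in uploads:
           groups[upload["client_id"]].append(upload["document_id"])
       return dict(groups)


   # Placeholder records for illustration only.
   uploads = [
       {"client_id": "client-a", "document_id": "doc-1"},
       {"client_id": "client-b", "document_id": "doc-2"},
       {"client_id": "client-a", "document_id": "doc-3"},
   ]
   groups = group_by_client(uploads)

   # More than one Document ID under a single Client ID means the server
   # holds several documents for what claims to be the same client.
   multiple = {cid: docs for cid, docs in groups.items() if len(docs) > 1}
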

Client ID Versions
^^^^^^^^^^^^^^^^^^

The semantics for how a *Client ID* is generated are versioned.

Version 1
   The *Client ID* is a randomly generated UUID.

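
Version 1 generation could look roughly like the sketch below. It assumes the
client keeps its identifier in a small JSON file. The file layout, the
``CLIENT_ID_VERSION`` constant and the ``get_client_id`` helper are
hypothetical and are not taken from the FHR source.

.. code-block:: python

   import json
   import uuid
   from pathlib import Path

   CLIENT_ID_VERSION = 1  # hypothetical constant naming the scheme in use


   def get_client_id(state_path):
       """Return the persisted Client ID, generating one if none is usable."""
       path = Path(state_path)
       try:
           state = json.loads(path.read_text())
           if state.get("version") == CLIENT_ID_VERSION and state.get("client_id"):
               return state["client_id"]
       except (OSError, ValueError, AttributeError):
           # Missing or unreadable state: fall through and generate a new ID.
           pass

       # Version 1: the Client ID is a randomly generated UUID.
       client_id = str(uuid.uuid4())
       path.write_text(
           json.dumps({"version": CLIENT_ID_VERSION, "client_id": client_id})
       )
       return client_id

Calling ``get_client_id`` at start-up covers the first run, when no file
exists yet. Regenerating the ID when the stored value cannot be read is one
possible reading of "as needed".
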

History of Identifiers
======================

In the beginning, there were just *Document IDs*. The thinking was that
clients would clean up after themselves and leave at most one active document
on the server.

Unfortunately, this did not work out. Brute-force analysis to deduplicate
records on the server revealed a number of interesting patterns.

Orphaning
   Clients would upload a new payload without deleting the old payload.

Divergent records
   Records would share data up to a certain date and then the data would
   almost completely diverge. This appears to be indicative of profile
   copying.

Rollback
   Records would share data up to a certain date. Each record in this set
   would contain data for a day or two but no extra data. This could be
   explained by filesystem rollback on the client.

A significant percentage of the records on the server belonged to
misbehaving clients. Identifying these records was extremely
resource-intensive and error-prone. These records were undermining the
ability to use Firefox Health Report data.

Thus, the *Client ID* was born. The intent of the *Client ID* was to
uniquely identify clients so that the extreme effort required and the
questionable reliability of deduplicating server data would become
problems of the past.

The *Client ID* was originally a randomly generated UUID (version 1). This
allowed detection of orphaning and rollback. However, these version 1
*Client IDs* were still susceptible to use on multiple profiles and
machines if the profile was copied.