--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/services/healthreport/docs/identifiers.rst Wed Dec 31 06:09:35 2014 +0100
@@ -0,0 +1,83 @@
+.. _healthreport_identifiers:
+
+===========
+Identifiers
+===========
+
+Firefox Health Report records some identifiers to keep track of clients
+and uploaded documents.
+
+Identifier Types
+================
+
+Document/Upload IDs
+-------------------
+
+A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
+client creates or uploads a new document.
+
+When clients generate a new *Document ID*, they persist this ID to disk
+**before** the upload attempt.
+
+As part of the upload, the client sends all old *Document IDs* to the server
+and asks the server to delete them. In well-behaving clients, the server
+has a single record for each client with a randomly-changing *Document ID*.
+
+Client IDs
+----------
+
+A *Client ID* is an identifier that **attempts** to uniquely identify an
+individual FHR client. Please note the emphasis on *attempts* in that last
+sentence: *Client IDs* do not guarantee uniqueness.
+
+The *Client ID* is generated when the client first runs or as needed.
+
+The *Client ID* is transferred to the server as part of every upload. The
+server is thus able to affiliate multiple document uploads with a single
+*Client ID*.
+
+Client ID Versions
+^^^^^^^^^^^^^^^^^^
+
+The semantics for how a *Client ID* is generated are versioned.
+
+Version 1
+  The *Client ID* is a randomly-generated UUID.
+
+History of Identifiers
+======================
+
+In the beginning, there were just *Document IDs*. The thinking was clients
+would clean up after themselves and leave at most 1 active document on the
+server.
+
+Unfortunately, this did not work out. Using brute force analysis to
+deduplicate records on the server, a number of interesting patterns emerged.
+
+Orphaning
+  Clients would upload a new payload while not deleting the old payload.
+
+Divergent records
+  Records would share data up to a certain date and then the data would
+  almost completely diverge. This appears to be indicative of profile
+  copying.
+
+Rollback
+  Records would share data up to a certain date. Each record in this set
+  would contain data for a day or two but no extra data. This could be
+  explained by filesystem rollback on the client.
+
+A significant percentage of the records on the server belonged to
+misbehaving clients. Identifying these records was extremely resource
+intensive and error-prone. These records were undermining the ability
+to use Firefox Health Report data.
+
+Thus, the *Client ID* was born. The intent of the *Client ID* was to
+uniquely identify clients so the extreme effort required and the
+questionable reliability of deduplicating server data would become
+problems of the past.
+
+The *Client ID* was originally a randomly-generated UUID (version 1). This
+allowed detection of orphaning and rollback. However, these version 1
+*Client IDs* were still susceptible to use on multiple profiles and
+machines if the profile was copied.
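
The Document ID lifecycle and the role of the Client ID described in this file can be illustrated with a small, self-contained Python sketch. It is not the FHR implementation: the `Client` and `InMemoryServer` classes, the JSON state file, and the `send_deletes` switch (used only to simulate a misbehaving client) are all hypothetical names chosen for illustration. The sketch shows a client persisting a new Document ID before uploading, asking the server to delete old Document IDs, and the server grouping records by Client ID so that orphaned records become visible.

```python
import json
import tempfile
import uuid
from collections import defaultdict
from pathlib import Path


class InMemoryServer:
    """Hypothetical stand-in for the upload server: one record per Document ID."""

    def __init__(self):
        self.records = {}  # document_id -> {"client_id": ..., "payload": ...}

    def upload(self, document_id, client_id, payload, delete_ids):
        # Store the new document, then honor the client's deletion request.
        self.records[document_id] = {"client_id": client_id, "payload": payload}
        for old_id in delete_ids:
            self.records.pop(old_id, None)

    def documents_by_client(self):
        # Group Document IDs by Client ID; more than one per client hints at orphaning.
        grouped = defaultdict(list)
        for document_id, record in self.records.items():
            grouped[record["client_id"]].append(document_id)
        return dict(grouped)


class Client:
    """Hypothetical client whose state lives in a JSON file (an assumption)."""

    def __init__(self, state_path):
        self.state_path = Path(state_path)
        if self.state_path.exists():
            self.state = json.loads(self.state_path.read_text())
        else:
            # Version 1 Client ID: a randomly generated UUID, created on first run.
            self.state = {
                "client_id": str(uuid.uuid4()),
                "client_id_version": 1,
                "document_ids": [],
            }
            self._save()

    def _save(self):
        self.state_path.write_text(json.dumps(self.state))

    def upload(self, server, payload, send_deletes=True):
        new_id = str(uuid.uuid4())
        old_ids = list(self.state["document_ids"])
        # Persist the new Document ID BEFORE attempting the upload.
        self.state["document_ids"] = old_ids + [new_id]
        self._save()
        # send_deletes=False simulates a misbehaving client that orphans records.
        server.upload(new_id, self.state["client_id"], payload,
                      old_ids if send_deletes else [])
        if send_deletes:
            # The old IDs were sent for deletion, so only the new one is tracked.
            self.state["document_ids"] = [new_id]
            self._save()
        return new_id


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        server = InMemoryServer()
        good = Client(Path(tmp) / "good.json")
        good.upload(server, {"day": 1})
        good.upload(server, {"day": 2})
        bad = Client(Path(tmp) / "bad.json")
        bad.upload(server, {"day": 1}, send_deletes=False)
        bad.upload(server, {"day": 2}, send_deletes=False)
        counts = {cid: len(ids) for cid, ids in server.documents_by_client().items()}
        return good.state["client_id"], bad.state["client_id"], counts
```

Calling `demo()` leaves one server record for the well-behaving client and two for the client that never requests deletion. Grouping by Client ID makes that difference a simple lookup rather than a brute-force comparison of record contents. As the history section notes, a copied profile would carry the same version 1 Client ID, so this grouping alone cannot separate divergent records.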