.. _healthreport_identifiers:

===========
Identifiers
===========

Firefox Health Report records some identifiers to keep track of clients
and uploaded documents.

Identifier Types
================

Document/Upload IDs
-------------------

A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
client creates or uploads a new document.

When clients generate a new *Document ID*, they persist this ID to disk
**before** the upload attempt.

As part of the upload, the client sends all old *Document IDs* to the server
and asks the server to delete them. For well-behaved clients, the server
therefore holds a single record per client, with a randomly changing
*Document ID*.
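
The upload sequence described above can be sketched roughly as follows. This is
an illustration only, not the FHR client code: the JSON state file, the
``prepare_upload`` and ``confirm_upload`` helpers, and the ``document_id`` and
``delete_ids`` field names are all assumptions made for the example.

```python
import json
import uuid
from pathlib import Path


def prepare_upload(state_path: Path) -> dict:
    """Create a new Document ID and persist it before any upload attempt."""
    # Hypothetical state file holding every Document ID not yet known to be deleted.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"document_ids": []}

    old_ids = list(state["document_ids"])
    new_id = str(uuid.uuid4())  # random UUID

    # Persist the new ID *before* the upload attempt, as described in the text.
    state["document_ids"] = old_ids + [new_id]
    state_path.write_text(json.dumps(state))

    # The request carries the new ID plus every old ID the server should delete.
    return {"document_id": new_id, "delete_ids": old_ids}


def confirm_upload(state_path: Path, request: dict) -> None:
    """After a successful upload, keep only the ID that was just uploaded."""
    state = {"document_ids": [request["document_id"]]}
    state_path.write_text(json.dumps(state))
```

In this sketch, an upload that is never confirmed leaves its ID in the state
file, so the next call to ``prepare_upload`` lists it under ``delete_ids``.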

Client IDs
----------

A *Client ID* is an identifier that **attempts** to uniquely identify an
individual FHR client. Please note the emphasis on *attempts* in that last
sentence: *Client IDs* do not guarantee uniqueness.

The *Client ID* is generated when the client first runs or as needed.

The *Client ID* is transferred to the server as part of every upload. The
server is thus able to associate multiple document uploads with a single
*Client ID*.
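
As a rough illustration of how a server could associate uploads with a
*Client ID*, the sketch below groups upload records by that ID. The record
layout (``client_id`` and ``document_id`` keys) and the ``group_by_client``
helper are hypothetical and do not reflect the actual server schema.

```python
from collections import defaultdict


def group_by_client(uploads):
    """Map each Client ID to the Document IDs uploaded under it."""
    by_client = defaultdict(list)
    for upload in uploads:
        by_client[upload["client_id"]].append(upload["document_id"])
    return dict(by_client)


# Hypothetical upload records, in arrival order.
uploads = [
    {"client_id": "client-a", "document_id": "doc-1"},
    {"client_id": "client-b", "document_id": "doc-2"},
    {"client_id": "client-a", "document_id": "doc-3"},
]
groups = group_by_client(uploads)
```

Here ``groups["client-a"]`` holds two Document IDs, which tells the server that
both documents were uploaded under the same *Client ID*.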

Client ID Versions
^^^^^^^^^^^^^^^^^^

The semantics for how a *Client ID* is generated are versioned.

Version 1
   The *Client ID* is a randomly generated UUID.
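
A minimal sketch of version 1 generation, assuming the ID is stored in a small
JSON file next to its version number. The file layout, the ``get_client_id``
helper, and the ``client_id_version`` key are assumptions for illustration, not
the actual FHR storage format.

```python
import json
import uuid
from pathlib import Path

# Version of the Client ID scheme (not the UUID version).
CLIENT_ID_VERSION = 1


def get_client_id(state_path: Path) -> dict:
    """Return the stored Client ID, generating one if none exists yet."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
        if state.get("client_id"):
            return state

    # Version 1 semantics: a randomly generated UUID.
    state = {
        "client_id": str(uuid.uuid4()),
        "client_id_version": CLIENT_ID_VERSION,
    }
    state_path.write_text(json.dumps(state))
    return state
```

``uuid.uuid4()`` is used because it produces a random UUID. The "version 1"
above names the *Client ID* scheme and is unrelated to the UUID version number.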

History of Identifiers
======================

In the beginning, there were just *Document IDs*. The thinking was that
clients would clean up after themselves and leave at most one active
document on the server.

Unfortunately, this did not work out. Brute-force analysis to deduplicate
records on the server revealed a number of interesting patterns.

Orphaning
   Clients would upload a new payload without deleting the old payload.

Divergent records
   Records would share data up to a certain date and then the data would
   almost completely diverge. This appears to indicate profile copying.

Rollback
   Records would share data up to a certain date. Each record in this set
   would contain data for a day or two but no extra data. This could be
   explained by filesystem rollback on the client.

A significant percentage of the records on the server belonged to
misbehaving clients. Identifying these records was extremely
resource-intensive and error-prone. These records were undermining the
ability to use Firefox Health Report data.

Thus, the *Client ID* was born. The intent of the *Client ID* was to
uniquely identify clients so that the extreme effort required to
deduplicate server data, and the questionable reliability of the results,
would become problems of the past.

The *Client ID* was originally a randomly generated UUID (version 1). This
allowed detection of orphaning and rollback. However, these version 1
*Client IDs* could still end up in use on multiple profiles and machines
if the profile was copied.