services/healthreport/docs/identifiers.rst

changeset 0
6474c204b198
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/services/healthreport/docs/identifiers.rst	Wed Dec 31 06:09:35 2014 +0100
     1.3 @@ -0,0 +1,83 @@
     1.4 +.. _healthreport_identifiers:
     1.5 +
     1.6 +===========
     1.7 +Identifiers
     1.8 +===========
     1.9 +
    1.10 +Firefox Health Report records some identifiers to keep track of clients
    1.11 +and uploaded documents.
    1.12 +
    1.13 +Identifier Types
    1.14 +================
    1.15 +
    1.16 +Document/Upload IDs
    1.17 +-------------------
    1.18 +
    1.19 +A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
    1.20 +client creates or uploads a new document.
    1.21 +
    1.22 +When clients generate a new *Document ID*, they persist this ID to disk
    1.23 +**before** the upload attempt.
    1.24 +
    1.25 +As part of the upload, the client sends all old *Document IDs* to the server
    1.26 +and asks the server to delete them. For well-behaved clients, the server
    1.27 +therefore holds one record per client, whose *Document ID* changes on each upload.
    1.28 +
    1.29 +Client IDs
    1.30 +----------
    1.31 +
    1.32 +A *Client ID* is an identifier that **attempts** to uniquely identify an
    1.33 +individual FHR client. Please note the emphasis on *attempts* in that last
    1.34 +sentence: *Client IDs* do not guarantee uniqueness.
    1.35 +
    1.36 +The *Client ID* is generated when the client first runs or as needed.
    1.37 +
    1.38 +The *Client ID* is transferred to the server as part of every upload. The
    1.39 +server is thus able to associate multiple document uploads with a single
    1.40 +*Client ID*.
    1.41 +
    1.42 +Client ID Versions
    1.43 +^^^^^^^^^^^^^^^^^^
    1.44 +
    1.45 +The scheme used to generate a *Client ID* is versioned.
    1.46 +
    1.47 +Version 1
    1.48 +   The *Client ID* is a randomly-generated UUID.
    1.49 +
    1.50 +History of Identifiers
    1.51 +======================
    1.52 +
    1.53 +In the beginning, there were just *Document IDs*. The thinking was that
    1.54 +clients would clean up after themselves and leave at most one active
    1.55 +document on the server.
    1.56 +
    1.57 +Unfortunately, this did not work out. Brute-force analysis to deduplicate
    1.58 +records on the server revealed a number of interesting patterns.
    1.59 +
    1.60 +Orphaning
    1.61 +   Clients would upload a new payload without deleting the old payload.
    1.62 +
    1.63 +Divergent records
    1.64 +   Records would share data up to a certain date, after which the data
    1.65 +   would diverge almost completely. This appears to indicate profile
    1.66 +   copying.
    1.67 +
    1.68 +Rollback
    1.69 +   Records would share data up to a certain date. Beyond that date, each
    1.70 +   record in the set would contain only a day or two of data. This could
    1.71 +   be explained by filesystem rollback on the client.
    1.72 +
    1.73 +A significant percentage of the records on the server belonged to
    1.74 +misbehaving clients. Identifying these records was extremely
    1.75 +resource-intensive and error-prone. These records undermined the ability
    1.76 +to use Firefox Health Report data.
    1.77 +
    1.78 +Thus, the *Client ID* was born. The intent of the *Client ID* was to
    1.79 +uniquely identify clients, so that the extreme effort and questionable
    1.80 +reliability of deduplicating server data would become problems of the
    1.81 +past.
    1.82 +
    1.83 +The *Client ID* was originally a randomly-generated UUID (version 1). This
    1.84 +allowed detection of orphaning and rollback. However, a version 1
    1.85 +*Client ID* could still end up shared across multiple profiles and
    1.86 +machines if the profile was copied.
