services/healthreport/docs/identifiers.rst

:author: Michael Schloh von Bennewitz <michael@schloh.com>
:date: Wed, 31 Dec 2014 07:53:36 +0100
:branch: TOR_BUG_3246
:changeset: 5 (4ab42b5ab56c)
:permissions: -rw-r--r--

Correct small whitespace inconsistency, lost while renaming variables.

.. _healthreport_identifiers:

===========
Identifiers
===========

Firefox Health Report records some identifiers to keep track of clients
and uploaded documents.

Identifier Types
================

Document/Upload IDs
-------------------

A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
client creates or uploads a new document.

When clients generate a new *Document ID*, they persist this ID to disk
**before** the upload attempt.

As part of the upload, the client sends all old *Document IDs* to the server
and asks the server to delete them. For well-behaved clients, the server
therefore holds a single record per client, and that record's *Document ID*
changes to a new random value with each new document.

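
The persist-then-upload ordering described above can be sketched as follows.
This is a minimal illustration only, not the FHR client's actual code: the
function names ``prepare_upload`` and ``finish_upload``, the JSON state file,
and its ``document_ids`` key are all hypothetical, and the upload request
itself is left out.

```python
import json
import uuid
from pathlib import Path
from typing import List, Tuple


def prepare_upload(state_path: Path) -> Tuple[str, List[str]]:
    """Create a new Document ID and persist it before any upload attempt.

    Returns the new ID and the older IDs the server should be asked to delete.
    """
    # Load previously persisted IDs (none on the first run).
    if state_path.exists():
        old_ids = json.loads(state_path.read_text())["document_ids"]
    else:
        old_ids = []

    new_id = str(uuid.uuid4())

    # Persist the new ID *before* uploading. Presumably this lets the client
    # still ask for deletion later if the upload is interrupted (assumption).
    state_path.write_text(json.dumps({"document_ids": old_ids + [new_id]}))
    return new_id, old_ids


def finish_upload(state_path: Path, new_id: str) -> None:
    """After a successful upload and delete, track only the new ID."""
    state_path.write_text(json.dumps({"document_ids": [new_id]}))
```

A caller would invoke ``prepare_upload``, send the document together with the
returned list of old IDs, and call ``finish_upload`` only once the server has
confirmed the upload.
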

Client IDs
----------

A *Client ID* is an identifier that **attempts** to uniquely identify an
individual FHR client. Please note the emphasis on *attempts* in that last
sentence: *Client IDs* do not guarantee uniqueness.

The *Client ID* is generated when the client first runs or as needed.

The *Client ID* is transferred to the server as part of every upload. The
server is thus able to affiliate multiple document uploads with a single
*Client ID*.

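
As a rough illustration of how a server could affiliate uploads with a
*Client ID*, the sketch below groups uploads by that ID. The
``(client_id, document_id)`` tuple shape and the ``group_by_client`` helper are
assumptions made for this example; the actual server-side storage is not
described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_client(uploads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each Client ID to the Document IDs uploaded under it, in order."""
    by_client: Dict[str, List[str]] = defaultdict(list)
    for client_id, document_id in uploads:
        by_client[client_id].append(document_id)
    return dict(by_client)


# Hypothetical upload log: (client_id, document_id) pairs.
uploads = [
    ("client-a", "doc-1"),
    ("client-b", "doc-2"),
    ("client-a", "doc-3"),
]
grouped = group_by_client(uploads)
```

Here ``grouped["client-a"]`` holds both documents uploaded by that client,
which is the affiliation a *Document ID* alone cannot provide.
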

Client ID Versions
^^^^^^^^^^^^^^^^^^

The semantics for how a *Client ID* is generated are versioned.

Version 1
   The *Client ID* is a randomly-generated UUID.

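
A version 1 *Client ID* could be produced along these lines. This is a sketch
under stated assumptions, not the FHR implementation: the ``get_client_id``
helper, the JSON state file, and its ``client_id`` and ``client_id_version``
keys are hypothetical, and "as needed" is interpreted here simply as "no
stored ID is found".

```python
import json
import uuid
from pathlib import Path

CLIENT_ID_VERSION = 1  # version 1: a randomly-generated UUID


def get_client_id(state_path: Path) -> str:
    """Return the stored Client ID, generating one if none exists yet."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
        if state.get("client_id"):
            return state["client_id"]

    # First run, or the stored ID is missing: generate a fresh random UUID.
    client_id = str(uuid.uuid4())
    state_path.write_text(
        json.dumps({"client_id": client_id, "client_id_version": CLIENT_ID_VERSION})
    )
    return client_id
```

Because the ID lives in the stored state, copying that state to another
profile or machine also copies the ID, which is why such an identifier cannot
guarantee uniqueness.
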

History of Identifiers
======================

In the beginning, there were just *Document IDs*. The thinking was that
clients would clean up after themselves and leave at most one active document
on the server.

Unfortunately, this did not work out. Brute-force analysis to deduplicate
records on the server revealed a number of interesting patterns.

Orphaning
   Clients would upload a new payload while not deleting the old payload.

Divergent records
   Records would share data up to a certain date and then the data would
   almost completely diverge. This appears to be indicative of profile
   copying.

Rollback
   Records would share data up to a certain date. Each record in this set
   would contain data for a day or two after that date, but nothing more.
   This could be explained by filesystem rollback on the client.

A significant percentage of the records on the server belonged to
misbehaving clients. Identifying these records was extremely
resource-intensive and error-prone. These records were undermining the
ability to use Firefox Health Report data.

Thus, the *Client ID* was born. The intent of the *Client ID* was to
uniquely identify clients, so that the extreme effort and questionable
reliability of deduplicating server data would become problems of the past.

The *Client ID* was originally a randomly-generated UUID (version 1). This
allowed detection of orphaning and rollback. However, these version 1
*Client IDs* could still end up in use on multiple profiles and machines if
the profile was copied.
