services/healthreport/docs/identifiers.rst

Tue, 06 Jan 2015 21:39:09 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
branch
TOR_BUG_9701
changeset 8
97036ab72558

Conditionally force memory storage according to privacy.thirdparty.isolate;
This solves Tor bug #9701, complying with disk avoidance documented in
https://www.torproject.org/projects/torbrowser/design/#disk-avoidance.

.. _healthreport_identifiers:

===========
Identifiers
===========

Firefox Health Report (FHR) records some identifiers to keep track of clients
and uploaded documents.

Identifier Types
================

Document/Upload IDs
-------------------

A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
client creates or uploads a new document.

When clients generate a new *Document ID*, they persist this ID to disk
**before** the upload attempt.

As part of the upload, the client sends all old *Document IDs* to the server
and asks the server to delete them. With well-behaved clients, the server
has a single record for each client with a randomly-changing *Document ID*.
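
The lifecycle described above can be sketched roughly as follows. This is
only an illustration, not the FHR client code: the JSON state file, the
function names ``prepare_upload`` and ``confirm_upload``, and the field names
``id`` and ``delete_ids`` are all assumptions made for the example.

```python
import json
import uuid
from pathlib import Path


def prepare_upload(state_path: Path) -> dict:
    """Create a new Document ID, persist it, and describe the upload.

    The state file layout and the returned field names are hypothetical.
    """
    # Load the Document IDs persisted by earlier runs, if any.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"document_ids": []}

    old_ids = list(state["document_ids"])
    new_id = str(uuid.uuid4())  # random UUID

    # Persist the new ID to disk BEFORE the upload is attempted.
    state["document_ids"].append(new_id)
    state_path.write_text(json.dumps(state))

    # The upload carries the new ID plus every old ID the server should delete.
    return {"id": new_id, "delete_ids": old_ids}


def confirm_upload(state_path: Path, uploaded_id: str) -> None:
    """After a successful upload, keep only the ID that is now on the server."""
    state_path.write_text(json.dumps({"document_ids": [uploaded_id]}))
```

In this sketch, a client that calls ``prepare_upload`` and then
``confirm_upload`` on every cycle leaves exactly one live *Document ID* behind.
If ``confirm_upload`` is never reached, the next call to ``prepare_upload``
still lists the unconfirmed ID for deletion.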

Client IDs
----------

A *Client ID* is an identifier that **attempts** to uniquely identify an
individual FHR client. Please note the emphasis on *attempts* in that last
sentence: *Client IDs* do not guarantee uniqueness.

The *Client ID* is generated when the client first runs or as needed.

The *Client ID* is transferred to the server as part of every upload. The
server is thus able to affiliate multiple document uploads with a single
*Client ID*.
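
As a rough illustration of how a server could affiliate uploads with a
*Client ID*, the sketch below groups upload records by that ID. The record
shape (``client_id`` and ``document_id`` keys) and the function name
``group_by_client`` are assumptions for the example, not the actual server
schema.

```python
from collections import defaultdict


def group_by_client(uploads):
    """Map each Client ID to the Document IDs uploaded under it.

    `uploads` is an iterable of dicts with hypothetical keys
    "client_id" and "document_id".
    """
    by_client = defaultdict(list)
    for upload in uploads:
        by_client[upload["client_id"]].append(upload["document_id"])
    return dict(by_client)
```

Under these assumptions, a *Client ID* that maps to more than one
*Document ID* marks a client whose records deserve a closer look.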

Client ID Versions
^^^^^^^^^^^^^^^^^^

The semantics for how a *Client ID* is generated are versioned.

Version 1
   The *Client ID* is a randomly-generated UUID.
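
A minimal sketch of version 1 generation, assuming the client keeps its state
in a simple dictionary. The keys ``client_id`` and ``client_id_version`` and
the function name ``get_or_create_client_id`` are hypothetical. Note that
"version 1" here is the FHR *Client ID* version; the random UUID itself comes
from ``uuid.uuid4()``.

```python
import uuid

# FHR Client ID semantics version, not the UUID version.
CLIENT_ID_VERSION = 1


def get_or_create_client_id(state: dict) -> dict:
    """Ensure `state` holds a Client ID, generating one only if it is missing.

    The dictionary keys used here are assumptions for illustration.
    """
    if not state.get("client_id"):
        state["client_id"] = str(uuid.uuid4())  # randomly generated UUID
        state["client_id_version"] = CLIENT_ID_VERSION
    return state
```

Calling the function again with the same state leaves the existing ID
untouched, which matches "generated when the client first runs or as needed".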

History of Identifiers
======================

In the beginning, there were just *Document IDs*. The thinking was that
clients would clean up after themselves and leave at most one active
document on the server.

Unfortunately, this did not work out. Using brute force analysis to
deduplicate records on the server, a number of interesting patterns emerged.

Orphaning
   Clients would upload a new payload without deleting the old payload.

Divergent records
   Records would share data up to a certain date and then the data would
   almost completely diverge. This appears to be indicative of profile
   copying.

Rollback
   Records would share data up to a certain date. Each record in this set
   would contain data for a day or two but no extra data. This could be
   explained by filesystem rollback on the client.

A significant percentage of the records on the server belonged to
misbehaving clients. Identifying these records was extremely resource
intensive and error-prone. These records were undermining the ability
to use Firefox Health Report data.

Thus, the *Client ID* was born. The intent of the *Client ID* was to
uniquely identify clients so the extreme effort required and the
questionable reliability of deduplicating server data would become
problems of the past.

The *Client ID* was originally a randomly-generated UUID (version 1). This
allowed detection of orphaning and rollback. However, these version 1
*Client IDs* could still end up in use on multiple profiles and
machines if the profile was copied.
