Tue, 06 Jan 2015 21:39:09 +0100
Conditionally force memory storage according to privacy.thirdparty.isolate;
This solves Tor bug #9701, complying with disk avoidance documented in
https://www.torproject.org/projects/torbrowser/design/#disk-avoidance.
michael@0 | 1 | Snappy framing format description |
michael@0 | 2 | Last revised: 2011-12-15 |
michael@0 | 3 | |
michael@0 | 4 | This document describes a framing format for Snappy, allowing compressing to |
michael@0 | 5 | files or streams that can then more easily be decompressed without having |
michael@0 | 6 | to hold the entire stream in memory. It also provides data checksums to |
michael@0 | 7 | help verify integrity. It does not provide metadata checksums, so it does |
michael@0 | 8 | not protect against, e.g., all forms of truncation. |
michael@0 | 9 | |
michael@0 | 10 | Implementation of the framing format is optional for Snappy compressors and |
michael@0 | 11 | decompressors; it is not part of the Snappy core specification. |
michael@0 | 12 | |
michael@0 | 13 | |
michael@0 | 14 | 1. General structure |
michael@0 | 15 | |
michael@0 | 16 | The file consists solely of chunks, lying back-to-back with no padding |
michael@0 | 17 | in between. Each chunk consists first of a single byte of chunk identifier, |
michael@0 | 18 | then a two-byte little-endian length of the chunk in bytes (from 0 to 65535, |
michael@0 | 19 | inclusive), and then the data if any. The three bytes of chunk header are not |
michael@0 | 20 | counted in the data length. |
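
The header layout described above can be sketched in C as follows. This is
only an illustration: the names chunk_header and parse_chunk_header are
hypothetical and not part of this specification, which defines only the
byte layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the specification defines only the byte layout. */
struct chunk_header {
    uint8_t  type;    /* one-byte chunk identifier */
    uint16_t length;  /* data length, excluding the 3 header bytes */
};

/* Parses the 3-byte chunk header at buf.
   Returns 0 on success, -1 if fewer than 3 bytes are available. */
int parse_chunk_header(const uint8_t *buf, size_t avail,
                       struct chunk_header *out) {
    if (avail < 3) {
        return -1;
    }
    out->type = buf[0];
    /* Two-byte little-endian length: low byte first. */
    out->length = (uint16_t)(buf[1] | (buf[2] << 8));
    return 0;
}
```

A reader would call this once per chunk, then consume exactly `length`
further bytes before parsing the next header.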
michael@0 | 21 | |
michael@0 | 22 | The different chunk types are listed below. The first chunk must always |
michael@0 | 23 | be the stream identifier chunk (see section 4.1, below). The stream |
michael@0 | 24 | ends when the file ends -- there is no explicit end-of-file marker. |
michael@0 | 25 | |
michael@0 | 26 | |
michael@0 | 27 | 2. File type identification |
michael@0 | 28 | |
michael@0 | 29 | The following identifiers for this format are recommended where appropriate. |
michael@0 | 30 | However, note that none have been registered officially, so this is only to |
michael@0 | 31 | be taken as a guideline. We use "Snappy framed" to distinguish between this |
michael@0 | 32 | format and raw Snappy data. |
michael@0 | 33 | |
michael@0 | 34 | File extension: .sz |
michael@0 | 35 | MIME type: application/x-snappy-framed |
michael@0 | 36 | HTTP Content-Encoding: x-snappy-framed |
michael@0 | 37 | |
michael@0 | 38 | |
michael@0 | 39 | 3. Checksum format |
michael@0 | 40 | |
michael@0 | 41 | Some chunks have data protected by a checksum (the ones that do will say so |
michael@0 | 42 | explicitly). The checksums are always masked CRC-32Cs. |
michael@0 | 43 | |
michael@0 | 44 | A description of CRC-32C can be found in RFC 3720, section 12.1, with |
michael@0 | 45 | examples in section B.4. |
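
For readers without a CRC-32C routine at hand, a minimal bitwise sketch
follows. It assumes the usual reflected form of the Castagnoli polynomial
(0x82F63B78) with an all-ones initial value and a final inversion; the
function name crc32c is illustrative, and production code would normally
use a table-driven or hardware-accelerated version instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32C (Castagnoli). Slow but compact; a sketch only. */
uint32_t crc32c(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;                 /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the reflected
               polynomial; otherwise just shift. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;                                /* final inversion */
}
```

The commonly quoted check value for the nine ASCII bytes "123456789" is
0xE3069283, which is a convenient self-test for any implementation.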
michael@0 | 46 | |
michael@0 | 47 | Checksums are not stored directly, but masked, as checksumming data and |
michael@0 | 48 | then its own checksum can be problematic. The masking is the same as used |
michael@0 | 49 | in Apache Hadoop: Rotate the checksum right by 15 bits, then add the constant |
michael@0 | 50 | 0xa282ead8 (using wraparound as normal for unsigned integers). This is |
michael@0 | 51 | equivalent to the following C code: |
michael@0 | 52 | |
michael@0 | 53 | uint32_t mask_checksum(uint32_t x) { |
michael@0 | 54 |     return ((x >> 15) | (x << 17)) + 0xa282ead8; |
michael@0 | 55 | } |
michael@0 | 56 | |
michael@0 | 57 | Note that the masking is reversible. |
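
To illustrate the reversibility, the inverse can be sketched as below:
subtract the constant, then rotate in the opposite direction. The name
unmask_checksum is hypothetical and not defined by this specification;
mask_checksum is repeated so that the example is self-contained.

```c
#include <stdint.h>

uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8;
}

/* Hypothetical inverse of mask_checksum(). */
uint32_t unmask_checksum(uint32_t masked) {
    uint32_t rot = masked - 0xa282ead8;  /* undo the addition (wraps) */
    return (rot << 15) | (rot >> 17);    /* undo the 15-bit right rotation */
}
```

A decoder does not strictly need the inverse: it can equally well mask
the checksum it computes and compare that against the stored value.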
michael@0 | 58 | |
michael@0 | 59 | The checksum is always stored as a four-byte integer, in little-endian order. |
michael@0 | 60 | |
michael@0 | 61 | |
michael@0 | 62 | 4. Chunk types |
michael@0 | 63 | |
michael@0 | 64 | The currently supported chunk types are described below. The list may |
michael@0 | 65 | be extended in the future. |
michael@0 | 66 | |
michael@0 | 67 | |
michael@0 | 68 | 4.1. Stream identifier (chunk type 0xff) |
michael@0 | 69 | |
michael@0 | 70 | The stream identifier is always the first element in the stream. |
michael@0 | 71 | It is exactly six bytes long and contains "sNaPpY" in ASCII. This means that |
michael@0 | 72 | a valid Snappy framed stream always starts with the bytes |
michael@0 | 73 | |
michael@0 | 74 | 0xff 0x06 0x00 0x73 0x4e 0x61 0x50 0x70 0x59 |
michael@0 | 75 | |
michael@0 | 76 | The stream identifier chunk can also appear later in the stream, after |
michael@0 | 77 | the first; if such a chunk shows up, it should simply be ignored, assuming |
michael@0 | 78 | it has the right length and contents. This allows for easy concatenation of |
michael@0 | 79 | compressed files without the need for re-framing. |
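
A check for the stream identifier could look roughly like the following
sketch. The names STREAM_ID and is_stream_identifier are illustrative
only; the nine bytes are the ones listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header (0xff, length 6 little-endian) followed by "sNaPpY". */
static const uint8_t STREAM_ID[9] = {
    0xff, 0x06, 0x00, 0x73, 0x4e, 0x61, 0x50, 0x70, 0x59
};

/* Returns 1 if buf begins with a well-formed stream identifier chunk,
   0 otherwise (including when fewer than 9 bytes are available). */
int is_stream_identifier(const uint8_t *buf, size_t avail) {
    return avail >= sizeof STREAM_ID &&
           memcmp(buf, STREAM_ID, sizeof STREAM_ID) == 0;
}
```

The same check can serve both at the start of the stream, where a match
is required, and mid-stream, where a matching chunk is simply skipped.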
michael@0 | 80 | |
michael@0 | 81 | |
michael@0 | 82 | 4.2. Compressed data (chunk type 0x00) |
michael@0 | 83 | |
michael@0 | 84 | Compressed data chunks contain a normal Snappy compressed bitstream; |
michael@0 | 85 | see the compressed format specification. The compressed data is preceded by |
michael@0 | 86 | the masked CRC-32C (see section 3) of the _uncompressed_ data. |
michael@0 | 87 | |
michael@0 | 88 | Note that the data portion of the chunk, i.e., the compressed contents, |
michael@0 | 89 | can be at most 65531 bytes (2^16 - 1, minus the checksum). |
michael@0 | 90 | However, we place an additional restriction that the uncompressed data |
michael@0 | 91 | in a chunk must be no longer than 32768 bytes. This allows consumers to |
michael@0 | 92 | easily use small fixed-size buffers. |
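
As a sketch of how a decoder might take a data chunk apart, the function
below splits the chunk body into the stored checksum and the payload that
follows it. The names data_chunk and split_data_chunk are hypothetical;
decompressing the payload and verifying the 32768-byte limit on the
uncompressed data are left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a data chunk body (chunk types 0x00 and 0x01). */
struct data_chunk {
    uint32_t       masked_crc;   /* stored checksum, still masked */
    const uint8_t *payload;      /* bytes following the checksum */
    size_t         payload_len;  /* at most 65531 */
};

/* body points just past the 3-byte chunk header; chunk_len is the length
   taken from that header. Returns 0 on success, -1 if the chunk is too
   short to hold a checksum or longer than the header can express. */
int split_data_chunk(const uint8_t *body, size_t chunk_len,
                     struct data_chunk *out) {
    if (chunk_len < 4 || chunk_len > 65535) {
        return -1;
    }
    /* Four-byte little-endian checksum. */
    out->masked_crc = (uint32_t)body[0] |
                      ((uint32_t)body[1] << 8) |
                      ((uint32_t)body[2] << 16) |
                      ((uint32_t)body[3] << 24);
    out->payload = body + 4;
    out->payload_len = chunk_len - 4;
    return 0;
}
```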
michael@0 | 93 | |
michael@0 | 94 | |
michael@0 | 95 | 4.3. Uncompressed data (chunk type 0x01) |
michael@0 | 96 | |
michael@0 | 97 | Uncompressed data chunks allow a compressor to send uncompressed, |
michael@0 | 98 | raw data; this is useful if, for instance, incompressible or |
michael@0 | 99 | near-incompressible data is detected, and faster decompression is desired. |
michael@0 | 100 | |
michael@0 | 101 | As in the compressed chunks, the data is preceded by its own masked |
michael@0 | 102 | CRC-32C (see section 3). |
michael@0 | 103 | |
michael@0 | 104 | An uncompressed data chunk, like compressed data chunks, should contain |
michael@0 | 105 | no more than 32768 data bytes, so the maximum legal chunk length with the |
michael@0 | 106 | checksum is 32772. |
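
Putting the pieces together, an encoder for uncompressed chunks might be
sketched as follows. All names other than mask_checksum are hypothetical,
and the bitwise crc32c helper is an assumed, unoptimized implementation
included only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed helper: bit-at-a-time CRC-32C (reflected polynomial). */
uint32_t crc32c(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8;
}

/* Writes one uncompressed data chunk (type 0x01) holding n bytes of data
   into out. Returns the number of bytes written (n + 7), or 0 if n
   exceeds 32768 or out has room for fewer than n + 7 bytes. */
size_t write_uncompressed_chunk(const uint8_t *data, size_t n,
                                uint8_t *out, size_t cap) {
    if (n > 32768 || cap < n + 7) {
        return 0;
    }
    uint32_t len = (uint32_t)n + 4;  /* checksum counts toward the length */
    uint32_t sum = mask_checksum(crc32c(data, n));
    out[0] = 0x01;                          /* chunk type */
    out[1] = (uint8_t)(len & 0xff);         /* length, little-endian */
    out[2] = (uint8_t)((len >> 8) & 0xff);
    out[3] = (uint8_t)(sum & 0xff);         /* masked CRC-32C, little-endian */
    out[4] = (uint8_t)((sum >> 8) & 0xff);
    out[5] = (uint8_t)((sum >> 16) & 0xff);
    out[6] = (uint8_t)((sum >> 24) & 0xff);
    memcpy(out + 7, data, n);
    return n + 7;
}
```

Input longer than 32768 bytes would be split by the caller into several
chunks, each written with a separate call.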
michael@0 | 107 | |
michael@0 | 108 | |
michael@0 | 109 | 4.4. Reserved unskippable chunks (chunk types 0x02-0x7f) |
michael@0 | 110 | |
michael@0 | 111 | These are reserved for future expansion. A decoder that sees such a chunk |
michael@0 | 112 | should immediately return an error, as it must assume it cannot decode the |
michael@0 | 113 | stream correctly. |
michael@0 | 114 | |
michael@0 | 115 | Future versions of this specification may define meanings for these chunks. |
michael@0 | 116 | |
michael@0 | 117 | |
michael@0 | 118 | 4.5. Reserved skippable chunks (chunk types 0x80-0xfe) |
michael@0 | 119 | |
michael@0 | 120 | These are also reserved for future expansion, but unlike the chunks |
michael@0 | 121 | described in 4.4, a decoder seeing these must skip them and continue |
michael@0 | 122 | decoding. |
michael@0 | 123 | |
michael@0 | 124 | Future versions of this specification may define meanings for these chunks. |
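
The handling rules for all chunk types in section 4 can be summarized in
a small dispatch function. This is a sketch under the assumption that a
decoder wants a single decision per chunk; the enum and function names
are illustrative and not part of this specification.

```c
#include <stdint.h>

/* Hypothetical decoder actions, one per class of chunk type. */
enum chunk_action {
    CHUNK_STREAM_ID,     /* 0xff: verify contents, then ignore */
    CHUNK_COMPRESSED,    /* 0x00: verify checksum, decompress */
    CHUNK_UNCOMPRESSED,  /* 0x01: verify checksum, copy through */
    CHUNK_ERROR,         /* 0x02-0x7f: reserved unskippable, fail */
    CHUNK_SKIP           /* 0x80-0xfe: reserved skippable, skip data */
};

enum chunk_action classify_chunk(uint8_t type) {
    if (type == 0xff) return CHUNK_STREAM_ID;
    if (type == 0x00) return CHUNK_COMPRESSED;
    if (type == 0x01) return CHUNK_UNCOMPRESSED;
    if (type <= 0x7f) return CHUNK_ERROR;  /* remaining 0x02-0x7f */
    return CHUNK_SKIP;                     /* remaining 0x80-0xfe */
}
```

For CHUNK_SKIP, the decoder still reads the two-byte length from the
header so that it knows how many data bytes to pass over.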