other-licenses/snappy/src/framing_format.txt

author: Michael Schloh von Bennewitz <michael@schloh.com>
date: Tue, 06 Jan 2015 21:39:09 +0100
branch: TOR_BUG_9701
changeset: 8:97036ab72558
permissions: -rw-r--r--

Conditionally force memory storage according to privacy.thirdparty.isolate;
This solves Tor bug #9701, complying with disk avoidance documented in
https://www.torproject.org/projects/torbrowser/design/#disk-avoidance.

Snappy framing format description
Last revised: 2011-12-15

This document describes a framing format for Snappy, allowing compression to
files or streams that can then be decompressed more easily, without having
to hold the entire stream in memory. It also provides data checksums to
help verify integrity. It does not provide metadata checksums, so it does
not protect against, e.g., all forms of truncation.

Implementation of the framing format is optional for Snappy compressors and
decompressors; it is not part of the Snappy core specification.


1. General structure

The file consists solely of chunks, lying back-to-back with no padding
in between. Each chunk consists first of a single byte of chunk identifier,
then a two-byte little-endian length of the chunk in bytes (from 0 to 65535,
inclusive), and then the data, if any. The three bytes of chunk header are not
counted in the data length.

The different chunk types are listed below. The first chunk must always
be the stream identifier chunk (see section 4.1, below). The stream
ends when the file ends; there is no explicit end-of-file marker.
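
The chunk layout above (one type byte, a two-byte little-endian length, then
the data) is simple enough to walk with a small cursor. The sketch below is
one possible way to do it, not a reference implementation; `struct chunk` and
`next_chunk` are hypothetical names, and the choice to report a partial header
or partial data as -1 is an assumption about how a decoder might surface
truncation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one parsed chunk, pointing into the caller's buffer. */
struct chunk {
    uint8_t type;          /* chunk identifier byte */
    uint16_t length;       /* data length, excluding the 3-byte header */
    const uint8_t *data;   /* first data byte */
};

/*
 * Parses the chunk at buf[*pos]. Returns 1 and advances *pos past the chunk
 * on success, 0 at a clean end of input (the stream simply ends with the
 * file), and -1 if the input stops inside a header or inside chunk data.
 */
static int next_chunk(const uint8_t *buf, size_t size, size_t *pos,
                      struct chunk *out) {
    if (*pos == size)
        return 0;                                   /* no end-of-file marker */
    if (size - *pos < 3)
        return -1;                                  /* partial header */
    const uint8_t *p = buf + *pos;
    out->type = p[0];
    out->length = (uint16_t)(p[1] | (p[2] << 8));   /* little-endian length */
    if (size - *pos - 3 < out->length)
        return -1;                                  /* partial data */
    out->data = p + 3;
    *pos += 3 + (size_t)out->length;
    return 1;
}
```

A decoder would call `next_chunk` in a loop until it returns 0, treating -1 as
a truncated stream.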


2. File type identification

The following identifiers for this format are recommended where appropriate.
However, note that none have been registered officially, so this is only to
be taken as a guideline. We use "Snappy framed" to distinguish between this
format and raw Snappy data.

  File extension: .sz
  MIME type: application/x-snappy-framed
  HTTP Content-Encoding: x-snappy-framed


3. Checksum format

Some chunks have data protected by a checksum (the ones that do will say so
explicitly). The checksums are always masked CRC-32Cs.

A description of CRC-32C can be found in RFC 3720, section 12.1, with
examples in section B.4.
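
For readers without a CRC-32C routine at hand, a bit-at-a-time version is
sketched below. It is deliberately the slow, table-free form (real
implementations usually use lookup tables or the SSE4.2 `crc32` instruction);
the function name `crc32c` is an assumption, not something this format
defines. 0x82F63B78 is the bit-reflected form of the Castagnoli polynomial
0x1EDC6F41.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (Castagnoli), reflected form: the register starts at all
 * ones, each byte is XORed into the low bits, and the result is inverted.
 * Returns the plain (unmasked) checksum.
 */
static uint32_t crc32c(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* 0u - (crc & 1u) is all ones when the low bit is set, else 0 */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

As a quick check against the RFC 3720 B.4 examples, 32 zero bytes should give
0x8A9136AA, and the common check string "123456789" gives 0xE3069283.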

Checksums are not stored directly, but masked, as checksumming data and
then its own checksum can be problematic. The masking is the same as used
in Apache Hadoop: rotate the checksum right by 15 bits, then add the constant
0xa282ead8 (using wraparound as normal for unsigned integers). This is
equivalent to the following C code:

  #include <stdint.h>

  uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8;
  }

Note that the masking is reversible.
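
Reversing the mask means undoing the two steps in the opposite order:
subtract the constant (again with unsigned wraparound), then rotate left by
15 bits. The sketch below repeats `mask_checksum` from the text so it compiles
on its own; `unmask_checksum` is a hypothetical name for the inverse.

```c
#include <stdint.h>

/* Mask as described above: rotate right by 15, then add the constant. */
static uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

/* Inverse of mask_checksum: subtract the constant, then rotate left by 15. */
static uint32_t unmask_checksum(uint32_t masked) {
    uint32_t rot = masked - 0xa282ead8u;   /* unsigned wraparound */
    return (rot >> 17) | (rot << 15);
}
```

In practice a decoder rarely needs the inverse: it can mask the CRC-32C it
computes and compare that with the stored value directly.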

The checksum is always stored as a four-byte integer in little-endian byte order.
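
Writing the checksum byte by byte, rather than copying a `uint32_t` from
memory, keeps the on-disk order little-endian regardless of the host's
endianness. The helper names below are illustrative only.

```c
#include <stdint.h>

/* Store a 32-bit value as four little-endian bytes, on any host. */
static void store_le32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Load four little-endian bytes back into a 32-bit value. */
static uint32_t load_le32(const uint8_t in[4]) {
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

For example, the masking constant 0xa282ead8 would be written as the bytes
0xd8 0xea 0x82 0xa2.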


4. Chunk types

The currently supported chunk types are described below. The list may
be extended in the future.


4.1. Stream identifier (chunk type 0xff)

The stream identifier is always the first element in the stream.
Its data is exactly six bytes long and contains "sNaPpY" in ASCII. This means
that a valid Snappy framed stream always starts with the chunk type 0xff, the
two-byte length 6, and the six identifier bytes:

  0xff 0x06 0x00 0x73 0x4e 0x61 0x50 0x70 0x59

The stream identifier chunk may appear multiple times in the stream besides
the first; if such a chunk shows up, it should simply be ignored, assuming
it has the right length and contents. This allows for easy concatenation of
compressed files without the need for re-framing.
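
A decoder needs the same test in two places: to accept the mandatory first
chunk and to recognize later repeats that can be ignored. A minimal sketch,
with the hypothetical helper `is_stream_identifier` taking the already parsed
type byte, data length, and data pointer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns 1 if the chunk is a well-formed stream identifier: type 0xff,
 * data length 6, data equal to "sNaPpY". Returns 0 otherwise.
 */
static int is_stream_identifier(uint8_t type, size_t length,
                                const uint8_t *data) {
    static const char magic[6] = {'s', 'N', 'a', 'P', 'p', 'Y'};
    return type == 0xff && length == sizeof magic &&
           memcmp(data, magic, sizeof magic) == 0;
}
```

How to treat a 0xff chunk that fails this check is left to the decoder; the
text above only says that well-formed repeats are ignored.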


4.2. Compressed data (chunk type 0x00)

Compressed data chunks contain a normal Snappy compressed bitstream;
see the compressed format specification. The compressed data is preceded by
the masked CRC-32C (see section 3) of the _uncompressed_ data.

Note that the data portion of the chunk, i.e., the compressed contents,
can be at most 65531 bytes (2^16 - 1, minus the four-byte checksum).
However, we place an additional restriction that the uncompressed data
in a chunk must be no longer than 32768 bytes. This allows consumers to
easily use small fixed-size buffers.
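
Because the Snappy compressed format (a separate specification) starts its
bitstream with the uncompressed length as a little-endian varint, a consumer
can enforce the 32768-byte limit before decompressing anything. The sketch
below assumes that preamble layout; `check_compressed_chunk` and
`MAX_UNCOMPRESSED` are hypothetical names, and actual decompression is left to
a Snappy library.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UNCOMPRESSED 32768u   /* framing-level limit from the text */

/*
 * `data`/`len` are the chunk data after the 3-byte header. On success returns
 * 0, stores the masked checksum in *masked_crc and the declared uncompressed
 * length in *ulen. Returns -1 if the chunk is malformed or over the limit.
 */
static int check_compressed_chunk(const uint8_t *data, size_t len,
                                  uint32_t *masked_crc, uint32_t *ulen) {
    if (len < 4)
        return -1;                       /* too short to hold the checksum */
    *masked_crc = (uint32_t)data[0] | ((uint32_t)data[1] << 8) |
                  ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);

    /* Decode the varint preamble of the Snappy bitstream (at most 5 bytes). */
    uint64_t v = 0;
    size_t i = 4;
    for (int shift = 0; ; shift += 7) {
        if (i >= len || shift > 28)
            return -1;                   /* truncated or over-long varint */
        uint8_t b = data[i++];
        v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80))
            break;                       /* high bit clear: last varint byte */
    }
    if (v > MAX_UNCOMPRESSED)
        return -1;                       /* violates the 32768-byte limit */
    *ulen = (uint32_t)v;
    return 0;
}
```

After decompression, the masked CRC-32C of the output would be compared with
`*masked_crc`.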


4.3. Uncompressed data (chunk type 0x01)

Uncompressed data chunks allow a compressor to send uncompressed,
raw data; this is useful if, for instance, incompressible or
near-incompressible data is detected, and faster decompression is desired.

As in the compressed chunks, the data is preceded by its own masked
CRC-32C (see section 3).

An uncompressed data chunk, like compressed data chunks, should contain
no more than 32768 data bytes, so the maximum legal chunk length with the
checksum is 32772.
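
Putting sections 3 and 4.3 together, validating an uncompressed chunk means
checking the 4..32772 length range, then comparing the stored little-endian
value with the masked CRC-32C of the payload. The sketch repeats a bitwise
`crc32c` and `mask_checksum` so it stands alone; `read_uncompressed_chunk` is
a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

/*
 * Validates an uncompressed data chunk given its data (after the 3-byte
 * header). On success returns 0 and points *payload / *payload_len at the
 * raw bytes that follow the checksum; returns -1 otherwise.
 */
static int read_uncompressed_chunk(const uint8_t *data, size_t len,
                                   const uint8_t **payload,
                                   size_t *payload_len) {
    if (len < 4 || len > 32772)
        return -1;                        /* outside the legal length range */
    uint32_t stored = (uint32_t)data[0] | ((uint32_t)data[1] << 8) |
                      ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
    if (mask_checksum(crc32c(data + 4, len - 4)) != stored)
        return -1;                        /* checksum mismatch */
    *payload = data + 4;
    *payload_len = len - 4;
    return 0;
}
```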


4.4. Reserved unskippable chunks (chunk types 0x02-0x7f)

These are reserved for future expansion. A decoder that sees such a chunk
should immediately return an error, as it must assume it cannot decode the
stream correctly.

Future versions of this specification may define meanings for these chunks.


4.5. Reserved skippable chunks (chunk types 0x80-0xfe)

These are also reserved for future expansion, but unlike the chunks
described in section 4.4, a decoder seeing these must skip them and continue
decoding.

Future versions of this specification may define meanings for these chunks.
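
Taken together, sections 4.1 through 4.5 assign every possible type byte to
exactly one behavior, so a decoder can dispatch on the type alone. A minimal
sketch; the enum and `classify_chunk` are hypothetical names, not part of the
format.

```c
#include <stdint.h>

/* What a decoder should do with a chunk, by type byte. */
enum chunk_action {
    ACTION_STREAM_ID,     /* 0xff: check "sNaPpY", otherwise ignore */
    ACTION_COMPRESSED,    /* 0x00: verify checksum and decompress */
    ACTION_UNCOMPRESSED,  /* 0x01: verify checksum and copy */
    ACTION_ERROR,         /* 0x02-0x7f: reserved unskippable, stop */
    ACTION_SKIP           /* 0x80-0xfe: reserved skippable, ignore */
};

static enum chunk_action classify_chunk(uint8_t type) {
    if (type == 0xff) return ACTION_STREAM_ID;
    if (type == 0x00) return ACTION_COMPRESSED;
    if (type == 0x01) return ACTION_UNCOMPRESSED;
    if (type <= 0x7f) return ACTION_ERROR;
    return ACTION_SKIP;   /* only 0x80-0xfe remain */
}
```

Apart from 0xff, the skippable types are exactly those with the high bit set,
so an unknown type can be classified without a lookup table.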
