The Tor Browser: diff modules/libjar/appnote.txt

     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/modules/libjar/appnote.txt	Wed Dec 31 06:09:35 2014 +0100
     1.3 @@ -0,0 +1,1192 @@
     1.4 +Revised: 03/01/1999
     1.5 +
     1.6 +Disclaimer
     1.7 +----------
     1.8 +
     1.9 +Although PKWARE will attempt to supply current and accurate
    1.10 +information relating to its file formats, algorithms, and the
    1.11 +subject programs, the possibility of error can not be eliminated.
    1.12 +PKWARE therefore expressly disclaims any warranty that the
    1.13 +information contained in the associated materials relating to the
    1.14 +subject programs and/or the format of the files created or
    1.15 +accessed by the subject programs and/or the algorithms used by
    1.16 +the subject programs, or any other matter, is current, correct or
    1.17 +accurate as delivered.  Any risk of damage due to any possible
    1.18 +inaccurate information is assumed by the user of the information.
    1.19 +Furthermore, the information relating to the subject programs
    1.20 +and/or the file formats created or accessed by the subject
    1.21 +programs and/or the algorithms used by the subject programs is
    1.22 +subject to change without notice.
    1.23 +
    1.24 +General Format of a ZIP file
    1.25 +----------------------------
    1.26 +
    1.27 +  Files are stored in arbitrary order.  Large zipfiles can span
    1.28 +  multiple diskette media.
    1.29 +
    1.30 +  Overall zipfile format:
    1.31 +
    1.32 +    [local file header + file data + optional data descriptor] . . .
    1.33 +    [central directory] end of central directory record
    1.34 +
    1.35 +
    1.36 +  A.  Local file header:
    1.37 +
    1.38 +        local file header signature     4 bytes  (0x04034b50)
    1.39 +        version needed to extract       2 bytes
    1.40 +        general purpose bit flag        2 bytes
    1.41 +        compression method              2 bytes
    1.42 +        last mod file time              2 bytes
    1.43 +        last mod file date              2 bytes
    1.44 +        crc-32                          4 bytes
    1.45 +        compressed size                 4 bytes
    1.46 +        uncompressed size               4 bytes
    1.47 +        filename length                 2 bytes
    1.48 +        extra field length              2 bytes
    1.49 +
    1.50 +        filename (variable size)
    1.51 +        extra field (variable size)
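A minimal Python sketch of reading the fixed 30-byte part of the local file header with the standard struct module, assuming the archive is already in memory as bytes. The function name parse_local_header and the dictionary keys are illustrative only; they are not part of the format.

```python
import struct

# Fixed part of the local file header (section A), little-endian ("<"):
# signature, version needed, flags, method, time, date,
# crc-32, compressed size, uncompressed size,
# filename length, extra field length.  Total: 30 bytes.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50


def parse_local_header(buf, offset=0):
    """Parse the local header at offset; return (fields, data_offset)."""
    (signature, version_needed, flags, method, mod_time, mod_date,
     crc32, compressed_size, uncompressed_size,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if signature != LOCAL_SIGNATURE:
        raise ValueError("no local file header signature at offset %d" % offset)
    pos = offset + LOCAL_HEADER.size
    fields = {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        # These three are zero when bit 3 of flags is set; the real
        # values then follow the file data in a data descriptor.
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "name": bytes(buf[pos:pos + name_len]),
        "extra": bytes(buf[pos + name_len:pos + name_len + extra_len]),
    }
    # File data begins right after the variable-length filename and extra field.
    return fields, pos + name_len + extra_len
```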
    1.52 +
    1.53 +  B.  Data descriptor:
    1.54 +
    1.55 +        crc-32                          4 bytes
    1.56 +        compressed size                 4 bytes
    1.57 +        uncompressed size               4 bytes
    1.58 +
    1.59 +      This descriptor exists only if bit 3 of the general
    1.60 +      purpose bit flag is set (see below).  It is byte aligned
    1.61 +      and immediately follows the last byte of compressed data.
    1.62 +      This descriptor is used only when it was not possible to
    1.63 +      seek in the output zip file, e.g., when the output zip file
    1.64 +      was standard output or a non-seekable device.
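When bit 3 is set the local header carries no sizes, so for a method 8 (Deflated) entry a reader typically finds the end of the data by inflating it and then reads the 12-byte descriptor that follows. A minimal sketch with Python's zlib module, assuming a raw deflate stream (no zlib wrapper) starts at data_offset; the function name is hypothetical. Later revisions of the PKWARE note describe an optional 0x08074b50 signature in front of the descriptor; this 1999 text does not, so the sketch merely tolerates it.

```python
import struct
import zlib

DESCRIPTOR = struct.Struct("<III")     # crc-32, compressed size, uncompressed size
OPTIONAL_DD_SIGNATURE = b"PK\x07\x08"  # 0x08074b50; not described in this revision


def inflate_with_descriptor(buf, data_offset):
    """Inflate raw deflate data at data_offset and verify the trailing descriptor."""
    inflater = zlib.decompressobj(-15)  # negative wbits: raw deflate stream
    data = inflater.decompress(buf[data_offset:])
    if not inflater.eof:
        raise ValueError("deflate stream is truncated")
    trailer = inflater.unused_data       # bytes after the end of the deflate stream
    compressed_size = len(buf) - data_offset - len(trailer)
    # Skip the optional signature (ambiguous only if the CRC itself equals it).
    if trailer[:4] == OPTIONAL_DD_SIGNATURE:
        trailer = trailer[4:]
    crc, csize, usize = DESCRIPTOR.unpack_from(trailer, 0)
    if (crc, csize, usize) != (zlib.crc32(data), compressed_size, len(data)):
        raise ValueError("data descriptor does not match the inflated data")
    return data
```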
    1.65 +
    1.66 +  C.  Central directory structure:
    1.67 +
    1.68 +      [file header] . . .  end of central dir record
    1.69 +
    1.70 +      File header:
    1.71 +
    1.72 +        central file header signature   4 bytes  (0x02014b50)
    1.73 +        version made by                 2 bytes
    1.74 +        version needed to extract       2 bytes
    1.75 +        general purpose bit flag        2 bytes
    1.76 +        compression method              2 bytes
    1.77 +        last mod file time              2 bytes
    1.78 +        last mod file date              2 bytes
    1.79 +        crc-32                          4 bytes
    1.80 +        compressed size                 4 bytes
    1.81 +        uncompressed size               4 bytes
    1.82 +        filename length                 2 bytes
    1.83 +        extra field length              2 bytes
    1.84 +        file comment length             2 bytes
    1.85 +        disk number start               2 bytes
    1.86 +        internal file attributes        2 bytes
    1.87 +        external file attributes        4 bytes
    1.88 +        relative offset of local header 4 bytes
    1.89 +
    1.90 +        filename (variable size)
    1.91 +        extra field (variable size)
    1.92 +        file comment (variable size)
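The central directory is a run of these file headers, each a 46-byte fixed part followed by three variable-length fields. A minimal Python sketch that walks count consecutive headers starting at a known offset (obtained from the end of central dir record); the field names in CENTRAL_FIELDS are illustrative labels, not names defined by the format.

```python
import struct

# Fixed part of a central directory file header: 46 bytes, little-endian.
CENTRAL_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")
CENTRAL_SIGNATURE = 0x02014B50
CENTRAL_FIELDS = (
    "signature", "version_made_by", "version_needed", "flags", "method",
    "mod_time", "mod_date", "crc32", "compressed_size", "uncompressed_size",
    "name_len", "extra_len", "comment_len", "disk_start",
    "internal_attr", "external_attr", "local_header_offset",
)


def parse_central_directory(buf, offset, count):
    """Parse count consecutive central directory file headers starting at offset."""
    entries = []
    for _ in range(count):
        entry = dict(zip(CENTRAL_FIELDS, CENTRAL_HEADER.unpack_from(buf, offset)))
        if entry["signature"] != CENTRAL_SIGNATURE:
            raise ValueError("bad central directory signature at offset %d" % offset)
        pos = offset + CENTRAL_HEADER.size
        # filename, extra field and file comment follow in that order
        entry["name"] = bytes(buf[pos:pos + entry["name_len"]])
        pos += entry["name_len"]
        entry["extra"] = bytes(buf[pos:pos + entry["extra_len"]])
        pos += entry["extra_len"]
        entry["comment"] = bytes(buf[pos:pos + entry["comment_len"]])
        offset = pos + entry["comment_len"]  # next header starts right after
        entries.append(entry)
    return entries
```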
    1.93 +
    1.94 +      End of central dir record:
    1.95 +
    1.96 +        end of central dir signature    4 bytes  (0x06054b50)
    1.97 +        number of this disk             2 bytes
    1.98 +        number of the disk with the
    1.99 +        start of the central directory  2 bytes
   1.100 +        total number of entries in
   1.101 +        the central dir on this disk    2 bytes
   1.102 +        total number of entries in
   1.103 +        the central dir                 2 bytes
   1.104 +        size of the central directory   4 bytes
   1.105 +        offset of start of central
   1.106 +        directory with respect to
   1.107 +        the starting disk number        4 bytes
   1.108 +        zipfile comment length          2 bytes
   1.109 +        zipfile comment (variable size)
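Because the zipfile comment is variable length, a reader usually locates the end of central dir record by scanning backwards from the end of the file. A minimal Python sketch, assuming the whole file is in memory; it accepts a candidate signature only if its comment length reaches exactly to the end of the data, since the signature bytes could also appear inside a comment. The function name is hypothetical.

```python
import struct

EOCD = struct.Struct("<IHHHHIIH")   # 22-byte fixed part
EOCD_SIGNATURE = b"PK\x05\x06"      # 0x06054b50 in little-endian byte order
EOCD_FIELDS = ("signature", "this_disk", "cd_start_disk", "entries_this_disk",
               "total_entries", "cd_size", "cd_offset", "comment_len")


def find_end_of_central_dir(buf):
    """Scan backwards for the end of central dir record; return (offset, fields)."""
    # The record is 22 bytes plus a comment of at most 65,535 bytes.
    lowest = max(0, len(buf) - EOCD.size - 0xFFFF)
    pos = len(buf) - EOCD.size
    while pos >= lowest:
        pos = buf.rfind(EOCD_SIGNATURE, lowest, pos + 4)
        if pos < 0:
            break
        fields = dict(zip(EOCD_FIELDS, EOCD.unpack_from(buf, pos)))
        # Accept only a candidate whose comment ends exactly at end of file.
        if pos + EOCD.size + fields["comment_len"] == len(buf):
            fields["comment"] = bytes(buf[pos + EOCD.size:])
            return pos, fields
        pos -= 1
    raise ValueError("end of central dir record not found")
```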
   1.110 +
   1.111 +  D.  Explanation of fields:
   1.112 +
   1.113 +      version made by (2 bytes)
   1.114 +
   1.115 +          The upper byte indicates the compatibility of the file
   1.116 +          attribute information.  If the external file attributes 
   1.117 +          are compatible with MS-DOS and can be read by PKZIP for 
   1.118 +          DOS version 2.04g then this value will be zero.  If these 
   1.119 +          attributes are not compatible, then this value will 
   1.120 +          identify the host system on which the attributes are 
   1.121 +          compatible.  Software can use this information to determine
   1.122 +          the line record format for text files etc.  The current
   1.123 +          mappings are:
   1.124 +
   1.125 +          0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
   1.126 +          1 - Amiga                     2 - VAX/VMS
   1.127 +          3 - Unix                      4 - VM/CMS
   1.128 +          5 - Atari ST                  6 - OS/2 H.P.F.S.
   1.129 +          7 - Macintosh                 8 - Z-System
   1.130 +          9 - CP/M                     10 - Windows NTFS
   1.131 +         11 thru 255 - unused
   1.132 +
   1.133 +          The lower byte indicates the version number of the
   1.134 +          software used to encode the file.  The value/10
   1.135 +          indicates the major version number, and the value
   1.136 +          mod 10 is the minor version number.
   1.137 +
   1.138 +      version needed to extract (2 bytes)
   1.139 +
   1.140 +          The minimum software version needed to extract the
   1.141 +          file, mapped as above.
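The split of 'version made by' into a host byte and a version byte can be sketched in Python as follows; 'version needed to extract' uses the same lower-byte rule. The table labels abbreviate the mapping above, and the function name is illustrative.

```python
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (FAT)", 1: "Amiga", 2: "VAX/VMS", 3: "Unix",
    4: "VM/CMS", 5: "Atari ST", 6: "OS/2 H.P.F.S.", 7: "Macintosh",
    8: "Z-System", 9: "CP/M", 10: "Windows NTFS",
}


def decode_version_made_by(value):
    """Split a 2-byte version field into (host system, major, minor)."""
    host = HOST_SYSTEMS.get(value >> 8, "unused")   # upper byte: host system
    major, minor = divmod(value & 0xFF, 10)         # lower byte: value/10, value mod 10
    return host, major, minor
```

For example, 0x0317 decodes as Unix, version 2.3.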
   1.142 +
   1.143 +      general purpose bit flag: (2 bytes)
   1.144 +
   1.145 +          Bit 0: If set, indicates that the file is encrypted.
   1.146 +
   1.147 +          (For Method 6 - Imploding)
   1.148 +          Bit 1: If the compression method used was type 6,
   1.149 +                 Imploding, then this bit, if set, indicates
   1.150 +                 an 8K sliding dictionary was used.  If clear,
   1.151 +                 then a 4K sliding dictionary was used.
   1.152 +          Bit 2: If the compression method used was type 6,
   1.153 +                 Imploding, then this bit, if set, indicates
   1.154 +                 3 Shannon-Fano trees were used to encode the
   1.155 +                 sliding dictionary output.  If clear, then 2
   1.156 +                 Shannon-Fano trees were used.
   1.157 +
   1.158 +          (For Method 8 - Deflating)
   1.159 +          Bit 2  Bit 1
   1.160 +            0      0    Normal (-en) compression option was used.
   1.161 +            0      1    Maximum (-ex) compression option was used.
   1.162 +            1      0    Fast (-ef) compression option was used.
   1.163 +            1      1    Super Fast (-es) compression option was used.
   1.164 +
   1.165 +          Note:  Bits 1 and 2 are undefined if the compression
   1.166 +                 method is any other.
   1.167 +
   1.168 +          Bit 3: If this bit is set, the fields crc-32, compressed 
   1.169 +                 size and uncompressed size are set to zero in the 
   1.170 +                 local header.  The correct values are put in the 
   1.171 +                 data descriptor immediately following the compressed
   1.172 +                 data.  (Note: PKZIP version 2.04g for DOS only 
   1.173 +                 recognizes this bit for method 8 compression; newer
   1.174 +                 versions of PKZIP recognize this bit for any 
   1.175 +                 compression method.)
   1.176 +
   1.177 +          Bit 4: Reserved for use with method 8, for enhanced
   1.178 +                 deflating. 
   1.179 +
   1.180 +          Bit 5: If this bit is set, this indicates that the file is 
   1.181 +                 compressed patched data.  (Note: Requires PKZIP 
   1.182 +                 version 2.70 or greater)
   1.183 +
   1.184 +          Bit 6: Currently unused.
   1.185 +
   1.186 +          Bit 7: Currently unused.
   1.187 +
   1.188 +          Bit 8: Currently unused.
   1.189 +
   1.190 +          Bit 9: Currently unused.
   1.191 +
   1.192 +          Bit 10: Currently unused.
   1.193 +
   1.194 +          Bit 11: Currently unused.
   1.195 +
   1.196 +          Bit 12: Reserved by PKWARE for enhanced compression.
   1.197 +
   1.198 +          Bit 13: Reserved by PKWARE.
   1.199 +
   1.200 +          Bit 14: Reserved by PKWARE.
   1.201 +
   1.202 +          Bit 15: Reserved by PKWARE.
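A minimal Python sketch interpreting the defined bits; the meaning of bits 1 and 2 depends on the compression method, so the method is passed in. The dictionary keys are illustrative names.

```python
DEFLATE_OPTIONS = {
    0: "Normal (-en)", 1: "Maximum (-ex)", 2: "Fast (-ef)", 3: "Super Fast (-es)",
}


def describe_flags(flags, method):
    """Interpret the general purpose bit flag for the given compression method."""
    info = {
        "encrypted": bool(flags & 0x0001),        # bit 0
        "data_descriptor": bool(flags & 0x0008),  # bit 3
        "patched_data": bool(flags & 0x0020),     # bit 5
    }
    if method == 6:  # Imploding
        info["dictionary_size"] = 8192 if flags & 0x0002 else 4096  # bit 1
        info["shannon_fano_trees"] = 3 if flags & 0x0004 else 2     # bit 2
    elif method == 8:  # Deflating: bit 2 is the high bit, bit 1 the low bit
        info["deflate_option"] = DEFLATE_OPTIONS[(flags >> 1) & 0x3]
    # For any other method, bits 1 and 2 are undefined and ignored here.
    return info
```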
   1.203 +
   1.204 +      compression method: (2 bytes)
   1.205 +
   1.206 +          (see accompanying documentation for algorithm
   1.207 +          descriptions)
   1.208 +
   1.209 +          0 - The file is stored (no compression)
   1.210 +          1 - The file is Shrunk
   1.211 +          2 - The file is Reduced with compression factor 1
   1.212 +          3 - The file is Reduced with compression factor 2
   1.213 +          4 - The file is Reduced with compression factor 3
   1.214 +          5 - The file is Reduced with compression factor 4
   1.215 +          6 - The file is Imploded
   1.216 +          7 - Reserved for Tokenizing compression algorithm
   1.217 +          8 - The file is Deflated
   1.218 +          9 - Reserved for enhanced Deflating
   1.219 +         10 - PKWARE Data Compression Library Imploding
   1.220 +
   1.221 +      date and time fields: (2 bytes each)
   1.222 +
   1.223 +          The date and time are encoded in standard MS-DOS format.
   1.224 +          If input came from standard input, the date and time are
   1.225 +          those at which compression was started for this data.
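In the standard MS-DOS encoding the date packs the day (bits 0-4), month (bits 5-8) and years since 1980 (bits 9-15), and the time packs seconds divided by two (bits 0-4), minutes (bits 5-10) and hours (bits 11-15). That layout comes from the MS-DOS convention rather than from this note; the Python sketch below converts in both directions, with hypothetical function names.

```python
import datetime


def dos_to_datetime(dos_date, dos_time):
    """Decode the 'last mod file date' and 'last mod file time' fields."""
    return datetime.datetime(
        ((dos_date >> 9) & 0x7F) + 1980,  # bits 9-15: years since 1980
        (dos_date >> 5) & 0x0F,           # bits 5-8: month (1-12)
        dos_date & 0x1F,                  # bits 0-4: day (1-31)
        (dos_time >> 11) & 0x1F,          # bits 11-15: hour
        (dos_time >> 5) & 0x3F,           # bits 5-10: minute
        (dos_time & 0x1F) * 2,            # bits 0-4: seconds / 2
    )


def datetime_to_dos(moment):
    """Encode a datetime as (dos_date, dos_time); odd seconds round down."""
    if not 1980 <= moment.year <= 2107:
        raise ValueError("MS-DOS dates cover 1980-2107 only")
    dos_date = ((moment.year - 1980) << 9) | (moment.month << 5) | moment.day
    dos_time = (moment.hour << 11) | (moment.minute << 5) | (moment.second // 2)
    return dos_date, dos_time
```

The two-second resolution is why a round trip only preserves even seconds.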
   1.226 +
   1.227 +      CRC-32: (4 bytes)
   1.228 +
   1.229 +          The CRC-32 algorithm was generously contributed by
   1.230 +          David Schwaderer and can be found in his excellent
   1.231 +          book "C Programmer's Guide to NetBIOS" published by
   1.232 +          Howard W. Sams & Co. Inc.  The 'magic number' for
   1.233 +          the CRC is 0xdebb20e3.  The proper CRC pre- and post-
   1.234 +          conditioning is used, meaning that the CRC register
   1.235 +          is pre-conditioned with all ones (a starting value
   1.236 +          of 0xffffffff) and the value is post-conditioned by
   1.237 +          taking the one's complement of the CRC residual.
   1.238 +          If bit 3 of the general purpose flag is set, this
   1.239 +          field is set to zero in the local header and the correct
   1.240 +          value is put in the data descriptor and in the central
   1.241 +          directory.
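A table-driven Python sketch of this CRC, using the reflected polynomial 0xEDB88320 (the bit-reversed form of the standard CRC-32 polynomial, stated here as background rather than taken from this note). It keeps the register pre-conditioning and the final one's complement as separate steps so that the 'magic number' can be observed: feeding a message followed by its stored CRC (low byte first) through the register leaves 0xdebb20e3. Python's zlib.crc32 computes the same stored value; the function names here are hypothetical.

```python
def make_crc_table():
    """Lookup table for the reflected CRC-32 polynomial 0xEDB88320."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = make_crc_table()


def crc_register(data, register=0xFFFFFFFF):
    """Feed bytes through the CRC register (pre-conditioned to all ones)."""
    for byte in data:
        register = CRC_TABLE[(register ^ byte) & 0xFF] ^ (register >> 8)
    return register


def crc32(data):
    """CRC-32 as stored in the headers: one's complement of the final register."""
    return crc_register(data) ^ 0xFFFFFFFF
```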
   1.242 +
   1.243 +      compressed size: (4 bytes)
   1.244 +      uncompressed size: (4 bytes)
   1.245 +
   1.246 +          The size of the file compressed and uncompressed,
   1.247 +          respectively.  If bit 3 of the general purpose bit flag
   1.248 +          is set, these fields are set to zero in the local header
   1.249 +          and the correct values are put in the data descriptor and
   1.250 +          in the central directory.
   1.251 +
   1.252 +      filename length: (2 bytes)
   1.253 +      extra field length: (2 bytes)
   1.254 +      file comment length: (2 bytes)
   1.255 +
   1.256 +          The length of the filename, extra field, and comment
   1.257 +          fields respectively.  The combined length of any
   1.258 +          directory record and these three fields should not
   1.259 +          generally exceed 65,535 bytes.  If input came from standard
   1.260 +          input, the filename length is set to zero.
   1.261 +
   1.262 +      disk number start: (2 bytes)
   1.263 +
   1.264 +          The number of the disk on which this file begins.
   1.265 +
   1.266 +      internal file attributes: (2 bytes)
   1.267 +
   1.268 +          The lowest bit of this field indicates, if set, that
   1.269 +          the file is apparently an ASCII or text file.  If not
   1.270 +          set, it indicates that the file apparently contains binary data.
   1.271 +          The remaining bits are unused in version 1.0.
   1.272 +
   1.273 +          Bits 1 and 2 are reserved for use by PKWARE.
   1.274 +
   1.275 +      external file attributes: (4 bytes)
   1.276 +
   1.277 +          The mapping of the external attributes is
   1.278 +          host-system dependent (see 'version made by').  For
   1.279 +          MS-DOS, the low order byte is the MS-DOS directory
   1.280 +          attribute byte.  If input came from standard input, this
   1.281 +          field is set to zero.
   1.282 +
   1.283 +      relative offset of local header: (4 bytes)
   1.284 +
   1.285 +          This is the offset from the start of the first disk on
   1.286 +          which this file appears to where the local header should
   1.287 +          be found.
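A minimal Python sketch of following this offset to a file's data, assuming a single-disk archive held in memory. It re-reads the filename and extra field lengths from the local header itself, because the local extra field need not match the copy in the central directory, and it takes the compressed size from the central directory, because the local header holds zero there when bit 3 is set. The function name and its parameters are hypothetical.

```python
import struct
import zlib


def read_member_data(buf, local_header_offset, compressed_size, method):
    """Return the uncompressed bytes of one entry (single-disk archive assumed)."""
    if buf[local_header_offset:local_header_offset + 4] != b"PK\x03\x04":
        raise ValueError("no local file header at the recorded offset")
    # filename length and extra field length sit at offsets 26 and 28
    name_len, extra_len = struct.unpack_from("<HH", buf, local_header_offset + 26)
    start = local_header_offset + 30 + name_len + extra_len
    raw = buf[start:start + compressed_size]
    if method == 0:    # stored
        return bytes(raw)
    if method == 8:    # deflated: a raw deflate stream, hence wbits = -15
        return zlib.decompress(raw, -15)
    raise NotImplementedError("compression method %d" % method)
```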
   1.288 +
   1.289 +      filename: (Variable)
   1.290 +
   1.291 +          The name of the file, with optional relative path.
   1.292 +          The path stored should not contain a drive or
   1.293 +          device letter, or a leading slash.  All slashes
   1.294 +          should be forward slashes '/' as opposed to
   1.295 +          backslashes '\' for compatibility with Amiga
   1.296 +          and Unix file systems etc.  If input came from standard
   1.297 +          input, there is no filename field.
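The three rules for stored names can be checked mechanically. A minimal Python sketch; the function name is hypothetical, and it treats a "drive letter" as a single ASCII letter followed by a colon, which is an assumption about what the rule covers.

```python
import re


def stored_name_problems(name):
    """List the ways a stored filename breaks the rules in this section."""
    problems = []
    if re.match(r"^[A-Za-z]:", name):   # e.g. "C:" (assumed drive-letter form)
        problems.append("drive letter")
    if name.startswith("/"):
        problems.append("leading slash")
    if "\\" in name:                     # separators should be '/'
        problems.append("backslash")
    return problems
```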
   1.298 +
   1.299 +      extra field: (Variable)
   1.300 +
   1.301 +          This is for future expansion.  If additional information
   1.302 +          needs to be stored in the future, it should be stored
   1.303 +          here.  Earlier versions of the software can then safely
   1.304 +          skip this file, and find the next file or header.  This
   1.305 +          field will be 0 length in version 1.0.
   1.306 +
   1.307 +          In order to allow different programs and different types
   1.308 +          of information to be stored in the 'extra' field in .ZIP
   1.309 +          files, the following structure should be used for all
   1.310 +          programs storing data in this field:
   1.311 +
   1.312 +          header1+data1 + header2+data2 . . .
   1.313 +
   1.314 +          Each header should consist of:
   1.315 +
   1.316 +            Header ID - 2 bytes
   1.317 +            Data Size - 2 bytes
   1.318 +
   1.319 +          Note: all fields are stored in Intel low-byte/high-byte order.
   1.320 +
   1.321 +          The Header ID field indicates the type of data that is in
   1.322 +          the following data block.
   1.323 +
   1.324 +          Header IDs of 0 thru 31 are reserved for use by PKWARE.
   1.325 +          The remaining IDs can be used by third party vendors for
   1.326 +          proprietary usage.
   1.327 +
   1.328 +          The current Header ID mappings defined by PKWARE are:
   1.329 +
   1.330 +          0x0007        AV Info
   1.331 +          0x0009        OS/2
   1.332 +          0x000a        NTFS 
   1.333 +          0x000c        VAX/VMS
   1.334 +          0x000d        Unix
   1.335 +          0x000f        Patch Descriptor
   1.336 +
   1.337 +          Several third party mappings commonly used are:
   1.338 +
   1.339 +          0x4b46        FWKCS MD5 (see below)
   1.340 +          0x07c8        Macintosh
   1.341 +          0x4341        Acorn/SparkFS 
   1.342 +          0x4453        Windows NT security descriptor (binary ACL)
   1.343 +          0x4704        VM/CMS
   1.344 +          0x470f        MVS
   1.345 +          0x4c41        OS/2 access control list (text ACL)
   1.346 +          0x4d49        Info-ZIP VMS (VAX or Alpha)
   1.347 +          0x5455        extended timestamp
   1.348 +          0x5855        Info-ZIP Unix (original, also OS/2, NT, etc)
   1.349 +          0x6542        BeOS/BeBox
   1.350 +          0x756e        ASi Unix
   1.351 +          0x7855        Info-ZIP Unix (new)
   1.352 +          0xfd4a        SMS/QDOS
   1.353 +
   1.354 +          The Data Size field indicates the size of the following
   1.355 +          data block. Programs can use this value to skip to the
   1.356 +          next header block, passing over any data blocks that are
   1.357 +          not of interest.
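A minimal Python sketch of walking an extra field as a sequence of header+data blocks, using Data Size to step over blocks regardless of whether their Header ID is understood. The generator name is hypothetical; a caller would still check any vendor signature inside the data, as recommended below.

```python
import struct


def iter_extra_blocks(extra):
    """Yield (header_id, data) for each header+data block in an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, data_size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + data_size > len(extra):
            raise ValueError("block 0x%04x overruns the extra field" % header_id)
        yield header_id, bytes(extra[pos:pos + data_size])
        pos += data_size  # Data Size lets unknown blocks be skipped
```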
   1.358 +
   1.359 +          Note: As stated above, the size of the entire .ZIP file
   1.360 +                header, including the filename, comment, and extra
   1.361 +                field should not exceed 64K in size.
   1.362 +
   1.363 +          In case two different programs should appropriate the same
   1.364 +          Header ID value, it is strongly recommended that each
   1.365 +          program place a unique signature of at least two bytes in
   1.366 +          size (and preferably 4 bytes or bigger) at the start of
   1.367 +          each data area.  Every program should verify that its
   1.368 +          unique signature is present, in addition to the Header ID
   1.369 +          value being correct, before assuming that it is a block of
   1.370 +          known type.
   1.371 +
   1.372 +         -OS/2 Extra Field:
   1.373 +
   1.374 +          The following is the layout of the OS/2 attributes "extra" 
   1.375 +          block.  (Last Revision  09/05/95)
   1.376 +
   1.377 +          Note: all fields are stored in Intel low-byte/high-byte order.
   1.378 +
   1.379 +          Value       Size          Description
   1.380 +          -----       ----          -----------
   1.381 +  (OS/2)  0x0009      2 bytes       Tag for this "extra" block type
   1.382 +          TSize       2 bytes       Size for the following data block
   1.383 +          BSize       4 bytes       Uncompressed Block Size
   1.384 +          CType       2 bytes       Compression type
   1.385 +          EACRC       4 bytes       CRC value for uncompressed block
   1.386 +          (var)       variable      Compressed block
   1.387 +
   1.388 +        The OS/2 extended attribute structure (FEA2LIST) is 
   1.389 +        compressed and then stored in it's entirety within this 
   1.390 +        structure.  There will only ever be one "block" of data in 
   1.391 +        VarFields[].
   1.392 +
   1.393 +         -UNIX Extra Field:
   1.394 +
   1.395 +          The following is the layout of the Unix "extra" block.
   1.396 +          Note: all fields are stored in Intel low-byte/high-byte 
   1.397 +          order.
   1.398 +
   1.399 +          Value       Size          Description
   1.400 +          -----       ----          -----------
   1.401 +  (UNIX)  0x000d      2 bytes       Tag for this "extra" block type
   1.402 +          TSize       2 bytes       Size for the following data block
   1.403 +          Atime       4 bytes       File last access time
   1.404 +          Mtime       4 bytes       File last modification time
   1.405 +          Uid         2 bytes       File user ID
   1.406 +          Gid         2 bytes       File group ID
   1.407 +          (var)       variable      Variable length data field
   1.408 +
   1.409 +          The variable length data field will contain file type 
   1.410 +          specific data.  Currently the only values allowed are
   1.411 +          the original "linked to" file names for hard or symbolic 
   1.412 +          links.
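A minimal Python sketch decoding the data of a 0x000d block, assuming it receives the bytes that follow Tag and TSize. The time values are returned raw because this note does not state their epoch or units; the function name and keys are illustrative.

```python
import struct


def parse_unix_extra(data):
    """Decode the data of a 0x000d block (the bytes after Tag and TSize)."""
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", data, 0)
    return {
        "atime": atime,                   # raw 4-byte value; epoch not stated here
        "mtime": mtime,
        "uid": uid,
        "gid": gid,
        "link_target": bytes(data[12:]),  # "linked to" name for hard/symbolic links
    }
```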
   1.413 +
   1.414 +         -VAX/VMS Extra Field:
   1.415 +
   1.416 +          The following is the layout of the VAX/VMS attributes 
   1.417 +          "extra" block.
   1.418 +
   1.419 +          Note: all fields are stored in Intel low-byte/high-byte order.
   1.420 +
   1.421 +          Value      Size       Description
   1.422 +          -----      ----       -----------
   1.423 +  (VMS)   0x000c     2 bytes    Tag for this "extra" block type
   1.424 +          TSize      2 bytes    Size of the total "extra" block
   1.425 +          CRC        4 bytes    32-bit CRC for remainder of the block
   1.426 +          Tag1       2 bytes    VMS attribute tag value #1
   1.427 +          Size1      2 bytes    Size of attribute #1, in bytes
   1.428 +          (var.)     Size1      Attribute #1 data
   1.429 +          .
   1.430 +          .
   1.431 +          .
   1.432 +          TagN       2 bytes    VMS attribute tag value #N
   1.433 +          SizeN      2 bytes    Size of attribute #N, in bytes
   1.434 +          (var.)     SizeN      Attribute #N data
   1.435 +
   1.436 +          Rules:
   1.437 +
   1.438 +          1. There will be one or more attributes present, which
   1.439 +             will each be preceded by the above TagX & SizeX values.  
   1.440 +             These values are identical to the ATR$C_XXXX and 
   1.441 +             ATR$S_XXXX constants which are defined in ATR.H under 
   1.442 +             VMS C.  Neither of these values will ever be zero.
   1.443 +
   1.444 +          2. No word alignment or padding is performed.
   1.445 +
   1.446 +          3. A well-behaved PKZIP/VMS program should never produce
   1.447 +             more than one sub-block with the same TagX value.  Also,
   1.448 +             there will never be more than one "extra" block of type
   1.449 +             0x000c in a particular directory record.
   1.450 +
   1.451 +         -NTFS Extra Field:
   1.452 +
   1.453 +          The following is the layout of the NTFS attributes 
   1.454 +          "extra" block.
   1.455 +
   1.456 +          Note: all fields are stored in Intel low-byte/high-byte order.
   1.457 +
   1.458 +          Value      Size       Description
   1.459 +          -----      ----       -----------
   1.460 +  (NTFS)  0x000a     2 bytes    Tag for this "extra" block type
   1.461 +          TSize      2 bytes    Size of the total "extra" block
   1.462 +          Reserved   4 bytes    Reserved for future use
   1.463 +          Tag1       2 bytes    NTFS attribute tag value #1
   1.464 +          Size1      2 bytes    Size of attribute #1, in bytes
   1.465 +          (var.)     Size1      Attribute #1 data
   1.466 +          .
   1.467 +          .
   1.468 +          .
   1.469 +          TagN       2 bytes    NTFS attribute tag value #N
   1.470 +          SizeN      2 bytes    Size of attribute #N, in bytes
   1.471 +          (var.)     SizeN      Attribute #N data
   1.472 +
   1.473 +          For NTFS, values for Tag1 through TagN are as follows:
   1.474 +          (currently only one set of attributes is defined for NTFS)
   1.475 +
   1.476 +          Tag        Size       Description
   1.477 +          -----      ----       -----------
   1.478 +          0x0001     2 bytes    Tag for attribute #1 
   1.479 +          Size1      2 bytes    Size of attribute #1, in bytes
   1.480 +          Mtime      8 bytes    File last modification time
   1.481 +          Atime      8 bytes    File last access time
   1.482 +          Ctime      8 bytes    File creation time
   1.483 +          
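The NTFS and VAX/VMS blocks share a shape: a 4-byte field (Reserved for NTFS, CRC for VMS) followed by Tag/Size/data triples. A minimal Python sketch, assuming the caller passes the bytes after Tag and TSize; function names are hypothetical. The three NTFS times are returned as raw 8-byte integers; treating them as Windows FILETIME values (100 ns intervals since 1601) is a Windows convention, not something this note states.

```python
import struct


def parse_tagged_attributes(data):
    """Split NTFS or VAX/VMS extra data into {tag: attribute bytes}."""
    attributes = {}
    pos = 4  # skip Reserved (NTFS) or CRC (VMS)
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        attributes[tag] = bytes(data[pos:pos + size])
        pos += size
    return attributes


def ntfs_times(data):
    """Return raw (mtime, atime, ctime) from NTFS attribute tag 0x0001."""
    return struct.unpack_from("<QQQ", parse_tagged_attributes(data)[0x0001], 0)
```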
   1.484 +         -PATCH Descriptor Extra Field:
   1.485 +
   1.486 +          The following is the layout of the Patch Descriptor "extra"
   1.487 +          block.
   1.488 +
   1.489 +          Note: all fields are stored in Intel low-byte/high-byte order.
   1.490 +
   1.491 +          Value     Size     Description
   1.492 +          -----     ----     -----------
   1.493 +  (Patch) 0x000f    2 bytes  Tag for this "extra" block type
   1.494 +          TSize     2 bytes  Size of the total "extra" block
   1.495 +          Version   2 bytes  Version of the descriptor
   1.496 +          Flags     4 bytes  Actions and reactions (see below) 
   1.497 +          OldSize   4 bytes  Size of the file about to be patched 
   1.498 +          OldCRC    4 bytes  32-bit CRC of the file to be patched 
   1.499 +          NewSize   4 bytes  Size of the resulting file 
   1.500 +          NewCRC    4 bytes  32-bit CRC of the resulting file 
   1.501 +
   1.502 +          Actions and reactions
   1.503 +
   1.504 +          Bits          Description
   1.505 +          ----          ----------------
   1.506 +          0             Use for autodetection
   1.507 +          1             Treat as selfpatch
   1.508 +          2-3           RESERVED
   1.509 +          4-5           Action (see below)
   1.510 +          6-7           RESERVED
   1.511 +          8-9           Reaction (see below) to absent file 
   1.512 +          10-11         Reaction (see below) to newer file
   1.513 +          12-13         Reaction (see below) to unknown file
   1.514 +          14-15         RESERVED
   1.515 +          16-31         RESERVED
   1.516 +
   1.517 +          Actions
   1.518 +
   1.519 +          Action       Value
   1.520 +          ------       ----- 
   1.521 +          none         0
   1.522 +          add          1
   1.523 +          delete       2
   1.524 +          patch        3
   1.525 +
   1.526 +          Reactions
   1.527 + 
   1.528 +          Reaction     Value
   1.529 +          --------     -----
   1.530 +          ask          0
   1.531 +          skip         1
   1.532 +          ignore       2
   1.533 +          fail         3
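A minimal Python sketch decoding the Patch Descriptor data (the bytes after Tag and TSize) and its Flags word, following the bit ranges and value tables above. The function name and dictionary keys are illustrative.

```python
import struct

ACTIONS = {0: "none", 1: "add", 2: "delete", 3: "patch"}
REACTIONS = {0: "ask", 1: "skip", 2: "ignore", 3: "fail"}
# Version, Flags, OldSize, OldCRC, NewSize, NewCRC: 22 bytes, little-endian.
PATCH_DESCRIPTOR = struct.Struct("<HIIIII")


def parse_patch_descriptor(data):
    """Decode a 0x000f block's data into its fields and flag meanings."""
    version, flags, old_size, old_crc, new_size, new_crc = \
        PATCH_DESCRIPTOR.unpack_from(data, 0)
    return {
        "version": version,
        "autodetect": bool(flags & 0x1),               # bit 0
        "selfpatch": bool(flags & 0x2),                # bit 1
        "action": ACTIONS[(flags >> 4) & 0x3],         # bits 4-5
        "if_absent": REACTIONS[(flags >> 8) & 0x3],    # bits 8-9
        "if_newer": REACTIONS[(flags >> 10) & 0x3],    # bits 10-11
        "if_unknown": REACTIONS[(flags >> 12) & 0x3],  # bits 12-13
        "old": (old_size, old_crc),
        "new": (new_size, new_crc),
    }
```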
   1.534 +
   1.535 +          - FWKCS MD5 Extra Field:
   1.536 +
   1.537 +          The FWKCS Contents_Signature System, used in
   1.538 +          automatically identifying files independent of filename,
   1.539 +          optionally adds and uses an extra field to support the
   1.540 +          rapid creation of an enhanced contents_signature:
   1.541 +
   1.542 +              Header ID = 0x4b46
   1.543 +              Data Size = 0x0013
   1.544 +              Preface   = 'M','D','5'
   1.545 +              followed by 16 bytes containing the uncompressed file's
   1.546 +              128-bit MD5 hash (1), low byte first.
   1.547 +
   1.548 +          When FWKCS revises a zipfile central directory to add
   1.549 +          this extra field for a file, it also replaces the
   1.550 +          central directory entry for that file's uncompressed
   1.551 +          file length with a measured value.
   1.552 +
   1.553 +          FWKCS provides an option to strip this extra field, if
   1.554 +          present, from a zipfile central directory. In adding
   1.555 +          this extra field, FWKCS preserves Zipfile Authenticity
   1.556 +          Verification; if stripping this extra field, FWKCS
   1.557 +          preserves all versions of AV through PKZIP version 2.04g.
   1.558 +
   1.559 +          FWKCS, and FWKCS Contents_Signature System, are
   1.560 +          trademarks of Frederick W. Kantor.
   1.561 +
   1.562 +          (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
   1.563 +              Science and RSA Data Security, Inc., April 1992.
   1.564 +              ll.76-77: "The MD5 algorithm is being placed in the
   1.565 +              public domain for review and possible adoption as a
   1.566 +              standard."
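A minimal Python sketch that builds the FWKCS 0x4b46 block for a file's uncompressed contents. It assumes "low byte first" refers to the conventional RFC 1321 digest byte order, which is what hashlib.md5().digest() returns; the function name is hypothetical.

```python
import hashlib
import struct


def fwkcs_md5_extra(uncompressed):
    """Build the 0x4b46 extra block: 'MD5' preface plus the 16-byte digest."""
    body = b"MD5" + hashlib.md5(uncompressed).digest()  # 3 + 16 = 19 = 0x0013 bytes
    return struct.pack("<HH", 0x4B46, len(body)) + body
```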
   1.567 +
   1.568 +      file comment: (Variable)
   1.569 +
   1.570 +          The comment for this file.
   1.571 +
   1.572 +      number of this disk: (2 bytes)
   1.573 +
   1.574 +          The number of this disk, which contains the end of
   1.575 +          central directory record.
   1.576 +
   1.577 +      number of the disk with the start of the central
   1.578 +      directory: (2 bytes)
   1.579 +
   1.580 +          The number of the disk on which the central
   1.581 +          directory starts.
   1.582 +
   1.583 +      total number of entries in the central dir on 
   1.584 +      this disk: (2 bytes)
   1.585 +
   1.586 +          The number of central directory entries on this disk.
   1.587 +
   1.588 +      total number of entries in the central dir: (2 bytes)
   1.589 +
   1.590 +          The total number of files in the zipfile.
   1.591 +
   1.592 +      size of the central directory: (4 bytes)
   1.593 +
   1.594 +          The size (in bytes) of the entire central directory.
   1.595 +
   1.596 +      offset of start of central directory with respect to
   1.597 +      the starting disk number:  (4 bytes)
   1.598 +
   1.599 +          Offset of the start of the central directory on the
   1.600 +          disk on which the central directory starts.
   1.601 +
   1.602 +      zipfile comment length: (2 bytes)
   1.603 +
   1.604 +          The length of the comment for this zipfile.
   1.605 +
   1.606 +      zipfile comment: (Variable)
   1.607 +
   1.608 +          The comment for this zipfile.
   1.609 +
   1.610 +  E.  General notes:
   1.611 +
   1.612 +      1)  All fields unless otherwise noted are unsigned and stored
   1.613 +          in Intel low-byte:high-byte, low-word:high-word order.
   1.614 +
   1.615 +      2)  String fields are not null terminated, since the
   1.616 +          length is given explicitly.
   1.617 +
   1.618 +      3)  Local headers should not span disk boundaries.  Also, even
   1.619 +          though the central directory can span disk boundaries, no
   1.620 +          single record in the central directory should be split
   1.621 +          across disks.
   1.622 +
   1.623 +      4)  The entries in the central directory may not necessarily
   1.624 +          be in the same order that files appear in the zipfile.
   1.625 +
   1.626 +UnShrinking - Method 1
   1.627 +----------------------
   1.628 +
   1.629 +Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
   1.630 +with partial clearing.  The initial code size is 9 bits, and
   1.631 +the maximum code size is 13 bits.  Shrinking differs from
   1.632 +conventional Dynamic Ziv-Lempel-Welch implementations in several
   1.633 +respects:
   1.634 +
   1.635 +1)  The code size is controlled by the compressor, and is not
   1.636 +    automatically increased when codes larger than the current
   1.637 +    code size are created (but not necessarily used).  When
   1.638 +    the decompressor encounters the code sequence 256
   1.639 +    (decimal) followed by 1, it should increase the code size
   1.640 +    read from the input stream to the next bit size.  No
   1.641 +    blocking of the codes is performed, so the next code at
   1.642 +    the increased size should be read from the input stream
   1.643 +    immediately after where the previous code at the smaller
   1.644 +    bit size was read.  Again, the decompressor should not
   1.645 +    increase the code size used until the sequence 256,1 is
   1.646 +    encountered.
   1.647 +
   1.648 +2)  When the table becomes full, total clearing is not
   1.649 +    performed.  Rather, when the compressor emits the code
   1.650 +    sequence 256,2 (decimal), the decompressor should clear
   1.651 +    all leaf nodes from the Ziv-Lempel tree, and continue to
   1.652 +    use the current code size.  The nodes that are cleared
   1.653 +    from the Ziv-Lempel tree are then re-used, with the lowest
   1.654 +    code value re-used first, and the highest code value
   1.655 +    re-used last.  The compressor can emit the sequence 256,2
   1.656 +    at any time.
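The two control sequences can be illustrated with a short C sketch. It assumes a hypothetical table in which parent[c] holds the prefix code of code c, or UNUSED when the slot is free. Codes 0-255 are the literal roots and 256 is the escape. partial_clear, next_free_code and handle_control are illustrative names, not PKZIP's. Leaving the code size at 13 bits when another 256,1 arrives is also an assumption.

```c
#include <stdbool.h>
#include <string.h>

#define FIRST_FREE 257   /* 0-255 are literals, 256 is the control escape */
#define MAX_CODES  8192  /* 2^13: the maximum code size is 13 bits */
#define UNUSED     (-1)  /* hypothetical marker for a free table slot */

/* Sequence 256,2: free every leaf node, i.e. every code >= 257 that is not
   the prefix of any other code.  Returns the number of codes freed. */
static int partial_clear(int parent[MAX_CODES])
{
    static bool has_child[MAX_CODES];  /* static keeps 8 KB off the stack */
    memset(has_child, 0, sizeof has_child);
    for (int c = FIRST_FREE; c < MAX_CODES; c++)
        if (parent[c] != UNUSED)
            has_child[parent[c]] = true;

    int freed = 0;
    for (int c = FIRST_FREE; c < MAX_CODES; c++) {
        if (parent[c] != UNUSED && !has_child[c]) {
            parent[c] = UNUSED;
            freed++;
        }
    }
    return freed;
}

/* Cleared codes are reused lowest first; returns -1 if the table is full. */
static int next_free_code(const int parent[MAX_CODES])
{
    for (int c = FIRST_FREE; c < MAX_CODES; c++)
        if (parent[c] == UNUSED)
            return c;
    return -1;
}

/* Handle the code read immediately after a 256 escape.
   Returns 0 on success, -1 for an unknown control code. */
static int handle_control(int ctrl, int *code_size, int parent[MAX_CODES])
{
    if (ctrl == 1) {
        if (*code_size < 13)      /* assumption: never grow past 13 bits */
            (*code_size)++;       /* following codes are one bit wider */
        return 0;
    }
    if (ctrl == 2) {
        partial_clear(parent);    /* the code size is left unchanged */
        return 0;
    }
    return -1;
}
```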
   1.657 +
   1.658 +Expanding - Methods 2-5
   1.659 +-----------------------
   1.660 +
   1.661 +The Reducing algorithm is actually a combination of two
   1.662 +distinct algorithms.  The first algorithm compresses repeated
   1.663 +byte sequences, and the second algorithm takes the compressed
   1.664 +stream from the first algorithm and applies a probabilistic
   1.665 +compression method.
   1.666 +
   1.667 +The probabilistic compression stores an array of 'follower
   1.668 +sets' S(j), for j=0 to 255, corresponding to each possible
   1.669 +ASCII character.  Each set contains between 0 and 32
   1.670 +characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
   1.671 +The sets are stored at the beginning of the data area for a
   1.672 +Reduced file, in reverse order, with S(255) first, and S(0)
   1.673 +last.
   1.674 +
   1.675 +The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
   1.676 +where N(j) is the size of set S(j).  N(j) can be 0, in which
   1.677 +case the follower set for S(j) is empty.  Each N(j) value is
   1.678 +encoded in 6 bits, followed by N(j) eight bit character values
   1.679 +corresponding to S(j)[0] to S(j)[N(j)-1] respectively.  If
   1.680 +N(j) is 0, then no values for S(j) are stored, and the value
   1.681 +for N(j-1) immediately follows.
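Reading the follower sets might look like the following C sketch. This section does not specify the bit order within each byte, so the hypothetical bitreader assumes least-significant-bit-first packing. The array layout chosen for N and S is also only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; assumes LSB-first bit packing. */
typedef struct {
    const uint8_t *p;
    size_t len;     /* bytes available */
    size_t bitpos;  /* index of the next bit to read */
} bitreader;

/* Read n bits (n <= 16), first bit read becomes bit 0 of the result.
   Bits past the end of the buffer read as 0. */
static unsigned read_bits(bitreader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++) {
        size_t byte = br->bitpos >> 3;
        unsigned bit = 0;
        if (byte < br->len)
            bit = (br->p[byte] >> (br->bitpos & 7)) & 1u;
        v |= bit << i;
        br->bitpos++;
    }
    return v;
}

/* Read S(255) first, down to S(0): a 6-bit N(j), then N(j) 8-bit values.
   Returns 0 on success, -1 if a set claims more than 32 members. */
static int read_follower_sets(bitreader *br, uint8_t N[256], uint8_t S[256][32])
{
    for (int j = 255; j >= 0; j--) {
        N[j] = (uint8_t)read_bits(br, 6);
        if (N[j] > 32)
            return -1;
        for (int k = 0; k < N[j]; k++)
            S[j][k] = (uint8_t)read_bits(br, 8);
    }
    return 0;
}
```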
   1.682 +
   1.683 +Immediately after the follower sets is the compressed data
   1.684 +stream.  The compressed data stream can be interpreted for the
   1.685 +probabilistic decompression as follows:
   1.686 +
   1.687 +let Last-Character <- 0.
   1.688 +loop until done
   1.689 +    if the follower set S(Last-Character) is empty then
   1.690 +        read 8 bits from the input stream, and copy this
   1.691 +        value to the output stream.
   1.692 +    otherwise if the follower set S(Last-Character) is non-empty then
   1.693 +        read 1 bit from the input stream.
   1.694 +        if this bit is not zero then
   1.695 +            read 8 bits from the input stream, and copy this
   1.696 +            value to the output stream.
   1.697 +        otherwise if this bit is zero then
   1.698 +            read B(N(Last-Character)) bits from the input
   1.699 +            stream, and assign this value to I.
   1.700 +            Copy the value of S(Last-Character)[I] to the
   1.701 +            output stream.
   1.702 +
   1.703 +    assign the last value placed on the output stream to
   1.704 +    Last-Character.
   1.705 +end loop
   1.706 +
   1.707 +B(N(j)) is defined as the minimal number of bits required to
   1.708 +encode the value N(j)-1.
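As a small illustration, B can be computed with a loop. The text does not single out a one-member set, where N(j)-1 is 0. This sketch assumes that 1 bit is still read in that case. reduce_B is a hypothetical name.

```c
/* Bits read for an index into a follower set of n members (1 <= n <= 32).
   Assumption: at least one bit is always read, so B(1) = 1. */
static unsigned reduce_B(unsigned n)
{
    unsigned bits = 1;
    while ((1u << bits) < n)   /* grow until 2^bits values cover 0..n-1 */
        bits++;
    return bits;
}
```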
   1.709 +
   1.710 +The decompressed stream from above can then be expanded to
   1.711 +re-create the original file as follows:
   1.712 +
   1.713 +let State <- 0.
   1.714 +
   1.715 +loop until done
   1.716 +    read 8 bits from the input stream into C.
   1.717 +    case State of
   1.718 +        0:  if C is not equal to DLE (144 decimal) then
   1.719 +                copy C to the output stream.
   1.720 +            otherwise if C is equal to DLE then
   1.721 +                let State <- 1.
   1.722 +
   1.723 +        1:  if C is non-zero then
   1.724 +                let V <- C.
   1.725 +                let Len <- L(V)
   1.726 +                let State <- F(Len).
   1.727 +            otherwise if C is zero then
   1.728 +                copy the value 144 (decimal) to the output stream.
   1.729 +                let State <- 0
   1.730 +
   1.731 +        2:  let Len <- Len + C
   1.732 +            let State <- 3.
   1.733 +
   1.734 +        3:  move backwards D(V,C) bytes in the output stream
   1.735 +            (if this position is before the start of the output
   1.736 +            stream, then assume that all the data before the
   1.737 +            start of the output stream is filled with zeros).
   1.738 +            copy Len+3 bytes from this position to the output stream.
   1.739 +            let State <- 0.
   1.740 +    end case
   1.741 +end loop
   1.742 +
   1.743 +The functions F, L, and D depend on the 'compression
   1.744 +factor', 1 through 4, and are defined as follows:
   1.745 +
   1.746 +For compression factor 1:
   1.747 +    L(X) equals the lower 7 bits of X.
   1.748 +    F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
   1.749 +    D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
   1.750 +For compression factor 2:
   1.751 +    L(X) equals the lower 6 bits of X.
   1.752 +    F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
   1.753 +    D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
   1.754 +For compression factor 3:
   1.755 +    L(X) equals the lower 5 bits of X.
   1.756 +    F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
   1.757 +    D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
   1.758 +For compression factor 4:
   1.759 +    L(X) equals the lower 4 bits of X.
   1.760 +    F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
   1.761 +    D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
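Combining the state machine above with these definitions gives the following C sketch of the expansion stage. It takes the compression factor (1-4) as a parameter and works on bytes that have already passed through the probabilistic stage. It stops writing when the output buffer is full. Those choices and the function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DLE 144

/* L, F and D for compression factor 1..4, as defined above. */
static unsigned reduce_L(unsigned x, int factor) { return x & (0x7Fu >> (factor - 1)); }
static int reduce_F(unsigned x, int factor) { return x == (0x7Fu >> (factor - 1)) ? 2 : 3; }
static size_t reduce_D(unsigned x, unsigned y, int factor)
{
    return (size_t)(x >> (8 - factor)) * 256 + y + 1;  /* upper bits of X */
}

/* Expand the output of the probabilistic stage.  Returns the number of
   bytes written; output beyond out_cap is dropped. */
static size_t reduce_expand(const uint8_t *in, size_t in_len, int factor,
                            uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    int state = 0;
    unsigned v = 0, len = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned c = in[i];
        switch (state) {
        case 0:
            if (c != DLE) {
                if (n < out_cap) out[n++] = (uint8_t)c;
            } else {
                state = 1;
            }
            break;
        case 1:
            if (c != 0) {
                v = c;
                len = reduce_L(v, factor);
                state = reduce_F(len, factor);
            } else {
                if (n < out_cap) out[n++] = DLE;  /* DLE,0 is a literal 144 */
                state = 0;
            }
            break;
        case 2:
            len += c;
            state = 3;
            break;
        case 3: {
            size_t dist = reduce_D(v, c, factor);
            /* Copy Len+3 bytes one at a time so overlapping copies work;
               positions before the start of the output read as zero. */
            for (unsigned k = 0; k < len + 3 && n < out_cap; k++, n++)
                out[n] = (n >= dist) ? out[n - dist] : 0;
            state = 0;
            break;
        }
        }
    }
    return n;
}
```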
   1.762 +
   1.763 +Imploding - Method 6
   1.764 +--------------------
   1.765 +
   1.766 +The Imploding algorithm is actually a combination of two distinct
   1.767 +algorithms.  The first algorithm compresses repeated byte
   1.768 +sequences using a sliding dictionary.  The second algorithm is
   1.769 +used to compress the encoding of the sliding dictionary output,
   1.770 +using multiple Shannon-Fano trees.
   1.771 +
   1.772 +The Imploding algorithm can use a 4K or 8K sliding dictionary
   1.773 +size. The dictionary size used can be determined by bit 1 in the
   1.774 +general purpose flag word; a 0 bit indicates a 4K dictionary
   1.775 +while a 1 bit indicates an 8K dictionary.
   1.776 +
   1.777 +The Shannon-Fano trees are stored at the start of the compressed
   1.778 +file. The number of trees stored is defined by bit 2 in the
   1.779 +general purpose flag word: a 0 bit indicates that two trees are
   1.780 +stored, and a 1 bit that three are stored.  If 3 trees are stored,
   1.781 +the first Shannon-Fano tree represents the encoding of the
   1.782 +Literal characters, the second tree represents the encoding of
   1.783 +the Length information, the third represents the encoding of the
   1.784 +Distance information.  When 2 Shannon-Fano trees are stored, the
   1.785 +Length tree is stored first, followed by the Distance tree.
   1.786 +
   1.787 +The Literal Shannon-Fano tree, if present, is used to represent
   1.788 +the entire ASCII character set, and contains 256 values.  This
   1.789 +tree is used to compress any data not compressed by the sliding
   1.790 +dictionary algorithm.  When this tree is present, the Minimum
   1.791 +Match Length for the sliding dictionary is 3.  If this tree is
   1.792 +not present, the Minimum Match Length is 2.
   1.793 +
   1.794 +The Length Shannon-Fano tree is used to compress the Length part
   1.795 +of the (length,distance) pairs from the sliding dictionary
   1.796 +output.  The Length tree contains 64 values, ranging from the
   1.797 +Minimum Match Length, to 63 plus the Minimum Match Length.
   1.798 +
   1.799 +The Distance Shannon-Fano tree is used to compress the Distance
   1.800 +part of the (length,distance) pairs from the sliding dictionary
   1.801 +output. The Distance tree contains 64 values, ranging from 0 to
   1.802 +63, representing the upper 6 bits of the distance value.  The
   1.803 +distance values themselves will be between 0 and the sliding
   1.804 +dictionary size, either 4K or 8K.
   1.805 +
   1.806 +The Shannon-Fano trees themselves are stored in a compressed
   1.807 +format. The first byte of the tree data represents the number of
   1.808 +bytes of data representing the (compressed) Shannon-Fano tree
   1.809 +minus 1.  The remaining bytes represent the Shannon-Fano tree
   1.810 +data encoded as:
   1.811 +
   1.812 +    High 4 bits: Number of values at this bit length + 1. (1 - 16)
   1.813 +    Low  4 bits: Bit Length needed to represent value + 1. (1 - 16)
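Unpacking this format into one bit length per value is mechanical. The C sketch below (the name implode_read_lengths is hypothetical) expands the count/length nibbles in file order. For the example later in this section (0x02, 0x42, 0x01, 0x13), it produces the lengths 3, 3, 3, 3, 3, 2, 4, 4.

```c
#include <stdint.h>

/* data[0] holds (number of following bytes) - 1.  Each following byte packs
   (count - 1) in its high nibble and (bit length - 1) in its low nibble.
   Returns the number of lengths written, or -1 if more than max. */
static int implode_read_lengths(const uint8_t *data, uint8_t *lengths, int max)
{
    int nbytes = data[0] + 1;
    int n = 0;
    for (int i = 1; i <= nbytes; i++) {
        int count = (data[i] >> 4) + 1;
        int bits = (data[i] & 0x0F) + 1;
        for (int k = 0; k < count; k++) {
            if (n >= max)
                return -1;
            lengths[n++] = (uint8_t)bits;
        }
    }
    return n;
}
```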
   1.814 +
   1.815 +The Shannon-Fano codes can be constructed from the bit lengths
   1.816 +using the following algorithm:
   1.817 +
   1.818 +1)  Sort the Bit Lengths in ascending order, while retaining the
   1.819 +    order of the original lengths stored in the file.
   1.820 +
   1.821 +2)  Generate the Shannon-Fano trees:
   1.822 +
   1.823 +    Code <- 0
   1.824 +    CodeIncrement <- 0
   1.825 +    LastBitLength <- 0
   1.826 +    i <- number of Shannon-Fano codes - 1   (either 255 or 63)
   1.827 +
   1.828 +    loop while i >= 0
   1.829 +        Code <- Code + CodeIncrement
   1.830 +        if BitLength(i) <> LastBitLength then
   1.831 +            LastBitLength <- BitLength(i)
   1.832 +            CodeIncrement <- 1 shifted left (16 - LastBitLength)
   1.833 +        ShannonCode(i) <- Code
   1.834 +        i <- i - 1
   1.835 +    end loop
   1.836 +
   1.837 +3)  Reverse the order of all the bits in the above ShannonCode()
   1.838 +    vector, so that the most significant bit becomes the least
   1.839 +    significant bit.  For example, the value 0x1234 (hex) would
   1.840 +    become 0x2C48 (hex).
   1.841 +
   1.842 +4)  Restore the order of Shannon-Fano codes as originally stored
   1.843 +    within the file.
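Steps 1 to 4 can be sketched in C as follows. A stable insertion sort over value indices performs step 1, so values with equal bit lengths keep their file order. Because each code is stored back through that index, steps 3 and 4 happen in the same pass. The names are hypothetical. Applied to the example below, it reproduces the Order Restored column; for instance, value 0 gets binary 101 and value 5 gets binary 11.

```c
#include <stdint.h>

/* Reverse all 16 bits (step 3): 0x1234 becomes 0x2C48. */
static uint16_t reverse16(uint16_t v)
{
    uint16_t r = 0;
    for (int b = 0; b < 16; b++) {
        r = (uint16_t)((r << 1) | (v & 1u));
        v >>= 1;
    }
    return r;
}

/* Build Shannon-Fano codes for n <= 256 values with bit lengths len[]. */
static void implode_build_codes(const uint8_t *len, int n, uint16_t *code)
{
    int order[256];

    /* Step 1: stable sort of value indices by ascending bit length. */
    for (int i = 0; i < n; i++) {
        int j = i;
        while (j > 0 && len[order[j - 1]] > len[i]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = i;
    }

    /* Step 2: walk the sorted list from the longest length down. */
    uint32_t c = 0, inc = 0;
    unsigned last = 0;
    for (int i = n - 1; i >= 0; i--) {
        int v = order[i];
        c += inc;
        if (len[v] != last) {
            last = len[v];
            inc = 1u << (16 - last);
        }
        /* Steps 3 and 4: reverse the bits, store at the original value. */
        code[v] = reverse16((uint16_t)c);
    }
}
```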
   1.844 +
   1.845 +Example:
   1.846 +
   1.847 +    This example will show the encoding of a Shannon-Fano tree
   1.848 +    of size 8.  Notice that the actual Shannon-Fano trees used
   1.849 +    for Imploding are either 64 or 256 entries in size.
   1.850 +
   1.851 +Example:   0x02, 0x42, 0x01, 0x13
   1.852 +
   1.853 +    The first byte indicates that 3 bytes of tree data follow.  Decoding the
   1.854 +    bytes:
   1.855 +            0x42 = 5 codes of 3 bits long
   1.856 +            0x01 = 1 code  of 2 bits long
   1.857 +            0x13 = 2 codes of 4 bits long
   1.858 +
   1.859 +    This would generate the original bit length array of:
   1.860 +    (3, 3, 3, 3, 3, 2, 4, 4)
   1.861 +
   1.862 +    There are 8 codes in this table for the values 0 thru 7.  Using 
   1.863 +    the algorithm to obtain the Shannon-Fano codes produces:
   1.864 +
   1.865 +                                  Reversed     Order     Original
   1.866 +Val  Sorted   Constructed Code      Value     Restored    Length
   1.867 +---  ------   -----------------   --------    --------    ------
   1.868 +0:     2      1100000000000000        11       101          3
   1.869 +1:     3      1010000000000000       101       001          3
   1.870 +2:     3      1000000000000000       001       110          3
   1.871 +3:     3      0110000000000000       110       010          3
   1.872 +4:     3      0100000000000000       010       100          3
   1.873 +5:     3      0010000000000000       100        11          2
   1.874 +6:     4      0001000000000000      1000      1000          4
   1.875 +7:     4      0000000000000000      0000      0000          4
   1.876 +
   1.877 +The values in the Val, Order Restored and Original Length columns
   1.878 +now represent the Shannon-Fano encoding tree that can be used for
   1.879 +decoding the Shannon-Fano encoded data.  How to parse the
   1.880 +variable length Shannon-Fano values from the data stream is beyond
   1.881 +the scope of this document.  (See the references listed at the end of
   1.882 +this document for more information.)  However, traditional decoding
   1.883 +schemes used for Huffman variable length decoding, such as the
   1.884 +Greenlaw algorithm, can be successfully applied.
   1.885 +
   1.886 +The compressed data stream begins immediately after the
   1.887 +compressed Shannon-Fano data.  The compressed data stream can be
   1.888 +interpreted as follows:
   1.889 +
   1.890 +loop until done
   1.891 +    read 1 bit from input stream.
   1.892 +
   1.893 +    if this bit is non-zero then       (encoded data is literal data)
   1.894 +        if Literal Shannon-Fano tree is present
   1.895 +            read and decode character using Literal Shannon-Fano tree.
   1.896 +        otherwise
   1.897 +            read 8 bits from input stream.
   1.898 +        copy character to the output stream.
   1.899 +    otherwise              (encoded data is sliding dictionary match)
   1.900 +        if 8K dictionary size
   1.901 +            read 7 bits for offset Distance (lower 7 bits of offset).
   1.902 +        otherwise
   1.903 +            read 6 bits for offset Distance (lower 6 bits of offset).
   1.904 +
   1.905 +        using the Distance Shannon-Fano tree, read and decode the
   1.906 +          upper 6 bits of the Distance value.
   1.907 +
   1.908 +        using the Length Shannon-Fano tree, read and decode
   1.909 +          the Length value.
   1.910 +
   1.911 +        Length <- Length + Minimum Match Length
   1.912 +
   1.913 +        if Length = 63 + Minimum Match Length
   1.914 +            read 8 bits from the input stream,
   1.915 +            add this value to Length.
   1.916 +
   1.917 +        move backwards Distance+1 bytes in the output stream, and
   1.918 +        copy Length characters from this position to the output
   1.919 +        stream.  (if this position is before the start of the output
   1.920 +        stream, then assume that all the data before the start of
   1.921 +        the output stream is filled with zeros).
   1.922 +end loop
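Two details of this loop lend themselves to a short C sketch. The first is joining the raw low bits with the six upper bits decoded from the Distance tree. The second is the copy that starts Distance+1 bytes back, with zeros assumed before the start of the output. The Shannon-Fano decoding itself is omitted, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The Distance tree yields the upper 6 bits; `low` holds the 7 (8K
   dictionary) or 6 (4K dictionary) bits read directly from the stream. */
static unsigned implode_distance(unsigned upper6, unsigned low, int dict_8k)
{
    return (upper6 << (dict_8k ? 7 : 6)) | low;
}

/* Copy `length` bytes starting Distance+1 bytes back, one byte at a time so
   overlapping copies repeat data.  Positions before the start of the output
   read as zero.  Returns the new output length; bytes beyond `cap` are
   dropped. */
static size_t implode_copy(uint8_t *out, size_t n, size_t cap,
                           unsigned distance, unsigned length)
{
    size_t back = (size_t)distance + 1;
    for (unsigned k = 0; k < length && n < cap; k++, n++)
        out[n] = (n >= back) ? out[n - back] : 0;
    return n;
}
```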
   1.923 +
   1.924 +Tokenizing - Method 7
   1.925 +---------------------
   1.926 +
   1.927 +This method is not used by PKZIP.
   1.928 +
   1.929 +Deflating - Method 8
   1.930 +--------------------
   1.931 +
   1.932 +The Deflate algorithm is similar to the Implode algorithm using
   1.933 +a sliding dictionary of up to 32K with secondary compression
   1.934 +from Huffman/Shannon-Fano codes.
   1.935 +
   1.936 +The compressed data is stored in blocks with a header describing
   1.937 +the block and the Huffman codes used in the data block.  The header
   1.938 +format is as follows:
   1.939 +
   1.940 +   Bit 0: Last Block bit     This bit is set to 1 if this is the last
   1.941 +                             compressed block in the data.
   1.942 +   Bits 1-2: Block type
   1.943 +      00 (0) - Block is stored - All stored data is byte aligned.
   1.944 +               Skip bits until the next byte boundary; the next word
   1.945 +               is the block length, followed by the one's complement
   1.946 +               of the block length word.  The remaining data in the
   1.947 +               block is the stored data.
   1.948 +
   1.949 +      01 (1) - Use fixed Huffman codes for literal and distance codes.
   1.950 +               Lit Code    Bits             Dist Code   Bits
   1.951 +               ---------   ----             ---------   ----
   1.952 +                 0 - 143    8                 0 - 31      5
   1.953 +               144 - 255    9
   1.954 +               256 - 279    7
   1.955 +               280 - 287    8
   1.956 +
   1.957 +               Literal codes 286-287 and distance codes 30-31 are
   1.958 +               never used but participate in the Huffman construction.
   1.959 +
   1.960 +      10 (2) - Dynamic Huffman codes.  (See expanding Huffman codes)
   1.961 +
   1.962 +      11 (3) - Reserved - Flag an "Error in compressed data" if seen.
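A C sketch of these header fields follows. It assumes the three header bits have already been extracted with the first-read bit in bit 0, since deflate packs data elements starting at the least significant bit (RFC 1951). It also assumes both stored-block length words are little-endian, and it encodes the fixed literal/length bit lengths from the table above. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool last;       /* bit 0: last block */
    unsigned type;   /* bits 1-2: 0 stored, 1 fixed, 2 dynamic, 3 reserved */
} block_header;

/* `bits` holds the next three header bits, first-read bit in bit 0. */
static block_header parse_block_header(unsigned bits)
{
    block_header h = { (bits & 1u) != 0, (bits >> 1) & 3u };
    return h;
}

/* Stored block: p points at the byte-aligned length word.  Returns false
   if the second word is not the one's complement of the first. */
static bool stored_block_length(const uint8_t *p, uint16_t *len)
{
    uint16_t l  = (uint16_t)(p[0] | (p[1] << 8));
    uint16_t nl = (uint16_t)(p[2] | (p[3] << 8));
    if (l != (uint16_t)~nl)
        return false;
    *len = l;
    return true;
}

/* Bit length of literal/length code `sym` (0-287) in a fixed-code block. */
static unsigned fixed_literal_bits(unsigned sym)
{
    if (sym <= 143) return 8;
    if (sym <= 255) return 9;
    if (sym <= 279) return 7;
    return 8;  /* 280-287 */
}
```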
   1.963 +
   1.964 +Expanding Huffman Codes
   1.965 +-----------------------
   1.966 +If the data block is stored with dynamic Huffman codes, the Huffman
   1.967 +codes are sent in the following compressed format:
   1.968 +
   1.969 +   5 Bits: # of Literal codes sent - 257 (257 - 286)
   1.970 +           All other codes are never sent.
   1.971 +   5 Bits: # of Dist codes - 1           (1 - 32)
   1.972 +   4 Bits: # of Bit Length codes - 4     (4 - 19)
   1.973 +
   1.974 +The Huffman codes are sent as bit lengths and the codes are built as
   1.975 +described in the implode algorithm.  The bit lengths themselves are
   1.976 +compressed with Huffman codes.  There are 19 bit length codes:
   1.977 +
   1.978 +   0 - 15: Represent bit lengths of 0 - 15
   1.979 +       16: Copy the previous bit length 3 - 6 times.
   1.980 +           The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
   1.981 +              Example:  Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
   1.982 +                        expand to 12 bit lengths of 8 (1 + 6 + 5)
   1.983 +       17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
   1.984 +       18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
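The run-length symbols can be expanded as in this C sketch. It assumes the symbols and their extra-bit values have already been Huffman-decoded into two parallel arrays, which is a hypothetical interface. A code 16 with no previous length is treated as an error. The example above, 8, 16 (+2 bits 11), 16 (+2 bits 10), expands to twelve 8s.

```c
#include <stdint.h>

/* syms[i] is a code-length symbol (0-18); extra[i] is the value of its extra
   bits (ignored for 0-15).  Returns the number of lengths written to out,
   or -1 on a malformed sequence or overflow of max. */
static int expand_code_lengths(const int *syms, const int *extra, int nsyms,
                               uint8_t *out, int max)
{
    int n = 0;
    for (int i = 0; i < nsyms; i++) {
        int s = syms[i], value, repeat;
        if (s >= 0 && s <= 15) {          /* literal bit length */
            value = s;
            repeat = 1;
        } else if (s == 16) {             /* previous length, 3-6 times */
            if (n == 0)
                return -1;
            value = out[n - 1];
            repeat = 3 + extra[i];
        } else if (s == 17) {             /* zeros, 3-10 times */
            value = 0;
            repeat = 3 + extra[i];
        } else if (s == 18) {             /* zeros, 11-138 times */
            value = 0;
            repeat = 11 + extra[i];
        } else {
            return -1;
        }
        if (n + repeat > max)
            return -1;
        while (repeat--)
            out[n++] = (uint8_t)value;
    }
    return n;
}
```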
   1.985 +
   1.986 +The lengths of the bit length codes are sent packed 3 bits per value
   1.987 +(0 - 7) in the following order:
   1.988 +
   1.989 +   16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
   1.990 +
   1.991 +The Huffman codes should be built as described in the Implode algorithm
   1.992 +except codes are assigned starting at the shortest bit length, i.e. the
   1.993 +shortest code should be all 0's rather than all 1's.  Also, codes with
   1.994 +a bit length of zero do not participate in the tree construction.  The
   1.995 +codes are then used to decode the bit lengths for the literal and 
   1.996 +distance tables.
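The assignment rule can be sketched with the bl_count/next_code procedure from RFC 1951 (section 3.2.2), shown here in C with hypothetical names. For the bit lengths (3, 3, 3, 3, 3, 2, 4, 4) used in the Implode example, it assigns 010, 011, 100, 101, 110, 00, 1110 and 1111, so the shortest code is all zeros. Each returned integer holds its code with the first-sent bit as the most significant bit.

```c
#include <stdint.h>

#define MAX_BITS 15

/* Assign canonical codes to n symbols with bit lengths len[i] (0-15).
   Symbols with length 0 get no code (0 is stored as a placeholder). */
static void deflate_build_codes(const uint8_t *len, int n, uint16_t *code)
{
    unsigned bl_count[MAX_BITS + 1] = {0};
    unsigned next_code[MAX_BITS + 1] = {0};

    for (int i = 0; i < n; i++)
        bl_count[len[i]]++;
    bl_count[0] = 0;  /* zero-length codes do not take part */

    /* Smallest code value for each bit length; shorter codes come first. */
    unsigned c = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        c = (c + bl_count[bits - 1]) << 1;
        next_code[bits] = c;
    }

    /* Consecutive values within one length, in symbol order. */
    for (int i = 0; i < n; i++) {
        if (len[i] != 0)
            code[i] = (uint16_t)next_code[len[i]]++;
        else
            code[i] = 0;
    }
}
```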
   1.997 +
   1.998 +The bit lengths for the literal tables are sent first with the number
   1.999 +of entries sent described by the 5 bits sent earlier.  There are up
  1.1000 +to 286 literal characters; the first 256 represent the respective 8
  1.1001 +bit character, code 256 represents the End-Of-Block code, the remaining
  1.1002 +29 codes represent copy lengths of 3 thru 258.  There are up to 30
  1.1003 +distance codes representing distances from 1 thru 32k as described
  1.1004 +below.
  1.1005 +
  1.1006 +                             Length Codes
  1.1007 +                             ------------
  1.1008 +      Extra             Extra              Extra              Extra
  1.1009 + Code Bits Length  Code Bits Lengths  Code Bits Lengths  Code Bits Length(s)
  1.1010 + ---- ---- ------  ---- ---- -------  ---- ---- -------  ---- ---- ---------
  1.1011 +  257   0     3     265   1   11,12    273   3   35-42    281   5  131-162
  1.1012 +  258   0     4     266   1   13,14    274   3   43-50    282   5  163-194
  1.1013 +  259   0     5     267   1   15,16    275   3   51-58    283   5  195-226
  1.1014 +  260   0     6     268   1   17,18    276   3   59-66    284   5  227-257
  1.1015 +  261   0     7     269   2   19-22    277   4   67-82    285   0    258
  1.1016 +  262   0     8     270   2   23-26    278   4   83-98
  1.1017 +  263   0     9     271   2   27-30    279   4   99-114
  1.1018 +  264   0    10     272   2   31-34    280   4  115-130
  1.1019 +
  1.1020 +                            Distance Codes
  1.1021 +                            --------------
  1.1022 +      Extra           Extra             Extra               Extra
  1.1023 + Code Bits Dist  Code Bits  Dist   Code Bits Distance  Code Bits Distance
  1.1024 + ---- ---- ----  ---- ---- ------  ---- ---- --------  ---- ---- --------
  1.1025 +   0   0    1      8   3   17-24    16    7  257-384    24   11  4097-6144
  1.1026 +   1   0    2      9   3   25-32    17    7  385-512    25   11  6145-8192
  1.1027 +   2   0    3     10   4   33-48    18    8  513-768    26   12  8193-12288
  1.1028 +   3   0    4     11   4   49-64    19    8  769-1024   27   12 12289-16384
  1.1029 +   4   1   5,6    12   5   65-96    20    9 1025-1536   28   13 16385-24576
  1.1030 +   5   1   7,8    13   5   97-128   21    9 1537-2048   29   13 24577-32768
  1.1031 +   6   2   9-12   14   6  129-192   22   10 2049-3072
  1.1032 +   7   2  13-16   15   6  193-256   23   10 3073-4096
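These tables are commonly stored as base values plus extra-bit counts. The C arrays below are transcribed from the two tables above, and the helpers add the value of the extra bits to the base. Array and function names are hypothetical.

```c
/* Literal/length codes 257-285: base length and number of extra bits. */
static const unsigned short len_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258 };
static const unsigned char len_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };

/* Distance codes 0-29: base distance and number of extra bits. */
static const unsigned short dist_base[30] = {
    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
    257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
    8193, 12289, 16385, 24577 };
static const unsigned char dist_extra[30] = {
    0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
    7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13 };

/* Match length for code 257-285 plus the value of its extra bits. */
static unsigned deflate_length(unsigned code, unsigned extra)
{
    if (code < 257 || code > 285)
        return 0;                       /* not a length code */
    return len_base[code - 257] + extra;
}

/* Distance for code 0-29 plus the value of its extra bits. */
static unsigned deflate_distance(unsigned code, unsigned extra)
{
    if (code > 29)
        return 0;                       /* codes 30-31 never occur */
    return dist_base[code] + extra;
}
```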
  1.1033 +
  1.1034 +The compressed data stream begins immediately after the
  1.1035 +compressed header data.  The compressed data stream can be
  1.1036 +interpreted as follows:
  1.1037 +
  1.1038 +do
  1.1039 +   read header from input stream.
  1.1040 +
  1.1041 +   if stored block
  1.1042 +      skip bits until byte aligned
  1.1043 +      read count and 1's complement of count
  1.1044 +      copy count bytes of data to the output stream
  1.1045 +   otherwise
  1.1046 +      loop until end of block code sent
  1.1047 +         decode literal/length code from input stream
  1.1048 +         if literal < 256
  1.1049 +            copy character to the output stream
  1.1050 +         otherwise
  1.1051 +            if literal = end of block
  1.1052 +               break from loop
  1.1053 +            otherwise
  1.1054 +               decode length from the code and its extra bits
  1.1055 +               decode distance code and its extra bits from input stream
  1.1056 +               move backwards distance bytes in the output stream, and
  1.1057 +               copy length characters from this position to the output
  1.1058 +               stream.
  1.1059 +      end loop
  1.1060 +while not last block
  1.1061 +
  1.1062 +if data descriptor exists
  1.1063 +   skip bits until byte aligned
  1.1064 +   read crc and sizes
  1.1065 +endif
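The Implode loop copies from Distance+1 bytes back and assumes zeros before the start of the output. The copy in this loop instead starts exactly distance bytes back. The C sketch below treats a distance that reaches before the start of the output as an error; that is an assumption, since the text above does not say what to do in that case. lz77_copy is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Append `length` bytes copied from `distance` bytes back in out[0..n).
   Returns the new length, or (size_t)-1 if the distance reaches before the
   start of the output or the copy would overflow `cap`. */
static size_t lz77_copy(uint8_t *out, size_t n, size_t cap,
                        size_t distance, size_t length)
{
    if (distance == 0 || distance > n || n > cap || length > cap - n)
        return (size_t)-1;
    for (size_t k = 0; k < length; k++, n++)
        out[n] = out[n - distance];  /* byte by byte: overlap repeats data */
    return n;
}
```

With a distance of 1, the copy repeats the last byte, which is how deflate encodes runs.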
  1.1066 +
  1.1067 +Decryption
  1.1068 +----------
  1.1069 +
  1.1070 +The encryption used in PKZIP was generously supplied by Roger
  1.1071 +Schlafly.  PKWARE is grateful to Mr. Schlafly for his expert
  1.1072 +help and advice in the field of data encryption.
  1.1073 +
  1.1074 +PKZIP encrypts the compressed data stream.  Encrypted files must
  1.1075 +be decrypted before they can be extracted.
  1.1076 +
  1.1077 +Each encrypted file has an extra 12 bytes stored at the start of
  1.1078 +the data area defining the encryption header for that file.  The
  1.1079 +encryption header is originally set to random values, and then
  1.1080 +itself encrypted using three 32-bit keys.  The key values are
  1.1081 +initialized using the supplied encryption password.  After each byte
  1.1082 +is encrypted, the keys are then updated using pseudo-random number
  1.1083 +generation techniques in combination with the same CRC-32 algorithm
  1.1084 +used in PKZIP and described elsewhere in this document.
  1.1085 +
  1.1086 +The following are the basic steps required to decrypt a file:
  1.1087 +
  1.1088 +1) Initialize the three 32-bit keys with the password.
  1.1089 +2) Read and decrypt the 12-byte encryption header, further
  1.1090 +   initializing the encryption keys.
  1.1091 +3) Read and decrypt the compressed data stream using the
  1.1092 +   encryption keys.
  1.1093 +
  1.1094 +Step 1 - Initializing the encryption keys
  1.1095 +-----------------------------------------
  1.1096 +
  1.1097 +Key(0) <- 305419896
  1.1098 +Key(1) <- 591751049
  1.1099 +Key(2) <- 878082192
  1.1100 +
  1.1101 +loop for i <- 0 to length(password)-1
  1.1102 +    update_keys(password(i))
  1.1103 +end loop
  1.1104 +
  1.1105 +Where update_keys() is defined as:
  1.1106 +
  1.1107 +update_keys(char):
  1.1108 +  Key(0) <- crc32(Key(0),char)
  1.1109 +  Key(1) <- Key(1) + (Key(0) & 000000ffH)
  1.1110 +  Key(1) <- Key(1) * 134775813 + 1
  1.1111 +  Key(2) <- crc32(Key(2),Key(1) >> 24)
  1.1112 +end update_keys
  1.1113 +
  1.1114 +Where crc32(old_crc,char) is a routine that given a CRC value and a
  1.1115 +character, returns an updated CRC value after applying the CRC-32
  1.1116 +algorithm described elsewhere in this document.
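Step 1 might be written in C as below. The crc32 step assumes the standard reflected CRC-32 used by ZIP (polynomial 0xEDB88320), updated one byte at a time without the initial and final inversion. zip_keys, crc32_byte and init_keys are hypothetical names.

```c
#include <stdint.h>

typedef struct { uint32_t k0, k1, k2; } zip_keys;  /* hypothetical name */

/* One byte of reflected CRC-32 (polynomial 0xEDB88320) with no pre- or
   post-inversion: the crc32(old_crc,char) step used by update_keys. */
static uint32_t crc32_byte(uint32_t crc, uint8_t c)
{
    crc ^= c;
    for (int b = 0; b < 8; b++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

static void update_keys(zip_keys *k, uint8_t c)
{
    k->k0 = crc32_byte(k->k0, c);
    k->k1 = (k->k1 + (k->k0 & 0xFFu)) * 134775813u + 1u;  /* wraps mod 2^32 */
    k->k2 = crc32_byte(k->k2, (uint8_t)(k->k1 >> 24));
}

static void init_keys(zip_keys *k, const char *password)
{
    k->k0 = 305419896u;  /* 0x12345678 */
    k->k1 = 591751049u;  /* 0x23456789 */
    k->k2 = 878082192u;  /* 0x34567890 */
    for (const char *p = password; *p != '\0'; p++)
        update_keys(k, (uint8_t)*p);
}
```

Wrapping crc32_byte with an initial value of 0xFFFFFFFF and a final inversion gives the usual CRC-32; for "123456789" that is 0xCBF43926.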
  1.1117 +
  1.1118 +Step 2 - Decrypting the encryption header
  1.1119 +-----------------------------------------
  1.1120 +
  1.1121 +The purpose of this step is to further initialize the encryption
  1.1122 +keys, based on random data, to render a plaintext attack on the
  1.1123 +data ineffective.
  1.1124 +
  1.1125 +Read the 12-byte encryption header into Buffer, in locations
  1.1126 +Buffer(0) thru Buffer(11).
  1.1127 +
  1.1128 +loop for i <- 0 to 11
  1.1129 +    C <- Buffer(i) ^ decrypt_byte()
  1.1130 +    update_keys(C)
  1.1131 +    Buffer(i) <- C
  1.1132 +end loop
  1.1133 +
  1.1134 +Where decrypt_byte() is defined as:
  1.1135 +
  1.1136 +unsigned char decrypt_byte()
  1.1137 +    local unsigned short temp
  1.1138 +    temp <- Key(2) | 2
  1.1139 +    decrypt_byte <- (temp * (temp ^ 1)) >> 8
  1.1140 +end decrypt_byte
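In C, decrypt_byte needs one precaution. If temp is declared unsigned short as in the pseudocode, temp * (temp ^ 1) is computed in int and can overflow for large values. The sketch below widens temp to 32 bits and takes Key(2) as a parameter, which is an illustrative interface.

```c
#include <stdint.h>

/* decrypt_byte() computed from Key(2).  Only the low 16 bits of Key(2) are
   used; 32-bit unsigned arithmetic keeps the product from overflowing. */
static uint8_t decrypt_byte(uint32_t key2)
{
    uint32_t temp = (key2 & 0xFFFFu) | 2u;
    return (uint8_t)((temp * (temp ^ 1u)) >> 8);
}
```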
  1.1141 +
  1.1142 +After the header is decrypted, the last 1 or 2 bytes in Buffer
  1.1143 +should be the high-order word/byte of the CRC for the file being
  1.1144 +decrypted, stored in Intel low-byte/high-byte order.  Versions of
  1.1145 +PKZIP prior to 2.0 used a 2-byte CRC check; a 1-byte CRC check is
  1.1146 +used on versions after 2.0.  This can be used to test whether the
  1.1147 +supplied password is correct.
  1.1148 +
  1.1149 +Step 3 - Decrypting the compressed data stream
  1.1150 +----------------------------------------------
  1.1151 +
  1.1152 +The compressed data stream can be decrypted as follows:
  1.1153 +
  1.1154 +loop until done
  1.1155 +    read a character into C
  1.1156 +    Temp <- C ^ decrypt_byte()
  1.1157 +    update_keys(Temp)
  1.1158 +    output Temp
  1.1159 +end loop
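Putting the three steps together, a C sketch of decryption with the 1-byte password check could look like this. The check compares the last decrypted header byte with the high byte of the file's CRC, as described above. zip_encrypt, the inverse operation, is included only so that a round trip can be tested. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t k0, k1, k2; } zip_keys;  /* hypothetical */

static uint32_t crc32_byte(uint32_t crc, uint8_t c)
{
    crc ^= c;
    for (int b = 0; b < 8; b++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

static void update_keys(zip_keys *k, uint8_t c)
{
    k->k0 = crc32_byte(k->k0, c);
    k->k1 = (k->k1 + (k->k0 & 0xFFu)) * 134775813u + 1u;
    k->k2 = crc32_byte(k->k2, (uint8_t)(k->k1 >> 24));
}

/* Step 1. */
static void init_keys(zip_keys *k, const char *password)
{
    k->k0 = 305419896u; k->k1 = 591751049u; k->k2 = 878082192u;
    for (const char *p = password; *p != '\0'; p++)
        update_keys(k, (uint8_t)*p);
}

static uint8_t decrypt_byte(const zip_keys *k)
{
    uint32_t temp = (k->k2 & 0xFFFFu) | 2u;
    return (uint8_t)((temp * (temp ^ 1u)) >> 8);
}

/* Step 3 (and the loop of step 2): decrypt in place, feeding each
   plaintext byte back into the keys. */
static void zip_decrypt(zip_keys *k, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t c = (uint8_t)(buf[i] ^ decrypt_byte(k));
        update_keys(k, c);
        buf[i] = c;
    }
}

/* Step 2 with the 1-byte check: the last header byte should equal the
   high byte of the file's CRC. */
static bool zip_decrypt_header(zip_keys *k, uint8_t hdr[12], uint32_t crc)
{
    zip_decrypt(k, hdr, 12);
    return hdr[11] == (uint8_t)(crc >> 24);
}

/* Inverse of zip_decrypt, included only to make a round trip testable. */
static void zip_encrypt(zip_keys *k, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t p = buf[i];
        buf[i] = (uint8_t)(p ^ decrypt_byte(k));
        update_keys(k, p);
    }
}
```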
  1.1160 +
  1.1161 +In addition to the above-mentioned contributors to PKZIP and PKUNZIP,
  1.1162 +I would like to extend special thanks to Robert Mahoney for suggesting
  1.1163 +the extension .ZIP for this software.
  1.1164 +
  1.1165 +References:
  1.1166 +
  1.1167 +    Fiala, Edward R., and Greene, Daniel H., "Data compression with
  1.1168 +       finite windows",  Communications of the ACM, Volume 32, Number 4,
  1.1169 +       April 1989, pages 490-505.
  1.1170 +
  1.1171 +    Held, Gilbert, "Data Compression, Techniques and Applications,
  1.1172 +       Hardware and Software Considerations", John Wiley & Sons, 1987.
  1.1173 +
  1.1174 +    Huffman, D.A., "A method for the construction of minimum-redundancy
  1.1175 +       codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
  1.1176 +       pages 1098-1101.
  1.1177 +
  1.1178 +    Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
  1.1179 +       Number 10, October 1989, pages 29-37.
  1.1180 +
  1.1181 +    Nelson, Mark, "The Data Compression Book",  M&T Books, 1991.
  1.1182 +
  1.1183 +    Storer, James A., "Data Compression, Methods and Theory",
  1.1184 +       Computer Science Press, 1988.
  1.1185 +
  1.1186 +    Welch, Terry, "A Technique for High-Performance Data Compression",
  1.1187 +       IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.
  1.1188 +
  1.1189 +    Ziv, J. and Lempel, A., "A universal algorithm for sequential data
  1.1190 +       compression", IEEE Transactions on Information Theory,
  1.1191 +       Volume 23, Number 3, May 1977, pages 337-343.
  1.1192 +
  1.1193 +    Ziv, J. and Lempel, A., "Compression of individual sequences via
  1.1194 +       variable-rate coding", IEEE Transactions on Information Theory,
  1.1195 +       Volume 24, Number 5, September 1978, pages 530-536.