michael@0: Revised: 03/01/1999
michael@0: 
michael@0: Disclaimer
michael@0: ----------
michael@0: 
michael@0: Although PKWARE will attempt to supply current and accurate
michael@0: information relating to its file formats, algorithms, and the
michael@0: subject programs, the possibility of error can not be eliminated.
michael@0: PKWARE therefore expressly disclaims any warranty that the
michael@0: information contained in the associated materials relating to the
michael@0: subject programs and/or the format of the files created or
michael@0: accessed by the subject programs and/or the algorithms used by
michael@0: the subject programs, or any other matter, is current, correct or
michael@0: accurate as delivered.  Any risk of damage due to any possible
michael@0: inaccurate information is assumed by the user of the information.
michael@0: Furthermore, the information relating to the subject programs
michael@0: and/or the file formats created or accessed by the subject
michael@0: programs and/or the algorithms used by the subject programs is
michael@0: subject to change without notice.
michael@0: 
michael@0: General Format of a ZIP file
michael@0: ----------------------------
michael@0: 
michael@0:   Files stored in arbitrary order.  Large zipfiles can span multiple
michael@0:   diskette media.
michael@0: 
michael@0:   Overall zipfile format:
michael@0: 
michael@0:     [local file header + file data + data_descriptor] . . .
michael@0:     [central directory] end of central directory record
michael@0: 
michael@0: 
michael@0:   A.  Local file header:
michael@0: 
michael@0:         local file header signature     4 bytes  (0x04034b50)
michael@0:         version needed to extract       2 bytes
michael@0:         general purpose bit flag        2 bytes
michael@0:         compression method              2 bytes
michael@0:         last mod file time              2 bytes
michael@0:         last mod file date              2 bytes
michael@0:         crc-32                          4 bytes
michael@0:         compressed size                 4 bytes
michael@0:         uncompressed size               4 bytes
michael@0:         filename length                 2 bytes
michael@0:         extra field length              2 bytes
michael@0: 
michael@0:         filename (variable size)
michael@0:         extra field (variable size)
michael@0: 
michael@0:   B.  Data descriptor:
michael@0: 
michael@0:         crc-32                          4 bytes
michael@0:         compressed size                 4 bytes
michael@0:         uncompressed size               4 bytes
michael@0: 
michael@0:       This descriptor exists only if bit 3 of the general
michael@0:       purpose bit flag is set (see below).  It is byte aligned
michael@0:       and immediately follows the last byte of compressed data.
michael@0:       This descriptor is used only when it was not possible to
michael@0:       seek in the output zip file, e.g., when the output zip file
michael@0:       was standard output or a non seekable device.
michael@0: 
michael@0:   C.  Central directory structure:
michael@0: 
michael@0:       [file header] . . .  end of central dir record
michael@0: 
michael@0:       File header:
michael@0: 
michael@0:         central file header signature   4 bytes  (0x02014b50)
michael@0:         version made by                 2 bytes
michael@0:         version needed to extract       2 bytes
michael@0:         general purpose bit flag        2 bytes
michael@0:         compression method              2 bytes
michael@0:         last mod file time              2 bytes
michael@0:         last mod file date              2 bytes
michael@0:         crc-32                          4 bytes
michael@0:         compressed size                 4 bytes
michael@0:         uncompressed size               4 bytes
michael@0:         filename length                 2 bytes
michael@0:         extra field length              2 bytes
michael@0:         file comment length             2 bytes
michael@0:         disk number start               2 bytes
michael@0:         internal file attributes        2 bytes
michael@0:         external file attributes        4 bytes
michael@0:         relative offset of local header 4 bytes
michael@0: 
michael@0:         filename (variable size)
michael@0:         extra field (variable size)
michael@0:         file comment (variable size)
michael@0: 
michael@0:       End of central dir record:
michael@0: 
michael@0:         end of central dir signature    4 bytes  (0x06054b50)
michael@0:         number of this disk             2 bytes
michael@0:         number of the disk with the
michael@0:         start of the central directory  2 bytes
michael@0:         total number of entries in
michael@0:         the central dir on this disk    2 bytes
michael@0:         total number of entries in
michael@0:         the central dir                 2 bytes
michael@0:         size of the central directory   4 bytes
michael@0:         offset of start of central
michael@0:         directory with respect to
michael@0:         the starting disk number        4 bytes
michael@0:         zipfile comment length          2 bytes
michael@0:         zipfile comment (variable size)
michael@0: 
michael@0:   D.  Explanation of fields:
michael@0: 
michael@0:       version made by (2 bytes)
michael@0: 
michael@0:           The upper byte indicates the compatibility of the file
michael@0:           attribute information.  If the external file attributes 
michael@0:           are compatible with MS-DOS and can be read by PKZIP for 
michael@0:           DOS version 2.04g then this value will be zero.  If these 
michael@0:           attributes are not compatible, then this value will 
michael@0:           identify the host system on which the attributes are 
michael@0:           compatible.  Software can use this information to determine
michael@0:           the line record format for text files etc.  The current
michael@0:           mappings are:
michael@0: 
michael@0:           0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
michael@0:           1 - Amiga                     2 - VAX/VMS
michael@0:           3 - Unix                      4 - VM/CMS
michael@0:           5 - Atari ST                  6 - OS/2 H.P.F.S.
michael@0:           7 - Macintosh                 8 - Z-System
michael@0:           9 - CP/M                     10 - Windows NTFS
michael@0:          11 thru 255 - unused
michael@0: 
michael@0:           The lower byte indicates the version number of the
michael@0:           software used to encode the file.  The value/10
michael@0:           indicates the major version number, and the value
michael@0:           mod 10 is the minor version number.
michael@0: 
michael@0:       version needed to extract (2 bytes)
michael@0: 
michael@0:           The minimum software version needed to extract the
michael@0:           file, mapped as above.
michael@0: 
michael@0:       general purpose bit flag: (2 bytes)
michael@0: 
michael@0:           Bit 0: If set, indicates that the file is encrypted.
michael@0: 
michael@0:           (For Method 6 - Imploding)
michael@0:           Bit 1: If the compression method used was type 6,
michael@0:                  Imploding, then this bit, if set, indicates
michael@0:                  an 8K sliding dictionary was used.  If clear,
michael@0:                  then a 4K sliding dictionary was used.
michael@0:           Bit 2: If the compression method used was type 6,
michael@0:                  Imploding, then this bit, if set, indicates
michael@0:                  3 Shannon-Fano trees were used to encode the
michael@0:                  sliding dictionary output.  If clear, then 2
michael@0:                  Shannon-Fano trees were used.
michael@0: 
michael@0:           (For Method 8 - Deflating)
michael@0:           Bit 2  Bit 1
michael@0:             0      0    Normal (-en) compression option was used.
michael@0:             0      1    Maximum (-ex) compression option was used.
michael@0:             1      0    Fast (-ef) compression option was used.
michael@0:             1      1    Super Fast (-es) compression option was used.
michael@0: 
michael@0:           Note:  Bits 1 and 2 are undefined if the compression
michael@0:                  method is any other.
michael@0: 
michael@0:           Bit 3: If this bit is set, the fields crc-32, compressed 
michael@0:                  size and uncompressed size are set to zero in the 
michael@0:                  local header.  The correct values are put in the 
michael@0:                  data descriptor immediately following the compressed
michael@0:                  data.  (Note: PKZIP version 2.04g for DOS only 
michael@0:                  recognizes this bit for method 8 compression, newer 
michael@0:                  versions of PKZIP recognize this bit for any 
michael@0:                  compression method.)
michael@0: 
michael@0:           Bit 4: Reserved for use with method 8, for enhanced
michael@0:                  deflating. 
michael@0: 
michael@0:           Bit 5: If this bit is set, this indicates that the file is 
michael@0:                  compressed patched data.  (Note: Requires PKZIP 
michael@0:                  version 2.70 or greater)
michael@0: 
michael@0:           Bit 6: Currently unused.
michael@0: 
michael@0:           Bit 7: Currently unused.
michael@0: 
michael@0:           Bit 8: Currently unused.
michael@0: 
michael@0:           Bit 9: Currently unused.
michael@0: 
michael@0:           Bit 10: Currently unused.
michael@0: 
michael@0:           Bit 11: Currently unused.
michael@0: 
michael@0:           Bit 12: Reserved by PKWARE for enhanced compression.
michael@0: 
michael@0:           Bit 13: Reserved by PKWARE.
michael@0: 
michael@0:           Bit 14: Reserved by PKWARE.
michael@0: 
michael@0:           Bit 15: Reserved by PKWARE.
michael@0: 
michael@0:       compression method: (2 bytes)
michael@0: 
michael@0:           (see accompanying documentation for algorithm
michael@0:           descriptions)
michael@0: 
michael@0:           0 - The file is stored (no compression)
michael@0:           1 - The file is Shrunk
michael@0:           2 - The file is Reduced with compression factor 1
michael@0:           3 - The file is Reduced with compression factor 2
michael@0:           4 - The file is Reduced with compression factor 3
michael@0:           5 - The file is Reduced with compression factor 4
michael@0:           6 - The file is Imploded
michael@0:           7 - Reserved for Tokenizing compression algorithm
michael@0:           8 - The file is Deflated
michael@0:           9 - Reserved for enhanced Deflating
michael@0:          10 - PKWARE Date Compression Library Imploding
michael@0: 
michael@0:       date and time fields: (2 bytes each)
michael@0: 
michael@0:           The date and time are encoded in standard MS-DOS format.
michael@0:           If input came from standard input, the date and time are
michael@0:           those at which compression was started for this data.
michael@0: 
michael@0:       CRC-32: (4 bytes)
michael@0: 
michael@0:           The CRC-32 algorithm was generously contributed by
michael@0:           David Schwaderer and can be found in his excellent
michael@0:           book "C Programmers Guide to NetBIOS" published by
michael@0:           Howard W. Sams & Co. Inc.  The 'magic number' for
michael@0:           the CRC is 0xdebb20e3.  The proper CRC pre and post
michael@0:           conditioning is used, meaning that the CRC register
michael@0:           is pre-conditioned with all ones (a starting value
michael@0:           of 0xffffffff) and the value is post-conditioned by
michael@0:           taking the one's complement of the CRC residual.
michael@0:           If bit 3 of the general purpose flag is set, this
michael@0:           field is set to zero in the local header and the correct
michael@0:           value is put in the data descriptor and in the central
michael@0:           directory.
michael@0: 
michael@0:       compressed size: (4 bytes)
michael@0:       uncompressed size: (4 bytes)
michael@0: 
michael@0:           The size of the file compressed and uncompressed,
michael@0:           respectively.  If bit 3 of the general purpose bit flag
michael@0:           is set, these fields are set to zero in the local header
michael@0:           and the correct values are put in the data descriptor and
michael@0:           in the central directory.
michael@0: 
michael@0:       filename length: (2 bytes)
michael@0:       extra field length: (2 bytes)
michael@0:       file comment length: (2 bytes)
michael@0: 
michael@0:           The length of the filename, extra field, and comment
michael@0:           fields respectively.  The combined length of any
michael@0:           directory record and these three fields should not
michael@0:           generally exceed 65,535 bytes.  If input came from standard
michael@0:           input, the filename length is set to zero.
michael@0: 
michael@0:       disk number start: (2 bytes)
michael@0: 
michael@0:           The number of the disk on which this file begins.
michael@0: 
michael@0:       internal file attributes: (2 bytes)
michael@0: 
michael@0:           The lowest bit of this field indicates, if set, that
michael@0:           the file is apparently an ASCII or text file.  If not
michael@0:           set, that the file apparently contains binary data.
michael@0:           The remaining bits are unused in version 1.0.
michael@0: 
michael@0:           Bits 1 and 2 are reserved for use by PKWARE.
michael@0: 
michael@0:       external file attributes: (4 bytes)
michael@0: 
michael@0:           The mapping of the external attributes is
michael@0:           host-system dependent (see 'version made by').  For
michael@0:           MS-DOS, the low order byte is the MS-DOS directory
michael@0:           attribute byte.  If input came from standard input, this
michael@0:           field is set to zero.
michael@0: 
michael@0:       relative offset of local header: (4 bytes)
michael@0: 
michael@0:           This is the offset from the start of the first disk on
michael@0:           which this file appears, to where the local header should
michael@0:           be found.
michael@0: 
michael@0:       filename: (Variable)
michael@0: 
michael@0:           The name of the file, with optional relative path.
michael@0:           The path stored should not contain a drive or
michael@0:           device letter, or a leading slash.  All slashes
michael@0:           should be forward slashes '/' as opposed to
michael@0:           backwards slashes '\' for compatibility with Amiga
michael@0:           and Unix file systems etc.  If input came from standard
michael@0:           input, there is no filename field.
michael@0: 
michael@0:       extra field: (Variable)
michael@0: 
michael@0:           This is for future expansion.  If additional information
michael@0:           needs to be stored in the future, it should be stored
michael@0:           here.  Earlier versions of the software can then safely
michael@0:           skip this file, and find the next file or header.  This
michael@0:           field will be 0 length in version 1.0.
michael@0: 
michael@0:           In order to allow different programs and different types
michael@0:           of information to be stored in the 'extra' field in .ZIP
michael@0:           files, the following structure should be used for all
michael@0:           programs storing data in this field:
michael@0: 
michael@0:           header1+data1 + header2+data2 . . .
michael@0: 
michael@0:           Each header should consist of:
michael@0: 
michael@0:             Header ID - 2 bytes
michael@0:             Data Size - 2 bytes
michael@0: 
michael@0:           Note: all fields stored in Intel low-byte/high-byte order.
michael@0: 
michael@0:           The Header ID field indicates the type of data that is in
michael@0:           the following data block.
michael@0: 
michael@0:           Header ID's of 0 thru 31 are reserved for use by PKWARE.
michael@0:           The remaining ID's can be used by third party vendors for
michael@0:           proprietary usage.
michael@0: 
michael@0:           The current Header ID mappings defined by PKWARE are:
michael@0: 
michael@0:           0x0007        AV Info
michael@0:           0x0009        OS/2
michael@0:           0x000a        NTFS 
michael@0:           0x000c        VAX/VMS
michael@0:           0x000d        Unix
michael@0:           0x000f        Patch Descriptor
michael@0: 
michael@0:           Several third party mappings commonly used are:
michael@0: 
michael@0:           0x4b46        FWKCS MD5 (see below)
michael@0:           0x07c8        Macintosh
michael@0:           0x4341        Acorn/SparkFS 
michael@0:           0x4453        Windows NT security descriptor (binary ACL)
michael@0:           0x4704        VM/CMS
michael@0:           0x470f        MVS
michael@0:           0x4c41        OS/2 access control list (text ACL)
michael@0:           0x4d49        Info-ZIP VMS (VAX or Alpha)
michael@0:           0x5455        extended timestamp
michael@0:           0x5855        Info-ZIP Unix (original, also OS/2, NT, etc)
michael@0:           0x6542        BeOS/BeBox
michael@0:           0x756e        ASi Unix
michael@0:           0x7855        Info-ZIP Unix (new)
michael@0:           0xfd4a        SMS/QDOS
michael@0: 
michael@0:           The Data Size field indicates the size of the following
michael@0:           data block. Programs can use this value to skip to the
michael@0:           next header block, passing over any data blocks that are
michael@0:           not of interest.
michael@0: 
michael@0:           Note: As stated above, the size of the entire .ZIP file
michael@0:                 header, including the filename, comment, and extra
michael@0:                 field should not exceed 64K in size.
michael@0: 
michael@0:           In case two different programs should appropriate the same
michael@0:           Header ID value, it is strongly recommended that each
michael@0:           program place a unique signature of at least two bytes in
michael@0:           size (and preferably 4 bytes or bigger) at the start of
michael@0:           each data area.  Every program should verify that its
michael@0:           unique signature is present, in addition to the Header ID
michael@0:           value being correct, before assuming that it is a block of
michael@0:           known type.
michael@0: 
michael@0:          -OS/2 Extra Field:
michael@0: 
michael@0:           The following is the layout of the OS/2 attributes "extra" 
michael@0:           block.  (Last Revision  09/05/95)
michael@0: 
michael@0:           Note: all fields stored in Intel low-byte/high-byte order.
michael@0: 
michael@0:           Value       Size          Description
michael@0:           -----       ----          -----------
michael@0:   (OS/2)  0x0009      2 bytes       Tag for this "extra" block type
michael@0:           TSize       2 bytes       Size for the following data block
michael@0:           BSize       4 bytes       Uncompressed Block Size
michael@0:           CType       2 bytes       Compression type
michael@0:           EACRC       4 bytes       CRC value for uncompress block
michael@0:           (var)       variable      Compressed block
michael@0: 
michael@0:         The OS/2 extended attribute structure (FEA2LIST) is 
michael@0:         compressed and then stored in it's entirety within this 
michael@0:         structure.  There will only ever be one "block" of data in 
michael@0:         VarFields[].
michael@0: 
michael@0:          -UNIX Extra Field:
michael@0: 
michael@0:           The following is the layout of the Unix "extra" block.
michael@0:           Note: all fields are stored in Intel low-byte/high-byte 
michael@0:           order.
michael@0: 
michael@0:           Value       Size          Description
michael@0:           -----       ----          -----------
michael@0:   (UNIX)  0x000d      2 bytes       Tag for this "extra" block type
michael@0:           TSize       2 bytes       Size for the following data block
michael@0:           Atime       4 bytes       File last access time
michael@0:           Mtime       4 bytes       File last modification time
michael@0:           Uid         2 bytes       File user ID
michael@0:           Gid         2 bytes       File group ID
michael@0:           (var)       variable      Variable length data field
michael@0: 
michael@0:           The variable length data field will contain file type 
michael@0:           specific data.  Currently the only values allowed are
michael@0:           the original "linked to" file names for hard or symbolic 
michael@0:           links.
michael@0: 
michael@0:          -VAX/VMS Extra Field:
michael@0: 
michael@0:           The following is the layout of the VAX/VMS attributes 
michael@0:           "extra" block.
michael@0: 
michael@0:           Note: all fields stored in Intel low-byte/high-byte order.
michael@0: 
michael@0:           Value      Size       Description
michael@0:           -----      ----       -----------
michael@0:   (VMS)   0x000c     2 bytes    Tag for this "extra" block type
michael@0:           TSize      2 bytes    Size of the total "extra" block
michael@0:           CRC        4 bytes    32-bit CRC for remainder of the block
michael@0:           Tag1       2 bytes    VMS attribute tag value #1
michael@0:           Size1      2 bytes    Size of attribute #1, in bytes
michael@0:           (var.)     Size1      Attribute #1 data
michael@0:           .
michael@0:           .
michael@0:           .
michael@0:           TagN       2 bytes    VMS attribute tage value #N
michael@0:           SizeN      2 bytes    Size of attribute #N, in bytes
michael@0:           (var.)     SizeN      Attribute #N data
michael@0: 
michael@0:           Rules:
michael@0: 
michael@0:           1. There will be one or more of attributes present, which 
michael@0:              will each be preceded by the above TagX & SizeX values.  
michael@0:              These values are identical to the ATR$C_XXXX and 
michael@0:              ATR$S_XXXX constants which are defined in ATR.H under 
michael@0:              VMS C.  Neither of these values will ever be zero.
michael@0: 
michael@0:           2. No word alignment or padding is performed.
michael@0: 
michael@0:           3. A well-behaved PKZIP/VMS program should never produce
michael@0:              more than one sub-block with the same TagX value.  Also,
michael@0:              there will never be more than one "extra" block of type
michael@0:              0x000c in a particular directory record.
michael@0: 
michael@0:          -NTFS Extra Field:
michael@0: 
michael@0:           The following is the layout of the NTFS attributes 
michael@0:           "extra" block.
michael@0: 
michael@0:           Note: all fields stored in Intel low-byte/high-byte order.
michael@0: 
michael@0:           Value      Size       Description
michael@0:           -----      ----       -----------
michael@0:   (NTFS)  0x000a     2 bytes    Tag for this "extra" block type
michael@0:           TSize      2 bytes    Size of the total "extra" block
michael@0:           Reserved   4 bytes    Reserved for future use
michael@0:           Tag1       2 bytes    NTFS attribute tag value #1
michael@0:           Size1      2 bytes    Size of attribute #1, in bytes
michael@0:           (var.)     Size1      Attribute #1 data
michael@0:           .
michael@0:           .
michael@0:           .
michael@0:           TagN       2 bytes    NTFS attribute tage value #N
michael@0:           SizeN      2 bytes    Size of attribute #N, in bytes
michael@0:           (var.)     SizeN      Attribute #N data
michael@0: 
michael@0:           For NTFS, values for Tag1 through TagN are as follows:
michael@0:           (currently only one set of attributes is defined for NTFS)
michael@0: 
michael@0:           Tag        Size       Description
michael@0:           -----      ----       -----------
michael@0:           0x0001     2 bytes    Tag for attribute #1 
michael@0:           Size1      2 bytes    Size of attribute #1, in bytes
michael@0:           Mtime      8 bytes    File last modification time
michael@0:           Atime      8 bytes    File last access time
michael@0:           Ctime      8 bytes    File creation time
michael@0:           
michael@0:          -PATCH Descriptor Extra Field:
michael@0: 
michael@0:           The following is the layout of the Patch Descriptor "extra"
michael@0:           block.
michael@0: 
michael@0:           Note: all fields stored in Intel low-byte/high-byte order.
michael@0: 
michael@0:           Value     Size     Description
michael@0:           -----     ----     -----------
michael@0:   (Patch) 0x000f    2 bytes  Tag for this "extra" block type
michael@0:           TSize     2 bytes  Size of the total "extra" block
michael@0:           Version   2 bytes  Version of the descriptor
michael@0:           Flags     4 bytes  Actions and reactions (see below) 
michael@0:           OldSize   4 bytes  Size of the file about to be patched 
michael@0:           OldCRC    4 bytes  32-bit CRC of the file to be patched 
michael@0:           NewSize   4 bytes  Size of the resulting file 
michael@0:           NewCRC    4 bytes  32-bit CRC of the resulting file 
michael@0: 
michael@0:           Actions and reactions
michael@0: 
michael@0:           Bits          Description
michael@0:           ----          ----------------
michael@0:           0             Use for autodetection
michael@0:           1             Treat as selfpatch
michael@0:           2-3           RESERVED
michael@0:           4-5           Action (see below)
michael@0:           6-7           RESERVED
michael@0:           8-9           Reaction (see below) to absent file 
michael@0:           10-11         Reaction (see below) to newer file
michael@0:           12-13         Reaction (see below) to unknown file
michael@0:           14-15         RESERVED
michael@0:           16-31         RESERVED
michael@0: 
michael@0:           Actions
michael@0: 
michael@0:           Action       Value
michael@0:           ------       ----- 
michael@0:           none         0
michael@0:           add          1
michael@0:           delete       2
michael@0:           patch        3
michael@0: 
michael@0:           Reactions
michael@0:  
michael@0:           Reaction     Value
michael@0:           --------     -----
michael@0:           ask          0
michael@0:           skip         1
michael@0:           ignore       2
michael@0:           fail         3
michael@0: 
michael@0:           - FWKCS MD5 Extra Field:
michael@0: 
michael@0:           The FWKCS Contents_Signature System, used in
michael@0:           automatically identifying files independent of filename,
michael@0:           optionally adds and uses an extra field to support the
michael@0:           rapid creation of an enhanced contents_signature:
michael@0: 
michael@0:               Header ID = 0x4b46
michael@0:               Data Size = 0x0013
michael@0:               Preface   = 'M','D','5'
michael@0:               followed by 16 bytes containing the uncompressed file's
michael@0:               128_bit MD5 hash(1), low byte first.
michael@0: 
michael@0:           When FWKCS revises a zipfile central directory to add
michael@0:           this extra field for a file, it also replaces the
michael@0:           central directory entry for that file's uncompressed
michael@0:           filelength with a measured value.
michael@0: 
michael@0:           FWKCS provides an option to strip this extra field, if
michael@0:           present, from a zipfile central directory. In adding
michael@0:           this extra field, FWKCS preserves Zipfile Authenticity
michael@0:           Verification; if stripping this extra field, FWKCS
michael@0:           preserves all versions of AV through PKZIP version 2.04g.
michael@0: 
michael@0:           FWKCS, and FWKCS Contents_Signature System, are
michael@0:           trademarks of Frederick W. Kantor.
michael@0: 
michael@0:           (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
michael@0:               Science and RSA Data Security, Inc., April 1992.
michael@0:               ll.76-77: "The MD5 algorithm is being placed in the
michael@0:               public domain for review and possible adoption as a
michael@0:               standard."
michael@0: 
michael@0:       file comment: (Variable)
michael@0: 
michael@0:           The comment for this file.
michael@0: 
michael@0:       number of this disk: (2 bytes)
michael@0: 
michael@0:           The number of this disk, which contains central
michael@0:           directory end record.
michael@0: 
michael@0:       number of the disk with the start of the central
michael@0:       directory: (2 bytes)
michael@0: 
michael@0:           The number of the disk on which the central
michael@0:           directory starts.
michael@0: 
michael@0:       total number of entries in the central dir on 
michael@0:       this disk: (2 bytes)
michael@0: 
michael@0:           The number of central directory entries on this disk.
michael@0: 
michael@0:       total number of entries in the central dir: (2 bytes)
michael@0: 
michael@0:           The total number of files in the zipfile.
michael@0: 
michael@0:       size of the central directory: (4 bytes)
michael@0: 
michael@0:           The size (in bytes) of the entire central directory.
michael@0: 
michael@0:       offset of start of central directory with respect to
michael@0:       the starting disk number:  (4 bytes)
michael@0: 
michael@0:           Offset of the start of the central directory on the
michael@0:           disk on which the central directory starts.
michael@0: 
michael@0:       zipfile comment length: (2 bytes)
michael@0: 
michael@0:           The length of the comment for this zipfile.
michael@0: 
michael@0:       zipfile comment: (Variable)
michael@0: 
michael@0:           The comment for this zipfile.
michael@0: 
michael@0:   D.  General notes:
michael@0: 
michael@0:       1)  All fields unless otherwise noted are unsigned and stored
michael@0:           in Intel low-byte:high-byte, low-word:high-word order.
michael@0: 
michael@0:       2)  String fields are not null terminated, since the
michael@0:           length is given explicitly.
michael@0: 
michael@0:       3)  Local headers should not span disk boundaries.  Also, even
michael@0:           though the central directory can span disk boundaries, no
michael@0:           single record in the central directory should be split
michael@0:           across disks.
michael@0: 
michael@0:       4)  The entries in the central directory may not necessarily
michael@0:           be in the same order that files appear in the zipfile.
michael@0: 
michael@0: UnShrinking - Method 1
michael@0: ----------------------
michael@0: 
michael@0: Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
michael@0: with partial clearing.  The initial code size is 9 bits, and
michael@0: the maximum code size is 13 bits.  Shrinking differs from
michael@0: conventional Dynamic Ziv-Lempel-Welch implementations in several
michael@0: respects:
michael@0: 
michael@0: 1)  The code size is controlled by the compressor, and is not
michael@0:     automatically increased when codes larger than the current
michael@0:     code size are created (but not necessarily used).  When
michael@0:     the decompressor encounters the code sequence 256
michael@0:     (decimal) followed by 1, it should increase the code size
michael@0:     read from the input stream to the next bit size.  No
michael@0:     blocking of the codes is performed, so the next code at
michael@0:     the increased size should be read from the input stream
michael@0:     immediately after where the previous code at the smaller
michael@0:     bit size was read.  Again, the decompressor should not
michael@0:     increase the code size used until the sequence 256,1 is
michael@0:     encountered.
michael@0: 
michael@0: 2)  When the table becomes full, total clearing is not
michael@0:     performed.  Rather, when the compressor emits the code
michael@0:     sequence 256,2 (decimal), the decompressor should clear
michael@0:     all leaf nodes from the Ziv-Lempel tree, and continue to
michael@0:     use the current code size.  The nodes that are cleared
michael@0:     from the Ziv-Lempel tree are then re-used, with the lowest
michael@0:     code value re-used first, and the highest code value
michael@0:     re-used last.  The compressor can emit the sequence 256,2
michael@0:     at any time.
michael@0: 
michael@0: Expanding - Methods 2-5
michael@0: -----------------------
michael@0: 
michael@0: The Reducing algorithm is actually a combination of two
michael@0: distinct algorithms.  The first algorithm compresses repeated
michael@0: byte sequences, and the second algorithm takes the compressed
michael@0: stream from the first algorithm and applies a probabilistic
michael@0: compression method.
michael@0: 
michael@0: The probabilistic compression stores an array of 'follower
michael@0: sets' S(j), for j=0 to 255, corresponding to each possible
michael@0: ASCII character.  Each set contains between 0 and 32
michael@0: characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
michael@0: The sets are stored at the beginning of the data area for a
michael@0: Reduced file, in reverse order, with S(255) first, and S(0)
michael@0: last.
michael@0: 
michael@0: The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
michael@0: where N(j) is the size of set S(j).  N(j) can be 0, in which
michael@0: case the follower set for S(j) is empty.  Each N(j) value is
michael@0: encoded in 6 bits, followed by N(j) eight bit character values
michael@0: corresponding to S(j)[0] to S(j)[N(j)-1] respectively.  If
michael@0: N(j) is 0, then no values for S(j) are stored, and the value
michael@0: for N(j-1) immediately follows.
michael@0: 
michael@0: Immediately after the follower sets, is the compressed data
michael@0: stream.  The compressed data stream can be interpreted for the
michael@0: probabilistic decompression as follows:
michael@0: 
michael@0: let Last-Character <- 0.
michael@0: loop until done
michael@0:     if the follower set S(Last-Character) is empty then
michael@0:         read 8 bits from the input stream, and copy this
michael@0:         value to the output stream.
michael@0:     otherwise if the follower set S(Last-Character) is non-empty then
michael@0:         read 1 bit from the input stream.
michael@0:         if this bit is not zero then
michael@0:             read 8 bits from the input stream, and copy this
michael@0:             value to the output stream.
michael@0:         otherwise if this bit is zero then
michael@0:             read B(N(Last-Character)) bits from the input
michael@0:             stream, and assign this value to I.
michael@0:             Copy the value of S(Last-Character)[I] to the
michael@0:             output stream.
michael@0: 
michael@0:     assign the last value placed on the output stream to
michael@0:     Last-Character.
michael@0: end loop
michael@0: 
michael@0: B(N(j)) is defined as the minimal number of bits required to
michael@0: encode the value N(j)-1.
michael@0: 
michael@0: The decompressed stream from above can then be expanded to
michael@0: re-create the original file as follows:
michael@0: 
michael@0: let State <- 0.
michael@0: 
michael@0: loop until done
michael@0:     read 8 bits from the input stream into C.
michael@0:     case State of
michael@0:         0:  if C is not equal to DLE (144 decimal) then
michael@0:                 copy C to the output stream.
michael@0:             otherwise if C is equal to DLE then
michael@0:                 let State <- 1.
michael@0: 
michael@0:         1:  if C is non-zero then
michael@0:                 let V <- C.
michael@0:                 let Len <- L(V)
michael@0:                 let State <- F(Len).
michael@0:             otherwise if C is zero then
michael@0:                 copy the value 144 (decimal) to the output stream.
michael@0:                 let State <- 0
michael@0: 
michael@0:         2:  let Len <- Len + C
michael@0:             let State <- 3.
michael@0: 
michael@0:         3:  move backwards D(V,C) bytes in the output stream
michael@0:             (if this position is before the start of the output
michael@0:             stream, then assume that all the data before the
michael@0:             start of the output stream is filled with zeros).
michael@0:             copy Len+3 bytes from this position to the output stream.
michael@0:             let State <- 0.
michael@0:     end case
michael@0: end loop
michael@0: 
michael@0: The functions F,L, and D are dependent on the 'compression
michael@0: factor', 1 through 4, and are defined as follows:
michael@0: 
michael@0: For compression factor 1:
michael@0:     L(X) equals the lower 7 bits of X.
michael@0:     F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
michael@0:     D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
michael@0: For compression factor 2:
michael@0:     L(X) equals the lower 6 bits of X.
michael@0:     F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
michael@0:     D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
michael@0: For compression factor 3:
michael@0:     L(X) equals the lower 5 bits of X.
michael@0:     F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
michael@0:     D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
michael@0: For compression factor 4:
michael@0:     L(X) equals the lower 4 bits of X.
michael@0:     F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
michael@0:     D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
michael@0: 
michael@0: Imploding - Method 6
michael@0: --------------------
michael@0: 
michael@0: The Imploding algorithm is actually a combination of two distinct
michael@0: algorithms.  The first algorithm compresses repeated byte
michael@0: sequences using a sliding dictionary.  The second algorithm is
michael@0: used to compress the encoding of the sliding dictionary output,
michael@0: using multiple Shannon-Fano trees.
michael@0: 
michael@0: The Imploding algorithm can use a 4K or 8K sliding dictionary
michael@0: size. The dictionary size used can be determined by bit 1 in the
michael@0: general purpose flag word; a 0 bit indicates a 4K dictionary
michael@0: while a 1 bit indicates an 8K dictionary.
michael@0: 
michael@0: The Shannon-Fano trees are stored at the start of the compressed
michael@0: file. The number of trees stored is defined by bit 2 in the
michael@0: general purpose flag word; a 0 bit indicates two trees stored, a
michael@0: 1 bit indicates three trees are stored.  If 3 trees are stored,
michael@0: the first Shannon-Fano tree represents the encoding of the
michael@0: Literal characters, the second tree represents the encoding of
michael@0: the Length information, the third represents the encoding of the
michael@0: Distance information.  When 2 Shannon-Fano trees are stored, the
michael@0: Length tree is stored first, followed by the Distance tree.
michael@0: 
michael@0: The Literal Shannon-Fano tree, if present is used to represent
michael@0: the entire ASCII character set, and contains 256 values.  This
michael@0: tree is used to compress any data not compressed by the sliding
michael@0: dictionary algorithm.  When this tree is present, the Minimum
michael@0: Match Length for the sliding dictionary is 3.  If this tree is
michael@0: not present, the Minimum Match Length is 2.
michael@0: 
michael@0: The Length Shannon-Fano tree is used to compress the Length part
michael@0: of the (length,distance) pairs from the sliding dictionary
michael@0: output.  The Length tree contains 64 values, ranging from the
michael@0: Minimum Match Length, to 63 plus the Minimum Match Length.
michael@0: 
michael@0: The Distance Shannon-Fano tree is used to compress the Distance
michael@0: part of the (length,distance) pairs from the sliding dictionary
michael@0: output. The Distance tree contains 64 values, ranging from 0 to
michael@0: 63, representing the upper 6 bits of the distance value.  The
michael@0: distance values themselves will be between 0 and the sliding
michael@0: dictionary size, either 4K or 8K.
michael@0: 
michael@0: The Shannon-Fano trees themselves are stored in a compressed
michael@0: format. The first byte of the tree data represents the number of
michael@0: bytes of data representing the (compressed) Shannon-Fano tree
michael@0: minus 1.  The remaining bytes represent the Shannon-Fano tree
michael@0: data encoded as:
michael@0: 
michael@0:     High 4 bits: Number of values at this bit length + 1. (1 - 16)
michael@0:     Low  4 bits: Bit Length needed to represent value + 1. (1 - 16)
michael@0: 
michael@0: The Shannon-Fano codes can be constructed from the bit lengths
michael@0: using the following algorithm:
michael@0: 
michael@0: 1)  Sort the Bit Lengths in ascending order, while retaining the
michael@0:     order of the original lengths stored in the file.
michael@0: 
michael@0: 2)  Generate the Shannon-Fano trees:
michael@0: 
michael@0:     Code <- 0
michael@0:     CodeIncrement <- 0
michael@0:     LastBitLength <- 0
michael@0:     i <- number of Shannon-Fano codes - 1   (either 255 or 63)
michael@0: 
michael@0:     loop while i >= 0
michael@0:         Code = Code + CodeIncrement
michael@0:         if BitLength(i) <> LastBitLength then
michael@0:             LastBitLength=BitLength(i)
michael@0:             CodeIncrement = 1 shifted left (16 - LastBitLength)
michael@0:         ShannonCode(i) = Code
michael@0:         i <- i - 1
michael@0:     end loop
michael@0: 
michael@0: 3)  Reverse the order of all the bits in the above ShannonCode()
michael@0:     vector, so that the most significant bit becomes the least
michael@0:     significant bit.  For example, the value 0x1234 (hex) would
michael@0:     become 0x2C48 (hex).
michael@0: 
michael@0: 4)  Restore the order of Shannon-Fano codes as originally stored
michael@0:     within the file.
michael@0: 
michael@0: Example:
michael@0: 
michael@0:     This example will show the encoding of a Shannon-Fano tree
michael@0:     of size 8.  Notice that the actual Shannon-Fano trees used
michael@0:     for Imploding are either 64 or 256 entries in size.
michael@0: 
michael@0: Example:   0x02, 0x42, 0x01, 0x13
michael@0: 
michael@0:     The first byte indicates 3 values in this table.  Decoding the
michael@0:     bytes:
michael@0:             0x42 = 5 codes of 3 bits long
michael@0:             0x01 = 1 code  of 2 bits long
michael@0:             0x13 = 2 codes of 4 bits long
michael@0: 
michael@0:     This would generate the original bit length array of:
michael@0:     (3, 3, 3, 3, 3, 2, 4, 4)
michael@0: 
michael@0:     There are 8 codes in this table for the values 0 thru 7.  Using 
michael@0:     the algorithm to obtain the Shannon-Fano codes produces:
michael@0: 
michael@0:                                   Reversed     Order     Original
michael@0: Val  Sorted   Constructed Code      Value     Restored    Length
michael@0: ---  ------   -----------------   --------    --------    ------
michael@0: 0:     2      1100000000000000        11       101          3
michael@0: 1:     3      1010000000000000       101       001          3
michael@0: 2:     3      1000000000000000       001       110          3
michael@0: 3:     3      0110000000000000       110       010          3
michael@0: 4:     3      0100000000000000       010       100          3
michael@0: 5:     3      0010000000000000       100        11          2
michael@0: 6:     4      0001000000000000      1000      1000          4
michael@0: 7:     4      0000000000000000      0000      0000          4
michael@0: 
michael@0: The values in the Val, Order Restored and Original Length columns
michael@0: now represent the Shannon-Fano encoding tree that can be used for
michael@0: decoding the Shannon-Fano encoded data.  How to parse the
michael@0: variable length Shannon-Fano values from the data stream is beyond
michael@0: the scope of this document.  (See the references listed at the end of
michael@0: this document for more information.)  However, traditional decoding
michael@0: schemes used for Huffman variable length decoding, such as the
michael@0: Greenlaw algorithm, can be successfully applied.
michael@0: 
michael@0: The compressed data stream begins immediately after the
michael@0: compressed Shannon-Fano data.  The compressed data stream can be
michael@0: interpreted as follows:
michael@0: 
michael@0: loop until done
michael@0:     read 1 bit from input stream.
michael@0: 
michael@0:     if this bit is non-zero then       (encoded data is literal data)
michael@0:         if Literal Shannon-Fano tree is present
michael@0:             read and decode character using Literal Shannon-Fano tree.
michael@0:         otherwise
michael@0:             read 8 bits from input stream.
michael@0:         copy character to the output stream.
michael@0:     otherwise              (encoded data is sliding dictionary match)
michael@0:         if 8K dictionary size
michael@0:             read 7 bits for offset Distance (lower 7 bits of offset).
michael@0:         otherwise
michael@0:             read 6 bits for offset Distance (lower 6 bits of offset).
michael@0: 
michael@0:         using the Distance Shannon-Fano tree, read and decode the
michael@0:           upper 6 bits of the Distance value.
michael@0: 
michael@0:         using the Length Shannon-Fano tree, read and decode
michael@0:           the Length value.
michael@0: 
michael@0:         Length <- Length + Minimum Match Length
michael@0: 
michael@0:         if Length = 63 + Minimum Match Length
michael@0:             read 8 bits from the input stream,
michael@0:             add this value to Length.
michael@0: 
michael@0:         move backwards Distance+1 bytes in the output stream, and
michael@0:         copy Length characters from this position to the output
michael@0:         stream.  (if this position is before the start of the output
michael@0:         stream, then assume that all the data before the start of
michael@0:         the output stream is filled with zeros).
michael@0: end loop
michael@0: 
michael@0: Tokenizing - Method 7
michael@0: --------------------
michael@0: 
michael@0: This method is not used by PKZIP.
michael@0: 
michael@0: Deflating - Method 8
michael@0: -----------------
michael@0: 
michael@0: The Deflate algorithm is similar to the Implode algorithm using
michael@0: a sliding dictionary of up to 32K with secondary compression
michael@0: from Huffman/Shannon-Fano codes.
michael@0: 
michael@0: The compressed data is stored in blocks with a header describing
michael@0: the block and the Huffman codes used in the data block.  The header
michael@0: format is as follows:
michael@0: 
michael@0:    Bit 0: Last Block bit     This bit is set to 1 if this is the last
michael@0:                              compressed block in the data.
michael@0:    Bits 1-2: Block type
michael@0:       00 (0) - Block is stored - All stored data is byte aligned.
michael@0:                Skip bits until next byte, then next word = block 
michael@0:                length, followed by the ones compliment of the block
michael@0:                length word. Remaining data in block is the stored 
michael@0:                data.
michael@0: 
michael@0:       01 (1) - Use fixed Huffman codes for literal and distance codes.
michael@0:                Lit Code    Bits             Dist Code   Bits
michael@0:                ---------   ----             ---------   ----
michael@0:                  0 - 143    8                 0 - 31      5
michael@0:                144 - 255    9
michael@0:                256 - 279    7
michael@0:                280 - 287    8
michael@0: 
michael@0:                Literal codes 286-287 and distance codes 30-31 are 
michael@0:                never used but participate in the huffman construction.
michael@0: 
michael@0:       10 (2) - Dynamic Huffman codes.  (See expanding Huffman codes)
michael@0: 
michael@0:       11 (3) - Reserved - Flag a "Error in compressed data" if seen.
michael@0: 
michael@0: Expanding Huffman Codes
michael@0: -----------------------
michael@0: If the data block is stored with dynamic Huffman codes, the Huffman
michael@0: codes are sent in the following compressed format:
michael@0: 
michael@0:    5 Bits: # of Literal codes sent - 256 (256 - 286)
michael@0:            All other codes are never sent.
michael@0:    5 Bits: # of Dist codes - 1           (1 - 32)
michael@0:    4 Bits: # of Bit Length codes - 3     (3 - 19)
michael@0: 
michael@0: The Huffman codes are sent as bit lengths and the codes are built as
michael@0: described in the implode algorithm.  The bit lengths themselves are
michael@0: compressed with Huffman codes.  There are 19 bit length codes:
michael@0: 
michael@0:    0 - 15: Represent bit lengths of 0 - 15
michael@0:        16: Copy the previous bit length 3 - 6 times.
michael@0:            The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
michael@0:               Example:  Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
michael@0:                         expand to 12 bit lengths of 8 (1 + 6 + 5)
michael@0:        17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
michael@0:        18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
michael@0: 
michael@0: The lengths of the bit length codes are sent packed 3 bits per value
michael@0: (0 - 7) in the following order:
michael@0: 
michael@0:    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
michael@0: 
michael@0: The Huffman codes should be built as described in the Implode algorithm
michael@0: except codes are assigned starting at the shortest bit length, i.e. the
michael@0: shortest code should be all 0's rather than all 1's.  Also, codes with
michael@0: a bit length of zero do not participate in the tree construction.  The
michael@0: codes are then used to decode the bit lengths for the literal and 
michael@0: distance tables.
michael@0: 
michael@0: The bit lengths for the literal tables are sent first with the number
michael@0: of entries sent described by the 5 bits sent earlier.  There are up
michael@0: to 286 literal characters; the first 256 represent the respective 8
michael@0: bit character, code 256 represents the End-Of-Block code, the remaining
michael@0: 29 codes represent copy lengths of 3 thru 258.  There are up to 30
michael@0: distance codes representing distances from 1 thru 32k as described
michael@0: below.
michael@0: 
michael@0:                              Length Codes
michael@0:                              ------------
michael@0:       Extra             Extra              Extra              Extra
michael@0:  Code Bits Length  Code Bits Lengths  Code Bits Lengths  Code Bits Length(s)
michael@0:  ---- ---- ------  ---- ---- -------  ---- ---- -------  ---- ---- ---------
michael@0:   257   0     3     265   1   11,12    273   3   35-42    281   5  131-162
michael@0:   258   0     4     266   1   13,14    274   3   43-50    282   5  163-194
michael@0:   259   0     5     267   1   15,16    275   3   51-58    283   5  195-226
michael@0:   260   0     6     268   1   17,18    276   3   59-66    284   5  227-257
michael@0:   261   0     7     269   2   19-22    277   4   67-82    285   0    258
michael@0:   262   0     8     270   2   23-26    278   4   83-98
michael@0:   263   0     9     271   2   27-30    279   4   99-114
michael@0:   264   0    10     272   2   31-34    280   4  115-130
michael@0: 
michael@0:                             Distance Codes
michael@0:                             --------------
michael@0:       Extra           Extra             Extra               Extra
michael@0:  Code Bits Dist  Code Bits  Dist   Code Bits Distance  Code Bits Distance
michael@0:  ---- ---- ----  ---- ---- ------  ---- ---- --------  ---- ---- --------
michael@0:    0   0    1      8   3   17-24    16    7  257-384    24   11  4097-6144
michael@0:    1   0    2      9   3   25-32    17    7  385-512    25   11  6145-8192
michael@0:    2   0    3     10   4   33-48    18    8  513-768    26   12  8193-12288
michael@0:    3   0    4     11   4   49-64    19    8  769-1024   27   12 12289-16384
michael@0:    4   1   5,6    12   5   65-96    20    9 1025-1536   28   13 16385-24576
michael@0:    5   1   7,8    13   5   97-128   21    9 1537-2048   29   13 24577-32768
michael@0:    6   2   9-12   14   6  129-192   22   10 2049-3072
michael@0:    7   2  13-16   15   6  193-256   23   10 3073-4096
michael@0: 
michael@0: The compressed data stream begins immediately after the
michael@0: compressed header data.  The compressed data stream can be
michael@0: interpreted as follows:
michael@0: 
michael@0: do
michael@0:    read header from input stream.
michael@0: 
michael@0:    if stored block
michael@0:       skip bits until byte aligned
michael@0:       read count and 1's compliment of count
michael@0:       copy count bytes data block
michael@0:    otherwise
michael@0:       loop until end of block code sent
michael@0:          decode literal character from input stream
michael@0:          if literal < 256
michael@0:             copy character to the output stream
michael@0:          otherwise
michael@0:             if literal = end of block
michael@0:                break from loop
michael@0:             otherwise
michael@0:                decode distance from input stream
michael@0: 
michael@0:                move backwards distance bytes in the output stream, and
michael@0:                copy length characters from this position to the output
michael@0:                stream.
michael@0:       end loop
michael@0: while not last block
michael@0: 
michael@0: if data descriptor exists
michael@0:    skip bits until byte aligned
michael@0:    read crc and sizes
michael@0: endif
michael@0: 
michael@0: Decryption
michael@0: ----------
michael@0: 
michael@0: The encryption used in PKZIP was generously supplied by Roger
michael@0: Schlafly.  PKWARE is grateful to Mr. Schlafly for his expert
michael@0: help and advice in the field of data encryption.
michael@0: 
michael@0: PKZIP encrypts the compressed data stream.  Encrypted files must
michael@0: be decrypted before they can be extracted.
michael@0: 
michael@0: Each encrypted file has an extra 12 bytes stored at the start of
michael@0: the data area defining the encryption header for that file.  The
michael@0: encryption header is originally set to random values, and then
michael@0: itself encrypted, using three, 32-bit keys.  The key values are
michael@0: initialized using the supplied encryption password.  After each byte
michael@0: is encrypted, the keys are then updated using pseudo-random number
michael@0: generation techniques in combination with the same CRC-32 algorithm
michael@0: used in PKZIP and described elsewhere in this document.
michael@0: 
michael@0: The following is the basic steps required to decrypt a file:
michael@0: 
michael@0: 1) Initialize the three 32-bit keys with the password.
michael@0: 2) Read and decrypt the 12-byte encryption header, further
michael@0:    initializing the encryption keys.
michael@0: 3) Read and decrypt the compressed data stream using the
michael@0:    encryption keys.
michael@0: 
michael@0: Step 1 - Initializing the encryption keys
michael@0: -----------------------------------------
michael@0: 
michael@0: Key(0) <- 305419896
michael@0: Key(1) <- 591751049
michael@0: Key(2) <- 878082192
michael@0: 
michael@0: loop for i <- 0 to length(password)-1
michael@0:     update_keys(password(i))
michael@0: end loop
michael@0: 
michael@0: Where update_keys() is defined as:
michael@0: 
michael@0: update_keys(char):
michael@0:   Key(0) <- crc32(key(0),char)
michael@0:   Key(1) <- Key(1) + (Key(0) & 000000ffH)
michael@0:   Key(1) <- Key(1) * 134775813 + 1
michael@0:   Key(2) <- crc32(key(2),key(1) >> 24)
michael@0: end update_keys
michael@0: 
michael@0: Where crc32(old_crc,char) is a routine that given a CRC value and a
michael@0: character, returns an updated CRC value after applying the CRC-32
michael@0: algorithm described elsewhere in this document.
michael@0: 
michael@0: Step 2 - Decrypting the encryption header
michael@0: -----------------------------------------
michael@0: 
michael@0: The purpose of this step is to further initialize the encryption
michael@0: keys, based on random data, to render a plaintext attack on the
michael@0: data ineffective.
michael@0: 
michael@0: Read the 12-byte encryption header into Buffer, in locations
michael@0: Buffer(0) thru Buffer(11).
michael@0: 
michael@0: loop for i <- 0 to 11
michael@0:     C <- buffer(i) ^ decrypt_byte()
michael@0:     update_keys(C)
michael@0:     buffer(i) <- C
michael@0: end loop
michael@0: 
michael@0: Where decrypt_byte() is defined as:
michael@0: 
michael@0: unsigned char decrypt_byte()
michael@0:     local unsigned short temp
michael@0:     temp <- Key(2) | 2
michael@0:     decrypt_byte <- (temp * (temp ^ 1)) >> 8
michael@0: end decrypt_byte
michael@0: 
michael@0: After the header is decrypted,  the last 1 or 2 bytes in Buffer
michael@0: should be the high-order word/byte of the CRC for the file being
michael@0: decrypted, stored in Intel low-byte/high-byte order.  Versions of
michael@0: PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is
michael@0: used on versions after 2.0.  This can be used to test if the password
michael@0: supplied is correct or not.
michael@0: 
michael@0: Step 3 - Decrypting the compressed data stream
michael@0: ----------------------------------------------
michael@0: 
michael@0: The compressed data stream can be decrypted as follows:
michael@0: 
michael@0: loop until done
michael@0:     read a character into C
michael@0:     Temp <- C ^ decrypt_byte()
michael@0:     update_keys(temp)
michael@0:     output Temp
michael@0: end loop
michael@0: 
michael@0: In addition to the above mentioned contributors to PKZIP and PKUNZIP,
michael@0: I would like to extend special thanks to Robert Mahoney for suggesting
michael@0: the extension .ZIP for this software.
michael@0: 
michael@0: References:
michael@0: 
michael@0:     Fiala, Edward R., and Greene, Daniel H., "Data compression with
michael@0:        finite windows",  Communications of the ACM, Volume 32, Number 4,
michael@0:        April 1989, pages 490-505.
michael@0: 
michael@0:     Held, Gilbert, "Data Compression, Techniques and Applications,
michael@0:        Hardware and Software Considerations", John Wiley & Sons, 1987.
michael@0: 
michael@0:     Huffman, D.A., "A method for the construction of minimum-redundancy
michael@0:        codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
michael@0:        pages 1098-1101.
michael@0: 
michael@0:     Nelson, Mark, "LZW Data Compression", Dr. Dobbs Journal, Volume 14,
michael@0:        Number 10, October 1989, pages 29-37.
michael@0: 
michael@0:     Nelson, Mark, "The Data Compression Book",  M&T Books, 1991.
michael@0: 
michael@0:     Storer, James A., "Data Compression, Methods and Theory",
michael@0:        Computer Science Press, 1988
michael@0: 
michael@0:     Welch, Terry, "A Technique for High-Performance Data Compression",
michael@0:        IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.
michael@0: 
michael@0:     Ziv, J. and Lempel, A., "A universal algorithm for sequential data
michael@0:        compression", Communications of the ACM, Volume 30, Number 6,
michael@0:        June 1987, pages 520-540.
michael@0: 
michael@0:     Ziv, J. and Lempel, A., "Compression of individual sequences via
michael@0:        variable-rate coding", IEEE Transactions on Information Theory,
michael@0:        Volume 24, Number 5, September 1978, pages 530-536.