1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/modules/libjar/appnote.txt Wed Dec 31 06:09:35 2014 +0100 1.3 @@ -0,0 +1,1192 @@ 1.4 +Revised: 03/01/1999 1.5 + 1.6 +Disclaimer 1.7 +---------- 1.8 + 1.9 +Although PKWARE will attempt to supply current and accurate 1.10 +information relating to its file formats, algorithms, and the 1.11 +subject programs, the possibility of error can not be eliminated. 1.12 +PKWARE therefore expressly disclaims any warranty that the 1.13 +information contained in the associated materials relating to the 1.14 +subject programs and/or the format of the files created or 1.15 +accessed by the subject programs and/or the algorithms used by 1.16 +the subject programs, or any other matter, is current, correct or 1.17 +accurate as delivered. Any risk of damage due to any possible 1.18 +inaccurate information is assumed by the user of the information. 1.19 +Furthermore, the information relating to the subject programs 1.20 +and/or the file formats created or accessed by the subject 1.21 +programs and/or the algorithms used by the subject programs is 1.22 +subject to change without notice. 1.23 + 1.24 +General Format of a ZIP file 1.25 +---------------------------- 1.26 + 1.27 + Files stored in arbitrary order. Large zipfiles can span multiple 1.28 + diskette media. 1.29 + 1.30 + Overall zipfile format: 1.31 + 1.32 + [local file header + file data + data_descriptor] . . . 1.33 + [central directory] end of central directory record 1.34 + 1.35 + 1.36 + A. 
Local file header: 1.37 + 1.38 + local file header signature 4 bytes (0x04034b50) 1.39 + version needed to extract 2 bytes 1.40 + general purpose bit flag 2 bytes 1.41 + compression method 2 bytes 1.42 + last mod file time 2 bytes 1.43 + last mod file date 2 bytes 1.44 + crc-32 4 bytes 1.45 + compressed size 4 bytes 1.46 + uncompressed size 4 bytes 1.47 + filename length 2 bytes 1.48 + extra field length 2 bytes 1.49 + 1.50 + filename (variable size) 1.51 + extra field (variable size) 1.52 + 1.53 + B. Data descriptor: 1.54 + 1.55 + crc-32 4 bytes 1.56 + compressed size 4 bytes 1.57 + uncompressed size 4 bytes 1.58 + 1.59 + This descriptor exists only if bit 3 of the general 1.60 + purpose bit flag is set (see below). It is byte aligned 1.61 + and immediately follows the last byte of compressed data. 1.62 + This descriptor is used only when it was not possible to 1.63 + seek in the output zip file, e.g., when the output zip file 1.64 + was standard output or a non seekable device. 1.65 + 1.66 + C. Central directory structure: 1.67 + 1.68 + [file header] . . . 
end of central dir record 1.69 + 1.70 + File header: 1.71 + 1.72 + central file header signature 4 bytes (0x02014b50) 1.73 + version made by 2 bytes 1.74 + version needed to extract 2 bytes 1.75 + general purpose bit flag 2 bytes 1.76 + compression method 2 bytes 1.77 + last mod file time 2 bytes 1.78 + last mod file date 2 bytes 1.79 + crc-32 4 bytes 1.80 + compressed size 4 bytes 1.81 + uncompressed size 4 bytes 1.82 + filename length 2 bytes 1.83 + extra field length 2 bytes 1.84 + file comment length 2 bytes 1.85 + disk number start 2 bytes 1.86 + internal file attributes 2 bytes 1.87 + external file attributes 4 bytes 1.88 + relative offset of local header 4 bytes 1.89 + 1.90 + filename (variable size) 1.91 + extra field (variable size) 1.92 + file comment (variable size) 1.93 + 1.94 + End of central dir record: 1.95 + 1.96 + end of central dir signature 4 bytes (0x06054b50) 1.97 + number of this disk 2 bytes 1.98 + number of the disk with the 1.99 + start of the central directory 2 bytes 1.100 + total number of entries in 1.101 + the central dir on this disk 2 bytes 1.102 + total number of entries in 1.103 + the central dir 2 bytes 1.104 + size of the central directory 4 bytes 1.105 + offset of start of central 1.106 + directory with respect to 1.107 + the starting disk number 4 bytes 1.108 + zipfile comment length 2 bytes 1.109 + zipfile comment (variable size) 1.110 + 1.111 + D. Explanation of fields: 1.112 + 1.113 + version made by (2 bytes) 1.114 + 1.115 + The upper byte indicates the compatibility of the file 1.116 + attribute information. If the external file attributes 1.117 + are compatible with MS-DOS and can be read by PKZIP for 1.118 + DOS version 2.04g then this value will be zero. If these 1.119 + attributes are not compatible, then this value will 1.120 + identify the host system on which the attributes are 1.121 + compatible. Software can use this information to determine 1.122 + the line record format for text files etc. 
The current 1.123 + mappings are: 1.124 + 1.125 + 0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems) 1.126 + 1 - Amiga 2 - VAX/VMS 1.127 + 3 - Unix 4 - VM/CMS 1.128 + 5 - Atari ST 6 - OS/2 H.P.F.S. 1.129 + 7 - Macintosh 8 - Z-System 1.130 + 9 - CP/M 10 - Windows NTFS 1.131 + 11 thru 255 - unused 1.132 + 1.133 + The lower byte indicates the version number of the 1.134 + software used to encode the file. The value/10 1.135 + indicates the major version number, and the value 1.136 + mod 10 is the minor version number. 1.137 + 1.138 + version needed to extract (2 bytes) 1.139 + 1.140 + The minimum software version needed to extract the 1.141 + file, mapped as above. 1.142 + 1.143 + general purpose bit flag: (2 bytes) 1.144 + 1.145 + Bit 0: If set, indicates that the file is encrypted. 1.146 + 1.147 + (For Method 6 - Imploding) 1.148 + Bit 1: If the compression method used was type 6, 1.149 + Imploding, then this bit, if set, indicates 1.150 + an 8K sliding dictionary was used. If clear, 1.151 + then a 4K sliding dictionary was used. 1.152 + Bit 2: If the compression method used was type 6, 1.153 + Imploding, then this bit, if set, indicates 1.154 + 3 Shannon-Fano trees were used to encode the 1.155 + sliding dictionary output. If clear, then 2 1.156 + Shannon-Fano trees were used. 1.157 + 1.158 + (For Method 8 - Deflating) 1.159 + Bit 2 Bit 1 1.160 + 0 0 Normal (-en) compression option was used. 1.161 + 0 1 Maximum (-ex) compression option was used. 1.162 + 1 0 Fast (-ef) compression option was used. 1.163 + 1 1 Super Fast (-es) compression option was used. 1.164 + 1.165 + Note: Bits 1 and 2 are undefined if the compression 1.166 + method is any other. 1.167 + 1.168 + Bit 3: If this bit is set, the fields crc-32, compressed 1.169 + size and uncompressed size are set to zero in the 1.170 + local header. The correct values are put in the 1.171 + data descriptor immediately following the compressed 1.172 + data. 
(Note: PKZIP version 2.04g for DOS only 1.173 + recognizes this bit for method 8 compression; newer 1.174 + versions of PKZIP recognize this bit for any 1.175 + compression method.) 1.176 + 1.177 + Bit 4: Reserved for use with method 8, for enhanced 1.178 + deflating. 1.179 + 1.180 + Bit 5: If this bit is set, this indicates that the file is 1.181 + compressed patched data. (Note: Requires PKZIP 1.182 + version 2.70 or greater) 1.183 + 1.184 + Bit 6: Currently unused. 1.185 + 1.186 + Bit 7: Currently unused. 1.187 + 1.188 + Bit 8: Currently unused. 1.189 + 1.190 + Bit 9: Currently unused. 1.191 + 1.192 + Bit 10: Currently unused. 1.193 + 1.194 + Bit 11: Currently unused. 1.195 + 1.196 + Bit 12: Reserved by PKWARE for enhanced compression. 1.197 + 1.198 + Bit 13: Reserved by PKWARE. 1.199 + 1.200 + Bit 14: Reserved by PKWARE. 1.201 + 1.202 + Bit 15: Reserved by PKWARE. 1.203 + 1.204 + compression method: (2 bytes) 1.205 + 1.206 + (see accompanying documentation for algorithm 1.207 + descriptions) 1.208 + 1.209 + 0 - The file is stored (no compression) 1.210 + 1 - The file is Shrunk 1.211 + 2 - The file is Reduced with compression factor 1 1.212 + 3 - The file is Reduced with compression factor 2 1.213 + 4 - The file is Reduced with compression factor 3 1.214 + 5 - The file is Reduced with compression factor 4 1.215 + 6 - The file is Imploded 1.216 + 7 - Reserved for Tokenizing compression algorithm 1.217 + 8 - The file is Deflated 1.218 + 9 - Reserved for enhanced Deflating 1.219 + 10 - PKWARE Data Compression Library Imploding 1.220 + 1.221 + date and time fields: (2 bytes each) 1.222 + 1.223 + The date and time are encoded in standard MS-DOS format. 1.224 + If input came from standard input, the date and time are 1.225 + those at which compression was started for this data.
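The note above refers to "standard MS-DOS format" without spelling it out. The following is a minimal sketch, not part of the original note, that assumes the usual MS-DOS packing: the date word holds day in bits 0-4, month in bits 5-8 and (year - 1980) in bits 9-15; the time word holds seconds/2 in bits 0-4, minutes in bits 5-10 and hours in bits 11-15. The function name dos_datetime is hypothetical.

```python
from datetime import datetime


def dos_datetime(dos_date: int, dos_time: int) -> datetime:
    """Decode the two 16-bit 'last mod file date' / 'last mod file time' words.

    Assumed layout (conventional MS-DOS packing, not stated in the note):
      date: bits 0-4 day, bits 5-8 month, bits 9-15 years since 1980
      time: bits 0-4 seconds / 2, bits 5-10 minutes, bits 11-15 hours
    """
    day = dos_date & 0x1F
    month = (dos_date >> 5) & 0x0F
    year = ((dos_date >> 9) & 0x7F) + 1980

    second = (dos_time & 0x1F) * 2  # stored with 2-second resolution
    minute = (dos_time >> 5) & 0x3F
    hour = (dos_time >> 11) & 0x1F

    # datetime() raises ValueError for out-of-range values such as
    # month 0 or day 0, which some writers emit for "no timestamp".
    return datetime(year, month, day, hour, minute, second)


if __name__ == "__main__":
    # 0x2661 / 0x63C5 are hand-packed example words, not taken from a real file.
    print(dos_datetime(0x2661, 0x63C5))
```

Because seconds are stored divided by two, odd second values cannot be represented; a reader should not expect timestamps to round-trip exactly.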
1.226 + 1.227 + CRC-32: (4 bytes) 1.228 + 1.229 + The CRC-32 algorithm was generously contributed by 1.230 + David Schwaderer and can be found in his excellent 1.231 + book "C Programmers Guide to NetBIOS" published by 1.232 + Howard W. Sams & Co. Inc. The 'magic number' for 1.233 + the CRC is 0xdebb20e3. The proper CRC pre and post 1.234 + conditioning is used, meaning that the CRC register 1.235 + is pre-conditioned with all ones (a starting value 1.236 + of 0xffffffff) and the value is post-conditioned by 1.237 + taking the one's complement of the CRC residual. 1.238 + If bit 3 of the general purpose flag is set, this 1.239 + field is set to zero in the local header and the correct 1.240 + value is put in the data descriptor and in the central 1.241 + directory. 1.242 + 1.243 + compressed size: (4 bytes) 1.244 + uncompressed size: (4 bytes) 1.245 + 1.246 + The size of the file compressed and uncompressed, 1.247 + respectively. If bit 3 of the general purpose bit flag 1.248 + is set, these fields are set to zero in the local header 1.249 + and the correct values are put in the data descriptor and 1.250 + in the central directory. 1.251 + 1.252 + filename length: (2 bytes) 1.253 + extra field length: (2 bytes) 1.254 + file comment length: (2 bytes) 1.255 + 1.256 + The length of the filename, extra field, and comment 1.257 + fields respectively. The combined length of any 1.258 + directory record and these three fields should not 1.259 + generally exceed 65,535 bytes. If input came from standard 1.260 + input, the filename length is set to zero. 1.261 + 1.262 + disk number start: (2 bytes) 1.263 + 1.264 + The number of the disk on which this file begins. 1.265 + 1.266 + internal file attributes: (2 bytes) 1.267 + 1.268 + The lowest bit of this field indicates, if set, that 1.269 + the file is apparently an ASCII or text file. If not 1.270 + set, that the file apparently contains binary data. 1.271 + The remaining bits are unused in version 1.0. 
1.272 + 1.273 + Bits 1 and 2 are reserved for use by PKWARE. 1.274 + 1.275 + external file attributes: (4 bytes) 1.276 + 1.277 + The mapping of the external attributes is 1.278 + host-system dependent (see 'version made by'). For 1.279 + MS-DOS, the low order byte is the MS-DOS directory 1.280 + attribute byte. If input came from standard input, this 1.281 + field is set to zero. 1.282 + 1.283 + relative offset of local header: (4 bytes) 1.284 + 1.285 + This is the offset from the start of the first disk on 1.286 + which this file appears, to where the local header should 1.287 + be found. 1.288 + 1.289 + filename: (Variable) 1.290 + 1.291 + The name of the file, with optional relative path. 1.292 + The path stored should not contain a drive or 1.293 + device letter, or a leading slash. All slashes 1.294 + should be forward slashes '/' as opposed to 1.295 + backwards slashes '\' for compatibility with Amiga 1.296 + and Unix file systems etc. If input came from standard 1.297 + input, there is no filename field. 1.298 + 1.299 + extra field: (Variable) 1.300 + 1.301 + This is for future expansion. If additional information 1.302 + needs to be stored in the future, it should be stored 1.303 + here. Earlier versions of the software can then safely 1.304 + skip this file, and find the next file or header. This 1.305 + field will be 0 length in version 1.0. 1.306 + 1.307 + In order to allow different programs and different types 1.308 + of information to be stored in the 'extra' field in .ZIP 1.309 + files, the following structure should be used for all 1.310 + programs storing data in this field: 1.311 + 1.312 + header1+data1 + header2+data2 . . . 1.313 + 1.314 + Each header should consist of: 1.315 + 1.316 + Header ID - 2 bytes 1.317 + Data Size - 2 bytes 1.318 + 1.319 + Note: all fields stored in Intel low-byte/high-byte order. 1.320 + 1.321 + The Header ID field indicates the type of data that is in 1.322 + the following data block. 
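The header1+data1 + header2+data2 layout described above can be walked with a short loop. The following is a minimal sketch, not part of the original note; it assumes the Data Size value counts only the bytes of the data block that follows the 4-byte header, and the function name iter_extra_fields is hypothetical.

```python
import struct


def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) pairs from a raw 'extra field' byte string.

    Each block is assumed to be: Header ID (2 bytes), Data Size (2 bytes),
    then Data Size bytes of data, all in Intel low-byte/high-byte order.
    """
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated extra field header")
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("extra field data runs past end of field")
        yield header_id, extra[pos:pos + size]
        pos += size  # skip to the next header, whatever the block type


if __name__ == "__main__":
    # Two made-up blocks: ID 0x000d with 4 data bytes, ID 0x4b46 with 3.
    sample = b"\x0d\x00\x04\x00abcd" + b"\x46\x4b\x03\x00MD5"
    for header_id, data in iter_extra_fields(sample):
        print(hex(header_id), data)
```

A reader that does not recognize a Header ID can simply ignore the yielded data and continue with the next block.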
1.323 + 1.324 + Header ID's of 0 thru 31 are reserved for use by PKWARE. 1.325 + The remaining ID's can be used by third party vendors for 1.326 + proprietary usage. 1.327 + 1.328 + The current Header ID mappings defined by PKWARE are: 1.329 + 1.330 + 0x0007 AV Info 1.331 + 0x0009 OS/2 1.332 + 0x000a NTFS 1.333 + 0x000c VAX/VMS 1.334 + 0x000d Unix 1.335 + 0x000f Patch Descriptor 1.336 + 1.337 + Several third party mappings commonly used are: 1.338 + 1.339 + 0x4b46 FWKCS MD5 (see below) 1.340 + 0x07c8 Macintosh 1.341 + 0x4341 Acorn/SparkFS 1.342 + 0x4453 Windows NT security descriptor (binary ACL) 1.343 + 0x4704 VM/CMS 1.344 + 0x470f MVS 1.345 + 0x4c41 OS/2 access control list (text ACL) 1.346 + 0x4d49 Info-ZIP VMS (VAX or Alpha) 1.347 + 0x5455 extended timestamp 1.348 + 0x5855 Info-ZIP Unix (original, also OS/2, NT, etc) 1.349 + 0x6542 BeOS/BeBox 1.350 + 0x756e ASi Unix 1.351 + 0x7855 Info-ZIP Unix (new) 1.352 + 0xfd4a SMS/QDOS 1.353 + 1.354 + The Data Size field indicates the size of the following 1.355 + data block. Programs can use this value to skip to the 1.356 + next header block, passing over any data blocks that are 1.357 + not of interest. 1.358 + 1.359 + Note: As stated above, the size of the entire .ZIP file 1.360 + header, including the filename, comment, and extra 1.361 + field should not exceed 64K in size. 1.362 + 1.363 + In case two different programs should appropriate the same 1.364 + Header ID value, it is strongly recommended that each 1.365 + program place a unique signature of at least two bytes in 1.366 + size (and preferably 4 bytes or bigger) at the start of 1.367 + each data area. Every program should verify that its 1.368 + unique signature is present, in addition to the Header ID 1.369 + value being correct, before assuming that it is a block of 1.370 + known type. 1.371 + 1.372 + -OS/2 Extra Field: 1.373 + 1.374 + The following is the layout of the OS/2 attributes "extra" 1.375 + block. 
(Last Revision 09/05/95) 1.376 + 1.377 + Note: all fields stored in Intel low-byte/high-byte order. 1.378 + 1.379 + Value Size Description 1.380 + ----- ---- ----------- 1.381 + (OS/2) 0x0009 2 bytes Tag for this "extra" block type 1.382 + TSize 2 bytes Size for the following data block 1.383 + BSize 4 bytes Uncompressed Block Size 1.384 + CType 2 bytes Compression type 1.385 + EACRC 4 bytes CRC value for uncompressed block 1.386 + (var) variable Compressed block 1.387 + 1.388 + The OS/2 extended attribute structure (FEA2LIST) is 1.389 + compressed and then stored in its entirety within this 1.390 + structure. There will only ever be one "block" of data in 1.391 + VarFields[]. 1.392 + 1.393 + -UNIX Extra Field: 1.394 + 1.395 + The following is the layout of the Unix "extra" block. 1.396 + Note: all fields are stored in Intel low-byte/high-byte 1.397 + order. 1.398 + 1.399 + Value Size Description 1.400 + ----- ---- ----------- 1.401 + (UNIX) 0x000d 2 bytes Tag for this "extra" block type 1.402 + TSize 2 bytes Size for the following data block 1.403 + Atime 4 bytes File last access time 1.404 + Mtime 4 bytes File last modification time 1.405 + Uid 2 bytes File user ID 1.406 + Gid 2 bytes File group ID 1.407 + (var) variable Variable length data field 1.408 + 1.409 + The variable length data field will contain file type 1.410 + specific data. Currently the only values allowed are 1.411 + the original "linked to" file names for hard or symbolic 1.412 + links. 1.413 + 1.414 + -VAX/VMS Extra Field: 1.415 + 1.416 + The following is the layout of the VAX/VMS attributes 1.417 + "extra" block. 1.418 + 1.419 + Note: all fields stored in Intel low-byte/high-byte order.
1.420 + 1.421 + Value Size Description 1.422 + ----- ---- ----------- 1.423 + (VMS) 0x000c 2 bytes Tag for this "extra" block type 1.424 + TSize 2 bytes Size of the total "extra" block 1.425 + CRC 4 bytes 32-bit CRC for remainder of the block 1.426 + Tag1 2 bytes VMS attribute tag value #1 1.427 + Size1 2 bytes Size of attribute #1, in bytes 1.428 + (var.) Size1 Attribute #1 data 1.429 + . 1.430 + . 1.431 + . 1.432 + TagN 2 bytes VMS attribute tag value #N 1.433 + SizeN 2 bytes Size of attribute #N, in bytes 1.434 + (var.) SizeN Attribute #N data 1.435 + 1.436 + Rules: 1.437 + 1.438 + 1. There will be one or more attributes present, which 1.439 + will each be preceded by the above TagX & SizeX values. 1.440 + These values are identical to the ATR$C_XXXX and 1.441 + ATR$S_XXXX constants which are defined in ATR.H under 1.442 + VMS C. Neither of these values will ever be zero. 1.443 + 1.444 + 2. No word alignment or padding is performed. 1.445 + 1.446 + 3. A well-behaved PKZIP/VMS program should never produce 1.447 + more than one sub-block with the same TagX value. Also, 1.448 + there will never be more than one "extra" block of type 1.449 + 0x000c in a particular directory record. 1.450 + 1.451 + -NTFS Extra Field: 1.452 + 1.453 + The following is the layout of the NTFS attributes 1.454 + "extra" block. 1.455 + 1.456 + Note: all fields stored in Intel low-byte/high-byte order. 1.457 + 1.458 + Value Size Description 1.459 + ----- ---- ----------- 1.460 + (NTFS) 0x000a 2 bytes Tag for this "extra" block type 1.461 + TSize 2 bytes Size of the total "extra" block 1.462 + Reserved 4 bytes Reserved for future use 1.463 + Tag1 2 bytes NTFS attribute tag value #1 1.464 + Size1 2 bytes Size of attribute #1, in bytes 1.465 + (var.) Size1 Attribute #1 data 1.466 + . 1.467 + . 1.468 + . 1.469 + TagN 2 bytes NTFS attribute tag value #N 1.470 + SizeN 2 bytes Size of attribute #N, in bytes 1.471 + (var.)
SizeN Attribute #N data 1.472 + 1.473 + For NTFS, values for Tag1 through TagN are as follows: 1.474 + (currently only one set of attributes is defined for NTFS) 1.475 + 1.476 + Tag Size Description 1.477 + ----- ---- ----------- 1.478 + 0x0001 2 bytes Tag for attribute #1 1.479 + Size1 2 bytes Size of attribute #1, in bytes 1.480 + Mtime 8 bytes File last modification time 1.481 + Atime 8 bytes File last access time 1.482 + Ctime 8 bytes File creation time 1.483 + 1.484 + -PATCH Descriptor Extra Field: 1.485 + 1.486 + The following is the layout of the Patch Descriptor "extra" 1.487 + block. 1.488 + 1.489 + Note: all fields stored in Intel low-byte/high-byte order. 1.490 + 1.491 + Value Size Description 1.492 + ----- ---- ----------- 1.493 + (Patch) 0x000f 2 bytes Tag for this "extra" block type 1.494 + TSize 2 bytes Size of the total "extra" block 1.495 + Version 2 bytes Version of the descriptor 1.496 + Flags 4 bytes Actions and reactions (see below) 1.497 + OldSize 4 bytes Size of the file about to be patched 1.498 + OldCRC 4 bytes 32-bit CRC of the file to be patched 1.499 + NewSize 4 bytes Size of the resulting file 1.500 + NewCRC 4 bytes 32-bit CRC of the resulting file 1.501 + 1.502 + Actions and reactions 1.503 + 1.504 + Bits Description 1.505 + ---- ---------------- 1.506 + 0 Use for autodetection 1.507 + 1 Treat as selfpatch 1.508 + 2-3 RESERVED 1.509 + 4-5 Action (see below) 1.510 + 6-7 RESERVED 1.511 + 8-9 Reaction (see below) to absent file 1.512 + 10-11 Reaction (see below) to newer file 1.513 + 12-13 Reaction (see below) to unknown file 1.514 + 14-15 RESERVED 1.515 + 16-31 RESERVED 1.516 + 1.517 + Actions 1.518 + 1.519 + Action Value 1.520 + ------ ----- 1.521 + none 0 1.522 + add 1 1.523 + delete 2 1.524 + patch 3 1.525 + 1.526 + Reactions 1.527 + 1.528 + Reaction Value 1.529 + -------- ----- 1.530 + ask 0 1.531 + skip 1 1.532 + ignore 2 1.533 + fail 3 1.534 + 1.535 + - FWKCS MD5 Extra Field: 1.536 + 1.537 + The FWKCS Contents_Signature System, 
used in 1.538 + automatically identifying files independent of filename, 1.539 + optionally adds and uses an extra field to support the 1.540 + rapid creation of an enhanced contents_signature: 1.541 + 1.542 + Header ID = 0x4b46 1.543 + Data Size = 0x0013 1.544 + Preface = 'M','D','5' 1.545 + followed by 16 bytes containing the uncompressed file's 1.546 + 128_bit MD5 hash(1), low byte first. 1.547 + 1.548 + When FWKCS revises a zipfile central directory to add 1.549 + this extra field for a file, it also replaces the 1.550 + central directory entry for that file's uncompressed 1.551 + filelength with a measured value. 1.552 + 1.553 + FWKCS provides an option to strip this extra field, if 1.554 + present, from a zipfile central directory. In adding 1.555 + this extra field, FWKCS preserves Zipfile Authenticity 1.556 + Verification; if stripping this extra field, FWKCS 1.557 + preserves all versions of AV through PKZIP version 2.04g. 1.558 + 1.559 + FWKCS, and FWKCS Contents_Signature System, are 1.560 + trademarks of Frederick W. Kantor. 1.561 + 1.562 + (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer 1.563 + Science and RSA Data Security, Inc., April 1992. 1.564 + ll.76-77: "The MD5 algorithm is being placed in the 1.565 + public domain for review and possible adoption as a 1.566 + standard." 1.567 + 1.568 + file comment: (Variable) 1.569 + 1.570 + The comment for this file. 1.571 + 1.572 + number of this disk: (2 bytes) 1.573 + 1.574 + The number of this disk, which contains central 1.575 + directory end record. 1.576 + 1.577 + number of the disk with the start of the central 1.578 + directory: (2 bytes) 1.579 + 1.580 + The number of the disk on which the central 1.581 + directory starts. 1.582 + 1.583 + total number of entries in the central dir on 1.584 + this disk: (2 bytes) 1.585 + 1.586 + The number of central directory entries on this disk. 
1.587 + 1.588 + total number of entries in the central dir: (2 bytes) 1.589 + 1.590 + The total number of files in the zipfile. 1.591 + 1.592 + size of the central directory: (4 bytes) 1.593 + 1.594 + The size (in bytes) of the entire central directory. 1.595 + 1.596 + offset of start of central directory with respect to 1.597 + the starting disk number: (4 bytes) 1.598 + 1.599 + Offset of the start of the central directory on the 1.600 + disk on which the central directory starts. 1.601 + 1.602 + zipfile comment length: (2 bytes) 1.603 + 1.604 + The length of the comment for this zipfile. 1.605 + 1.606 + zipfile comment: (Variable) 1.607 + 1.608 + The comment for this zipfile. 1.609 + 1.610 + E. General notes: 1.611 + 1.612 + 1) All fields unless otherwise noted are unsigned and stored 1.613 + in Intel low-byte:high-byte, low-word:high-word order. 1.614 + 1.615 + 2) String fields are not null terminated, since the 1.616 + length is given explicitly. 1.617 + 1.618 + 3) Local headers should not span disk boundaries. Also, even 1.619 + though the central directory can span disk boundaries, no 1.620 + single record in the central directory should be split 1.621 + across disks. 1.622 + 1.623 + 4) The entries in the central directory may not necessarily 1.624 + be in the same order that files appear in the zipfile. 1.625 + 1.626 +UnShrinking - Method 1 1.627 +---------------------- 1.628 + 1.629 +Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm 1.630 +with partial clearing. The initial code size is 9 bits, and 1.631 +the maximum code size is 13 bits. Shrinking differs from 1.632 +conventional Dynamic Ziv-Lempel-Welch implementations in several 1.633 +respects: 1.634 + 1.635 +1) The code size is controlled by the compressor, and is not 1.636 + automatically increased when codes larger than the current 1.637 + code size are created (but not necessarily used).
When 1.638 + the decompressor encounters the code sequence 256 1.639 + (decimal) followed by 1, it should increase the code size 1.640 + read from the input stream to the next bit size. No 1.641 + blocking of the codes is performed, so the next code at 1.642 + the increased size should be read from the input stream 1.643 + immediately after where the previous code at the smaller 1.644 + bit size was read. Again, the decompressor should not 1.645 + increase the code size used until the sequence 256,1 is 1.646 + encountered. 1.647 + 1.648 +2) When the table becomes full, total clearing is not 1.649 + performed. Rather, when the compressor emits the code 1.650 + sequence 256,2 (decimal), the decompressor should clear 1.651 + all leaf nodes from the Ziv-Lempel tree, and continue to 1.652 + use the current code size. The nodes that are cleared 1.653 + from the Ziv-Lempel tree are then re-used, with the lowest 1.654 + code value re-used first, and the highest code value 1.655 + re-used last. The compressor can emit the sequence 256,2 1.656 + at any time. 1.657 + 1.658 +Expanding - Methods 2-5 1.659 +----------------------- 1.660 + 1.661 +The Reducing algorithm is actually a combination of two 1.662 +distinct algorithms. The first algorithm compresses repeated 1.663 +byte sequences, and the second algorithm takes the compressed 1.664 +stream from the first algorithm and applies a probabilistic 1.665 +compression method. 1.666 + 1.667 +The probabilistic compression stores an array of 'follower 1.668 +sets' S(j), for j=0 to 255, corresponding to each possible 1.669 +ASCII character. Each set contains between 0 and 32 1.670 +characters, to be denoted as S(j)[0],...,S(j)[m], where m<32. 1.671 +The sets are stored at the beginning of the data area for a 1.672 +Reduced file, in reverse order, with S(255) first, and S(0) 1.673 +last. 1.674 + 1.675 +The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] }, 1.676 +where N(j) is the size of set S(j). 
N(j) can be 0, in which 1.677 +case the follower set for S(j) is empty. Each N(j) value is 1.678 +encoded in 6 bits, followed by N(j) eight bit character values 1.679 +corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If 1.680 +N(j) is 0, then no values for S(j) are stored, and the value 1.681 +for N(j-1) immediately follows. 1.682 + 1.683 +Immediately after the follower sets, is the compressed data 1.684 +stream. The compressed data stream can be interpreted for the 1.685 +probabilistic decompression as follows: 1.686 + 1.687 +let Last-Character <- 0. 1.688 +loop until done 1.689 + if the follower set S(Last-Character) is empty then 1.690 + read 8 bits from the input stream, and copy this 1.691 + value to the output stream. 1.692 + otherwise if the follower set S(Last-Character) is non-empty then 1.693 + read 1 bit from the input stream. 1.694 + if this bit is not zero then 1.695 + read 8 bits from the input stream, and copy this 1.696 + value to the output stream. 1.697 + otherwise if this bit is zero then 1.698 + read B(N(Last-Character)) bits from the input 1.699 + stream, and assign this value to I. 1.700 + Copy the value of S(Last-Character)[I] to the 1.701 + output stream. 1.702 + 1.703 + assign the last value placed on the output stream to 1.704 + Last-Character. 1.705 +end loop 1.706 + 1.707 +B(N(j)) is defined as the minimal number of bits required to 1.708 +encode the value N(j)-1. 1.709 + 1.710 +The decompressed stream from above can then be expanded to 1.711 +re-create the original file as follows: 1.712 + 1.713 +let State <- 0. 1.714 + 1.715 +loop until done 1.716 + read 8 bits from the input stream into C. 1.717 + case State of 1.718 + 0: if C is not equal to DLE (144 decimal) then 1.719 + copy C to the output stream. 1.720 + otherwise if C is equal to DLE then 1.721 + let State <- 1. 1.722 + 1.723 + 1: if C is non-zero then 1.724 + let V <- C. 1.725 + let Len <- L(V) 1.726 + let State <- F(Len). 
1.727 + otherwise if C is zero then 1.728 + copy the value 144 (decimal) to the output stream. 1.729 + let State <- 0 1.730 + 1.731 + 2: let Len <- Len + C 1.732 + let State <- 3. 1.733 + 1.734 + 3: move backwards D(V,C) bytes in the output stream 1.735 + (if this position is before the start of the output 1.736 + stream, then assume that all the data before the 1.737 + start of the output stream is filled with zeros). 1.738 + copy Len+3 bytes from this position to the output stream. 1.739 + let State <- 0. 1.740 + end case 1.741 +end loop 1.742 + 1.743 +The functions F,L, and D are dependent on the 'compression 1.744 +factor', 1 through 4, and are defined as follows: 1.745 + 1.746 +For compression factor 1: 1.747 + L(X) equals the lower 7 bits of X. 1.748 + F(X) equals 2 if X equals 127 otherwise F(X) equals 3. 1.749 + D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1. 1.750 +For compression factor 2: 1.751 + L(X) equals the lower 6 bits of X. 1.752 + F(X) equals 2 if X equals 63 otherwise F(X) equals 3. 1.753 + D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1. 1.754 +For compression factor 3: 1.755 + L(X) equals the lower 5 bits of X. 1.756 + F(X) equals 2 if X equals 31 otherwise F(X) equals 3. 1.757 + D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1. 1.758 +For compression factor 4: 1.759 + L(X) equals the lower 4 bits of X. 1.760 + F(X) equals 2 if X equals 15 otherwise F(X) equals 3. 1.761 + D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1. 1.762 + 1.763 +Imploding - Method 6 1.764 +-------------------- 1.765 + 1.766 +The Imploding algorithm is actually a combination of two distinct 1.767 +algorithms. The first algorithm compresses repeated byte 1.768 +sequences using a sliding dictionary. The second algorithm is 1.769 +used to compress the encoding of the sliding dictionary output, 1.770 +using multiple Shannon-Fano trees. 1.771 + 1.772 +The Imploding algorithm can use a 4K or 8K sliding dictionary 1.773 +size. 
The dictionary size used can be determined by bit 1 in the 1.774 +general purpose flag word; a 0 bit indicates a 4K dictionary 1.775 +while a 1 bit indicates an 8K dictionary. 1.776 + 1.777 +The Shannon-Fano trees are stored at the start of the compressed 1.778 +file. The number of trees stored is defined by bit 2 in the 1.779 +general purpose flag word; a 0 bit indicates two trees stored, a 1.780 +1 bit indicates three trees are stored. If 3 trees are stored, 1.781 +the first Shannon-Fano tree represents the encoding of the 1.782 +Literal characters, the second tree represents the encoding of 1.783 +the Length information, the third represents the encoding of the 1.784 +Distance information. When 2 Shannon-Fano trees are stored, the 1.785 +Length tree is stored first, followed by the Distance tree. 1.786 + 1.787 +The Literal Shannon-Fano tree, if present is used to represent 1.788 +the entire ASCII character set, and contains 256 values. This 1.789 +tree is used to compress any data not compressed by the sliding 1.790 +dictionary algorithm. When this tree is present, the Minimum 1.791 +Match Length for the sliding dictionary is 3. If this tree is 1.792 +not present, the Minimum Match Length is 2. 1.793 + 1.794 +The Length Shannon-Fano tree is used to compress the Length part 1.795 +of the (length,distance) pairs from the sliding dictionary 1.796 +output. The Length tree contains 64 values, ranging from the 1.797 +Minimum Match Length, to 63 plus the Minimum Match Length. 1.798 + 1.799 +The Distance Shannon-Fano tree is used to compress the Distance 1.800 +part of the (length,distance) pairs from the sliding dictionary 1.801 +output. The Distance tree contains 64 values, ranging from 0 to 1.802 +63, representing the upper 6 bits of the distance value. The 1.803 +distance values themselves will be between 0 and the sliding 1.804 +dictionary size, either 4K or 8K. 1.805 + 1.806 +The Shannon-Fano trees themselves are stored in a compressed 1.807 +format. 
The first byte of the tree data represents the number of 1.808 +bytes of data representing the (compressed) Shannon-Fano tree 1.809 +minus 1. The remaining bytes represent the Shannon-Fano tree 1.810 +data encoded as: 1.811 + 1.812 + High 4 bits: Number of values at this bit length + 1. (1 - 16) 1.813 + Low 4 bits: Bit Length needed to represent value + 1. (1 - 16) 1.814 + 1.815 +The Shannon-Fano codes can be constructed from the bit lengths 1.816 +using the following algorithm: 1.817 + 1.818 +1) Sort the Bit Lengths in ascending order, while retaining the 1.819 + order of the original lengths stored in the file. 1.820 + 1.821 +2) Generate the Shannon-Fano trees: 1.822 + 1.823 + Code <- 0 1.824 + CodeIncrement <- 0 1.825 + LastBitLength <- 0 1.826 + i <- number of Shannon-Fano codes - 1 (either 255 or 63) 1.827 + 1.828 + loop while i >= 0 1.829 + Code = Code + CodeIncrement 1.830 + if BitLength(i) <> LastBitLength then 1.831 + LastBitLength=BitLength(i) 1.832 + CodeIncrement = 1 shifted left (16 - LastBitLength) 1.833 + ShannonCode(i) = Code 1.834 + i <- i - 1 1.835 + end loop 1.836 + 1.837 +3) Reverse the order of all the bits in the above ShannonCode() 1.838 + vector, so that the most significant bit becomes the least 1.839 + significant bit. For example, the value 0x1234 (hex) would 1.840 + become 0x2C48 (hex). 1.841 + 1.842 +4) Restore the order of Shannon-Fano codes as originally stored 1.843 + within the file. 1.844 + 1.845 +Example: 1.846 + 1.847 + This example will show the encoding of a Shannon-Fano tree 1.848 + of size 8. Notice that the actual Shannon-Fano trees used 1.849 + for Imploding are either 64 or 256 entries in size. 1.850 + 1.851 +Example: 0x02, 0x42, 0x01, 0x13 1.852 + 1.853 + The first byte indicates 3 values in this table. 
    Decoding the bytes:
            0x42 = 5 codes of 3 bits long
            0x01 = 1 code  of 2 bits long
            0x13 = 2 codes of 4 bits long

    This would generate the original bit length array of:
    (3, 3, 3, 3, 3, 2, 4, 4)

    There are 8 codes in this table for the values 0 thru 7. Using
    the algorithm to obtain the Shannon-Fano codes produces:

                                  Reversed   Order       Original
Val  Sorted   Constructed Code    Value      Restored    Length
---  ------   -----------------   --------   --------    ------
0:     2      1100000000000000    11         101           3
1:     3      1010000000000000    101        001           3
2:     3      1000000000000000    001        110           3
3:     3      0110000000000000    110        010           3
4:     3      0100000000000000    010        100           3
5:     3      0010000000000000    100        11            2
6:     4      0001000000000000    1000       1000          4
7:     4      0000000000000000    0000       0000          4

The values in the Val, Order Restored and Original Length columns
now represent the Shannon-Fano encoding tree that can be used for
decoding the Shannon-Fano encoded data. How to parse the
variable length Shannon-Fano values from the data stream is beyond
the scope of this document. (See the references listed at the end of
this document for more information.) However, traditional decoding
schemes used for Huffman variable length decoding, such as the
Greenlaw algorithm, can be successfully applied.

The compressed data stream begins immediately after the
compressed Shannon-Fano data. The compressed data stream can be
interpreted as follows:

loop until done
    read 1 bit from input stream.

    if this bit is non-zero then       (encoded data is literal data)
        if Literal Shannon-Fano tree is present
            read and decode character using Literal Shannon-Fano tree.
        otherwise
            read 8 bits from input stream.
        copy character to the output stream.
    otherwise              (encoded data is sliding dictionary match)
        if 8K dictionary size
            read 7 bits for offset Distance (lower 7 bits of offset).
        otherwise
            read 6 bits for offset Distance (lower 6 bits of offset).

        using the Distance Shannon-Fano tree, read and decode the
          upper 6 bits of the Distance value.

        using the Length Shannon-Fano tree, read and decode
          the Length value.

        Length <- Length + Minimum Match Length

        if Length = 63 + Minimum Match Length
            read 8 bits from the input stream,
            add this value to Length.

        move backwards Distance+1 bytes in the output stream, and
        copy Length characters from this position to the output
        stream. (If this position is before the start of the output
        stream, then assume that all the data before the start of
        the output stream is filled with zeros.)
end loop

Tokenizing - Method 7
---------------------

This method is not used by PKZIP.

Deflating - Method 8
--------------------

The Deflate algorithm is similar to the Implode algorithm using
a sliding dictionary of up to 32K with secondary compression
from Huffman/Shannon-Fano codes.

The compressed data is stored in blocks with a header describing
the block and the Huffman codes used in the data block. The header
format is as follows:

   Bit 0: Last Block bit     This bit is set to 1 if this is the last
                             compressed block in the data.
   Bits 1-2: Block type
      00 (0) - Block is stored - All stored data is byte aligned.
               Skip bits until next byte, then next word = block
               length, followed by the one's complement of the block
               length word. Remaining data in block is the stored
               data.

      01 (1) - Use fixed Huffman codes for literal and distance codes.
               Lit Code    Bits             Dist Code   Bits
               ---------   ----             ---------   ----
                 0 - 143    8                 0 - 31      5
               144 - 255    9
               256 - 279    7
               280 - 287    8

               Literal codes 286-287 and distance codes 30-31 are
               never used but participate in the Huffman construction.

      10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)

      11 (3) - Reserved - Flag an "Error in compressed data" if seen.

Expanding Huffman Codes
-----------------------
If the data block is stored with dynamic Huffman codes, the Huffman
codes are sent in the following compressed format:

   5 Bits: # of Literal codes sent - 257 (257 - 286)
           All other codes are never sent.
   5 Bits: # of Dist codes - 1           (1 - 32)
   4 Bits: # of Bit Length codes - 4     (4 - 19)

The Huffman codes are sent as bit lengths and the codes are built as
described in the Implode algorithm. The bit lengths themselves are
compressed with Huffman codes. There are 19 bit length codes:

   0 - 15: Represent bit lengths of 0 - 15
       16: Copy the previous bit length 3 - 6 times.
           The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
              Example:  Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
                        expand to 12 bit lengths of 8 (1 + 6 + 5)
       17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
       18: Repeat a bit length of 0 for 11 - 138 times. (7 bits of length)

The lengths of the bit length codes are sent packed 3 bits per value
(0 - 7) in the following order:

   16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

The Huffman codes should be built as described in the Implode algorithm
except codes are assigned starting at the shortest bit length, i.e.
the shortest code should be all 0's rather than all 1's. Also, codes with
a bit length of zero do not participate in the tree construction. The
codes are then used to decode the bit lengths for the literal and
distance tables.

The bit lengths for the literal tables are sent first with the number
of entries sent described by the 5 bits sent earlier. There are up
to 286 literal characters; the first 256 represent the respective 8
bit character, code 256 represents the End-Of-Block code, the remaining
29 codes represent copy lengths of 3 thru 258. There are up to 30
distance codes representing distances from 1 thru 32k as described
below.

                             Length Codes
                             ------------
      Extra              Extra               Extra               Extra
 Code Bits Length   Code Bits Lengths   Code Bits Lengths   Code Bits Length(s)
 ---- ---- ------   ---- ---- -------   ---- ---- -------   ---- ---- ---------
  257   0     3      265   1   11,12     273   3   35-42     281   5  131-162
  258   0     4      266   1   13,14     274   3   43-50     282   5  163-194
  259   0     5      267   1   15,16     275   3   51-58     283   5  195-226
  260   0     6      268   1   17,18     276   3   59-66     284   5  227-257
  261   0     7      269   2   19-22     277   4   67-82     285   0    258
  262   0     8      270   2   23-26     278   4   83-98
  263   0     9      271   2   27-30     279   4   99-114
  264   0    10      272   2   31-34     280   4  115-130

                            Distance Codes
                            --------------
      Extra             Extra               Extra                 Extra
 Code Bits Dist    Code Bits  Dist     Code Bits Distance    Code Bits Distance
 ---- ---- ----    ---- ---- -------   ---- ---- ---------   ---- ---- -----------
   0   0    1        8   3   17-24      16    7  257-384      24   11  4097-6144
   1   0    2        9   3   25-32      17    7  385-512      25   11  6145-8192
   2   0    3       10   4   33-48      18    8  513-768      26   12  8193-12288
   3   0    4       11   4   49-64      19    8  769-1024     27   12  12289-16384
   4   1   5,6      12   5   65-96      20    9  1025-1536    28   13  16385-24576
   5   1   7,8      13   5   97-128     21    9  1537-2048    29   13  24577-32768
   6   2   9-12     14   6   129-192    22   10  2049-3072
   7   2  13-16     15   6   193-256    23   10  3073-4096

The compressed data stream begins immediately after the
compressed header data. The compressed data stream can be
interpreted as follows:

do
   read header from input stream.

   if stored block
      skip bits until byte aligned
      read count and 1's complement of count
      copy count bytes of the data block
   otherwise
      loop until end of block code sent
         decode literal character from input stream
         if literal < 256
            copy character to the output stream
         otherwise
            if literal = end of block
               break from loop
            otherwise
               determine length from the literal code and its extra bits
               decode distance from input stream

               move backwards distance bytes in the output stream, and
               copy length characters from this position to the output
               stream.
      end loop
while not last block

if data descriptor exists
   skip bits until byte aligned
   read crc and sizes
endif

Decryption
----------

The encryption used in PKZIP was generously supplied by Roger
Schlafly. PKWARE is grateful to Mr. Schlafly for his expert
help and advice in the field of data encryption.

PKZIP encrypts the compressed data stream. Encrypted files must
be decrypted before they can be extracted.

Each encrypted file has an extra 12 bytes stored at the start of
the data area defining the encryption header for that file. The
encryption header is originally set to random values, and then
itself encrypted, using three 32-bit keys. The key values are
initialized using the supplied encryption password.
After each byte
is encrypted, the keys are then updated using pseudo-random number
generation techniques in combination with the same CRC-32 algorithm
used in PKZIP and described elsewhere in this document.

The following are the basic steps required to decrypt a file:

1) Initialize the three 32-bit keys with the password.
2) Read and decrypt the 12-byte encryption header, further
   initializing the encryption keys.
3) Read and decrypt the compressed data stream using the
   encryption keys.

Step 1 - Initializing the encryption keys
-----------------------------------------

Key(0) <- 305419896     (0x12345678)
Key(1) <- 591751049     (0x23456789)
Key(2) <- 878082192     (0x34567890)

loop for i <- 0 to length(password)-1
    update_keys(password(i))
end loop

Where update_keys() is defined as:

update_keys(char):
  Key(0) <- crc32(Key(0),char)
  Key(1) <- Key(1) + (Key(0) & 000000ffH)
  Key(1) <- Key(1) * 134775813 + 1
  Key(2) <- crc32(Key(2),Key(1) >> 24)
end update_keys

Where crc32(old_crc,char) is a routine that given a CRC value and a
character, returns an updated CRC value after applying the CRC-32
algorithm described elsewhere in this document. All key arithmetic
is performed on unsigned 32-bit values, with overflow discarded.

Step 2 - Decrypting the encryption header
-----------------------------------------

The purpose of this step is to further initialize the encryption
keys, based on random data, to render a plaintext attack on the
data ineffective.

Read the 12-byte encryption header into Buffer, in locations
Buffer(0) thru Buffer(11).

loop for i <- 0 to 11
    C <- Buffer(i) ^ decrypt_byte()
    update_keys(C)
    Buffer(i) <- C
end loop

Where decrypt_byte() is defined as:

unsigned char decrypt_byte()
    local unsigned short temp
    temp <- Key(2) | 2
    decrypt_byte <- (temp * (temp ^ 1)) >> 8
end decrypt_byte

After the header is decrypted, the last 1 or 2 bytes in Buffer
should be the high-order word/byte of the CRC for the file being
decrypted, stored in Intel low-byte/high-byte order. Versions of
PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is
used on versions after 2.0. This can be used to test if the password
supplied is correct or not.

Step 3 - Decrypting the compressed data stream
----------------------------------------------

The compressed data stream can be decrypted as follows:

loop until done
    read a character into C
    Temp <- C ^ decrypt_byte()
    update_keys(Temp)
    output Temp
end loop

In addition to the above mentioned contributors to PKZIP and PKUNZIP,
I would like to extend special thanks to Robert Mahoney for suggesting
the extension .ZIP for this software.

References:

    Fiala, Edward R., and Greene, Daniel H., "Data compression with
       finite windows", Communications of the ACM, Volume 32, Number 4,
       April 1989, pages 490-505.

    Held, Gilbert, "Data Compression, Techniques and Applications,
       Hardware and Software Considerations", John Wiley & Sons, 1987.

    Huffman, D.A., "A method for the construction of minimum-redundancy
       codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
       pages 1098-1101.

    Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
       Number 10, October 1989, pages 29-37.

    Nelson, Mark, "The Data Compression Book", M&T Books, 1991.

    Storer, James A., "Data Compression, Methods and Theory",
       Computer Science Press, 1988.

    Welch, Terry, "A Technique for High-Performance Data Compression",
       IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.

    Ziv, J. and Lempel, A., "A universal algorithm for sequential data
       compression", IEEE Transactions on Information Theory,
       Volume 23, Number 3, May 1977, pages 337-343.

    Ziv, J. and Lempel, A., "Compression of individual sequences via
       variable-rate coding", IEEE Transactions on Information Theory,
       Volume 24, Number 5, September 1978, pages 530-536.
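
As a closing illustration, the three decryption steps described in
the Decryption section can be sketched in Python. This is a minimal
sketch, not a definitive implementation. It assumes that
crc32(old_crc,char) is the raw table-driven CRC-32 update step with
no initial or final inversion. The names ZipKeys, crc32_update,
decrypt_entry and encrypt_entry are hypothetical. encrypt_entry is
not described in this document; it is only the assumed inverse of
decryption, included so the sketch can be checked by a round trip.

```python
def _build_crc_table():
    """Build the reflected CRC-32 table (polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC_TABLE = _build_crc_table()


def crc32_update(old_crc, byte):
    """One raw CRC-32 step (assumed: no pre- or post-inversion)."""
    return _CRC_TABLE[(old_crc ^ byte) & 0xFF] ^ (old_crc >> 8)


class ZipKeys:
    """Holds Key(0), Key(1), Key(2) as unsigned 32-bit values."""

    def __init__(self, password: bytes):
        # Step 1: fixed initial values, then mix in each password byte.
        self.k0, self.k1, self.k2 = 305419896, 591751049, 878082192
        for b in password:
            self.update(b)

    def update(self, byte):
        """update_keys(char) from the text, masked to 32 bits."""
        self.k0 = crc32_update(self.k0, byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_update(self.k2, self.k1 >> 24)

    def decrypt_byte(self):
        """decrypt_byte() from the text; temp is a 16-bit quantity."""
        temp = (self.k2 | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF


def decrypt_entry(password: bytes, data: bytes):
    """Steps 2 and 3: return (12-byte header, decrypted payload)."""
    keys = ZipKeys(password)
    out = bytearray()
    for c in data:
        p = c ^ keys.decrypt_byte()
        keys.update(p)          # keys are updated with the plaintext byte
        out.append(p)
    return bytes(out[:12]), bytes(out[12:])


def encrypt_entry(password: bytes, header: bytes, payload: bytes) -> bytes:
    """Assumed inverse of decrypt_entry, for round-trip checks only."""
    keys = ZipKeys(password)
    out = bytearray()
    for p in header + payload:
        out.append(p ^ keys.decrypt_byte())
        keys.update(p)
    return bytes(out)
```

A caller would compare the last byte (or last two bytes, for files
written by versions before 2.0) of the returned header against the
high-order part of the stored CRC before trusting the payload.
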