|
1 Revised: 03/01/1999 |
|
2 |
|
3 Disclaimer |
|
4 ---------- |
|
5 |
|
6 Although PKWARE will attempt to supply current and accurate |
|
7 information relating to its file formats, algorithms, and the |
|
8 subject programs, the possibility of error can not be eliminated. |
|
9 PKWARE therefore expressly disclaims any warranty that the |
|
10 information contained in the associated materials relating to the |
|
11 subject programs and/or the format of the files created or |
|
12 accessed by the subject programs and/or the algorithms used by |
|
13 the subject programs, or any other matter, is current, correct or |
|
14 accurate as delivered. Any risk of damage due to any possible |
|
15 inaccurate information is assumed by the user of the information. |
|
16 Furthermore, the information relating to the subject programs |
|
17 and/or the file formats created or accessed by the subject |
|
18 programs and/or the algorithms used by the subject programs is |
|
19 subject to change without notice. |
|
20 |
|
21 General Format of a ZIP file |
|
22 ---------------------------- |
|
23 |
|
24 Files stored in arbitrary order. Large zipfiles can span multiple |
|
25 diskette media. |
|
26 |
|
27 Overall zipfile format: |
|
28 |
|
29 [local file header + file data + data_descriptor] . . . |
|
30 [central directory] end of central directory record |
|
31 |
|
32 |
|
33 A. Local file header: |
|
34 |
|
35 local file header signature 4 bytes (0x04034b50) |
|
36 version needed to extract 2 bytes |
|
37 general purpose bit flag 2 bytes |
|
38 compression method 2 bytes |
|
39 last mod file time 2 bytes |
|
40 last mod file date 2 bytes |
|
41 crc-32 4 bytes |
|
42 compressed size 4 bytes |
|
43 uncompressed size 4 bytes |
|
44 filename length 2 bytes |
|
45 extra field length 2 bytes |
|
46 |
|
47 filename (variable size) |
|
48 extra field (variable size) |
|
49 |
|
50 B. Data descriptor: |
|
51 |
|
52 crc-32 4 bytes |
|
53 compressed size 4 bytes |
|
54 uncompressed size 4 bytes |
|
55 |
|
56 This descriptor exists only if bit 3 of the general |
|
57 purpose bit flag is set (see below). It is byte aligned |
|
58 and immediately follows the last byte of compressed data. |
|
59 This descriptor is used only when it was not possible to |
|
60 seek in the output zip file, e.g., when the output zip file |
|
61 was standard output or a non seekable device. |
|
62 |
|
63 C. Central directory structure: |
|
64 |
|
65 [file header] . . . end of central dir record |
|
66 |
|
67 File header: |
|
68 |
|
69 central file header signature 4 bytes (0x02014b50) |
|
70 version made by 2 bytes |
|
71 version needed to extract 2 bytes |
|
72 general purpose bit flag 2 bytes |
|
73 compression method 2 bytes |
|
74 last mod file time 2 bytes |
|
75 last mod file date 2 bytes |
|
76 crc-32 4 bytes |
|
77 compressed size 4 bytes |
|
78 uncompressed size 4 bytes |
|
79 filename length 2 bytes |
|
80 extra field length 2 bytes |
|
81 file comment length 2 bytes |
|
82 disk number start 2 bytes |
|
83 internal file attributes 2 bytes |
|
84 external file attributes 4 bytes |
|
85 relative offset of local header 4 bytes |
|
86 |
|
87 filename (variable size) |
|
88 extra field (variable size) |
|
89 file comment (variable size) |
|
90 |
|
91 End of central dir record: |
|
92 |
|
93 end of central dir signature 4 bytes (0x06054b50) |
|
94 number of this disk 2 bytes |
|
95 number of the disk with the |
|
96 start of the central directory 2 bytes |
|
97 total number of entries in |
|
98 the central dir on this disk 2 bytes |
|
99 total number of entries in |
|
100 the central dir 2 bytes |
|
101 size of the central directory 4 bytes |
|
102 offset of start of central |
|
103 directory with respect to |
|
104 the starting disk number 4 bytes |
|
105 zipfile comment length 2 bytes |
|
106 zipfile comment (variable size) |
|
107 |
|
108 D. Explanation of fields: |
|
109 |
|
110 version made by (2 bytes) |
|
111 |
|
112 The upper byte indicates the compatibility of the file |
|
113 attribute information. If the external file attributes |
|
114 are compatible with MS-DOS and can be read by PKZIP for |
|
115 DOS version 2.04g then this value will be zero. If these |
|
116 attributes are not compatible, then this value will |
|
117 identify the host system on which the attributes are |
|
118 compatible. Software can use this information to determine |
|
119 the line record format for text files etc. The current |
|
120 mappings are: |
|
121 |
|
122 0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems) |
|
123 1 - Amiga 2 - VAX/VMS |
|
124 3 - Unix 4 - VM/CMS |
|
125 5 - Atari ST 6 - OS/2 H.P.F.S. |
|
126 7 - Macintosh 8 - Z-System |
|
127 9 - CP/M 10 - Windows NTFS |
|
128 11 thru 255 - unused |
|
129 |
|
130 The lower byte indicates the version number of the |
|
131 software used to encode the file. The value/10 |
|
132 indicates the major version number, and the value |
|
133 mod 10 is the minor version number. |
|
134 |
|
135 version needed to extract (2 bytes) |
|
136 |
|
137 The minimum software version needed to extract the |
|
138 file, mapped as above. |
|
139 |
|
140 general purpose bit flag: (2 bytes) |
|
141 |
|
142 Bit 0: If set, indicates that the file is encrypted. |
|
143 |
|
144 (For Method 6 - Imploding) |
|
145 Bit 1: If the compression method used was type 6, |
|
146 Imploding, then this bit, if set, indicates |
|
147 an 8K sliding dictionary was used. If clear, |
|
148 then a 4K sliding dictionary was used. |
|
149 Bit 2: If the compression method used was type 6, |
|
150 Imploding, then this bit, if set, indicates |
|
151 3 Shannon-Fano trees were used to encode the |
|
152 sliding dictionary output. If clear, then 2 |
|
153 Shannon-Fano trees were used. |
|
154 |
|
155 (For Method 8 - Deflating) |
|
156 Bit 2 Bit 1 |
|
157 0 0 Normal (-en) compression option was used. |
|
158 0 1 Maximum (-ex) compression option was used. |
|
159 1 0 Fast (-ef) compression option was used. |
|
160 1 1 Super Fast (-es) compression option was used. |
|
161 |
|
162 Note: Bits 1 and 2 are undefined if the compression |
|
163 method is any other. |
|
164 |
|
165 Bit 3: If this bit is set, the fields crc-32, compressed |
|
166 size and uncompressed size are set to zero in the |
|
167 local header. The correct values are put in the |
|
168 data descriptor immediately following the compressed |
|
169 data. (Note: PKZIP version 2.04g for DOS only |
|
170 recognizes this bit for method 8 compression, newer |
|
171 versions of PKZIP recognize this bit for any |
|
172 compression method.) |
|
173 |
|
174 Bit 4: Reserved for use with method 8, for enhanced |
|
175 deflating. |
|
176 |
|
177 Bit 5: If this bit is set, this indicates that the file is |
|
178 compressed patched data. (Note: Requires PKZIP |
|
179 version 2.70 or greater) |
|
180 |
|
181 Bit 6: Currently unused. |
|
182 |
|
183 Bit 7: Currently unused. |
|
184 |
|
185 Bit 8: Currently unused. |
|
186 |
|
187 Bit 9: Currently unused. |
|
188 |
|
189 Bit 10: Currently unused. |
|
190 |
|
191 Bit 11: Currently unused. |
|
192 |
|
193 Bit 12: Reserved by PKWARE for enhanced compression. |
|
194 |
|
195 Bit 13: Reserved by PKWARE. |
|
196 |
|
197 Bit 14: Reserved by PKWARE. |
|
198 |
|
199 Bit 15: Reserved by PKWARE. |
|
200 |
|
201 compression method: (2 bytes) |
|
202 |
|
203 (see accompanying documentation for algorithm |
|
204 descriptions) |
|
205 |
|
206 0 - The file is stored (no compression) |
|
207 1 - The file is Shrunk |
|
208 2 - The file is Reduced with compression factor 1 |
|
209 3 - The file is Reduced with compression factor 2 |
|
210 4 - The file is Reduced with compression factor 3 |
|
211 5 - The file is Reduced with compression factor 4 |
|
212 6 - The file is Imploded |
|
213 7 - Reserved for Tokenizing compression algorithm |
|
214 8 - The file is Deflated |
|
215 9 - Reserved for enhanced Deflating |
|
216 10 - PKWARE Date Compression Library Imploding |
|
217 |
|
218 date and time fields: (2 bytes each) |
|
219 |
|
220 The date and time are encoded in standard MS-DOS format. |
|
221 If input came from standard input, the date and time are |
|
222 those at which compression was started for this data. |
|
223 |
|
224 CRC-32: (4 bytes) |
|
225 |
|
226 The CRC-32 algorithm was generously contributed by |
|
227 David Schwaderer and can be found in his excellent |
|
228 book "C Programmers Guide to NetBIOS" published by |
|
229 Howard W. Sams & Co. Inc. The 'magic number' for |
|
230 the CRC is 0xdebb20e3. The proper CRC pre and post |
|
231 conditioning is used, meaning that the CRC register |
|
232 is pre-conditioned with all ones (a starting value |
|
233 of 0xffffffff) and the value is post-conditioned by |
|
234 taking the one's complement of the CRC residual. |
|
235 If bit 3 of the general purpose flag is set, this |
|
236 field is set to zero in the local header and the correct |
|
237 value is put in the data descriptor and in the central |
|
238 directory. |
|
239 |
|
240 compressed size: (4 bytes) |
|
241 uncompressed size: (4 bytes) |
|
242 |
|
243 The size of the file compressed and uncompressed, |
|
244 respectively. If bit 3 of the general purpose bit flag |
|
245 is set, these fields are set to zero in the local header |
|
246 and the correct values are put in the data descriptor and |
|
247 in the central directory. |
|
248 |
|
249 filename length: (2 bytes) |
|
250 extra field length: (2 bytes) |
|
251 file comment length: (2 bytes) |
|
252 |
|
253 The length of the filename, extra field, and comment |
|
254 fields respectively. The combined length of any |
|
255 directory record and these three fields should not |
|
256 generally exceed 65,535 bytes. If input came from standard |
|
257 input, the filename length is set to zero. |
|
258 |
|
259 disk number start: (2 bytes) |
|
260 |
|
261 The number of the disk on which this file begins. |
|
262 |
|
263 internal file attributes: (2 bytes) |
|
264 |
|
265 The lowest bit of this field indicates, if set, that |
|
266 the file is apparently an ASCII or text file. If not |
|
267 set, that the file apparently contains binary data. |
|
268 The remaining bits are unused in version 1.0. |
|
269 |
|
270 Bits 1 and 2 are reserved for use by PKWARE. |
|
271 |
|
272 external file attributes: (4 bytes) |
|
273 |
|
274 The mapping of the external attributes is |
|
275 host-system dependent (see 'version made by'). For |
|
276 MS-DOS, the low order byte is the MS-DOS directory |
|
277 attribute byte. If input came from standard input, this |
|
278 field is set to zero. |
|
279 |
|
280 relative offset of local header: (4 bytes) |
|
281 |
|
282 This is the offset from the start of the first disk on |
|
283 which this file appears, to where the local header should |
|
284 be found. |
|
285 |
|
286 filename: (Variable) |
|
287 |
|
288 The name of the file, with optional relative path. |
|
289 The path stored should not contain a drive or |
|
290 device letter, or a leading slash. All slashes |
|
291 should be forward slashes '/' as opposed to |
|
292 backwards slashes '\' for compatibility with Amiga |
|
293 and Unix file systems etc. If input came from standard |
|
294 input, there is no filename field. |
|
295 |
|
296 extra field: (Variable) |
|
297 |
|
298 This is for future expansion. If additional information |
|
299 needs to be stored in the future, it should be stored |
|
300 here. Earlier versions of the software can then safely |
|
301 skip this file, and find the next file or header. This |
|
302 field will be 0 length in version 1.0. |
|
303 |
|
304 In order to allow different programs and different types |
|
305 of information to be stored in the 'extra' field in .ZIP |
|
306 files, the following structure should be used for all |
|
307 programs storing data in this field: |
|
308 |
|
309 header1+data1 + header2+data2 . . . |
|
310 |
|
311 Each header should consist of: |
|
312 |
|
313 Header ID - 2 bytes |
|
314 Data Size - 2 bytes |
|
315 |
|
316 Note: all fields stored in Intel low-byte/high-byte order. |
|
317 |
|
318 The Header ID field indicates the type of data that is in |
|
319 the following data block. |
|
320 |
|
321 Header ID's of 0 thru 31 are reserved for use by PKWARE. |
|
322 The remaining ID's can be used by third party vendors for |
|
323 proprietary usage. |
|
324 |
|
325 The current Header ID mappings defined by PKWARE are: |
|
326 |
|
327 0x0007 AV Info |
|
328 0x0009 OS/2 |
|
329 0x000a NTFS |
|
330 0x000c VAX/VMS |
|
331 0x000d Unix |
|
332 0x000f Patch Descriptor |
|
333 |
|
334 Several third party mappings commonly used are: |
|
335 |
|
336 0x4b46 FWKCS MD5 (see below) |
|
337 0x07c8 Macintosh |
|
338 0x4341 Acorn/SparkFS |
|
339 0x4453 Windows NT security descriptor (binary ACL) |
|
340 0x4704 VM/CMS |
|
341 0x470f MVS |
|
342 0x4c41 OS/2 access control list (text ACL) |
|
343 0x4d49 Info-ZIP VMS (VAX or Alpha) |
|
344 0x5455 extended timestamp |
|
345 0x5855 Info-ZIP Unix (original, also OS/2, NT, etc) |
|
346 0x6542 BeOS/BeBox |
|
347 0x756e ASi Unix |
|
348 0x7855 Info-ZIP Unix (new) |
|
349 0xfd4a SMS/QDOS |
|
350 |
|
351 The Data Size field indicates the size of the following |
|
352 data block. Programs can use this value to skip to the |
|
353 next header block, passing over any data blocks that are |
|
354 not of interest. |
|
355 |
|
356 Note: As stated above, the size of the entire .ZIP file |
|
357 header, including the filename, comment, and extra |
|
358 field should not exceed 64K in size. |
|
359 |
|
360 In case two different programs should appropriate the same |
|
361 Header ID value, it is strongly recommended that each |
|
362 program place a unique signature of at least two bytes in |
|
363 size (and preferably 4 bytes or bigger) at the start of |
|
364 each data area. Every program should verify that its |
|
365 unique signature is present, in addition to the Header ID |
|
366 value being correct, before assuming that it is a block of |
|
367 known type. |
|
368 |
|
369 -OS/2 Extra Field: |
|
370 |
|
371 The following is the layout of the OS/2 attributes "extra" |
|
372 block. (Last Revision 09/05/95) |
|
373 |
|
374 Note: all fields stored in Intel low-byte/high-byte order. |
|
375 |
|
376 Value Size Description |
|
377 ----- ---- ----------- |
|
378 (OS/2) 0x0009 2 bytes Tag for this "extra" block type |
|
379 TSize 2 bytes Size for the following data block |
|
380 BSize 4 bytes Uncompressed Block Size |
|
381 CType 2 bytes Compression type |
|
382 EACRC 4 bytes CRC value for uncompress block |
|
383 (var) variable Compressed block |
|
384 |
|
385 The OS/2 extended attribute structure (FEA2LIST) is |
|
386 compressed and then stored in it's entirety within this |
|
387 structure. There will only ever be one "block" of data in |
|
388 VarFields[]. |
|
389 |
|
390 -UNIX Extra Field: |
|
391 |
|
392 The following is the layout of the Unix "extra" block. |
|
393 Note: all fields are stored in Intel low-byte/high-byte |
|
394 order. |
|
395 |
|
396 Value Size Description |
|
397 ----- ---- ----------- |
|
398 (UNIX) 0x000d 2 bytes Tag for this "extra" block type |
|
399 TSize 2 bytes Size for the following data block |
|
400 Atime 4 bytes File last access time |
|
401 Mtime 4 bytes File last modification time |
|
402 Uid 2 bytes File user ID |
|
403 Gid 2 bytes File group ID |
|
404 (var) variable Variable length data field |
|
405 |
|
406 The variable length data field will contain file type |
|
407 specific data. Currently the only values allowed are |
|
408 the original "linked to" file names for hard or symbolic |
|
409 links. |
|
410 |
|
411 -VAX/VMS Extra Field: |
|
412 |
|
413 The following is the layout of the VAX/VMS attributes |
|
414 "extra" block. |
|
415 |
|
416 Note: all fields stored in Intel low-byte/high-byte order. |
|
417 |
|
418 Value Size Description |
|
419 ----- ---- ----------- |
|
420 (VMS) 0x000c 2 bytes Tag for this "extra" block type |
|
421 TSize 2 bytes Size of the total "extra" block |
|
422 CRC 4 bytes 32-bit CRC for remainder of the block |
|
423 Tag1 2 bytes VMS attribute tag value #1 |
|
424 Size1 2 bytes Size of attribute #1, in bytes |
|
425 (var.) Size1 Attribute #1 data |
|
426 . |
|
427 . |
|
428 . |
|
429 TagN 2 bytes VMS attribute tage value #N |
|
430 SizeN 2 bytes Size of attribute #N, in bytes |
|
431 (var.) SizeN Attribute #N data |
|
432 |
|
433 Rules: |
|
434 |
|
435 1. There will be one or more of attributes present, which |
|
436 will each be preceded by the above TagX & SizeX values. |
|
437 These values are identical to the ATR$C_XXXX and |
|
438 ATR$S_XXXX constants which are defined in ATR.H under |
|
439 VMS C. Neither of these values will ever be zero. |
|
440 |
|
441 2. No word alignment or padding is performed. |
|
442 |
|
443 3. A well-behaved PKZIP/VMS program should never produce |
|
444 more than one sub-block with the same TagX value. Also, |
|
445 there will never be more than one "extra" block of type |
|
446 0x000c in a particular directory record. |
|
447 |
|
448 -NTFS Extra Field: |
|
449 |
|
450 The following is the layout of the NTFS attributes |
|
451 "extra" block. |
|
452 |
|
453 Note: all fields stored in Intel low-byte/high-byte order. |
|
454 |
|
455 Value Size Description |
|
456 ----- ---- ----------- |
|
457 (NTFS) 0x000a 2 bytes Tag for this "extra" block type |
|
458 TSize 2 bytes Size of the total "extra" block |
|
459 Reserved 4 bytes Reserved for future use |
|
460 Tag1 2 bytes NTFS attribute tag value #1 |
|
461 Size1 2 bytes Size of attribute #1, in bytes |
|
462 (var.) Size1 Attribute #1 data |
|
463 . |
|
464 . |
|
465 . |
|
466 TagN 2 bytes NTFS attribute tage value #N |
|
467 SizeN 2 bytes Size of attribute #N, in bytes |
|
468 (var.) SizeN Attribute #N data |
|
469 |
|
470 For NTFS, values for Tag1 through TagN are as follows: |
|
471 (currently only one set of attributes is defined for NTFS) |
|
472 |
|
473 Tag Size Description |
|
474 ----- ---- ----------- |
|
475 0x0001 2 bytes Tag for attribute #1 |
|
476 Size1 2 bytes Size of attribute #1, in bytes |
|
477 Mtime 8 bytes File last modification time |
|
478 Atime 8 bytes File last access time |
|
479 Ctime 8 bytes File creation time |
|
480 |
|
481 -PATCH Descriptor Extra Field: |
|
482 |
|
483 The following is the layout of the Patch Descriptor "extra" |
|
484 block. |
|
485 |
|
486 Note: all fields stored in Intel low-byte/high-byte order. |
|
487 |
|
488 Value Size Description |
|
489 ----- ---- ----------- |
|
490 (Patch) 0x000f 2 bytes Tag for this "extra" block type |
|
491 TSize 2 bytes Size of the total "extra" block |
|
492 Version 2 bytes Version of the descriptor |
|
493 Flags 4 bytes Actions and reactions (see below) |
|
494 OldSize 4 bytes Size of the file about to be patched |
|
495 OldCRC 4 bytes 32-bit CRC of the file to be patched |
|
496 NewSize 4 bytes Size of the resulting file |
|
497 NewCRC 4 bytes 32-bit CRC of the resulting file |
|
498 |
|
499 Actions and reactions |
|
500 |
|
501 Bits Description |
|
502 ---- ---------------- |
|
503 0 Use for autodetection |
|
504 1 Treat as selfpatch |
|
505 2-3 RESERVED |
|
506 4-5 Action (see below) |
|
507 6-7 RESERVED |
|
508 8-9 Reaction (see below) to absent file |
|
509 10-11 Reaction (see below) to newer file |
|
510 12-13 Reaction (see below) to unknown file |
|
511 14-15 RESERVED |
|
512 16-31 RESERVED |
|
513 |
|
514 Actions |
|
515 |
|
516 Action Value |
|
517 ------ ----- |
|
518 none 0 |
|
519 add 1 |
|
520 delete 2 |
|
521 patch 3 |
|
522 |
|
523 Reactions |
|
524 |
|
525 Reaction Value |
|
526 -------- ----- |
|
527 ask 0 |
|
528 skip 1 |
|
529 ignore 2 |
|
530 fail 3 |
|
531 |
|
532 - FWKCS MD5 Extra Field: |
|
533 |
|
534 The FWKCS Contents_Signature System, used in |
|
535 automatically identifying files independent of filename, |
|
536 optionally adds and uses an extra field to support the |
|
537 rapid creation of an enhanced contents_signature: |
|
538 |
|
539 Header ID = 0x4b46 |
|
540 Data Size = 0x0013 |
|
541 Preface = 'M','D','5' |
|
542 followed by 16 bytes containing the uncompressed file's |
|
543 128_bit MD5 hash(1), low byte first. |
|
544 |
|
545 When FWKCS revises a zipfile central directory to add |
|
546 this extra field for a file, it also replaces the |
|
547 central directory entry for that file's uncompressed |
|
548 filelength with a measured value. |
|
549 |
|
550 FWKCS provides an option to strip this extra field, if |
|
551 present, from a zipfile central directory. In adding |
|
552 this extra field, FWKCS preserves Zipfile Authenticity |
|
553 Verification; if stripping this extra field, FWKCS |
|
554 preserves all versions of AV through PKZIP version 2.04g. |
|
555 |
|
556 FWKCS, and FWKCS Contents_Signature System, are |
|
557 trademarks of Frederick W. Kantor. |
|
558 |
|
559 (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer |
|
560 Science and RSA Data Security, Inc., April 1992. |
|
561 ll.76-77: "The MD5 algorithm is being placed in the |
|
562 public domain for review and possible adoption as a |
|
563 standard." |
|
564 |
|
565 file comment: (Variable) |
|
566 |
|
567 The comment for this file. |
|
568 |
|
569 number of this disk: (2 bytes) |
|
570 |
|
571 The number of this disk, which contains central |
|
572 directory end record. |
|
573 |
|
574 number of the disk with the start of the central |
|
575 directory: (2 bytes) |
|
576 |
|
577 The number of the disk on which the central |
|
578 directory starts. |
|
579 |
|
580 total number of entries in the central dir on |
|
581 this disk: (2 bytes) |
|
582 |
|
583 The number of central directory entries on this disk. |
|
584 |
|
585 total number of entries in the central dir: (2 bytes) |
|
586 |
|
587 The total number of files in the zipfile. |
|
588 |
|
589 size of the central directory: (4 bytes) |
|
590 |
|
591 The size (in bytes) of the entire central directory. |
|
592 |
|
593 offset of start of central directory with respect to |
|
594 the starting disk number: (4 bytes) |
|
595 |
|
596 Offset of the start of the central directory on the |
|
597 disk on which the central directory starts. |
|
598 |
|
599 zipfile comment length: (2 bytes) |
|
600 |
|
601 The length of the comment for this zipfile. |
|
602 |
|
603 zipfile comment: (Variable) |
|
604 |
|
605 The comment for this zipfile. |
|
606 |
|
607 D. General notes: |
|
608 |
|
609 1) All fields unless otherwise noted are unsigned and stored |
|
610 in Intel low-byte:high-byte, low-word:high-word order. |
|
611 |
|
612 2) String fields are not null terminated, since the |
|
613 length is given explicitly. |
|
614 |
|
615 3) Local headers should not span disk boundaries. Also, even |
|
616 though the central directory can span disk boundaries, no |
|
617 single record in the central directory should be split |
|
618 across disks. |
|
619 |
|
620 4) The entries in the central directory may not necessarily |
|
621 be in the same order that files appear in the zipfile. |
|
622 |
|
623 UnShrinking - Method 1 |
|
624 ---------------------- |
|
625 |
|
626 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm |
|
627 with partial clearing. The initial code size is 9 bits, and |
|
628 the maximum code size is 13 bits. Shrinking differs from |
|
629 conventional Dynamic Ziv-Lempel-Welch implementations in several |
|
630 respects: |
|
631 |
|
632 1) The code size is controlled by the compressor, and is not |
|
633 automatically increased when codes larger than the current |
|
634 code size are created (but not necessarily used). When |
|
635 the decompressor encounters the code sequence 256 |
|
636 (decimal) followed by 1, it should increase the code size |
|
637 read from the input stream to the next bit size. No |
|
638 blocking of the codes is performed, so the next code at |
|
639 the increased size should be read from the input stream |
|
640 immediately after where the previous code at the smaller |
|
641 bit size was read. Again, the decompressor should not |
|
642 increase the code size used until the sequence 256,1 is |
|
643 encountered. |
|
644 |
|
645 2) When the table becomes full, total clearing is not |
|
646 performed. Rather, when the compressor emits the code |
|
647 sequence 256,2 (decimal), the decompressor should clear |
|
648 all leaf nodes from the Ziv-Lempel tree, and continue to |
|
649 use the current code size. The nodes that are cleared |
|
650 from the Ziv-Lempel tree are then re-used, with the lowest |
|
651 code value re-used first, and the highest code value |
|
652 re-used last. The compressor can emit the sequence 256,2 |
|
653 at any time. |
|
654 |
|
655 Expanding - Methods 2-5 |
|
656 ----------------------- |
|
657 |
|
658 The Reducing algorithm is actually a combination of two |
|
659 distinct algorithms. The first algorithm compresses repeated |
|
660 byte sequences, and the second algorithm takes the compressed |
|
661 stream from the first algorithm and applies a probabilistic |
|
662 compression method. |
|
663 |
|
664 The probabilistic compression stores an array of 'follower |
|
665 sets' S(j), for j=0 to 255, corresponding to each possible |
|
666 ASCII character. Each set contains between 0 and 32 |
|
667 characters, to be denoted as S(j)[0],...,S(j)[m], where m<32. |
|
668 The sets are stored at the beginning of the data area for a |
|
669 Reduced file, in reverse order, with S(255) first, and S(0) |
|
670 last. |
|
671 |
|
672 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] }, |
|
673 where N(j) is the size of set S(j). N(j) can be 0, in which |
|
674 case the follower set for S(j) is empty. Each N(j) value is |
|
675 encoded in 6 bits, followed by N(j) eight bit character values |
|
676 corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If |
|
677 N(j) is 0, then no values for S(j) are stored, and the value |
|
678 for N(j-1) immediately follows. |
|
679 |
|
680 Immediately after the follower sets, is the compressed data |
|
681 stream. The compressed data stream can be interpreted for the |
|
682 probabilistic decompression as follows: |
|
683 |
|
684 let Last-Character <- 0. |
|
685 loop until done |
|
686 if the follower set S(Last-Character) is empty then |
|
687 read 8 bits from the input stream, and copy this |
|
688 value to the output stream. |
|
689 otherwise if the follower set S(Last-Character) is non-empty then |
|
690 read 1 bit from the input stream. |
|
691 if this bit is not zero then |
|
692 read 8 bits from the input stream, and copy this |
|
693 value to the output stream. |
|
694 otherwise if this bit is zero then |
|
695 read B(N(Last-Character)) bits from the input |
|
696 stream, and assign this value to I. |
|
697 Copy the value of S(Last-Character)[I] to the |
|
698 output stream. |
|
699 |
|
700 assign the last value placed on the output stream to |
|
701 Last-Character. |
|
702 end loop |
|
703 |
|
704 B(N(j)) is defined as the minimal number of bits required to |
|
705 encode the value N(j)-1. |
|
706 |
|
707 The decompressed stream from above can then be expanded to |
|
708 re-create the original file as follows: |
|
709 |
|
710 let State <- 0. |
|
711 |
|
712 loop until done |
|
713 read 8 bits from the input stream into C. |
|
714 case State of |
|
715 0: if C is not equal to DLE (144 decimal) then |
|
716 copy C to the output stream. |
|
717 otherwise if C is equal to DLE then |
|
718 let State <- 1. |
|
719 |
|
720 1: if C is non-zero then |
|
721 let V <- C. |
|
722 let Len <- L(V) |
|
723 let State <- F(Len). |
|
724 otherwise if C is zero then |
|
725 copy the value 144 (decimal) to the output stream. |
|
726 let State <- 0 |
|
727 |
|
728 2: let Len <- Len + C |
|
729 let State <- 3. |
|
730 |
|
731 3: move backwards D(V,C) bytes in the output stream |
|
732 (if this position is before the start of the output |
|
733 stream, then assume that all the data before the |
|
734 start of the output stream is filled with zeros). |
|
735 copy Len+3 bytes from this position to the output stream. |
|
736 let State <- 0. |
|
737 end case |
|
738 end loop |
|
739 |
|
740 The functions F,L, and D are dependent on the 'compression |
|
741 factor', 1 through 4, and are defined as follows: |
|
742 |
|
743 For compression factor 1: |
|
744 L(X) equals the lower 7 bits of X. |
|
745 F(X) equals 2 if X equals 127 otherwise F(X) equals 3. |
|
746 D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1. |
|
747 For compression factor 2: |
|
748 L(X) equals the lower 6 bits of X. |
|
749 F(X) equals 2 if X equals 63 otherwise F(X) equals 3. |
|
750 D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1. |
|
751 For compression factor 3: |
|
752 L(X) equals the lower 5 bits of X. |
|
753 F(X) equals 2 if X equals 31 otherwise F(X) equals 3. |
|
754 D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1. |
|
755 For compression factor 4: |
|
756 L(X) equals the lower 4 bits of X. |
|
757 F(X) equals 2 if X equals 15 otherwise F(X) equals 3. |
|
758 D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1. |
|
759 |
|
760 Imploding - Method 6 |
|
761 -------------------- |
|
762 |
|
763 The Imploding algorithm is actually a combination of two distinct |
|
764 algorithms. The first algorithm compresses repeated byte |
|
765 sequences using a sliding dictionary. The second algorithm is |
|
766 used to compress the encoding of the sliding dictionary output, |
|
767 using multiple Shannon-Fano trees. |
|
768 |
|
769 The Imploding algorithm can use a 4K or 8K sliding dictionary |
|
770 size. The dictionary size used can be determined by bit 1 in the |
|
771 general purpose flag word; a 0 bit indicates a 4K dictionary |
|
772 while a 1 bit indicates an 8K dictionary. |
|
773 |
|
774 The Shannon-Fano trees are stored at the start of the compressed |
|
775 file. The number of trees stored is defined by bit 2 in the |
|
776 general purpose flag word; a 0 bit indicates two trees stored, a |
|
777 1 bit indicates three trees are stored. If 3 trees are stored, |
|
778 the first Shannon-Fano tree represents the encoding of the |
|
779 Literal characters, the second tree represents the encoding of |
|
780 the Length information, the third represents the encoding of the |
|
781 Distance information. When 2 Shannon-Fano trees are stored, the |
|
782 Length tree is stored first, followed by the Distance tree. |
|
783 |
|
784 The Literal Shannon-Fano tree, if present is used to represent |
|
785 the entire ASCII character set, and contains 256 values. This |
|
786 tree is used to compress any data not compressed by the sliding |
|
787 dictionary algorithm. When this tree is present, the Minimum |
|
788 Match Length for the sliding dictionary is 3. If this tree is |
|
789 not present, the Minimum Match Length is 2. |
|
790 |
|
791 The Length Shannon-Fano tree is used to compress the Length part |
|
792 of the (length,distance) pairs from the sliding dictionary |
|
793 output. The Length tree contains 64 values, ranging from the |
|
794 Minimum Match Length, to 63 plus the Minimum Match Length. |
|
795 |
|
796 The Distance Shannon-Fano tree is used to compress the Distance |
|
797 part of the (length,distance) pairs from the sliding dictionary |
|
798 output. The Distance tree contains 64 values, ranging from 0 to |
|
799 63, representing the upper 6 bits of the distance value. The |
|
800 distance values themselves will be between 0 and the sliding |
|
801 dictionary size, either 4K or 8K. |
|
802 |
|
803 The Shannon-Fano trees themselves are stored in a compressed |
|
804 format. The first byte of the tree data represents the number of |
|
805 bytes of data representing the (compressed) Shannon-Fano tree |
|
806 minus 1. The remaining bytes represent the Shannon-Fano tree |
|
807 data encoded as: |
|
808 |
|
809 High 4 bits: Number of values at this bit length + 1. (1 - 16) |
|
810 Low 4 bits: Bit Length needed to represent value + 1. (1 - 16) |
|
811 |
|
812 The Shannon-Fano codes can be constructed from the bit lengths |
|
813 using the following algorithm: |
|
814 |
|
815 1) Sort the Bit Lengths in ascending order, while retaining the |
|
816 order of the original lengths stored in the file. |
|
817 |
|
818 2) Generate the Shannon-Fano trees: |
|
819 |
|
820 Code <- 0 |
|
821 CodeIncrement <- 0 |
|
822 LastBitLength <- 0 |
|
823 i <- number of Shannon-Fano codes - 1 (either 255 or 63) |
|
824 |
|
825 loop while i >= 0 |
|
826 Code = Code + CodeIncrement |
|
827 if BitLength(i) <> LastBitLength then |
|
828 LastBitLength=BitLength(i) |
|
829 CodeIncrement = 1 shifted left (16 - LastBitLength) |
|
830 ShannonCode(i) = Code |
|
831 i <- i - 1 |
|
832 end loop |
|
833 |
|
834 3) Reverse the order of all the bits in the above ShannonCode() |
|
835 vector, so that the most significant bit becomes the least |
|
836 significant bit. For example, the value 0x1234 (hex) would |
|
837 become 0x2C48 (hex). |
|
838 |
|
839 4) Restore the order of Shannon-Fano codes as originally stored |
|
840 within the file. |
|
841 |
|
842 Example: |
|
843 |
|
844 This example will show the encoding of a Shannon-Fano tree |
|
845 of size 8. Notice that the actual Shannon-Fano trees used |
|
846 for Imploding are either 64 or 256 entries in size. |
|
847 |
|
848 Example: 0x02, 0x42, 0x01, 0x13 |
|
849 |
|
850 The first byte indicates 3 values in this table. Decoding the |
|
851 bytes: |
|
852 0x42 = 5 codes of 3 bits long |
|
853 0x01 = 1 code of 2 bits long |
|
854 0x13 = 2 codes of 4 bits long |
|
855 |
|
856 This would generate the original bit length array of: |
|
857 (3, 3, 3, 3, 3, 2, 4, 4) |
|
858 |
|
859 There are 8 codes in this table for the values 0 thru 7. Using |
|
860 the algorithm to obtain the Shannon-Fano codes produces: |
|
861 |
|
862 Reversed Order Original |
|
863 Val Sorted Constructed Code Value Restored Length |
|
864 --- ------ ----------------- -------- -------- ------ |
|
865 0: 2 1100000000000000 11 101 3 |
|
866 1: 3 1010000000000000 101 001 3 |
|
867 2: 3 1000000000000000 001 110 3 |
|
868 3: 3 0110000000000000 110 010 3 |
|
869 4: 3 0100000000000000 010 100 3 |
|
870 5: 3 0010000000000000 100 11 2 |
|
871 6: 4 0001000000000000 1000 1000 4 |
|
872 7: 4 0000000000000000 0000 0000 4 |
|
873 |
|
874 The values in the Val, Order Restored and Original Length columns |
|
875 now represent the Shannon-Fano encoding tree that can be used for |
|
876 decoding the Shannon-Fano encoded data. How to parse the |
|
877 variable length Shannon-Fano values from the data stream is beyond |
|
878 the scope of this document. (See the references listed at the end of |
|
879 this document for more information.) However, traditional decoding |
|
880 schemes used for Huffman variable length decoding, such as the |
|
881 Greenlaw algorithm, can be successfully applied. |
|
882 |
|
883 The compressed data stream begins immediately after the |
|
884 compressed Shannon-Fano data. The compressed data stream can be |
|
885 interpreted as follows: |
|
886 |
|
887 loop until done |
|
888 read 1 bit from input stream. |
|
889 |
|
890 if this bit is non-zero then (encoded data is literal data) |
|
891 if Literal Shannon-Fano tree is present |
|
892 read and decode character using Literal Shannon-Fano tree. |
|
893 otherwise |
|
894 read 8 bits from input stream. |
|
895 copy character to the output stream. |
|
896 otherwise (encoded data is sliding dictionary match) |
|
897 if 8K dictionary size |
|
898 read 7 bits for offset Distance (lower 7 bits of offset). |
|
899 otherwise |
|
900 read 6 bits for offset Distance (lower 6 bits of offset). |
|
901 |
|
902 using the Distance Shannon-Fano tree, read and decode the |
|
903 upper 6 bits of the Distance value. |
|
904 |
|
905 using the Length Shannon-Fano tree, read and decode |
|
906 the Length value. |
|
907 |
|
908 Length <- Length + Minimum Match Length |
|
909 |
|
910 if Length = 63 + Minimum Match Length |
|
911 read 8 bits from the input stream, |
|
912 add this value to Length. |
|
913 |
|
914 move backwards Distance+1 bytes in the output stream, and |
|
915 copy Length characters from this position to the output |
|
916 stream. (if this position is before the start of the output |
|
917 stream, then assume that all the data before the start of |
|
918 the output stream is filled with zeros). |
|
919 end loop |
|
920 |
|
921 Tokenizing - Method 7 |
|
922 -------------------- |
|
923 |
|
924 This method is not used by PKZIP. |
|
925 |
|
926 Deflating - Method 8 |
|
927 ----------------- |
|
928 |
|
929 The Deflate algorithm is similar to the Implode algorithm using |
|
930 a sliding dictionary of up to 32K with secondary compression |
|
931 from Huffman/Shannon-Fano codes. |
|
932 |
|
933 The compressed data is stored in blocks with a header describing |
|
934 the block and the Huffman codes used in the data block. The header |
|
935 format is as follows: |
|
936 |
|
937 Bit 0: Last Block bit This bit is set to 1 if this is the last |
|
938 compressed block in the data. |
|
939 Bits 1-2: Block type |
|
940 00 (0) - Block is stored - All stored data is byte aligned. |
|
941 Skip bits until next byte, then next word = block |
|
942 length, followed by the ones compliment of the block |
|
943 length word. Remaining data in block is the stored |
|
944 data. |
|
945 |
|
946 01 (1) - Use fixed Huffman codes for literal and distance codes. |
|
947 Lit Code Bits Dist Code Bits |
|
948 --------- ---- --------- ---- |
|
949 0 - 143 8 0 - 31 5 |
|
950 144 - 255 9 |
|
951 256 - 279 7 |
|
952 280 - 287 8 |
|
953 |
|
954 Literal codes 286-287 and distance codes 30-31 are |
|
955 never used but participate in the huffman construction. |
|
956 |
|
957 10 (2) - Dynamic Huffman codes. (See expanding Huffman codes) |
|
958 |
|
959 11 (3) - Reserved - Flag a "Error in compressed data" if seen. |
|
960 |
|
961 Expanding Huffman Codes |
|
962 ----------------------- |
|
963 If the data block is stored with dynamic Huffman codes, the Huffman |
|
964 codes are sent in the following compressed format: |
|
965 |
|
966 5 Bits: # of Literal codes sent - 256 (256 - 286) |
|
967 All other codes are never sent. |
|
968 5 Bits: # of Dist codes - 1 (1 - 32) |
|
969 4 Bits: # of Bit Length codes - 3 (3 - 19) |
|
970 |
|
971 The Huffman codes are sent as bit lengths and the codes are built as |
|
972 described in the implode algorithm. The bit lengths themselves are |
|
973 compressed with Huffman codes. There are 19 bit length codes: |
|
974 |
|
975 0 - 15: Represent bit lengths of 0 - 15 |
|
976 16: Copy the previous bit length 3 - 6 times. |
|
977 The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6) |
|
978 Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will |
|
979 expand to 12 bit lengths of 8 (1 + 6 + 5) |
|
980 17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length) |
|
981 18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length) |
|
982 |
|
983 The lengths of the bit length codes are sent packed 3 bits per value |
|
984 (0 - 7) in the following order: |
|
985 |
|
986 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15 |
|
987 |
|
988 The Huffman codes should be built as described in the Implode algorithm |
|
989 except codes are assigned starting at the shortest bit length, i.e. the |
|
990 shortest code should be all 0's rather than all 1's. Also, codes with |
|
991 a bit length of zero do not participate in the tree construction. The |
|
992 codes are then used to decode the bit lengths for the literal and |
|
993 distance tables. |
|
994 |
|
995 The bit lengths for the literal tables are sent first with the number |
|
996 of entries sent described by the 5 bits sent earlier. There are up |
|
997 to 286 literal characters; the first 256 represent the respective 8 |
|
998 bit character, code 256 represents the End-Of-Block code, the remaining |
|
999 29 codes represent copy lengths of 3 thru 258. There are up to 30 |
|
1000 distance codes representing distances from 1 thru 32k as described |
|
1001 below. |
|
1002 |
|
1003 Length Codes |
|
1004 ------------ |
|
1005 Extra Extra Extra Extra |
|
1006 Code Bits Length Code Bits Lengths Code Bits Lengths Code Bits Length(s) |
|
1007 ---- ---- ------ ---- ---- ------- ---- ---- ------- ---- ---- --------- |
|
1008 257 0 3 265 1 11,12 273 3 35-42 281 5 131-162 |
|
1009 258 0 4 266 1 13,14 274 3 43-50 282 5 163-194 |
|
1010 259 0 5 267 1 15,16 275 3 51-58 283 5 195-226 |
|
1011 260 0 6 268 1 17,18 276 3 59-66 284 5 227-257 |
|
1012 261 0 7 269 2 19-22 277 4 67-82 285 0 258 |
|
1013 262 0 8 270 2 23-26 278 4 83-98 |
|
1014 263 0 9 271 2 27-30 279 4 99-114 |
|
1015 264 0 10 272 2 31-34 280 4 115-130 |
|
1016 |
|
1017 Distance Codes |
|
1018 -------------- |
|
1019 Extra Extra Extra Extra |
|
1020 Code Bits Dist Code Bits Dist Code Bits Distance Code Bits Distance |
|
1021 ---- ---- ---- ---- ---- ------ ---- ---- -------- ---- ---- -------- |
|
1022 0 0 1 8 3 17-24 16 7 257-384 24 11 4097-6144 |
|
1023 1 0 2 9 3 25-32 17 7 385-512 25 11 6145-8192 |
|
1024 2 0 3 10 4 33-48 18 8 513-768 26 12 8193-12288 |
|
1025 3 0 4 11 4 49-64 19 8 769-1024 27 12 12289-16384 |
|
1026 4 1 5,6 12 5 65-96 20 9 1025-1536 28 13 16385-24576 |
|
1027 5 1 7,8 13 5 97-128 21 9 1537-2048 29 13 24577-32768 |
|
1028 6 2 9-12 14 6 129-192 22 10 2049-3072 |
|
1029 7 2 13-16 15 6 193-256 23 10 3073-4096 |
|
1030 |
|
1031 The compressed data stream begins immediately after the |
|
1032 compressed header data. The compressed data stream can be |
|
1033 interpreted as follows: |
|
1034 |
|
1035 do |
|
1036 read header from input stream. |
|
1037 |
|
1038 if stored block |
|
1039 skip bits until byte aligned |
|
1040 read count and 1's compliment of count |
|
1041 copy count bytes data block |
|
1042 otherwise |
|
1043 loop until end of block code sent |
|
1044 decode literal character from input stream |
|
1045 if literal < 256 |
|
1046 copy character to the output stream |
|
1047 otherwise |
|
1048 if literal = end of block |
|
1049 break from loop |
|
1050 otherwise |
|
1051 decode distance from input stream |
|
1052 |
|
1053 move backwards distance bytes in the output stream, and |
|
1054 copy length characters from this position to the output |
|
1055 stream. |
|
1056 end loop |
|
1057 while not last block |
|
1058 |
|
1059 if data descriptor exists |
|
1060 skip bits until byte aligned |
|
1061 read crc and sizes |
|
1062 endif |
|
1063 |
|
1064 Decryption |
|
1065 ---------- |
|
1066 |
|
1067 The encryption used in PKZIP was generously supplied by Roger |
|
1068 Schlafly. PKWARE is grateful to Mr. Schlafly for his expert |
|
1069 help and advice in the field of data encryption. |
|
1070 |
|
1071 PKZIP encrypts the compressed data stream. Encrypted files must |
|
1072 be decrypted before they can be extracted. |
|
1073 |
|
1074 Each encrypted file has an extra 12 bytes stored at the start of |
|
1075 the data area defining the encryption header for that file. The |
|
1076 encryption header is originally set to random values, and then |
|
1077 itself encrypted, using three, 32-bit keys. The key values are |
|
1078 initialized using the supplied encryption password. After each byte |
|
1079 is encrypted, the keys are then updated using pseudo-random number |
|
1080 generation techniques in combination with the same CRC-32 algorithm |
|
1081 used in PKZIP and described elsewhere in this document. |
|
1082 |
|
1083 The following is the basic steps required to decrypt a file: |
|
1084 |
|
1085 1) Initialize the three 32-bit keys with the password. |
|
1086 2) Read and decrypt the 12-byte encryption header, further |
|
1087 initializing the encryption keys. |
|
1088 3) Read and decrypt the compressed data stream using the |
|
1089 encryption keys. |
|
1090 |
|
1091 Step 1 - Initializing the encryption keys |
|
1092 ----------------------------------------- |
|
1093 |
|
1094 Key(0) <- 305419896 |
|
1095 Key(1) <- 591751049 |
|
1096 Key(2) <- 878082192 |
|
1097 |
|
1098 loop for i <- 0 to length(password)-1 |
|
1099 update_keys(password(i)) |
|
1100 end loop |
|
1101 |
|
1102 Where update_keys() is defined as: |
|
1103 |
|
1104 update_keys(char): |
|
1105 Key(0) <- crc32(key(0),char) |
|
1106 Key(1) <- Key(1) + (Key(0) & 000000ffH) |
|
1107 Key(1) <- Key(1) * 134775813 + 1 |
|
1108 Key(2) <- crc32(key(2),key(1) >> 24) |
|
1109 end update_keys |
|
1110 |
|
1111 Where crc32(old_crc,char) is a routine that given a CRC value and a |
|
1112 character, returns an updated CRC value after applying the CRC-32 |
|
1113 algorithm described elsewhere in this document. |
|
1114 |
|
1115 Step 2 - Decrypting the encryption header |
|
1116 ----------------------------------------- |
|
1117 |
|
1118 The purpose of this step is to further initialize the encryption |
|
1119 keys, based on random data, to render a plaintext attack on the |
|
1120 data ineffective. |
|
1121 |
|
1122 Read the 12-byte encryption header into Buffer, in locations |
|
1123 Buffer(0) thru Buffer(11). |
|
1124 |
|
1125 loop for i <- 0 to 11 |
|
1126 C <- buffer(i) ^ decrypt_byte() |
|
1127 update_keys(C) |
|
1128 buffer(i) <- C |
|
1129 end loop |
|
1130 |
|
1131 Where decrypt_byte() is defined as: |
|
1132 |
|
1133 unsigned char decrypt_byte() |
|
1134 local unsigned short temp |
|
1135 temp <- Key(2) | 2 |
|
1136 decrypt_byte <- (temp * (temp ^ 1)) >> 8 |
|
1137 end decrypt_byte |
|
1138 |
|
1139 After the header is decrypted, the last 1 or 2 bytes in Buffer |
|
1140 should be the high-order word/byte of the CRC for the file being |
|
1141 decrypted, stored in Intel low-byte/high-byte order. Versions of |
|
1142 PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is |
|
1143 used on versions after 2.0. This can be used to test if the password |
|
1144 supplied is correct or not. |
|
1145 |
|
1146 Step 3 - Decrypting the compressed data stream |
|
1147 ---------------------------------------------- |
|
1148 |
|
1149 The compressed data stream can be decrypted as follows: |
|
1150 |
|
1151 loop until done |
|
1152 read a character into C |
|
1153 Temp <- C ^ decrypt_byte() |
|
1154 update_keys(temp) |
|
1155 output Temp |
|
1156 end loop |
|
1157 |
|
1158 In addition to the above mentioned contributors to PKZIP and PKUNZIP, |
|
1159 I would like to extend special thanks to Robert Mahoney for suggesting |
|
1160 the extension .ZIP for this software. |
|
1161 |
|
1162 References: |
|
1163 |
|
1164 Fiala, Edward R., and Greene, Daniel H., "Data compression with |
|
1165 finite windows", Communications of the ACM, Volume 32, Number 4, |
|
1166 April 1989, pages 490-505. |
|
1167 |
|
1168 Held, Gilbert, "Data Compression, Techniques and Applications, |
|
1169 Hardware and Software Considerations", John Wiley & Sons, 1987. |
|
1170 |
|
1171 Huffman, D.A., "A method for the construction of minimum-redundancy |
|
1172 codes", Proceedings of the IRE, Volume 40, Number 9, September 1952, |
|
1173 pages 1098-1101. |
|
1174 |
|
1175 Nelson, Mark, "LZW Data Compression", Dr. Dobbs Journal, Volume 14, |
|
1176 Number 10, October 1989, pages 29-37. |
|
1177 |
|
1178 Nelson, Mark, "The Data Compression Book", M&T Books, 1991. |
|
1179 |
|
1180 Storer, James A., "Data Compression, Methods and Theory", |
|
1181 Computer Science Press, 1988 |
|
1182 |
|
1183 Welch, Terry, "A Technique for High-Performance Data Compression", |
|
1184 IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19. |
|
1185 |
|
1186 Ziv, J. and Lempel, A., "A universal algorithm for sequential data |
|
1187 compression", Communications of the ACM, Volume 30, Number 6, |
|
1188 June 1987, pages 520-540. |
|
1189 |
|
1190 Ziv, J. and Lempel, A., "Compression of individual sequences via |
|
1191 variable-rate coding", IEEE Transactions on Information Theory, |
|
1192 Volume 24, Number 5, September 1978, pages 530-536. |