Wed, 31 Dec 2014 07:22:50 +0100
Correct previous dual key logic pending first delivery installment.
michael@0 | 1 | Hyphen - hyphenation library to use converted TeX hyphenation patterns |
michael@0 | 2 | |
michael@0 | 3 | (C) 1998 Raph Levien |
michael@0 | 4 | (C) 2001 ALTLinux, Moscow |
michael@0 | 5 | (C) 2006, 2007, 2008, 2010, 2011 László Németh |
michael@0 | 6 | |
michael@0 | 7 | This was part of libHnj library by Raph Levien. |
michael@0 | 8 | |
michael@0 | 9 | Peter Novodvorsky from ALTLinux cut hyphenation part from libHnj |
michael@0 | 10 | to use it in OpenOffice.org. |
michael@0 | 11 | |
michael@0 | 12 | Compound word and non-standard hyphenation support by László Németh. |
michael@0 | 13 | |
michael@0 | 14 | License is the original LibHnj license: |
michael@0 | 15 | LibHnj is dual licensed under LGPL and MPL (see also README.libhnj). |
michael@0 | 16 | |
michael@0 | 17 | Because LGPL allows GPL relicensing, COPYING now contains the |
michael@0 | 18 | LGPL/GPL/MPL tri-license for explicit Mozilla source compatibility. |
michael@0 | 19 | |
michael@0 | 20 | The original Libhnj source with OOo's patches is managed by Rene Engelhard |
michael@0 | 21 | and Chris Halls at Debian: |
michael@0 | 22 | |
michael@0 | 23 | http://packages.debian.org/stable/libdevel/libhnj-dev |
michael@0 | 24 | and http://packages.debian.org/unstable/source/libhnj |
michael@0 | 25 | |
michael@0 | 26 | |
michael@0 | 27 | OTHER FILES |
michael@0 | 28 | |
michael@0 | 29 | This distribution is also the source of the en_US hyphenation patterns |
michael@0 | 30 | "hyph_en_US.dic". See README_hyph_en_US.txt. |
michael@0 | 31 | |
michael@0 | 32 | Source files of hyph_en_US.dic in the distribution: |
michael@0 | 33 | |
michael@0 | 34 | hyphen.tex (en_US hyphenation patterns from plain TeX) |
michael@0 | 35 | |
michael@0 | 36 | Source: http://tug.ctan.org/text-archive/macros/plain/base/hyphen.tex |
michael@0 | 37 | |
michael@0 | 38 | tbhyphext.tex: hyphenation exception log from the TUGboat archive |
michael@0 | 39 | |
michael@0 | 40 | Source of the hyphenation exception list: |
michael@0 | 41 | http://www.ctan.org/tex-archive/info/digests/tugboat/tb0hyf.tex |
michael@0 | 42 | |
michael@0 | 43 | Generated with the hyphenex script |
michael@0 | 44 | (http://www.ctan.org/tex-archive/info/digests/tugboat/hyphenex.sh) |
michael@0 | 45 | |
michael@0 | 46 | sh hyphenex.sh <tb0hyf.tex >tbhyphext.tex |
michael@0 | 47 | |
michael@0 | 48 | |
michael@0 | 49 | INSTALLATION |
michael@0 | 50 | |
michael@0 | 51 | ./configure |
michael@0 | 52 | make |
michael@0 | 53 | make install |
michael@0 | 54 | |
michael@0 | 55 | UNIT TESTS (WITH VALGRIND DEBUGGER) |
michael@0 | 56 | |
michael@0 | 57 | make check |
michael@0 | 58 | VALGRIND=memcheck make check |
michael@0 | 59 | |
michael@0 | 60 | USAGE |
michael@0 | 61 | |
michael@0 | 62 | ./example hyph_en_US.dic mywords.txt |
michael@0 | 63 | |
michael@0 | 64 | or (under Linux) |
michael@0 | 65 | |
michael@0 | 66 | echo example | ./example hyph_en_US.dic /dev/stdin |
michael@0 | 67 | |
michael@0 | 68 | NOTE: For Unicode-encoded input, convert your words |
michael@0 | 69 | to lowercase before hyphenation (in a UTF-8 console environment): |
michael@0 | 70 | |
michael@0 | 71 | cat mywords.txt | awk '{print tolower($0)}' >mywordslow.txt |
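
The same lowercase conversion can be sketched in Python, where str.lower()
handles Unicode letters independently of the console locale. This is only an
illustration: the helper name lowercase_words is hypothetical and not part of
this distribution, and the example uses in-memory text instead of real files.

```python
import io


def lowercase_words(src, dst):
    """Copy lines (one word per line) from src to dst, converted to lowercase."""
    for line in src:
        dst.write(line.lower())


# In-memory demonstration. With real files one would instead pass
# open("mywords.txt", encoding="utf-8") and
# open("mywordslow.txt", "w", encoding="utf-8").
out = io.StringIO()
lowercase_words(io.StringIO("Árvíztűrő\nExample\n"), out)
result = out.getvalue()
print(result, end="")
```

The output can then be passed to ./example in place of mywords.txt.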
michael@0 | 72 | |
michael@0 | 73 | DEVELOPMENT |
michael@0 | 74 | |
michael@0 | 75 | See README.hyphen for hyphenation algorithm, README.nonstandard |
michael@0 | 76 | and doc/tb87nemeth.pdf for non-standard hyphenation, |
michael@0 | 77 | README.compound for compound word hyphenation, and tests/*. |
michael@0 | 78 | |
michael@0 | 79 | Description of the dictionary format: |
michael@0 | 80 | |
michael@0 | 81 | First line contains the character encoding (ISO8859-x, UTF-8). |
michael@0 | 82 | |
michael@0 | 83 | Possible options in the following lines: |
michael@0 | 84 | |
michael@0 | 85 | LEFTHYPHENMIN num minimal hyphenation distance from the left word end |
michael@0 | 86 | RIGHTHYPHENMIN num minimal hyphenation distance from the right word end |
michael@0 | 87 | COMPOUNDLEFTHYPHENMIN num min. hyph. dist. from the left compound word boundary |
michael@0 | 88 | COMPOUNDRIGHTHYPHENMIN num min. hyph. dist. from the right comp. word boundary |
michael@0 | 89 | |
michael@0 | 90 | hyphenation patterns see README.* files |
michael@0 | 91 | |
michael@0 | 92 | NEXTWORD separates the two compound sets (see README.compound) |
michael@0 | 93 | |
michael@0 | 94 | Default values: |
michael@0 | 95 | Without explicit declarations, the hyphenmin fields of the dict struct |
michael@0 | 96 | are zero, but in this case lefthyphenmin and righthyphenmin |
michael@0 | 97 | default to 2 during hyphenation (for backward compatibility). |
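
The default rule above can be illustrated with a small Python sketch. It is
not the library's code: the function names are hypothetical, the candidate
break positions are made up for the example, and distances are assumed to be
counted in characters.

```python
DEFAULT_HYPHENMIN = 2  # fallback value named in the text above


def effective_hyphenmin(declared):
    """Return the declared value, or 2 when the field is zero (undeclared)."""
    return declared if declared > 0 else DEFAULT_HYPHENMIN


def allowed_breaks(word, candidates, lefthyphenmin=0, righthyphenmin=0):
    """Keep the break positions that respect both minimal distances.

    A position p means "break after the first p characters of word".
    """
    left = effective_hyphenmin(lefthyphenmin)
    right = effective_hyphenmin(righthyphenmin)
    return [p for p in candidates if p >= left and len(word) - p >= right]


# Hypothetical candidate positions for the 7-letter word "example".
candidates = [1, 2, 4, 5, 6]
default_result = allowed_breaks("example", candidates)       # zero fields
explicit_result = allowed_breaks("example", candidates, 2, 3)
print(default_result)   # [2, 4, 5]
print(explicit_result)  # [2, 4]
```

With zero fields, both sides fall back to 2, so positions 1 and 6 are
dropped. Raising the right distance to 3 also drops position 5.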
michael@0 | 98 | |
michael@0 | 99 | Comments |
michael@0 | 100 | |
michael@0 | 101 | Use a percent sign at the beginning of a line to add comments to your |
michael@0 | 102 | hyphenation patterns (after the character encoding in the first line): |
michael@0 | 103 | |
michael@0 | 104 | % comment |
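
The dictionary layout described above (encoding on the first line, optional
hyphenmin options, percent-sign comments, then patterns) can be read with a
short Python sketch. This is an assumption-laden illustration, not the
library's parser: read_dictionary is a hypothetical helper, the sample
patterns are invented, blank lines are simply skipped, and pattern lines are
returned unparsed.

```python
OPTION_NAMES = {
    "LEFTHYPHENMIN",
    "RIGHTHYPHENMIN",
    "COMPOUNDLEFTHYPHENMIN",
    "COMPOUNDRIGHTHYPHENMIN",
}


def read_dictionary(text):
    """Split dictionary text into encoding, numeric options and other lines."""
    lines = text.splitlines()
    encoding = lines[0].strip()      # first line: character encoding
    options = {}
    rest = []                        # patterns and anything else, unparsed
    for line in lines[1:]:
        if not line.strip() or line.startswith("%"):
            continue                 # blank line or % comment
        fields = line.split()
        if (len(fields) == 2 and fields[0] in OPTION_NAMES
                and fields[1].isdigit()):
            options[fields[0]] = int(fields[1])
        else:
            rest.append(line.strip())
    return encoding, options, rest


# Invented sample; the two pattern lines are placeholders only.
sample = """UTF-8
% a comment
LEFTHYPHENMIN 2
RIGHTHYPHENMIN 3
.ex1am
1ple
"""
encoding, options, rest = read_dictionary(sample)
print(encoding, options, rest)
```

Options that are absent from the returned mapping correspond to the zero
fields covered under "Default values".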
michael@0 | 105 | |
michael@0 | 106 | ***************************************************************************** |
michael@0 | 107 | * Warning! Correct working of Libhnj *needs* prepared hyphenation patterns. * |
michael@0 | 108 | |
michael@0 | 109 | For example, generating hyph_en_US.dic from "hyphen.us" TeX patterns: |
michael@0 | 110 | |
michael@0 | 111 | perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1 |
michael@0 | 112 | |
michael@0 | 113 | or with default LEFTHYPHENMIN and RIGHTHYPHENMIN values: |
michael@0 | 114 | |
michael@0 | 115 | perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1 2 3 |
michael@0 | 116 | perl substrings.pl hyphen.gb hyph_en_GB.dic ISO8859-1 3 3 |
michael@0 | 117 | **************************************************************************** |
michael@0 | 118 | |
michael@0 | 119 | OTHERS |
michael@0 | 120 | |
michael@0 | 121 | Java hyphenation: Peter B. West (Folio project) implements a hyphenator with |
michael@0 | 122 | non-standard hyphenation facilities based on extended Libhnj. The HyFo module |
michael@0 | 123 | is released in binary form as jar files and in source form as zip files. |
michael@0 | 124 | See http://sourceforge.net/project/showfiles.php?group_id=119136 |
michael@0 | 125 | |
michael@0 | 126 | László Németh |
michael@0 | 127 | <nemeth (at) numbertext (dot) org> |