Hyphen - hyphenation library to use converted TeX hyphenation patterns

(C) 1998 Raph Levien
(C) 2001 ALTLinux, Moscow
(C) 2006, 2007, 2008, 2010, 2011 László Németh

This was part of the libHnj library by Raph Levien.

Peter Novodvorsky from ALTLinux cut the hyphenation part out of libHnj
to use it in OpenOffice.org.

Compound word and non-standard hyphenation support by László Németh.

License is the original LibHnj license:
LibHnj is dual licensed under LGPL and MPL (see also README.libhnj).

Because LGPL allows GPL relicensing, COPYING now contains the
LGPL/GPL/MPL tri-license for explicit Mozilla source compatibility.

The original Libhnj source with OOo's patches is managed by Rene Engelhard
and Chris Halls at Debian:

http://packages.debian.org/stable/libdevel/libhnj-dev
and http://packages.debian.org/unstable/source/libhnj


OTHER FILES

This distribution is also the source of the en_US hyphenation patterns
"hyph_en_US.dic". See README_hyph_en_US.txt.

Source files of hyph_en_US.dic in the distribution:

hyphen.tex (en_US hyphenation patterns from plain TeX)

    Source: http://tug.ctan.org/text-archive/macros/plain/base/hyphen.tex

tbhyphext.tex: hyphenation exception log from the TUGboat archive

    Source of the hyphenation exception list:
    http://www.ctan.org/tex-archive/info/digests/tugboat/tb0hyf.tex

    Generated with the hyphenex script
    (http://www.ctan.org/tex-archive/info/digests/tugboat/hyphenex.sh):

    sh hyphenex.sh tbhyphext.tex


INSTALLATION

./configure
make
make install

UNIT TESTS (WITH VALGRIND DEBUGGER)

make check
VALGRIND=memcheck make check

USAGE

./example hyph_en_US.dic mywords.txt

or (under Linux)

echo example | ./example hyph_en_US.dic /dev/stdin

NOTE: For Unicode-encoded input, convert your words to lowercase
before hyphenation (in a UTF-8 console environment):

awk '{print tolower($0)}' mywords.txt > mywordslow.txt

DEVELOPMENT

See README.hyphen for the hyphenation algorithm, README.nonstandard
and doc/tb87nemeth.pdf for non-standard hyphenation,
README.compound for compound word hyphenation, and tests/*.

Description of the dictionary format:

The first line contains the character encoding (ISO8859-x, UTF-8).
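
A minimal Python sketch of how a reader could process this format, assuming a simplified line-oriented layout: the encoding on the first line, then the option lines, % comments, pattern lines and NEXTWORD separator described in the following paragraphs. It is not the library's own C loader; DicHeader, parse_dic, effective_minimums and allowed_breaks are hypothetical names, every unrecognized non-empty line is treated as a pattern, and no validation is done. The fallback to 2 mirrors the default values documented below.

```python
from dataclasses import dataclass, field

# Option keywords documented in this README; each takes one integer argument.
OPTIONS = ("LEFTHYPHENMIN", "RIGHTHYPHENMIN",
           "COMPOUNDLEFTHYPHENMIN", "COMPOUNDRIGHTHYPHENMIN")


@dataclass
class DicHeader:  # hypothetical container, not part of the Hyphen API
    encoding: str
    # Undeclared options stay 0, like the zeroed hyphenmin fields of the dict struct.
    minimums: dict = field(default_factory=lambda: dict.fromkeys(OPTIONS, 0))
    # Pattern lines, split into compound sets at each NEXTWORD line.
    pattern_sets: list = field(default_factory=lambda: [[]])


def parse_dic(lines):
    """Parse a simplified .dic file given as an iterable of text lines."""
    it = iter(lines)
    header = DicHeader(encoding=next(it).strip())  # first line: ISO8859-x or UTF-8
    for raw in it:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        parts = line.split()
        if parts[0] in OPTIONS and len(parts) == 2:
            header.minimums[parts[0]] = int(parts[1])
        elif line == "NEXTWORD":
            header.pattern_sets.append([])  # start the next compound set
        else:
            header.pattern_sets[-1].append(line)  # anything else: a pattern line
    return header


def effective_minimums(header):
    """Left/right minimums used when hyphenating: undeclared (0) becomes 2."""
    left = header.minimums["LEFTHYPHENMIN"] or 2
    right = header.minimums["RIGHTHYPHENMIN"] or 2
    return left, right


def allowed_breaks(word, offsets, left, right):
    """Keep break offsets i (word[:i] | word[i:]) that respect both minimums."""
    return [i for i in offsets if i >= left and len(word) - i >= right]
```

For example, with LEFTHYPHENMIN 2 and RIGHTHYPHENMIN 3, allowed_breaks("example", [1, 2, 4, 6], 2, 3) keeps only the offsets [2, 4] (ex-am-ple): offset 1 leaves a single letter on the left and offset 6 a single letter on the right.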

Possible options in the following lines:

LEFTHYPHENMIN num           minimal hyphenation distance from the left word end
RIGHTHYPHENMIN num          minimal hyphenation distance from the right word end
COMPOUNDLEFTHYPHENMIN num   minimal hyphenation distance from the left compound word boundary
COMPOUNDRIGHTHYPHENMIN num  minimal hyphenation distance from the right compound word boundary

hyphenation patterns        see README.* files

NEXTWORD                    separates the two compound sets (see README.compound)

Default values:
Without explicit declarations, the hyphenmin fields of the dict struct
are zero, but in this case lefthyphenmin and righthyphenmin default
to 2 during hyphenation (for backward compatibility).

Comments

Use a percent sign at the beginning of a line to add comments to your
hyphenation patterns (after the character encoding in the first line):

% comment

*****************************************************************************
* Warning! Libhnj *needs* prepared hyphenation patterns to work correctly.  *

For example, generating hyph_en_US.dic from "hyphen.us" TeX patterns:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1

or with explicit LEFTHYPHENMIN and RIGHTHYPHENMIN values:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1 2 3
perl substrings.pl hyphen.gb hyph_en_GB.dic ISO8859-1 3 3
*****************************************************************************

OTHERS

Java hyphenation: Peter B. West (Folio project) implements a hyphenator with
non-standard hyphenation facilities based on extended Libhnj.
The HyFo module is released in binary form as jar files and in source form
as zip files. See http://sourceforge.net/project/showfiles.php?group_id=119136

László Németh