intl/hyphenation/src/README

author       Michael Schloh von Bennewitz <michael@schloh.com>
date         Wed, 31 Dec 2014 06:09:35 +0100
changeset 0  6474c204b198
permissions  -rw-r--r--

Cloned upstream origin tor-browser at tor-browser-31.3.0esr-4.5-1-build1
revision ID fc1c9ff7c1b2defdbc039f12214767608f46423f for hacking purpose.

Hyphen - hyphenation library to use converted TeX hyphenation patterns

(C) 1998 Raph Levien
(C) 2001 ALTLinux, Moscow
(C) 2006, 2007, 2008, 2010, 2011 László Németh

This was part of the libHnj library by Raph Levien.

Peter Novodvorsky from ALTLinux extracted the hyphenation part of libHnj
for use in OpenOffice.org.

Compound word and non-standard hyphenation support by László Németh.

The license is the original LibHnj license:
LibHnj is dual-licensed under the LGPL and the MPL (see also README.libhnj).

Because the LGPL allows relicensing under the GPL, COPYING now contains
an LGPL/GPL/MPL tri-license for explicit compatibility with Mozilla sources.

The original Libhnj source with OOo's patches is maintained by Rene Engelhard
and Chris Halls at Debian:

http://packages.debian.org/stable/libdevel/libhnj-dev
and http://packages.debian.org/unstable/source/libhnj


OTHER FILES

This distribution is also the source of the en_US hyphenation patterns
"hyph_en_US.dic". See README_hyph_en_US.txt.

Source files of hyph_en_US.dic in the distribution:

hyphen.tex (en_US hyphenation patterns from plain TeX)

Source: http://tug.ctan.org/text-archive/macros/plain/base/hyphen.tex

tbhyphext.tex (hyphenation exception log from the TUGboat archive)

Source of the hyphenation exception list:
http://www.ctan.org/tex-archive/info/digests/tugboat/tb0hyf.tex

Generated with the hyphenex script
(http://www.ctan.org/tex-archive/info/digests/tugboat/hyphenex.sh)

sh hyphenex.sh <tb0hyf.tex >tbhyphext.tex


INSTALLATION

./configure
make
make install

UNIT TESTS (WITH VALGRIND DEBUGGER)

make check
VALGRIND=memcheck make check

USAGE

./example hyph_en_US.dic mywords.txt

or (under Linux)

echo example | ./example hyph_en_US.dic /dev/stdin

NOTE: For Unicode-encoded input, convert your words to lowercase
before hyphenation (in a UTF-8 console environment):

awk '{print tolower($0)}' mywords.txt >mywordslow.txt

DEVELOPMENT

See README.hyphen for the hyphenation algorithm, README.nonstandard
and doc/tb87nemeth.pdf for non-standard hyphenation, README.compound
for compound word hyphenation, and the test cases in tests/*.
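README.hyphen is the reference for the algorithm. For orientation only, the general technique behind TeX patterns (Liang's method) can be sketched in C in its simplest, naive form: every pattern is compared at every position of the word framed by dots, the largest digit seen for each gap between letters wins, and an odd value marks a permitted break. This is not libhnj's code (the library loads the patterns into a finite-state machine instead); MAXW, split_pattern, hyphenate and demo_patterns are hypothetical names, and the sketch assumes lowercase ASCII words.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXW 64 /* hypothetical word-length limit for this sketch */

/* Split a TeX-style pattern such as "hen5at" into its letters ("henat")
   and the digits between them: vals[i] is the value in front of letters[i],
   vals[n] the value after the last letter (0 where no digit is given).
   Patterns are assumed to be shorter than MAXW. */
static size_t split_pattern(const char *pat, char *letters, int *vals) {
    size_t n = 0;
    vals[0] = 0;
    for (const char *p = pat; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            vals[n] = *p - '0';
        } else {
            letters[n++] = *p;
            vals[n] = 0;
        }
    }
    letters[n] = '\0';
    return n;
}

/* Naive Liang-style hyphenation (a sketch, not libhnj's state machine).
   lhmin/rhmin play the role of LEFTHYPHENMIN/RIGHTHYPHENMIN.
   out must hold at least 2 * strlen(word) + 1 bytes. */
static void hyphenate(const char *word, const char *const *patterns,
                      size_t npat, size_t lhmin, size_t rhmin, char *out) {
    size_t len = strlen(word);
    assert(len <= MAXW);
    char dotted[MAXW + 3];
    int score[MAXW + 3] = {0}; /* score[j]: value in front of dotted[j] */
    snprintf(dotted, sizeof dotted, ".%s.", word);
    size_t dlen = len + 2;

    for (size_t k = 0; k < npat; k++) {
        char letters[MAXW + 3];
        int vals[MAXW + 4];
        size_t plen = split_pattern(patterns[k], letters, vals);
        for (size_t start = 0; start + plen <= dlen; start++) {
            if (strncmp(dotted + start, letters, plen) != 0)
                continue;
            /* keep the highest value seen for each gap */
            for (size_t i = 0; i <= plen; i++)
                if (vals[i] > score[start + i])
                    score[start + i] = vals[i];
        }
    }

    /* The gap between word[i-1] and word[i] is score[i + 1]. */
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (i > 0 && i >= lhmin && len - i >= rhmin && score[i + 1] % 2 == 1)
            out[o++] = '-';
        out[o++] = word[i];
    }
    out[o] = '\0';
}

/* Hypothetical sample patterns, chosen only to illustrate the technique. */
static const char *const demo_patterns[] = {
    "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"};
```

With lhmin = 2 and rhmin = 3, hyphenate("hyphenation", demo_patterns, 9, 2, 3, out) yields "hy-phen-ation"; raising rhmin to 6 suppresses the second break and gives "hy-phenation". Real dictionaries are far larger, which is one reason a state machine beats this quadratic scan.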

Description of the dictionary format:

The first line contains the character encoding (ISO8859-x or UTF-8).

Possible options on the following lines:

LEFTHYPHENMIN num           minimal hyphenation distance from the left word end
RIGHTHYPHENMIN num          minimal hyphenation distance from the right word end
COMPOUNDLEFTHYPHENMIN num   minimal hyphenation distance from the left
                            compound word boundary
COMPOUNDRIGHTHYPHENMIN num  minimal hyphenation distance from the right
                            compound word boundary

hyphenation patterns        see README.* files

NEXTWORD                    separates the two compound sets (see README.compound)

Default values:
Without explicit declarations, the hyphenmin fields of the dict struct
are zero, but in this case lefthyphenmin and righthyphenmin default
to 2 during hyphenation (for backward compatibility).

Comments

Use a percent sign at the beginning of a line to add a comment to your
hyphenation patterns (after the character encoding on the first line):

% comment
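To make the header format concrete, here is a minimal C sketch of a reader for the encoding line, the comment lines and the four *HYPHENMIN options, together with the default-of-2 rule. It is not libhnj's loader: struct dict_header, parse_header and effective_hyphenmin are hypothetical names, the whole dictionary is assumed to fit in a 4 KB buffer, and pattern lines and NEXTWORD are simply skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the header values; libhnj's own
   dictionary struct is organised differently. */
struct dict_header {
    char encoding[32];
    int lhmin, rhmin;   /* LEFTHYPHENMIN, RIGHTHYPHENMIN (0 = undeclared) */
    int clhmin, crhmin; /* COMPOUNDLEFTHYPHENMIN, COMPOUNDRIGHTHYPHENMIN */
};

/* Line 1 is the encoding, '%' lines are comments, "KEYWORD num" lines set
   options; anything else (patterns, NEXTWORD) is ignored by this sketch. */
static void parse_header(const char *text, struct dict_header *h) {
    assert(text != NULL && h != NULL);
    char buf[4096];
    snprintf(buf, sizeof buf, "%s", text); /* strtok needs a writable copy */
    memset(h, 0, sizeof *h);

    char *line = strtok(buf, "\r\n");
    if (line == NULL)
        return;
    snprintf(h->encoding, sizeof h->encoding, "%s", line);

    while ((line = strtok(NULL, "\r\n")) != NULL) {
        char key[32];
        int num;
        if (line[0] == '%' || sscanf(line, "%31s %d", key, &num) != 2)
            continue; /* comment, pattern or NEXTWORD line */
        if (strcmp(key, "LEFTHYPHENMIN") == 0)
            h->lhmin = num;
        else if (strcmp(key, "RIGHTHYPHENMIN") == 0)
            h->rhmin = num;
        else if (strcmp(key, "COMPOUNDLEFTHYPHENMIN") == 0)
            h->clhmin = num;
        else if (strcmp(key, "COMPOUNDRIGHTHYPHENMIN") == 0)
            h->crhmin = num;
    }
}

/* The documented default: an undeclared (zero) left or right
   hyphenmin behaves as 2 during hyphenation. */
static int effective_hyphenmin(int declared) {
    return declared > 0 ? declared : 2;
}
```

The README states this default only for the left and right values, so the sketch does not apply effective_hyphenmin to the compound ones.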

*****************************************************************************
* Warning! Libhnj *needs* prepared hyphenation patterns to work correctly.  *

For example, generating hyph_en_US.dic from "hyphen.us" TeX patterns:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1

or with default LEFTHYPHENMIN and RIGHTHYPHENMIN values:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1 2 3
perl substrings.pl hyphen.gb hyph_en_GB.dic ISO8859-1 3 3
****************************************************************************

OTHERS

Java hyphenation: Peter B. West (Folio project) implements a hyphenator with
non-standard hyphenation facilities based on an extended Libhnj. The HyFo
module is released in binary form as jar files and in source form as zip files.
See http://sourceforge.net/project/showfiles.php?group_id=119136

László Németh
<nemeth (at) numbertext (dot) org>