intl/hyphenation/src/README.compound

branch
TOR_BUG_3246
changeset 7
129ffea94266
1 New option of Libhyphen 2.7: NOHYPHEN
2
3 Hyphen, apostrophe and other characters may be word boundary characters,
4 but they don't need (extra) hyphenation. With the NOHYPHEN option
5 it's possible to hyphenate the word parts correctly.
6
7 Example:
8
9 ISO8859-1
10 NOHYPHEN -,'
11 1-1
12 1'1
13 NEXTLEVEL
14
15 Description:
16
17 1-1 and 1'1 declare hyphen and apostrophe as word boundary characters,
18 and NOHYPHEN with the comma-separated character (or character sequence)
19 list forbids the (extra) hyphens at the hyphen and apostrophe characters.
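
The effect described above can be pictured as a filter over candidate break points. The following Python sketch is an illustration only, not Hyphen's implementation: the function name, the representation of breaks as character offsets, and the rule that breaks directly before and directly after a listed sequence are dropped are all assumptions made for this example.

```python
def drop_breaks_at(word, breaks, nohyphen):
    """Remove candidate breaks that touch a sequence listed in NOHYPHEN.

    word     -- the word being hyphenated
    breaks   -- set of offsets i meaning "a break is allowed before word[i]"
    nohyphen -- comma-separated list as written after NOHYPHEN, e.g. "-,'"
    """
    forbidden = set()
    for seq in nohyphen.split(","):
        if not seq:
            continue
        start = word.find(seq)
        while start != -1:
            forbidden.add(start)             # break just before the sequence
            forbidden.add(start + len(seq))  # break just after the sequence
            start = word.find(seq, start + 1)
    return breaks - forbidden


# Hypothetical candidate breaks for "well-known"; the hyphen sits at offset 4.
candidates = {2, 4, 5, 7}
kept = drop_breaks_at("well-known", candidates, "-,'")
```

Here the breaks at offsets 4 and 5 (around the existing hyphen) are discarded, while the breaks inside the two word parts are kept.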
20
21 Implicit NOHYPHEN declaration
22
23 Without an explicit NEXTLEVEL declaration, Hyphen 2.8 uses the
24 previous settings; in addition, in UTF-8 encoding, the en dash (U+2013) and
25 the typographical apostrophe (U+2019) are NOHYPHEN characters, too.
26
27 It's possible to enlarge the hyphenation distance from these
28 NOHYPHEN characters by using COMPOUNDLEFTHYPHENMIN and
29 COMPOUNDRIGHTHYPHENMIN attributes.
30
31 Compound word hyphenation
32
33 The Hyphen library supports better compound word hyphenation and the special
34 compound word hyphenation rules of Germanic languages and other
35 languages that allow compounds built from an arbitrary number of words. The new options,
36 COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN, help to set the right
37 style for the hyphenation of compound words.
38
39 Algorithm
40
41 The algorithm is an extension of the original pattern-based hyphenation
42 algorithm. It uses two hyphenation pattern sets, defined in the same
43 pattern file and separated by the NEXTLEVEL keyword. The first pattern
44 set is for hyphenation only at compound word boundaries; the second one
45 is for hyphenation within words or word parts.
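
The file layout described above (one pattern file, two sets, NEXTLEVEL in between) can be sketched as a small splitter. This is a simplified illustration, not Hyphen's loader: the function name is hypothetical, the pattern "a1b" is invented for the example, and treating a file without NEXTLEVEL as a single second-level set is an assumption of this sketch.

```python
def split_levels(text):
    """Split pattern-file text into (encoding, first_level, second_level).

    Each level is returned as a list of non-empty lines (options and patterns).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    encoding = lines[0]          # first line names the encoding
    body = lines[1:]
    if "NEXTLEVEL" in body:
        cut = body.index("NEXTLEVEL")
        return encoding, body[:cut], body[cut + 1:]
    # Assumption for this sketch: no NEXTLEVEL means one in-word pattern set.
    return encoding, [], body


sample = """ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL
a1b
"""
encoding, first_level, second_level = split_levels(sample)
```

With the sample text, the lines before NEXTLEVEL end up in first_level and the single invented pattern ends up in second_level.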
46
47 Recursive compound level hyphenation
48
49 The algorithm is recursive: every word part of a successful
50 first (compound) level hyphenation will be rehyphenated
51 by the same (first) pattern set.
52
53 Finally, when first level hyphenation is not possible, Hyphen uses
54 the second level hyphenation for the word or the word parts.
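
The recursion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hyphen's code: the two pattern sets are replaced by hypothetical callables (toy_first, toy_second) that return the parts of a string, and the stems and lookup table are invented for the example.

```python
def hyphenate(word, first_level, second_level):
    """Return the fragments produced by two-level hyphenation.

    first_level and second_level stand in for the two pattern sets: each takes
    a string and returns its parts (a one-element list means "no break found").
    """
    parts = first_level(word)
    if len(parts) > 1:
        # Successful compound split: retry the first level on every part.
        result = []
        for part in parts:
            result.extend(hyphenate(part, first_level, second_level))
        return result
    # No compound boundary found: fall back to the second level.
    return second_level(word)


def toy_first(word):
    # Hypothetical compound splitter: breaks after a known leading stem.
    for stem in ("sonnen", "blumen"):
        if word.startswith(stem) and len(word) > len(stem):
            return [word[:len(stem)], word[len(stem):]]
    return [word]


def toy_second(word):
    # Hypothetical in-word splitter backed by a tiny lookup table.
    table = {"sonnen": ["son", "nen"], "blumen": ["blu", "men"]}
    return table.get(word, [word])


fragments = hyphenate("sonnenblumenkern", toy_first, toy_second)
```

The word is first split at its compound boundaries (sonnen, blumen, kern); only when no further compound split is found is each part handed to the second level.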
55
56 Word endings and word parts
57
58 Patterns for word endings (patterns with ellipses) match the
59 word parts, too.
60
61 Options
62
63 COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left compound word boundary
64 COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right compound word boundary
65 NEXTLEVEL: marks the start of the second level hyphenation patterns
66
67 Default hyphenmin values
68
69 The default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN are 0,
70 and 0 is also the value applied during hyphenation. (By contrast, "0" values of
71 LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" during hyphenation.)
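
The difference between the two kinds of defaults can be written down as a small resolver. The function name is hypothetical and the sketch only restates the rule given above; it is not taken from Hyphen's sources.

```python
def effective_min(option, value=0):
    """Resolve a hyphenmin setting to the distance applied during hyphenation."""
    if option in ("LEFTHYPHENMIN", "RIGHTHYPHENMIN") and value == 0:
        return 2   # for the plain options, "0" stands for the default of 2
    return value   # for the COMPOUND* options, 0 really means 0


plain_default = effective_min("LEFTHYPHENMIN")
compound_default = effective_min("COMPOUNDLEFTHYPHENMIN")
```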
72
73 Examples
74
75 See tests/compound* test files.
76
77 Preparation of hyphenation patterns
78
79 There is no special pattern generator tool for compound hyphenation
80 patterns yet. It is possible to use PATGEN to generate both
81 pattern sets, concatenate them manually and set the requested HYPHENMIN values.
82 (But don't forget the preprocessing step with substrings.pl before
83 concatenation.) One disadvantage of this method is that PATGEN
84 doesn't know about the recursive compound hyphenation of Hyphen.
85
86 László Németh
87 <nemeth (at) openoffice.org>