New option of Libhyphen 2.7: NOHYPHEN

Hyphen, apostrophe and other characters may be word boundary characters,
but they don't need (extra) hyphenation. With the NOHYPHEN option
it's possible to hyphenate the word parts correctly.

Example:

ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL

Description:

1-1 and 1'1 declare hyphen and apostrophe as word boundary characters,
and NOHYPHEN with the comma-separated character (or character sequence)
list forbids the (extra) hyphens at the hyphen and apostrophe characters.

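The effect of NOHYPHEN can be pictured with a small Python sketch. It is
a simplified model, not the library's code: the function name
apply_nohyphen, the offset convention and the candidate break list are
all assumptions made for illustration.

```python
def apply_nohyphen(word, breaks, nohyphen):
    """Drop candidate breaks that touch a NOHYPHEN string.

    A break offset k means "a hyphen may be inserted before word[k]".
    """
    allowed = set(breaks)
    for token in nohyphen:
        if not token:
            continue  # ignore empty list entries
        start = word.find(token)
        while start != -1:
            allowed.discard(start)               # break just before the token
            allowed.discard(start + len(token))  # break just after the token
            start = word.find(token, start + 1)
    return sorted(allowed)


# Hypothetical candidate breaks for "e-mail's" (not real pattern output).
print(apply_nohyphen("e-mail's", [1, 2, 4, 6, 7], ["-", "'"]))  # prints [4]
```

Only the break inside "mail" survives; the breaks next to the hyphen and
the apostrophe are removed.
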
Implicit NOHYPHEN declaration

Without an explicit NEXTLEVEL declaration, Hyphen 2.8 uses the
previous settings; in addition, in UTF-8 encoding, the en dash (U+2013)
and the typographical apostrophe (U+2019) are NOHYPHEN characters, too.

It's possible to enlarge the hyphenation distance from these
NOHYPHEN characters by using the COMPOUNDLEFTHYPHENMIN and
COMPOUNDRIGHTHYPHENMIN attributes.

Compound word hyphenation

The Hyphen library supports better compound word hyphenation and the
special rules of compound word hyphenation of Germanic languages and
other languages whose compounds may contain an arbitrary number of
words. The new options, COMPOUNDLEFTHYPHENMIN and
COMPOUNDRIGHTHYPHENMIN, help to set the right style for the hyphenation
of compound words.

Algorithm

The algorithm is an extension of the original pattern-based hyphenation
algorithm. It uses two hyphenation pattern sets, defined in the same
pattern file and separated by the NEXTLEVEL keyword. The first pattern
set is for hyphenation only at compound word boundaries; the second one
is for hyphenation within words or word parts.

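As an illustration of the two-level file layout, the following Python
sketch splits the text of a pattern file at the NEXTLEVEL keyword. The
helper name split_levels and the second-level pattern "a1b" are
hypothetical; option lines such as NOHYPHEN are simply kept with the
level they appear in, and the implicit defaults used when NEXTLEVEL is
missing are not modelled.

```python
def split_levels(pattern_text):
    """Split pattern file text into (encoding, first_level, second_level)."""
    lines = [ln.strip() for ln in pattern_text.splitlines() if ln.strip()]
    encoding, body = lines[0], lines[1:]
    if "NEXTLEVEL" in body:
        cut = body.index("NEXTLEVEL")
        return encoding, body[:cut], body[cut + 1:]
    # No NEXTLEVEL keyword: treat everything as a single in-word pattern set.
    return encoding, [], body


sample = """ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL
a1b
"""
encoding, first, second = split_levels(sample)
print(encoding)  # the encoding line
print(first)     # lines of the compound-level set
print(second)    # lines of the in-word set
```
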
Recursive compound level hyphenation

The algorithm is recursive: every word part produced by a successful
first-level (compound) hyphenation is rehyphenated with the same
(first) pattern set.

Finally, when first-level hyphenation is not possible, Hyphen uses
the second-level hyphenation for the word or the word parts.

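The recursion can be sketched in Python as follows. This is a minimal
model under stated assumptions, not the library's implementation: the
two levels are passed in as plain functions, and the toy rules
toy_first and toy_second are invented stand-ins for real pattern sets.

```python
def hyphenate(word, first_level, second_level):
    """Return break offsets; k means a hyphen may be inserted before word[k]."""
    cuts = sorted(k for k in first_level(word) if 0 < k < len(word))
    if not cuts:
        # First level found nothing: fall back to the second pattern set.
        return sorted(k for k in second_level(word) if 0 < k < len(word))
    result = set(cuts)
    bounds = [0] + cuts + [len(word)]
    for start, end in zip(bounds, bounds[1:]):
        # Rehyphenate every part, beginning again with the first level.
        part = word[start:end]
        result.update(start + k for k in hyphenate(part, first_level, second_level))
    return sorted(result)


def toy_first(word):
    # Hypothetical compound-level rule: split before a later "motor".
    return [i for i in range(1, len(word)) if word.startswith("motor", i)]


def toy_second(word):
    # Hypothetical in-word rule: allow a break between "o" and "t".
    return [i for i in range(1, len(word)) if word[i - 1:i + 1] == "ot"]


print(hyphenate("motormotor", toy_first, toy_second))  # prints [2, 5, 7]
```

The offsets correspond to "mo-tor-mo-tor": offset 5 comes from the
first level, and offsets 2 and 7 come from the second level applied to
each part.
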
Word endings and word parts

Patterns for word endings (patterns containing the boundary marker ".")
match the word parts, too.

Options

COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left compound word boundary
COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right compound word boundary
NEXTLEVEL: marks the start of the second-level hyphenation patterns

Default hyphenmin values

The default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN
are 0, and they are applied as 0 during hyphenation, too. (By contrast,
"0" values of LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2"
during hyphenation.)

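The difference between the two kinds of defaults can be written down as
a short Python sketch. The function name effective_hyphenmins and its
parameters are hypothetical; only the rule itself (0 stays 0 for the
compound options, 0 means 2 for the ordinary ones) comes from the text
above.

```python
def effective_hyphenmins(lefthyphenmin=0, righthyphenmin=0,
                         compoundlefthyphenmin=0, compoundrighthyphenmin=0):
    """Return the values that apply during hyphenation."""
    return {
        # A "0" here stands for the default "2".
        "LEFTHYPHENMIN": lefthyphenmin or 2,
        "RIGHTHYPHENMIN": righthyphenmin or 2,
        # A "0" here is used as is.
        "COMPOUNDLEFTHYPHENMIN": compoundlefthyphenmin,
        "COMPOUNDRIGHTHYPHENMIN": compoundrighthyphenmin,
    }


print(effective_hyphenmins())
print(effective_hyphenmins(lefthyphenmin=3, compoundrighthyphenmin=3))
```
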
Examples

See tests/compound* test files.

Preparation of hyphenation patterns

There is no special pattern generator tool for compound hyphenation
patterns yet. It is possible to use PATGEN to generate both pattern
sets, concatenate them manually and set the requested HYPHENMIN values.
(But don't forget the preprocessing steps by substrings.pl before
concatenation.) One disadvantage of this method is that PATGEN
doesn't know the recursive compound hyphenation of Hyphen.

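The manual concatenation step might look like the Python sketch below.
It only joins two already prepared pattern lists; running PATGEN and
substrings.pl is not modelled. The helper name
build_compound_pattern_file, the sample patterns, and the placement of
the option lines directly after the encoding line are assumptions, so
compare the result with the tests/compound* files before relying on it.

```python
def build_compound_pattern_file(encoding, compound_patterns, word_patterns,
                                compound_left=0, compound_right=0):
    """Join two pattern sets into the text of one two-level pattern file."""
    lines = [encoding]
    # Assumed placement: option lines follow the encoding line.
    if compound_left:
        lines.append(f"COMPOUNDLEFTHYPHENMIN {compound_left}")
    if compound_right:
        lines.append(f"COMPOUNDRIGHTHYPHENMIN {compound_right}")
    lines.extend(compound_patterns)  # first level: compound boundaries
    lines.append("NEXTLEVEL")
    lines.extend(word_patterns)      # second level: within word parts
    return "\n".join(lines) + "\n"


# Hypothetical one-pattern sets, for illustration only.
text = build_compound_pattern_file("ISO8859-1", ["1motor1"], ["o1t"],
                                   compound_left=2, compound_right=3)
print(text, end="")
```
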
László Németh
<nemeth (at) openoffice.org>