Wed, 31 Dec 2014 07:22:50 +0100
Correct previous dual key logic pending first delivery installment.
New option of Libhyphen 2.7: NOHYPHEN

Hyphen, apostrophe and other characters may be word boundary characters,
but they don't need (extra) hyphenation. With the NOHYPHEN option
it's possible to hyphenate the word parts correctly.

Example:

ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL

Description:

1-1 and 1'1 declare hyphen and apostrophe as word boundary characters,
and NOHYPHEN, with its comma-separated list of characters (or character
sequences), forbids (extra) hyphens at the hyphen and apostrophe
characters.

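The effect of NOHYPHEN can be pictured with a small Python sketch. It is
only an illustration under stated assumptions, not the library's code: the
function name apply_nohyphen, the list-of-booleans representation of break
points and the character-based (not byte-based) indexing are all
hypothetical.

```python
def apply_nohyphen(word, breaks, nohyphen):
    """Return a copy of `breaks` without break points next to NOHYPHEN strings.

    breaks[i] is True if a hyphen may be inserted after word[i]
    (hypothetical representation chosen for this sketch).
    """
    result = list(breaks)
    for token in nohyphen:
        if not token:
            continue  # ignore empty entries of the list
        start = word.find(token)
        while start != -1:
            end = start + len(token) - 1   # index of the token's last character
            result[end] = False            # no break right after the token
            if start > 0:
                result[start - 1] = False  # no break right before the token
            start = word.find(token, start + 1)
    return result


# "ab-cd": the breaks around "-" are dropped, the others are kept.
print(apply_nohyphen("ab-cd", [True, True, True, True, False], ["-", "'"]))
# [True, False, False, True, False]
```

The word parts on both sides of the hyphen keep their own break points, so
only the (extra) hyphens directly at the boundary character disappear.
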
Implicit NOHYPHEN declaration

Without an explicit NEXTLEVEL declaration, Hyphen 2.8 uses the
settings shown above; in addition, in UTF-8 encoding, the en dash
(U+2013) and the typographical apostrophe (U+2019) are NOHYPHEN
characters, too.

It's possible to enlarge the hyphenation distance from these
NOHYPHEN characters by using the COMPOUNDLEFTHYPHENMIN and
COMPOUNDRIGHTHYPHENMIN attributes.

Compound word hyphenation

The Hyphen library supports better compound word hyphenation and the
special rules of compound word hyphenation of Germanic languages and
other languages with an arbitrary number of compound words. The new
options, COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN, help to set
the right style for the hyphenation of compound words.

Algorithm

The algorithm is an extension of the original pattern-based hyphenation
algorithm. It uses two hyphenation pattern sets, defined in the same
pattern file and separated by the NEXTLEVEL keyword. The first pattern
set is for hyphenation only at compound word boundaries; the second one
is for hyphenation within words or word parts.

Recursive compound level hyphenation

The algorithm is recursive: every word part of a successful
first (compound) level hyphenation will be rehyphenated
by the same (first) pattern set.

Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts.

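The recursion described above can be sketched in Python as follows. This
is a minimal illustration, not the library's implementation: the two
"pattern sets" are replaced by hypothetical toy functions that return cut
offsets, and all names (hyphenate, split_at, toy_first_level,
toy_second_level) are assumptions made for this example.

```python
def split_at(word, cuts):
    """Split `word` at the given character offsets."""
    pieces, last = [], 0
    for cut in sorted(cuts):
        pieces.append(word[last:cut])
        last = cut
    pieces.append(word[last:])
    return pieces


def valid(cuts, word):
    """Keep only offsets that fall strictly inside the word."""
    return [c for c in cuts if 0 < c < len(word)]


def hyphenate(word, first_level, second_level):
    """Two-level hyphenation: compound boundaries first, then word parts."""
    cuts = valid(first_level(word), word)
    if not cuts:
        # First level not possible: use the second level for this part.
        return split_at(word, valid(second_level(word), word))
    fragments = []
    for part in split_at(word, cuts):
        # Every part is rehyphenated by the same (first) level.
        fragments.extend(hyphenate(part, first_level, second_level))
    return fragments


def toy_first_level(word):
    # Stand-in for the first pattern set: reports only the leftmost
    # boundary in front of a known compound member.
    hits = [word.find(member) for member in ("dampf", "schiff")]
    hits = [i for i in hits if i > 0]
    return [min(hits)] if hits else []


def toy_second_level(word):
    # Stand-in for the second pattern set: one known in-word break.
    return {"donau": [2]}.get(word, [])


print(hyphenate("donaudampfschiff", toy_first_level, toy_second_level))
# ['do', 'nau', 'dampf', 'schiff']
```

The toy first level finds only one boundary per call, so the second
boundary (before "schiff") is found only because the part "dampfschiff" is
passed through the first level again.
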
Word endings and word parts

Patterns for word endings (patterns with the "." boundary marker)
match the word parts, too.

Options

COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left
  compound word boundary
COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right
  compound word boundary
NEXTLEVEL: marks the start of the second level hyphenation patterns

Default hyphenmin values

The default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN
are 0, and they are used as 0 during hyphenation, too. (In contrast,
"0" values of LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2"
during hyphenation.)

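The difference between the two kinds of defaults can be written down as a
short Python sketch. The function name effective_minimums and the
dictionary keys are hypothetical; only the rule itself (0 stays 0 for the
compound options, 0 means 2 for the plain options) comes from the text
above.

```python
def effective_minimums(lefthyphenmin=0, righthyphenmin=0,
                       compoundlefthyphenmin=0, compoundrighthyphenmin=0):
    """Return the values that apply during hyphenation."""
    return {
        # A "0" value means the default "2" for the plain options.
        "left": lefthyphenmin or 2,
        "right": righthyphenmin or 2,
        # The compound options are used as given, including 0.
        "compound_left": compoundlefthyphenmin,
        "compound_right": compoundrighthyphenmin,
    }


print(effective_minimums())
# {'left': 2, 'right': 2, 'compound_left': 0, 'compound_right': 0}
```
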
Examples

See the tests/compound* test files.

Preparation of hyphenation patterns

There is no special pattern generator tool for compound hyphenation
patterns yet. It is possible to use PATGEN to generate both
pattern sets, concatenate them manually and set the requested HYPHENMIN
values. (But don't forget the preprocessing step with substrings.pl
before concatenation.) One disadvantage of this method is that PATGEN
doesn't know about the recursive compound hyphenation of Hyphen.

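The manual concatenation step could be scripted roughly as below. This is
a sketch under assumptions: build_pattern_file is a hypothetical helper,
both inputs are assumed to be lists of pattern lines that were already
preprocessed by substrings.pl and stripped of their own header lines, and
the placement of the option lines directly after the encoding line (in the
"NAME value" form) follows the NOHYPHEN example rather than a documented
rule.

```python
def build_pattern_file(encoding, compound_patterns, word_patterns,
                       compound_left_min=0, compound_right_min=0):
    """Assemble the text of a two-level pattern file.

    compound_patterns: first pattern set (compound word boundaries)
    word_patterns:     second pattern set (within words or word parts)
    """
    lines = [encoding]
    # Assumption: option lines use the "NAME value" form.
    if compound_left_min:
        lines.append(f"COMPOUNDLEFTHYPHENMIN {compound_left_min}")
    if compound_right_min:
        lines.append(f"COMPOUNDRIGHTHYPHENMIN {compound_right_min}")
    lines.extend(compound_patterns)
    lines.append("NEXTLEVEL")  # separates the two pattern sets
    lines.extend(word_patterns)
    return "\n".join(lines) + "\n"


# Placeholder patterns, for illustration only.
print(build_pattern_file("ISO8859-1", ["1-1", "1'1"], ["a1b"]), end="")
```
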
László Németh
<nemeth (at) openoffice.org>