intl/hyphenation/src/README.compound

Wed, 31 Dec 2014 07:22:50 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
branch
TOR_BUG_3246
changeset 4
fc2d59ddac77

Correct previous dual key logic pending first delivery installment.

New option of Libhyphen 2.7: NOHYPHEN

Hyphens, apostrophes and other characters may be word boundary characters,
but they do not need an (extra) hyphen of their own. The NOHYPHEN option
makes it possible to hyphenate the word parts around them correctly.

Example:

ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL

Description:

The patterns 1-1 and 1'1 declare the hyphen and the apostrophe as word
boundary characters, and NOHYPHEN with its comma-separated list of
characters (or character sequences) forbids the (extra) hyphens at the
hyphen and apostrophe characters.
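
The effect of NOHYPHEN can be sketched in a few lines of Python. This is
only an illustration under assumptions, not the parser or the code of the
Hyphen library: the function names are hypothetical, and a break position i
is taken to mean "between character i-1 and character i of the word".

```python
def parse_nohyphen(pattern_file_text):
    """Return the entries of the NOHYPHEN line (hypothetical helper)."""
    for line in pattern_file_text.splitlines():
        if line.startswith("NOHYPHEN "):
            # Entries are separated by commas; each entry is a single
            # character or a character sequence.
            entries = line[len("NOHYPHEN "):].split(",")
            return [entry for entry in entries if entry]
    return []


def drop_breaks_at(word, breaks, nohyphen):
    """Remove candidate breaks that touch a NOHYPHEN entry (assumed model)."""
    blocked = set()
    for seq in nohyphen:
        start = word.find(seq)
        while start != -1:
            blocked.add(start)             # break just before the entry
            blocked.add(start + len(seq))  # break just after the entry
            start = word.find(seq, start + 1)
    return [b for b in breaks if b not in blocked]


example = "ISO8859-1\nNOHYPHEN -,'\n1-1\n1'1\nNEXTLEVEL\n"
entries = parse_nohyphen(example)
# 1-1 proposes breaks at 3 and 4 in "foo-bar"; NOHYPHEN removes both.
kept = drop_breaks_at("foo-bar", [2, 3, 4, 6], entries)
```

With the example pattern file above, the breaks next to the existing hyphen
are dropped and only the breaks inside the word parts remain.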

Implicit NOHYPHEN declaration

Without an explicit NEXTLEVEL declaration, Hyphen 2.8 uses the
settings of the example above. In addition, in UTF-8 encoding the
en dash (U+2013) and the typographical apostrophe (U+2019) are
NOHYPHEN characters, too.

It is possible to enlarge the hyphenation distance from these
NOHYPHEN characters by using the COMPOUNDLEFTHYPHENMIN and
COMPOUNDRIGHTHYPHENMIN attributes.

Compound word hyphenation

The Hyphen library supports better compound word hyphenation, including
the special compound hyphenation rules of Germanic languages and other
languages whose compounds can have an arbitrary number of parts. The new
options COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN help to set the
right style for the hyphenation of compound words.

Algorithm

The algorithm is an extension of the original pattern-based hyphenation
algorithm. It uses two hyphenation pattern sets, defined in the same
pattern file and separated by the NEXTLEVEL keyword. The first pattern
set is for hyphenation only at compound word boundaries; the second one
is for hyphenation within words or word parts.

Recursive compound level hyphenation

The algorithm is recursive: every word part of a successful
first (compound) level hyphenation is rehyphenated
with the same (first) pattern set.

Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts.
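
The control flow described above can be sketched as follows. This is a toy
model, not the implementation in the Hyphen library: compound_breaks and
inner_breaks are hypothetical callbacks standing in for the first and second
pattern set, and the lookup tables are invented sample data.

```python
def split_at(word, positions):
    """Cut `word` at the given sorted break positions."""
    cuts = [0] + list(positions) + [len(word)]
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


def hyphenate(word, compound_breaks, inner_breaks):
    """Two-level hyphenation; each callback returns sorted break
    positions p with 0 < p < len(word)."""
    positions = compound_breaks(word)
    if not positions:
        # First level not possible: use the second level patterns.
        return split_at(word, inner_breaks(word))
    parts = []
    for part in split_at(word, positions):
        # Recursion: every part is tried with the first pattern set again.
        parts.extend(hyphenate(part, compound_breaks, inner_breaks))
    return parts


# Invented sample data standing in for the two pattern sets.
FIRST_LEVEL = {"sunflowerseed": [9], "sunflower": [3]}
SECOND_LEVEL = {"flower": [4]}

result = hyphenate(
    "sunflowerseed",
    lambda w: FIRST_LEVEL.get(w, []),
    lambda w: SECOND_LEVEL.get(w, []),
)
print("-".join(result))  # sun-flow-er-seed
```

The recursion terminates because every break position lies strictly inside
the word, so each part is shorter than the word it came from.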

Word endings and word parts

Patterns for word endings (patterns with ellipses) match the
word parts, too.

Options

COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left
  compound word boundary
COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right
  compound word boundary
NEXTLEVEL: marks the start of the second level hyphenation patterns

Default hyphenmin values

The default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN
are 0, and 0 is also the value used during hyphenation. (By contrast, "0"
values of LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" during
hyphenation.)
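
A small sketch of how these defaults could be resolved and applied. The
function names are hypothetical, and the reading of "distance" as a number
of characters between a break and the boundary is an assumption, not taken
from the Hyphen sources.

```python
def effective_min(configured, fallback):
    """A configured value of 0 means 'use the fallback'."""
    return configured if configured > 0 else fallback


def allowed_breaks(part, breaks, left_min, right_min):
    """Keep breaks at least left_min characters from the left boundary
    and right_min characters from the right boundary (assumed model)."""
    return [b for b in breaks
            if b >= left_min and len(part) - b >= right_min]


# LEFTHYPHENMIN / RIGHTHYPHENMIN: 0 falls back to 2.
word_min = effective_min(0, 2)
# COMPOUNDLEFTHYPHENMIN / COMPOUNDRIGHTHYPHENMIN: 0 stays 0.
compound_min = effective_min(0, 0)

strict = allowed_breaks("flower", [1, 4, 5], word_min, word_min)
loose = allowed_breaks("flower", [1, 4, 5], compound_min, compound_min)
```

With a minimum of 2 on both sides only the break at position 4 survives,
while a minimum of 0 keeps all three candidate breaks.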

Examples

See the tests/compound* test files.

Preparation of hyphenation patterns

There is no special pattern generator tool for compound hyphenation
patterns yet. It is possible to use PATGEN to generate both
pattern sets, concatenate them manually and set the requested HYPHENMIN
values. (But don't forget the preprocessing step with substrings.pl before
concatenation.) One disadvantage of this method is that PATGEN
does not know about the recursive compound hyphenation of Hyphen.
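
The manual concatenation step could look like the sketch below. It assumes
that both pattern lists have already been preprocessed by substrings.pl and
that the option lines go directly after the encoding line, as the NOHYPHEN
line does in the example at the top of this file. The function name and the
sample pattern "a1b" are hypothetical.

```python
def build_pattern_file(encoding, options, compound_patterns, inner_patterns):
    """Join two pattern sets into one pattern file text."""
    lines = [encoding]
    # Assumed placement: options right after the encoding line.
    lines += [f"{name} {value}" for name, value in options]
    lines += compound_patterns   # first level: compound word boundaries
    lines.append("NEXTLEVEL")    # separator between the two pattern sets
    lines += inner_patterns      # second level: within words or word parts
    return "\n".join(lines) + "\n"


text = build_pattern_file(
    "ISO8859-1",
    [("COMPOUNDLEFTHYPHENMIN", 2), ("COMPOUNDRIGHTHYPHENMIN", 2)],
    ["1-1", "1'1"],
    ["a1b"],
)
```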

László Németh
<nemeth (at) openoffice.org>
