--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/intl/hyphenation/src/README.compound Wed Dec 31 06:09:35 2014 +0100
@@ -0,0 +1,87 @@
+New option of Libhyphen 2.7: NOHYPHEN
+
+Hyphen, apostrophe and other characters may be word boundary characters,
+but they don't need (extra) hyphenation. With the NOHYPHEN option
+it's possible to hyphenate the word parts correctly.
+
+Example:
+
+ISO8859-1
+NOHYPHEN -,'
+1-1
+1'1
+NEXTLEVEL
+
+Description:
+
+1-1 and 1'1 declare hyphen and apostrophe as word boundary characters,
+and NOHYPHEN with the comma separated character (or character sequence)
+list forbids the (extra) hyphens at the hyphen and apostrophe characters.
+
+Implicit NOHYPHEN declaration
+
+Without an explicit NEXTLEVEL declaration, Hyphen 2.8 uses the
+settings above; in addition, in UTF-8 encoding, en dash (U+2013) and
+typographical apostrophe (U+2019) are NOHYPHEN characters, too.
+
+It's possible to enlarge the hyphenation distance from these
+NOHYPHEN characters by using the COMPOUNDLEFTHYPHENMIN and
+COMPOUNDRIGHTHYPHENMIN attributes.
+
+Compound word hyphenation
+
+The Hyphen library supports better compound word hyphenation and the
+special rules of compound word hyphenation of German and other
+languages with an arbitrary number of compound words. The new options,
+COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN, help to set the right
+style for the hyphenation of compound words.
+
+Algorithm
+
+The algorithm is an extension of the original pattern based hyphenation
+algorithm. It uses two hyphenation pattern sets, defined in the same
+pattern file and separated by the NEXTLEVEL keyword. The first pattern
+set is for hyphenation only at compound word boundaries; the second one
+is for hyphenation within words or word parts.
+
+Recursive compound level hyphenation
+
+The algorithm is recursive: every word part of a successful
+first (compound) level hyphenation will be rehyphenated
+by the same (first) pattern set.
+
+Finally, when first level hyphenation is not possible, Hyphen uses
+the second level hyphenation for the word or the word parts.
+
+Word endings and word parts
+
+Patterns for word endings (patterns with ellipses) match the
+word parts, too.
+
+Options
+
+COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left compound word boundary
+COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right compound word boundary
+NEXTLEVEL: marks the start of the second level hyphenation patterns
+
+Default hyphenmin values
+
+Default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN are 0,
+and 0 is also the value used during hyphenation. (By contrast, "0" values of
+LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" during hyphenation.)
+
+Examples
+
+See tests/compound* test files.
+
+Preparation of hyphenation patterns
+
+There is no special pattern generator tool for compound hyphenation
+patterns yet. It is possible to use PATGEN to generate both
+pattern sets, concatenate them manually and set the requested HYPHENMIN
+values. (But don't forget the preprocessing step with substrings.pl before
+concatenation.) One disadvantage of this method is that PATGEN
+doesn't know about the recursive compound hyphenation of Hyphen.
+
+László Németh
+<nemeth (at) openoffice.org>
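
The recursive two-level scheme described in the README above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Hyphen implementation: the two "levels" are modelled as plain functions that return break offsets, and the names hyphenate, toy_first_level, toy_second_level, STEMS, INNER_BREAKS and show are all hypothetical. Real pattern matching and the COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN limits are not modelled.

```python
from typing import Callable, List

# Hypothetical type: a "level" maps a word to break offsets inside that word.
BreakFinder = Callable[[str], List[int]]


def hyphenate(word: str, first_level: BreakFinder, second_level: BreakFinder,
              offset: int = 0) -> List[int]:
    """Return break offsets relative to the start of the whole word."""
    # Keep only breaks strictly inside the word, so every part gets shorter
    # and the recursion terminates.
    breaks = sorted(b for b in set(first_level(word)) if 0 < b < len(word))
    if not breaks:
        # No compound boundary found: fall back to the second level.
        return [offset + b for b in second_level(word) if 0 < b < len(word)]
    result: List[int] = []
    start = 0
    for end in breaks + [len(word)]:
        # Rehyphenate each part with the same (first) level again.
        result += hyphenate(word[start:end], first_level, second_level,
                            offset + start)
        if end < len(word):
            result.append(offset + end)
        start = end
    return result


# Toy stand-ins for the two pattern sets (assumptions, not real patterns).
STEMS = {"foot", "ball", "player"}
INNER_BREAKS = {"player": [4]}


def toy_first_level(word: str) -> List[int]:
    # Report only the first known stem boundary; recursion finds the rest.
    for i in range(1, len(word)):
        if word[:i] in STEMS:
            return [i]
    return []


def toy_second_level(word: str) -> List[int]:
    # Breaks within a single word part, looked up from the toy table.
    return INNER_BREAKS.get(word, [])


def show(word: str) -> str:
    # Join the pieces between consecutive break offsets with hyphens.
    cuts = [0] + hyphenate(word, toy_first_level, toy_second_level) + [len(word)]
    return "-".join(word[a:b] for a, b in zip(cuts, cuts[1:]))
```

With these toy tables, show("footballplayer") returns "foot-ball-play-er": the first level finds only one boundary per call, the recursion over the word parts finds the second compound boundary, and the second level is used only for parts that the first level cannot split.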