intl/hyphenation/src/README.compound

changeset 0
6474c204b198
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/intl/hyphenation/src/README.compound	Wed Dec 31 06:09:35 2014 +0100
     1.3 @@ -0,0 +1,87 @@
     1.4 +New option of Libhyphen 2.7: NOHYPHEN
     1.5 +
     1.6 +Hyphen, apostrophe and other characters may be word boundary characters,
     1.7 +but they don't need (extra) hyphenation. With the NOHYPHEN option,
     1.8 +it's possible to hyphenate the word parts correctly.
     1.9 +
    1.10 +Example:
    1.11 +
    1.12 +ISO8859-1
    1.13 +NOHYPHEN -,'
    1.14 +1-1
    1.15 +1'1
    1.16 +NEXTLEVEL
    1.17 +
    1.18 +Description:
    1.19 +
    1.20 +1-1 and 1'1 declare hyphen and apostrophe as word boundary characters,
    1.21 +and NOHYPHEN with the comma-separated character (or character sequence)
    1.22 +list forbids the (extra) hyphens at the hyphen and apostrophe characters.
    1.23 +
    1.24 +Implicit NOHYPHEN declaration
    1.25 +
    1.26 +Without an explicit NEXTLEVEL declaration, Hyphen 2.8 uses the
    1.27 +previous settings; in addition, in UTF-8 encoding, en dash (U+2013) and
    1.28 +typographical apostrophe (U+2019) are NOHYPHEN characters, too.
    1.29 +
    1.30 +It's possible to increase the minimum hyphenation distance from these
    1.31 +NOHYPHEN characters by using the COMPOUNDLEFTHYPHENMIN and
    1.32 +COMPOUNDRIGHTHYPHENMIN options.
    1.33 +
    1.34 +Compound word hyphenation
    1.35 +
    1.36 +The Hyphen library supports better compound word hyphenation and the special
    1.37 +compound word hyphenation rules of Germanic languages and other languages
    1.38 +that form compounds from an arbitrary number of words. The new options,
    1.39 +COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN, help to set the right
    1.40 +style for the hyphenation of compound words.
    1.41 +
    1.42 +Algorithm
    1.43 +
    1.44 +The algorithm is an extension of the original pattern-based hyphenation
    1.45 +algorithm. It uses two hyphenation pattern sets, defined in the same
    1.46 +pattern file and separated by the NEXTLEVEL keyword. The first pattern
    1.47 +set is for hyphenation only at compound word boundaries; the second one
    1.48 +is for hyphenation within words or word parts.
    1.49 +
    1.50 +Recursive compound level hyphenation
    1.51 +
    1.52 +The algorithm is recursive: every word part of a successful
    1.53 +first (compound) level hyphenation will be rehyphenated
    1.54 +by the same (first) pattern set.
    1.55 +
    1.56 +Finally, when first level hyphenation is not possible, Hyphen uses
    1.57 +the second level hyphenation for the word or the word parts.
    1.58 +
    1.59 +Word endings and word parts
    1.60 +
    1.61 +Patterns for word endings (patterns with dots) match the
    1.62 +word parts, too.
    1.63 +
    1.64 +Options
    1.65 +
    1.66 +COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left compound word boundary
    1.67 +COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right compound word boundary
    1.68 +NEXTLEVEL: marks the start of the second level hyphenation patterns
    1.69 +
    1.70 +Default hyphenmin values
    1.71 +
    1.72 +The default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN are 0,
    1.73 +and they are treated as 0 during hyphenation, too. (By contrast, "0" values of
    1.74 +LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" during hyphenation.)
    1.75 +
    1.76 +Examples
    1.77 +
    1.78 +See tests/compound* test files.
    1.79 +
    1.80 +Preparation of hyphenation patterns
    1.81 +
    1.82 +There is no special pattern generator tool for compound hyphenation
    1.83 +patterns yet. It is possible to use PATGEN to generate both
    1.84 +pattern sets, concatenate them manually and set the requested HYPHENMIN values.
    1.85 +(But don't forget the preprocessing step with substrings.pl before
    1.86 +concatenation.) One disadvantage of this method is that PATGEN
    1.87 +doesn't know the recursive compound hyphenation of Hyphen.
    1.88 +
    1.89 +László Németh
    1.90 +<nemeth (at) openoffice.org>
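
As an illustration of the "Algorithm" and "Recursive compound level hyphenation" sections of the README above, here is a minimal Python sketch of two-level hyphenation. It is not the Hyphen implementation. The function names, the toy pattern sets and the sample word are all hypothetical, and the sketch ignores NOHYPHEN and every HYPHENMIN option. It only shows the control flow: split with the first pattern set, re-run the first set on each part, and fall back to the second set when the first set finds nothing.

```python
# Hypothetical toy pattern sets, not taken from any real dictionary.
LEVEL1 = ["s1b", "l1s"]   # compound boundaries only
LEVEL2 = ["e1l"]          # breaks inside words or word parts


def parse_pattern(pattern):
    """Split a pattern such as 's1b' into letters 'sb' and gap weights [0, 1, 0]."""
    letters, weights = [], [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)      # weight of the gap before the next letter
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def break_points(word, patterns):
    """Return every index i where a break is allowed between word[i-1] and word[i]."""
    padded = "." + word + "."          # dots mark the edges of the word or word part
    scores = [0] * (len(padded) + 1)   # scores[j] is the gap before padded[j]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    # An odd score allows a break; padded[i + 1] is word[i].
    return [i for i in range(1, len(word)) if scores[i + 1] % 2 == 1]


def split_at(word, cuts):
    """Cut word at the given indices."""
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


def hyphenate(word, level1, level2):
    """Apply the first pattern set recursively, then the second set on the leaves."""
    cuts = break_points(word, level1)
    if not cuts:
        # First level hyphenation is not possible: use the second level.
        return split_at(word, break_points(word, level2))
    parts = []
    for part in split_at(word, cuts):
        parts.extend(hyphenate(part, level1, level2))
    return parts
```

With these toy patterns, hyphenate("fussballspieler", LEVEL1, LEVEL2) returns the parts fuss, ball, spie and ler. Because each part is padded with dots before matching, a pattern anchored to a word ending would also match at the end of a word part, which is the behaviour the README describes under "Word endings and word parts".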

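The manual concatenation described under "Preparation of hyphenation patterns" could be scripted along the following lines. This is a sketch under assumptions: build_pattern_file is a hypothetical helper, both pattern lists are assumed to have already been generated by PATGEN and preprocessed by substrings.pl, and the "KEYWORD value" form of the HYPHENMIN lines is assumed by analogy with the NOHYPHEN line in the README's example. The overall layout (encoding line, options, first pattern set, NEXTLEVEL, second pattern set) follows that example.

```python
def build_pattern_file(encoding, compound_patterns, word_patterns,
                       compound_left_min=0, compound_right_min=0):
    """Return the text of a two-level pattern file.

    compound_patterns: first level (compound boundary) patterns
    word_patterns:     second level patterns, placed after NEXTLEVEL
    A hyphenmin of 0 is left out, since 0 is the documented default.
    """
    lines = [encoding]
    if compound_left_min:
        # Assumed syntax: keyword, space, value.
        lines.append(f"COMPOUNDLEFTHYPHENMIN {compound_left_min}")
    if compound_right_min:
        lines.append(f"COMPOUNDRIGHTHYPHENMIN {compound_right_min}")
    lines.extend(compound_patterns)
    lines.append("NEXTLEVEL")
    lines.extend(word_patterns)
    return "\n".join(lines) + "\n"
```

For example, build_pattern_file("UTF-8", ["s1b"], ["e1l"], 2, 3) yields a file whose lines are UTF-8, COMPOUNDLEFTHYPHENMIN 2, COMPOUNDRIGHTHYPHENMIN 3, s1b, NEXTLEVEL and e1l. As the README notes, PATGEN knows nothing about the recursive first level pass, so the result should be checked against test words.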