intl/hyphenation/src/README.compound

author
Michael Schloh von Bennewitz <michael@schloh.com>
date
Wed, 31 Dec 2014 07:22:50 +0100
branch
TOR_BUG_3246
changeset 4
fc2d59ddac77
permissions
-rw-r--r--

Correct previous dual key logic pending first delivery installment.

New option of Libhyphen 2.7: NOHYPHEN

The hyphen, the apostrophe and other characters may be word boundary
characters, but they don't need (extra) hyphenation. With the NOHYPHEN
option it's possible to hyphenate the word parts correctly.

Example:

ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL

Description:

The patterns 1-1 and 1'1 declare the hyphen and the apostrophe as word
boundary characters, and NOHYPHEN with its comma-separated list of
characters (or character sequences) forbids the (extra) hyphens at the
hyphen and apostrophe characters.
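
The effect of NOHYPHEN can be illustrated with a small Python sketch.
It is not the library's code: the function name apply_nohyphen, the
representation of break points as indices and the candidate breaks for
"mother-in-law" are assumptions made only for this illustration.

```python
def apply_nohyphen(word, breaks, nohyphen):
    """Drop break points directly before and after each NOHYPHEN sequence.

    `breaks` holds indices i meaning "a hyphen may follow word[i]".
    """
    allowed = set(breaks)
    for seq in nohyphen:
        start = word.find(seq)
        while start != -1:
            allowed.discard(start - 1)             # break just before the sequence
            allowed.discard(start + len(seq) - 1)  # break just after the sequence
            start = word.find(seq, start + 1)
    return sorted(allowed)


# "NOHYPHEN -,'" from the example: a comma-separated list of sequences.
nohyphen = "-,'".split(",")

# Hypothetical candidate breaks: one inside "mother" (3) and one on each
# side of both hyphens (5, 6, 8, 9), as the 1-1 pattern would propose.
print(apply_nohyphen("mother-in-law", [3, 5, 6, 8, 9], nohyphen))  # [3]
```

Only the break inside the word part survives, so no extra hyphen is
placed next to the hyphens that are already in the word.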

Implicit NOHYPHEN declaration

Without an explicit NEXTLEVEL declaration, Hyphen 2.8 uses the
previous settings; in addition, in UTF-8 encoding, the en dash (U+2013)
and the typographical apostrophe (U+2019) are NOHYPHEN characters, too.

It's possible to enlarge the hyphenation distance from these
NOHYPHEN characters by using the COMPOUNDLEFTHYPHENMIN and
COMPOUNDRIGHTHYPHENMIN attributes.

Compound word hyphenation

The Hyphen library supports better compound word hyphenation and the
special rules of compound word hyphenation of Germanic languages and
other languages whose compounds may consist of an arbitrary number of
words. The new options, COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN,
help to set the right style for the hyphenation of compound words.

Algorithm

The algorithm is an extension of the original pattern-based hyphenation
algorithm. It uses two hyphenation pattern sets, defined in the same
pattern file and separated by the NEXTLEVEL keyword. The first pattern
set is for hyphenation only at compound word boundaries; the second one
is for hyphenation within words or word parts.
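
As a rough illustration of the file layout, the following Python sketch
splits the text of a pattern file into its two levels at the NEXTLEVEL
keyword. The function name split_levels and the second-level pattern
"a1b" are hypothetical. The sketch assumes that the first line names the
encoding, and it ignores comments and the implicit declarations that
Hyphen adds when NEXTLEVEL is missing.

```python
def split_levels(pattern_text):
    """Split pattern file text into (encoding, first_level, second_level).

    Without a NEXTLEVEL line, every entry is returned as the second
    level and the first-level list is empty.
    """
    lines = [ln.strip() for ln in pattern_text.splitlines() if ln.strip()]
    encoding, entries = lines[0], lines[1:]
    if "NEXTLEVEL" in entries:
        cut = entries.index("NEXTLEVEL")
        return encoding, entries[:cut], entries[cut + 1:]
    return encoding, [], entries


sample = """ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL
a1b
"""

encoding, first, second = split_levels(sample)
print(encoding)  # ISO8859-1
print(first)     # ["NOHYPHEN -,'", '1-1', "1'1"]
print(second)    # ['a1b']
```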

Recursive compound level hyphenation

The algorithm is recursive: every word part of a successful
first (compound) level hyphenation will be rehyphenated
by the same (first) pattern set.

Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts.
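
The recursion can be sketched in Python as follows. This is only a
model of the control flow described above, not the implementation in
Hyphen: the two pattern sets are replaced by toy functions, and all
names (hyphenate, toy_first_level, toy_second_level) and the small
word lists are invented for the example.

```python
def hyphenate(word, first_level, second_level):
    """Return the pieces of `word` after two-level hyphenation."""
    parts = first_level(word)
    if len(parts) > 1:
        # Successful compound split: rehyphenate every part, starting
        # again with the same first-level function.
        pieces = []
        for part in parts:
            pieces.extend(hyphenate(part, first_level, second_level))
        return pieces
    # No compound boundary found: fall back to the second level.
    return second_level(word)


COMPOUND_HEADS = ("table", "tennis", "ball")                   # toy data
INNER_SPLITS = {"table": ["ta", "ble"], "tennis": ["ten", "nis"]}


def toy_first_level(word):
    """Split off one known leading compound part, if there is one."""
    for head in COMPOUND_HEADS:
        if word.startswith(head) and len(word) > len(head):
            return [head, word[len(head):]]
    return [word]


def toy_second_level(word):
    """Stand-in for pattern hyphenation inside a single word part."""
    return INNER_SPLITS.get(word, [word])


print("-".join(hyphenate("tabletennisball", toy_first_level, toy_second_level)))
# ta-ble-ten-nis-ball
```

The remainder "tennisball" of the first split is split again by the
first level, and only the parts without a compound boundary reach the
second level.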

Word endings and word parts

Patterns for word endings (patterns with a trailing dot) also match
at the end of word parts.

Options

COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left
                       compound word boundary
COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right
                        compound word boundary
NEXTLEVEL: marks the beginning of the second level hyphenation patterns

Default hyphenmin values

The default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN
are 0, and they are used as 0 during hyphenation, too. (By contrast, "0"
values of LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" during
hyphenation.)
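
The different meaning of "0" for the two pairs of options can be shown
with a short Python sketch. The helper names effective_hyphenmins and
filter_breaks are hypothetical, an unset option is assumed to behave
like 0, and the way the minimum distances are applied to one word part
is a simplification of what the library does.

```python
def effective_hyphenmins(options):
    """Resolve the four HYPHENMIN options to the values used when hyphenating."""
    return {
        # 0 (or unset, by assumption) means the default 2 here ...
        "LEFTHYPHENMIN": options.get("LEFTHYPHENMIN", 0) or 2,
        "RIGHTHYPHENMIN": options.get("RIGHTHYPHENMIN", 0) or 2,
        # ... but stays 0 for the compound options.
        "COMPOUNDLEFTHYPHENMIN": options.get("COMPOUNDLEFTHYPHENMIN", 0),
        "COMPOUNDRIGHTHYPHENMIN": options.get("COMPOUNDRIGHTHYPHENMIN", 0),
    }


def filter_breaks(part, breaks, left_min, right_min):
    """Keep breaks with >= left_min chars before and >= right_min after.

    `breaks` holds indices i meaning "a hyphen may follow part[i]".
    """
    return [i for i in breaks
            if i + 1 >= left_min and len(part) - (i + 1) >= right_min]


mins = effective_hyphenmins({"COMPOUNDLEFTHYPHENMIN": 3})
print(mins["LEFTHYPHENMIN"], mins["COMPOUNDRIGHTHYPHENMIN"])  # 2 0

# Hypothetical candidate breaks in the word part "tennis".
print(filter_breaks("tennis", [0, 2, 4],
                    mins["COMPOUNDLEFTHYPHENMIN"],
                    mins["COMPOUNDRIGHTHYPHENMIN"]))  # [2, 4]
```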

Examples

See the tests/compound* test files.

Preparation of hyphenation patterns

There is no special pattern generator tool for compound hyphenation
patterns yet. It is possible to use PATGEN to generate both
pattern sets, concatenate them manually and set the requested HYPHENMIN
values. (But don't forget the preprocessing step with substrings.pl
before concatenation.) One disadvantage of this method is that PATGEN
doesn't know about the recursive compound hyphenation of Hyphen.

László Németh
<nemeth (at) openoffice.org>
