.\" Hey, Emacs! This is -*-nroff-*- you know...
.\"
.\" gendict.1: manual page for the gendict utility
.\"
.\" Copyright (C) 2012 International Business Machines Corporation and others
.\"
.TH GENDICT 1 "1 June 2012" "ICU MANPAGE" "ICU @VERSION@ Manual"
.SH NAME
.B gendict
\- Compiles a word list into an ICU string trie dictionary
.SH SYNOPSIS
.B gendict
[
.BR "\fB\-\-uchars"
|
.BR "\fB\-\-bytes"
.BI "\fB\-\-transform" " transform"
]
[
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
]
[
.BR "\-V\fP, \fB\-\-version"
]
[
.BR "\-c\fP, \fB\-\-copyright"
]
[
.BR "\-v\fP, \fB\-\-verbose"
]
[
.BI "\-i\fP, \fB\-\-icudatadir" " directory"
]
.IR " input\-file"
.IR " output\-file"
.SH DESCRIPTION
.B gendict
reads the word list from
.I input\-file
and creates a string trie dictionary file. Normally this data file has the
.B .dict
extension.
.PP
Words begin at the beginning of a line and are terminated by the first whitespace.
Lines that begin with whitespace are ignored.
.SH OPTIONS
.TP
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
Print help about usage and exit.
.TP
.BR "\-V\fP, \fB\-\-version"
Print the version of
.B gendict
and exit.
.TP
.BR "\-c\fP, \fB\-\-copyright"
Embed the standard ICU copyright into the
.IR output\-file .
.TP
.BR "\-v\fP, \fB\-\-verbose"
Display extra informative messages during execution.
.TP
.BI "\-i\fP, \fB\-\-icudatadir" " directory"
Look for any necessary ICU data files in
.IR directory .
For example, the file
.B pnames.icu
must be located when ICU's data is not built as a shared library.
The default ICU data directory is specified by the environment variable
.BR ICU_DATA .
Most configurations of ICU do not require this argument.
.TP
.BR "\fB\-\-uchars"
Set the output trie type to UChar. Mutually exclusive with
.BR \-\-bytes .
.TP
.BR "\fB\-\-bytes"
Set the output trie type to Bytes. Mutually exclusive with
.BR \-\-uchars .
.TP
.BI "\fB\-\-transform" " transform"
Set the transform type. Should only be specified with
.BR \-\-bytes .
Currently supported transforms are:
.BR offset\- ,
which specifies an offset to subtract from all input characters.
The offset transform also maps U+200D to 0xFF and U+200C to 0xFE,
for compatibility with languages that require these characters.
A transform must be specified for a bytes trie. When applied
to the non-value characters in the
.IR input\-file ,
it must produce output between 0x00 and 0xFF.
.TP
.BI " input\-file"
The source file to read.
.TP
.BI " output\-file"
The file to write the output dictionary to.
.SH CAVEATS
The
.I input\-file
is assumed to be encoded in UTF-8.
The integers in the
.I input\-file
that are used as values must be made up of ASCII digits. They
may be specified either in hex, by using a 0x prefix, or in
decimal.
Either
.B \-\-bytes
or
.B \-\-uchars
must be specified.
.SH ENVIRONMENT
.TP 10
.B ICU_DATA
Specifies the directory containing ICU data. Defaults to
.BR @thepkgicudatadir@/@PACKAGE@/@VERSION@/ .
Some tools in ICU depend on the presence of the trailing slash. It is thus
important to make sure that it is present if
.B ICU_DATA
is set.
.SH AUTHORS
Maxime Serrano
.SH VERSION
1.0
.SH COPYRIGHT
Copyright (C) 2012 International Business Machines Corporation and others
.SH SEE ALSO
.BR http://www.icu-project.org/userguide/boundaryAnalysis.html
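.SH EXAMPLES
The offset transform described under
.B \-\-transform
can be sketched in Python as follows. This is only an illustration of the
mapping as this page describes it, not the code that
.B gendict
itself runs. The names offset_transform, ZWJ and ZWNJ are hypothetical and
were chosen for this sketch.
.PP
.nf
```python
# Illustrative sketch only: the names below are assumptions, not gendict's API.
ZWJ = 0x200D   # ZERO WIDTH JOINER, mapped to 0xFF
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER, mapped to 0xFE


def offset_transform(word, offset):
    """Map each character of word to one byte, as the man page describes."""
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        if cp == ZWJ:
            out.append(0xFF)
        elif cp == ZWNJ:
            out.append(0xFE)
        else:
            value = cp - offset
            # The transformed value must fit in a single byte.
            if not 0x00 <= value <= 0xFF:
                raise ValueError("U+%04X does not fit in a byte" % cp)
            out.append(value)
    return bytes(out)
```
.fi
.PP
For instance, with an offset of 0x0E00 (a value picked purely for
illustration), offset_transform(chr(0x0E01), 0x0E00) yields the single
byte 0x01, while a character below the offset raises an error because the
result would fall outside the range 0x00 to 0xFF.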