intl/icu/source/data/cldr-icu-readme.txt

author: Michael Schloh von Bennewitz <michael@schloh.com>
date: Wed, 31 Dec 2014 07:22:50 +0100
branch: TOR_BUG_3246
changeset 4: fc2d59ddac77
permissions: -rw-r--r--

Correct previous dual key logic pending first delivery installment.

# Copyright (C) 2010-2013, International Business Machines Corporation and others.
# All Rights Reserved.
#
# Commands for regenerating ICU4C locale data (.txt files) from CLDR.
#
# The process requires local copies of
#  - CLDR (the source of most of the data, and some Java tools)
#  - ICU4J (used only for checking the converted data)
#  - ICU4C (the destination for the new data, and the source for some of it)
#    (Either check out ICU4C from Subversion, or download the additional
#    icu4c-*-data.zip file so that the icu/source/data/ directory is fully
#    populated.)
#
# For an official CLDR data integration into ICU, these should be clean, freshly
# checked-out copies. For released CLDR sources, an alternative to checking out
# sources for a given version is to download the zipped sources for the common
# (core.zip) and tools (tools.zip) directory subtrees from the Data column in
# [http://cldr.unicode.org/index/downloads].
#
# The versions of these must match. The ICU release notes include the version
# number and/or a CLDR svn tag name for the revision of CLDR that was the source
# of the data for that release of ICU.
#
# Note: Some versions of OpenJDK will not build the CLDR Java utilities.
# If you see compilation errors complaining about type incompatibilities with
# functions on generic classes, try switching to the Sun JDK.
#
# Besides a standard JDK, the process also requires ant
# (http://ant.apache.org/),
# plus the xml-apis.jar from the Apache xalan package
# (http://xml.apache.org/xalan-j/downloads.html).
#
# Note: Enough things can (and will) fail in this process that it is best to
# run the commands separately from an interactive shell. They should all
# copy and paste without problems.
#
# It is often useful to save logs of the output of many of the steps in this
# process. The commands below put log files in /tmp; you may want to put them
# somewhere else.
#
#----
#
# There are several environment variables that need to be defined.
#
# a) Java- and ant-related variables
#
# JAVA_HOME:    Path to JDK (a directory, containing e.g. bin/java, bin/javac,
#               etc.); on Mac OS X this can be set using
#               `/usr/libexec/java_home`.
#
# ANT_OPTS:     You may want to set:
#
#               -Xmx1024m, to give Java more memory; otherwise it may run out
#               of heap.
#
# b) CLDR-related variables
#
# CLDR_DIR:     Path to root of CLDR sources, below which are the common and
#               tools directories.
# CLDR_CLASSES: Defined relative to CLDR_DIR. It only needs to be set if you
#               are not running ant jar for CLDR and have a non-default output
#               folder for cldr-tools classes.
#
# c) ICU-related variables
# These variables only need to be set if you're directly reusing the
# commands below.
#
# ICU4C_DIR:    Path to root of ICU4C sources, below which is the source dir.
#
# ICU4J_ROOT:   Path to root of ICU4J sources, below which is the main dir.
#
#----
#
# If you are adding or removing locales, or specific kinds of locale data,
# there are some xml files in the ICU sources that need to be updated (these xml
# files are used in addition to the CLDR files as inputs to the CLDR data build
# process for ICU):
#
#    icu/trunk/source/data/icu-config.xml - Update <locales> to add or remove
#                CLDR locales for inclusion in ICU. Update <paths> to prefer
#                alt forms for certain paths, or to exclude certain paths; note
#                that <paths> items can only have draft or alt attributes.
#
#                Note that if a language-only locale (e.g. "de") is included in
#                <locales>, then all region sublocales for that language that
#                are present in CLDR data (e.g. "de_AT", "de_BE", "de_CH", etc.)
#                should also be included in <locales>, per PMC policy decision
#                2012-05-02 (see http://bugs.icu-project.org/trac/ticket/9298).
#
#    icu/trunk/source/data/build.xml - If you are adding or removing break
#                iterators, you need to update <fileset id="brkitr" ...> under
#                <target name="clean" ...> to clean the correct set of files.
#
#    icu/trunk/source/data/xml/ - If you are adding a new locale, break
#                iterator, collation tailoring, or rule-based number formatter,
#                you may need to add a corresponding xml file in (respectively)
#                the main/, brkitr/, collation/, or rbnf/ subdirectory here.
#
#----
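
# The sublocale policy above can be checked mechanically. A minimal sketch,
# assuming CLDR locale files live in $CLDR_DIR/common/main as <locale>.xml and
# that icu-config.xml names each included locale literally; missing_sublocales
# is a hypothetical helper, not part of the ICU or CLDR tools. It lists every
# <lang>_* file in CLDR (script sublocales included) whose ID is absent from
# icu-config.xml.

```shell
# Print each CLDR sublocale of a language (file <lang>_*.xml in the CLDR
# main directory) whose ID does not occur as a whole word in icu-config.xml.
missing_sublocales() {
    local lang="$1" main_dir="$2" config="$3"
    local f id
    for f in "$main_dir/$lang"_*.xml; do
        [ -e "$f" ] || continue           # the glob matched no files
        id=$(basename "$f" .xml)          # e.g. de_AT
        # -w: "de_AT" must not match inside a longer ID such as "de_AT_x"
        grep -qw -- "$id" "$config" || echo "$id"
    done
}

# Example:
# missing_sublocales de "$CLDR_DIR/common/main" "$ICU4C_DIR/source/data/icu-config.xml"
```

An empty result means every CLDR sublocale of that language is already named in icu-config.xml.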
#
# For an official CLDR data integration into ICU, there are some additional
# considerations:
#
# a) Don't commit anything in ICU sources (and possibly any changes in CLDR
#    sources, depending on their nature) until you have finished testing and
#    resolving build issues and test failures for both ICU4C and ICU4J.
#
# b) There are version numbers that may need manual updating in CLDR (other
#    version numbers get updated automatically, based on these):
#
#    common/dtd/ldml.dtd                            - update cldrVersion
#    common/dtd/ldmlBCP47.dtd                       - update cldrVersion
#    common/dtd/ldmlSupplemental.dtd                - update cldrVersion
#    tools/java/org/unicode/cldr/util/CLDRFile.java - update GEN_VERSION
#
# c) After everything is committed, you will need to tag the CLDR, ICU4J, and
#    ICU4C sources that ended up being used for the integration; see step 16
#    below.
#
################################################################################
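
# Before updating the version numbers in b), their current values can be
# located with grep. A minimal sketch, assuming the file layout listed in b);
# show_cldr_versions is a hypothetical helper name.

```shell
# Print file:line:text for every cldrVersion and GEN_VERSION occurrence in the
# files listed in b), given the root of a CLDR checkout.
show_cldr_versions() {
    local root="$1"
    grep -Hn 'cldrVersion' \
        "$root/common/dtd/ldml.dtd" \
        "$root/common/dtd/ldmlBCP47.dtd" \
        "$root/common/dtd/ldmlSupplemental.dtd"
    grep -Hn 'GEN_VERSION' "$root/tools/java/org/unicode/cldr/util/CLDRFile.java"
}

# Example:
# show_cldr_versions "$CLDR_DIR"
```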

# 1a. Java and ant variables, adjust for your system

export JAVA_HOME=`/usr/libexec/java_home`
export ANT_OPTS="-Xmx1024m"

# 1b. CLDR variables, adjust for your setup; with cygwin it might be e.g.
# CLDR_DIR=`cygpath -wp /build/cldr`

export CLDR_DIR=$HOME/cldr/trunk
#export CLDR_CLASSES=$CLDR_DIR/tools/java/classes

# 1c. ICU variables

export ICU4C_DIR=$HOME/icu/icu/trunk
export ICU4J_ROOT=$HOME/icu/icu4j/trunk
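
# It may save a failed build later to confirm now that each variable is set and
# names an existing directory. A minimal bash sketch; check_dirs is a
# hypothetical helper name.

```shell
# Report (on stderr) each named variable that is unset/empty or does not name
# an existing directory; return non-zero if any check fails. Requires bash.
check_dirs() {
    local name val status=0
    for name in "$@"; do
        val="${!name:-}"                  # bash indirect expansion
        if [ -z "$val" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$val" ]; then
            echo "$name=$val is not a directory" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
# check_dirs JAVA_HOME CLDR_DIR ICU4C_DIR ICU4J_ROOT || echo "fix the variables above"
```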

# 2. Build the CLDR Java tools

cd "$CLDR_DIR/tools/java"
#cd "$CLDR_DIR/cldr-tools"
ant jar

# 3. Configure ICU4C, build and test without new data first, to verify that
# there are no pre-existing errors (configure shown here for MacOSX, adjust
# for your platform).

cd "$ICU4C_DIR/source"
./runConfigureICU MacOSX
make all 2>&1 | tee /tmp/icu4c-oldData-makeAll.txt
make check 2>&1 | tee /tmp/icu4c-oldData-makeCheck.txt
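
# Note that in "make ... | tee log" the exit status of the pipeline is tee's, so
# a failing make can look successful to a script. A minimal bash sketch that
# keeps the log but returns the wrapped command's own status via PIPESTATUS;
# run_logged is a hypothetical helper name.

```shell
# Run a command, copy its combined stdout/stderr to a log file, and return the
# command's own exit status rather than tee's. Requires bash (PIPESTATUS).
run_logged() {
    local log="$1"
    shift
    "$@" 2>&1 | tee "$log"
    return "${PIPESTATUS[0]}"
}

# Example:
# run_logged /tmp/icu4c-oldData-makeCheck.txt make check || echo "make check failed"
```

Running `set -o pipefail` in an interactive bash session is an alternative: it makes every such pipeline fail when make fails, without a wrapper.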

# 4. Build the new ICU4C data files; these include .txt files and .mk files.
# These new files will replace whatever was already present in the ICU4C sources.
# This process uses ant with ICU's data/build.xml and data/icu-config.xml to
# run the necessary CLDR tools, including LDML2ICUConverter, ConvertTransforms,
# etc. (via CLDR's ant/CLDRConverterTool.java and ant/CLDRBuild.java).
# This process will take several minutes.
# Keep a log so you can investigate anything that looks suspicious.

cd "$ICU4C_DIR/source/data"
ant clean
ant all 2>&1 | tee /tmp/cldrNN-buildLog.txt

# 5. Check which data files have modifications, which have been added or removed
# (if there are no changes, you may not need to proceed further). Make sure the
# list seems reasonable.

svn status
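
# To judge whether the list seems reasonable, it can help to count the changes
# by kind first. A minimal sketch that reads `svn status` output on stdin;
# summarize_svn_status is a hypothetical helper name. In that output, "?"
# typically marks newly generated files not yet under version control and "!"
# marks versioned files that the build no longer produces.

```shell
# Count `svn status` lines (read on stdin) by their first-column code, e.g.
# M = modified, A = added, D = deleted, ? = unversioned, ! = missing.
summarize_svn_status() {
    awk '{ count[substr($0, 1, 1)]++ }
         END { for (code in count) print code, count[code] }' | LC_ALL=C sort
}

# Example (in $ICU4C_DIR/source/data):
# svn status | summarize_svn_status
```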

# 6. Fix any errors, investigate any warnings. Some warnings are expected,
# including warnings for missing versions in locale names which specify some
# collation variants, e.g.
#   [cldr-build] WARNING (ja_JP_TRADITIONAL): No version #??
#   [cldr-build] WARNING (zh_TW_STROKE): No version #??
# and warnings for some empty collation bundles, e.g.
#   [cldr-build] WARNING (en): warning: No collations found. Bundle will ...
#   [cldr-build] WARNING (to): warning: No collations found. Bundle will ...
#
# Fixing may entail modifying CLDR source data or tools - for example,
# updating the validSubLocales for collation data (file a bug if appropriate).
# Repeat steps 4-5 until there are no build errors and no unexpected
# warnings.
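
# The expected warnings above can be filtered out so that only the remaining
# ones need review. A minimal sketch; the two patterns come from the sample
# messages above and may need adjusting if the wording in your log differs.
# unexpected_warnings is a hypothetical helper name.

```shell
# Print WARNING lines from an ant build log, minus the two expected kinds
# ("No version #" and "No collations found").
unexpected_warnings() {
    grep 'WARNING' "$1" \
        | grep -v 'No version #' \
        | grep -v 'No collations found'
}

# Example:
# unexpected_warnings /tmp/cldrNN-buildLog.txt
```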

# 7. Now rebuild ICU4C with the new data and run the make check tests.
# Again, keep a log so you can investigate the errors.

cd "$ICU4C_DIR/source"
make check 2>&1 | tee /tmp/icu4c-newData-makeCheck.txt

# 8. Investigate each test case failure. The first run processing new CLDR data
# from the Survey Tool can result in thousands of failures (in many cases, one
# CLDR data fix can resolve hundreds of test failures). If the error is caused
# by bad CLDR data, then file a CLDR bug, fix the data, and regenerate from
# step 4. If the data is OK but the test case needs to be updated because the
# data has legitimately changed, then update the test case. You will check in
# the updated test cases along with the new ICU data at the end of this process.
# Note that if the new data has any differences in structure, you will have to
# update test/testdata/structLocale.txt; otherwise
# /tsutil/cldrtest/TestLocaleStructure may fail.
# Repeat steps 4-7 until there are no errors.

# 9. Now run the make check tests in exhaustive mode:

cd "$ICU4C_DIR/source"
export INTLTEST_OPTS="-e"
export CINTLTST_OPTS="-e"
make check 2>&1 | tee /tmp/icu4c-newData-makeCheckEx.txt
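
# The exports above stay in effect for the rest of the shell session, so a
# later make check (for example when repeating step 7 as described in step 10)
# would also run in exhaustive mode. A minimal sketch that sets the two options
# only for a single command; run_exhaustive is a hypothetical helper name.

```shell
# Run an external command with the exhaustive-mode options set only in that
# command's environment; the calling shell's variables are left unchanged.
run_exhaustive() {
    env INTLTEST_OPTS="-e" CINTLTST_OPTS="-e" "$@"
}

# Example, in place of the two exports above:
# cd "$ICU4C_DIR/source"
# run_exhaustive make check 2>&1 | tee /tmp/icu4c-newData-makeCheckEx.txt
```

If the exports have already been run, `unset INTLTEST_OPTS CINTLTST_OPTS` restores the default mode for later runs.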

# 10. Again, investigate each failure, fixing CLDR data or ICU test cases as
# appropriate, and repeating steps 4-7 and 9 until there are no errors.

# 11. Now with ICU4J, build and test without new data first, to verify that
# there are no pre-existing errors (or at least to have the pre-existing errors
# as a base for comparison):

cd "$ICU4J_ROOT"
ant all 2>&1 | tee /tmp/icu4j-oldData-antAll.txt
ant check 2>&1 | tee /tmp/icu4j-oldData-antCheck.txt

# 12. Now build the new data for ICU4J

cd "$ICU4C_DIR/source/data"
make icu4j-data-install

# 13. Now rebuild ICU4J with the new data and run tests.
# Keep a log so you can investigate the errors.

cd "$ICU4J_ROOT"
ant check 2>&1 | tee /tmp/icu4j-newData-antCheck.txt

# 14. Investigate test case failures; fix test cases and repeat from step 12,
# or fix CLDR data and repeat from step 4, as appropriate, until there are no
# more failures in ICU4C or ICU4J (except failures that were present before you
# began testing the new CLDR data).
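
# Step 14 asks for failures that were not present before the new data. A
# minimal bash sketch that compares an old and a new log and prints matching
# lines found only in the new one; new_failures is a hypothetical helper, and
# the default pattern "FAIL" is an assumption about the log format. Lines that
# embed timings or counts will differ between runs, so choose a pattern that
# selects stable text such as test names.

```shell
# Print lines matching PATTERN that appear in NEW_LOG but not in OLD_LOG.
# Requires bash (process substitution). Usage: new_failures OLD NEW [PATTERN]
new_failures() {
    local old="$1" new="$2" pattern="${3:-FAIL}"
    # comm needs both inputs sorted the same way, so force the C locale.
    LC_ALL=C comm -13 \
        <(grep -- "$pattern" "$old" | LC_ALL=C sort -u) \
        <(grep -- "$pattern" "$new" | LC_ALL=C sort -u)
}

# Example:
# new_failures /tmp/icu4j-oldData-antCheck.txt /tmp/icu4j-newData-antCheck.txt
```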

# 15. Check the file changes; then svn add or svn remove as necessary, and
# commit the changes.

cd "$ICU4C_DIR/source"
svn status
# add or remove as necessary, then commit

cd "$ICU4J_ROOT"
svn status
# add or remove as necessary, then commit
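
# For step 15, the add and remove commands can be generated from `svn status`
# and reviewed before running them. A minimal sketch; svn_add_rm_commands is a
# hypothetical helper, and it assumes paths contain no spaces because it takes
# the path from the second whitespace-separated field.

```shell
# Read `svn status` output on stdin and print (not run) an `svn add` command
# for each unversioned path (?) and an `svn rm` command for each missing one (!).
svn_add_rm_commands() {
    awk '$1 == "?" { print "svn add " $2 }
         $1 == "!" { print "svn rm " $2 }'
}

# Example:
# svn status | svn_add_rm_commands        # review the list first
# svn status | svn_add_rm_commands | sh   # then run it
```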

# 16. For an official CLDR data integration into ICU, now tag the CLDR, ICU4J,
# and ICU4C sources with an appropriate CLDR milestone (you can check previous
# tags for format), e.g.:

svn copy svn+ssh://unicode.org/repos/cldr/trunk \
    svn+ssh://unicode.org/repos/cldr/tags/release-NNN \
    --parents -m "cldrbug nnnn: tag cldr sources for NNN"

svn copy svn+ssh://source.icu-project.org/repos/icu/icu4j/trunk \
    svn+ssh://source.icu-project.org/repos/icu/icu4j/tags/cldr-NNN \
    --parents -m 'ticket:mmmm: tag the version used for integrating CLDR NNN'

svn copy svn+ssh://source.icu-project.org/repos/icu/icu/trunk \
    svn+ssh://source.icu-project.org/repos/icu/icu/tags/cldr-NNN \
    --parents -m 'ticket:mmmm: tag the version used for integrating CLDR NNN'
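
# The three tag commands differ only in the placeholders NNN (CLDR version),
# nnnn (CLDR bug) and mmmm (ICU ticket). A minimal sketch that prints them with
# the values filled in, so they can be checked before being run;
# print_tag_commands and the CLDR_VER, CLDR_BUG and ICU_TICKET variables in the
# example are hypothetical names.

```shell
# Print (not run) the three svn copy commands from step 16 for the given
# CLDR version, CLDR bug number and ICU ticket number.
print_tag_commands() {
    local ver="$1" cldr_bug="$2" icu_ticket="$3"
    local cldr=svn+ssh://unicode.org/repos/cldr
    local icu=svn+ssh://source.icu-project.org/repos/icu
    echo "svn copy $cldr/trunk $cldr/tags/release-$ver --parents -m \"cldrbug $cldr_bug: tag cldr sources for $ver\""
    echo "svn copy $icu/icu4j/trunk $icu/icu4j/tags/cldr-$ver --parents -m 'ticket:$icu_ticket: tag the version used for integrating CLDR $ver'"
    echo "svn copy $icu/icu/trunk $icu/icu/tags/cldr-$ver --parents -m 'ticket:$icu_ticket: tag the version used for integrating CLDR $ver'"
}

# Example:
# print_tag_commands "$CLDR_VER" "$CLDR_BUG" "$ICU_TICKET"
```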
