1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/intl/icu/source/data/unidata/changes.txt Wed Dec 31 06:09:35 2014 +0100 1.3 @@ -0,0 +1,1693 @@ 1.4 +* Copyright (C) 2004-2013, International Business Machines 1.5 +* Corporation and others. All Rights Reserved. 1.6 +* 1.7 +* file name: changes.txt 1.8 +* encoding: US-ASCII 1.9 +* tab size: 8 (not used) 1.10 +* indentation:4 1.11 +* 1.12 +* created on: 2004may06 1.13 +* created by: Markus W. Scherer 1.14 +* 1.15 +* change log for Unicode updates 1.16 + 1.17 +---------------------------------------------------------------------------- *** 1.18 + 1.19 +Unicode 6.3 update 1.20 + 1.21 +http://www.unicode.org/review/pri249/ -- beta review 1.22 +http://www.unicode.org/reports/uax-proposed-updates.html 1.23 +http://www.unicode.org/versions/beta-6.3.0.html#notable_issues 1.24 +http://www.unicode.org/reports/tr44/tr44-11.html 1.25 + 1.26 +*** ICU Trac 1.27 + 1.28 +- ticket 10128: update ICU to Unicode 6.3 beta 1.29 +- ticket 10168: update ICU to Unicode 6.3 final 1.30 +- C++ branches/markus/uni63 at r33552 from trunk at r33551 1.31 +- Java branches/markus/uni63 at r33550 from trunk at r33553 1.32 + 1.33 +- ticket 10142: implement Unicode 6.3 bidi algorithm additions 1.34 + 1.35 +*** Unicode version numbers 1.36 +- makedata.mak 1.37 +- uchar.h 1.38 + (configure.in & configure: have been modified to extract the version from uchar.h) 1.39 +- com.ibm.icu.util.VersionInfo 1.40 +- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_ 1.41 + 1.42 +- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h 1.43 + so that the makefiles see the new version number. 
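A minimal sketch of the version extraction mentioned in the "Unicode version numbers" notes above. It assumes that uchar.h carries the version in a line of the form `#define U_UNICODE_VERSION "6.3"`; the function name and the sample text are illustrative only, and this is not the logic that configure itself runs.

```python
import re

# Assumed form of the definition in uchar.h: #define U_UNICODE_VERSION "6.3"
_VERSION_RE = re.compile(
    r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"', re.MULTILINE)


def unicode_version_from_header(header_text):
    """Return the Unicode version string found in uchar.h-style text, or None."""
    match = _VERSION_RE.search(header_text)
    return match.group(1) if match else None


# Illustrative excerpt, not the real header.
sample = '''
/* Unicode version number */
#define U_UNICODE_VERSION "6.3"
'''
print(unicode_version_from_header(sample))
```

Running such a check after editing uchar.h and before "configure" is one way to confirm that the new number is the one the makefiles would see.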
1.44 + 1.45 +*** data files & enums & parser code 1.46 + 1.47 +* file preparation 1.48 + 1.49 +- download UCD, UCA & IDNA files 1.50 +- make sure that the Unicode data folder passed into preparseucd.py 1.51 + includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder) 1.52 +- modify preparseucd.py: 1.53 + parse new file BidiBrackets.txt 1.54 + with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type 1.55 +- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src 1.56 +- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. 1.57 +- Check test file diffs for previously commented-out, known-failing data lines; 1.58 + probably need to keep those commented out. 1.59 + 1.60 +* PropertyAliases.txt changes 1.61 +- 1 new Enumerated Property 1.62 + bpt ; Bidi_Paired_Bracket_Type 1.63 + -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType 1.64 + -> ubidi_props.h & .c & UBiDiProps.java 1.65 + -> remember to write the max value at UBIDI_MAX_VALUES_INDEX 1.66 + -> uprops.cpp 1.67 + -> change ubidi.icu format version from 2.0 to 2.1 1.68 +- 1 new Miscellaneous Property 1.69 + bpb ; Bidi_Paired_Bracket 1.70 + -> uchar.h & UProperty.java 1.71 + -> ppucd.h & .cpp 1.72 + 1.73 +* PropertyValueAliases.txt changes 1.74 +- 3 Bidi_Paired_Bracket_Type (bpt) values: 1.75 + bpt; c ; Close 1.76 + bpt; n ; None 1.77 + bpt; o ; Open 1.78 + -> uchar.h & UCharacter.BidiPairedBracketType 1.79 + -> ubidi_props.h & .c & UBiDiProps.java 1.80 + -> change ubidi.icu format version from 2.0 to 2.1 1.81 +- 4 new Bidi_Class (bc) values: 1.82 + bc ; FSI ; First_Strong_Isolate 1.83 + bc ; LRI ; Left_To_Right_Isolate 1.84 + bc ; RLI ; Right_To_Left_Isolate 1.85 + bc ; PDI ; Pop_Directional_Isolate 1.86 + -> uchar.h & UCharacterEnums.ECharacterDirection 1.87 + -> until the bidi code gets updated, 1.88 + Roozbeh suggests mapping the new bc values to ON 
(Other_Neutral) 1.89 +- 3 new Word_Break (WB) values: 1.90 + WB ; HL ; Hebrew_Letter 1.91 + WB ; SQ ; Single_Quote 1.92 + WB ; DQ ; Double_Quote 1.93 + -> uchar.h & UCharacter.WordBreak 1.94 + -> first time Word_Break numeric constants exceed 4 bits (now 17 values) 1.95 +- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html 1.96 + (added 2012-10-16) 1.97 + Aghb 239 Caucasian Albanian 1.98 + Mahj 314 Mahajani 1.99 + -> uscript.h 1.100 + -> com.ibm.icu.lang.UScript 1.101 + find USCRIPT_([^ ]+) *= ([0-9]+),(.+) 1.102 + replace public static final int \1 = \2;\3 1.103 + -> preparseucd.py _scripts_only_in_iso15924 1.104 + -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() 1.105 + and in com.ibm.icu.dev.test.lang.TestUScript.java 1.106 + -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata 1.107 + (not strictly necessary for NOT_ENCODED scripts) 1.108 + 1.109 +* generate normalization data files 1.110 +- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib 1.111 +- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in 1.112 +- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata 1.113 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt 1.114 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt 1.115 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt 1.116 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt 1.117 + 1.118 +* build ICU (make install) 1.119 + so that the tools build can pick up the new definitions from the installed header files. 
1.120 + 1.121 +~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt 1.122 + 1.123 +* build Unicode tools using CMake+make 1.124 + 1.125 +~/svn.icutools/trunk/src/unicode/c/icudefs.txt: 1.126 + 1.127 +# Location (--prefix) of where ICU was installed. 1.128 +set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst) 1.129 +# Location of the ICU source tree. 1.130 +set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src) 1.131 + 1.132 +~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c 1.133 +~/svn.icutools/trunk/dbg/unicode/c$ make 1.134 + 1.135 +* generate core properties data files 1.136 +- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src 1.137 +- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src 1.138 +- rebuild ICU (make install) & tools 1.139 +- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm 1.140 +- rebuild ICU (make install) & tools 1.141 + 1.142 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to 1.143 + sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) 1.144 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters 1.145 +- Unicode 6.0..6.3: U+2260, U+226E, U+226F 1.146 +- nothing new in 6.3, no test file to update 1.147 + 1.148 +* update Java data files 1.149 +- refresh just the UCD-related files, just to be safe 1.150 +- see (ICU4C)/source/data/icu4j-readme.txt 1.151 +- mkdir /tmp/icu4j 1.152 +- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.153 + output: 1.154 + ... 
1.155 + Unicode .icu files built to ./out/build/icudt52l 1.156 + mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b 1.157 + mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b 1.158 + echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt 1.159 + LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b 1.160 + mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b" 1.161 + jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/ 1.162 + mkdir -p /tmp/icu4j/main/shared/data 1.163 + cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data 1.164 + jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/ 1.165 + mkdir -p /tmp/icu4j/main/shared/data 1.166 + cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data 1.167 + make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data' 1.168 +- copy the big-endian Unicode data files to another location, 1.169 + separate from the other data files 1.170 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 1.171 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr 1.172 + ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b 1.173 + ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu 1.174 + ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b 1.175 + ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu 
/tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 1.176 + ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr 1.177 +- refresh ICU4J 1.178 + ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b 1.179 + 1.180 +* refresh Java test .txt files 1.181 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode 1.182 + 1.183 +* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files 1.184 + 1.185 +- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/ 1.186 +- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that 1.187 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt 1.188 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt 1.189 + (note removing the underscore before "Rules") 1.190 +- update (ICU4C)/source/test/testdata/CollationTest_*.txt 1.191 + and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt 1.192 + with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) 1.193 +- check test file diffs for previously commented-out, known-failing data lines; 1.194 + probably need to keep those commented out 1.195 +- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani 1.196 +- run genuca, see command line above 1.197 +- rebuild ICU4C 1.198 +- refresh ICU4J collation data: 1.199 + (subset of instructions above for properties data refresh, except copies all coll/*) 1.200 + ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.201 + ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 1.202 + ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 1.203 + ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf 
~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b 1.204 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) 1.205 +- note on intltest: if collate/UCAConformanceTest fails, then 1.206 + utility/MultithreadTest/TestCollators will fail as well; 1.207 + fix the conformance test before looking into the multi-thread test 1.208 + 1.209 +* test ICU, fix test code where necessary 1.210 + 1.211 +* When refreshing all of ICU4J data from ICU4C 1.212 +- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.213 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data 1.214 +or 1.215 +- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install 1.216 + 1.217 +*** LayoutEngine script information 1.218 +- skipped for Unicode 6.3: no new scripts 1.219 + 1.220 +*** merge the Unicode update branches back onto the trunk 1.221 +- do not merge the icudata.jar and testdata.jar, 1.222 + instead rebuild them from merged & tested ICU4C 1.223 + 1.224 +---------------------------------------------------------------------------- *** 1.225 + 1.226 +Unicode 6.2 update 1.227 + 1.228 +http://www.unicode.org/review/pri230/ 1.229 +http://www.unicode.org/versions/beta-6.2.0.html 1.230 +http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0 1.231 +http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values 1.232 +http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol 1.233 +http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols 1.234 +http://www.unicode.org/reports/tr46/tr46-8.html IDNA 1.235 +http://unicode.org/Public/idna/6.2.0/ 1.236 + 1.237 +*** ICU Trac 1.238 + 1.239 +- ticket 9515: Unicode 6.2: final ICU update 1.240 + 1.241 +- ticket 9514: UCA 6.2: fix UCARules.txt 1.242 + 1.243 +- ticket 9437: update ICU to Unicode 6.2 1.244 +- C++ 
branches/markus/uni62 at r32050 from trunk at r32041 1.245 +- Java branches/markus/uni62 at r32068 from trunk at r32066 1.246 + 1.247 +*** Unicode version numbers 1.248 +- makedata.mak 1.249 +- uchar.h 1.250 + (configure.in & configure: have been modified to extract the version from uchar.h) 1.251 +- com.ibm.icu.util.VersionInfo 1.252 +- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_ 1.253 + 1.254 +*** data files & enums & parser code 1.255 + 1.256 +* file preparation 1.257 + 1.258 +- download UCD, UCA & IDNA files 1.259 +- make sure that the Unicode data folder passed into preparseucd.py 1.260 + includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder) 1.261 +- modify preparseucd.py: NamesList.txt is now in UTF-8 1.262 +- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src 1.263 +- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. 1.264 +- Check test file diffs for previously commented-out, known-failing data lines; 1.265 + probably need to keep those commented out. 1.266 + 1.267 +* PropertyValueAliases.txt changes 1.268 +- 1 new Line_Break (lb) value: 1.269 + lb ; RI ; Regional_Indicator 1.270 + -> uchar.h & UCharacter.LineBreak 1.271 +- 1 new Word_Break (WB) value: 1.272 + WB ; RI ; Regional_Indicator 1.273 + -> uchar.h & UCharacter.WordBreak 1.274 +- 1 new Grapheme_Cluster_Break (GCB) value: 1.275 + GCB; RI ; Regional_Indicator 1.276 + -> uchar.h & UCharacter.GraphemeClusterBreak 1.277 + 1.278 +* 3 new numeric values 1.279 + The new value -1, which was really supposed to be NaN but that would have required 1.280 + new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1, 1.281 + but encodeNumericValue() in corepropsbuilder.cpp had to be fixed. 
1.282 + cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1 1.283 + cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1 1.284 + The two new values 216000 and 432000 require an addition to the encoding of numeric values. 1.285 + cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000 1.286 + cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000 1.287 + -> uprops.h, uchar.c & UCharacterProperty.java 1.288 + -> cucdtst.c & UCharacterTest.java 1.289 + 1.290 +* generate normalization data files 1.291 +- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib 1.292 +- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in 1.293 +- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata 1.294 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt 1.295 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt 1.296 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt 1.297 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt 1.298 + 1.299 +* build ICU (make install) 1.300 + so that the tools build can pick up the new definitions from the installed header files. 
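The cp lines quoted under "3 new numeric values" can be read with a few lines of Python. This is only a sketch of the "fraction of -1/1" idea, not the encodeNumericValue() logic in corepropsbuilder.cpp. The function name is hypothetical, it handles only single-code-point lines of the form quoted above, and treating fields without '=' as flags is an assumption.

```python
from fractions import Fraction


def parse_cp_line(line):
    """Split one 'cp;...' line into (code point, {property: value})."""
    fields = line.strip().split(";")
    if fields[0] != "cp":
        raise ValueError("not a cp line: " + line)
    code_point = int(fields[1], 16)
    props = {}
    for field in fields[2:]:
        name, sep, value = field.partition("=")
        # Assumption: a field without '=' is a flag; keep it as True.
        props[name] = value if sep else True
    return code_point, props


lines = [
    "cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1",
    "cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1",
    "cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000",
    "cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000",
]
for line in lines:
    cp, props = parse_cp_line(line)
    nv = Fraction(props["nv"])  # -1 becomes the fraction -1/1
    print("U+%04X nv=%d/%d" % (cp, nv.numerator, nv.denominator))
```

Fraction keeps the sign in the numerator, which matches the note that -1 is representable as -1/1 without new syntax.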
1.301 +* build Unicode tools using CMake+make 1.302 + 1.303 +* generate core properties data files 1.304 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src 1.305 +- in initial bootstrapping, change the UCA version 1.306 + in source/data/unidata/FractionalUCA.txt to match the new Unicode version 1.307 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src 1.308 +- rebuild ICU (make install) & tools 1.309 + + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR, 1.310 + check if the UCA version in FractionalUCA.txt matches the new Unicode version 1.311 + (see step above) 1.312 +- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm 1.313 +- rebuild ICU (make install) & tools 1.314 + 1.315 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to 1.316 + sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) 1.317 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters 1.318 +- Unicode 6.0..6.2: U+2260, U+226E, U+226F 1.319 +- nothing new in 6.2, no test file to update 1.320 + 1.321 +* update Java data files 1.322 +- refresh just the UCD-related files, just to be safe 1.323 +- see (ICU4C)/source/data/icu4j-readme.txt 1.324 +- mkdir /tmp/icu4j 1.325 +- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.326 + output: 1.327 + ...
1.328 + Unicode .icu files built to ./out/build/icudt50l 1.329 + mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b 1.330 + mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b 1.331 + echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt 1.332 + LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b 1.333 + mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b" 1.334 + jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/ 1.335 + mkdir -p /tmp/icu4j/main/shared/data 1.336 + cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data 1.337 + jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/ 1.338 + mkdir -p /tmp/icu4j/main/shared/data 1.339 + cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data 1.340 + make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data' 1.341 +- copy the big-endian Unicode data files to another location, 1.342 + separate from the other data files 1.343 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll 1.344 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr 1.345 + ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b 1.346 + ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu 1.347 + ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b 1.348 + ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu 
/tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll 1.349 + ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr 1.350 +- refresh ICU4J 1.351 + ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b 1.352 + 1.353 +* refresh Java test .txt files 1.354 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode 1.355 + 1.356 +* UCA 1.357 + 1.358 +- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/ 1.359 +- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that 1.360 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt 1.361 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt 1.362 + (note removing the underscore before "Rules") 1.363 +- update (ICU4C)/source/test/testdata/CollationTest_*.txt 1.364 + and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt 1.365 + with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) 1.366 +- check test file diffs for previously commented-out, known-failing data lines; 1.367 + probably need to keep those commented out 1.368 +- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani 1.369 +- run genuca, see command line above 1.370 +- rebuild ICU4C 1.371 +- refresh ICU4J collation data: 1.372 + (subset of instructions above for properties data refresh, except copies all coll/*) 1.373 + ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.374 + ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll 1.375 + ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll 1.376 + ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b 1.377 +- run 
all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) 1.378 +- note on intltest: if collate/UCAConformanceTest fails, then 1.379 + utility/MultithreadTest/TestCollators will fail as well; 1.380 + fix the conformance test before looking into the multi-thread test 1.381 + 1.382 +* test ICU, fix test code where necessary 1.383 + 1.384 +* When refreshing all of ICU4J data from ICU4C 1.385 +- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.386 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data 1.387 +or 1.388 +- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install 1.389 + 1.390 +*** LayoutEngine script information 1.391 +- skipped for Unicode 6.2: no new scripts 1.392 + 1.393 +*** merge the Unicode update branches back onto the trunk 1.394 +- do not merge the icudata.jar and testdata.jar, 1.395 + instead rebuild them from merged & tested ICU4C 1.396 + 1.397 +---------------------------------------------------------------------------- *** 1.398 + 1.399 +Future Unicode update 1.400 + 1.401 +Tools simplified since the Unicode 6.1 update. See 1.402 +- http://site.icu-project.org/design/props/ppucd 1.403 +- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972 1.404 + 1.405 +* Unicode version numbers 1.406 +- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates 1.407 + 1.408 +* file preparation 1.409 +- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py: 1.410 +- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src 1.411 +- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. 1.412 +- Check test file diffs for previously commented-out, known-failing data lines; 1.413 + probably need to keep those commented out. 
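For the check of previously commented-out, known-failing data lines, a rough sketch follows. It assumes that the test data files are plain text and that a disabled data line is simply prefixed with '#'; the function name and the sample data are hypothetical. It lists lines that were commented out in the old file but come back active in the regenerated one, which are the candidates to comment out again.

```python
def reactivated_lines(old_text, new_text, marker="#"):
    """Return lines commented out in old_text that are active in new_text."""
    # Active (non-empty, non-comment) lines of the regenerated file.
    new_active = {
        line.strip()
        for line in new_text.splitlines()
        if line.strip() and not line.lstrip().startswith(marker)
    }
    result = []
    for line in old_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            body = stripped[len(marker):].strip()
            if body and body in new_active:
                result.append(body)
    return result


# Hypothetical file contents for illustration.
old = "0041 0042\n#0061 0062\n# plain comment\n"
new = "0041 0042\n0061 0062\n# plain comment\n"
print(reactivated_lines(old, new))
```

An ordinary diff shows the same information; the sketch only narrows it down to the lines that matter for this step.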
1.414 + 1.415 +* PropertyValueAliases.txt changes 1.416 +- Script codes that are in ISO 15924 but not in Unicode are now listed in 1.417 + preparseucd.py, in the _scripts_only_in_iso15924 variable. 1.418 + If there are new ISO codes, then add them. 1.419 + If Unicode adds some of them, then remove them from the .py variable. 1.420 + 1.421 +* UnicodeData.txt changes 1.422 +- No more manual changes for CJK ranges for algorithmic names; 1.423 + those are now written to ppucd.txt and genprops reads them from there. 1.424 + 1.425 +* generate core properties data files (makeprops.sh was deleted) 1.426 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src 1.427 + 1.428 +* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt 1.429 +- it is now generated by preparseucd.py 1.430 + 1.431 +* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt 1.432 +- it is now generated by preparseucd.py 1.433 +- make sure that the Unicode data folder passed into preparseucd.py 1.434 + includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt 1.435 + (can be in some subfolder) 1.436 + 1.437 +* generate normalization data files 1.438 +- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib 1.439 +- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in 1.440 +- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata 1.441 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt 1.442 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt 1.443 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt 1.444 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt 1.445 + 1.446 +* build ICU (make install) 1.447 +* build Unicode tools using CMake+make 1.448 + 1.449 +* new way to call 
genuca (makeuca.sh was deleted) 1.450 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src 1.451 + 1.452 +---------------------------------------------------------------------------- *** 1.453 + 1.454 +Unicode 6.1 update 1.455 + 1.456 +*** ICU Trac 1.457 + 1.458 +- ticket 8995 final update to Unicode 6.1 1.459 +- ticket 8994 regenerate source/layout/CanonData.cpp 1.460 + 1.461 +- ticket 8961 support Unicode "Age" value *names* 1.462 +- ticket 8963 support multiple character name aliases & types 1.463 + 1.464 +- ticket 8827 "update ICU to Unicode 6.1" 1.465 +- C++ branches/markus/uni61 at r30864 from trunk at r30843 1.466 +- Java branches/markus/uni61 at r30865 from trunk at r30863 1.467 + 1.468 +*** Unicode version numbers 1.469 +- makedata.mak 1.470 +- uchar.h 1.471 + (configure.in & configure: have been modified to extract the version from uchar.h) 1.472 +- com.ibm.icu.util.VersionInfo 1.473 +- icutools/unicode/makedefs.sh 1.474 + + also review & update other definitions in that file, 1.475 + e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l 1.476 + 1.477 +*** data files & enums & parser code 1.478 + 1.479 +* file preparation 1.480 + 1.481 +~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed 1.482 +- This prepares both unidata and testdata files in respective output subfolders. 1.483 +- Check test file diffs for previously commented-out, known-failing data lines; 1.484 + probably need to keep those commented out. 
1.485 + 1.486 +* PropertyValueAliases.txt changes 1.487 +- 11 new block names: 1.488 + Arabic_Extended_A 1.489 + Arabic_Mathematical_Alphabetic_Symbols 1.490 + Chakma 1.491 + Meetei_Mayek_Extensions 1.492 + Meroitic_Cursive 1.493 + Meroitic_Hieroglyphs 1.494 + Miao 1.495 + Sharada 1.496 + Sora_Sompeng 1.497 + Sundanese_Supplement 1.498 + Takri 1.499 + -> add to uchar.h 1.500 + -> add to UCharacter.UnicodeBlock IDs 1.501 + Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+) 1.502 + replace public static final int \1_ID = \2; \3 1.503 + -> add to UCharacter.UnicodeBlock objects 1.504 + Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) 1.505 + replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 1.506 +- 1 new Joining_Group (jg) value: 1.507 + Rohingya_Yeh 1.508 + -> uchar.h & UCharacter.JoiningGroup 1.509 +- 2 new Line_Break (lb) values: 1.510 + CJ=Conditional_Japanese_Starter 1.511 + HL=Hebrew_Letter 1.512 + -> uchar.h & UCharacter.LineBreak 1.513 +- 7 new scripts: 1.514 + sc ; Cakm ; Chakma 1.515 + sc ; Merc ; Meroitic_Cursive 1.516 + sc ; Mero ; Meroitic_Hieroglyphs 1.517 + sc ; Plrd ; Miao 1.518 + sc ; Shrd ; Sharada 1.519 + sc ; Sora ; Sora_Sompeng 1.520 + sc ; Takr ; Takri 1.521 + -> remove these from SyntheticPropertyValueAliases.txt 1.522 + -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() 1.523 + and in com.ibm.icu.dev.test.lang.TestUScript.java 1.524 +- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html 1.525 + (added 2011-06-21) 1.526 + Khoj 322 Khojki 1.527 + Tirh 326 Tirhuta 1.528 + and another one added 2011-12-09 1.529 + Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) 1.530 + -> uscript.h 1.531 + -> com.ibm.icu.lang.UScript 1.532 + find USCRIPT_([^ ]+) *= ([0-9]+),(.+) 1.533 + replace public static final int \1 = \2;\3 1.534 + -> SyntheticPropertyValueAliases.txt 1.535 + -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() 1.536 + and 
in com.ibm.icu.dev.test.lang.TestUScript.java 1.537 + 1.538 +* UnicodeData.txt changes 1.539 +- the last Unihan code point changes from U+9FCB to U+9FCC 1.540 + search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive) 1.541 + + do change gennames.c 1.542 + + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java 1.543 + 1.544 +* DerivedBidiClass.txt changes 1.545 +- 2 new default-AL blocks: 1.546 +# Arabic Extended-A: U+08A0 - U+08FF (was default-R) 1.547 +# Arabic Mathematical Alphabetic Symbols: 1.548 +# U+1EE00 - U+1EEFF (was default-R) 1.549 +- 2 new default-R blocks: 1.550 +# Meroitic Hieroglyphs: 1.551 +# U+10980 - U+1099F 1.552 +# Meroitic Cursive: U+109A0 - U+109FF 1.553 + -> should be picked up by the explicit data in the file 1.554 + 1.555 +* NameAliases.txt changes 1.556 +- from 1.557 + # Each line has two fields 1.558 + # First field: Code point 1.559 + # Second field: Alias 1.560 +- to 1.561 + # Each line has three fields, as described here: 1.562 + # 1.563 + # First field: Code point 1.564 + # Second field: Alias 1.565 + # Third field: Type 1.566 +- Also, the file previously allowed multiple aliases but only now does it 1.567 + actually provide multiple, even multiple of the same type. For example, 1.568 + FEFF;BYTE ORDER MARK;alternate 1.569 + FEFF;BOM;abbreviation 1.570 + FEFF;ZWNBSP;abbreviation 1.571 +- This breaks our gennames parser, unames.icu data structure, and API. 1.572 + Fix gennames to only pick up "correction" aliases. 1.573 + New ticket #8963 for further changes. 1.574 + 1.575 +* run genpname/preparse.pl (on Linux) 1.576 + + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname 1.577 + + make sure that data.h is writable 1.578 + + perl preparse.pl ~/svn.icu/trunk/src > out.txt 1.579 + + preparse.pl shows no errors, out.txt Info and Warning lines look ok 1.580 + 1.581 +* build ICU (make install) 1.582 + so that the tools build can pick up the new definitions from the installed header files. 
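For the NameAliases.txt change above, the "only pick up correction aliases" rule can be sketched in Python. This is not the gennames code (which is C); the function name is hypothetical, the sample lines are illustrative, and the sketch assumes the new three-field format, so a two-field line from the old format would raise an error.

```python
def correction_aliases(text):
    """Map code point -> list of aliases whose type is 'correction'."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, alias_type = [field.strip() for field in line.split(";")]
        if alias_type == "correction":
            result.setdefault(int(code, 16), []).append(alias)
    return result


# Illustrative input in the three-field format.
sample = """\
# First field: Code point
# Second field: Alias
# Third field: Type
01A2;LATIN CAPITAL LETTER GHA;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""
print(correction_aliases(sample))
```

Collecting a list per code point leaves room for several aliases of the same type, which is the case the file now actually exercises.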
1.583 +* build Unicode tools (at least genpname) using CMake+make 1.584 + 1.585 +* run genpname 1.586 + (builds both pnames.icu and propname_data.h) 1.587 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in 1.588 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource 1.589 + 1.590 +* build ICU (make install) 1.591 +* build Unicode tools using CMake+make 1.592 + 1.593 +* update source/data/unidata/norm2/nfkc_cf.txt 1.594 +- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt 1.595 + 1.596 +* update source/data/unidata/norm2/uts46.txt 1.597 +- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt 1.598 + to ~/svn.icu/tools/trunk/src/unicode/py 1.599 +- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008". 1.600 +- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py 1.601 +- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2 1.602 + 1.603 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to 1.604 + sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) 1.605 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters 1.606 +- Unicode 6.0..6.1: U+2260, U+226E, U+226F 1.607 +- nothing new in 6.1, no test file to update 1.608 + 1.609 +* generate core properties data files 1.610 +- in initial bootstrapping, change the UCA version 1.611 + in source/data/unidata/FractionalUCA.txt to match the new Unicode version 1.612 +- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld 1.613 +- rebuild ICU & tools 1.614 + + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR, 1.615 + check if the UCA version in FractionalUCA.txt matches the new Unicode version 1.616 + (see step above) 1.617
+- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm: 1.618 + ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld 1.619 +- rebuild ICU & tools 1.620 + 1.621 +* update Java data files 1.622 +- refresh just the UCD-related files, just to be safe 1.623 +- see (ICU4C)/source/data/icu4j-readme.txt 1.624 +- mkdir /tmp/icu4j 1.625 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.626 + output: 1.627 + ... 1.628 + Unicode .icu files built to ./out/build/icudt49l 1.629 + mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b 1.630 + mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b 1.631 + echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt 1.632 + LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b 1.633 + mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b" 1.634 + jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/ 1.635 + mkdir -p /tmp/icu4j/main/shared/data 1.636 + cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data 1.637 + jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/ 1.638 + mkdir -p /tmp/icu4j/main/shared/data 1.639 + cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data 1.640 + make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data' 1.641 +- copy the big-endian Unicode data files to another location, 1.642 + separate from the other data files 1.643 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll 1.644 + mkdir -p 
/tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr 1.645 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b 1.646 + ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu 1.647 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b 1.648 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll 1.649 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr 1.650 +- refresh ICU4J 1.651 + ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b 1.652 + 1.653 +* refresh Java test .txt files 1.654 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode 1.655 + 1.656 +* test ICU so far, fix test code where necessary 1.657 +- temporarily ignore collation issues that look like UCA/UCD mismatches, 1.658 + until UCA data is updated 1.659 + 1.660 +* UCA 1.661 + 1.662 +- get output from Mark's tools; look in 1.663 + http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. 
version>.txt 1.664 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt 1.665 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt 1.666 + (note removing the underscore before "Rules") 1.667 +- update (ICU)/source/test/testdata/CollationTest_*.txt 1.668 + and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt 1.669 + with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) 1.670 +- check test file diffs for previously commented-out, known-failing data lines; 1.671 + probably need to keep those commented out 1.672 +- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani 1.673 +- run makeuca.sh: 1.674 + ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld 1.675 +- rebuild ICU4C 1.676 +- refresh ICU4J collation data: 1.677 + (subset of instructions above for properties data refresh, except copies all coll/*) 1.678 + ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.679 + ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll 1.680 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll 1.681 + ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b 1.682 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) 1.683 +- note on intltest: if collate/UCAConformanceTest fails, then 1.684 + utility/MultithreadTest/TestCollators will fail as well; 1.685 + fix the conformance test before looking into the multi-thread test 1.686 + 1.687 +* When refreshing all of ICU4J data from ICU4C 1.688 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.689 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data 1.690 +or 1.691 +- ~/svn.icu/trunk/bld$ make 
ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install 1.692 + 1.693 +*** LayoutEngine script information 1.694 + 1.695 +(For details see the Unicode 5.2 change log below.) 1.696 + 1.697 +* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder. 1.698 + This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp 1.699 + in the working directory. 1.700 + (It also generates ScriptRunData.cpp, which is no longer needed.) 1.701 + 1.702 + The generated files have a current copyright date and "@draft" statement. 1.703 + 1.704 +- diff current <icu>/source/layout files vs. generated ones 1.705 + ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout 1.706 + review and manually merge desired changes; 1.707 + fix gratuitous changes, incorrect @draft and missing aliases; 1.708 + Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc. 1.709 +- if you just copy the above files, then 1.710 + fix mixed line endings, review the diffs as above and restore changes to API tags etc.; 1.711 + manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h 1.712 + 1.713 +*** merge the Unicode update branches back onto the trunk 1.714 +- do not merge the icudata.jar and testdata.jar, 1.715 + instead rebuild them from merged & tested ICU4C 1.716 + 1.717 +---------------------------------------------------------------------------- *** 1.718 + 1.719 +ICU 4.8 (no Unicode update, just new script codes) 1.720 + 1.721 +* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html 1.722 + (added 2010-12-21) 1.723 + Afak 439 Afaka 1.724 + Jurc 510 Jurchen 1.725 + Mroo 199 Mro, Mru 1.726 + Nshu 499 Nüshu 1.727 + Shrd 319 Sharada, Śāradā 1.728 + Sora 398 Sora Sompeng 1.729 + Takr 321 Takri, Ṭākrī, Ṭāṅkrī 1.730 + Tang 520 Tangut 1.731 + Wole 480 Woleai 1.732 + -> uscript.h 1.733 + -> com.ibm.icu.lang.UScript 1.734 + find USCRIPT_([^ 
]+) *= ([0-9]+),(.+) 1.735 + replace public static final int \1 = \2;\3 1.736 + -> genpname/SyntheticPropertyValueAliases.txt 1.737 + -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() 1.738 + and in com.ibm.icu.dev.test.lang.TestUScript.java 1.739 + 1.740 +* run genpname/preparse.pl (on Linux) 1.741 + + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname 1.742 + + make sure that data.h is writable 1.743 + + perl preparse.pl ~/svn.icu/trunk/src > out.txt 1.744 + + preparse.pl shows no errors, out.txt Info and Warning lines look ok 1.745 + 1.746 +* rebuild Unicode tools (at least genpname) using make 1.747 +- You might first need to "make install" ICU so that the tools build can pick 1.748 + up the new definitions from the installed header files. 1.749 + 1.750 +* run genpname 1.751 + (builds both pnames.icu and propname_data.h) 1.752 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in 1.753 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource 1.754 +- rebuild ICU & tools 1.755 + 1.756 +* run genprops 1.757 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0 1.758 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0 1.759 +- rebuild ICU & tools 1.760 + 1.761 +* update Java data files 1.762 +- refresh just the UCD-related files, just to be safe 1.763 +- see (ICU4C)/source/data/icu4j-readme.txt 1.764 +- mkdir /tmp/icu4j 1.765 +- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.766 +- copy the big-endian Unicode data files to another location, 1.767 + separate from the other data files 1.768 + mkdir -p 
/tmp/icu4j/com/ibm/icu/impl/data/icudt48b 1.769 + ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b 1.770 + ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b 1.771 +- refresh ICU4J 1.772 + ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b 1.773 + 1.774 +* should have updated the layout engine script codes but forgot 1.775 + 1.776 +---------------------------------------------------------------------------- *** 1.777 + 1.778 +Unicode 6.0 update 1.779 + 1.780 +*** related ICU Trac tickets 1.781 + 1.782 +7264 Unicode 6.0 Update 1.783 + 1.784 +*** Unicode version numbers 1.785 +- makedata.mak 1.786 +- uchar.h 1.787 + (configure.in & configure: have been modified to extract the version from uchar.h) 1.788 +- com.ibm.icu.util.VersionInfo 1.789 + 1.790 +*** data files & enums & parser code 1.791 + 1.792 +* file preparation 1.793 + 1.794 +~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed 1.795 +- This now prepares both unidata and testdata files in respective output subfolders. 
1.796 + 1.797 +* PropertyAliases.txt changes 1.798 +- new Script_Extensions property defined in the new ScriptExtensions.txt file 1.799 + but not listed in PropertyAliases.txt; reported to unicode.org; 1.800 + -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt 1.801 + scx; Script_Extensions 1.802 + -> uchar.h with new UProperty section 1.803 + -> com.ibm.icu.lang.UProperty, parallel with uchar.h 1.804 + 1.805 +* PropertyValueAliases.txt changes 1.806 +- 12 new block names: 1.807 + Alchemical_Symbols 1.808 + Bamum_Supplement 1.809 + Batak 1.810 + Brahmi 1.811 + CJK_Unified_Ideographs_Extension_D 1.812 + Emoticons 1.813 + Ethiopic_Extended_A 1.814 + Kana_Supplement 1.815 + Mandaic 1.816 + Miscellaneous_Symbols_And_Pictographs 1.817 + Playing_Cards 1.818 + Transport_And_Map_Symbols 1.819 + -> add to uchar.h 1.820 + -> add to UCharacter.UnicodeBlock 1.821 + Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) 1.822 + replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 1.823 +- Joining_Group (jg) values: 1.824 + Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal, which becomes an alias 1.825 + -> uchar.h & UCharacter.JoiningGroup 1.826 +- 3 new scripts: 1.827 + sc ; Batk ; Batak 1.828 + sc ; Brah ; Brahmi 1.829 + sc ; Mand ; Mandaic 1.830 + -> remove these from SyntheticPropertyValueAliases.txt 1.831 + -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN 1.832 + -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() 1.833 + and in com.ibm.icu.dev.test.lang.TestUScript.java 1.834 +- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html 1.835 + (added 2009-11-11..2010-07-18) 1.836 + Bass 259 Bassa Vah 1.837 + Dupl 755 Duployan shorthand 1.838 + Elba 226 Elbasan 1.839 + Gran 343 Grantha 1.840 + Kpel 436 Kpelle 1.841 + Loma 437 Loma 1.842 + Mend 438 Mende 1.843 + Merc 101 Meroitic Cursive 1.844 + Narb 106 Old North Arabian 1.845 + Nbat 159 Nabataean 1.846 + Palm 126
Palmyrene 1.847 + Sind 318 Sindhi 1.848 + Wara 262 Warang Citi 1.849 + -> uscript.h 1.850 + -> com.ibm.icu.lang.UScript 1.851 + find USCRIPT_([^ ]+) *= ([0-9]+),(.+) 1.852 + replace public static final int \1 = \2;\3 1.853 + -> SyntheticPropertyValueAliases.txt 1.854 + -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() 1.855 + and in com.ibm.icu.dev.test.lang.TestUScript.java 1.856 +- ISO 15924 name change 1.857 + Mero 100 Meroitic Hieroglyphs (was Meroitic) 1.858 + -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC 1.859 +- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt 1.860 + 1.861 +* UnicodeData.txt changes 1.862 +- new CJK block: 1.863 + 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;; 1.864 + 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;; 1.865 + -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion 1.866 + 1.867 +* build Unicode tools using CMake+make 1.868 + 1.869 +* run genpname/preparse.pl (on Linux) 1.870 + + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname 1.871 + + make sure that data.h is writable 1.872 + + perl preparse.pl ~/svn.icu/trunk/src > out.txt 1.873 + + preparse.pl shows no errors, out.txt Info and Warning lines look ok 1.874 + 1.875 +* rebuild Unicode tools (at least genpname) using make 1.876 +- You might first need to "make install" ICU so that the tools build can pick 1.877 + up the new definitions from the installed header files. 
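The uscript.h to UScript.java find/replace noted above (find USCRIPT_([^ ]+) *= ([0-9]+),(.+), replace public static final int \1 = \2;\3) can also be run as a small script instead of in an editor. The sketch below transcribes that pattern for Python's re module. The function name uscript_to_java and the sample input line are illustrative assumptions, not part of the ICU tools.

```python
import re

# Pattern and replacement transcribed from the find/replace notes above.
USCRIPT_LINE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_CONSTANT = r"public static final int \1 = \2;\3"


def uscript_to_java(c_line):
    """Rewrite one uscript.h enum line as a UScript.java constant declaration.

    Lines that do not match the pattern are returned unchanged.
    """
    return USCRIPT_LINE.sub(JAVA_CONSTANT, c_line)


# Illustrative input in the style of uscript.h; the value is an example only.
print(uscript_to_java("    USCRIPT_AFAKA = 147,/* Afak */"))
```

As in the editor version, the pattern requires at least one character after the comma, so an enum line without a trailing comment passes through untouched and has to be converted by hand.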
1.878 + 1.879 +* run genpname 1.880 +- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in 1.881 +- rebuild ICU & tools 1.882 + 1.883 +* update source/data/unidata/norm2/nfkc_cf.txt 1.884 +- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt 1.885 + 1.886 +* update source/data/unidata/norm2/uts46.txt 1.887 +- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt 1.888 + to ~/svn.icu/tools/trunk/src/unicode/py 1.889 +- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values 1.890 +- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py 1.891 +- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2 1.892 + 1.893 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to 1.894 + sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) 1.895 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters 1.896 +- Unicode 6.0: U+2260, U+226E, U+226F 1.897 + 1.898 +* generate core properties data files 1.899 +- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld 1.900 +- rebuild ICU & tools 1.901 +- run makeuca.sh so that genuca picks up the new nfc.nrm: 1.902 + ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld 1.903 +- rebuild ICU & tools 1.904 + 1.905 +* implement new Script_Extensions property (provisional) 1.906 +- parser & generator: genprops & uprops.icu 1.907 +- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp 1.908 +- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java 1.909 + 1.910 +* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2 1.911 +- (one-time change) 1.912 +- genbidi/gencase/genprops tools changes 1.913 
+- re-run makeprops.sh (see above) 1.914 +- UCharacterProperty.java, UCharacterTypeIterator.java, 1.915 + UBiDiProps.java, UCaseProps.java, and several others with minor changes; 1.916 + UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java 1.917 + 1.918 +* update Java data files 1.919 +- refresh just the UCD-related files, just to be safe 1.920 +- see (ICU4C)/source/data/icu4j-readme.txt 1.921 +- mkdir /tmp/icu4j 1.922 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.923 + output: 1.924 + ... 1.925 + Unicode .icu files built to ./out/build/icudt45l 1.926 + mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b 1.927 + echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt 1.928 + LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b 1.929 + jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b 1.930 + mkdir -p /tmp/icu4j/main/shared/data 1.931 + cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data 1.932 +- copy the big-endian Unicode data files to another location, 1.933 + separate from the other data files 1.934 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll 1.935 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr 1.936 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b 1.937 + ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu 1.938 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b 1.939 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll 1.940 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* 
/tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr 1.941 +- refresh ICU4J 1.942 + ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b 1.943 + 1.944 +* refresh Java test .txt files 1.945 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode 1.946 + 1.947 +* un-hardcode normalization skippable (NF*_Inert) test data 1.948 +- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools 1.949 + 1.950 +* copy updated break iterator test files 1.951 +- now handled by early ucdcopy.py and 1.952 + copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata 1.953 + (old instructions: 1.954 + copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt 1.955 + to ~/svn.icu/trunk/src/source/test/testdata) 1.956 +- they are not used in ICU4J 1.957 + 1.958 +* UCA 1.959 + 1.960 +- get output from Mark's tools; look in 1.961 + http://www.unicode.org/~book/incoming/mark/uca6.0.0/ 1.962 + http://www.macchiato.com/unicode/utc/additional-uca-files 1.963 + http://www.unicode.org/Public/UCA/6.0.0/ 1.964 + http://www.unicode.org/~mdavis/uca/ 1.965 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt 1.966 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt 1.967 +- update Han-implicit ranges for new CJK extensions: 1.968 + swapCJK() in ucol.cpp & ImplicitCEGenerator.java 1.969 +- genuca: allow bytes 02 for U+FFFE, new merge-sort character; 1.970 + do not add it into invuca so that tailoring primary-after an ignorable works 1.971 +- genuca: permit space between [variable top] bytes 1.972 +- ucol.cpp: treat noncharacters like unassigned rather than ignorable 1.973 +- run makeuca.sh: 1.974 + ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld 1.975 +- rebuild ICU4C 1.976 +- refresh ICU4J collation data: 1.977 + (subset of instructions above for 
properties data refresh, except copies all coll/*) 1.978 + ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.979 + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll 1.980 + ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll 1.981 + ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b 1.982 +- update (ICU)/source/test/testdata/CollationTest_*.txt 1.983 + and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt 1.984 + with output from Mark's Unicode tools 1.985 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments) 1.986 +- note on intltest: if collate/UCAConformanceTest fails, then 1.987 + utility/MultithreadTest/TestCollators will fail as well; 1.988 + fix the conformance test before looking into the multi-thread test 1.989 + 1.990 +* When refreshing all of ICU4J data from ICU4C 1.991 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 1.992 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data 1.993 +or 1.994 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install 1.995 + 1.996 +*** LayoutEngine script information 1.997 + 1.998 +(For details see the Unicode 5.2 change log below.) 1.999 + 1.1000 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, 1.1001 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates 1.1002 +ScriptRunData.cpp, which is no longer needed.) 1.1003 + 1.1004 +The generated files have a current copyright date and "@draft" statement. 1.1005 + 1.1006 +* copy the above files into <icu>/source/layout, replacing the old files. 
1.1007 +* fix mixed line endings 1.1008 +* review the diffs and fix incorrect @draft and missing aliases; 1.1009 + Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc. 1.1010 +* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h 1.1011 + 1.1012 +---------------------------------------------------------------------------- *** 1.1013 + 1.1014 +Unicode 5.2 update 1.1015 + 1.1016 +*** related ICU Trac tickets 1.1017 + 1.1018 +7084 Unicode 5.2 1.1019 + 1.1020 +7167 verify collation bytes 1.1021 +7235 Java test NAME_ALIAS 1.1022 +7236 Java DerivedCoreProperties.txt test 1.1023 +7237 Java BidiTest.txt 1.1024 +7238 UTrie2 in core unidata 1.1025 +7239 test for tailoring gaps 1.1026 +7240 Java fix CollationMiscTest 1.1027 +7243 update layout engine for Unicode 5.2 1.1028 + 1.1029 +*** Unicode version numbers 1.1030 +- makedata.mak 1.1031 +- uchar.h 1.1032 +- configure.in & configure 1.1033 +- update ucdVersion in gennames.c if an algorithmic range changes 1.1034 + 1.1035 +*** data files & enums & parser code 1.1036 + 1.1037 +* file preparation 1.1038 + 1.1039 +python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata 1.1040 +- includes finding files regardless of version numbers, 1.1041 + copying them, and performing the equivalent processing of the 1.1042 + ucdstrip and ucdmerge tools on the desired set of files 1.1043 + 1.1044 +* notes on changes 1.1045 +- PropertyAliases.txt 1.1046 + moved from numeric to enumerated: 1.1047 + ccc ; Canonical_Combining_Class 1.1048 + new string properties: 1.1049 + NFKC_CF ; NFKC_Casefold 1.1050 + Name_Alias; Name_Alias 1.1051 + new binary properties: 1.1052 + Cased ; Cased 1.1053 + CI ; Case_Ignorable 1.1054 + CWCF ; Changes_When_Casefolded 1.1055 + CWCM ; Changes_When_Casemapped 1.1056 + CWKCF ; Changes_When_NFKC_Casefolded 1.1057 + CWL ; Changes_When_Lowercased 1.1058 + CWT 
; Changes_When_Titlecased 1.1059 + CWU ; Changes_When_Uppercased 1.1060 + new CJK Unihan properties (not supported by ICU) 1.1061 +- PropertyValueAliases.txt 1.1062 + new block names 1.1063 + new scripts 1.1064 + one script code change: 1.1065 + sc ; Qaai ; Inherited 1.1066 + -> 1.1067 + sc ; Zinh ; Inherited ; Qaai 1.1068 + new Line_Break (lb) value: 1.1069 + lb ; CP ; Close_Parenthesis 1.1070 + new Joining_Group (jg) values: Farsi_Yeh, Nya 1.1071 + other new values: 1.1072 + ccc; 214; ATA ; Attached_Above 1.1073 +- DerivedBidiClass.txt 1.1074 + new default-R range: U+1E800 - U+1EFFF 1.1075 +- UnicodeData.txt 1.1076 + all of the ISO comments are gone 1.1077 + new CJK block end: 1.1078 + 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last> 1.1079 + new CJK block: 1.1080 + 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;; 1.1081 + 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;; 1.1082 + 1.1083 +* genpname 1.1084 +- run preparse.pl 1.1085 + + cd \svn\icuproj\icu\trunk\source\tools\genpname 1.1086 + + make sure that data.h is writable 1.1087 + + perl preparse.pl \svn\icuproj\icu\trunk > out.txt 1.1088 + + preparse.pl complains with errors like the following: 1.1089 + Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34. 1.1090 + This is because ICU 4.0 had scripts from ISO 15924 which are now 1.1091 + added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt 1.1092 + and PropertyValueAliases.txt. 
1.1093 + -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: 1.1094 + Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt 1.1095 + + preparse.pl complains with errors about block names missing from uchar.h; add them 1.1096 + 1.1097 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops 1.1098 +- new block & script values 1.1099 + + 26 new blocks 1.1100 + copy new blocks from Blocks.txt 1.1101 + MS VC++ 2008 regular expression: 1.1102 + find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$" 1.1103 + replace with " UBLOCK_\3 = 172, /*[\1]*/" 1.1104 + + several new script values already added in ICU 4.0 for ISO 15924 coverage 1.1105 + (removed from SyntheticPropertyValueAliases.txt, see genpname notes above) 1.1106 + + 3 new script values added for ISO 15924 and Unicode 5.2 coverage 1.1107 + + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2) 1.1108 + (added to SyntheticPropertyValueAliases.txt) 1.1109 +- new Joining Group (JG) values: Farsi_Yeh, Nya 1.1110 +- new Line_Break (lb) value: 1.1111 + lb ; CP ; Close_Parenthesis 1.1112 + 1.1113 +* hardcoded Unihan range end/limit 1.1114 +- Unihan range end moves from 9FC3 to 9FCB 1.1115 + search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive) 1.1116 + + do change gennames.c 1.1117 + 1.1118 +* Compare definitions of new binary properties with what we used to use 1.1119 + in algorithms, to see if the definitions changed. 1.1120 +- Verified that definitions for Cased and Case_Ignorable are unchanged. 1.1121 + The gencase tool now parses the newly public Case_Ignorable values 1.1122 + in case the definition changes in the future. 
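The MS VC++ find/replace for new blocks in the uchar.h notes above can be approximated in Python, where the curly-brace groups become parentheses. This is a sketch under stated assumptions: the function name block_to_enum is hypothetical, the upper-casing and underscore normalization of the block name is an added convenience (the editor regex leaves the name as written in Blocks.txt), and the numeric value is passed in by the caller, since the notes use a fixed 172 in the replacement text.

```python
import re

# Python equivalent of the MS VC++ pattern "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$".
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def block_to_enum(line, value):
    """Turn one Blocks.txt line into a uchar.h-style UBLOCK_ enum line.

    Returns None for lines that are not block definitions (comments, blanks).
    """
    match = BLOCK_LINE.match(line)
    if match is None:
        return None
    start, _end, name = match.groups()
    # Assumption: derive the identifier by upper-casing the block name and
    # replacing runs of non-alphanumeric characters with underscores.
    ident = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
    return "    UBLOCK_%s = %d, /*[%s]*/" % (ident, value, start)


# Block line as it appears in Blocks.txt; the value 180 is illustrative.
print(block_to_enum("A960..A97F; Hangul Jamo Extended-A", 180))
```

The generated lines still need review against the existing enum, in particular the numbering and any alias constants.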
1.1123 + 1.1124 +* uchar.c & uprops.h & uprops.c & genprops 1.1125 +- new numeric values that didn't exist in Unicode data before: 1.1126 + 1/7, 1/9, 1/10, 3/10, 1/16, 3/16 1.1127 + the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5, 1.1128 + therefore redesign the encoding of numeric types and values for formatVersion 6; 1.1129 + design for simple numbers up to at least 144 ("one gross"), 1.1130 + large values up to at least 10^20, 1.1131 + and fractions with numerators -1..17 and denominators 1..16 1.1132 + to cover current and expected future values 1.1133 + (e.g., more Han numeric values, Meroitic twelfths) 1.1134 + 1.1135 +* reimplement Hangul_Syllable_Type for new Jamo characters 1.1136 +- the old code assumed that all Jamo characters are in the 11xx block 1.1137 +- Unicode 5.2 fills holes there and adds new Jamo characters in 1.1138 + A960..A97F; Hangul Jamo Extended-A 1.1139 + and in 1.1140 + D7B0..D7FF; Hangul Jamo Extended-B 1.1141 +- Hangul_Syllable_Type can be trivially derived from a subset of 1.1142 + Grapheme_Cluster_Break values 1.1143 + 1.1144 +* build Unicode data source code for hardcoding core data 1.1145 +C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data 1.1146 + 1.1147 +ICU data make path is \svn\icuproj\icu\trunk\source\data\ 1.1148 +ICU root path is \svn\icuproj\icu\trunk 1.1149 +Information: cannot find "ucmlocal.mk". Not building user-additional converter files. 1.1150 +Information: cannot find "brklocal.mk". Not building user-additional break iterator files. 1.1151 +Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. 1.1152 +Information: cannot find "collocal.mk". Not building user-additional resource bundle files. 1.1153 +Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. 1.1154 +Information: cannot find "trnslocal.mk". 
Not building user-additional transliterator files. 1.1155 +Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. 1.1156 +Information: cannot find "spreplocal.mk". Not building user-additional stringprep files. 1.1157 +Creating data file for Unicode Property Names 1.1158 +Creating data file for Unicode Character Properties 1.1159 +Creating data file for Unicode Case Mapping Properties 1.1160 +Creating data file for Unicode BiDi/Shaping Properties 1.1161 +Creating data file for Unicode Normalization 1.1162 +Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l" 1.1163 +Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp" 1.1164 + 1.1165 +- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common 1.1166 + and rebuild the common library 1.1167 + 1.1168 +*** UCA 1.1169 + 1.1170 +- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools) 1.1171 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools 1.1172 +- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools 1.1173 +[ Begin obsolete instructions: 1.1174 + Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files. 
1.1175 + - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py 1.1176 + on Windows: 1.1177 + python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt 1.1178 + python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt 1.1179 + End obsolete instructions] 1.1180 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments) 1.1181 + not just the *_STUB.txt files 1.1182 +- note on intltest: if collate/UCAConformanceTest fails, then 1.1183 + utility/MultithreadTest/TestCollators will fail as well; 1.1184 + fix the conformance test before looking into the multi-thread test 1.1185 + 1.1186 +*** Implement Cased & Case_Ignorable properties 1.1187 +- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable() 1.1188 +- Problem: These properties should be disjoint, but aren't 1.1189 +- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not 1.1190 +- change ucase.icu to be able to store any combination of Cased and Case_Ignorable 1.1191 + 1.1192 +*** Implement Changes_When_Xyz properties 1.1193 +- without stored data 1.1194 + 1.1195 +*** Implement Name_Alias property 1.1196 +- add it as another name field in unames.icu 1.1197 +- make it available via u_charName() and UCharNameChoice and 1.1198 +- consider it in u_charFromName() 1.1199 + 1.1200 +*** Break iterators 1.1201 + 1.1202 +* Update break iterator rules to new UAX versions and new property values 1.1203 +* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary 1.1204 + 1.1205 +*** new BidiTest file 1.1206 +- review format and data 1.1207 +- copy BidiTest.txt to source/test/testdata 1.1208 +- write test code using this data 1.1209 +- fix ICU code where it fails the conformance test 1.1210 + 1.1211 
+*** Java 1.1212 +- generally, find and update code corresponding to C/C++ 1.1213 +- UCharacter.UnicodeBlock constants: 1.1214 + a) add an _ID integer per new block, update COUNT 1.1215 + b) add a class instance per new block 1.1216 + Visual Studio regex: 1.1217 + find UBLOCK_{[^ ]+} = [0-9]+, {/.+} 1.1218 + replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 1.1219 +- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias() 1.1220 + 1.1221 +- port test changes to Java 1.1222 + 1.1223 +*** LayoutEngine script information 1.1224 + 1.1225 +(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833) 1.1226 + 1.1227 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, 1.1228 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates 1.1229 +ScriptRunData.cpp, which is no longer needed.) 1.1230 + 1.1231 +The generated files have a current copyright date and "@draft" statement. 1.1232 + 1.1233 +-> Eric Mader wrote in email on 20090930: 1.1234 + "I think the tool has been modified to update @draft to @stable for 1.1235 + older scripts and to add @draft for new scripts. 1.1236 + (I worked with an intern on this last year.) 1.1237 + You should check the output after you run it." 1.1238 + 1.1239 +* copy the above files into <icu>/source/layout, replacing the old files. 1.1240 +* fix mixed line endings 1.1241 +* review the diffs and fix incorrect @draft and missing aliases 1.1242 +* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h 1.1243 + 1.1244 +Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp 1.1245 +and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) 
1.1246 + 1.1247 +-> Eric Mader wrote in email on 20090930: 1.1248 + "This is just a matter of making sure that all the per-script tables have 1.1249 + entries for any new scripts that were added. 1.1250 + If any new Indic characters were added, then the class tables in 1.1251 + IndicClassTables.cpp should be updated to reflect this. 1.1252 + John Emmons should know how to do this if it's required." 1.1253 + 1.1254 +* rebuild the layout and layoutex libraries. 1.1255 + 1.1256 +*** Documentation 1.1257 +- Update User Guide 1.1258 + + Jamo_Short_Name, sfc->scf, binary property value aliases 1.1259 + 1.1260 +---------------------------------------------------------------------------- *** 1.1261 + 1.1262 +Unicode 5.1 update 1.1263 + 1.1264 +*** related ICU Trac tickets 1.1265 + 1.1266 +5696 Update to Unicode 5.1 1.1267 + 1.1268 +*** Unicode version numbers 1.1269 +- makedata.mak 1.1270 +- uchar.h 1.1271 +- configure.in & configure 1.1272 +- update ucdVersion in gennames.c if an algorithmic range changes 1.1273 + 1.1274 +*** data files & enums & parser code 1.1275 + 1.1276 +* file preparation 1.1277 +- ucdstrip: 1.1278 + DerivedCoreProperties.txt 1.1279 + DerivedNormalizationProps.txt 1.1280 + NormalizationTest.txt 1.1281 + PropList.txt 1.1282 + Scripts.txt 1.1283 + GraphemeBreakProperty.txt 1.1284 + SentenceBreakProperty.txt 1.1285 + WordBreakProperty.txt 1.1286 +- ucdstrip and ucdmerge: 1.1287 + EastAsianWidth.txt 1.1288 + LineBreak.txt 1.1289 + 1.1290 +* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) 1.1291 +copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\ 1.1292 +copy 5.1.0\ucd\Blocks.txt ..\unidata\ 1.1293 +copy 5.1.0\ucd\CaseFolding.txt ..\unidata\ 1.1294 +copy 5.1.0\ucd\DerivedAge.txt ..\unidata\ 1.1295 +copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ 1.1296 +copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ 1.1297 +copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ 1.1298 +copy 
5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ 1.1299 +copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\ 1.1300 +copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\ 1.1301 +copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\ 1.1302 +copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\ 1.1303 +copy 5.1.0\ucd\UnicodeData.txt ..\unidata\ 1.1304 + 1.1305 +ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt 1.1306 +ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt 1.1307 +ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt 1.1308 +ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt 1.1309 +ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt 1.1310 +ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt 1.1311 +ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt 1.1312 +ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt 1.1313 +ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt 1.1314 +ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt 1.1315 + 1.1316 +* genpname 1.1317 +- run preparse.pl 1.1318 + + cd \svn\icuproj\icu\uni51\source\tools\genpname 1.1319 + + make sure that data.h is writable 1.1320 + + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt 1.1321 + + preparse.pl complains with errors like the following: 1.1322 + Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30. 1.1323 + This is because ICU 3.8 had scripts from ISO 15924 which are now 1.1324 + added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt 1.1325 + and PropertyValueAliases.txt. 
1.1326 + -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: 1.1327 + Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii 1.1328 + + PropertyValueAliases.txt now explicitly contains values for boolean properties: 1.1329 + N/Y, No/Yes, F/T, False/True 1.1330 + -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases. 1.1331 + It will use further values from the file if present. 1.1332 + 1.1333 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops 1.1334 +- new block & script values 1.1335 + + 17 new blocks 1.1336 + + 11 new script values already added in ICU 3.8 for ISO 15924 coverage 1.1337 + (removed from SyntheticPropertyValueAliases.txt) 1.1338 + + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1) 1.1339 + (added to SyntheticPropertyValueAliases.txt) 1.1340 +- uprops.icu (uprops.h) only provides 7 bits for script codes. 1.1341 + In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now. 1.1342 + There is none above 127 yet which is the script code for an 1.1343 + assigned Unicode character, so ICU 4.0 uprops.icu does not store any 1.1344 + script code values greater than 127. 1.1345 + However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129 1.1346 + in a parallel bit field, and that overflows now. 1.1347 + Also, future values >=128 would be incompatible anyway. 1.1348 + uprops.h is modified to move around several of the bit fields 1.1349 + in the properties vector words, and now uses 8 bits for the script code. 1.1350 + Two other bit fields also grow to accommodate future growth: 1.1351 + Block (current count: 172) grows from 8 to 9 bits, 1.1352 + and Word_Break grows from 4 to 5 bits. 
1.1353 +- renamed property Simple_Case_Folding (sfc->scf) 1.1354 + + nothing to be done: handled as normal alias 1.1355 +- new property JSN Jamo_Short_Name 1.1356 + + no new API: only contributes to the Name property 1.1357 +- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark 1.1358 +- new Joining Group (JG) value: Burushashki_Yeh_Barree 1.1359 +- new Sentence_Break (SB) values: 1.1360 + SB ; CR ; CR 1.1361 + SB ; EX ; Extend 1.1362 + SB ; LF ; LF 1.1363 + SB ; SC ; SContinue 1.1364 +- new Word_Break (WB) values: 1.1365 + WB ; CR ; CR 1.1366 + WB ; Extend ; Extend 1.1367 + WB ; LF ; LF 1.1368 + WB ; MB ; MidNumLet 1.1369 + 1.1370 +* Further changes in the 2008-02-29 update: 1.1371 +- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP 1.1372 + because they should not normally be invisible. 1.1373 +- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed) 1.1374 +- new Grapheme_Cluster_Break (GCB) value: PP=Prepend 1.1375 +- new Word_Break (WB) value: NL=Newline 1.1376 + 1.1377 +* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison) 1.1378 +- Unihan range end moves from 9FBB to 9FC3 1.1379 + search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive) 1.1380 + + do change gennames.c 1.1381 + 1.1382 +* build Unicode data source code for hardcoding core data 1.1383 +C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data 1.1384 + 1.1385 +ICU data make path is \svn\icuproj\icu\uni51\source\data\ 1.1386 +ICU root path is \svn\icuproj\icu\uni51 1.1387 +Information: cannot find "ucmlocal.mk". Not building user-additional converter files. 1.1388 +Information: cannot find "brklocal.mk". Not building user-additional break iterator files. 1.1389 +Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. 1.1390 +Information: cannot find "collocal.mk". 
Not building user-additional resource bundle files. 1.1391 +Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. 1.1392 +Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. 1.1393 +Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. 1.1394 +Creating data file for Unicode Character Properties 1.1395 +Creating data file for Unicode Case Mapping Properties 1.1396 +Creating data file for Unicode BiDi/Shaping Properties 1.1397 +Creating data file for Unicode Normalization 1.1398 +Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l" 1.1399 +Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp" 1.1400 + 1.1401 +- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common 1.1402 + and rebuild the common library 1.1403 + 1.1404 +*** Break iterators 1.1405 + 1.1406 +* Update break iterator rules to new UAX versions and new property values 1.1407 + 1.1408 +*** UCA 1.1409 + 1.1410 +* update FractionalUCA.txt and UCARules.txt with new canonical closure 1.1411 + 1.1412 +*** Test suites 1.1413 +- Test that APIs using Unicode property value aliases (like UnicodeSet) 1.1414 + support all of the boolean values N/Y, No/Yes, F/T, False/True 1.1415 + -> TestBinaryValues() tests in both cintltst and intltest 1.1416 + 1.1417 +*** LayoutEngine script information 1.1418 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, 1.1419 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates 1.1420 +ScriptRunData.cpp, which is no longer needed.) 1.1421 + 1.1422 +The generated files have a current copyright date and "@draft" statement. 1.1423 + 1.1424 +* copy the above files into <icu>/source/layout, replacing the old files.
1.1425 + 1.1426 +Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp 1.1427 +and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) 1.1428 + 1.1429 +* rebuild the layout and layoutex libraries. 1.1430 + 1.1431 +*** Documentation 1.1432 +- Update User Guide 1.1433 + + Jamo_Short_Name, sfc->scf, binary property value aliases 1.1434 + 1.1435 +---------------------------------------------------------------------------- *** 1.1436 + 1.1437 +Unicode 5.0 update 1.1438 + 1.1439 +*** related Jitterbugs 1.1440 + 1.1441 +5084 RFE: Update to Unicode 5.0 1.1442 + 1.1443 +*** data files & enums & parser code 1.1444 + 1.1445 +* file preparation 1.1446 +- ucdstrip: 1.1447 + DerivedCoreProperties.txt 1.1448 + DerivedNormalizationProps.txt 1.1449 + NormalizationTest.txt 1.1450 + PropList.txt 1.1451 + Scripts.txt 1.1452 + GraphemeBreakProperty.txt 1.1453 + SentenceBreakProperty.txt 1.1454 + WordBreakProperty.txt 1.1455 +- ucdstrip and ucdmerge: 1.1456 + EastAsianWidth.txt 1.1457 + LineBreak.txt 1.1458 + 1.1459 +* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) 1.1460 +copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\ 1.1461 +copy 5.0.0\ucd\Blocks.txt ..\unidata\ 1.1462 +copy 5.0.0\ucd\CaseFolding.txt ..\unidata\ 1.1463 +copy 5.0.0\ucd\DerivedAge.txt ..\unidata\ 1.1464 +copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ 1.1465 +copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ 1.1466 +copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ 1.1467 +copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ 1.1468 +copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\ 1.1469 +copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\ 1.1470 +copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\ 1.1471 +copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\ 1.1472 +copy 5.0.0\ucd\UnicodeData.txt ..\unidata\ 1.1473 + 1.1474 +ucdstrip < 
5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt 1.1475 +ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt 1.1476 +ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt 1.1477 +ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt 1.1478 +ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt 1.1479 +ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt 1.1480 +ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt 1.1481 +ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt 1.1482 +ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt 1.1483 +ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt 1.1484 + 1.1485 +* update FractionalUCA.txt and UCARules.txt with new canonical closure 1.1486 + 1.1487 +* genpname 1.1488 +- run preparse.pl 1.1489 + + make sure that data.h is writable 1.1490 + + perl preparse.pl \cvs\oss\icu > out.txt 1.1491 + 1.1492 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops 1.1493 +- new block & script values 1.1494 + + script values already added in ICU 3.6 because all of ISO 15924 is now covered 1.1495 + 1.1496 +* build Unicode data source code for hardcoding core data 1.1497 +C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data 1.1498 + 1.1499 +ICU data make path is \cvs\oss\icu\source\data\ 1.1500 +ICU root path is \cvs\oss\icu 1.1501 +Information: cannot find "ucmlocal.mk". Not building user-additional converter files. 1.1502 +[etc.] 
1.1503 +Creating data file for Unicode Character Properties 1.1504 +Creating data file for Unicode Case Mapping Properties 1.1505 +Creating data file for Unicode BiDi/Shaping Properties 1.1506 +Creating data file for Unicode Normalization 1.1507 +Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l" 1.1508 +Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp" 1.1509 + 1.1510 +- copy the .c source files to C:\cvs\oss\icu\source\common 1.1511 + and rebuild the common library 1.1512 + 1.1513 +*** Unicode version numbers 1.1514 +- makedata.mak 1.1515 +- uchar.h 1.1516 +- configure.in 1.1517 + 1.1518 +*** LayoutEngine script information 1.1519 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, 1.1520 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates 1.1521 +ScriptRunData.cpp, which is no longer needed.) 1.1522 + 1.1523 +The generated files have a current copyright date and "@draft" statement. 1.1524 + 1.1525 +* copy the above files into <icu>/source/layout, replacing the old files. 1.1526 + 1.1527 +Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp 1.1528 +and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) 1.1529 + 1.1530 +* rebuild the layout and layoutex libraries.
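The ucd2unidata.bat lines in this update differ from those of other versions only in the UCD version folder, which is why the notes say the batch file "needs to be updated each time". A minimal sketch, assuming a hypothetical helper named make_ucd2unidata that is not part of ICU, of generating those lines from a version string. The file lists are copied from the Unicode 5.0 notes above; ucdstrip and ucdmerge are used only as stdin/stdout filters, exactly as in the batch file.

```python
# Hypothetical generator for the ucd2unidata.bat lines listed in these notes.
# Paths are relative to the <version>\ucd folder; the helper name is an assumption.

COPY = [
    "BidiMirroring.txt", "Blocks.txt", "CaseFolding.txt", "DerivedAge.txt",
    r"extracted\DerivedBidiClass.txt", r"extracted\DerivedJoiningGroup.txt",
    r"extracted\DerivedJoiningType.txt", r"extracted\DerivedNumericValues.txt",
    "NormalizationCorrections.txt", "PropertyAliases.txt",
    "PropertyValueAliases.txt", "SpecialCasing.txt", "UnicodeData.txt",
]
STRIP = [
    "DerivedCoreProperties.txt", "DerivedNormalizationProps.txt",
    "NormalizationTest.txt", "PropList.txt", "Scripts.txt",
    r"auxiliary\GraphemeBreakProperty.txt",
    r"auxiliary\SentenceBreakProperty.txt",
    r"auxiliary\WordBreakProperty.txt",
]
STRIP_MERGE = ["EastAsianWidth.txt", "LineBreak.txt"]


def make_ucd2unidata(version, dest="..\\unidata\\"):
    """Return the batch-file lines for one UCD version, e.g. "5.0.0"."""
    src = version + "\\ucd\\"
    lines = ["copy %s%s %s" % (src, rel, dest) for rel in COPY]
    for rel in STRIP:
        name = rel.split("\\")[-1]            # output goes flat into the unidata folder
        lines.append("ucdstrip < %s%s > %s%s" % (src, rel, dest, name))
    for rel in STRIP_MERGE:
        lines.append("ucdstrip < %s%s | ucdmerge > %s%s" % (src, rel, dest, rel))
    return lines
```

Calling make_ucd2unidata("5.1.0") would give the corresponding lines for the Unicode 5.1 update; the file lists still need a manual check whenever the UCD adds or moves files.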
1.1531 + 1.1532 +---------------------------------------------------------------------------- *** 1.1533 + 1.1534 +Unicode 4.1 update 1.1535 + 1.1536 +*** related Jitterbugs 1.1537 + 1.1538 +4332 RFE: Update to Unicode 4.1 1.1539 +4157 RBBI, TR29 4.1 updates 1.1540 + 1.1541 +*** data files & enums & parser code 1.1542 + 1.1543 +* file preparation 1.1544 +- ucdstrip: 1.1545 + DerivedCoreProperties.txt 1.1546 + DerivedNormalizationProps.txt 1.1547 + NormalizationTest.txt 1.1548 + GraphemeBreakProperty.txt 1.1549 + SentenceBreakProperty.txt 1.1550 + WordBreakProperty.txt 1.1551 +- ucdstrip and ucdmerge: 1.1552 + EastAsianWidth.txt 1.1553 + LineBreak.txt 1.1554 + 1.1555 +* add new files to the repository 1.1556 + GraphemeBreakProperty.txt 1.1557 + SentenceBreakProperty.txt 1.1558 + WordBreakProperty.txt 1.1559 + 1.1560 +* update FractionalUCA.txt and UCARules.txt with new canonical closure 1.1561 + 1.1562 +* genpname 1.1563 +- handle new enumerated properties in sub read_uchar 1.1564 +- run preparse.pl 1.1565 + 1.1566 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops 1.1567 +- new binary properties 1.1568 + + Pattern_Syntax 1.1569 + + Pattern_White_Space 1.1570 +- new enumerated properties 1.1571 + + Grapheme_Cluster_Break 1.1572 + + Sentence_Break 1.1573 + + Word_Break 1.1574 +- new block & script & line break values 1.1575 + 1.1576 +* gencase 1.1577 +- case-ignorable changes 1.1578 + see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods 1.1579 + now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk 1.1580 + 1.1581 +*** Unicode version numbers 1.1582 +- makedata.mak 1.1583 +- uchar.h 1.1584 +- configure.in 1.1585 + 1.1586 +*** tests 1.1587 +- verify that u_charMirror() round-trips 1.1588 +- test all new properties and some new values of old properties 1.1589 + 1.1590 +*** other code 1.1591 + 1.1592 +* hardcoded Unihan range end/limit 1.1593 +- Unihan range end moves from 9FA5 to 9FBB 1.1594 + search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], 
case-insensitive) 1.1595 + + do not modify BOCU/BOCSU code because that would change the encoding 1.1596 + and break binary compatibility! 1.1597 + + similarly, do not change the GB 18030 range data (ucnvmbcs.c), 1.1598 + NamePrepProfile.txt 1.1599 + + ignore trietest.c: test data is arbitrary 1.1600 + + ignore tstnorm.cpp: test optimization, not important 1.1601 + + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF 1.1602 + + do change line_th.txt and word_th.txt 1.1603 + by replacing hardcoded ranges with the new property values 1.1604 + + do change gennames.c 1.1605 + 1.1606 +source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 1.1607 +source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 1.1608 +source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5, 1.1609 + 1.1610 +* case mappings 1.1611 +- compare new special casing context conditions with previous ones 1.1612 + see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods 1.1613 + 1.1614 +* genpname 1.1615 +- consider storing only the short name if it is the same as the long name 1.1616 + 1.1617 +*** other reviews 1.1618 +- UAX #29 changes (grapheme/word/sentence breaks) 1.1619 +- UAX #14 changes (line breaks) 1.1620 +- Pattern_Syntax & Pattern_White_Space 1.1621 + 1.1622 +---------------------------------------------------------------------------- *** 1.1623 + 1.1624 +Unicode 4.0.1 update 1.1625 + 1.1626 +*** related Jitterbugs 1.1627 + 1.1628 +3170 RFE: Update to Unicode 4.0.1 1.1629 +3171 Add new Unicode 4.0.1 properties 1.1630 +3520 use Unicode 4.0.1 updates for break iteration 1.1631 + 1.1632 +*** data files & enums & parser code 1.1633 + 1.1634 +* file preparation 1.1635 +- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt 1.1636 +- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt 1.1637 + 1.1638 +* file fixes 
1.1639 +- fix UnicodeData.txt general categories of Ethiopic digits Nd->No 1.1640 + according to PRI #26 1.1641 + http://www.unicode.org/review/resolved-pri.html#pri26 1.1642 +- undone again because no corrigendum in sight; 1.1643 + instead modified tests to not check consistency on this for Unicode 4.0.1 1.1644 + 1.1645 +* ucdterms.txt 1.1646 +- update from http://www.unicode.org/copyright.html 1.1647 + formatted for plain text 1.1648 + 1.1649 +* uchar.h & uprops.h & uprops.c & genprops 1.1650 +- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed 1.1651 +- add U_LB_INSEPARABLE due to a spelling fix 1.1652 + + put short name comment only on line with new constant 1.1653 + for genpname perl script parser 1.1654 +- new binary properties 1.1655 + + STerm 1.1656 + + Variation_Selector 1.1657 + 1.1658 +* genpname 1.1659 +- fix genpname perl script so that it doesn't choke on more than 2 names per property value 1.1660 +- perl script: correctly calculate the maximum number of fields per row 1.1661 + 1.1662 +* uscript.h 1.1663 +- new script code Hrkt=Katakana_Or_Hiragana 1.1664 + 1.1665 +* gennorm.c track changes in DerivedNormalizationProps.txt 1.1666 +- "FNC" -> "FC_NFKC" 1.1667 +- single field "NFD_NO" -> two fields "NFD_QC; N" etc. 
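A minimal sketch of the field change gennorm.c has to track, written in Python for brevity rather than as the actual gennorm.c code. Only the two mappings named above ("FNC" -> "FC_NFKC" and "NFD_NO" -> "NFD_QC; N") come from these notes; treating "*_MAYBE" as "*_QC; M" is an assumption made by analogy with the "etc.", and the function names are hypothetical.

```python
# Accept both the old single-field and the new two-field notation of
# DerivedNormalizationProps.txt and return one uniform (name, value) pair.

RENAMED = {"FNC": "FC_NFKC"}                 # rename stated in the notes
OLD_SUFFIXES = {"_NO": "N", "_MAYBE": "M"}   # "_MAYBE" -> "M" is an assumption


def normalize_fields(fields):
    """fields: the semicolon-separated fields after the code point range."""
    fields = [f.strip() for f in fields]
    name = RENAMED.get(fields[0], fields[0])
    if len(fields) > 1:
        return name, fields[1]               # new style, or a mapping such as FC_NFKC
    for suffix, value in OLD_SUFFIXES.items():
        if name.endswith(suffix):
            return name[:-len(suffix)] + "_QC", value   # old single-field style
    return name, None                        # plain binary property


def parse_line(line):
    """Return (range, (name, value)) or None for comment and empty lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    parts = data.split(";")
    if len(parts) < 2:
        return None
    return parts[0].strip(), normalize_fields(parts[1:])
```

With this shape, the old and new files produce the same pairs, so the downstream code only needs to handle the new property names.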
1.1668 + 1.1669 +* genprops/props2.c track changes in DerivedNumericValues.txt 1.1670 +- changed from 3 columns to 2, dropping the numeric type 1.1671 + + assume that the type is always numeric for Han characters, 1.1672 + and that only those are added in addition to what UnicodeData.txt lists 1.1673 + 1.1674 +*** Unicode version numbers 1.1675 +- makedata.mak 1.1676 +- uchar.h 1.1677 +- configure.in 1.1678 + 1.1679 +*** tests 1.1680 +- update test of default bidi classes according to PRI #28 1.1681 + /tsutil/cucdtst/TestUnicodeData 1.1682 + http://www.unicode.org/review/resolved-pri.html#pri28 1.1683 +- bidi tests: change exemplar character for ES depending on Unicode version 1.1684 +- change hardcoded expected property values where they change 1.1685 + 1.1686 +*** other code 1.1687 + 1.1688 +* name matching 1.1689 +- read UCD.html 1.1690 + 1.1691 +* scripts 1.1692 +- use new Hrkt=Katakana_Or_Hiragana 1.1693 + 1.1694 +* ZWJ & ZWNJ 1.1695 +- are now part of combining character sequences 1.1696 +- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
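For the DerivedNumericValues.txt change noted under genprops/props2.c above (3 columns reduced to 2, numeric type dropped), a minimal sketch of a reader that tolerates both layouts. The column order "code point ; value ; type" and the name parse_numeric_value are assumptions for illustration; the default type "numeric" follows the assumption stated in the notes, which is expected to hold only for the Han characters that the file adds beyond UnicodeData.txt.

```python
# Hypothetical reader for DerivedNumericValues.txt lines in either layout.
# Values stay strings because the file may contain fractions or negatives.

def parse_numeric_value(line, default_type="numeric"):
    """Return (code, value, numeric_type) or None for comment/empty lines."""
    data = line.split("#", 1)[0].strip()     # drop trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    code, value = fields[0], fields[1]
    if len(fields) > 2 and fields[2]:
        numeric_type = fields[2]             # old 3-column layout carries the type
    else:
        numeric_type = default_type          # new 2-column layout: assume numeric
    return code, value, numeric_type
```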