intl/icu/source/data/unidata/changes.txt

changeset 0
6474c204b198
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/intl/icu/source/data/unidata/changes.txt	Wed Dec 31 06:09:35 2014 +0100
     1.3 @@ -0,0 +1,1693 @@
     1.4 +* Copyright (C) 2004-2013, International Business Machines
     1.5 +* Corporation and others.  All Rights Reserved.
     1.6 +*
     1.7 +*   file name:  changes.txt
     1.8 +*   encoding:   US-ASCII
     1.9 +*   tab size:   8 (not used)
    1.10 +*   indentation:4
    1.11 +*
    1.12 +*   created on: 2004may06
    1.13 +*   created by: Markus W. Scherer
    1.14 +*
    1.15 +* change log for Unicode updates
    1.16 +
    1.17 +---------------------------------------------------------------------------- ***
    1.18 +
    1.19 +Unicode 6.3 update
    1.20 +
    1.21 +http://www.unicode.org/review/pri249/  -- beta review
    1.22 +http://www.unicode.org/reports/uax-proposed-updates.html
    1.23 +http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
    1.24 +http://www.unicode.org/reports/tr44/tr44-11.html
    1.25 +
    1.26 +*** ICU Trac
    1.27 +
    1.28 +- ticket 10128: update ICU to Unicode 6.3 beta
    1.29 +- ticket 10168: update ICU to Unicode 6.3 final
    1.30 +- C++ branches/markus/uni63 at r33552 from trunk at r33551
    1.31 +- Java branches/markus/uni63 at r33550 from trunk at r33553
    1.32 +
    1.33 +- ticket 10142: implement Unicode 6.3 bidi algorithm additions
    1.34 +
    1.35 +*** Unicode version numbers
    1.36 +- makedata.mak
    1.37 +- uchar.h
    1.38 +  (configure.in & configure: have been modified to extract the version from uchar.h)
    1.39 +- com.ibm.icu.util.VersionInfo
    1.40 +- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
    1.41 +
    1.42 +- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    1.43 +  so that the makefiles see the new version number.
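
The note above says that configure takes the Unicode version from uchar.h. As a rough illustration only (this is not the actual configure logic), the following sketch pulls a quoted version string out of header text. It assumes the header defines the version as a string literal in a macro named U_UNICODE_VERSION; the sample header text and the function name are made up for the example.

```python
import re

# Sample text standing in for the relevant part of uchar.h (assumed layout).
SAMPLE_UCHAR_H = '''
/* Unicode version number (sample text, not the real header) */
#define U_UNICODE_VERSION "6.3"
'''


def extract_unicode_version(header_text):
    """Return the quoted string of a U_UNICODE_VERSION #define, or None."""
    match = re.search(
        r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([^"]+)"',
        header_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_unicode_version(SAMPLE_UCHAR_H))
```

Running such a check after editing uchar.h and before running configure is one way to confirm that the new number is what the build will see.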
    1.44 +
    1.45 +*** data files & enums & parser code
    1.46 +
    1.47 +* file preparation
    1.48 +
    1.49 +- download UCD, UCA & IDNA files
    1.50 +- make sure that the Unicode data folder passed into preparseucd.py
    1.51 +  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
    1.52 +- modify preparseucd.py:
    1.53 +  parse new file BidiBrackets.txt
    1.54 +  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
    1.55 +- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
    1.56 +- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
    1.57 +- Check test file diffs for previously commented-out, known-failing data lines;
    1.58 +  probably need to keep those commented out.
    1.59 +
    1.60 +* PropertyAliases.txt changes
    1.61 +- 1 new Enumerated Property
    1.62 +  bpt                      ; Bidi_Paired_Bracket_Type
    1.63 +  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
    1.64 +  -> ubidi_props.h & .c & UBiDiProps.java
    1.65 +  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
    1.66 +  -> uprops.cpp
    1.67 +  -> change ubidi.icu format version from 2.0 to 2.1
    1.68 +- 1 new Miscellaneous Property
    1.69 +  bpb                      ; Bidi_Paired_Bracket
    1.70 +  -> uchar.h & UProperty.java
    1.71 +  -> ppucd.h & .cpp
    1.72 +
    1.73 +* PropertyValueAliases.txt changes
    1.74 +- 3 Bidi_Paired_Bracket_Type (bpt) values:
    1.75 +  bpt; c                                ; Close
    1.76 +  bpt; n                                ; None
    1.77 +  bpt; o                                ; Open
    1.78 +  -> uchar.h & UCharacter.BidiPairedBracketType
    1.79 +  -> ubidi_props.h & .c & UBiDiProps.java
    1.80 +  -> change ubidi.icu format version from 2.0 to 2.1
    1.81 +- 4 new Bidi_Class (bc) values:
    1.82 +  bc ; FSI                              ; First_Strong_Isolate
    1.83 +  bc ; LRI                              ; Left_To_Right_Isolate
    1.84 +  bc ; RLI                              ; Right_To_Left_Isolate
    1.85 +  bc ; PDI                              ; Pop_Directional_Isolate
    1.86 +  -> uchar.h & UCharacterEnums.ECharacterDirection
    1.87 +  -> until the bidi code gets updated,
    1.88 +     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
    1.89 +- 3 new Word_Break (WB) values:
    1.90 +  WB ; HL                               ; Hebrew_Letter
    1.91 +  WB ; SQ                               ; Single_Quote
    1.92 +  WB ; DQ                               ; Double_Quote
    1.93 +  -> uchar.h & UCharacter.WordBreak
    1.94 +  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
    1.95 +- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
    1.96 +  (added 2012-10-16)
    1.97 +  Aghb  239     Caucasian Albanian
    1.98 +  Mahj  314     Mahajani
    1.99 +  -> uscript.h
   1.100 +  -> com.ibm.icu.lang.UScript
   1.101 +    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   1.102 +    replace  public static final int \1 = \2;\3
   1.103 +  -> preparseucd.py _scripts_only_in_iso15924
   1.104 +  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   1.105 +      and in com.ibm.icu.dev.test.lang.TestUScript.java
   1.106 +  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   1.107 +     (not strictly necessary for NOT_ENCODED scripts)
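
The Word_Break note above (17 values no longer fit in 4 bits) comes down to simple arithmetic. A minimal sketch, assuming the values are numbered consecutively from 0; the helper name is hypothetical and not part of ICU:

```python
def bits_needed(value_count):
    """Smallest number of bits that can hold the integers 0 .. value_count-1."""
    if value_count < 1:
        raise ValueError("need at least one value")
    return max(1, (value_count - 1).bit_length())


if __name__ == "__main__":
    # 14 values before the 3 additions, 17 after.
    for count in (14, 16, 17):
        print(count, bits_needed(count))
```

With 16 values the highest constant is 15, which still fits in 4 bits; the 17th value has the constant 16, which needs a fifth bit.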
   1.108 +
   1.109 +* generate normalization data files
   1.110 +- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
   1.111 +- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
   1.112 +- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
   1.113 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
   1.114 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
   1.115 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   1.116 +- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
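
The four gennorm2 invocations above differ only in the output file and the list of input files. A small sketch that builds the same argument lists from a table, using only the -o and -s options shown above; the function name and the table are illustrative, and nothing is executed:

```python
# Output file -> input files, exactly as listed in the commands above.
NORM_TARGETS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata):
    """Return one argument list per normalization data file."""
    commands = []
    for output, inputs in NORM_TARGETS:
        commands.append(
            ["bin/gennorm2", "-o", f"{src_data_in}/{output}",
             "-s", f"{unidata}/norm2", *inputs]
        )
    return commands


if __name__ == "__main__":
    for cmd in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
        print(" ".join(cmd))
```

Each later target includes nfc.txt as its first input, which the table makes easy to see.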
   1.117 +
   1.118 +* build ICU (make install)
   1.119 +  so that the tools build can pick up the new definitions from the installed header files.
   1.120 +
   1.121 +~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
   1.122 +
   1.123 +* build Unicode tools using CMake+make
   1.124 +
   1.125 +~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
   1.126 +
   1.127 +# Location (--prefix) of where ICU was installed.
   1.128 +set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
   1.129 +# Location of the ICU source tree.
   1.130 +set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
   1.131 +
   1.132 +~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
   1.133 +~/svn.icutools/trunk/dbg/unicode/c$ make
   1.134 +
   1.135 +* generate core properties data files
   1.136 +- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
   1.137 +- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
   1.138 +- rebuild ICU (make install) & tools
   1.139 +- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
   1.140 +- rebuild ICU (make install) & tools
   1.141 +
   1.142 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   1.143 +  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   1.144 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   1.145 +- Unicode 6.0..6.3: U+2260, U+226E, U+226F
   1.146 +- nothing new in 6.3, no test file to update
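
The grep step above can also be written as a small script. This is a sketch under assumptions: it expects lines of the form "code point or range ; status ; ... # comment", and the sample lines below are made up in that general shape rather than copied from IdnaMappingTable.txt.

```python
# Made-up sample lines in the general shape of IdnaMappingTable.txt.
SAMPLE_LINES = [
    "# sample data, not the real file",
    "003D          ; disallowed_STD3_valid   # sample ASCII entry",
    "00E0..00F6    ; valid                   # sample range",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]


def non_ascii_std3_valid(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found


if __name__ == "__main__":
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(SAMPLE_LINES)])
```

Comparing the result against the known list (U+2260, U+226E, U+226F) shows whether a new Unicode version added characters that need test updates.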
   1.147 +
   1.148 +* update Java data files
   1.149 +- refresh just the UCD-related files, just to be safe
   1.150 +- see (ICU4C)/source/data/icu4j-readme.txt
   1.151 +- mkdir /tmp/icu4j
   1.152 +- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.153 +  output:
   1.154 +    ...
   1.155 +    Unicode .icu files built to ./out/build/icudt52l
   1.156 +    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
   1.157 +    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
   1.158 +    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   1.159 +    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
   1.160 +    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
   1.161 +    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
   1.162 +    mkdir -p /tmp/icu4j/main/shared/data
   1.163 +    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   1.164 +    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
   1.165 +    mkdir -p /tmp/icu4j/main/shared/data
   1.166 +    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   1.167 +    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
   1.168 +- copy the big-endian Unicode data files to another location,
   1.169 +  separate from the other data files
   1.170 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   1.171 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
   1.172 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
   1.173 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
   1.174 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
   1.175 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   1.176 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
   1.177 +- refresh ICU4J
   1.178 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
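
The cp and rm commands above select a subset of the big-endian data files. The selection rule can be sketched as a predicate over paths relative to the icudt52b folder. The function name and the sample file names are illustrative assumptions, not an inventory of the real folder.

```python
def is_unicode_data_file(rel_path):
    """Mirror the cp/rm steps above for a path relative to icudt52b."""
    if rel_path == "cnvalias.icu":          # copied by *.icu, then removed
        return False
    parts = rel_path.split("/")
    if len(parts) == 1:                      # top level: *.icu and *.nrm
        return parts[0].endswith(".icu") or parts[0].endswith(".nrm")
    if len(parts) == 2 and parts[0] == "coll":
        return parts[1].endswith(".icu")     # coll/*.icu only
    if len(parts) == 2 and parts[0] == "brkitr":
        return True                          # brkitr/*
    return False


# Sample names for illustration only.
SAMPLE_FILES = [
    "uprops.icu", "cnvalias.icu", "nfc.nrm", "root.res",
    "coll/ucadata.icu", "coll/root.res", "brkitr/word.brk",
]

if __name__ == "__main__":
    print([name for name in SAMPLE_FILES if is_unicode_data_file(name)])
```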
   1.179 +
   1.180 +* refresh Java test .txt files
   1.181 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   1.182 +
   1.183 +* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
   1.184 +
   1.185 +- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
   1.186 +- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
   1.187 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   1.188 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   1.189 +  (note removing the underscore before "Rules")
   1.190 +- update (ICU4C)/source/test/testdata/CollationTest_*.txt
   1.191 +  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   1.192 +  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
   1.193 +- check test file diffs for previously commented-out, known-failing data lines;
   1.194 +  probably need to keep those commented out
   1.195 +- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
   1.196 +- run genuca, see command line above
   1.197 +- rebuild ICU4C
   1.198 +- refresh ICU4J collation data:
   1.199 +  (subset of instructions above for properties data refresh, except copies all coll/*)
   1.200 +    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.201 +    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   1.202 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   1.203 +    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
   1.204 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
   1.205 +- note on intltest: if collate/UCAConformanceTest fails, then
   1.206 +  utility/MultithreadTest/TestCollators will fail as well;
   1.207 +  fix the conformance test before looking into the multi-thread test
   1.208 +
   1.209 +* test ICU, fix test code where necessary
   1.210 +
   1.211 +* When refreshing all of ICU4J data from ICU4C
   1.212 +- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.213 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   1.214 +or
   1.215 +- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   1.216 +
   1.217 +*** LayoutEngine script information
   1.218 +- skipped for Unicode 6.3: no new scripts
   1.219 +
   1.220 +*** merge the Unicode update branches back onto the trunk
   1.221 +- do not merge the icudata.jar and testdata.jar,
   1.222 +  instead rebuild them from merged & tested ICU4C
   1.223 +
   1.224 +---------------------------------------------------------------------------- ***
   1.225 +
   1.226 +Unicode 6.2 update
   1.227 +
   1.228 +http://www.unicode.org/review/pri230/
   1.229 +http://www.unicode.org/versions/beta-6.2.0.html
   1.230 +http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
   1.231 +http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
   1.232 +http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
   1.233 +http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
   1.234 +http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
   1.235 +http://unicode.org/Public/idna/6.2.0/
   1.236 +
   1.237 +*** ICU Trac
   1.238 +
   1.239 +- ticket 9515: Unicode 6.2: final ICU update
   1.240 +
   1.241 +- ticket 9514: UCA 6.2: fix UCARules.txt
   1.242 +
   1.243 +- ticket 9437: update ICU to Unicode 6.2
   1.244 +- C++ branches/markus/uni62 at r32050 from trunk at r32041
   1.245 +- Java branches/markus/uni62 at r32068 from trunk at r32066
   1.246 +
   1.247 +*** Unicode version numbers
   1.248 +- makedata.mak
   1.249 +- uchar.h
   1.250 +  (configure.in & configure: have been modified to extract the version from uchar.h)
   1.251 +- com.ibm.icu.util.VersionInfo
   1.252 +- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   1.253 +
   1.254 +*** data files & enums & parser code
   1.255 +
   1.256 +* file preparation
   1.257 +
   1.258 +- download UCD, UCA & IDNA files
   1.259 +- make sure that the Unicode data folder passed into preparseucd.py
   1.260 +  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
   1.261 +- modify preparseucd.py: NamesList.txt is now in UTF-8
   1.262 +- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
   1.263 +- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   1.264 +- Check test file diffs for previously commented-out, known-failing data lines;
   1.265 +  probably need to keep those commented out.
   1.266 +
   1.267 +* PropertyValueAliases.txt changes
   1.268 +- 1 new Line_Break (lb) value:
   1.269 +  lb ; RI                               ; Regional_Indicator
   1.270 +  -> uchar.h & UCharacter.LineBreak
   1.271 +- 1 new Word_Break (WB) value:
   1.272 +  WB ; RI                               ; Regional_Indicator
   1.273 +  -> uchar.h & UCharacter.WordBreak
   1.274 +- 1 new Grapheme_Cluster_Break (GCB) value:
   1.275 +  GCB; RI                               ; Regional_Indicator
   1.276 +  -> uchar.h & UCharacter.GraphemeClusterBreak
   1.277 +
   1.278 +* 3 new numeric values
   1.279 +  The new value -1 was really meant to be NaN, but that would have required new
   1.280 +  UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
   1.281 +  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
   1.282 +    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
   1.283 +    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
   1.284 +  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
   1.285 +    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
   1.286 +    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
   1.287 +  -> uprops.h, uchar.c & UCharacterProperty.java
   1.288 +  -> cucdtst.c & UCharacterTest.java
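
The new numeric values above have a simple structure: -1 is the fraction -1/1, and 216000 and 432000 are small multiples of a power of 60. The sketch below only demonstrates that arithmetic. It does not reproduce the encoding ICU actually uses in uprops.h, and the helper name is hypothetical.

```python
from fractions import Fraction


def split_base60(value):
    """Return (mantissa, exponent) such that value == mantissa * 60**exponent
    and the mantissa is not divisible by 60."""
    if value <= 0:
        raise ValueError("expected a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent


if __name__ == "__main__":
    minus_one = Fraction(-1, 1)
    print(minus_one.numerator, minus_one.denominator)
    for nv in (216000, 432000):
        print(nv, split_base60(nv))
```

A value that splits into a small mantissa and a small exponent like this can be stored in far fewer bits than the plain integer would need.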
   1.289 +
   1.290 +* generate normalization data files
   1.291 +- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
   1.292 +- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
   1.293 +- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
   1.294 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
   1.295 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
   1.296 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   1.297 +- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
   1.298 +
   1.299 +* build ICU (make install)
   1.300 +  so that the tools build can pick up the new definitions from the installed header files.
   1.301 +* build Unicode tools using CMake+make
   1.302 +
   1.303 +* generate core properties data files
   1.304 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
   1.305 +- in initial bootstrapping, change the UCA version
   1.306 +  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
   1.307 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
   1.308 +- rebuild ICU (make install) & tools
   1.309 +  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
   1.310 +    check if the UCA version in FractionalUCA.txt matches the new Unicode version
   1.311 +    (see step above)
   1.312 +- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
   1.313 +- rebuild ICU (make install) & tools
   1.314 +
   1.315 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   1.316 +  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   1.317 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   1.318 +- Unicode 6.0..6.2: U+2260, U+226E, U+226F
   1.319 +- nothing new in 6.2, no test file to update
   1.320 +
   1.321 +* update Java data files
   1.322 +- refresh just the UCD-related files, just to be safe
   1.323 +- see (ICU4C)/source/data/icu4j-readme.txt
   1.324 +- mkdir /tmp/icu4j
   1.325 +- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.326 +  output:
   1.327 +    ...
   1.328 +    Unicode .icu files built to ./out/build/icudt50l
   1.329 +    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
   1.330 +    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
   1.331 +    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   1.332 +    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
   1.333 +    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
   1.334 +    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
   1.335 +    mkdir -p /tmp/icu4j/main/shared/data
   1.336 +    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   1.337 +    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
   1.338 +    mkdir -p /tmp/icu4j/main/shared/data
   1.339 +    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   1.340 +    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
   1.341 +- copy the big-endian Unicode data files to another location,
   1.342 +  separate from the other data files
   1.343 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   1.344 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
   1.345 +    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
   1.346 +    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
   1.347 +    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
   1.348 +    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   1.349 +    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
   1.350 +- refresh ICU4J
   1.351 +    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
   1.352 +
   1.353 +* refresh Java test .txt files
   1.354 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   1.355 +
   1.356 +* UCA
   1.357 +
   1.358 +- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
   1.359 +- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
   1.360 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   1.361 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   1.362 +  (note removing the underscore before "Rules")
   1.363 +- update (ICU4C)/source/test/testdata/CollationTest_*.txt
   1.364 +  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   1.365 +  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
   1.366 +- check test file diffs for previously commented-out, known-failing data lines;
   1.367 +  probably need to keep those commented out
   1.368 +- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
   1.369 +- run genuca, see command line above
   1.370 +- rebuild ICU4C
   1.371 +- refresh ICU4J collation data:
   1.372 +  (subset of instructions above for properties data refresh, except copies all coll/*)
   1.373 +    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.374 +    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   1.375 +    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   1.376 +    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
   1.377 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
   1.378 +- note on intltest: if collate/UCAConformanceTest fails, then
   1.379 +  utility/MultithreadTest/TestCollators will fail as well;
   1.380 +  fix the conformance test before looking into the multi-thread test
   1.381 +
   1.382 +* test ICU, fix test code where necessary
   1.383 +
   1.384 +* When refreshing all of ICU4J data from ICU4C
   1.385 +- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.386 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   1.387 +or
   1.388 +- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   1.389 +
   1.390 +*** LayoutEngine script information
   1.391 +- skipped for Unicode 6.2: no new scripts
   1.392 +
   1.393 +*** merge the Unicode update branches back onto the trunk
   1.394 +- do not merge the icudata.jar and testdata.jar,
   1.395 +  instead rebuild them from merged & tested ICU4C
   1.396 +
   1.397 +---------------------------------------------------------------------------- ***
   1.398 +
   1.399 +Future Unicode update
   1.400 +
   1.401 +Tools simplified since the Unicode 6.1 update. See
   1.402 +- http://site.icu-project.org/design/props/ppucd
   1.403 +- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
   1.404 +
   1.405 +* Unicode version numbers
   1.406 +- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
   1.407 +
   1.408 +* file preparation
   1.409 +- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
   1.410 +- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
   1.411 +- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   1.412 +- Check test file diffs for previously commented-out, known-failing data lines;
   1.413 +  probably need to keep those commented out.
   1.414 +
   1.415 +* PropertyValueAliases.txt changes
   1.416 +- Script codes that are in ISO 15924 but not in Unicode are now listed in
   1.417 +  preparseucd.py, in the _scripts_only_in_iso15924 variable.
   1.418 +  If there are new ISO codes, then add them.
   1.419 +  If Unicode adds some of them, then remove them from the .py variable.
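
The maintenance rule above for the ISO-only script list amounts to a set union followed by a set difference. A minimal sketch, assuming the list can be treated as a set of four-letter codes; the function name and the sample inputs are illustrative, and the real variable in preparseucd.py may be structured differently.

```python
def update_iso_only(iso_only, new_iso_codes, unicode_scripts):
    """Add newly registered ISO 15924 codes, then drop those Unicode now has."""
    return (set(iso_only) | set(new_iso_codes)) - set(unicode_scripts)


if __name__ == "__main__":
    current = {"Khoj", "Tirh", "Hluw"}   # sample contents
    new_iso = {"Aghb", "Mahj"}           # codes newly registered by ISO
    in_unicode = {"Cakm", "Takr"}        # sample: scripts Unicode already has
    print(sorted(update_iso_only(current, new_iso, in_unicode)))
```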
   1.420 +
   1.421 +* UnicodeData.txt changes
   1.422 +- No more manual changes for CJK ranges for algorithmic names;
   1.423 +  those are now written to ppucd.txt and genprops reads them from there.
   1.424 +
   1.425 +* generate core properties data files (makeprops.sh was deleted)
   1.426 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
   1.427 +
   1.428 +* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
   1.429 +- it is now generated by preparseucd.py
   1.430 +
   1.431 +* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
   1.432 +- it is now generated by preparseucd.py
   1.433 +- make sure that the Unicode data folder passed into preparseucd.py
   1.434 +  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
   1.435 +  (can be in some subfolder)
   1.436 +
   1.437 +* generate normalization data files
   1.438 +- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
   1.439 +- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
   1.440 +- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
   1.441 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
   1.442 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
   1.443 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   1.444 +- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
   1.445 +
   1.446 +* build ICU (make install)
   1.447 +* build Unicode tools using CMake+make
   1.448 +
   1.449 +* new way to call genuca (makeuca.sh was deleted)
   1.450 +- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
   1.451 +
   1.452 +---------------------------------------------------------------------------- ***
   1.453 +
   1.454 +Unicode 6.1 update
   1.455 +
   1.456 +*** ICU Trac
   1.457 +
   1.458 +- ticket 8995 final update to Unicode 6.1
   1.459 +- ticket 8994 regenerate source/layout/CanonData.cpp
   1.460 +
   1.461 +- ticket 8961 support Unicode "Age" value *names*
   1.462 +- ticket 8963 support multiple character name aliases & types
   1.463 +
   1.464 +- ticket 8827 "update ICU to Unicode 6.1"
   1.465 +- C++ branches/markus/uni61 at r30864 from trunk at r30843
   1.466 +- Java branches/markus/uni61 at r30865 from trunk at r30863
   1.467 +
   1.468 +*** Unicode version numbers
   1.469 +- makedata.mak
   1.470 +- uchar.h
   1.471 +  (configure.in & configure: have been modified to extract the version from uchar.h)
   1.472 +- com.ibm.icu.util.VersionInfo
   1.473 +- icutools/unicode/makedefs.sh
   1.474 +  + also review & update other definitions in that file,
   1.475 +    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
   1.476 +
   1.477 +*** data files & enums & parser code
   1.478 +
   1.479 +* file preparation
   1.480 +
   1.481 +~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
   1.482 +- This prepares both unidata and testdata files in respective output subfolders.
   1.483 +- Check test file diffs for previously commented-out, known-failing data lines;
   1.484 +  probably need to keep those commented out.
   1.485 +
   1.486 +* PropertyValueAliases.txt changes
   1.487 +- 11 new block names:
   1.488 +  Arabic_Extended_A
   1.489 +  Arabic_Mathematical_Alphabetic_Symbols
   1.490 +  Chakma
   1.491 +  Meetei_Mayek_Extensions
   1.492 +  Meroitic_Cursive
   1.493 +  Meroitic_Hieroglyphs
   1.494 +  Miao
   1.495 +  Sharada
   1.496 +  Sora_Sompeng
   1.497 +  Sundanese_Supplement
   1.498 +  Takri
   1.499 +  -> add to uchar.h
   1.500 +  -> add to UCharacter.UnicodeBlock IDs
   1.501 +    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   1.502 +            replace  public static final int \1_ID = \2; \3
   1.503 +  -> add to UCharacter.UnicodeBlock objects
   1.504 +    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   1.505 +            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   1.506 +- 1 new Joining_Group (jg) value:
   1.507 +  Rohingya_Yeh
   1.508 +  -> uchar.h & UCharacter.JoiningGroup
   1.509 +- 2 new Line_Break (lb) values:
   1.510 +  CJ=Conditional_Japanese_Starter
   1.511 +  HL=Hebrew_Letter
   1.512 +  -> uchar.h & UCharacter.LineBreak
   1.513 +- 7 new scripts:
   1.514 +  sc ; Cakm      ; Chakma
   1.515 +  sc ; Merc      ; Meroitic_Cursive
   1.516 +  sc ; Mero      ; Meroitic_Hieroglyphs
   1.517 +  sc ; Plrd      ; Miao
   1.518 +  sc ; Shrd      ; Sharada
   1.519 +  sc ; Sora      ; Sora_Sompeng
   1.520 +  sc ; Takr      ; Takri
   1.521 +  -> remove these from SyntheticPropertyValueAliases.txt
   1.522 +  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   1.523 +      and in com.ibm.icu.dev.test.lang.TestUScript.java
   1.524 +- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
   1.525 +  (added 2011-06-21)
   1.526 +  Khoj        322     Khojki
   1.527 +  Tirh        326     Tirhuta
   1.528 +    and another one added 2011-12-09
   1.529 +  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
   1.530 +  -> uscript.h
   1.531 +  -> com.ibm.icu.lang.UScript
   1.532 +    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   1.533 +    replace  public static final int \1 = \2;\3
   1.534 +  -> SyntheticPropertyValueAliases.txt
   1.535 +  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   1.536 +      and in com.ibm.icu.dev.test.lang.TestUScript.java
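
The two Eclipse find/replace passes for UCharacter.UnicodeBlock above can be tried out with Python's re module, which accepts the same expressions. This is a sketch: the sample enum line and its numeric value are placeholders, not copied from uchar.h.

```python
import re

# Pass 1: C enum constant -> Java int ID constant.
FIND_ID = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
REPLACE_ID = r"public static final int \1_ID = \2; \3"

# Pass 2: C enum constant -> Java UnicodeBlock object.
FIND_OBJECT = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
REPLACE_OBJECT = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(line):
    """Apply the first find/replace pair to one line."""
    return re.sub(FIND_ID, REPLACE_ID, line)


def to_java_object(line):
    """Apply the second find/replace pair to one line."""
    return re.sub(FIND_OBJECT, REPLACE_OBJECT, line)


if __name__ == "__main__":
    sample = "UBLOCK_TAKRI = 220, /*[11680]*/"   # placeholder value
    print(to_java_id(sample))
    print(to_java_object(sample))
```

Both passes start from the same C lines, so the enum block from uchar.h is pasted twice, once for each section of the Java class.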
   1.537 +
   1.538 +* UnicodeData.txt changes
   1.539 +- the last Unihan code point changes from U+9FCB to U+9FCC
   1.540 +  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
   1.541 +  + do change gennames.c
   1.542 +  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
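
The search for the old Unihan end and limit values can be scripted. This is a rough sketch, assuming the candidate source lines are already in memory; the function name find_hits() and the sample lines are hypothetical. Only the regex 9FC[BC] and its case-insensitivity come from the note above.

```python
import re

# Old end (9FCB) and old limit (9FCC), matched case-insensitively.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_hits(lines):
    """Return (line number, text) pairs for lines mentioning 9FCB or 9FCC."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]


if __name__ == "__main__":
    # Made-up source lines for illustration only.
    sample = [
        "#define EXAMPLE_CJK_END 0x9fcb\n",
        "limit = 0x9FCC;\n",
        "unrelated = 0x9FC3;\n",
    ]
    for number, text in find_hits(sample):
        print(number, text)
```

Each hit still has to be judged by hand, since only some of them are range ends or limits that must move.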
   1.543 +
   1.544 +* DerivedBidiClass.txt changes
   1.545 +- 2 new default-AL blocks:
   1.546 +#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
   1.547 +#     Arabic Mathematical Alphabetic Symbols:
   1.548 +#                       U+1EE00  - U+1EEFF  (was default-R)
   1.549 +- 2 new default-R blocks:
   1.550 +#     Meroitic Hieroglyphs:
   1.551 +#                        U+10980 - U+1099F
   1.552 +#     Meroitic Cursive:  U+109A0 - U+109FF
   1.553 +  -> should be picked up by the explicit data in the file
   1.554 +
   1.555 +* NameAliases.txt changes
   1.556 +- from
   1.557 +    # Each line has two fields
   1.558 +    # First field: Code point
   1.559 +    # Second field: Alias
   1.560 +- to
   1.561 +    # Each line has three fields, as described here:
   1.562 +    #
   1.563 +    # First field:  Code point
   1.564 +    # Second field: Alias
   1.565 +    # Third field:  Type
   1.566 +- Also, the file format previously allowed multiple aliases per code point, but only now
   1.567 +  does the file actually contain them, even several of the same type. For example,
   1.568 +    FEFF;BYTE ORDER MARK;alternate
   1.569 +    FEFF;BOM;abbreviation
   1.570 +    FEFF;ZWNBSP;abbreviation
   1.571 +- This breaks our gennames parser, unames.icu data structure, and API.
   1.572 +  Fix gennames to only pick up "correction" aliases.
   1.573 +  New ticket #8963 for further changes.
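
To illustrate the "correction aliases only" filter on the three-field format, here is a minimal Python sketch. It is not the gennames implementation (gennames is written in C); the function name correction_aliases() is hypothetical, and it assumes at most one correction alias per code point. The FEFF lines are the examples quoted above; the 01A2 line is included as a sample correction entry.

```python
def correction_aliases(lines):
    """Map code point -> alias, keeping only aliases of type "correction"."""
    result = {}
    for line in lines:
        # Drop comments and surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) != 3:
            continue  # not in the three-field format
        code, alias, alias_type = fields
        if alias_type == "correction":
            result[int(code, 16)] = alias
    return result


if __name__ == "__main__":
    sample = [
        "# First field:  Code point",
        "01A2;LATIN CAPITAL LETTER GHA;correction",
        "FEFF;BYTE ORDER MARK;alternate",
        "FEFF;BOM;abbreviation",
        "FEFF;ZWNBSP;abbreviation",
    ]
    for code, alias in sorted(correction_aliases(sample).items()):
        print("%04X %s" % (code, alias))
```

All three FEFF aliases are skipped because none of them has the type "correction".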
   1.574 +
   1.575 +* run genpname/preparse.pl (on Linux)
   1.576 +  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
   1.577 +  + make sure that data.h is writable
   1.578 +  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
   1.579 +  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
   1.580 +
   1.581 +* build ICU (make install)
   1.582 +  so that the tools build can pick up the new definitions from the installed header files.
   1.583 +* build Unicode tools (at least genpname) using CMake+make
   1.584 +
   1.585 +* run genpname
   1.586 +  (builds both pnames.icu and propname_data.h)
   1.587 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
   1.588 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
   1.589 +
   1.590 +* build ICU (make install)
   1.591 +* build Unicode tools using CMake+make
   1.592 +
   1.593 +* update source/data/unidata/norm2/nfkc_cf.txt
   1.594 +- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
   1.595 +
   1.596 +* update source/data/unidata/norm2/uts46.txt
   1.597 +- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
   1.598 +  to ~/svn.icu/tools/trunk/src/unicode/py
   1.599 +- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
   1.600 +- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
   1.601 +- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
   1.602 +
   1.603 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   1.604 +  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   1.605 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   1.606 +- Unicode 6.0..6.1: U+2260, U+226E, U+226F
   1.607 +- nothing new in 6.1, no test file to update
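
The grep for "disallowed_STD3_valid" on non-ASCII characters can be sketched as follows. This is an assumption-laden illustration, not part of idna2nrm.py: the function name non_ascii_std3_valid() is hypothetical, the sample lines are abbreviated for illustration, and the parser assumes the semicolon-separated layout with a code point or range in the first field and the status in the second.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges with status disallowed_STD3_valid above U+007F."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]  # strip trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:  # keep only ranges that reach beyond ASCII
            hits.append((first, last))
    return hits


if __name__ == "__main__":
    # Abbreviated sample lines for illustration.
    sample = [
        "0000..002C    ; disallowed_STD3_valid   # ASCII range",
        "0061..007A    ; valid                   # a..z",
        "00A0          ; disallowed_STD3_mapped ; 0020 # NO-BREAK SPACE",
        "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    ]
    for first, last in non_ascii_std3_valid(sample):
        print("U+%04X..U+%04X" % (first, last))
```

Any code point reported here that is not yet covered by uts46test.cpp and UTS46Test.java is a candidate for a new test case.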
   1.608 +
   1.609 +* generate core properties data files
   1.610 +- in initial bootstrapping, change the UCA version
   1.611 +  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
   1.612 +- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   1.613 +- rebuild ICU & tools
   1.614 +  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
   1.615 +    check if the UCA version in FractionalUCA.txt matches the new Unicode version
   1.616 +    (see step above)
   1.617 +- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
   1.618 +  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   1.619 +- rebuild ICU & tools
   1.620 +
   1.621 +* update Java data files
   1.622 +- refresh just the UCD-related files, just to be safe
   1.623 +- see (ICU4C)/source/data/icu4j-readme.txt
   1.624 +- mkdir /tmp/icu4j
   1.625 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.626 +  output:
   1.627 +    ...
   1.628 +    Unicode .icu files built to ./out/build/icudt49l
   1.629 +    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
   1.630 +    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
   1.631 +    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   1.632 +    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
   1.633 +    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
   1.634 +    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
   1.635 +    mkdir -p /tmp/icu4j/main/shared/data
   1.636 +    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   1.637 +    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
   1.638 +    mkdir -p /tmp/icu4j/main/shared/data
   1.639 +    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   1.640 +    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
   1.641 +- copy the big-endian Unicode data files to another location,
   1.642 +  separate from the other data files
   1.643 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   1.644 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
   1.645 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
   1.646 +    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
   1.647 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
   1.648 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   1.649 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
   1.650 +- refresh ICU4J
   1.651 +    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
   1.652 +
   1.653 +* refresh Java test .txt files
   1.654 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   1.655 +
   1.656 +* test ICU so far, fix test code where necessary
   1.657 +- temporarily ignore collation issues that look like UCA/UCD mismatches,
   1.658 +  until UCA data is updated
   1.659 +
   1.660 +* UCA
   1.661 +
   1.662 +- get output from Mark's tools; look in
   1.663 +    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
   1.664 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   1.665 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   1.666 +  (note: remove the underscore before "Rules" when renaming)
   1.667 +- update (ICU)/source/test/testdata/CollationTest_*.txt
   1.668 +  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   1.669 +  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
   1.670 +- check test file diffs for previously commented-out, known-failing data lines;
   1.671 +  probably need to keep those commented out
   1.672 +- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
   1.673 +- run makeuca.sh:
   1.674 +  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   1.675 +- rebuild ICU4C
   1.676 +- refresh ICU4J collation data:
   1.677 +  (subset of instructions above for properties data refresh, except copies all coll/*)
   1.678 +    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.679 +    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   1.680 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   1.681 +    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
   1.682 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
   1.683 +- note on intltest: if collate/UCAConformanceTest fails, then
   1.684 +  utility/MultithreadTest/TestCollators will fail as well;
   1.685 +  fix the conformance test before looking into the multi-thread test
   1.686 +
   1.687 +* When refreshing all of ICU4J data from ICU4C
   1.688 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.689 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   1.690 +or
   1.691 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   1.692 +
   1.693 +*** LayoutEngine script information
   1.694 +
   1.695 +(For details see the Unicode 5.2 change log below.)
   1.696 +
   1.697 +* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
   1.698 +  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
   1.699 +  in the working directory.
   1.700 +  (It also generates ScriptRunData.cpp, which is no longer needed.)
   1.701 +
   1.702 +  The generated files have a current copyright date and "@draft" statement.
   1.703 +
   1.704 +- diff current <icu>/source/layout files vs. generated ones
   1.705 +    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
   1.706 +  review and manually merge desired changes;
   1.707 +  fix gratuitous changes, incorrect @draft and missing aliases;
   1.708 +  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
   1.709 +- if you just copy the above files, then
   1.710 +  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
   1.711 +  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
   1.712 +
   1.713 +*** merge the Unicode update branches back onto the trunk
   1.714 +- do not merge the icudata.jar and testdata.jar,
   1.715 +  instead rebuild them from merged & tested ICU4C
   1.716 +
   1.717 +---------------------------------------------------------------------------- ***
   1.718 +
   1.719 +ICU 4.8 (no Unicode update, just new script codes)
   1.720 +
   1.721 +* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
   1.722 +  (added 2010-12-21)
   1.723 +    Afak    439     Afaka
   1.724 +    Jurc    510     Jurchen
   1.725 +    Mroo    199     Mro, Mru
   1.726 +    Nshu    499     Nüshu
   1.727 +    Shrd    319     Sharada, Śāradā
   1.728 +    Sora    398     Sora Sompeng
   1.729 +    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
   1.730 +    Tang    520     Tangut
   1.731 +    Wole    480     Woleai
   1.732 +  -> uscript.h
   1.733 +  -> com.ibm.icu.lang.UScript
   1.734 +    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   1.735 +    replace  public static final int \1 = \2;\3
   1.736 +  -> genpname/SyntheticPropertyValueAliases.txt
   1.737 +  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   1.738 +      and in com.ibm.icu.dev.test.lang.TestUScript.java
   1.739 +
   1.740 +* run genpname/preparse.pl (on Linux)
   1.741 +  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
   1.742 +  + make sure that data.h is writable
   1.743 +  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
   1.744 +  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
   1.745 +
   1.746 +* rebuild Unicode tools (at least genpname) using make
   1.747 +- You might first need to "make install" ICU so that the tools build can pick
   1.748 +  up the new definitions from the installed header files.
   1.749 +
   1.750 +* run genpname
   1.751 +  (builds both pnames.icu and propname_data.h)
   1.752 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
   1.753 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
   1.754 +- rebuild ICU & tools
   1.755 +
   1.756 +* run genprops
   1.757 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
   1.758 +- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
   1.759 +- rebuild ICU & tools
   1.760 +
   1.761 +* update Java data files
   1.762 +- refresh just the UCD-related files, just to be safe
   1.763 +- see (ICU4C)/source/data/icu4j-readme.txt
   1.764 +- mkdir /tmp/icu4j
   1.765 +- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.766 +- copy the big-endian Unicode data files to another location,
   1.767 +  separate from the other data files
   1.768 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
   1.769 +    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
   1.770 +    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
   1.771 +- refresh ICU4J
   1.772 +    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
   1.773 +
   1.774 +* should have updated the layout engine script codes but forgot
   1.775 +
   1.776 +---------------------------------------------------------------------------- ***
   1.777 +
   1.778 +Unicode 6.0 update
   1.779 +
   1.780 +*** related ICU Trac tickets
   1.781 +
   1.782 +7264 Unicode 6.0 Update
   1.783 +
   1.784 +*** Unicode version numbers
   1.785 +- makedata.mak
   1.786 +- uchar.h
   1.787 +  (configure.in & configure: have been modified to extract the version from uchar.h)
   1.788 +- com.ibm.icu.util.VersionInfo
   1.789 +
   1.790 +*** data files & enums & parser code
   1.791 +
   1.792 +* file preparation
   1.793 +
   1.794 +~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
   1.795 +- This now prepares both unidata and testdata files in respective output subfolders.
   1.796 +
   1.797 +* PropertyAliases.txt changes
   1.798 +- new Script_Extensions property defined in the new ScriptExtensions.txt file
   1.799 +  but not listed in PropertyAliases.txt; reported to unicode.org;
   1.800 +  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
   1.801 +    scx; Script_Extensions
   1.802 +  -> uchar.h with new UProperty section
   1.803 +  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
   1.804 +
   1.805 +* PropertyValueAliases.txt changes
   1.806 +- 12 new block names:
   1.807 +  Alchemical_Symbols
   1.808 +  Bamum_Supplement
   1.809 +  Batak
   1.810 +  Brahmi
   1.811 +  CJK_Unified_Ideographs_Extension_D
   1.812 +  Emoticons
   1.813 +  Ethiopic_Extended_A
   1.814 +  Kana_Supplement
   1.815 +  Mandaic
   1.816 +  Miscellaneous_Symbols_And_Pictographs
   1.817 +  Playing_Cards
   1.818 +  Transport_And_Map_Symbols
   1.819 +  -> add to uchar.h
   1.820 +  -> add to UCharacter.UnicodeBlock
   1.821 +    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   1.822 +            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   1.823 +- Joining_Group (jg) values:
   1.824 +  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
   1.825 +  -> uchar.h & UCharacter.JoiningGroup
   1.826 +- 3 new scripts:
   1.827 +  sc ; Batk      ; Batak
   1.828 +  sc ; Brah      ; Brahmi
   1.829 +  sc ; Mand      ; Mandaic
   1.830 +  -> remove these from SyntheticPropertyValueAliases.txt
   1.831 +  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
   1.832 +  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   1.833 +      and in com.ibm.icu.dev.test.lang.TestUScript.java
   1.834 +- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
   1.835 +  (added 2009-11-11..2010-07-18)
   1.836 +  Bass        259     Bassa Vah
   1.837 +  Dupl        755     Duployan shorthand
   1.838 +  Elba        226     Elbasan
   1.839 +  Gran        343     Grantha
   1.840 +  Kpel        436     Kpelle
   1.841 +  Loma        437     Loma
   1.842 +  Mend        438     Mende
   1.843 +  Merc        101     Meroitic Cursive
   1.844 +  Narb        106     Old North Arabian
   1.845 +  Nbat        159     Nabataean
   1.846 +  Palm        126     Palmyrene
   1.847 +  Sind        318     Sindhi
   1.848 +  Wara        262     Warang Citi
   1.849 +  -> uscript.h
   1.850 +  -> com.ibm.icu.lang.UScript
   1.851 +    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   1.852 +    replace  public static final int \1 = \2;\3
   1.853 +  -> SyntheticPropertyValueAliases.txt
   1.854 +  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   1.855 +      and in com.ibm.icu.dev.test.lang.TestUScript.java
   1.856 +- ISO 15924 name change
   1.857 +  Mero        100     Meroitic Hieroglyphs (was Meroitic)
   1.858 +  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
   1.859 +- property value alias added for Cham, which had already been moved out of SyntheticPropertyValueAliases.txt
   1.860 +
   1.861 +* UnicodeData.txt changes
   1.862 +- new CJK block:
   1.863 +  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
   1.864 +  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
   1.865 +  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
   1.866 +
   1.867 +* build Unicode tools using CMake+make
   1.868 +
   1.869 +* run genpname/preparse.pl (on Linux)
   1.870 +  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
   1.871 +  + make sure that data.h is writable
   1.872 +  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
   1.873 +  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
   1.874 +
   1.875 +* rebuild Unicode tools (at least genpname) using make
   1.876 +- You might first need to "make install" ICU so that the tools build can pick
   1.877 +  up the new definitions from the installed header files.
   1.878 +
   1.879 +* run genpname
   1.880 +- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
   1.881 +- rebuild ICU & tools
   1.882 +
   1.883 +* update source/data/unidata/norm2/nfkc_cf.txt
   1.884 +- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
   1.885 +
   1.886 +* update source/data/unidata/norm2/uts46.txt
   1.887 +- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
   1.888 +  to ~/svn.icu/tools/trunk/src/unicode/py
   1.889 +- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
   1.890 +- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
   1.891 +- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
   1.892 +
   1.893 +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   1.894 +  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   1.895 +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   1.896 +- Unicode 6.0: U+2260, U+226E, U+226F
   1.897 +
   1.898 +* generate core properties data files
   1.899 +- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   1.900 +- rebuild ICU & tools
   1.901 +- run makeuca.sh so that genuca picks up the new nfc.nrm:
   1.902 +  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   1.903 +- rebuild ICU & tools
   1.904 +
   1.905 +* implement new Script_Extensions property (provisional)
   1.906 +- parser & generator: genprops & uprops.icu
   1.907 +- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
   1.908 +- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
   1.909 +
   1.910 +* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
   1.911 +- (one-time change)
   1.912 +- genbidi/gencase/genprops tools changes
   1.913 +- re-run makeprops.sh (see above)
   1.914 +- UCharacterProperty.java, UCharacterTypeIterator.java,
   1.915 +  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
   1.916 +  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
   1.917 +
   1.918 +* update Java data files
   1.919 +- refresh just the UCD-related files, just to be safe
   1.920 +- see (ICU4C)/source/data/icu4j-readme.txt
   1.921 +- mkdir /tmp/icu4j
   1.922 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.923 +  output:
   1.924 +    ...
   1.925 +    Unicode .icu files built to ./out/build/icudt45l
   1.926 +    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
   1.927 +    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   1.928 +    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
   1.929 +    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
   1.930 +    mkdir -p /tmp/icu4j/main/shared/data
   1.931 +    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   1.932 +- copy the big-endian Unicode data files to another location,
   1.933 +  separate from the other data files
   1.934 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   1.935 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
   1.936 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
   1.937 +    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
   1.938 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
   1.939 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   1.940 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
   1.941 +- refresh ICU4J
   1.942 +    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
   1.943 +
   1.944 +* refresh Java test .txt files
   1.945 +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   1.946 +
   1.947 +* un-hardcode normalization skippable (NF*_Inert) test data
   1.948 +- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
   1.949 +
   1.950 +* copy updated break iterator test files
   1.951 +- now handled by early ucdcopy.py and
   1.952 +  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
   1.953 +  (old instructions:
   1.954 +   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
   1.955 +   to ~/svn.icu/trunk/src/source/test/testdata)
   1.956 +- they are not used in ICU4J
   1.957 +
   1.958 +* UCA
   1.959 +
   1.960 +- get output from Mark's tools; look in
   1.961 +    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
   1.962 +    http://www.macchiato.com/unicode/utc/additional-uca-files
   1.963 +    http://www.unicode.org/Public/UCA/6.0.0/
   1.964 +    http://www.unicode.org/~mdavis/uca/
   1.965 +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   1.966 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   1.967 +- update Han-implicit ranges for new CJK extensions:
   1.968 +  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
   1.969 +- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
   1.970 +  do not add it into invuca so that tailoring primary-after an ignorable works
   1.971 +- genuca: permit space between [variable top] bytes
   1.972 +- ucol.cpp: treat noncharacters like unassigned rather than ignorable
   1.973 +- run makeuca.sh:
   1.974 +  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   1.975 +- rebuild ICU4C
   1.976 +- refresh ICU4J collation data:
   1.977 +  (subset of instructions above for properties data refresh, except copies all coll/*)
   1.978 +    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.979 +    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   1.980 +    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   1.981 +    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
   1.982 +- update (ICU)/source/test/testdata/CollationTest_*.txt
   1.983 +  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   1.984 +  with output from Mark's Unicode tools
   1.985 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
   1.986 +- note on intltest: if collate/UCAConformanceTest fails, then
   1.987 +  utility/MultithreadTest/TestCollators will fail as well;
   1.988 +  fix the conformance test before looking into the multi-thread test
   1.989 +
   1.990 +* When refreshing all of ICU4J data from ICU4C
   1.991 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1.992 +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   1.993 +or
   1.994 +- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   1.995 +
   1.996 +*** LayoutEngine script information
   1.997 +
   1.998 +(For details see the Unicode 5.2 change log below.)
   1.999 +
  1.1000 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
  1.1001 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
  1.1002 +ScriptRunData.cpp, which is no longer needed.)
  1.1003 +
  1.1004 +The generated files have a current copyright date and "@draft" statement.
  1.1005 +
  1.1006 +* copy the above files into <icu>/source/layout, replacing the old files.
  1.1007 +* fix mixed line endings
  1.1008 +* review the diffs and fix incorrect @draft and missing aliases;
  1.1009 +  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
  1.1010 +* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
  1.1011 +
  1.1012 +---------------------------------------------------------------------------- ***
  1.1013 +
  1.1014 +Unicode 5.2 update
  1.1015 +
  1.1016 +*** related ICU Trac tickets
  1.1017 +
  1.1018 +7084 Unicode 5.2
  1.1019 +
  1.1020 +7167 verify collation bytes
  1.1021 +7235 Java test NAME_ALIAS
  1.1022 +7236 Java DerivedCoreProperties.txt test
  1.1023 +7237 Java BidiTest.txt
  1.1024 +7238 UTrie2 in core unidata
  1.1025 +7239 test for tailoring gaps
  1.1026 +7240 Java fix CollationMiscTest
  1.1027 +7243 update layout engine for Unicode 5.2
  1.1028 +
  1.1029 +*** Unicode version numbers
  1.1030 +- makedata.mak
  1.1031 +- uchar.h
  1.1032 +- configure.in & configure
  1.1033 +- update ucdVersion in gennames.c if an algorithmic range changes
  1.1034 +
  1.1035 +*** data files & enums & parser code
  1.1036 +
  1.1037 +* file preparation
  1.1038 +
  1.1039 +python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
  1.1040 +- includes finding files regardless of version numbers,
  1.1041 +  copying them, and performing the equivalent processing of the
  1.1042 +  ucdstrip and ucdmerge tools on the desired set of files
  1.1043 +
  1.1044 +* notes on changes
  1.1045 +- PropertyAliases.txt
  1.1046 +  moved from numeric to enumerated:
  1.1047 +    ccc       ; Canonical_Combining_Class
  1.1048 +  new string properties:
  1.1049 +    NFKC_CF   ; NFKC_Casefold
  1.1050 +    Name_Alias; Name_Alias
  1.1051 +  new binary properties:
  1.1052 +    Cased     ; Cased
  1.1053 +    CI        ; Case_Ignorable
  1.1054 +    CWCF      ; Changes_When_Casefolded
  1.1055 +    CWCM      ; Changes_When_Casemapped
  1.1056 +    CWKCF     ; Changes_When_NFKC_Casefolded
  1.1057 +    CWL       ; Changes_When_Lowercased
  1.1058 +    CWT       ; Changes_When_Titlecased
  1.1059 +    CWU       ; Changes_When_Uppercased
  1.1060 +  new CJK Unihan properties (not supported by ICU)
  1.1061 +- PropertyValueAliases.txt
  1.1062 +  new block names
  1.1063 +  new scripts
  1.1064 +  one script code change:
  1.1065 +    sc ; Qaai      ; Inherited
  1.1066 +    ->
  1.1067 +    sc ; Zinh      ; Inherited                        ; Qaai
  1.1068 +  new Line_Break (lb) value:
  1.1069 +    lb ; CP        ; Close_Parenthesis
  1.1070 +  new Joining_Group (jg) values: Farsi_Yeh, Nya
  1.1071 +  other new values:
  1.1072 +    ccc; 214; ATA  ; Attached_Above
  1.1073 +- DerivedBidiClass.txt
  1.1074 +  new default-R range: U+1E800 - U+1EFFF
  1.1075 +- UnicodeData.txt
  1.1076 +  all of the ISO comments are gone
  1.1077 +  new CJK block end:
  1.1078 +    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
  1.1079 +  new CJK block:
  1.1080 +    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
  1.1081 +    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
  1.1082 +
  1.1083 +* genpname
  1.1084 +- run preparse.pl
  1.1085 +  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  1.1086 +  + make sure that data.h is writable
  1.1087 +  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  1.1088 +  + preparse.pl complains with errors like the following:
  1.1089 +      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
  1.1090 +    This is because ICU 4.0 had scripts from ISO 15924 which are now
  1.1091 +    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
  1.1092 +    and PropertyValueAliases.txt.
  1.1093 +    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
  1.1094 +       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  1.1095 +  + preparse.pl complains with errors about block names missing from uchar.h; add them
  1.1096 +
  1.1097 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops
  1.1098 +- new block & script values
  1.1099 +  + 26 new blocks
  1.1100 +    copy new blocks from Blocks.txt
  1.1101 +    MS VC++ 2008 regular expression:
  1.1102 +      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
  1.1103 +      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
  1.1104 +  + several new script values already added in ICU 4.0 for ISO 15924 coverage
  1.1105 +    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  1.1106 +  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  1.1107 +  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
  1.1108 +    (added to SyntheticPropertyValueAliases.txt)
  1.1109 +- new Joining Group (JG) values: Farsi_Yeh, Nya
  1.1110 +- new Line_Break (lb) value:
  1.1111 +    lb ; CP        ; Close_Parenthesis
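Sketch (not part of ICU): the Blocks.txt find/replace step above could also be scripted. The function name, the first_value parameter and the name normalization are assumptions made for illustration; the output still needs to be reviewed by hand against uchar.h.

```python
import re

# Matches a Blocks.txt data line such as "A960..A97F; Hangul Jamo Extended-A".
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def blocks_to_enum_lines(lines, first_value):
    """Turn Blocks.txt lines into draft UBLOCK_ enum lines.

    first_value is a hypothetical parameter: the first unused enum value.
    The name normalization (upper case, runs of non-alphanumerics become '_')
    is an assumption, not something the original regex did.
    """
    out = []
    value = first_value
    for line in lines:
        m = BLOCK_LINE.match(line.strip())
        if not m:
            continue  # skip comments, blank lines and anything unexpected
        name = re.sub(r"[^A-Za-z0-9]+", "_", m.group(3)).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (name, value, m.group(1)))
        value += 1
    return out
```

Unlike the editor regex, which writes the same number on every line, this numbers the new blocks consecutively starting at first_value.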
  1.1112 +
  1.1113 +* hardcoded Unihan range end/limit
  1.1114 +- Unihan range end moves from 9FC3 to 9FCB
  1.1115 +  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  1.1116 +  + do change gennames.c
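Sketch (not part of ICU): the search for the old range end and limit can be done with a small script instead of an editor. The function name is hypothetical; the pattern is the one given above. Each hit still has to be judged by hand, since not every occurrence should change.

```python
import os
import re

# 9FC3 is the old range end, 9FC4 the old limit (end + 1).
OLD_UNIHAN = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded(root, pattern=OLD_UNIHAN):
    """Yield (path, line number, text) for each line under root that matches."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # latin-1 never fails to decode, so binary files do not abort the scan
                with open(path, encoding="latin-1") as f:
                    for number, text in enumerate(f, 1):
                        if pattern.search(text):
                            yield path, number, text.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it
```

For a different Unicode update, pass another compiled pattern as the second argument.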
  1.1117 +
  1.1118 +* Compare definitions of new binary properties with what we used to use
  1.1119 +  in algorithms, to see if the definitions changed.
  1.1120 +- Verified that definitions for Cased and Case_Ignorable are unchanged.
  1.1121 +  The gencase tool now parses the newly public Case_Ignorable values
  1.1122 +  in case the definition changes in the future.
  1.1123 +
  1.1124 +* uchar.c & uprops.h & uprops.c & genprops
  1.1125 +- new numeric values that didn't exist in Unicode data before:
  1.1126 +    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  1.1127 +  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  1.1128 +  therefore redesign the encoding of numeric types and values for formatVersion 6;
  1.1129 +  design for simple numbers up to at least 144 ("one gross"),
  1.1130 +  large values up to at least 10^20,
  1.1131 +  and fractions with numerators -1..17 and denominators 1..16
  1.1132 +  to cover current and expected future values
  1.1133 +  (e.g., more Han numeric values, Meroitic twelfths)
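Sketch (not part of ICU): a quick check of the new fractions against the limits stated above. This only tests the stated ranges; it does not model the actual uprops.icu bit encoding, and the function names are made up for illustration.

```python
from fractions import Fraction

# The new numeric values listed above.
NEW_VALUES = ["1/7", "1/9", "1/10", "3/10", "1/16", "3/16"]


def fits_fraction_design(value):
    """True if value, in lowest terms, has numerator -1..17 and denominator 1..16."""
    f = Fraction(value)
    return -1 <= f.numerator <= 17 and 1 <= f.denominator <= 16


def exceeds_format_version_5(value):
    """True if the denominator is >9, the limit stated above for formatVersion 5."""
    return Fraction(value).denominator > 9
```

All six values fit the formatVersion 6 design ranges; the four with denominators 10 and 16 are the ones that formatVersion 5 could not hold.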
  1.1134 +
  1.1135 +* reimplement Hangul_Syllable_Type for new Jamo characters
  1.1136 +- the old code assumed that all Jamo characters are in the 11xx block
  1.1137 +- Unicode 5.2 fills holes there and adds new Jamo characters in
  1.1138 +    A960..A97F; Hangul Jamo Extended-A
  1.1139 +  and in
  1.1140 +    D7B0..D7FF; Hangul Jamo Extended-B
  1.1141 +- Hangul_Syllable_Type can be trivially derived from a subset of
  1.1142 +  Grapheme_Cluster_Break values
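Sketch (not part of ICU): the derivation mentioned above, written with short property value names. It assumes the Grapheme_Cluster_Break values L, V, T, LV and LVT cover exactly the characters with the same Hangul_Syllable_Type; everything else maps to NA (Not_Applicable). The function name is hypothetical.

```python
# Grapheme_Cluster_Break values that carry over unchanged to Hangul_Syllable_Type.
_GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hst_from_gcb(gcb):
    """Return the Hangul_Syllable_Type short name for a GCB short name."""
    return _GCB_TO_HST.get(gcb, "NA")
```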
  1.1143 +
  1.1144 +* build Unicode data source code for hardcoding core data
  1.1145 +C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
  1.1146 +
  1.1147 +ICU data make path is \svn\icuproj\icu\trunk\source\data\
  1.1148 +ICU root path is \svn\icuproj\icu\trunk
  1.1149 +Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
  1.1150 +Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
  1.1151 +Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
  1.1152 +Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
  1.1153 +Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
  1.1154 +Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
  1.1155 +Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
  1.1156 +Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
  1.1157 +Creating data file for Unicode Property Names
  1.1158 +Creating data file for Unicode Character Properties
  1.1159 +Creating data file for Unicode Case Mapping Properties
  1.1160 +Creating data file for Unicode BiDi/Shaping Properties
  1.1161 +Creating data file for Unicode Normalization
  1.1162 +Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
  1.1163 +Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
  1.1164 +
  1.1165 +- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  1.1166 +  and rebuild the common library
  1.1167 +
  1.1168 +*** UCA
  1.1169 +
  1.1170 +- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
  1.1171 +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
  1.1172 +- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
  1.1173 +[ Begin obsolete instructions:
  1.1174 +  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
  1.1175 +    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
  1.1176 +      on Windows:
  1.1177 +        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
  1.1178 +        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  1.1179 +  End obsolete instructions]
  1.1180 +- run all tests with the *_SHORT.txt or the full files (the full ones have comments),
  1.1181 +  not just the *_STUB.txt files
  1.1182 +- note on intltest: if collate/UCAConformanceTest fails, then
  1.1183 +  utility/MultithreadTest/TestCollators will fail as well;
  1.1184 +  fix the conformance test before looking into the multi-thread test
  1.1185 +
  1.1186 +*** Implement Cased & Case_Ignorable properties
  1.1187 +- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
  1.1188 +- Problem: These properties should be disjoint, but aren't
  1.1189 +- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
  1.1190 +- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
  1.1191 +
  1.1192 +*** Implement Changes_When_Xyz properties
  1.1193 +- without stored data
  1.1194 +
  1.1195 +*** Implement Name_Alias property
  1.1196 +- add it as another name field in unames.icu
  1.1197 +- make it available via u_charName() and UCharNameChoice
  1.1198 +- consider it in u_charFromName()
  1.1199 +
  1.1200 +*** Break iterators
  1.1201 +
  1.1202 +* Update break iterator rules to new UAX versions and new property values
  1.1203 +* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
  1.1204 +
  1.1205 +*** new BidiTest file
  1.1206 +- review format and data
  1.1207 +- copy BidiTest.txt to source/test/testdata
  1.1208 +- write test code using this data
  1.1209 +- fix ICU code where it fails the conformance test
  1.1210 +
  1.1211 +*** Java
  1.1212 +- generally, find and update code corresponding to C/C++
  1.1213 +- UCharacter.UnicodeBlock constants:
  1.1214 +  a) add an _ID integer per new block, update COUNT
  1.1215 +  b) add a class instance per new block
  1.1216 +     Visual Studio regex:
  1.1217 +        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
  1.1218 +        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
  1.1219 +- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
  1.1220 +
  1.1221 +- port test changes to Java
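Sketch (not part of ICU): the Visual Studio find/replace for the UnicodeBlock class instances above, redone with Python's re module. The function name and the sample input line are illustrative only.

```python
import re

# Same pattern as the Visual Studio regex above, in Python syntax.
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")

JAVA_TEMPLATE = (
    r'public static final UnicodeBlock \g<1> = '
    r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>'
)


def c_enum_to_java(line):
    """Rewrite one uchar.h UBLOCK_ enum line as a Java UnicodeBlock constant."""
    return UBLOCK_LINE.sub(JAVA_TEMPLATE, line)
```

Lines that do not match the pattern come back unchanged, so a whole copied enum section can be passed through line by line.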
  1.1222 +
  1.1223 +*** LayoutEngine script information
  1.1224 +
  1.1225 +(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
  1.1226 +
  1.1227 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
  1.1228 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
  1.1229 +ScriptRunData.cpp, which is no longer needed.)
  1.1230 +
  1.1231 +The generated files have a current copyright date and "@draft" statement.
  1.1232 +
  1.1233 +-> Eric Mader wrote in email on 20090930:
  1.1234 +    "I think the tool has been modified to update @draft to @stable for
  1.1235 +     older scripts and to add @draft for new scripts.
  1.1236 +     (I worked with an intern on this last year.)
  1.1237 +     You should check the output after you run it."
  1.1238 +
  1.1239 +* copy the above files into <icu>/source/layout, replacing the old files.
  1.1240 +* fix mixed line endings
  1.1241 +* review the diffs and fix incorrect @draft and missing aliases
  1.1242 +* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
  1.1243 +
  1.1244 +Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
  1.1245 +and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
  1.1246 +
  1.1247 +-> Eric Mader wrote in email on 20090930:
  1.1248 +    "This is just a matter of making sure that all the per-script tables have
  1.1249 +     entries for any new scripts that were added.
  1.1250 +     If any new Indic characters were added, then the class tables in
  1.1251 +     IndicClassTables.cpp should be updated to reflect this.
  1.1252 +     John Emmons should know how to do this if it's required."
  1.1253 +
  1.1254 +* rebuild the layout and layoutex libraries.
  1.1255 +
  1.1256 +*** Documentation
  1.1257 +- Update User Guide
  1.1258 +  + Jamo_Short_Name, sfc->scf, binary property value aliases
  1.1259 +
  1.1260 +---------------------------------------------------------------------------- ***
  1.1261 +
  1.1262 +Unicode 5.1 update
  1.1263 +
  1.1264 +*** related ICU Trac tickets
  1.1265 +
  1.1266 +5696 Update to Unicode 5.1
  1.1267 +
  1.1268 +*** Unicode version numbers
  1.1269 +- makedata.mak
  1.1270 +- uchar.h
  1.1271 +- configure.in & configure
  1.1272 +- update ucdVersion in gennames.c if an algorithmic range changes
  1.1273 +
  1.1274 +*** data files & enums & parser code
  1.1275 +
  1.1276 +* file preparation
  1.1277 +- ucdstrip:
  1.1278 +    DerivedCoreProperties.txt
  1.1279 +    DerivedNormalizationProps.txt
  1.1280 +    NormalizationTest.txt
  1.1281 +    PropList.txt
  1.1282 +    Scripts.txt
  1.1283 +    GraphemeBreakProperty.txt
  1.1284 +    SentenceBreakProperty.txt
  1.1285 +    WordBreakProperty.txt
  1.1286 +- ucdstrip and ucdmerge:
  1.1287 +    EastAsianWidth.txt
  1.1288 +    LineBreak.txt
  1.1289 +
  1.1290 +* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
  1.1291 +copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
  1.1292 +copy 5.1.0\ucd\Blocks.txt ..\unidata\
  1.1293 +copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
  1.1294 +copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
  1.1295 +copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
  1.1296 +copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
  1.1297 +copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
  1.1298 +copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
  1.1299 +copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
  1.1300 +copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
  1.1301 +copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
  1.1302 +copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
  1.1303 +copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
  1.1304 +
  1.1305 +ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
  1.1306 +ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
  1.1307 +ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
  1.1308 +ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
  1.1309 +ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
  1.1310 +ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
  1.1311 +ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
  1.1312 +ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
  1.1313 +ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
  1.1314 +ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
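Sketch (not part of ICU): since the batch file has to be edited for every UCD version, the command lines could be generated instead. The file lists are taken from the batch file above; the function name and its parameters are hypothetical, and versioned file names in older UCD releases are not handled.

```python
import ntpath  # Windows-style path handling, works on any platform

# Paths are relative to <version>\ucd, as in the batch file above.
COPY_FILES = [
    "BidiMirroring.txt", "Blocks.txt", "CaseFolding.txt", "DerivedAge.txt",
    "extracted\\DerivedBidiClass.txt", "extracted\\DerivedJoiningGroup.txt",
    "extracted\\DerivedJoiningType.txt", "extracted\\DerivedNumericValues.txt",
    "NormalizationCorrections.txt", "PropertyAliases.txt",
    "PropertyValueAliases.txt", "SpecialCasing.txt", "UnicodeData.txt",
]
STRIP_FILES = [
    "DerivedCoreProperties.txt", "DerivedNormalizationProps.txt",
    "NormalizationTest.txt", "PropList.txt", "Scripts.txt",
    "auxiliary\\GraphemeBreakProperty.txt",
    "auxiliary\\SentenceBreakProperty.txt",
    "auxiliary\\WordBreakProperty.txt",
]
STRIP_AND_MERGE_FILES = ["EastAsianWidth.txt", "LineBreak.txt"]


def ucd2unidata_commands(version, dest="..\\unidata\\"):
    """Return the batch command lines for one UCD version, e.g. "5.1.0"."""
    src = version + "\\ucd\\"
    cmds = ["copy %s%s %s" % (src, f, dest) for f in COPY_FILES]
    # Stripped files land directly in dest, without their subdirectory.
    cmds += ["ucdstrip < %s%s > %s%s" % (src, f, dest, ntpath.basename(f))
             for f in STRIP_FILES]
    cmds += ["ucdstrip < %s%s | ucdmerge > %s%s" % (src, f, dest, f)
             for f in STRIP_AND_MERGE_FILES]
    return cmds
```

Printing "\n".join(ucd2unidata_commands("5.1.0")) reproduces the command lines listed above.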
  1.1315 +
  1.1316 +* genpname
  1.1317 +- run preparse.pl
  1.1318 +  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  1.1319 +  + make sure that data.h is writable
  1.1320 +  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  1.1321 +  + preparse.pl complains with errors like the following:
  1.1322 +      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
  1.1323 +    This is because ICU 3.8 had scripts from ISO 15924 which are now
  1.1324 +    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
  1.1325 +    and PropertyValueAliases.txt.
  1.1326 +    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
  1.1327 +       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  1.1328 +  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
  1.1329 +      N/Y, No/Yes, F/T, False/True
  1.1330 +    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
  1.1331 +       It will use further values from the file if present.
  1.1332 +
  1.1333 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops
  1.1334 +- new block & script values
  1.1335 +  + 17 new blocks
  1.1336 +  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
  1.1337 +    (removed from SyntheticPropertyValueAliases.txt)
  1.1338 +  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
  1.1339 +    (added to SyntheticPropertyValueAliases.txt)
  1.1340 +- uprops.icu (uprops.h) only provides 7 bits for script codes.
  1.1341 +  ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
  1.1342 +  No code above 127 is yet the script code of an
  1.1343 +  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
  1.1344 +  script code values greater than 127.
  1.1345 +  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  1.1346 +  in a parallel bit field, and that overflows now.
  1.1347 +  Also, future values >=128 would be incompatible anyway.
  1.1348 +  uprops.h is modified to move around several of the bit fields
  1.1349 +  in the properties vector words, and now uses 8 bits for the script code.
  1.1350 +  Two other bit fields also grow to accommodate future growth:
  1.1351 +  Block (current count: 172) grows from 8 to 9 bits,
  1.1352 +  and Word_Break grows from 4 to 5 bits.
  1.1353 +- renamed property Simple_Case_Folding (sfc->scf)
  1.1354 +  + nothing to be done: handled as normal alias
  1.1355 +- new property JSN Jamo_Short_Name
  1.1356 +  + no new API: only contributes to the Name property
  1.1357 +- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
  1.1358 +- new Joining Group (JG) value: Burushashki_Yeh_Barree
  1.1359 +- new Sentence_Break (SB) values:
  1.1360 +    SB ; CR        ; CR
  1.1361 +    SB ; EX        ; Extend
  1.1362 +    SB ; LF        ; LF
  1.1363 +    SB ; SC        ; SContinue
  1.1364 +- new Word_Break (WB) values:
  1.1365 +    WB ; CR        ; CR
  1.1366 +    WB ; Extend    ; Extend
  1.1367 +    WB ; LF        ; LF
  1.1368 +    WB ; MB        ; MidNumLet
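Sketch (not part of ICU): the bit-width arithmetic behind the uprops.h bit field changes described in this section. The helper name is made up; the numbers come from the notes above.

```python
def bits_needed(max_value):
    """Number of bits needed to store every value in 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130                  # from the note above
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1       # 129, stored in the parallel bit field
BLOCK_COUNT = 172                         # current block count from the note above

script_bits = bits_needed(MAX_SCRIPT)     # 129 no longer fits in 7 bits
block_bits = bits_needed(BLOCK_COUNT - 1) # still fits in 8 bits today
```

The script field has to grow because 129 needs 8 bits. The block field still fits in 8 bits, so its move to 9 bits is headroom for future growth, as the note says.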
  1.1369 +
  1.1370 +* Further changes in the 2008-02-29 update:
  1.1371 +- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  1.1372 +  because they should not normally be invisible.
  1.1373 +- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
  1.1374 +- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
  1.1375 +- new Word_Break (WB) value: NL=Newline
  1.1376 +
  1.1377 +* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
  1.1378 +- Unihan range end moves from 9FBB to 9FC3
  1.1379 +  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  1.1380 +  + do change gennames.c
  1.1381 +
  1.1382 +* build Unicode data source code for hardcoding core data
  1.1383 +C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
  1.1384 +
  1.1385 +ICU data make path is \svn\icuproj\icu\uni51\source\data\
  1.1386 +ICU root path is \svn\icuproj\icu\uni51
  1.1387 +Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
  1.1388 +Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
  1.1389 +Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
  1.1390 +Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
  1.1391 +Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
  1.1392 +Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
  1.1393 +Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
  1.1394 +Creating data file for Unicode Character Properties
  1.1395 +Creating data file for Unicode Case Mapping Properties
  1.1396 +Creating data file for Unicode BiDi/Shaping Properties
  1.1397 +Creating data file for Unicode Normalization
  1.1398 +Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
  1.1399 +Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
  1.1400 +
  1.1401 +- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  1.1402 +  and rebuild the common library
  1.1403 +
  1.1404 +*** Break iterators
  1.1405 +
  1.1406 +* Update break iterator rules to new UAX versions and new property values
  1.1407 +
  1.1408 +*** UCA
  1.1409 +
  1.1410 +* update FractionalUCA.txt and UCARules.txt with new canonical closure
  1.1411 +
  1.1412 +*** Test suites
  1.1413 +- Test that APIs using Unicode property value aliases (like UnicodeSet)
  1.1414 +  support all of the boolean values N/Y, No/Yes, F/T, False/True
  1.1415 +  -> TestBinaryValues() tests in both cintltst and intltest
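Sketch (not part of ICU): what accepting all of the boolean value aliases amounts to. This is an illustration of the alias set only, not the ICU implementation; the function name is hypothetical, and the matching (ignore case, whitespace, '_' and '-') is a simplified loose match.

```python
import re

# The boolean property value aliases listed above, lowercased.
_BINARY_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool; raise ValueError if unknown."""
    key = re.sub(r"[\s_-]+", "", alias).lower()
    try:
        return _BINARY_ALIASES[key]
    except KeyError:
        raise ValueError("not a binary property value alias: %r" % alias)
```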
  1.1416 +
  1.1417 +*** LayoutEngine script information
  1.1418 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
  1.1419 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
  1.1420 +ScriptRunData.cpp, which is no longer needed.)
  1.1421 +
  1.1422 +The generated files have a current copyright date and "@draft" statement.
  1.1423 +
  1.1424 +* copy the above files into <icu>/source/layout, replacing the old files.
  1.1425 +
  1.1426 +Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
  1.1427 +and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
  1.1428 +
  1.1429 +* rebuild the layout and layoutex libraries.
  1.1430 +
  1.1431 +*** Documentation
  1.1432 +- Update User Guide
  1.1433 +  + Jamo_Short_Name, sfc->scf, binary property value aliases
  1.1434 +
  1.1435 +---------------------------------------------------------------------------- ***
  1.1436 +
  1.1437 +Unicode 5.0 update
  1.1438 +
  1.1439 +*** related Jitterbugs
  1.1440 +
  1.1441 +5084 RFE: Update to Unicode 5.0
  1.1442 +
  1.1443 +*** data files & enums & parser code
  1.1444 +
  1.1445 +* file preparation
  1.1446 +- ucdstrip:
  1.1447 +    DerivedCoreProperties.txt
  1.1448 +    DerivedNormalizationProps.txt
  1.1449 +    NormalizationTest.txt
  1.1450 +    PropList.txt
  1.1451 +    Scripts.txt
  1.1452 +    GraphemeBreakProperty.txt
  1.1453 +    SentenceBreakProperty.txt
  1.1454 +    WordBreakProperty.txt
  1.1455 +- ucdstrip and ucdmerge:
  1.1456 +    EastAsianWidth.txt
  1.1457 +    LineBreak.txt
  1.1458 +
  1.1459 +* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
  1.1460 +copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
  1.1461 +copy 5.0.0\ucd\Blocks.txt ..\unidata\
  1.1462 +copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
  1.1463 +copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
  1.1464 +copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
  1.1465 +copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
  1.1466 +copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
  1.1467 +copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
  1.1468 +copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
  1.1469 +copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
  1.1470 +copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
  1.1471 +copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
  1.1472 +copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
  1.1473 +
  1.1474 +ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
  1.1475 +ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
  1.1476 +ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
  1.1477 +ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
  1.1478 +ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
  1.1479 +ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
  1.1480 +ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
  1.1481 +ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
  1.1482 +ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
  1.1483 +ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
  1.1484 +
  1.1485 +* update FractionalUCA.txt and UCARules.txt with new canonical closure
  1.1486 +
  1.1487 +* genpname
  1.1488 +- run preparse.pl
  1.1489 +  + make sure that data.h is writable
  1.1490 +  + perl preparse.pl \cvs\oss\icu > out.txt
  1.1491 +
  1.1492 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops
  1.1493 +- new block & script values
  1.1494 +  + script values already added in ICU 3.6 because all of ISO 15924 is now covered
  1.1495 +
  1.1496 +* build Unicode data source code for hardcoding core data
  1.1497 +C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
  1.1498 +
  1.1499 +ICU data make path is \cvs\oss\icu\source\data\
  1.1500 +ICU root path is \cvs\oss\icu
  1.1501 +Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
  1.1502 +[etc.]
  1.1503 +Creating data file for Unicode Character Properties
  1.1504 +Creating data file for Unicode Case Mapping Properties
  1.1505 +Creating data file for Unicode BiDi/Shaping Properties
  1.1506 +Creating data file for Unicode Normalization
  1.1507 +Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
  1.1508 +Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
  1.1509 +
  1.1510 +- copy the .c source files to C:\cvs\oss\icu\source\common
  1.1511 +  and rebuild the common library
  1.1512 +
  1.1513 +*** Unicode version numbers
  1.1514 +- makedata.mak
  1.1515 +- uchar.h
  1.1516 +- configure.in
  1.1517 +
  1.1518 +*** LayoutEngine script information
  1.1519 +* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
  1.1520 +ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
  1.1521 +ScriptRunData.cpp, which is no longer needed.)
  1.1522 +
  1.1523 +The generated files have a current copyright date and "@draft" statement.
  1.1524 +
  1.1525 +* copy the above files into <icu>/source/layout, replacing the old files.
  1.1526 +
  1.1527 +Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
  1.1528 +and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
  1.1529 +
  1.1530 +* rebuild the layout and layoutex libraries.
  1.1531 +
  1.1532 +---------------------------------------------------------------------------- ***
  1.1533 +
  1.1534 +Unicode 4.1 update
  1.1535 +
  1.1536 +*** related Jitterbugs
  1.1537 +
  1.1538 +4332 RFE: Update to Unicode 4.1
  1.1539 +4157 RBBI, TR29 4.1 updates
  1.1540 +
  1.1541 +*** data files & enums & parser code
  1.1542 +
  1.1543 +* file preparation
  1.1544 +- ucdstrip:
  1.1545 +    DerivedCoreProperties.txt
  1.1546 +    DerivedNormalizationProps.txt
  1.1547 +    NormalizationTest.txt
  1.1548 +    GraphemeBreakProperty.txt
  1.1549 +    SentenceBreakProperty.txt
  1.1550 +    WordBreakProperty.txt
  1.1551 +- ucdstrip and ucdmerge:
  1.1552 +    EastAsianWidth.txt
  1.1553 +    LineBreak.txt
  1.1554 +
  1.1555 +* add new files to the repository
  1.1556 +    GraphemeBreakProperty.txt
  1.1557 +    SentenceBreakProperty.txt
  1.1558 +    WordBreakProperty.txt
  1.1559 +
  1.1560 +* update FractionalUCA.txt and UCARules.txt with new canonical closure
  1.1561 +
  1.1562 +* genpname
  1.1563 +- handle new enumerated properties in sub read_uchar
  1.1564 +- run preparse.pl
  1.1565 +
  1.1566 +* uchar.h & uscript.h & uprops.h & uprops.c & genprops
  1.1567 +- new binary properties
  1.1568 +  + Pattern_Syntax
  1.1569 +  + Pattern_White_Space
  1.1570 +- new enumerated properties
  1.1571 +  + Grapheme_Cluster_Break
  1.1572 +  + Sentence_Break
  1.1573 +  + Word_Break
  1.1574 +- new block & script & line break values
  1.1575 +
  1.1576 +* gencase
  1.1577 +- case-ignorable changes
  1.1578 +  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  1.1579 +  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
  1.1580 +
  1.1581 +*** Unicode version numbers
  1.1582 +- makedata.mak
  1.1583 +- uchar.h
  1.1584 +- configure.in
  1.1585 +
  1.1586 +*** tests
  1.1587 +- verify that u_charMirror() round-trips
  1.1588 +- test all new properties and some new values of old properties
  1.1589 +
  1.1590 +*** other code
  1.1591 +
  1.1592 +* hardcoded Unihan range end/limit
  1.1593 +- Unihan range end moves from 9FA5 to 9FBB
  1.1594 +  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  1.1595 +  + do not modify BOCU/BOCSU code because that would change the encoding
  1.1596 +    and break binary compatibility!
  1.1597 +  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
  1.1598 +    NamePrepProfile.txt
  1.1599 +  + ignore trietest.c: test data is arbitrary
  1.1600 +  + ignore tstnorm.cpp: test optimization, not important
  1.1601 +  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  1.1602 +  + do change line_th.txt and word_th.txt
  1.1603 +    by replacing hardcoded ranges with the new property values
  1.1604 +  + do change gennames.c
  1.1605 +
  1.1606 +source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
  1.1607 +source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
  1.1608 +source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
  1.1609 +
  1.1610 +* case mappings
  1.1611 +- compare new special casing context conditions with previous ones
  1.1612 +  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  1.1613 +
  1.1614 +* genpname
  1.1615 +- consider storing only the short name if it is the same as the long name
  1.1616 +
  1.1617 +*** other reviews
  1.1618 +- UAX #29 changes (grapheme/word/sentence breaks)
  1.1619 +- UAX #14 changes (line breaks)
  1.1620 +- Pattern_Syntax & Pattern_White_Space
  1.1621 +
  1.1622 +---------------------------------------------------------------------------- ***
  1.1623 +
  1.1624 +Unicode 4.0.1 update
  1.1625 +
  1.1626 +*** related Jitterbugs
  1.1627 +
  1.1628 +3170 RFE: Update to Unicode 4.0.1
  1.1629 +3171 Add new Unicode 4.0.1 properties
  1.1630 +3520 use Unicode 4.0.1 updates for break iteration
  1.1631 +
  1.1632 +*** data files & enums & parser code
  1.1633 +
  1.1634 +* file preparation
  1.1635 +- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
  1.1636 +- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
  1.1637 +
  1.1638 +* file fixes
  1.1639 +- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  1.1640 +  according to PRI #26
  1.1641 +  http://www.unicode.org/review/resolved-pri.html#pri26
  1.1642 +- undone again because no corrigendum in sight;
  1.1643 +  instead modified tests to not check consistency on this for Unicode 4.0.1
  1.1644 +
  1.1645 +* ucdterms.txt
  1.1646 +- update from http://www.unicode.org/copyright.html
  1.1647 +  formatted for plain text
  1.1648 +
  1.1649 +* uchar.h & uprops.h & uprops.c & genprops
  1.1650 +- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
  1.1651 +- add U_LB_INSEPARABLE due to a spelling fix
  1.1652 +  + put short name comment only on line with new constant
  1.1653 +    for genpname perl script parser
  1.1654 +- new binary properties
  1.1655 +  + STerm
  1.1656 +  + Variation_Selector
  1.1657 +
  1.1658 +* genpname
  1.1659 +- fix genpname perl script so that it doesn't choke on more than 2 names per property value
  1.1660 +- perl script: correctly calculate the maximum number of fields per row
  1.1661 +
  1.1662 +* uscript.h
  1.1663 +- new script code Hrkt=Katakana_Or_Hiragana
  1.1664 +
  1.1665 +* gennorm.c track changes in DerivedNormalizationProps.txt
  1.1666 +- "FNC" -> "FC_NFKC"
  1.1667 +- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
  1.1668 +
  1.1669 +* genprops/props2.c track changes in DerivedNumericValues.txt
  1.1670 +- changed from 3 columns to 2, dropping the numeric type
  1.1671 +  + assume that the type is always numeric for Han characters,
  1.1672 +    and that only those are added in addition to what UnicodeData.txt lists
  1.1673 +
  1.1674 +*** Unicode version numbers
  1.1675 +- makedata.mak
  1.1676 +- uchar.h
  1.1677 +- configure.in
  1.1678 +
  1.1679 +*** tests
  1.1680 +- update test of default bidi classes according to PRI #28
  1.1681 +  /tsutil/cucdtst/TestUnicodeData
  1.1682 +  http://www.unicode.org/review/resolved-pri.html#pri28
  1.1683 +- bidi tests: change exemplar character for ES depending on Unicode version
  1.1684 +- change hardcoded expected property values where they change
  1.1685 +
  1.1686 +*** other code
  1.1687 +
  1.1688 +* name matching
  1.1689 +- read UCD.html
  1.1690 +
  1.1691 +* scripts
  1.1692 +- use new Hrkt=Katakana_Or_Hiragana
  1.1693 +
  1.1694 +* ZWJ & ZWNJ
  1.1695 +- are now part of combining character sequences
  1.1696 +- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
