|
1 * Copyright (C) 2004-2013, International Business Machines |
|
2 * Corporation and others. All Rights Reserved. |
|
3 * |
|
4 * file name: changes.txt |
|
5 * encoding: US-ASCII |
|
6 * tab size: 8 (not used) |
|
7 * indentation:4 |
|
8 * |
|
9 * created on: 2004may06 |
|
10 * created by: Markus W. Scherer |
|
11 * |
|
12 * change log for Unicode updates |
|
13 |
|
14 ---------------------------------------------------------------------------- *** |
|
15 |
|
16 Unicode 6.3 update |
|
17 |
|
18 http://www.unicode.org/review/pri249/ -- beta review |
|
19 http://www.unicode.org/reports/uax-proposed-updates.html |
|
20 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues |
|
21 http://www.unicode.org/reports/tr44/tr44-11.html |
|
22 |
|
23 *** ICU Trac |
|
24 |
|
25 - ticket 10128: update ICU to Unicode 6.3 beta |
|
26 - ticket 10168: update ICU to Unicode 6.3 final |
|
27 - C++ branches/markus/uni63 at r33552 from trunk at r33551 |
|
28 - Java branches/markus/uni63 at r33550 from trunk at r33553 |
|
29 |
|
30 - ticket 10142: implement Unicode 6.3 bidi algorithm additions |
|
31 |
|
32 *** Unicode version numbers |
|
33 - makedata.mak |
|
34 - uchar.h |
|
35 (configure.in & configure have been modified to extract the version from uchar.h) |
|
36 - com.ibm.icu.util.VersionInfo |
|
37 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_ |
|
38 |
|
39 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h |
|
40 so that the makefiles see the new version number. |
|
41 |
|
42 *** data files & enums & parser code |
|
43 |
|
44 * file preparation |
|
45 |
|
46 - download UCD, UCA & IDNA files |
|
47 - make sure that the Unicode data folder passed into preparseucd.py |
|
48 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder) |
|
49 - modify preparseucd.py: |
|
50 parse new file BidiBrackets.txt |
|
51 with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type |
|
52 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src |
|
53 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. |
|
54 - Check test file diffs for previously commented-out, known-failing data lines; |
|
55 probably need to keep those commented out. |
|
56 |
|
57 * PropertyAliases.txt changes |
|
58 - 1 new Enumerated Property |
|
59 bpt ; Bidi_Paired_Bracket_Type |
|
60 -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType |
|
61 -> ubidi_props.h & .c & UBiDiProps.java |
|
62 -> remember to write the max value at UBIDI_MAX_VALUES_INDEX |
|
63 -> uprops.cpp |
|
64 -> change ubidi.icu format version from 2.0 to 2.1 |
|
65 - 1 new Miscellaneous Property |
|
66 bpb ; Bidi_Paired_Bracket |
|
67 -> uchar.h & UProperty.java |
|
68 -> ppucd.h & .cpp |
|
69 |
|
70 * PropertyValueAliases.txt changes |
|
71 - 3 Bidi_Paired_Bracket_Type (bpt) values: |
|
72 bpt; c ; Close |
|
73 bpt; n ; None |
|
74 bpt; o ; Open |
|
75 -> uchar.h & UCharacter.BidiPairedBracketType |
|
76 -> ubidi_props.h & .c & UBiDiProps.java |
|
77 -> change ubidi.icu format version from 2.0 to 2.1 |
|
78 - 4 new Bidi_Class (bc) values: |
|
79 bc ; FSI ; First_Strong_Isolate |
|
80 bc ; LRI ; Left_To_Right_Isolate |
|
81 bc ; RLI ; Right_To_Left_Isolate |
|
82 bc ; PDI ; Pop_Directional_Isolate |
|
83 -> uchar.h & UCharacterEnums.ECharacterDirection |
|
84 -> until the bidi code gets updated, |
|
85 Roozbeh suggests mapping the new bc values to ON (Other_Neutral) |
|
86 - 3 new Word_Break (WB) values: |
|
87 WB ; HL ; Hebrew_Letter |
|
88 WB ; SQ ; Single_Quote |
|
89 WB ; DQ ; Double_Quote |
|
90 -> uchar.h & UCharacter.WordBreak |
|
91 -> first time Word_Break numeric constants exceed 4 bits (now 17 values) |
|
92 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html |
|
93 (added 2012-10-16) |
|
94 Aghb 239 Caucasian Albanian |
|
95 Mahj 314 Mahajani |
|
96 -> uscript.h |
|
97 -> com.ibm.icu.lang.UScript |
|
98 find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
|
99 replace public static final int \1 = \2;\3 |
|
100 -> preparseucd.py _scripts_only_in_iso15924 |
|
101 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() |
|
102 and in com.ibm.icu.dev.test.lang.TestUScript.java |
|
103 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata |
|
104 (not strictly necessary for NOT_ENCODED scripts) |
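The find/replace pair listed above for turning uscript.h enum lines into com.ibm.icu.lang.UScript constants can be tried outside an editor. The following is only a sketch in Python's regex dialect: the helper name and the sample input line (including its numeric value) are assumptions for illustration, not copied from uscript.h.

```python
import re

# Pattern and replacement copied from the find/replace step above.
USCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_CONSTANT = r"public static final int \1 = \2;\3"


def c_enum_to_java(line: str) -> str:
    """Rewrite one C enum line as a Java constant; other lines pass through unchanged."""
    return USCRIPT_ENUM.sub(JAVA_CONSTANT, line)


# Hypothetical input line; the value and comment are placeholders.
sample = "    USCRIPT_MAHAJANI = 160,/* Mahj */"
converted = c_enum_to_java(sample)
print(converted)
```

Note that the pattern requires at least one character after the comma, so an enum line without a trailing comment is left untouched.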
|
105 |
|
106 * generate normalization data files |
|
107 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib |
|
108 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in |
|
109 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata |
|
110 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt |
|
111 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt |
|
112 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt |
|
113 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt |
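The four gennorm2 invocations above differ only in the output file and the list of input files. A minimal sketch that builds the same argument lists from one table is shown below; the function name and the idea of generating the commands from Python are assumptions, and nothing here runs gennorm2.

```python
from pathlib import PurePosixPath

# Output file -> input files, as listed in the four gennorm2 commands above.
NORM2_TARGETS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(build_dir, src_dir):
    """Return one argv list per .nrm file; build_dir and src_dir are caller-supplied paths."""
    build = PurePosixPath(build_dir)
    data = PurePosixPath(src_dir) / "source" / "data"
    commands = []
    for output, inputs in NORM2_TARGETS.items():
        commands.append([
            str(build / "bin" / "gennorm2"),
            "-o", str(data / "in" / output),       # corresponds to $SRC_DATA_IN
            "-s", str(data / "unidata" / "norm2"),  # corresponds to $UNIDATA/norm2
            *inputs,
        ])
    return commands


for argv in gennorm2_commands("/path/to/dbg", "/path/to/src"):
    print(" ".join(argv))
```

The argv lists could be passed to subprocess.run() with LD_LIBRARY_PATH set as in the first step, but that part is left out of the sketch.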
|
114 |
|
115 * build ICU (make install) |
|
116 so that the tools build can pick up the new definitions from the installed header files. |
|
117 |
|
118 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt |
|
119 |
|
120 * build Unicode tools using CMake+make |
|
121 |
|
122 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt: |
|
123 |
|
124 # Location (--prefix) of where ICU was installed. |
|
125 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst) |
|
126 # Location of the ICU source tree. |
|
127 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src) |
|
128 |
|
129 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c |
|
130 ~/svn.icutools/trunk/dbg/unicode/c$ make |
|
131 |
|
132 * generate core properties data files |
|
133 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src |
|
134 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src |
|
135 - rebuild ICU (make install) & tools |
|
136 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm |
|
137 - rebuild ICU (make install) & tools |
|
138 |
|
139 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
|
140 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
|
141 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
|
142 - Unicode 6.0..6.3: U+2260, U+226E, U+226F |
|
143 - nothing new in 6.3, no test file to update |
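The grep step above can be narrowed to non-ASCII characters with a few lines of Python. This is a sketch under the assumption that IdnaMappingTable.txt uses semicolon-separated fields, "#" comments and ".." code point ranges; the function name is made up, and the sample lines are illustrative rather than copied from the real file.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) code point ranges above U+007F with status disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            found.append((max(first, 0x80), last))
    return found


# Illustrative lines in the assumed layout.
sample = [
    "0000..002C    ; disallowed_STD3_valid          # <control-0000>..COMMA",
    "0041          ; mapped        ; 0061           # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid          # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid          # NOT LESS-THAN..NOT GREATER-THAN",
]
for first, last in non_ascii_std3_valid(sample):
    print("U+%04X..U+%04X" % (first, last))
```

Any range in the output that is not already covered by U+2260, U+226E and U+226F would be a candidate for new test cases.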
|
144 |
|
145 * update Java data files |
|
146 - refresh just the UCD-related files, just to be safe |
|
147 - see (ICU4C)/source/data/icu4j-readme.txt |
|
148 - mkdir /tmp/icu4j |
|
149 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
150 output: |
|
151 ... |
|
152 Unicode .icu files built to ./out/build/icudt52l |
|
153 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b |
|
154 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b |
|
155 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt |
|
156 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b |
|
157 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b" |
|
158 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/ |
|
159 mkdir -p /tmp/icu4j/main/shared/data |
|
160 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
|
161 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/ |
|
162 mkdir -p /tmp/icu4j/main/shared/data |
|
163 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data |
|
164 make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data' |
|
165 - copy the big-endian Unicode data files to another location, |
|
166 separate from the other data files |
|
167 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll |
|
168 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr |
|
169 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b |
|
170 ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu |
|
171 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b |
|
172 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll |
|
173 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr |
|
174 - refresh ICU4J |
|
175 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b |
|
176 |
|
177 * refresh Java test .txt files |
|
178 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
|
179 |
|
180 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files |
|
181 |
|
182 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/ |
|
183 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that |
|
184 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
|
185 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
|
186 (note removing the underscore before "Rules") |
|
187 - update (ICU4C)/source/test/testdata/CollationTest_*.txt |
|
188 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
|
189 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) |
|
190 - check test file diffs for previously commented-out, known-failing data lines; |
|
191 probably need to keep those commented out |
|
192 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani |
|
193 - run genuca, see command line above |
|
194 - rebuild ICU4C |
|
195 - refresh ICU4J collation data: |
|
196 (subset of instructions above for properties data refresh, except copies all coll/*) |
|
197 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
198 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll |
|
199 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll |
|
200 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b |
|
201 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) |
|
202 - note on intltest: if collate/UCAConformanceTest fails, then |
|
203 utility/MultithreadTest/TestCollators will fail as well; |
|
204 fix the conformance test before looking into the multi-thread test |
|
205 |
|
206 * test ICU, fix test code where necessary |
|
207 |
|
208 * When refreshing all of ICU4J data from ICU4C |
|
209 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
210 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
|
211 or |
|
212 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
|
213 |
|
214 *** LayoutEngine script information |
|
215 - skipped for Unicode 6.3: no new scripts |
|
216 |
|
217 *** merge the Unicode update branches back onto the trunk |
|
218 - do not merge the icudata.jar and testdata.jar, |
|
219 instead rebuild them from merged & tested ICU4C |
|
220 |
|
221 ---------------------------------------------------------------------------- *** |
|
222 |
|
223 Unicode 6.2 update |
|
224 |
|
225 http://www.unicode.org/review/pri230/ |
|
226 http://www.unicode.org/versions/beta-6.2.0.html |
|
227 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0 |
|
228 http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values |
|
229 http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol |
|
230 http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols |
|
231 http://www.unicode.org/reports/tr46/tr46-8.html IDNA |
|
232 http://unicode.org/Public/idna/6.2.0/ |
|
233 |
|
234 *** ICU Trac |
|
235 |
|
236 - ticket 9515: Unicode 6.2: final ICU update |
|
237 |
|
238 - ticket 9514: UCA 6.2: fix UCARules.txt |
|
239 |
|
240 - ticket 9437: update ICU to Unicode 6.2 |
|
241 - C++ branches/markus/uni62 at r32050 from trunk at r32041 |
|
242 - Java branches/markus/uni62 at r32068 from trunk at r32066 |
|
243 |
|
244 *** Unicode version numbers |
|
245 - makedata.mak |
|
246 - uchar.h |
|
247 (configure.in & configure have been modified to extract the version from uchar.h) |
|
248 - com.ibm.icu.util.VersionInfo |
|
249 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_ |
|
250 |
|
251 *** data files & enums & parser code |
|
252 |
|
253 * file preparation |
|
254 |
|
255 - download UCD, UCA & IDNA files |
|
256 - make sure that the Unicode data folder passed into preparseucd.py |
|
257 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder) |
|
258 - modify preparseucd.py: NamesList.txt is now in UTF-8 |
|
259 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src |
|
260 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. |
|
261 - Check test file diffs for previously commented-out, known-failing data lines; |
|
262 probably need to keep those commented out. |
|
263 |
|
264 * PropertyValueAliases.txt changes |
|
265 - 1 new Line_Break (lb) value: |
|
266 lb ; RI ; Regional_Indicator |
|
267 -> uchar.h & UCharacter.LineBreak |
|
268 - 1 new Word_Break (WB) value: |
|
269 WB ; RI ; Regional_Indicator |
|
270 -> uchar.h & UCharacter.WordBreak |
|
271 - 1 new Grapheme_Cluster_Break (GCB) value: |
|
272 GCB; RI ; Regional_Indicator |
|
273 -> uchar.h & UCharacter.GraphemeClusterBreak |
|
274 |
|
275 * 3 new numeric values |
|
276 The new value -1 was really meant to be NaN, but that would have required |
|
277 new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1, |
|
278 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed. |
|
279 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1 |
|
280 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1 |
|
281 The two new values 216000 and 432000 require an addition to the encoding of numeric values. |
|
282 cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000 |
|
283 cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000 |
|
284 -> uprops.h, uchar.c & UCharacterProperty.java |
|
285 -> cucdtst.c & UCharacterTest.java |
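To make the arithmetic behind the three new numeric values concrete, the sketch below parses the "cp;..." lines quoted above, shows -1 as the fraction -1/1, and factors the two large values into a small multiplier times a power of 60. It is not the encoding that uprops.h actually uses; both helper names are hypothetical.

```python
from fractions import Fraction


def parse_cp_line(line):
    """Split a 'cp;XXXX;key=value;...' line into (code point, properties dict)."""
    fields = line.split(";")
    if fields[0] != "cp":
        raise ValueError("not a cp line: " + line)
    props = {}
    for field in fields[2:]:
        key, _, value = field.partition("=")
        props[key] = value
    return int(fields[1], 16), props


def base60_parts(value):
    """Factor a positive integer as multiplier * 60**exponent with the largest exponent."""
    if value <= 0:
        raise ValueError("expected a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent


cp, props = parse_cp_line("cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1")
nv = Fraction(props["nv"])
print(hex(cp), nv.numerator, nv.denominator)
print(base60_parts(216000), base60_parts(432000))
```

Both large values turn out to be small multiples of 60 cubed, which is one way to see why they do not fit an encoding designed for small integers and simple fractions.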
|
286 |
|
287 * generate normalization data files |
|
288 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib |
|
289 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in |
|
290 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata |
|
291 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt |
|
292 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt |
|
293 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt |
|
294 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt |
|
295 |
|
296 * build ICU (make install) |
|
297 so that the tools build can pick up the new definitions from the installed header files. |
|
298 * build Unicode tools using CMake+make |
|
299 |
|
300 * generate core properties data files |
|
301 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src |
|
302 - in initial bootstrapping, change the UCA version |
|
303 in source/data/unidata/FractionalUCA.txt to match the new Unicode version |
|
304 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src |
|
305 - rebuild ICU (make install) & tools |
|
306 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR, |
|
307 check if the UCA version in FractionalUCA.txt matches the new Unicode version |
|
308 (see step above) |
|
309 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm |
|
310 - rebuild ICU (make install) & tools |
|
311 |
|
312 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
|
313 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
|
314 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
|
315 - Unicode 6.0..6.2: U+2260, U+226E, U+226F |
|
316 - nothing new in 6.2, no test file to update |
|
317 |
|
318 * update Java data files |
|
319 - refresh just the UCD-related files, just to be safe |
|
320 - see (ICU4C)/source/data/icu4j-readme.txt |
|
321 - mkdir /tmp/icu4j |
|
322 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
323 output: |
|
324 ... |
|
325 Unicode .icu files built to ./out/build/icudt50l |
|
326 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b |
|
327 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b |
|
328 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt |
|
329 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b |
|
330 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b" |
|
331 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/ |
|
332 mkdir -p /tmp/icu4j/main/shared/data |
|
333 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
|
334 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/ |
|
335 mkdir -p /tmp/icu4j/main/shared/data |
|
336 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data |
|
337 make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data' |
|
338 - copy the big-endian Unicode data files to another location, |
|
339 separate from the other data files |
|
340 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll |
|
341 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr |
|
342 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b |
|
343 ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu |
|
344 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b |
|
345 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll |
|
346 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr |
|
347 - refresh ICU4J |
|
348 ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b |
|
349 |
|
350 * refresh Java test .txt files |
|
351 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
|
352 |
|
353 * UCA |
|
354 |
|
355 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/ |
|
356 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that |
|
357 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
|
358 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
|
359 (note removing the underscore before "Rules") |
|
360 - update (ICU4C)/source/test/testdata/CollationTest_*.txt |
|
361 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
|
362 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) |
|
363 - check test file diffs for previously commented-out, known-failing data lines; |
|
364 probably need to keep those commented out |
|
365 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani |
|
366 - run genuca, see command line above |
|
367 - rebuild ICU4C |
|
368 - refresh ICU4J collation data: |
|
369 (subset of instructions above for properties data refresh, except copies all coll/*) |
|
370 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
371 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll |
|
372 ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll |
|
373 ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b |
|
374 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) |
|
375 - note on intltest: if collate/UCAConformanceTest fails, then |
|
376 utility/MultithreadTest/TestCollators will fail as well; |
|
377 fix the conformance test before looking into the multi-thread test |
|
378 |
|
379 * test ICU, fix test code where necessary |
|
380 |
|
381 * When refreshing all of ICU4J data from ICU4C |
|
382 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
383 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
|
384 or |
|
385 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
|
386 |
|
387 *** LayoutEngine script information |
|
388 - skipped for Unicode 6.2: no new scripts |
|
389 |
|
390 *** merge the Unicode update branches back onto the trunk |
|
391 - do not merge the icudata.jar and testdata.jar, |
|
392 instead rebuild them from merged & tested ICU4C |
|
393 |
|
394 ---------------------------------------------------------------------------- *** |
|
395 |
|
396 Future Unicode update |
|
397 |
|
398 Tools simplified since the Unicode 6.1 update. See |
|
399 - http://site.icu-project.org/design/props/ppucd |
|
400 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972 |
|
401 |
|
402 * Unicode version numbers |
|
403 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates |
|
404 |
|
405 * file preparation |
|
406 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py: |
|
407 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src |
|
408 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. |
|
409 - Check test file diffs for previously commented-out, known-failing data lines; |
|
410 probably need to keep those commented out. |
|
411 |
|
412 * PropertyValueAliases.txt changes |
|
413 - Script codes that are in ISO 15924 but not in Unicode are now listed in |
|
414 preparseucd.py, in the _scripts_only_in_iso15924 variable. |
|
415 If there are new ISO codes, then add them. |
|
416 If Unicode adds some of them, then remove them from the .py variable. |
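The two maintenance rules above amount to a set union followed by a set difference. A minimal sketch follows; the function and its arguments are hypothetical, and the actual type and contents of _scripts_only_in_iso15924 in preparseucd.py are not shown in this log.

```python
def update_iso_only_scripts(current, new_iso_codes, now_in_unicode):
    """Add newly registered ISO 15924 codes, then drop the codes Unicode has adopted."""
    return sorted((set(current) | set(new_iso_codes)) - set(now_in_unicode))


# Hypothetical before/after state, using script codes mentioned in this log.
current = ["Aghb", "Cakm"]
updated = update_iso_only_scripts(current, new_iso_codes=["Mahj"], now_in_unicode=["Cakm"])
print(updated)
```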
|
417 |
|
418 * UnicodeData.txt changes |
|
419 - No more manual changes for CJK ranges for algorithmic names; |
|
420 those are now written to ppucd.txt and genprops reads them from there. |
|
421 |
|
422 * generate core properties data files (makeprops.sh was deleted) |
|
423 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src |
|
424 |
|
425 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt |
|
426 - it is now generated by preparseucd.py |
|
427 |
|
428 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt |
|
429 - it is now generated by preparseucd.py |
|
430 - make sure that the Unicode data folder passed into preparseucd.py |
|
431 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt |
|
432 (can be in some subfolder) |
|
433 |
|
434 * generate normalization data files |
|
435 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib |
|
436 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in |
|
437 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata |
|
438 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt |
|
439 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt |
|
440 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt |
|
441 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt |
|
442 |
|
443 * build ICU (make install) |
|
444 * build Unicode tools using CMake+make |
|
445 |
|
446 * new way to call genuca (makeuca.sh was deleted) |
|
447 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src |
|
448 |
|
449 ---------------------------------------------------------------------------- *** |
|
450 |
|
451 Unicode 6.1 update |
|
452 |
|
453 *** ICU Trac |
|
454 |
|
455 - ticket 8995 final update to Unicode 6.1 |
|
456 - ticket 8994 regenerate source/layout/CanonData.cpp |
|
457 |
|
458 - ticket 8961 support Unicode "Age" value *names* |
|
459 - ticket 8963 support multiple character name aliases & types |
|
460 |
|
461 - ticket 8827 "update ICU to Unicode 6.1" |
|
462 - C++ branches/markus/uni61 at r30864 from trunk at r30843 |
|
463 - Java branches/markus/uni61 at r30865 from trunk at r30863 |
|
464 |
|
465 *** Unicode version numbers |
|
466 - makedata.mak |
|
467 - uchar.h |
|
468 (configure.in & configure have been modified to extract the version from uchar.h) |
|
469 - com.ibm.icu.util.VersionInfo |
|
470 - icutools/unicode/makedefs.sh |
|
471 + also review & update other definitions in that file, |
|
472 e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l |
|
473 |
|
474 *** data files & enums & parser code |
|
475 |
|
476 * file preparation |
|
477 |
|
478 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed |
|
479 - This prepares both unidata and testdata files in respective output subfolders. |
|
480 - Check test file diffs for previously commented-out, known-failing data lines; |
|
481 probably need to keep those commented out. |
|
482 |
|
483 * PropertyValueAliases.txt changes |
|
484 - 11 new block names: |
|
485 Arabic_Extended_A |
|
486 Arabic_Mathematical_Alphabetic_Symbols |
|
487 Chakma |
|
488 Meetei_Mayek_Extensions |
|
489 Meroitic_Cursive |
|
490 Meroitic_Hieroglyphs |
|
491 Miao |
|
492 Sharada |
|
493 Sora_Sompeng |
|
494 Sundanese_Supplement |
|
495 Takri |
|
496 -> add to uchar.h |
|
497 -> add to UCharacter.UnicodeBlock IDs |
|
498 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+) |
|
499 replace public static final int \1_ID = \2; \3 |
|
500 -> add to UCharacter.UnicodeBlock objects |
|
501 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) |
|
502 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
|
503 - 1 new Joining_Group (jg) value: |
|
504 Rohingya_Yeh |
|
505 -> uchar.h & UCharacter.JoiningGroup |
|
506 - 2 new Line_Break (lb) values: |
|
507 CJ=Conditional_Japanese_Starter |
|
508 HL=Hebrew_Letter |
|
509 -> uchar.h & UCharacter.LineBreak |
|
510 - 7 new scripts: |
|
511 sc ; Cakm ; Chakma |
|
512 sc ; Merc ; Meroitic_Cursive |
|
513 sc ; Mero ; Meroitic_Hieroglyphs |
|
514 sc ; Plrd ; Miao |
|
515 sc ; Shrd ; Sharada |
|
516 sc ; Sora ; Sora_Sompeng |
|
517 sc ; Takr ; Takri |
|
518 -> remove these from SyntheticPropertyValueAliases.txt |
|
519 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() |
|
520 and in com.ibm.icu.dev.test.lang.TestUScript.java |
|
521 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html |
|
522 (added 2011-06-21) |
|
523 Khoj 322 Khojki |
|
524 Tirh 326 Tirhuta |
|
525 and another one added 2011-12-09 |
|
526 Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) |
|
527 -> uscript.h |
|
528 -> com.ibm.icu.lang.UScript |
|
529 find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
|
530 replace public static final int \1 = \2;\3 |
|
531 -> SyntheticPropertyValueAliases.txt |
|
532 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() |
|
533 and in com.ibm.icu.dev.test.lang.TestUScript.java |
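The two Eclipse find/replace pairs given earlier in this list for UBLOCK_ constants can be checked the same way as the USCRIPT_ one. The sketch below uses Python's regex dialect, so the group reference before "_ID" is written as \g<1> to keep it unambiguous. The sample line and its number are placeholders, not taken from uchar.h.

```python
import re

# First pair: UnicodeBlock integer IDs.
BLOCK_ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
BLOCK_ID_REPLACE = r"public static final int \g<1>_ID = \2; \3"

# Second pair: UnicodeBlock objects.
BLOCK_OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
BLOCK_OBJ_REPLACE = r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \2'

# Hypothetical input line; the value and comment are placeholders.
sample = "    UBLOCK_TAKRI = 220, /*[11680]*/"
id_line = BLOCK_ID_FIND.sub(BLOCK_ID_REPLACE, sample)
obj_line = BLOCK_OBJ_FIND.sub(BLOCK_OBJ_REPLACE, sample)
print(id_line)
print(obj_line)
```

Both patterns expect a comment starting with "/" after the comma and a single space, so lines formatted differently need manual attention.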
|
534 |
|
535 * UnicodeData.txt changes |
|
536 - the last Unihan code point changes from U+9FCB to U+9FCC |
|
537 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive) |
|
538 + do change gennames.c |
|
539 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java |
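The case-insensitive search for 9FC[BC] described above can be sketched as a small line filter. The function name and the sample source lines are hypothetical; in practice the lines would come from the C++ and Java files being reviewed.

```python
import re

# Regex from the step above: matches both the old end (9FCB) and the old limit (9FCC).
CJK_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)


def lines_to_review(lines):
    """Return (line number, text) pairs that mention 9FCB or 9FCC in any letter case."""
    return [(number, text)
            for number, text in enumerate(lines, start=1)
            if CJK_END_OR_LIMIT.search(text)]


# Hypothetical source lines.
sample = [
    "if (cp >= 0x4e00 && cp <= 0x9fcb) {",
    "static final int CJK_LIMIT = 0x9FCC;",
    "unrelated = 0x9FCA;",
]
hits = lines_to_review(sample)
for number, text in hits:
    print(number, text)
```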
|
540 |
|
541 * DerivedBidiClass.txt changes |
|
542 - 2 new default-AL blocks: |
|
543 # Arabic Extended-A: U+08A0 - U+08FF (was default-R) |
|
544 # Arabic Mathematical Alphabetic Symbols: |
|
545 # U+1EE00 - U+1EEFF (was default-R) |
|
546 - 2 new default-R blocks: |
|
547 # Meroitic Hieroglyphs: |
|
548 # U+10980 - U+1099F |
|
549 # Meroitic Cursive: U+109A0 - U+109FF |
|
550 -> should be picked up by the explicit data in the file |
|
551 |
|
552 * NameAliases.txt changes |
|
553 - from |
|
554 # Each line has two fields |
|
555 # First field: Code point |
|
556 # Second field: Alias |
|
557 - to |
|
558 # Each line has three fields, as described here: |
|
559 # |
|
560 # First field: Code point |
|
561 # Second field: Alias |
|
562 # Third field: Type |
|
563 - Also, the file previously allowed multiple aliases per code point, but only now |
|
564 does it actually provide several, even several of the same type. For example, |
|
565 FEFF;BYTE ORDER MARK;alternate |
|
566 FEFF;BOM;abbreviation |
|
567 FEFF;ZWNBSP;abbreviation |
|
568 - This breaks our gennames parser, unames.icu data structure, and API. |
|
569 Fix gennames to only pick up "correction" aliases. |
|
570 New ticket #8963 for further changes. |
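The interim fix described above, picking up only "correction" aliases from the three-field format, could look roughly like the following. This is a Python sketch of the idea, not the gennames code; the function name is made up, and the 01A2 sample line is added here for illustration next to the FEFF lines quoted above.

```python
def correction_aliases(lines):
    """Map code point -> list of aliases whose type field is 'correction'."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # skip comments and blank lines
        if not data:
            continue
        fields = data.split(";")
        if len(fields) != 3:
            raise ValueError("expected three fields: " + line)
        code, alias, alias_type = fields
        if alias_type == "correction":
            result.setdefault(int(code, 16), []).append(alias)
    return result


sample = [
    "# First field: Code point",
    "01A2;LATIN CAPITAL LETTER GHA;correction",  # illustrative correction entry
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
aliases = correction_aliases(sample)
print(aliases)
```

Keeping a list per code point leaves room for the multiple-alias support tracked in ticket #8963.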
|
571 |
|
572 * run genpname/preparse.pl (on Linux) |
|
573 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname |
|
574 + make sure that data.h is writable |
|
575 + perl preparse.pl ~/svn.icu/trunk/src > out.txt |
|
576 + preparse.pl shows no errors, out.txt Info and Warning lines look ok |
|
577 |
|
578 * build ICU (make install) |
|
579 so that the tools build can pick up the new definitions from the installed header files. |
|
580 * build Unicode tools (at least genpname) using CMake+make |
|
581 |
|
582 * run genpname |
|
583 (builds both pnames.icu and propname_data.h) |
|
584 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in |
|
585 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource |
|
586 |
|
587 * build ICU (make install) |
|
588 * build Unicode tools using CMake+make |
|
589 |
|
590 * update source/data/unidata/norm2/nfkc_cf.txt |
|
591 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt |
|
592 |
|
593 * update source/data/unidata/norm2/uts46.txt |
|
594 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt |
|
595 to ~/svn.icu/tools/trunk/src/unicode/py |
|
596 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008". |
|
597 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py |
|
598 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2 |
|
599 |
|
600 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
|
601 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
|
602 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
|
603 - Unicode 6.0..6.1: U+2260, U+226E, U+226F |
|
604 - nothing new in 6.1, no test file to update |
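The grep step above could be scripted roughly as follows. This is a sketch that assumes the `code point(s) ; status ; ... # comment` line layout of IdnaMappingTable.txt; `non_ascii_std3_valid` is a hypothetical helper, and the sample lines are made up to illustrate the format rather than copied from the real file.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges marked disallowed_STD3_valid above U+007F."""
    for line in lines:
        data = line.split("#", 1)[0]  # strip trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        start = int(start, 16)
        end = int(end, 16) if end else start
        if end > 0x7F:
            # Clip ranges that begin in ASCII so only the non-ASCII part is reported.
            yield max(start, 0x80), end


sample = [
    "0000..002C    ; disallowed_STD3_valid   # illustrative ASCII range",
    "00C0          ; mapped ; 00E0           # illustrative mapped entry",
    "2260          ; disallowed_STD3_valid   # illustrative non-ASCII entry",
]
for start, end in non_ascii_std3_valid(sample):
    print("U+%04X..U+%04X" % (start, end))
```

Any code point reported here that is not already covered by uts46test.cpp and UTS46Test.java is a candidate for a new test case.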
|
605 |
|
606 * generate core properties data files |
|
607 - in initial bootstrapping, change the UCA version |
|
608 in source/data/unidata/FractionalUCA.txt to match the new Unicode version |
|
609 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
|
610 - rebuild ICU & tools |
|
611 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR, |
|
612 check if the UCA version in FractionalUCA.txt matches the new Unicode version |
|
613 (see step above) |
|
614 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm: |
|
615 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
|
616 - rebuild ICU & tools |
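The version check mentioned above can be done before running genrb. The sketch below assumes that FractionalUCA.txt carries its version in a line of the form `[UCA version = x.y.z]`; that pattern is an assumption and should be checked against the actual file. Both function names are hypothetical.

```python
import re


def uca_version(text):
    """Return the version string from an assumed '[UCA version = x.y.z]' line."""
    match = re.search(r"\[UCA version\s*=\s*([0-9.]+)\]", text)
    return match.group(1) if match else None


def versions_match(fractional_uca_text, unicode_version):
    """Compare major.minor of the UCA version with the Unicode version."""
    found = uca_version(fractional_uca_text)
    if found is None:
        return False
    return found.split(".")[:2] == unicode_version.split(".")[:2]


print(versions_match("[UCA version = 6.1.0]\n", "6.1"))
```

Only major and minor fields are compared here, on the assumption that an update-level difference does not matter for the bootstrapping step.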
|
617 |
|
618 * update Java data files |
|
619 - refresh just the UCD-related files, just to be safe |
|
620 - see (ICU4C)/source/data/icu4j-readme.txt |
|
621 - mkdir /tmp/icu4j |
|
622 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
623 output: |
|
624 ... |
|
625 Unicode .icu files built to ./out/build/icudt49l |
|
626 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b |
|
627 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b |
|
628 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt |
|
629 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b |
|
630 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b" |
|
631 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/ |
|
632 mkdir -p /tmp/icu4j/main/shared/data |
|
633 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
|
634 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/ |
|
635 mkdir -p /tmp/icu4j/main/shared/data |
|
636 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data |
|
637 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data' |
|
638 - copy the big-endian Unicode data files to another location, |
|
639 separate from the other data files |
|
640 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll |
|
641 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr |
|
642 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b |
|
643 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu |
|
644 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b |
|
645 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll |
|
646 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr |
|
647 - refresh ICU4J |
|
648 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b |
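The selection made by the cp/rm commands above (top-level *.icu except cnvalias.icu, top-level *.nrm, coll/*.icu, everything in brkitr/) can be sketched in Python. The function names are hypothetical, and the file names in the demo are placeholders created in a temporary directory, not a listing of a real build.

```python
import shutil
import tempfile
from pathlib import Path


def select_unicode_files(src):
    """Relative paths of the Unicode data files picked by the steps above."""
    src = Path(src)
    picked = [p for p in src.glob("*.icu") if p.name != "cnvalias.icu"]
    picked += src.glob("*.nrm")
    picked += src.glob("coll/*.icu")
    picked += [p for p in src.glob("brkitr/*") if p.is_file()]
    return sorted(p.relative_to(src).as_posix() for p in picked)


def copy_unicode_files(src, dst):
    """Copy the selected files into dst, creating subfolders as needed."""
    for rel in select_unicode_files(src):
        target = Path(dst) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(src) / rel, target)


# Demo with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    for name in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "root.res",
                 "coll/ucadata.icu", "coll/root.res", "brkitr/char.brk"]:
        f = src / name
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_bytes(b"")
    copy_unicode_files(src, dst)
    copied = sorted(p.relative_to(dst).as_posix()
                    for p in dst.rglob("*") if p.is_file())
print(copied)
```

The result would still have to be added to icudata.jar with the `jar uf` command shown above.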
|
649 |
|
650 * refresh Java test .txt files |
|
651 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
|
652 |
|
653 * test ICU so far, fix test code where necessary |
|
654 - temporarily ignore collation issues that look like UCA/UCD mismatches, |
|
655 until UCA data is updated |
|
656 |
|
657 * UCA |
|
658 |
|
659 - get output from Mark's tools; look in |
|
660 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt |
|
661 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
|
662 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
|
663 (note: remove the underscore before "Rules") |
|
664 - update (ICU)/source/test/testdata/CollationTest_*.txt |
|
665 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
|
666 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) |
|
667 - check test file diffs for previously commented-out, known-failing data lines; |
|
668 probably need to keep those commented out |
|
669 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani |
|
670 - run makeuca.sh: |
|
671 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
|
672 - rebuild ICU4C |
|
673 - refresh ICU4J collation data: |
|
674 (subset of instructions above for properties data refresh, except copies all coll/*) |
|
675 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
676 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll |
|
677 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll |
|
678 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b |
|
679 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) |
|
680 - note on intltest: if collate/UCAConformanceTest fails, then |
|
681 utility/MultithreadTest/TestCollators will fail as well; |
|
682 fix the conformance test before looking into the multi-thread test |
|
683 |
|
684 * When refreshing all of ICU4J data from ICU4C |
|
685 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
686 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
|
687 or |
|
688 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
|
689 |
|
690 *** LayoutEngine script information |
|
691 |
|
692 (For details see the Unicode 5.2 change log below.) |
|
693 |
|
694 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder. |
|
695 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp |
|
696 in the working directory. |
|
697 (It also generates ScriptRunData.cpp, which is no longer needed.) |
|
698 |
|
699 The generated files have a current copyright date and "@draft" statement. |
|
700 |
|
701 - diff current <icu>/source/layout files vs. generated ones |
|
702 ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout |
|
703 review and manually merge desired changes; |
|
704 fix gratuitous changes, incorrect @draft and missing aliases; |
|
705 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc. |
|
706 - if you just copy the above files, then |
|
707 fix mixed line endings, review the diffs as above and restore changes to API tags etc.; |
|
708 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
|
709 |
|
710 *** merge the Unicode update branches back onto the trunk |
|
711 - do not merge the icudata.jar and testdata.jar, |
|
712 instead rebuild them from merged & tested ICU4C |
|
713 |
|
714 ---------------------------------------------------------------------------- *** |
|
715 |
|
716 ICU 4.8 (no Unicode update, just new script codes) |
|
717 |
|
718 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html |
|
719 (added 2010-12-21) |
|
720 Afak 439 Afaka |
|
721 Jurc 510 Jurchen |
|
722 Mroo 199 Mro, Mru |
|
723 Nshu 499 Nüshu |
|
724 Shrd 319 Sharada, Śāradā |
|
725 Sora 398 Sora Sompeng |
|
726 Takr 321 Takri, Ṭākrī, Ṭāṅkrī |
|
727 Tang 520 Tangut |
|
728 Wole 480 Woleai |
|
729 -> uscript.h |
|
730 -> com.ibm.icu.lang.UScript |
|
731 find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
|
732 replace public static final int \1 = \2;\3 |
|
733 -> genpname/SyntheticPropertyValueAliases.txt |
|
734 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() |
|
735 and in com.ibm.icu.dev.test.lang.TestUScript.java |
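The find/replace pair above, written for an editor, behaves the same way with Python's `re` module. The sketch below applies it to one line; the input line and its numeric value are made up for illustration and are not quoted from uscript.h.

```python
import re

# Same pattern and replacement as the editor find/replace noted above.
USCRIPT_LINE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_TEMPLATE = r"public static final int \1 = \2;\3"


def to_java(line):
    """Turn a C enum line from uscript.h into a Java constant for UScript."""
    return USCRIPT_LINE.sub(JAVA_TEMPLATE, line)


c_line = "      USCRIPT_AFAKA = 147,/* Afak */"  # illustrative input
print(to_java(c_line))
```

Leading whitespace and the trailing comment pass through unchanged, so the converted lines can be pasted into the Java source and then reviewed.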
|
736 |
|
737 * run genpname/preparse.pl (on Linux) |
|
738 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname |
|
739 + make sure that data.h is writable |
|
740 + perl preparse.pl ~/svn.icu/trunk/src > out.txt |
|
741 + preparse.pl shows no errors, out.txt Info and Warning lines look ok |
|
742 |
|
743 * rebuild Unicode tools (at least genpname) using make |
|
744 - You might first need to "make install" ICU so that the tools build can pick |
|
745 up the new definitions from the installed header files. |
|
746 |
|
747 * run genpname |
|
748 (builds both pnames.icu and propname_data.h) |
|
749 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in |
|
750 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource |
|
751 - rebuild ICU & tools |
|
752 |
|
753 * run genprops |
|
754 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0 |
|
755 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0 |
|
756 - rebuild ICU & tools |
|
757 |
|
758 * update Java data files |
|
759 - refresh just the UCD-related files, just to be safe |
|
760 - see (ICU4C)/source/data/icu4j-readme.txt |
|
761 - mkdir /tmp/icu4j |
|
762 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
763 - copy the big-endian Unicode data files to another location, |
|
764 separate from the other data files |
|
765 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b |
|
766 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b |
|
767 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b |
|
768 - refresh ICU4J |
|
769 ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b |
|
770 |
|
771 * should have updated the layout engine script codes but forgot |
|
772 |
|
773 ---------------------------------------------------------------------------- *** |
|
774 |
|
775 Unicode 6.0 update |
|
776 |
|
777 *** related ICU Trac tickets |
|
778 |
|
779 7264 Unicode 6.0 Update |
|
780 |
|
781 *** Unicode version numbers |
|
782 - makedata.mak |
|
783 - uchar.h |
|
784 (configure.in & configure: have been modified to extract the version from uchar.h) |
|
785 - com.ibm.icu.util.VersionInfo |
|
786 |
|
787 *** data files & enums & parser code |
|
788 |
|
789 * file preparation |
|
790 |
|
791 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed |
|
792 - This now prepares both unidata and testdata files in respective output subfolders. |
|
793 |
|
794 * PropertyAliases.txt changes |
|
795 - new Script_Extensions property defined in the new ScriptExtensions.txt file |
|
796 but not listed in PropertyAliases.txt; reported to unicode.org; |
|
797 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt |
|
798 scx; Script_Extensions |
|
799 -> uchar.h with new UProperty section |
|
800 -> com.ibm.icu.lang.UProperty, parallel with uchar.h |
|
801 |
|
802 * PropertyValueAliases.txt changes |
|
803 - 12 new block names: |
|
804 Alchemical_Symbols |
|
805 Bamum_Supplement |
|
806 Batak |
|
807 Brahmi |
|
808 CJK_Unified_Ideographs_Extension_D |
|
809 Emoticons |
|
810 Ethiopic_Extended_A |
|
811 Kana_Supplement |
|
812 Mandaic |
|
813 Miscellaneous_Symbols_And_Pictographs |
|
814 Playing_Cards |
|
815 Transport_And_Map_Symbols |
|
816 -> add to uchar.h |
|
817 -> add to UCharacter.UnicodeBlock |
|
818 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) |
|
819 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
|
820 - Joining_Group (jg) values: |
|
821 Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias |
|
822 -> uchar.h & UCharacter.JoiningGroup |
|
823 - 3 new scripts: |
|
824 sc ; Batk ; Batak |
|
825 sc ; Brah ; Brahmi |
|
826 sc ; Mand ; Mandaic |
|
827 -> remove these from SyntheticPropertyValueAliases.txt |
|
828 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN |
|
829 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() |
|
830 and in com.ibm.icu.dev.test.lang.TestUScript.java |
|
831 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html |
|
832 (added 2009-11-11..2010-07-18) |
|
833 Bass 259 Bassa Vah |
|
834 Dupl 755 Duployan shorthand |
|
835 Elba 226 Elbasan |
|
836 Gran 343 Grantha |
|
837 Kpel 436 Kpelle |
|
838 Loma 437 Loma |
|
839 Mend 438 Mende |
|
840 Merc 101 Meroitic Cursive |
|
841 Narb 106 Old North Arabian |
|
842 Nbat 159 Nabataean |
|
843 Palm 126 Palmyrene |
|
844 Sind 318 Sindhi |
|
845 Wara 262 Warang Citi |
|
846 -> uscript.h |
|
847 -> com.ibm.icu.lang.UScript |
|
848 find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
|
849 replace public static final int \1 = \2;\3 |
|
850 -> SyntheticPropertyValueAliases.txt |
|
851 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() |
|
852 and in com.ibm.icu.dev.test.lang.TestUScript.java |
|
853 - ISO 15924 name change |
|
854 Mero 100 Meroitic Hieroglyphs (was Meroitic) |
|
855 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC |
|
856 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt |
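For the UCharacter.UnicodeBlock step in the list above, the Eclipse find/replace can be reproduced with Python's `re` module. This is only a sketch: the input line and its value are illustrative, and the matching `_ID` constants still have to be added separately.

```python
import re

# Same pattern and replacement as the Eclipse find/replace noted above.
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
JAVA_TEMPLATE = (r'public static final UnicodeBlock \1 = '
                 r'new UnicodeBlock("\1", \1_ID); \2')


def to_unicode_block(line):
    """Turn a UBLOCK_ enum line from uchar.h into a Java UnicodeBlock instance."""
    return UBLOCK_LINE.sub(JAVA_TEMPLATE, line)


c_line = "    UBLOCK_MANDAIC = 198, /*[0840]*/"  # illustrative input
print(to_unicode_block(c_line))
```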
|
857 |
|
858 * UnicodeData.txt changes |
|
859 - new CJK block: |
|
860 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;; |
|
861 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;; |
|
862 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion |
|
863 |
|
864 * build Unicode tools using CMake+make |
|
865 |
|
866 * run genpname/preparse.pl (on Linux) |
|
867 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname |
|
868 + make sure that data.h is writable |
|
869 + perl preparse.pl ~/svn.icu/trunk/src > out.txt |
|
870 + preparse.pl shows no errors, out.txt Info and Warning lines look ok |
|
871 |
|
872 * rebuild Unicode tools (at least genpname) using make |
|
873 - You might first need to "make install" ICU so that the tools build can pick |
|
874 up the new definitions from the installed header files. |
|
875 |
|
876 * run genpname |
|
877 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in |
|
878 - rebuild ICU & tools |
|
879 |
|
880 * update source/data/unidata/norm2/nfkc_cf.txt |
|
881 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt |
|
882 |
|
883 * update source/data/unidata/norm2/uts46.txt |
|
884 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt |
|
885 to ~/svn.icu/tools/trunk/src/unicode/py |
|
886 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values |
|
887 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py |
|
888 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2 |
|
889 |
|
890 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
|
891 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
|
892 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
|
893 - Unicode 6.0: U+2260, U+226E, U+226F |
|
894 |
|
895 * generate core properties data files |
|
896 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
|
897 - rebuild ICU & tools |
|
898 - run makeuca.sh so that genuca picks up the new nfc.nrm: |
|
899 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
|
900 - rebuild ICU & tools |
|
901 |
|
902 * implement new Script_Extensions property (provisional) |
|
903 - parser & generator: genprops & uprops.icu |
|
904 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp |
|
905 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java |
|
906 |
|
907 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2 |
|
908 - (one-time change) |
|
909 - genbidi/gencase/genprops tools changes |
|
910 - re-run makeprops.sh (see above) |
|
911 - UCharacterProperty.java, UCharacterTypeIterator.java, |
|
912 UBiDiProps.java, UCaseProps.java, and several others with minor changes; |
|
913 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java |
|
914 |
|
915 * update Java data files |
|
916 - refresh just the UCD-related files, just to be safe |
|
917 - see (ICU4C)/source/data/icu4j-readme.txt |
|
918 - mkdir /tmp/icu4j |
|
919 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
920 output: |
|
921 ... |
|
922 Unicode .icu files built to ./out/build/icudt45l |
|
923 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b |
|
924 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt |
|
925 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b |
|
926 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b |
|
927 mkdir -p /tmp/icu4j/main/shared/data |
|
928 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
|
929 - copy the big-endian Unicode data files to another location, |
|
930 separate from the other data files |
|
931 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
|
932 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr |
|
933 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b |
|
934 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu |
|
935 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b |
|
936 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
|
937 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr |
|
938 - refresh ICU4J |
|
939 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b |
|
940 |
|
941 * refresh Java test .txt files |
|
942 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
|
943 |
|
944 * un-hardcode normalization skippable (NF*_Inert) test data |
|
945 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools |
|
946 |
|
947 * copy updated break iterator test files |
|
948 - now handled by early ucdcopy.py and |
|
949 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata |
|
950 (old instructions: |
|
951 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt |
|
952 to ~/svn.icu/trunk/src/source/test/testdata) |
|
953 - they are not used in ICU4J |
|
954 |
|
955 * UCA |
|
956 |
|
957 - get output from Mark's tools; look in |
|
958 http://www.unicode.org/~book/incoming/mark/uca6.0.0/ |
|
959 http://www.macchiato.com/unicode/utc/additional-uca-files |
|
960 http://www.unicode.org/Public/UCA/6.0.0/ |
|
961 http://www.unicode.org/~mdavis/uca/ |
|
962 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
|
963 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
|
964 - update Han-implicit ranges for new CJK extensions: |
|
965 swapCJK() in ucol.cpp & ImplicitCEGenerator.java |
|
966 - genuca: allow bytes 02 for U+FFFE, new merge-sort character; |
|
967 do not add it into invuca so that tailoring primary-after an ignorable works |
|
968 - genuca: permit space between [variable top] bytes |
|
969 - ucol.cpp: treat noncharacters like unassigned rather than ignorable |
|
970 - run makeuca.sh: |
|
971 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
|
972 - rebuild ICU4C |
|
973 - refresh ICU4J collation data: |
|
974 (subset of instructions above for properties data refresh, except copies all coll/*) |
|
975 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
976 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
|
977 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
|
978 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b |
|
979 - update (ICU)/source/test/testdata/CollationTest_*.txt |
|
980 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
|
981 with output from Mark's Unicode tools |
|
982 - run all tests with the *_SHORT.txt or the full files (the full ones have comments) |
|
983 - note on intltest: if collate/UCAConformanceTest fails, then |
|
984 utility/MultithreadTest/TestCollators will fail as well; |
|
985 fix the conformance test before looking into the multi-thread test |
|
986 |
|
987 * When refreshing all of ICU4J data from ICU4C |
|
988 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
|
989 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
|
990 or |
|
991 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
|
992 |
|
993 *** LayoutEngine script information |
|
994 |
|
995 (For details see the Unicode 5.2 change log below.) |
|
996 |
|
997 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |
|
998 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
|
999 ScriptRunData.cpp, which is no longer needed.) |
|
1000 |
|
1001 The generated files have a current copyright date and "@draft" statement. |
|
1002 |
|
1003 * copy the above files into <icu>/source/layout, replacing the old files. |
|
1004 * fix mixed line endings |
|
1005 * review the diffs and fix incorrect @draft and missing aliases; |
|
1006 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc. |
|
1007 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
|
1008 |
|
1009 ---------------------------------------------------------------------------- *** |
|
1010 |
|
1011 Unicode 5.2 update |
|
1012 |
|
1013 *** related ICU Trac tickets |
|
1014 |
|
1015 7084 Unicode 5.2 |
|
1016 |
|
1017 7167 verify collation bytes |
|
1018 7235 Java test NAME_ALIAS |
|
1019 7236 Java DerivedCoreProperties.txt test |
|
1020 7237 Java BidiTest.txt |
|
1021 7238 UTrie2 in core unidata |
|
1022 7239 test for tailoring gaps |
|
1023 7240 Java fix CollationMiscTest |
|
1024 7243 update layout engine for Unicode 5.2 |
|
1025 |
|
1026 *** Unicode version numbers |
|
1027 - makedata.mak |
|
1028 - uchar.h |
|
1029 - configure.in & configure |
|
1030 - update ucdVersion in gennames.c if an algorithmic range changes |
|
1031 |
|
1032 *** data files & enums & parser code |
|
1033 |
|
1034 * file preparation |
|
1035 |
|
1036 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata |
|
1037 - includes finding files regardless of version numbers, |
|
1038 copying them, and performing the equivalent processing of the |
|
1039 ucdstrip and ucdmerge tools on the desired set of files |
|
1040 |
|
1041 * notes on changes |
|
1042 - PropertyAliases.txt |
|
1043 moved from numeric to enumerated: |
|
1044 ccc ; Canonical_Combining_Class |
|
1045 new string properties: |
|
1046 NFKC_CF ; NFKC_Casefold |
|
1047 Name_Alias; Name_Alias |
|
1048 new binary properties: |
|
1049 Cased ; Cased |
|
1050 CI ; Case_Ignorable |
|
1051 CWCF ; Changes_When_Casefolded |
|
1052 CWCM ; Changes_When_Casemapped |
|
1053 CWKCF ; Changes_When_NFKC_Casefolded |
|
1054 CWL ; Changes_When_Lowercased |
|
1055 CWT ; Changes_When_Titlecased |
|
1056 CWU ; Changes_When_Uppercased |
|
1057 new CJK Unihan properties (not supported by ICU) |
|
1058 - PropertyValueAliases.txt |
|
1059 new block names |
|
1060 new scripts |
|
1061 one script code change: |
|
1062 sc ; Qaai ; Inherited |
|
1063 -> |
|
1064 sc ; Zinh ; Inherited ; Qaai |
|
1065 new Line_Break (lb) value: |
|
1066 lb ; CP ; Close_Parenthesis |
|
1067 new Joining_Group (jg) values: Farsi_Yeh, Nya |
|
1068 other new values: |
|
1069 ccc; 214; ATA ; Attached_Above |
|
1070 - DerivedBidiClass.txt |
|
1071 new default-R range: U+1E800 - U+1EFFF |
|
1072 - UnicodeData.txt |
|
1073 all of the ISO comments are gone |
|
1074 new CJK block end: |
|
1075 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last> |
|
1076 new CJK block: |
|
1077 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;; |
|
1078 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;; |
|
1079 |
|
1080 * genpname |
|
1081 - run preparse.pl |
|
1082 + cd \svn\icuproj\icu\trunk\source\tools\genpname |
|
1083 + make sure that data.h is writable |
|
1084 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt |
|
1085 + preparse.pl complains with errors like the following: |
|
1086 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34. |
|
1087 This is because ICU 4.0 had scripts from ISO 15924 which are now |
|
1088 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt |
|
1089 and PropertyValueAliases.txt. |
|
1090 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: |
|
1091 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt |
|
1092 + preparse.pl complains with errors about block names missing from uchar.h; add them |
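The script conflicts reported by preparse.pl can be found ahead of time by intersecting the two alias files. The sketch below assumes both files use `sc ; Short ; Long` lines; the helper names are hypothetical and the sample entries are illustrative.

```python
def parse_script_aliases(lines):
    """Map short script code -> long name from 'sc ; Short ; Long' lines."""
    aliases = {}
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


def duplicated_scripts(official_lines, synthetic_lines):
    """Script codes defined in both files; these must leave the synthetic file."""
    official = parse_script_aliases(official_lines)
    synthetic = parse_script_aliases(synthetic_lines)
    return sorted(set(official) & set(synthetic))


official = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Java ; Javanese"]
synthetic = ["sc ; Egyp ; Egyp", "sc ; Java ; Java", "sc ; Blis ; Blis"]
print(duplicated_scripts(official, synthetic))
```

Entries that appear only in the synthetic file (ISO 15924 codes not yet in Unicode) stay where they are.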
|
1093 |
|
1094 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
|
1095 - new block & script values |
|
1096 + 26 new blocks |
|
1097 copy new blocks from Blocks.txt |
|
1098 MS VC++ 2008 regular expression: |
|
1099 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$" |
|
1100 replace with " UBLOCK_\3 = 172, /*[\1]*/" |
|
1101 + several new script values already added in ICU 4.0 for ISO 15924 coverage |
|
1102 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above) |
|
1103 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage |
|
1104 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2) |
|
1105 (added to SyntheticPropertyValueAliases.txt) |
|
1106 - new Joining_Group (jg) values: Farsi_Yeh, Nya |
|
1107 - new Line_Break (lb) value: |
|
1108 lb ; CP ; Close_Parenthesis |
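The MS VC++ regular expression above uses `{...}` for groups; in Python's `re` syntax the same groups are written with parentheses. The sketch below also upper-cases the block name and replaces spaces and hyphens with underscores, which goes beyond the original find/replace and is only a convenience. The value 172 is the placeholder from the replacement string above, so the enum values still need manual renumbering.

```python
import re

# Python form of the MS VC++ pattern: groups are (..) instead of {..}.
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def to_ublock(line, value=172):
    """Turn a Blocks.txt line into a UBLOCK_ enum line for uchar.h."""
    match = BLOCK_LINE.match(line)
    if match is None:
        return line
    # Extra step, not in the original find/replace: normalize the name.
    name = re.sub(r"[^A-Za-z0-9]+", "_", match.group(3)).upper()
    return "    UBLOCK_%s = %d, /*[%s]*/" % (name, value, match.group(1))


print(to_ublock("A960..A97F; Hangul Jamo Extended-A"))
```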
|
1109 |
|
1110 * hardcoded Unihan range end/limit |
|
1111 - Unihan range end moves from 9FC3 to 9FCB |
|
1112 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive) |
|
1113 + do change gennames.c |
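The search for the old range end and limit can be scripted with the same case-insensitive regex. A small sketch follows; `find_old_range_ends` is a hypothetical helper and the sample source text is made up.

```python
import re

OLD_UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)


def find_old_range_ends(text):
    """Return (line number, stripped line) for lines mentioning 9FC3 or 9FC4."""
    return [(number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if OLD_UNIHAN_END.search(line)]


sample = "if (c <= 0x9fc3) {\n    limit = 0x9FC4;\n}\nend = 0x9FCB;\n"
for number, line in find_old_range_ends(sample):
    print(number, line)
```

Each hit still needs a manual look, since the same hex digits can occur in unrelated data.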
|
1114 |
|
1115 * Compare definitions of new binary properties with what we used to use |
|
1116 in algorithms, to see if the definitions changed. |
|
1117 - Verified that definitions for Cased and Case_Ignorable are unchanged. |
|
1118 The gencase tool now parses the newly public Case_Ignorable values |
|
1119 in case the definition changes in the future. |
|
1120 |
|
1121 * uchar.c & uprops.h & uprops.c & genprops |
|
1122 - new numeric values that didn't exist in Unicode data before: |
|
1123 1/7, 1/9, 1/10, 3/10, 1/16, 3/16 |
|
1124 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5, |
|
1125 therefore redesign the encoding of numeric types and values for formatVersion 6; |
|
1126 design for simple numbers up to at least 144 ("one gross"), |
|
1127 large values up to at least 10^20, |
|
1128 and fractions with numerators -1..17 and denominators 1..16 |
|
1129 to cover current and expected future values |
|
1130 (e.g., more Han numeric values, Meroitic twelfths) |
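To illustrate why numerators -1..17 and denominators 1..16 pack compactly: the denominator fits in 4 bits and the numerator in 5. The layout below is a hypothetical illustration of that arithmetic only; it is not the actual uprops.icu formatVersion 6 encoding.

```python
MIN_NUM, MAX_NUM = -1, 17   # 19 numerator values, needs 5 bits
MIN_DEN, MAX_DEN = 1, 16    # 16 denominator values, needs 4 bits


def pack_fraction(numerator, denominator):
    """Pack a small fraction into an integer (hypothetical layout)."""
    if not (MIN_NUM <= numerator <= MAX_NUM and MIN_DEN <= denominator <= MAX_DEN):
        raise ValueError("fraction out of range: %d/%d" % (numerator, denominator))
    return ((numerator - MIN_NUM) << 4) | (denominator - MIN_DEN)


def unpack_fraction(packed):
    """Inverse of pack_fraction: return (numerator, denominator)."""
    return (packed >> 4) + MIN_NUM, (packed & 0xF) + MIN_DEN


# The new Unicode 5.2 values listed above all round-trip.
for num, den in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
    assert unpack_fraction(pack_fraction(num, den)) == (num, den)
```

Storing the denominator minus 1 is what makes 16 fit in 4 bits; a plain 4-bit denominator field tops out at 15.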
|
1131 |
|
1132 * reimplement Hangul_Syllable_Type for new Jamo characters |
|
1133 - the old code assumed that all Jamo characters are in the 11xx block |
|
1134 - Unicode 5.2 fills holes there and adds new Jamo characters in |
|
1135 A960..A97F; Hangul Jamo Extended-A |
|
1136 and in |
|
1137 D7B0..D7FF; Hangul Jamo Extended-B |
|
1138 - Hangul_Syllable_Type can be trivially derived from a subset of |
|
1139 Grapheme_Cluster_Break values |
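The derivation noted above amounts to a five-entry lookup, using the short property value names. The sketch assumes the Unicode 5.2 data described here; `hangul_syllable_type` is a hypothetical helper name, not the ICU function.

```python
# Grapheme_Cluster_Break values that carry over directly to Hangul_Syllable_Type.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb):
    """Derive Hangul_Syllable_Type from a Grapheme_Cluster_Break short name."""
    return GCB_TO_HST.get(gcb, "NA")  # NA = Not_Applicable


print(hangul_syllable_type("LV"), hangul_syllable_type("CR"))
```

Because the lookup depends only on property data, new Jamo blocks such as A960..A97F and D7B0..D7FF need no hardcoded ranges.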
|
1140 |
|
1141 * build Unicode data source code for hardcoding core data |
|
1142 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data |
|
1143 |
|
1144 ICU data make path is \svn\icuproj\icu\trunk\source\data\ |
|
1145 ICU root path is \svn\icuproj\icu\trunk |
|
1146 Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
|
1147 Information: cannot find "brklocal.mk". Not building user-additional break iterator files. |
|
1148 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. |
|
1149 Information: cannot find "collocal.mk". Not building user-additional resource bundle files. |
|
1150 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. |
|
1151 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. |
|
1152 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. |
|
1153 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files. |
|
1154 Creating data file for Unicode Property Names |
|
1155 Creating data file for Unicode Character Properties |
|
1156 Creating data file for Unicode Case Mapping Properties |
|
1157 Creating data file for Unicode BiDi/Shaping Properties |
|
1158 Creating data file for Unicode Normalization |
|
1159 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l" |
|
1160 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp" |
|
1161 |
|
1162 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common |
|
1163 and rebuild the common library |
|
1164 |
|
1165 *** UCA |
|
1166 |
|
1167 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools) |
|
1168 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools |
|
1169 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools |
|
1170 [ Begin obsolete instructions: |
|
1171 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files. |
|
1172 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py |
|
1173 on Windows: |
|
1174 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt |
|
1175 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt |
|
1176 End obsolete instructions] |
|
1177 - run all tests with the *_SHORT.txt or the full files (the full ones have comments) |
|
1178 not just the *_STUB.txt files |
|
1179 - note on intltest: if collate/UCAConformanceTest fails, then |
|
1180 utility/MultithreadTest/TestCollators will fail as well; |
|
1181 fix the conformance test before looking into the multi-thread test |
|
1182 |
|
1183 *** Implement Cased & Case_Ignorable properties |
|
1184 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable() |
|
1185 - Problem: These properties should be disjoint, but aren't |
|
1186 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not |
|
1187 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable |
|
1188 |
|
1189 *** Implement Changes_When_Xyz properties |
|
1190 - without stored data |
|
1191 |
|
1192 *** Implement Name_Alias property |
|
1193 - add it as another name field in unames.icu |
|
1194 - make it available via u_charName() and UCharNameChoice |
|
1195 - consider it in u_charFromName() |
|
1196 |
|
1197 *** Break iterators |
|
1198 |
|
1199 * Update break iterator rules to new UAX versions and new property values |
|
1200 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary |
|
1201 |
|
1202 *** new BidiTest file |
|
1203 - review format and data |
|
1204 - copy BidiTest.txt to source/test/testdata |
|
1205 - write test code using this data |
|
1206 - fix ICU code where it fails the conformance test |
|
1207 |
|
1208 *** Java |
|
1209 - generally, find and update code corresponding to C/C++ |
|
1210 - UCharacter.UnicodeBlock constants: |
|
1211 a) add an _ID integer per new block, update COUNT |
|
1212 b) add a class instance per new block |
|
1213 Visual Studio regex: |
|
1214 find UBLOCK_{[^ ]+} = [0-9]+, {/.+} |
|
1215 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
|
1216 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias() |
|
1217 |
|
1218 - port test changes to Java |
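
Sketch (Python, not part of the ICU tools): the Visual Studio find/replace above, translated to Python's re syntax. Visual Studio's {...} tagged expressions become (...) groups. The helper name to_java_constant and the sample input line are made up for illustration.

```python
import re

# Python translation of the Visual Studio find/replace pattern.
# Group 1 is the block name after UBLOCK_, group 2 is the trailing comment.
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
JAVA_TEMPLATE = (
    r'public static final UnicodeBlock \g<1> = '
    r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>'
)


def to_java_constant(c_enum_line):
    """Rewrite one UBLOCK_ enum line as a UnicodeBlock instance declaration."""
    return UBLOCK_LINE.sub(JAVA_TEMPLATE, c_enum_line)


if __name__ == "__main__":
    # Hypothetical input line, not copied from uchar.h.
    print(to_java_constant("UBLOCK_EXAMPLE_BLOCK = 999, /*[0000]*/"))
```

Lines that do not match the pattern are returned unchanged, so the function can be mapped over a whole pasted enum.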
|
1219 |
|
1220 *** LayoutEngine script information |
|
1221 |
|
1222 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833) |
|
1223 |
|
1224 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |
|
1225 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
|
1226 ScriptRunData.cpp, which is no longer needed.) |
|
1227 |
|
1228 The generated files have a current copyright date and "@draft" statement. |
|
1229 |
|
1230 -> Eric Mader wrote in email on 20090930: |
|
1231 "I think the tool has been modified to update @draft to @stable for |
|
1232 older scripts and to add @draft for new scripts. |
|
1233 (I worked with an intern on this last year.) |
|
1234 You should check the output after you run it." |
|
1235 |
|
1236 * copy the above files into <icu>/source/layout, replacing the old files. |
|
1237 * fix mixed line endings |
|
1238 * review the diffs and fix incorrect @draft and missing aliases |
|
1239 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
|
1240 |
|
1241 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
|
1242 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
|
1243 |
|
1244 -> Eric Mader wrote in email on 20090930: |
|
1245 "This is just a matter of making sure that all the per-script tables have |
|
1246 entries for any new scripts that were added. |
|
1247 If any new Indic characters were added, then the class tables in |
|
1248 IndicClassTables.cpp should be updated to reflect this. |
|
1249 John Emmons should know how to do this if it's required." |
|
1250 |
|
1251 * rebuild the layout and layoutex libraries. |
|
1252 |
|
1253 *** Documentation |
|
1254 - Update User Guide |
|
1255 + Jamo_Short_Name, sfc->scf, binary property value aliases |
|
1256 |
|
1257 ---------------------------------------------------------------------------- *** |
|
1258 |
|
1259 Unicode 5.1 update |
|
1260 |
|
1261 *** related ICU Trac tickets |
|
1262 |
|
1263 5696 Update to Unicode 5.1 |
|
1264 |
|
1265 *** Unicode version numbers |
|
1266 - makedata.mak |
|
1267 - uchar.h |
|
1268 - configure.in & configure |
|
1269 - update ucdVersion in gennames.c if an algorithmic range changes |
|
1270 |
|
1271 *** data files & enums & parser code |
|
1272 |
|
1273 * file preparation |
|
1274 - ucdstrip: |
|
1275 DerivedCoreProperties.txt |
|
1276 DerivedNormalizationProps.txt |
|
1277 NormalizationTest.txt |
|
1278 PropList.txt |
|
1279 Scripts.txt |
|
1280 GraphemeBreakProperty.txt |
|
1281 SentenceBreakProperty.txt |
|
1282 WordBreakProperty.txt |
|
1283 - ucdstrip and ucdmerge: |
|
1284 EastAsianWidth.txt |
|
1285 LineBreak.txt |
|
1286 |
|
1287 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
|
1288 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\ |
|
1289 copy 5.1.0\ucd\Blocks.txt ..\unidata\ |
|
1290 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\ |
|
1291 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\ |
|
1292 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
|
1293 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
|
1294 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
|
1295 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
|
1296 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\ |
|
1297 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\ |
|
1298 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\ |
|
1299 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\ |
|
1300 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\ |
|
1301 |
|
1302 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
|
1303 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
|
1304 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
|
1305 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt |
|
1306 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
|
1307 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
|
1308 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
|
1309 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
|
1310 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
|
1311 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
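
Sketch (Python, not the real tools): what the ucdstrip | ucdmerge pipeline above is assumed to do. The assumption is that ucdstrip drops the trailing "#" comment on data lines and that ucdmerge merges adjacent code point ranges that carry the same field value; the actual C tools may differ in details. All function names and the sample data are hypothetical.

```python
def strip_comments(lines):
    """Remove trailing '#' comments from data lines; keep comment-only lines."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            out.append(line)
        else:
            out.append(line.split("#", 1)[0].rstrip())
    return out


def _parse(line):
    """Split 'start[..end];value' into (start, end, value)."""
    code, value = line.split(";", 1)
    first, _, last = code.strip().partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    return start, end, value.strip()


def _format(start, end, value):
    code = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
    return f"{code};{value}"


def merge_ranges(lines):
    """Merge adjacent ranges with identical values into one range line."""
    out, pending = [], None
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            if pending:
                out.append(_format(*pending))
                pending = None
            out.append(line)
            continue
        start, end, value = _parse(line)
        if pending and pending[2] == value and pending[1] + 1 == start:
            pending = (pending[0], end, value)  # extend the open range
        else:
            if pending:
                out.append(_format(*pending))
            pending = (start, end, value)
    if pending:
        out.append(_format(*pending))
    return out
```

Usage mirrors the batch file: merge_ranges(strip_comments(lines)) for EastAsianWidth.txt and LineBreak.txt, strip_comments(lines) alone for the other files.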
|
1312 |
|
1313 * genpname |
|
1314 - run preparse.pl |
|
1315 + cd \svn\icuproj\icu\uni51\source\tools\genpname |
|
1316 + make sure that data.h is writable |
|
1317 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt |
|
1318 + preparse.pl complains with errors like the following: |
|
1319 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30. |
|
1320 This is because ICU 3.8 had scripts from ISO 15924 which are now |
|
1321 added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt |
|
1322 and PropertyValueAliases.txt. |
|
1323 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: |
|
1324 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii |
|
1325 + PropertyValueAliases.txt now explicitly contains values for boolean properties: |
|
1326 N/Y, No/Yes, F/T, False/True |
|
1327 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases. |
|
1328 It will use further values from the file if present. |
|
1329 |
|
1330 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
|
1331 - new block & script values |
|
1332 + 17 new blocks |
|
1333 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage |
|
1334 (removed from SyntheticPropertyValueAliases.txt) |
|
1335 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1) |
|
1336 (added to SyntheticPropertyValueAliases.txt) |
|
1337 - uprops.icu (uprops.h) only provides 7 bits for script codes. |
|
1338 In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now. |
|
1339 No code above 127 is the script code of an assigned Unicode character yet, |

1340 so ICU 4.0 uprops.icu does not store any |

1341 script code values greater than 127. |
|
1342 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129 |
|
1343 in a parallel bit field, and that overflows now. |
|
1344 Also, future values >=128 would be incompatible anyway. |
|
1345 uprops.h is modified to move around several of the bit fields |
|
1346 in the properties vector words, and now uses 8 bits for the script code. |
|
1347 Two other bit fields are also widened to leave room for future values: |
|
1348 Block (current count: 172) grows from 8 to 9 bits, |
|
1349 and Word_Break grows from 4 to 5 bits. |
|
1350 - renamed property Simple_Case_Folding (sfc->scf) |
|
1351 + nothing to be done: handled as normal alias |
|
1352 - new property JSN Jamo_Short_Name |
|
1353 + no new API: only contributes to the Name property |
|
1354 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark |
|
1355 - new Joining Group (JG) value: Burushashki_Yeh_Barree |
|
1356 - new Sentence_Break (SB) values: |
|
1357 SB ; CR ; CR |
|
1358 SB ; EX ; Extend |
|
1359 SB ; LF ; LF |
|
1360 SB ; SC ; SContinue |
|
1361 - new Word_Break (WB) values: |
|
1362 WB ; CR ; CR |
|
1363 WB ; Extend ; Extend |
|
1364 WB ; LF ; LF |
|
1365 WB ; MB ; MidNumLet |
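
Sketch (Python): the bit-width arithmetic behind the uprops.h change described above. A 7-bit field holds 0..127, so the maximum script value USCRIPT_CODE_LIMIT-1=129 needs 8 bits. The pack() helper and its field order are hypothetical and do not reflect the real layout of the properties vector words.

```python
def bits_needed(max_value):
    """Smallest field width (in bits) that can hold values 0..max_value."""
    return max(1, max_value.bit_length())


def pack(fields):
    """Pack (value, width) pairs into one integer, lowest field first.

    Raises OverflowError if a value does not fit its field, which is
    the situation the old 7-bit script field ran into.
    """
    word, shift = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise OverflowError(f"{value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


USCRIPT_CODE_LIMIT = 130  # value quoted in the notes for ICU 4.0

if __name__ == "__main__":
    print(bits_needed(127))                     # widest value a 7-bit field holds
    print(bits_needed(USCRIPT_CODE_LIMIT - 1))  # width needed for the maximum
```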
|
1366 |
|
1367 * Further changes in the 2008-02-29 update: |
|
1368 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP |
|
1369 because they should not normally be invisible. |
|
1370 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed) |
|
1371 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend |
|
1372 - new Word_Break (WB) value: NL=Newline |
|
1373 |
|
1374 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison) |
|
1375 - Unihan range end moves from 9FBB to 9FC3 |
|
1376 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive) |
|
1377 + do change gennames.c |
|
1378 |
|
1379 * build Unicode data source code for hardcoding core data |
|
1380 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data |
|
1381 |
|
1382 ICU data make path is \svn\icuproj\icu\uni51\source\data\ |
|
1383 ICU root path is \svn\icuproj\icu\uni51 |
|
1384 Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
|
1385 Information: cannot find "brklocal.mk". Not building user-additional break iterator files. |
|
1386 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. |
|
1387 Information: cannot find "collocal.mk". Not building user-additional resource bundle files. |
|
1388 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. |
|
1389 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. |
|
1390 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. |
|
1391 Creating data file for Unicode Character Properties |
|
1392 Creating data file for Unicode Case Mapping Properties |
|
1393 Creating data file for Unicode BiDi/Shaping Properties |
|
1394 Creating data file for Unicode Normalization |
|
1395 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l" |
|
1396 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp" |
|
1397 |
|
1398 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common |
|
1399 and rebuild the common library |
|
1400 |
|
1401 *** Break iterators |
|
1402 |
|
1403 * Update break iterator rules to new UAX versions and new property values |
|
1404 |
|
1405 *** UCA |
|
1406 |
|
1407 * update FractionalUCA.txt and UCARules.txt with new canonical closure |
|
1408 |
|
1409 *** Test suites |
|
1410 - Test that APIs using Unicode property value aliases (like UnicodeSet) |
|
1411 support all of the boolean values N/Y, No/Yes, F/T, False/True |
|
1412 -> TestBinaryValues() tests in both cintltst and intltest |
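
Sketch (Python): the check that such a test makes, reduced to a lookup of the eight boolean value aliases. Case-insensitive matching is an assumption here; the function name parse_binary_value is hypothetical and is not an ICU API.

```python
_FALSE_ALIASES = {"n", "no", "f", "false"}
_TRUE_ALIASES = {"y", "yes", "t", "true"}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool.

    Accepts N/Y, No/Yes, F/T, False/True; matching ignores case and
    surrounding whitespace (an assumption for this sketch).
    """
    key = alias.strip().lower()
    if key in _TRUE_ALIASES:
        return True
    if key in _FALSE_ALIASES:
        return False
    raise ValueError(f"not a binary property value alias: {alias!r}")
```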
|
1413 |
|
1414 *** LayoutEngine script information |
|
1415 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |

1416 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
|
1417 ScriptRunData.cpp, which is no longer needed.) |
|
1418 |
|
1419 The generated files have a current copyright date and "@draft" statement. |
|
1420 |
|
1421 * copy the above files into <icu>/source/layout, replacing the old files. |
|
1422 |
|
1423 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
|
1424 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
|
1425 |
|
1426 * rebuild the layout and layoutex libraries. |
|
1427 |
|
1428 *** Documentation |
|
1429 - Update User Guide |
|
1430 + Jamo_Short_Name, sfc->scf, binary property value aliases |
|
1431 |
|
1432 ---------------------------------------------------------------------------- *** |
|
1433 |
|
1434 Unicode 5.0 update |
|
1435 |
|
1436 *** related Jitterbugs |
|
1437 |
|
1438 5084 RFE: Update to Unicode 5.0 |
|
1439 |
|
1440 *** data files & enums & parser code |
|
1441 |
|
1442 * file preparation |
|
1443 - ucdstrip: |
|
1444 DerivedCoreProperties.txt |
|
1445 DerivedNormalizationProps.txt |
|
1446 NormalizationTest.txt |
|
1447 PropList.txt |
|
1448 Scripts.txt |
|
1449 GraphemeBreakProperty.txt |
|
1450 SentenceBreakProperty.txt |
|
1451 WordBreakProperty.txt |
|
1452 - ucdstrip and ucdmerge: |
|
1453 EastAsianWidth.txt |
|
1454 LineBreak.txt |
|
1455 |
|
1456 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
|
1457 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\ |
|
1458 copy 5.0.0\ucd\Blocks.txt ..\unidata\ |
|
1459 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\ |
|
1460 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\ |
|
1461 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
|
1462 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
|
1463 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
|
1464 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
|
1465 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\ |
|
1466 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\ |
|
1467 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\ |
|
1468 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\ |
|
1469 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\ |
|
1470 |
|
1471 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
|
1472 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
|
1473 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
|
1474 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt |
|
1475 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
|
1476 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
|
1477 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
|
1478 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
|
1479 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
|
1480 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
|
1481 |
|
1482 * update FractionalUCA.txt and UCARules.txt with new canonical closure |
|
1483 |
|
1484 * genpname |
|
1485 - run preparse.pl |
|
1486 + make sure that data.h is writable |
|
1487 + perl preparse.pl \cvs\oss\icu > out.txt |
|
1488 |
|
1489 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
|
1490 - new block & script values |
|
1491 + script values already added in ICU 3.6 because all of ISO 15924 is now covered |
|
1492 |
|
1493 * build Unicode data source code for hardcoding core data |
|
1494 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data |
|
1495 |
|
1496 ICU data make path is \cvs\oss\icu\source\data\ |
|
1497 ICU root path is \cvs\oss\icu |
|
1498 Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
|
1499 [etc.] |
|
1500 Creating data file for Unicode Character Properties |
|
1501 Creating data file for Unicode Case Mapping Properties |
|
1502 Creating data file for Unicode BiDi/Shaping Properties |
|
1503 Creating data file for Unicode Normalization |
|
1504 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l" |
|
1505 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp" |
|
1506 |
|
1507 - copy the .c source files to C:\cvs\oss\icu\source\common |
|
1508 and rebuild the common library |
|
1509 |
|
1510 *** Unicode version numbers |
|
1511 - makedata.mak |
|
1512 - uchar.h |
|
1513 - configure.in |
|
1514 |
|
1515 *** LayoutEngine script information |
|
1516 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |

1517 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
|
1518 ScriptRunData.cpp, which is no longer needed.) |
|
1519 |
|
1520 The generated files have a current copyright date and "@draft" statement. |
|
1521 |
|
1522 * copy the above files into <icu>/source/layout, replacing the old files. |
|
1523 |
|
1524 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
|
1525 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
|
1526 |
|
1527 * rebuild the layout and layoutex libraries. |
|
1528 |
|
1529 ---------------------------------------------------------------------------- *** |
|
1530 |
|
1531 Unicode 4.1 update |
|
1532 |
|
1533 *** related Jitterbugs |
|
1534 |
|
1535 4332 RFE: Update to Unicode 4.1 |
|
1536 4157 RBBI, TR29 4.1 updates |
|
1537 |
|
1538 *** data files & enums & parser code |
|
1539 |
|
1540 * file preparation |
|
1541 - ucdstrip: |
|
1542 DerivedCoreProperties.txt |
|
1543 DerivedNormalizationProps.txt |
|
1544 NormalizationTest.txt |
|
1545 GraphemeBreakProperty.txt |
|
1546 SentenceBreakProperty.txt |
|
1547 WordBreakProperty.txt |
|
1548 - ucdstrip and ucdmerge: |
|
1549 EastAsianWidth.txt |
|
1550 LineBreak.txt |
|
1551 |
|
1552 * add new files to the repository |
|
1553 GraphemeBreakProperty.txt |
|
1554 SentenceBreakProperty.txt |
|
1555 WordBreakProperty.txt |
|
1556 |
|
1557 * update FractionalUCA.txt and UCARules.txt with new canonical closure |
|
1558 |
|
1559 * genpname |
|
1560 - handle new enumerated properties in sub read_uchar |
|
1561 - run preparse.pl |
|
1562 |
|
1563 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
|
1564 - new binary properties |
|
1565 + Pattern_Syntax |
|
1566 + Pattern_White_Space |
|
1567 - new enumerated properties |
|
1568 + Grapheme_Cluster_Break |
|
1569 + Sentence_Break |
|
1570 + Word_Break |
|
1571 - new block & script & line break values |
|
1572 |
|
1573 * gencase |
|
1574 - case-ignorable changes |
|
1575 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
|
1576 now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk |
|
1577 |
|
1578 *** Unicode version numbers |
|
1579 - makedata.mak |
|
1580 - uchar.h |
|
1581 - configure.in |
|
1582 |
|
1583 *** tests |
|
1584 - verify that u_charMirror() round-trips |
|
1585 - test all new properties and some new values of old properties |
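
Sketch (Python): the shape of a mirror round-trip check. The real test calls u_charMirror() in C; here a small table stands in for it, and the helper names are hypothetical. The three bracket pairs are only a sample of the mirrored pairs.

```python
# Stand-in for the mirroring data: a few well-known mirrored pairs.
SAMPLE_MIRRORS = {
    0x0028: 0x0029, 0x0029: 0x0028,  # ( )
    0x003C: 0x003E, 0x003E: 0x003C,  # < >
    0x005B: 0x005D, 0x005D: 0x005B,  # [ ]
}


def char_mirror(c, table):
    """Return the mirrored code point, or c itself if it has no mirror."""
    return table.get(c, c)


def non_round_tripping(code_points, table):
    """List code points c where mirror(mirror(c)) != c."""
    return [
        c for c in code_points
        if char_mirror(char_mirror(c, table), table) != c
    ]
```

A one-directional entry (a mapping without its reverse) is what this check catches.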
|
1586 |
|
1587 *** other code |
|
1588 |
|
1589 * hardcoded Unihan range end/limit |
|
1590 - Unihan range end moves from 9FA5 to 9FBB |
|
1591 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive) |
|
1592 + do not modify BOCU/BOCSU code because that would change the encoding |
|
1593 and break binary compatibility! |
|
1594 + similarly, do not change the GB 18030 range data (ucnvmbcs.c), |
|
1595 NamePrepProfile.txt |
|
1596 + ignore trietest.c: test data is arbitrary |
|
1597 + ignore tstnorm.cpp: test optimization, not important |
|
1598 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF |
|
1599 + do change line_th.txt and word_th.txt |
|
1600 by replacing hardcoded ranges with the new property values |
|
1601 + do change gennames.c |
|
1602 |
|
1603 source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
|
1604 source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
|
1605 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5, |
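
Sketch (Python): the case-insensitive search for 9FA[56] that produced hits like the ones above. find_hits is a hypothetical helper that works on lines already read into memory; walking the source tree is left out.

```python
import re

# Matches both the old range end (9FA5) and the old limit (9FA6).
UNIHAN_END_OR_LIMIT = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hits(lines):
    """Return (line number, text) for each line that mentions 9FA5 or 9FA6."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]
```

Each hit still needs manual review, since several of them must not be changed (see the "do not" and "ignore" items above).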
|
1606 |
|
1607 * case mappings |
|
1608 - compare new special casing context conditions with previous ones |
|
1609 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
|
1610 |
|
1611 * genpname |
|
1612 - consider storing only the short name if it is the same as the long name |
|
1613 |
|
1614 *** other reviews |
|
1615 - UAX #29 changes (grapheme/word/sentence breaks) |
|
1616 - UAX #14 changes (line breaks) |
|
1617 - Pattern_Syntax & Pattern_White_Space |
|
1618 |
|
1619 ---------------------------------------------------------------------------- *** |
|
1620 |
|
1621 Unicode 4.0.1 update |
|
1622 |
|
1623 *** related Jitterbugs |
|
1624 |
|
1625 3170 RFE: Update to Unicode 4.0.1 |
|
1626 3171 Add new Unicode 4.0.1 properties |
|
1627 3520 use Unicode 4.0.1 updates for break iteration |
|
1628 |
|
1629 *** data files & enums & parser code |
|
1630 |
|
1631 * file preparation |
|
1632 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt |
|
1633 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt |
|
1634 |
|
1635 * file fixes |
|
1636 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No |
|
1637 according to PRI #26 |
|
1638 http://www.unicode.org/review/resolved-pri.html#pri26 |
|
1639 - undone again because no corrigendum in sight; |
|
1640 instead modified tests to not check consistency on this for Unicode 4.0.1 |
|
1641 |
|
1642 * ucdterms.txt |
|
1643 - update from http://www.unicode.org/copyright.html |
|
1644 formatted for plain text |
|
1645 |
|
1646 * uchar.h & uprops.h & uprops.c & genprops |
|
1647 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed |
|
1648 - add U_LB_INSEPARABLE due to a spelling fix |
|
1649 + put short name comment only on line with new constant |
|
1650 for genpname perl script parser |
|
1651 - new binary properties |
|
1652 + STerm |
|
1653 + Variation_Selector |
|
1654 |
|
1655 * genpname |
|
1656 - fix genpname perl script so that it doesn't choke on more than 2 names per property value |
|
1657 - perl script: correctly calculate the maximum number of fields per row |
|
1658 |
|
1659 * uscript.h |
|
1660 - new script code Hrkt=Katakana_Or_Hiragana |
|
1661 |
|
1662 * gennorm.c: track changes in DerivedNormalizationProps.txt |
|
1663 - "FNC" -> "FC_NFKC" |
|
1664 - single field "NFD_NO" -> two fields "NFD_QC; N" etc. |
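
Sketch (Python): a converter from the old field spelling to the new one, to illustrate the two changes listed above. The input is the list of fields after the code point column. Only "FNC" and "NFD_NO" come from these notes; the other form names and the MAYBE -> M mapping are assumptions covering the "etc.", and modernize_fields is a hypothetical name.

```python
# Old value suffix -> new quick-check value. "MAYBE" is an assumption.
OLD_VALUES = {"NO": "N", "MAYBE": "M"}
FORMS = {"NFD", "NFC", "NFKD", "NFKC"}


def modernize_fields(fields):
    """Rewrite old-style property fields in the newer spelling.

    ["FNC", ...]  -> ["FC_NFKC", ...]
    ["NFD_NO"]    -> ["NFD_QC", "N"]
    Anything else is returned unchanged.
    """
    head, rest = fields[0], list(fields[1:])
    if head == "FNC":
        return ["FC_NFKC"] + rest
    form, _, value = head.rpartition("_")
    if form in FORMS and value in OLD_VALUES:
        return [form + "_QC", OLD_VALUES[value]] + rest
    return [head] + rest
```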
|
1665 |
|
1666 * genprops/props2.c: track changes in DerivedNumericValues.txt |
|
1667 - changed from 3 columns to 2, dropping the numeric type |
|
1668 + assume that the type is always numeric for Han characters, |
|
1669 and that only those are added in addition to what UnicodeData.txt lists |
|
1670 |
|
1671 *** Unicode version numbers |
|
1672 - makedata.mak |
|
1673 - uchar.h |
|
1674 - configure.in |
|
1675 |
|
1676 *** tests |
|
1677 - update test of default bidi classes according to PRI #28 |
|
1678 /tsutil/cucdtst/TestUnicodeData |
|
1679 http://www.unicode.org/review/resolved-pri.html#pri28 |
|
1680 - bidi tests: change exemplar character for ES depending on Unicode version |
|
1681 - change hardcoded expected property values where they change |
|
1682 |
|
1683 *** other code |
|
1684 |
|
1685 * name matching |
|
1686 - read UCD.html |
|
1687 |
|
1688 * scripts |
|
1689 - use new Hrkt=Katakana_Or_Hiragana |
|
1690 |
|
1691 * ZWJ & ZWNJ |
|
1692 - are now part of combining character sequences |
|
1693 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ |