1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/intl/icu/source/common/unicode/utf_old.h Wed Dec 31 06:09:35 2014 +0100 1.3 @@ -0,0 +1,1169 @@ 1.4 +/* 1.5 +******************************************************************************* 1.6 +* 1.7 +* Copyright (C) 2002-2012, International Business Machines 1.8 +* Corporation and others. All Rights Reserved. 1.9 +* 1.10 +******************************************************************************* 1.11 +* file name: utf_old.h 1.12 +* encoding: US-ASCII 1.13 +* tab size: 8 (not used) 1.14 +* indentation:4 1.15 +* 1.16 +* created on: 2002sep21 1.17 +* created by: Markus W. Scherer 1.18 +*/ 1.19 + 1.20 +/** 1.21 + * \file 1.22 + * \brief C API: Deprecated macros for Unicode string handling 1.23 + */ 1.24 + 1.25 +/** 1.26 + * 1.27 + * The macros in utf_old.h are all deprecated and their use discouraged. 1.28 + * Some of the design principles behind the set of UTF macros 1.29 + * have changed or proved impractical. 1.30 + * Almost all of the old "UTF macros" are at least renamed. 1.31 + * If you are looking for a new equivalent to an old macro, please see the 1.32 + * comment at the old one. 1.33 + * 1.34 + * Brief summary of reasons for deprecation: 1.35 + * - Switch on UTF_SIZE (selection of UTF-8/16/32 default string processing) 1.36 + * was impractical. 1.37 + * - Switch on UTF_SAFE etc. (selection of unsafe/safe/strict default string processing) 1.38 + * was of little use and impractical. 1.39 + * - Whole classes of macros became obsolete outside of the UTF_SIZE/UTF_SAFE 1.40 + * selection framework: UTF32_ macros (all trivial) 1.41 + * and UTF_ default and intermediate macros (all aliases). 1.42 + * - The selection framework also caused many macro aliases. 1.43 + * - Change in Unicode standard: "irregular" sequences (3.0) became illegal (3.2). 
1.44 + * - Change of language in Unicode standard: 1.45 + * Growing distinction between internal x-bit Unicode strings and external UTF-x 1.46 + * forms, with the former more lenient. 1.47 + * Suggests renaming of UTF16_ macros to U16_. 1.48 + * - The prefix "UTF_" without a width number confused some users. 1.49 + * - "Safe" append macros needed the addition of an error indicator output. 1.50 + * - "Safe" UTF-8 macros used legitimate (if rarely used) code point values 1.51 + * to indicate error conditions. 1.52 + * - The use of the "_CHAR" infix for code point operations confused some users. 1.53 + * 1.54 + * More details: 1.55 + * 1.56 + * Until ICU 2.2, utf.h theoretically allowed choosing among UTF-8/16/32 1.57 + * for string processing, and among unsafe/safe/strict default macros for that. 1.58 + * 1.59 + * It proved nearly impossible to write non-trivial, high-performance code 1.60 + * that is UTF-generic. 1.61 + * Unsafe default macros would be dangerous for default string processing, 1.62 + * and the main reason for the "strict" versions disappeared: 1.63 + * Between Unicode 3.0 and 3.2 all "irregular" UTF-8 sequences became illegal. 1.64 + * The only other conditions that "strict" checked for were non-characters, 1.65 + * which are valid during processing. Only during text input/output should they 1.66 + * be checked, and at that time other well-formedness checks may be 1.67 + * necessary or useful as well. 1.68 + * This can still be done by using U16_NEXT and U_IS_UNICODE_NONCHAR 1.69 + * or U_IS_UNICODE_CHAR. 1.70 + * 1.71 + * The old UTF8_..._SAFE macros also used some normal Unicode code points 1.72 + * to indicate malformed sequences. 1.73 + * The new U8_ macros without suffix use negative values instead. 1.74 + * 1.75 + * The entire contents of utf32.h was moved here without replacement 1.76 + * because all those macros were trivial and 1.77 + * were meaningful only in the framework of choosing the UTF size. 
1.78 + * 1.79 + * See Jitterbug 2150 and its discussion on the ICU mailing list 1.80 + * in September 2002. 1.81 + * 1.82 + * <hr> 1.83 + * 1.84 + * <em>Obsolete part</em> of pre-ICU 2.4 utf.h file documentation: 1.85 + * 1.86 + * <p>The original concept for these files was for ICU to allow 1.87 + * in principle to set which UTF (UTF-8/16/32) is used internally 1.88 + * by defining UTF_SIZE to either 8, 16, or 32. utf.h would then define the UChar type 1.89 + * accordingly. UTF-16 was the default.</p> 1.90 + * 1.91 + * <p>This concept has been abandoned. 1.92 + * A lot of the ICU source code assumes UChar strings are in UTF-16. 1.93 + * This is especially true for low-level code like 1.94 + * conversion, normalization, and collation. 1.95 + * The utf.h header enforces the default of UTF-16. 1.96 + * The UTF-8 and UTF-32 macros remain for now for completeness and backward compatibility.</p> 1.97 + * 1.98 + * <p>Accordingly, utf.h defines UChar to be an unsigned 16-bit integer. If this matches wchar_t, then 1.99 + * UChar is defined to be exactly wchar_t, otherwise uint16_t.</p> 1.100 + * 1.101 + * <p>UChar32 is defined to be a signed 32-bit integer (int32_t), large enough for a 21-bit 1.102 + * Unicode code point (Unicode scalar value, 0..0x10ffff). 1.103 + * Before ICU 2.4, the definition of UChar32 was similarly platform-dependent as 1.104 + * the definition of UChar. For details see the documentation for UChar32 itself.</p> 1.105 + * 1.106 + * <p>utf.h also defines a number of C macros for handling single Unicode code points and 1.107 + * for using UTF Unicode strings. It includes utf8.h, utf16.h, and utf32.h for the actual 1.108 + * implementations of those macros and then aliases one set of them (for UTF-16) for general use. 
1.109 + * The UTF-specific macros have the UTF size in the macro name prefixes (UTF16_...), while 1.110 + * the general alias macros always begin with UTF_...</p> 1.111 + * 1.112 + * <p>Many string operations can be done with or without error checking. 1.113 + * Where such a distinction is useful, there are two versions of the macros, "unsafe" and "safe" 1.114 + * ones with ..._UNSAFE and ..._SAFE suffixes. The unsafe macros are fast but may cause 1.115 + * program failures if the strings are not well-formed. The safe macros have an additional, boolean 1.116 + * parameter "strict". If strict is FALSE, then only illegal sequences are detected. 1.117 + * Otherwise, irregular sequences and non-characters are detected as well (like single surrogates). 1.118 + * Safe macros return special error code points for illegal/irregular sequences: 1.119 + * Typically, U+ffff, or values that would result in a code unit sequence of the same length 1.120 + * as the erroneous input sequence.<br> 1.121 + * Note that _UNSAFE macros have fewer parameters: They do not have the strictness parameter, and 1.122 + * they do not have start/length parameters for boundary checking.</p> 1.123 + * 1.124 + * <p>Here, the macros are aliased in two steps: 1.125 + * In the first step, the UTF-specific macros with UTF16_ prefix and _UNSAFE and _SAFE suffixes are 1.126 + * aliased according to the UTF_SIZE to macros with UTF_ prefix and the same suffixes and signatures. 1.127 + * Then, in a second step, the default, general alias macros are set to use either the unsafe or 1.128 + * the safe/not strict (default) or the safe/strict macro; 1.129 + * these general macros do not have a strictness parameter.</p> 1.130 + * 1.131 + * <p>It is possible to change the default choice for the general alias macros to be unsafe, safe/not strict or safe/strict. 1.132 + * The default is safe/not strict. It is not recommended to select the unsafe macros as the basis for 1.133 + * Unicode string handling in ICU! 
To select this, define UTF_SAFE, UTF_STRICT, or UTF_UNSAFE.</p> 1.134 + * 1.135 + * <p>For general use, one should use the default, general macros with UTF_ prefix and no _SAFE/_UNSAFE suffix. 1.136 + * Only in some cases it may be necessary to control the choice of macro directly and use a less generic alias. 1.137 + * For example, if it can be assumed that a string is well-formed and the index will stay within the bounds, 1.138 + * then the _UNSAFE version may be used. 1.139 + * If a UTF-8 string is to be processed, then the macros with UTF8_ prefixes need to be used.</p> 1.140 + * 1.141 + * <hr> 1.142 + * 1.143 + * @deprecated ICU 2.4. Use the macros in utf.h, utf16.h, utf8.h instead. 1.144 + */ 1.145 + 1.146 +#ifndef __UTF_OLD_H__ 1.147 +#define __UTF_OLD_H__ 1.148 + 1.149 +#ifndef U_HIDE_DEPRECATED_API 1.150 + 1.151 +#include "unicode/utf.h" 1.152 +#include "unicode/utf8.h" 1.153 +#include "unicode/utf16.h" 1.154 + 1.155 +/* Formerly utf.h, part 1 --------------------------------------------------- */ 1.156 + 1.157 +#ifdef U_USE_UTF_DEPRECATES 1.158 +/** 1.159 + * Unicode string and array offset and index type. 1.160 + * ICU always counts Unicode code units (UChars) for 1.161 + * string offsets, indexes, and lengths, not Unicode code points. 1.162 + * 1.163 + * @obsolete ICU 2.6. Use int32_t directly instead since this API will be removed in that release. 1.164 + */ 1.165 +typedef int32_t UTextOffset; 1.166 +#endif 1.167 + 1.168 +/** Number of bits in a Unicode string code unit - ICU uses 16-bit Unicode. @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.169 +#define UTF_SIZE 16 1.170 + 1.171 +/** 1.172 + * The default choice for general Unicode string macros is to use the ..._SAFE macro implementations 1.173 + * with strict=FALSE. 1.174 + * 1.175 + * @deprecated ICU 2.4. Obsolete, see utf_old.h. 1.176 + */ 1.177 +#define UTF_SAFE 1.178 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.179 +#undef UTF_UNSAFE 1.180 +/** @deprecated ICU 2.4. 
Obsolete, see utf_old.h. */ 1.181 +#undef UTF_STRICT 1.182 + 1.183 +/** 1.184 + * UTF8_ERROR_VALUE_1 and UTF8_ERROR_VALUE_2 are special error values for UTF-8, 1.185 + * which need 1 or 2 bytes in UTF-8: 1.186 + * \code 1.187 + * U+0015 = NAK = Negative Acknowledge, C0 control character 1.188 + * U+009f = highest C1 control character 1.189 + * \endcode 1.190 + * 1.191 + * These are used by UTF8_..._SAFE macros so that they can return an error value 1.192 + * that needs the same number of code units (bytes) as were seen by 1.193 + * a macro. They should be tested with UTF_IS_ERROR() or UTF_IS_VALID(). 1.194 + * 1.195 + * @deprecated ICU 2.4. Obsolete, see utf_old.h. 1.196 + */ 1.197 +#define UTF8_ERROR_VALUE_1 0x15 1.198 + 1.199 +/** 1.200 + * See documentation on UTF8_ERROR_VALUE_1 for details. 1.201 + * 1.202 + * @deprecated ICU 2.4. Obsolete, see utf_old.h. 1.203 + */ 1.204 +#define UTF8_ERROR_VALUE_2 0x9f 1.205 + 1.206 +/** 1.207 + * Error value for all UTFs. This code point value will be set by macros with error 1.208 + * checking if an error is detected. 1.209 + * 1.210 + * @deprecated ICU 2.4. Obsolete, see utf_old.h. 1.211 + */ 1.212 +#define UTF_ERROR_VALUE 0xffff 1.213 + 1.214 +/** 1.215 + * Is a given 32-bit code an error value 1.216 + * as returned by one of the macros for any UTF? 1.217 + * 1.218 + * @deprecated ICU 2.4. Obsolete, see utf_old.h. 1.219 + */ 1.220 +#define UTF_IS_ERROR(c) \ 1.221 + (((c)&0xfffe)==0xfffe || (c)==UTF8_ERROR_VALUE_1 || (c)==UTF8_ERROR_VALUE_2) 1.222 + 1.223 +/** 1.224 + * This is a combined macro: Is c a valid Unicode value _and_ not an error code? 1.225 + * 1.226 + * @deprecated ICU 2.4. Obsolete, see utf_old.h. 1.227 + */ 1.228 +#define UTF_IS_VALID(c) \ 1.229 + (UTF_IS_UNICODE_CHAR(c) && \ 1.230 + (c)!=UTF8_ERROR_VALUE_1 && (c)!=UTF8_ERROR_VALUE_2) 1.231 + 1.232 +/** 1.233 + * Is this code unit or code point a surrogate (U+d800..U+dfff)? 1.234 + * @deprecated ICU 2.4. 
Renamed to U_IS_SURROGATE and U16_IS_SURROGATE, see utf_old.h. 1.235 + */ 1.236 +#define UTF_IS_SURROGATE(uchar) (((uchar)&0xfffff800)==0xd800) 1.237 + 1.238 +/** 1.239 + * Is a given 32-bit code point a Unicode noncharacter? 1.240 + * 1.241 + * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_NONCHAR, see utf_old.h. 1.242 + */ 1.243 +#define UTF_IS_UNICODE_NONCHAR(c) \ 1.244 + ((c)>=0xfdd0 && \ 1.245 + ((uint32_t)(c)<=0xfdef || ((c)&0xfffe)==0xfffe) && \ 1.246 + (uint32_t)(c)<=0x10ffff) 1.247 + 1.248 +/** 1.249 + * Is a given 32-bit value a Unicode code point value (0..U+10ffff) 1.250 + * that can be assigned a character? 1.251 + * 1.252 + * Code points that are not characters include: 1.253 + * - single surrogate code points (U+d800..U+dfff, 2048 code points) 1.254 + * - the last two code points on each plane (U+__fffe and U+__ffff, 34 code points) 1.255 + * - U+fdd0..U+fdef (new with Unicode 3.1, 32 code points) 1.256 + * - the highest Unicode code point value is U+10ffff 1.257 + * 1.258 + * This means that all code points below U+d800 are character code points, 1.259 + * and that boundary is tested first for performance. 1.260 + * 1.261 + * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_CHAR, see utf_old.h. 1.262 + */ 1.263 +#define UTF_IS_UNICODE_CHAR(c) \ 1.264 + ((uint32_t)(c)<0xd800 || \ 1.265 + ((uint32_t)(c)>0xdfff && \ 1.266 + (uint32_t)(c)<=0x10ffff && \ 1.267 + !UTF_IS_UNICODE_NONCHAR(c))) 1.268 + 1.269 +/* Formerly utf8.h ---------------------------------------------------------- */ 1.270 + 1.271 +/** 1.272 + * Count the trail bytes for a UTF-8 lead byte. 1.273 + * @deprecated ICU 2.4. Renamed to U8_COUNT_TRAIL_BYTES, see utf_old.h. 1.274 + */ 1.275 +#define UTF8_COUNT_TRAIL_BYTES(leadByte) (utf8_countTrailBytes[(uint8_t)leadByte]) 1.276 + 1.277 +/** 1.278 + * Mask a UTF-8 lead byte, leave only the lower bits that form part of the code point value. 1.279 + * @deprecated ICU 2.4. Renamed to U8_MASK_LEAD_BYTE, see utf_old.h. 
1.280 + */ 1.281 +#define UTF8_MASK_LEAD_BYTE(leadByte, countTrailBytes) ((leadByte)&=(1<<(6-(countTrailBytes)))-1) 1.282 + 1.283 +/** Is this code point a single code unit (byte)? @deprecated ICU 2.4. Renamed to U8_IS_SINGLE, see utf_old.h. */ 1.284 +#define UTF8_IS_SINGLE(uchar) (((uchar)&0x80)==0) 1.285 +/** Is this code unit the lead code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_LEAD, see utf_old.h. */ 1.286 +#define UTF8_IS_LEAD(uchar) ((uint8_t)((uchar)-0xc0)<0x3e) 1.287 +/** Is this code unit a trailing code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_TRAIL, see utf_old.h. */ 1.288 +#define UTF8_IS_TRAIL(uchar) (((uchar)&0xc0)==0x80) 1.289 + 1.290 +/** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U8_LENGTH or test ((uint32_t)(c)>0x7f) instead, see utf_old.h. */ 1.291 +#define UTF8_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0x7f) 1.292 + 1.293 +/** 1.294 + * Given the lead character, how many bytes are taken by this code point. 1.295 + * ICU does not deal with code points >0x10ffff 1.296 + * unless necessary for advancing in the byte stream. 1.297 + * 1.298 + * These length macros take into account that for values >0x10ffff 1.299 + * the UTF8_APPEND_CHAR_SAFE macros would write the error code point 0xffff 1.300 + * with 3 bytes. 1.301 + * Code point comparisons need to be in uint32_t because UChar32 1.302 + * may be a signed type, and negative values must be recognized. 1.303 + * 1.304 + * @deprecated ICU 2.4. Use U8_LENGTH instead, see utf.h. 1.305 + */ 1.306 +#if 1 1.307 +# define UTF8_CHAR_LENGTH(c) \ 1.308 + ((uint32_t)(c)<=0x7f ? 1 : \ 1.309 + ((uint32_t)(c)<=0x7ff ? 2 : \ 1.310 + ((uint32_t)((c)-0x10000)>0xfffff ? 3 : 4) \ 1.311 + ) \ 1.312 + ) 1.313 +#else 1.314 +# define UTF8_CHAR_LENGTH(c) \ 1.315 + ((uint32_t)(c)<=0x7f ? 1 : \ 1.316 + ((uint32_t)(c)<=0x7ff ? 2 : \ 1.317 + ((uint32_t)(c)<=0xffff ? 3 : \ 1.318 + ((uint32_t)(c)<=0x10ffff ? 
4 : \ 1.319 + ((uint32_t)(c)<=0x3ffffff ? 5 : \ 1.320 + ((uint32_t)(c)<=0x7fffffff ? 6 : 3) \ 1.321 + ) \ 1.322 + ) \ 1.323 + ) \ 1.324 + ) \ 1.325 + ) 1.326 +#endif 1.327 + 1.328 +/** The maximum number of bytes per code point. @deprecated ICU 2.4. Renamed to U8_MAX_LENGTH, see utf_old.h. */ 1.329 +#define UTF8_MAX_CHAR_LENGTH 4 1.330 + 1.331 +/** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.332 +#define UTF8_ARRAY_SIZE(size) ((5*(size))/2) 1.333 + 1.334 +/** @deprecated ICU 2.4. Renamed to U8_GET_UNSAFE, see utf_old.h. */ 1.335 +#define UTF8_GET_CHAR_UNSAFE(s, i, c) { \ 1.336 + int32_t _utf8_get_char_unsafe_index=(int32_t)(i); \ 1.337 + UTF8_SET_CHAR_START_UNSAFE(s, _utf8_get_char_unsafe_index); \ 1.338 + UTF8_NEXT_CHAR_UNSAFE(s, _utf8_get_char_unsafe_index, c); \ 1.339 +} 1.340 + 1.341 +/** @deprecated ICU 2.4. Use U8_GET instead, see utf_old.h. */ 1.342 +#define UTF8_GET_CHAR_SAFE(s, start, i, length, c, strict) { \ 1.343 + int32_t _utf8_get_char_safe_index=(int32_t)(i); \ 1.344 + UTF8_SET_CHAR_START_SAFE(s, start, _utf8_get_char_safe_index); \ 1.345 + UTF8_NEXT_CHAR_SAFE(s, _utf8_get_char_safe_index, length, c, strict); \ 1.346 +} 1.347 + 1.348 +/** @deprecated ICU 2.4. Renamed to U8_NEXT_UNSAFE, see utf_old.h. */ 1.349 +#define UTF8_NEXT_CHAR_UNSAFE(s, i, c) { \ 1.350 + (c)=(s)[(i)++]; \ 1.351 + if((uint8_t)((c)-0xc0)<0x35) { \ 1.352 + uint8_t __count=UTF8_COUNT_TRAIL_BYTES(c); \ 1.353 + UTF8_MASK_LEAD_BYTE(c, __count); \ 1.354 + switch(__count) { \ 1.355 + /* each following branch falls through to the next one */ \ 1.356 + case 3: \ 1.357 + (c)=((c)<<6)|((s)[(i)++]&0x3f); \ 1.358 + case 2: \ 1.359 + (c)=((c)<<6)|((s)[(i)++]&0x3f); \ 1.360 + case 1: \ 1.361 + (c)=((c)<<6)|((s)[(i)++]&0x3f); \ 1.362 + /* no other branches to optimize switch() */ \ 1.363 + break; \ 1.364 + } \ 1.365 + } \ 1.366 +} 1.367 + 1.368 +/** @deprecated ICU 2.4. Renamed to U8_APPEND_UNSAFE, see utf_old.h. 
*/ 1.369 +#define UTF8_APPEND_CHAR_UNSAFE(s, i, c) { \ 1.370 + if((uint32_t)(c)<=0x7f) { \ 1.371 + (s)[(i)++]=(uint8_t)(c); \ 1.372 + } else { \ 1.373 + if((uint32_t)(c)<=0x7ff) { \ 1.374 + (s)[(i)++]=(uint8_t)(((c)>>6)|0xc0); \ 1.375 + } else { \ 1.376 + if((uint32_t)(c)<=0xffff) { \ 1.377 + (s)[(i)++]=(uint8_t)(((c)>>12)|0xe0); \ 1.378 + } else { \ 1.379 + (s)[(i)++]=(uint8_t)(((c)>>18)|0xf0); \ 1.380 + (s)[(i)++]=(uint8_t)((((c)>>12)&0x3f)|0x80); \ 1.381 + } \ 1.382 + (s)[(i)++]=(uint8_t)((((c)>>6)&0x3f)|0x80); \ 1.383 + } \ 1.384 + (s)[(i)++]=(uint8_t)(((c)&0x3f)|0x80); \ 1.385 + } \ 1.386 +} 1.387 + 1.388 +/** @deprecated ICU 2.4. Renamed to U8_FWD_1_UNSAFE, see utf_old.h. */ 1.389 +#define UTF8_FWD_1_UNSAFE(s, i) { \ 1.390 + (i)+=1+UTF8_COUNT_TRAIL_BYTES((s)[i]); \ 1.391 +} 1.392 + 1.393 +/** @deprecated ICU 2.4. Renamed to U8_FWD_N_UNSAFE, see utf_old.h. */ 1.394 +#define UTF8_FWD_N_UNSAFE(s, i, n) { \ 1.395 + int32_t __N=(n); \ 1.396 + while(__N>0) { \ 1.397 + UTF8_FWD_1_UNSAFE(s, i); \ 1.398 + --__N; \ 1.399 + } \ 1.400 +} 1.401 + 1.402 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_START_UNSAFE, see utf_old.h. */ 1.403 +#define UTF8_SET_CHAR_START_UNSAFE(s, i) { \ 1.404 + while(UTF8_IS_TRAIL((s)[i])) { --(i); } \ 1.405 +} 1.406 + 1.407 +/** @deprecated ICU 2.4. Use U8_NEXT instead, see utf_old.h. */ 1.408 +#define UTF8_NEXT_CHAR_SAFE(s, i, length, c, strict) { \ 1.409 + (c)=(s)[(i)++]; \ 1.410 + if((c)>=0x80) { \ 1.411 + if(UTF8_IS_LEAD(c)) { \ 1.412 + (c)=utf8_nextCharSafeBody(s, &(i), (int32_t)(length), c, strict); \ 1.413 + } else { \ 1.414 + (c)=UTF8_ERROR_VALUE_1; \ 1.415 + } \ 1.416 + } \ 1.417 +} 1.418 + 1.419 +/** @deprecated ICU 2.4. Use U8_APPEND instead, see utf_old.h. 
*/ 1.420 +#define UTF8_APPEND_CHAR_SAFE(s, i, length, c) { \ 1.421 + if((uint32_t)(c)<=0x7f) { \ 1.422 + (s)[(i)++]=(uint8_t)(c); \ 1.423 + } else { \ 1.424 + (i)=utf8_appendCharSafeBody(s, (int32_t)(i), (int32_t)(length), c, NULL); \ 1.425 + } \ 1.426 +} 1.427 + 1.428 +/** @deprecated ICU 2.4. Renamed to U8_FWD_1, see utf_old.h. */ 1.429 +#define UTF8_FWD_1_SAFE(s, i, length) U8_FWD_1(s, i, length) 1.430 + 1.431 +/** @deprecated ICU 2.4. Renamed to U8_FWD_N, see utf_old.h. */ 1.432 +#define UTF8_FWD_N_SAFE(s, i, length, n) U8_FWD_N(s, i, length, n) 1.433 + 1.434 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_START, see utf_old.h. */ 1.435 +#define UTF8_SET_CHAR_START_SAFE(s, start, i) U8_SET_CP_START(s, start, i) 1.436 + 1.437 +/** @deprecated ICU 2.4. Renamed to U8_PREV_UNSAFE, see utf_old.h. */ 1.438 +#define UTF8_PREV_CHAR_UNSAFE(s, i, c) { \ 1.439 + (c)=(s)[--(i)]; \ 1.440 + if(UTF8_IS_TRAIL(c)) { \ 1.441 + uint8_t __b, __count=1, __shift=6; \ 1.442 +\ 1.443 + /* c is a trail byte */ \ 1.444 + (c)&=0x3f; \ 1.445 + for(;;) { \ 1.446 + __b=(s)[--(i)]; \ 1.447 + if(__b>=0xc0) { \ 1.448 + UTF8_MASK_LEAD_BYTE(__b, __count); \ 1.449 + (c)|=(UChar32)__b<<__shift; \ 1.450 + break; \ 1.451 + } else { \ 1.452 + (c)|=(UChar32)(__b&0x3f)<<__shift; \ 1.453 + ++__count; \ 1.454 + __shift+=6; \ 1.455 + } \ 1.456 + } \ 1.457 + } \ 1.458 +} 1.459 + 1.460 +/** @deprecated ICU 2.4. Renamed to U8_BACK_1_UNSAFE, see utf_old.h. */ 1.461 +#define UTF8_BACK_1_UNSAFE(s, i) { \ 1.462 + while(UTF8_IS_TRAIL((s)[--(i)])) {} \ 1.463 +} 1.464 + 1.465 +/** @deprecated ICU 2.4. Renamed to U8_BACK_N_UNSAFE, see utf_old.h. */ 1.466 +#define UTF8_BACK_N_UNSAFE(s, i, n) { \ 1.467 + int32_t __N=(n); \ 1.468 + while(__N>0) { \ 1.469 + UTF8_BACK_1_UNSAFE(s, i); \ 1.470 + --__N; \ 1.471 + } \ 1.472 +} 1.473 + 1.474 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT_UNSAFE, see utf_old.h. 
*/ 1.475 +#define UTF8_SET_CHAR_LIMIT_UNSAFE(s, i) { \ 1.476 + UTF8_BACK_1_UNSAFE(s, i); \ 1.477 + UTF8_FWD_1_UNSAFE(s, i); \ 1.478 +} 1.479 + 1.480 +/** @deprecated ICU 2.4. Use U8_PREV instead, see utf_old.h. */ 1.481 +#define UTF8_PREV_CHAR_SAFE(s, start, i, c, strict) { \ 1.482 + (c)=(s)[--(i)]; \ 1.483 + if((c)>=0x80) { \ 1.484 + if((c)<=0xbf) { \ 1.485 + (c)=utf8_prevCharSafeBody(s, start, &(i), c, strict); \ 1.486 + } else { \ 1.487 + (c)=UTF8_ERROR_VALUE_1; \ 1.488 + } \ 1.489 + } \ 1.490 +} 1.491 + 1.492 +/** @deprecated ICU 2.4. Renamed to U8_BACK_1, see utf_old.h. */ 1.493 +#define UTF8_BACK_1_SAFE(s, start, i) U8_BACK_1(s, start, i) 1.494 + 1.495 +/** @deprecated ICU 2.4. Renamed to U8_BACK_N, see utf_old.h. */ 1.496 +#define UTF8_BACK_N_SAFE(s, start, i, n) U8_BACK_N(s, start, i, n) 1.497 + 1.498 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT, see utf_old.h. */ 1.499 +#define UTF8_SET_CHAR_LIMIT_SAFE(s, start, i, length) U8_SET_CP_LIMIT(s, start, i, length) 1.500 + 1.501 +/* Formerly utf16.h --------------------------------------------------------- */ 1.502 + 1.503 +/** Is uchar a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h. */ 1.504 +#define UTF_IS_FIRST_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xd800) 1.505 + 1.506 +/** Is uchar a second/trail surrogate? @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h. */ 1.507 +#define UTF_IS_SECOND_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xdc00) 1.508 + 1.509 +/** Assuming c is a surrogate, is it a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_SURROGATE_LEAD and U16_IS_SURROGATE_LEAD, see utf_old.h. */ 1.510 +#define UTF_IS_SURROGATE_FIRST(c) (((c)&0x400)==0) 1.511 + 1.512 +/** Helper constant for UTF16_GET_PAIR_VALUE. @deprecated ICU 2.4. Renamed to U16_SURROGATE_OFFSET, see utf_old.h. 
*/ 1.513 +#define UTF_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000) 1.514 + 1.515 +/** Get the UTF-32 value from the surrogate code units. @deprecated ICU 2.4. Renamed to U16_GET_SUPPLEMENTARY, see utf_old.h. */ 1.516 +#define UTF16_GET_PAIR_VALUE(first, second) \ 1.517 + (((first)<<10UL)+(second)-UTF_SURROGATE_OFFSET) 1.518 + 1.519 +/** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */ 1.520 +#define UTF_FIRST_SURROGATE(supplementary) (UChar)(((supplementary)>>10)+0xd7c0) 1.521 + 1.522 +/** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */ 1.523 +#define UTF_SECOND_SURROGATE(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00) 1.524 + 1.525 +/** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */ 1.526 +#define UTF16_LEAD(supplementary) UTF_FIRST_SURROGATE(supplementary) 1.527 + 1.528 +/** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */ 1.529 +#define UTF16_TRAIL(supplementary) UTF_SECOND_SURROGATE(supplementary) 1.530 + 1.531 +/** @deprecated ICU 2.4. Renamed to U16_IS_SINGLE, see utf_old.h. */ 1.532 +#define UTF16_IS_SINGLE(uchar) !UTF_IS_SURROGATE(uchar) 1.533 + 1.534 +/** @deprecated ICU 2.4. Renamed to U16_IS_LEAD, see utf_old.h. */ 1.535 +#define UTF16_IS_LEAD(uchar) UTF_IS_FIRST_SURROGATE(uchar) 1.536 + 1.537 +/** @deprecated ICU 2.4. Renamed to U16_IS_TRAIL, see utf_old.h. */ 1.538 +#define UTF16_IS_TRAIL(uchar) UTF_IS_SECOND_SURROGATE(uchar) 1.539 + 1.540 +/** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead, see utf_old.h. */ 1.541 +#define UTF16_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0xffff) 1.542 + 1.543 +/** @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h. */ 1.544 +#define UTF16_CHAR_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2) 1.545 + 1.546 +/** @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h. 
*/ 1.547 +#define UTF16_MAX_CHAR_LENGTH 2 1.548 + 1.549 +/** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.550 +#define UTF16_ARRAY_SIZE(size) (size) 1.551 + 1.552 +/** 1.553 + * Get a single code point from an offset that points to any 1.554 + * of the code units that belong to that code point. 1.555 + * Assume 0<=i<length. 1.556 + * 1.557 + * This could be used for iteration together with 1.558 + * UTF16_CHAR_LENGTH() and UTF_IS_ERROR(), 1.559 + * but the use of UTF16_NEXT_CHAR[_UNSAFE]() and 1.560 + * UTF16_PREV_CHAR[_UNSAFE]() is more efficient for that. 1.561 + * @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h. 1.562 + */ 1.563 +#define UTF16_GET_CHAR_UNSAFE(s, i, c) { \ 1.564 + (c)=(s)[i]; \ 1.565 + if(UTF_IS_SURROGATE(c)) { \ 1.566 + if(UTF_IS_SURROGATE_FIRST(c)) { \ 1.567 + (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)+1]); \ 1.568 + } else { \ 1.569 + (c)=UTF16_GET_PAIR_VALUE((s)[(i)-1], (c)); \ 1.570 + } \ 1.571 + } \ 1.572 +} 1.573 + 1.574 +/** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. 
*/ 1.575 +#define UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict) { \ 1.576 + (c)=(s)[i]; \ 1.577 + if(UTF_IS_SURROGATE(c)) { \ 1.578 + uint16_t __c2; \ 1.579 + if(UTF_IS_SURROGATE_FIRST(c)) { \ 1.580 + if((i)+1<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)+1])) { \ 1.581 + (c)=UTF16_GET_PAIR_VALUE((c), __c2); \ 1.582 + /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \ 1.583 + } else if(strict) {\ 1.584 + /* unmatched first surrogate */ \ 1.585 + (c)=UTF_ERROR_VALUE; \ 1.586 + } \ 1.587 + } else { \ 1.588 + if((i)-1>=(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \ 1.589 + (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \ 1.590 + /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \ 1.591 + } else if(strict) {\ 1.592 + /* unmatched second surrogate */ \ 1.593 + (c)=UTF_ERROR_VALUE; \ 1.594 + } \ 1.595 + } \ 1.596 + } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \ 1.597 + (c)=UTF_ERROR_VALUE; \ 1.598 + } \ 1.599 +} 1.600 + 1.601 +/** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */ 1.602 +#define UTF16_NEXT_CHAR_UNSAFE(s, i, c) { \ 1.603 + (c)=(s)[(i)++]; \ 1.604 + if(UTF_IS_FIRST_SURROGATE(c)) { \ 1.605 + (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)++]); \ 1.606 + } \ 1.607 +} 1.608 + 1.609 +/** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */ 1.610 +#define UTF16_APPEND_CHAR_UNSAFE(s, i, c) { \ 1.611 + if((uint32_t)(c)<=0xffff) { \ 1.612 + (s)[(i)++]=(uint16_t)(c); \ 1.613 + } else { \ 1.614 + (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \ 1.615 + (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \ 1.616 + } \ 1.617 +} 1.618 + 1.619 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */ 1.620 +#define UTF16_FWD_1_UNSAFE(s, i) { \ 1.621 + if(UTF_IS_FIRST_SURROGATE((s)[(i)++])) { \ 1.622 + ++(i); \ 1.623 + } \ 1.624 +} 1.625 + 1.626 +/** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. 
*/ 1.627 +#define UTF16_FWD_N_UNSAFE(s, i, n) { \ 1.628 + int32_t __N=(n); \ 1.629 + while(__N>0) { \ 1.630 + UTF16_FWD_1_UNSAFE(s, i); \ 1.631 + --__N; \ 1.632 + } \ 1.633 +} 1.634 + 1.635 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */ 1.636 +#define UTF16_SET_CHAR_START_UNSAFE(s, i) { \ 1.637 + if(UTF_IS_SECOND_SURROGATE((s)[i])) { \ 1.638 + --(i); \ 1.639 + } \ 1.640 +} 1.641 + 1.642 +/** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */ 1.643 +#define UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict) { \ 1.644 + (c)=(s)[(i)++]; \ 1.645 + if(UTF_IS_FIRST_SURROGATE(c)) { \ 1.646 + uint16_t __c2; \ 1.647 + if((i)<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)])) { \ 1.648 + ++(i); \ 1.649 + (c)=UTF16_GET_PAIR_VALUE((c), __c2); \ 1.650 + /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \ 1.651 + } else if(strict) {\ 1.652 + /* unmatched first surrogate */ \ 1.653 + (c)=UTF_ERROR_VALUE; \ 1.654 + } \ 1.655 + } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \ 1.656 + /* unmatched second surrogate or other non-character */ \ 1.657 + (c)=UTF_ERROR_VALUE; \ 1.658 + } \ 1.659 +} 1.660 + 1.661 +/** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */ 1.662 +#define UTF16_APPEND_CHAR_SAFE(s, i, length, c) { \ 1.663 + if((uint32_t)(c)<=0xffff) { \ 1.664 + (s)[(i)++]=(uint16_t)(c); \ 1.665 + } else if((uint32_t)(c)<=0x10ffff) { \ 1.666 + if((i)+1<(length)) { \ 1.667 + (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \ 1.668 + (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \ 1.669 + } else /* not enough space */ { \ 1.670 + (s)[(i)++]=UTF_ERROR_VALUE; \ 1.671 + } \ 1.672 + } else /* c>0x10ffff, write error value */ { \ 1.673 + (s)[(i)++]=UTF_ERROR_VALUE; \ 1.674 + } \ 1.675 +} 1.676 + 1.677 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */ 1.678 +#define UTF16_FWD_1_SAFE(s, i, length) U16_FWD_1(s, i, length) 1.679 + 1.680 +/** @deprecated ICU 2.4. 
Renamed to U16_FWD_N, see utf_old.h. */ 1.681 +#define UTF16_FWD_N_SAFE(s, i, length, n) U16_FWD_N(s, i, length, n) 1.682 + 1.683 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. */ 1.684 +#define UTF16_SET_CHAR_START_SAFE(s, start, i) U16_SET_CP_START(s, start, i) 1.685 + 1.686 +/** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */ 1.687 +#define UTF16_PREV_CHAR_UNSAFE(s, i, c) { \ 1.688 + (c)=(s)[--(i)]; \ 1.689 + if(UTF_IS_SECOND_SURROGATE(c)) { \ 1.690 + (c)=UTF16_GET_PAIR_VALUE((s)[--(i)], (c)); \ 1.691 + } \ 1.692 +} 1.693 + 1.694 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */ 1.695 +#define UTF16_BACK_1_UNSAFE(s, i) { \ 1.696 + if(UTF_IS_SECOND_SURROGATE((s)[--(i)])) { \ 1.697 + --(i); \ 1.698 + } \ 1.699 +} 1.700 + 1.701 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */ 1.702 +#define UTF16_BACK_N_UNSAFE(s, i, n) { \ 1.703 + int32_t __N=(n); \ 1.704 + while(__N>0) { \ 1.705 + UTF16_BACK_1_UNSAFE(s, i); \ 1.706 + --__N; \ 1.707 + } \ 1.708 +} 1.709 + 1.710 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */ 1.711 +#define UTF16_SET_CHAR_LIMIT_UNSAFE(s, i) { \ 1.712 + if(UTF_IS_FIRST_SURROGATE((s)[(i)-1])) { \ 1.713 + ++(i); \ 1.714 + } \ 1.715 +} 1.716 + 1.717 +/** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. 
*/ 1.718 +#define UTF16_PREV_CHAR_SAFE(s, start, i, c, strict) { \ 1.719 + (c)=(s)[--(i)]; \ 1.720 + if(UTF_IS_SECOND_SURROGATE(c)) { \ 1.721 + uint16_t __c2; \ 1.722 + if((i)>(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \ 1.723 + --(i); \ 1.724 + (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \ 1.725 + /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \ 1.726 + } else if(strict) {\ 1.727 + /* unmatched second surrogate */ \ 1.728 + (c)=UTF_ERROR_VALUE; \ 1.729 + } \ 1.730 + } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \ 1.731 + /* unmatched first surrogate or other non-character */ \ 1.732 + (c)=UTF_ERROR_VALUE; \ 1.733 + } \ 1.734 +} 1.735 + 1.736 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */ 1.737 +#define UTF16_BACK_1_SAFE(s, start, i) U16_BACK_1(s, start, i) 1.738 + 1.739 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */ 1.740 +#define UTF16_BACK_N_SAFE(s, start, i, n) U16_BACK_N(s, start, i, n) 1.741 + 1.742 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */ 1.743 +#define UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length) 1.744 + 1.745 +/* Formerly utf32.h --------------------------------------------------------- */ 1.746 + 1.747 +/* 1.748 +* Old documentation: 1.749 +* 1.750 +* This file defines macros to deal with UTF-32 code units and code points. 1.751 +* Signatures and semantics are the same as for the similarly named macros 1.752 +* in utf16.h. 1.753 +* utf32.h is included by utf.h after unicode/umachine.h</p> 1.754 +* and some common definitions. 1.755 +* <p><b>Usage:</b> ICU coding guidelines for if() statements should be followed when using these macros. 1.756 +* Compound statements (curly braces {}) must be used for if-else-while... 
1.757 +* bodies and all macro statements should be terminated with semicolon.</p> 1.758 +*/ 1.759 + 1.760 +/* internal definitions ----------------------------------------------------- */ 1.761 + 1.762 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.763 +#define UTF32_IS_SAFE(c, strict) \ 1.764 + (!(strict) ? \ 1.765 + (uint32_t)(c)<=0x10ffff : \ 1.766 + UTF_IS_UNICODE_CHAR(c)) 1.767 + 1.768 +/* 1.769 + * For the semantics of all of these macros, see utf16.h. 1.770 + * The UTF-32 versions are trivial because any code point is 1.771 + * encoded using exactly one code unit. 1.772 + */ 1.773 + 1.774 +/* single-code point definitions -------------------------------------------- */ 1.775 + 1.776 +/* classes of code unit values */ 1.777 + 1.778 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.779 +#define UTF32_IS_SINGLE(uchar) 1 1.780 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.781 +#define UTF32_IS_LEAD(uchar) 0 1.782 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.783 +#define UTF32_IS_TRAIL(uchar) 0 1.784 + 1.785 +/* number of code units per code point */ 1.786 + 1.787 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.788 +#define UTF32_NEED_MULTIPLE_UCHAR(c) 0 1.789 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.790 +#define UTF32_CHAR_LENGTH(c) 1 1.791 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.792 +#define UTF32_MAX_CHAR_LENGTH 1 1.793 + 1.794 +/* average number of code units compared to UTF-16 */ 1.795 + 1.796 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.797 +#define UTF32_ARRAY_SIZE(size) (size) 1.798 + 1.799 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.800 +#define UTF32_GET_CHAR_UNSAFE(s, i, c) { \ 1.801 + (c)=(s)[i]; \ 1.802 +} 1.803 + 1.804 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. 
*/ 1.805 +#define UTF32_GET_CHAR_SAFE(s, start, i, length, c, strict) { \ 1.806 + (c)=(s)[i]; \ 1.807 + if(!UTF32_IS_SAFE(c, strict)) { \ 1.808 + (c)=UTF_ERROR_VALUE; \ 1.809 + } \ 1.810 +} 1.811 + 1.812 +/* definitions with forward iteration --------------------------------------- */ 1.813 + 1.814 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.815 +#define UTF32_NEXT_CHAR_UNSAFE(s, i, c) { \ 1.816 + (c)=(s)[(i)++]; \ 1.817 +} 1.818 + 1.819 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.820 +#define UTF32_APPEND_CHAR_UNSAFE(s, i, c) { \ 1.821 + (s)[(i)++]=(c); \ 1.822 +} 1.823 + 1.824 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.825 +#define UTF32_FWD_1_UNSAFE(s, i) { \ 1.826 + ++(i); \ 1.827 +} 1.828 + 1.829 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.830 +#define UTF32_FWD_N_UNSAFE(s, i, n) { \ 1.831 + (i)+=(n); \ 1.832 +} 1.833 + 1.834 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.835 +#define UTF32_SET_CHAR_START_UNSAFE(s, i) { \ 1.836 +} 1.837 + 1.838 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.839 +#define UTF32_NEXT_CHAR_SAFE(s, i, length, c, strict) { \ 1.840 + (c)=(s)[(i)++]; \ 1.841 + if(!UTF32_IS_SAFE(c, strict)) { \ 1.842 + (c)=UTF_ERROR_VALUE; \ 1.843 + } \ 1.844 +} 1.845 + 1.846 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.847 +#define UTF32_APPEND_CHAR_SAFE(s, i, length, c) { \ 1.848 + if((uint32_t)(c)<=0x10ffff) { \ 1.849 + (s)[(i)++]=(c); \ 1.850 + } else /* c>0x10ffff, write 0xfffd */ { \ 1.851 + (s)[(i)++]=0xfffd; \ 1.852 + } \ 1.853 +} 1.854 + 1.855 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.856 +#define UTF32_FWD_1_SAFE(s, i, length) { \ 1.857 + ++(i); \ 1.858 +} 1.859 + 1.860 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.861 +#define UTF32_FWD_N_SAFE(s, i, length, n) { \ 1.862 + if(((i)+=(n))>(length)) { \ 1.863 + (i)=(length); \ 1.864 + } \ 1.865 +} 1.866 + 1.867 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. 
*/ 1.868 +#define UTF32_SET_CHAR_START_SAFE(s, start, i) { \ 1.869 +} 1.870 + 1.871 +/* definitions with backward iteration -------------------------------------- */ 1.872 + 1.873 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.874 +#define UTF32_PREV_CHAR_UNSAFE(s, i, c) { \ 1.875 + (c)=(s)[--(i)]; \ 1.876 +} 1.877 + 1.878 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.879 +#define UTF32_BACK_1_UNSAFE(s, i) { \ 1.880 + --(i); \ 1.881 +} 1.882 + 1.883 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.884 +#define UTF32_BACK_N_UNSAFE(s, i, n) { \ 1.885 + (i)-=(n); \ 1.886 +} 1.887 + 1.888 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.889 +#define UTF32_SET_CHAR_LIMIT_UNSAFE(s, i) { \ 1.890 +} 1.891 + 1.892 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.893 +#define UTF32_PREV_CHAR_SAFE(s, start, i, c, strict) { \ 1.894 + (c)=(s)[--(i)]; \ 1.895 + if(!UTF32_IS_SAFE(c, strict)) { \ 1.896 + (c)=UTF_ERROR_VALUE; \ 1.897 + } \ 1.898 +} 1.899 + 1.900 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.901 +#define UTF32_BACK_1_SAFE(s, start, i) { \ 1.902 + --(i); \ 1.903 +} 1.904 + 1.905 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.906 +#define UTF32_BACK_N_SAFE(s, start, i, n) { \ 1.907 + (i)-=(n); \ 1.908 + if((i)<(start)) { \ 1.909 + (i)=(start); \ 1.910 + } \ 1.911 +} 1.912 + 1.913 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */ 1.914 +#define UTF32_SET_CHAR_LIMIT_SAFE(s, i, length) { \ 1.915 +} 1.916 + 1.917 +/* Formerly utf.h, part 2 --------------------------------------------------- */ 1.918 + 1.919 +/** 1.920 + * Estimate the number of code units for a string based on the number of UTF-16 code units. 1.921 + * 1.922 + * @deprecated ICU 2.4. Obsolete, see utf_old.h. 1.923 + */ 1.924 +#define UTF_ARRAY_SIZE(size) UTF16_ARRAY_SIZE(size) 1.925 + 1.926 +/** @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h. 
*/ 1.927 +#define UTF_GET_CHAR_UNSAFE(s, i, c) UTF16_GET_CHAR_UNSAFE(s, i, c) 1.928 + 1.929 +/** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. */ 1.930 +#define UTF_GET_CHAR_SAFE(s, start, i, length, c, strict) UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict) 1.931 + 1.932 + 1.933 +/** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */ 1.934 +#define UTF_NEXT_CHAR_UNSAFE(s, i, c) UTF16_NEXT_CHAR_UNSAFE(s, i, c) 1.935 + 1.936 +/** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */ 1.937 +#define UTF_NEXT_CHAR_SAFE(s, i, length, c, strict) UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict) 1.938 + 1.939 + 1.940 +/** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */ 1.941 +#define UTF_APPEND_CHAR_UNSAFE(s, i, c) UTF16_APPEND_CHAR_UNSAFE(s, i, c) 1.942 + 1.943 +/** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */ 1.944 +#define UTF_APPEND_CHAR_SAFE(s, i, length, c) UTF16_APPEND_CHAR_SAFE(s, i, length, c) 1.945 + 1.946 + 1.947 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */ 1.948 +#define UTF_FWD_1_UNSAFE(s, i) UTF16_FWD_1_UNSAFE(s, i) 1.949 + 1.950 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */ 1.951 +#define UTF_FWD_1_SAFE(s, i, length) UTF16_FWD_1_SAFE(s, i, length) 1.952 + 1.953 + 1.954 +/** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. */ 1.955 +#define UTF_FWD_N_UNSAFE(s, i, n) UTF16_FWD_N_UNSAFE(s, i, n) 1.956 + 1.957 +/** @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. */ 1.958 +#define UTF_FWD_N_SAFE(s, i, length, n) UTF16_FWD_N_SAFE(s, i, length, n) 1.959 + 1.960 + 1.961 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */ 1.962 +#define UTF_SET_CHAR_START_UNSAFE(s, i) UTF16_SET_CHAR_START_UNSAFE(s, i) 1.963 + 1.964 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. 
*/ 1.965 +#define UTF_SET_CHAR_START_SAFE(s, start, i) UTF16_SET_CHAR_START_SAFE(s, start, i) 1.966 + 1.967 + 1.968 +/** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */ 1.969 +#define UTF_PREV_CHAR_UNSAFE(s, i, c) UTF16_PREV_CHAR_UNSAFE(s, i, c) 1.970 + 1.971 +/** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. */ 1.972 +#define UTF_PREV_CHAR_SAFE(s, start, i, c, strict) UTF16_PREV_CHAR_SAFE(s, start, i, c, strict) 1.973 + 1.974 + 1.975 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */ 1.976 +#define UTF_BACK_1_UNSAFE(s, i) UTF16_BACK_1_UNSAFE(s, i) 1.977 + 1.978 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */ 1.979 +#define UTF_BACK_1_SAFE(s, start, i) UTF16_BACK_1_SAFE(s, start, i) 1.980 + 1.981 + 1.982 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */ 1.983 +#define UTF_BACK_N_UNSAFE(s, i, n) UTF16_BACK_N_UNSAFE(s, i, n) 1.984 + 1.985 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */ 1.986 +#define UTF_BACK_N_SAFE(s, start, i, n) UTF16_BACK_N_SAFE(s, start, i, n) 1.987 + 1.988 + 1.989 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */ 1.990 +#define UTF_SET_CHAR_LIMIT_UNSAFE(s, i) UTF16_SET_CHAR_LIMIT_UNSAFE(s, i) 1.991 + 1.992 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */ 1.993 +#define UTF_SET_CHAR_LIMIT_SAFE(s, start, i, length) UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length) 1.994 + 1.995 +/* Define default macros (UTF-16 "safe") ------------------------------------ */ 1.996 + 1.997 +/** 1.998 + * Does this code unit alone encode a code point (BMP, not a surrogate)? 1.999 + * Same as UTF16_IS_SINGLE. 1.1000 + * @deprecated ICU 2.4. Renamed to U_IS_SINGLE and U16_IS_SINGLE, see utf_old.h. 1.1001 + */ 1.1002 +#define UTF_IS_SINGLE(uchar) U16_IS_SINGLE(uchar) 1.1003 + 1.1004 +/** 1.1005 + * Is this code unit the first one of several (a lead surrogate)? 1.1006 + * Same as UTF16_IS_LEAD. 
1.1007 + * @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h. 1.1008 + */ 1.1009 +#define UTF_IS_LEAD(uchar) U16_IS_LEAD(uchar) 1.1010 + 1.1011 +/** 1.1012 + * Is this code unit one of several but not the first one (a trail surrogate)? 1.1013 + * Same as UTF16_IS_TRAIL. 1.1014 + * @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h. 1.1015 + */ 1.1016 +#define UTF_IS_TRAIL(uchar) U16_IS_TRAIL(uchar) 1.1017 + 1.1018 +/** 1.1019 + * Does this code point require multiple code units (is it a supplementary code point)? 1.1020 + * Same as UTF16_NEED_MULTIPLE_UCHAR. 1.1021 + * @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead. 1.1022 + */ 1.1023 +#define UTF_NEED_MULTIPLE_UCHAR(c) UTF16_NEED_MULTIPLE_UCHAR(c) 1.1024 + 1.1025 +/** 1.1026 + * How many code units are used to encode this code point (1 or 2)? 1.1027 + * Same as UTF16_CHAR_LENGTH. 1.1028 + * @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h. 1.1029 + */ 1.1030 +#define UTF_CHAR_LENGTH(c) U16_LENGTH(c) 1.1031 + 1.1032 +/** 1.1033 + * How many code units are used at most for any Unicode code point (2)? 1.1034 + * Same as UTF16_MAX_CHAR_LENGTH. 1.1035 + * @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h. 1.1036 + */ 1.1037 +#define UTF_MAX_CHAR_LENGTH U16_MAX_LENGTH 1.1038 + 1.1039 +/** 1.1040 + * Set c to the code point that contains the code unit i. 1.1041 + * i could point to the lead or the trail surrogate for the code point. 1.1042 + * i is not modified. 1.1043 + * Same as UTF16_GET_CHAR. 1.1044 + * \pre 0<=i<length 1.1045 + * 1.1046 + * @deprecated ICU 2.4. Renamed to U16_GET, see utf_old.h. 1.1047 + */ 1.1048 +#define UTF_GET_CHAR(s, start, i, length, c) U16_GET(s, start, i, length, c) 1.1049 + 1.1050 +/** 1.1051 + * Set c to the code point that starts at code unit i 1.1052 + * and advance i to beyond the code units of this code point (post-increment). 1.1053 + * i must point to the first code unit of a code point. 
1.1054 + * Otherwise c is set to the trail unit (surrogate) itself. 1.1055 + * Same as UTF16_NEXT_CHAR. 1.1056 + * \pre 0<=i<length 1.1057 + * \post 0<i<=length 1.1058 + * 1.1059 + * @deprecated ICU 2.4. Renamed to U16_NEXT, see utf_old.h. 1.1060 + */ 1.1061 +#define UTF_NEXT_CHAR(s, i, length, c) U16_NEXT(s, i, length, c) 1.1062 + 1.1063 +/** 1.1064 + * Append the code units of code point c to the string at index i 1.1065 + * and advance i to beyond the new code units (post-increment). 1.1066 + * The code units beginning at index i will be overwritten. 1.1067 + * Same as UTF16_APPEND_CHAR. 1.1068 + * \pre 0<=c<=0x10ffff 1.1069 + * \pre 0<=i<length 1.1070 + * \post 0<i<=length 1.1071 + * 1.1072 + * @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. 1.1073 + */ 1.1074 +#define UTF_APPEND_CHAR(s, i, length, c) UTF16_APPEND_CHAR_SAFE(s, i, length, c) 1.1075 + 1.1076 +/** 1.1077 + * Advance i to beyond the code units of the code point that begins at i. 1.1078 + * I.e., advance i by one code point. 1.1079 + * Same as UTF16_FWD_1. 1.1080 + * \pre 0<=i<length 1.1081 + * \post 0<i<=length 1.1082 + * 1.1083 + * @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. 1.1084 + */ 1.1085 +#define UTF_FWD_1(s, i, length) U16_FWD_1(s, i, length) 1.1086 + 1.1087 +/** 1.1088 + * Advance i to beyond the code units of the n code points where the first one begins at i. 1.1089 + * I.e., advance i by n code points. 1.1090 + * Same as UTF16_FWD_N. 1.1091 + * \pre 0<=i<length 1.1092 + * \post 0<i<=length 1.1093 + * 1.1094 + * @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. 1.1095 + */ 1.1096 +#define UTF_FWD_N(s, i, length, n) U16_FWD_N(s, i, length, n) 1.1097 + 1.1098 +/** 1.1099 + * Take the random-access index i and adjust it so that it points to the beginning 1.1100 + * of a code point. 1.1101 + * The input index points to any code unit of a code point and is moved to point to 1.1102 + * the first code unit of the same code point. i is never incremented.
1.1103 + * In other words, if i points to a trail surrogate that is preceded by a matching 1.1104 + * lead surrogate, then i is decremented. Otherwise it is not modified. 1.1105 + * This can be used to start an iteration with UTF_NEXT_CHAR() from a random index. 1.1106 + * Same as UTF16_SET_CHAR_START. 1.1107 + * \pre start<=i<length 1.1108 + * \post start<=i<length 1.1109 + * 1.1110 + * @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. 1.1111 + */ 1.1112 +#define UTF_SET_CHAR_START(s, start, i) U16_SET_CP_START(s, start, i) 1.1113 + 1.1114 +/** 1.1115 + * Set c to the code point that has code units before i 1.1116 + * and move i backward (towards the beginning of the string) 1.1117 + * to the first code unit of this code point (pre-decrement). 1.1118 + * i must point to the first code unit after the last unit of a code point (i==length is allowed). 1.1119 + * Same as UTF16_PREV_CHAR. 1.1120 + * \pre start<i<=length 1.1121 + * \post start<=i<length 1.1122 + * 1.1123 + * @deprecated ICU 2.4. Renamed to U16_PREV, see utf_old.h. 1.1124 + */ 1.1125 +#define UTF_PREV_CHAR(s, start, i, c) U16_PREV(s, start, i, c) 1.1126 + 1.1127 +/** 1.1128 + * Move i backward (towards the beginning of the string) 1.1129 + * to the first code unit of the code point that has code units before i. 1.1130 + * I.e., move i backward by one code point. 1.1131 + * i must point to the first code unit after the last unit of a code point (i==length is allowed). 1.1132 + * Same as UTF16_BACK_1. 1.1133 + * \pre start<i<=length 1.1134 + * \post start<=i<length 1.1135 + * 1.1136 + * @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. 1.1137 + */ 1.1138 +#define UTF_BACK_1(s, start, i) U16_BACK_1(s, start, i) 1.1139 + 1.1140 +/** 1.1141 + * Move i backward (towards the beginning of the string) 1.1142 + * to the first code unit of the n code points that have code units before i. 1.1143 + * I.e., move i backward by n code points.
1.1144 + * i must point to the first code unit after the last unit of a code point (i==length is allowed). 1.1145 + * Same as UTF16_BACK_N. 1.1146 + * \pre start<i<=length 1.1147 + * \post start<=i<length 1.1148 + * 1.1149 + * @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. 1.1150 + */ 1.1151 +#define UTF_BACK_N(s, start, i, n) U16_BACK_N(s, start, i, n) 1.1152 + 1.1153 +/** 1.1154 + * Take the random-access index i and adjust it so that it points beyond 1.1155 + * a code point. The input index points beyond any code unit 1.1156 + * of a code point and is moved to point beyond the last code unit of the same 1.1157 + * code point. i is never decremented. 1.1158 + * In other words, if i points to a trail surrogate that is preceded by a matching 1.1159 + * lead surrogate, then i is incremented. Otherwise it is not modified. 1.1160 + * This can be used to start an iteration with UTF_PREV_CHAR() from a random index. 1.1161 + * Same as UTF16_SET_CHAR_LIMIT. 1.1162 + * \pre start<i<=length 1.1163 + * \post start<i<=length 1.1164 + * 1.1165 + * @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. 1.1166 + */ 1.1167 +#define UTF_SET_CHAR_LIMIT(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length) 1.1168 + 1.1169 +#endif /* U_HIDE_DEPRECATED_API */ 1.1170 + 1.1171 +#endif 1.1172 +
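Reviewer note: the backward-iteration contract documented for UTF_PREV_CHAR (decrement i first, then combine a trail surrogate with a preceding lead surrogate if there is one) can be sketched in plain C as below. This is only an illustration under stated assumptions: the names `is_lead`, `is_trail`, `prev_code_point` and `count_code_points_backward` are hypothetical and are not ICU API, and unpaired surrogates are simply returned unchanged. Real code should use U16_PREV from unicode/utf16.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, NOT part of ICU: classify UTF-16 code units. */
static int is_lead(uint16_t u)  { return (u & 0xfc00) == 0xd800; }
static int is_trail(uint16_t u) { return (u & 0xfc00) == 0xdc00; }

/*
 * Return the code point that ends just before *i and move *i back to
 * its first code unit (pre-decrement).
 * Precondition: start < *i (and *i does not exceed the string length).
 * An unpaired surrogate is returned as is.
 */
static int32_t prev_code_point(const uint16_t *s, size_t start, size_t *i) {
    int32_t c;
    assert(*i > start);
    c = s[--(*i)];
    if (is_trail((uint16_t)c) && *i > start && is_lead(s[*i - 1])) {
        uint16_t lead = s[--(*i)];
        /* Combine the surrogate pair into a supplementary code point. */
        c = (((int32_t)lead - 0xd800) << 10) + (c - 0xdc00) + 0x10000;
    }
    return c;
}

/* Count code points by walking backward from length down to start. */
static size_t count_code_points_backward(const uint16_t *s,
                                         size_t start, size_t length) {
    size_t i = length;
    size_t n = 0;
    while (i > start) {
        (void)prev_code_point(s, start, &i);
        ++n;
    }
    return n;
}
```

For the four code units 0x0061, 0xD83D, 0xDE00, 0x0062, walking backward from index 4 visits U+0062, U+1F600 and U+0061 in that order, and i ends at 0.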