The Tor Browser: diff intl/icu/source/common/unicode/utf

     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/intl/icu/source/common/unicode/utf_old.h	Wed Dec 31 06:09:35 2014 +0100
     1.3 @@ -0,0 +1,1169 @@
     1.4 +/*
     1.5 +*******************************************************************************
     1.6 +*
     1.7 +*   Copyright (C) 2002-2012, International Business Machines
     1.8 +*   Corporation and others.  All Rights Reserved.
     1.9 +*
    1.10 +*******************************************************************************
    1.11 +*   file name:  utf_old.h
    1.12 +*   encoding:   US-ASCII
    1.13 +*   tab size:   8 (not used)
    1.14 +*   indentation:4
    1.15 +*
    1.16 +*   created on: 2002sep21
    1.17 +*   created by: Markus W. Scherer
    1.18 +*/
    1.19 +
    1.20 +/**
    1.21 + * \file 
    1.22 + * \brief C API: Deprecated macros for Unicode string handling
    1.23 + */
    1.24 +
    1.25 +/**
    1.26 + * 
    1.27 + * The macros in utf_old.h are all deprecated and their use discouraged.
    1.28 + * Some of the design principles behind the set of UTF macros
    1.29 + * have changed or proved impractical.
    1.30 + * Almost all of the old "UTF macros" are at least renamed.
    1.31 + * If you are looking for a new equivalent to an old macro, please see the
    1.32 + * comment at the old one.
    1.33 + *
    1.34 + * Brief summary of reasons for deprecation:
    1.35 + * - Switch on UTF_SIZE (selection of UTF-8/16/32 default string processing)
    1.36 + *   was impractical.
    1.37 + * - Switch on UTF_SAFE etc. (selection of unsafe/safe/strict default string processing)
    1.38 + *   was of little use and impractical.
    1.39 + * - Whole classes of macros became obsolete outside of the UTF_SIZE/UTF_SAFE
    1.40 + *   selection framework: UTF32_ macros (all trivial)
    1.41 + *   and UTF_ default and intermediate macros (all aliases).
    1.42 + * - The selection framework also caused many macro aliases.
    1.43 + * - Change in Unicode standard: "irregular" sequences (3.0) became illegal (3.2).
    1.44 + * - Change of language in Unicode standard:
    1.45 + *   Growing distinction between internal x-bit Unicode strings and external UTF-x
    1.46 + *   forms, with the former more lenient.
    1.47 + *   Suggests renaming of UTF16_ macros to U16_.
    1.48 + * - The prefix "UTF_" without a width number confused some users.
    1.49 + * - "Safe" append macros needed the addition of an error indicator output.
    1.50 + * - "Safe" UTF-8 macros used legitimate (if rarely used) code point values
    1.51 + *   to indicate error conditions.
    1.52 + * - The use of the "_CHAR" infix for code point operations confused some users.
    1.53 + *
    1.54 + * More details:
    1.55 + *
    1.56 + * Until ICU 2.2, utf.h theoretically allowed a choice among UTF-8/16/32
    1.57 + * for string processing, and among unsafe/safe/strict default macros for that.
    1.58 + *
    1.59 + * It proved nearly impossible to write non-trivial, high-performance code
    1.60 + * that is UTF-generic.
    1.61 + * Unsafe default macros would be dangerous for default string processing,
    1.62 + * and the main reason for the "strict" versions disappeared:
    1.63 + * Between Unicode 3.0 and 3.2 all "irregular" UTF-8 sequences became illegal.
    1.64 + * The only other conditions that "strict" checked for were non-characters,
    1.65 + * which are valid during processing. Only during text input/output should they
    1.66 + * be checked, and at that time other well-formedness checks may be
    1.67 + * necessary or useful as well.
    1.68 + * This can still be done by using U16_NEXT and U_IS_UNICODE_NONCHAR
    1.69 + * or U_IS_UNICODE_CHAR.
    1.70 + *
    1.71 + * The old UTF8_..._SAFE macros also used some normal Unicode code points
    1.72 + * to indicate malformed sequences.
    1.73 + * The new U8_ macros without suffix use negative values instead.
    1.74 + *
    1.75 + * The entire contents of utf32.h was moved here without replacement
    1.76 + * because all those macros were trivial and
    1.77 + * were meaningful only in the framework of choosing the UTF size.
    1.78 + *
    1.79 + * See Jitterbug 2150 and its discussion on the ICU mailing list
    1.80 + * in September 2002.
    1.81 + *
    1.82 + * <hr>
    1.83 + *
    1.84 + * <em>Obsolete part</em> of pre-ICU 2.4 utf.h file documentation:
    1.85 + *
    1.86 + * <p>The original concept for these files was for ICU to allow,
    1.87 + * in principle, selecting which UTF (UTF-8/16/32) is used internally
    1.88 + * by defining UTF_SIZE to either 8, 16, or 32. utf.h would then define the UChar type
    1.89 + * accordingly. UTF-16 was the default.</p>
    1.90 + *
    1.91 + * <p>This concept has been abandoned.
    1.92 + * A lot of the ICU source code assumes UChar strings are in UTF-16.
    1.93 + * This is especially true for low-level code like
    1.94 + * conversion, normalization, and collation.
    1.95 + * The utf.h header enforces the default of UTF-16.
    1.96 + * The UTF-8 and UTF-32 macros remain for now for completeness and backward compatibility.</p>
    1.97 + *
    1.98 + * <p>Accordingly, utf.h defines UChar to be an unsigned 16-bit integer. If this matches wchar_t, then
    1.99 + * UChar is defined to be exactly wchar_t, otherwise uint16_t.</p>
   1.100 + *
   1.101 + * <p>UChar32 is defined to be a signed 32-bit integer (int32_t), large enough for a 21-bit
   1.102 + * Unicode code point (Unicode scalar value, 0..0x10ffff).
   1.103 + * Before ICU 2.4, the definition of UChar32 was platform-dependent, much like
   1.104 + * the definition of UChar. For details see the documentation for UChar32 itself.</p>
   1.105 + *
   1.106 + * <p>utf.h also defines a number of C macros for handling single Unicode code points and
   1.107 + * for using UTF Unicode strings. It includes utf8.h, utf16.h, and utf32.h for the actual
   1.108 + * implementations of those macros and then aliases one set of them (for UTF-16) for general use.
   1.109 + * The UTF-specific macros have the UTF size in the macro name prefixes (UTF16_...), while
   1.110 + * the general alias macros always begin with UTF_...</p>
   1.111 + *
   1.112 + * <p>Many string operations can be done with or without error checking.
   1.113 + * Where such a distinction is useful, there are two versions of the macros, "unsafe" and "safe"
   1.114 + * ones with ..._UNSAFE and ..._SAFE suffixes. The unsafe macros are fast but may cause
   1.115 + * program failures if the strings are not well-formed. The safe macros have an additional, boolean
   1.116 + * parameter "strict". If strict is FALSE, then only illegal sequences are detected.
   1.117 + * Otherwise, irregular sequences and non-characters are detected as well (like single surrogates).
   1.118 + * Safe macros return special error code points for illegal/irregular sequences:
   1.119 + * Typically, U+ffff, or values that would result in a code unit sequence of the same length
   1.120 + * as the erroneous input sequence.<br>
   1.121 + * Note that _UNSAFE macros have fewer parameters: They do not have the strictness parameter, and
   1.122 + * they do not have start/length parameters for boundary checking.</p>
   1.123 + *
   1.124 + * <p>Here, the macros are aliased in two steps:
   1.125 + * In the first step, the UTF-specific macros with UTF16_ prefix and _UNSAFE and _SAFE suffixes are
   1.126 + * aliased according to the UTF_SIZE to macros with UTF_ prefix and the same suffixes and signatures.
   1.127 + * Then, in a second step, the default, general alias macros are set to use either the unsafe or
   1.128 + * the safe/not strict (default) or the safe/strict macro;
   1.129 + * these general macros do not have a strictness parameter.</p>
   1.130 + *
   1.131 + * <p>It is possible to change the default choice for the general alias macros to be unsafe, safe/not strict or safe/strict.
   1.132 + * The default is safe/not strict. To change it, define UTF_SAFE, UTF_STRICT, or UTF_UNSAFE.
   1.133 + * It is not recommended to select the unsafe macros as the basis for Unicode string handling in ICU!</p>
   1.134 + *
   1.135 + * <p>For general use, one should use the default, general macros with UTF_ prefix and no _SAFE/_UNSAFE suffix.
   1.136 + * Only in some cases may it be necessary to control the choice of macro directly and use a less generic alias.
   1.137 + * For example, if it can be assumed that a string is well-formed and the index will stay within the bounds,
   1.138 + * then the _UNSAFE version may be used.
   1.139 + * If a UTF-8 string is to be processed, then the macros with UTF8_ prefixes need to be used.</p>
   1.140 + *
   1.141 + * <hr>
   1.142 + *
   1.143 + * @deprecated ICU 2.4. Use the macros in utf.h, utf16.h, utf8.h instead.
   1.144 + */
   1.145 +
   1.146 +#ifndef __UTF_OLD_H__
   1.147 +#define __UTF_OLD_H__
   1.148 +
   1.149 +#ifndef U_HIDE_DEPRECATED_API
   1.150 +
   1.151 +#include "unicode/utf.h"
   1.152 +#include "unicode/utf8.h"
   1.153 +#include "unicode/utf16.h"
   1.154 +
   1.155 +/* Formerly utf.h, part 1 --------------------------------------------------- */
   1.156 +
   1.157 +#ifdef U_USE_UTF_DEPRECATES
   1.158 +/**
   1.159 + * Unicode string and array offset and index type.
   1.160 + * ICU always counts Unicode code units (UChars) for
   1.161 + * string offsets, indexes, and lengths, not Unicode code points.
   1.162 + *
   1.163 + * @obsolete ICU 2.6. Use int32_t directly instead since this API will be removed in that release.
   1.164 + */
   1.165 +typedef int32_t UTextOffset;
   1.166 +#endif
   1.167 +
   1.168 +/** Number of bits in a Unicode string code unit - ICU uses 16-bit Unicode. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.169 +#define UTF_SIZE 16
   1.170 +
   1.171 +/**
   1.172 + * The default choice for general Unicode string macros is to use the ..._SAFE macro implementations
   1.173 + * with strict=FALSE.
   1.174 + *
   1.175 + * @deprecated ICU 2.4. Obsolete, see utf_old.h.
   1.176 + */
   1.177 +#define UTF_SAFE
   1.178 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.179 +#undef UTF_UNSAFE
   1.180 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.181 +#undef UTF_STRICT
   1.182 +
   1.183 +/**
   1.184 + * UTF8_ERROR_VALUE_1 and UTF8_ERROR_VALUE_2 are special error values for UTF-8,
   1.185 + * which need 1 or 2 bytes in UTF-8:
   1.186 + * \code
   1.187 + * U+0015 = NAK = Negative Acknowledge, C0 control character
   1.188 + * U+009f = highest C1 control character
   1.189 + * \endcode
   1.190 + *
   1.191 + * These are used by UTF8_..._SAFE macros so that they can return an error value
   1.192 + * that needs the same number of code units (bytes) as were seen by
   1.193 + * a macro. They should be tested with UTF_IS_ERROR() or UTF_IS_VALID().
   1.194 + *
   1.195 + * @deprecated ICU 2.4. Obsolete, see utf_old.h.
   1.196 + */
   1.197 +#define UTF8_ERROR_VALUE_1 0x15
   1.198 +
   1.199 +/**
   1.200 + * See documentation on UTF8_ERROR_VALUE_1 for details.
   1.201 + *
   1.202 + * @deprecated ICU 2.4. Obsolete, see utf_old.h.
   1.203 + */
   1.204 +#define UTF8_ERROR_VALUE_2 0x9f
   1.205 +
   1.206 +/**
   1.207 + * Error value for all UTFs. This code point value will be set by macros with error
   1.208 + * checking if an error is detected.
   1.209 + *
   1.210 + * @deprecated ICU 2.4. Obsolete, see utf_old.h.
   1.211 + */
   1.212 +#define UTF_ERROR_VALUE 0xffff
   1.213 +
   1.214 +/**
   1.215 + * Is a given 32-bit code an error value
   1.216 + * as returned by one of the macros for any UTF?
   1.217 + *
   1.218 + * @deprecated ICU 2.4. Obsolete, see utf_old.h.
   1.219 + */
   1.220 +#define UTF_IS_ERROR(c) \
   1.221 +    (((c)&0xfffe)==0xfffe || (c)==UTF8_ERROR_VALUE_1 || (c)==UTF8_ERROR_VALUE_2)
   1.222 +
   1.223 +/**
   1.224 + * This is a combined macro: Is c a valid Unicode value _and_ not an error code?
   1.225 + *
   1.226 + * @deprecated ICU 2.4. Obsolete, see utf_old.h.
   1.227 + */
   1.228 +#define UTF_IS_VALID(c) \
   1.229 +    (UTF_IS_UNICODE_CHAR(c) && \
   1.230 +     (c)!=UTF8_ERROR_VALUE_1 && (c)!=UTF8_ERROR_VALUE_2)
   1.231 +
   1.232 +/**
   1.233 + * Is this code unit or code point a surrogate (U+d800..U+dfff)?
   1.234 + * @deprecated ICU 2.4. Renamed to U_IS_SURROGATE and U16_IS_SURROGATE, see utf_old.h.
   1.235 + */
   1.236 +#define UTF_IS_SURROGATE(uchar) (((uchar)&0xfffff800)==0xd800)
   1.237 +
   1.238 +/**
   1.239 + * Is a given 32-bit code point a Unicode noncharacter?
   1.240 + *
   1.241 + * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_NONCHAR, see utf_old.h.
   1.242 + */
   1.243 +#define UTF_IS_UNICODE_NONCHAR(c) \
   1.244 +    ((c)>=0xfdd0 && \
   1.245 +     ((uint32_t)(c)<=0xfdef || ((c)&0xfffe)==0xfffe) && \
   1.246 +     (uint32_t)(c)<=0x10ffff)
   1.247 +
   1.248 +/**
   1.249 + * Is a given 32-bit value a Unicode code point value (0..U+10ffff)
   1.250 + * that can be assigned a character?
   1.251 + *
   1.252 + * Code points that are not characters include:
   1.253 + * - single surrogate code points (U+d800..U+dfff, 2048 code points)
   1.254 + * - the last two code points on each plane (U+__fffe and U+__ffff, 34 code points)
   1.255 + * - U+fdd0..U+fdef (new with Unicode 3.1, 32 code points)
   1.256 + * - the highest Unicode code point value is U+10ffff
   1.257 + *
   1.258 + * This means that all code points below U+d800 are character code points,
   1.259 + * and that boundary is tested first for performance.
   1.260 + *
   1.261 + * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_CHAR, see utf_old.h.
   1.262 + */
   1.263 +#define UTF_IS_UNICODE_CHAR(c) \
   1.264 +    ((uint32_t)(c)<0xd800 || \
   1.265 +        ((uint32_t)(c)>0xdfff && \
   1.266 +         (uint32_t)(c)<=0x10ffff && \
   1.267 +         !UTF_IS_UNICODE_NONCHAR(c)))
   1.268 +
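The two predicates above are easier to follow when restated as plain C functions. The sketch below is an illustration only: the names `is_unicode_nonchar` and `is_unicode_char` are hypothetical and not part of ICU, and the logic simply mirrors the macro bodies shown in this diff.

```c
#include <stdint.h>

/* Hypothetical function form of UTF_IS_UNICODE_NONCHAR (not ICU API).
 * True for U+FDD0..U+FDEF and for the last two code points of each plane. */
static int is_unicode_nonchar(int32_t c) {
    return c >= 0xfdd0 &&
           ((uint32_t)c <= 0xfdef || (c & 0xfffe) == 0xfffe) &&
           (uint32_t)c <= 0x10ffff;
}

/* Hypothetical function form of UTF_IS_UNICODE_CHAR (not ICU API).
 * The cheap "below U+D800" test comes first, as in the macro. */
static int is_unicode_char(int32_t c) {
    return (uint32_t)c < 0xd800 ||
           ((uint32_t)c > 0xdfff &&
            (uint32_t)c <= 0x10ffff &&
            !is_unicode_nonchar(c));
}
```

The casts to `uint32_t` matter because `UChar32` is signed: a negative input becomes a very large unsigned value and is rejected by the range checks.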
   1.269 +/* Formerly utf8.h ---------------------------------------------------------- */
   1.270 +
   1.271 +/**
   1.272 + * Count the trail bytes for a UTF-8 lead byte.
   1.273 + * @deprecated ICU 2.4. Renamed to U8_COUNT_TRAIL_BYTES, see utf_old.h.
   1.274 + */
   1.275 +#define UTF8_COUNT_TRAIL_BYTES(leadByte) (utf8_countTrailBytes[(uint8_t)leadByte])
   1.276 +
   1.277 +/**
   1.278 + * Mask a UTF-8 lead byte, leave only the lower bits that form part of the code point value.
   1.279 + * @deprecated ICU 2.4. Renamed to U8_MASK_LEAD_BYTE, see utf_old.h.
   1.280 + */
   1.281 +#define UTF8_MASK_LEAD_BYTE(leadByte, countTrailBytes) ((leadByte)&=(1<<(6-(countTrailBytes)))-1)
   1.282 +
   1.283 +/** Is this code point a single code unit (byte)? @deprecated ICU 2.4. Renamed to U8_IS_SINGLE, see utf_old.h. */
   1.284 +#define UTF8_IS_SINGLE(uchar) (((uchar)&0x80)==0)
   1.285 +/** Is this code unit the lead code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_LEAD, see utf_old.h. */
   1.286 +#define UTF8_IS_LEAD(uchar) ((uint8_t)((uchar)-0xc0)<0x3e)
   1.287 +/** Is this code unit a trailing code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_TRAIL, see utf_old.h. */
   1.288 +#define UTF8_IS_TRAIL(uchar) (((uchar)&0xc0)==0x80)
   1.289 +
   1.290 +/** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U8_LENGTH or test ((uint32_t)(c)>0x7f) instead, see utf_old.h. */
   1.291 +#define UTF8_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0x7f)
   1.292 +
   1.293 +/**
   1.294 + * Given the lead character, how many bytes are taken by this code point.
   1.295 + * ICU does not deal with code points >0x10ffff
   1.296 + * unless necessary for advancing in the byte stream.
   1.297 + *
   1.298 + * These length macros take into account that for values >0x10ffff
   1.299 + * the UTF8_APPEND_CHAR_SAFE macros would write the error code point 0xffff
   1.300 + * with 3 bytes.
   1.301 + * Code point comparisons need to be in uint32_t because UChar32
   1.302 + * may be a signed type, and negative values must be recognized.
   1.303 + *
   1.304 + * @deprecated ICU 2.4. Use U8_LENGTH instead, see utf.h.
   1.305 + */
   1.306 +#if 1
   1.307 +#   define UTF8_CHAR_LENGTH(c) \
   1.308 +        ((uint32_t)(c)<=0x7f ? 1 : \
   1.309 +            ((uint32_t)(c)<=0x7ff ? 2 : \
   1.310 +                ((uint32_t)((c)-0x10000)>0xfffff ? 3 : 4) \
   1.311 +            ) \
   1.312 +        )
   1.313 +#else
   1.314 +#   define UTF8_CHAR_LENGTH(c) \
   1.315 +        ((uint32_t)(c)<=0x7f ? 1 : \
   1.316 +            ((uint32_t)(c)<=0x7ff ? 2 : \
   1.317 +                ((uint32_t)(c)<=0xffff ? 3 : \
   1.318 +                    ((uint32_t)(c)<=0x10ffff ? 4 : \
   1.319 +                        ((uint32_t)(c)<=0x3ffffff ? 5 : \
   1.320 +                            ((uint32_t)(c)<=0x7fffffff ? 6 : 3) \
   1.321 +                        ) \
   1.322 +                    ) \
   1.323 +                ) \
   1.324 +            ) \
   1.325 +        )
   1.326 +#endif
   1.327 +
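The active (`#if 1`) branch of UTF8_CHAR_LENGTH relies on unsigned wraparound to fold "below U+10000" and "above U+10FFFF" into a single comparison. A minimal sketch of the same idea as a function follows; the name `utf8_char_length` is hypothetical, and the subtraction is done in unsigned arithmetic here (an assumption made to avoid signed overflow; the macro subtracts first and casts afterwards).

```c
#include <stdint.h>

/* Hypothetical function form of the active UTF8_CHAR_LENGTH branch (not ICU API). */
static int utf8_char_length(int32_t c) {
    uint32_t u = (uint32_t)c;   /* negative inputs become huge unsigned values */
    if (u <= 0x7f) {
        return 1;
    }
    if (u <= 0x7ff) {
        return 2;
    }
    /* u - 0x10000 wraps around for u < 0x10000 and stays large for
     * u > 0x10ffff, so only U+10000..U+10FFFF gives a result <= 0xfffff.
     * Everything else is reported as 3 bytes, which matches the length of
     * the error code point 0xffff written for out-of-range values. */
    return (u - 0x10000u) > 0xfffffu ? 3 : 4;
}
```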
   1.328 +/** The maximum number of bytes per code point. @deprecated ICU 2.4. Renamed to U8_MAX_LENGTH, see utf_old.h. */
   1.329 +#define UTF8_MAX_CHAR_LENGTH 4
   1.330 +
   1.331 +/** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.332 +#define UTF8_ARRAY_SIZE(size) ((5*(size))/2)
   1.333 +
   1.334 +/** @deprecated ICU 2.4. Renamed to U8_GET_UNSAFE, see utf_old.h. */
   1.335 +#define UTF8_GET_CHAR_UNSAFE(s, i, c) { \
   1.336 +    int32_t _utf8_get_char_unsafe_index=(int32_t)(i); \
   1.337 +    UTF8_SET_CHAR_START_UNSAFE(s, _utf8_get_char_unsafe_index); \
   1.338 +    UTF8_NEXT_CHAR_UNSAFE(s, _utf8_get_char_unsafe_index, c); \
   1.339 +}
   1.340 +
   1.341 +/** @deprecated ICU 2.4. Use U8_GET instead, see utf_old.h. */
   1.342 +#define UTF8_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
   1.343 +    int32_t _utf8_get_char_safe_index=(int32_t)(i); \
   1.344 +    UTF8_SET_CHAR_START_SAFE(s, start, _utf8_get_char_safe_index); \
   1.345 +    UTF8_NEXT_CHAR_SAFE(s, _utf8_get_char_safe_index, length, c, strict); \
   1.346 +}
   1.347 +
   1.348 +/** @deprecated ICU 2.4. Renamed to U8_NEXT_UNSAFE, see utf_old.h. */
   1.349 +#define UTF8_NEXT_CHAR_UNSAFE(s, i, c) { \
   1.350 +    (c)=(s)[(i)++]; \
   1.351 +    if((uint8_t)((c)-0xc0)<0x35) { \
   1.352 +        uint8_t __count=UTF8_COUNT_TRAIL_BYTES(c); \
   1.353 +        UTF8_MASK_LEAD_BYTE(c, __count); \
   1.354 +        switch(__count) { \
   1.355 +        /* each following branch falls through to the next one */ \
   1.356 +        case 3: \
   1.357 +            (c)=((c)<<6)|((s)[(i)++]&0x3f); \
   1.358 +        case 2: \
   1.359 +            (c)=((c)<<6)|((s)[(i)++]&0x3f); \
   1.360 +        case 1: \
   1.361 +            (c)=((c)<<6)|((s)[(i)++]&0x3f); \
   1.362 +        /* no other branches to optimize switch() */ \
   1.363 +            break; \
   1.364 +        } \
   1.365 +    } \
   1.366 +}
   1.367 +
   1.368 +/** @deprecated ICU 2.4. Renamed to U8_APPEND_UNSAFE, see utf_old.h. */
   1.369 +#define UTF8_APPEND_CHAR_UNSAFE(s, i, c) { \
   1.370 +    if((uint32_t)(c)<=0x7f) { \
   1.371 +        (s)[(i)++]=(uint8_t)(c); \
   1.372 +    } else { \
   1.373 +        if((uint32_t)(c)<=0x7ff) { \
   1.374 +            (s)[(i)++]=(uint8_t)(((c)>>6)|0xc0); \
   1.375 +        } else { \
   1.376 +            if((uint32_t)(c)<=0xffff) { \
   1.377 +                (s)[(i)++]=(uint8_t)(((c)>>12)|0xe0); \
   1.378 +            } else { \
   1.379 +                (s)[(i)++]=(uint8_t)(((c)>>18)|0xf0); \
   1.380 +                (s)[(i)++]=(uint8_t)((((c)>>12)&0x3f)|0x80); \
   1.381 +            } \
   1.382 +            (s)[(i)++]=(uint8_t)((((c)>>6)&0x3f)|0x80); \
   1.383 +        } \
   1.384 +        (s)[(i)++]=(uint8_t)(((c)&0x3f)|0x80); \
   1.385 +    } \
   1.386 +}
   1.387 +
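The nested branches of UTF8_APPEND_CHAR_UNSAFE share their trailing-byte writes: the lead byte is chosen in the innermost branch, and the remaining continuation bytes are emitted on the way out. The following sketch restates that structure as a function. The name `utf8_append_unsafe` and the return-the-new-index convention are assumptions for illustration; like the macro, it performs no bounds or validity checks.

```c
#include <stdint.h>

/* Hypothetical function form of UTF8_APPEND_CHAR_UNSAFE (not ICU API).
 * Writes code point c at s[i...] and returns the index after the last byte.
 * The caller must guarantee room for up to 4 bytes and a valid code point. */
static int32_t utf8_append_unsafe(uint8_t *s, int32_t i, int32_t c) {
    if ((uint32_t)c <= 0x7f) {
        s[i++] = (uint8_t)c;                               /* 1 byte  */
    } else {
        if ((uint32_t)c <= 0x7ff) {
            s[i++] = (uint8_t)((c >> 6) | 0xc0);           /* 2-byte lead */
        } else {
            if ((uint32_t)c <= 0xffff) {
                s[i++] = (uint8_t)((c >> 12) | 0xe0);      /* 3-byte lead */
            } else {
                s[i++] = (uint8_t)((c >> 18) | 0xf0);      /* 4-byte lead */
                s[i++] = (uint8_t)(((c >> 12) & 0x3f) | 0x80);
            }
            s[i++] = (uint8_t)(((c >> 6) & 0x3f) | 0x80);  /* shared by 3 and 4 */
        }
        s[i++] = (uint8_t)((c & 0x3f) | 0x80);             /* shared by 2, 3, 4 */
    }
    return i;
}
```

For example, appending U+20AC at index 0 writes the bytes `E2 82 AC` and returns 3.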
   1.388 +/** @deprecated ICU 2.4. Renamed to U8_FWD_1_UNSAFE, see utf_old.h. */
   1.389 +#define UTF8_FWD_1_UNSAFE(s, i) { \
   1.390 +    (i)+=1+UTF8_COUNT_TRAIL_BYTES((s)[i]); \
   1.391 +}
   1.392 +
   1.393 +/** @deprecated ICU 2.4. Renamed to U8_FWD_N_UNSAFE, see utf_old.h. */
   1.394 +#define UTF8_FWD_N_UNSAFE(s, i, n) { \
   1.395 +    int32_t __N=(n); \
   1.396 +    while(__N>0) { \
   1.397 +        UTF8_FWD_1_UNSAFE(s, i); \
   1.398 +        --__N; \
   1.399 +    } \
   1.400 +}
   1.401 +
   1.402 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_START_UNSAFE, see utf_old.h. */
   1.403 +#define UTF8_SET_CHAR_START_UNSAFE(s, i) { \
   1.404 +    while(UTF8_IS_TRAIL((s)[i])) { --(i); } \
   1.405 +}
   1.406 +
   1.407 +/** @deprecated ICU 2.4. Use U8_NEXT instead, see utf_old.h. */
   1.408 +#define UTF8_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
   1.409 +    (c)=(s)[(i)++]; \
   1.410 +    if((c)>=0x80) { \
   1.411 +        if(UTF8_IS_LEAD(c)) { \
   1.412 +            (c)=utf8_nextCharSafeBody(s, &(i), (int32_t)(length), c, strict); \
   1.413 +        } else { \
   1.414 +            (c)=UTF8_ERROR_VALUE_1; \
   1.415 +        } \
   1.416 +    } \
   1.417 +}
   1.418 +
   1.419 +/** @deprecated ICU 2.4. Use U8_APPEND instead, see utf_old.h. */
   1.420 +#define UTF8_APPEND_CHAR_SAFE(s, i, length, c)  { \
   1.421 +    if((uint32_t)(c)<=0x7f) { \
   1.422 +        (s)[(i)++]=(uint8_t)(c); \
   1.423 +    } else { \
   1.424 +        (i)=utf8_appendCharSafeBody(s, (int32_t)(i), (int32_t)(length), c, NULL); \
   1.425 +    } \
   1.426 +}
   1.427 +
   1.428 +/** @deprecated ICU 2.4. Renamed to U8_FWD_1, see utf_old.h. */
   1.429 +#define UTF8_FWD_1_SAFE(s, i, length) U8_FWD_1(s, i, length)
   1.430 +
   1.431 +/** @deprecated ICU 2.4. Renamed to U8_FWD_N, see utf_old.h. */
   1.432 +#define UTF8_FWD_N_SAFE(s, i, length, n) U8_FWD_N(s, i, length, n)
   1.433 +
   1.434 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_START, see utf_old.h. */
   1.435 +#define UTF8_SET_CHAR_START_SAFE(s, start, i) U8_SET_CP_START(s, start, i)
   1.436 +
   1.437 +/** @deprecated ICU 2.4. Renamed to U8_PREV_UNSAFE, see utf_old.h. */
   1.438 +#define UTF8_PREV_CHAR_UNSAFE(s, i, c) { \
   1.439 +    (c)=(s)[--(i)]; \
   1.440 +    if(UTF8_IS_TRAIL(c)) { \
   1.441 +        uint8_t __b, __count=1, __shift=6; \
   1.442 +\
   1.443 +        /* c is a trail byte */ \
   1.444 +        (c)&=0x3f; \
   1.445 +        for(;;) { \
   1.446 +            __b=(s)[--(i)]; \
   1.447 +            if(__b>=0xc0) { \
   1.448 +                UTF8_MASK_LEAD_BYTE(__b, __count); \
   1.449 +                (c)|=(UChar32)__b<<__shift; \
   1.450 +                break; \
   1.451 +            } else { \
   1.452 +                (c)|=(UChar32)(__b&0x3f)<<__shift; \
   1.453 +                ++__count; \
   1.454 +                __shift+=6; \
   1.455 +            } \
   1.456 +        } \
   1.457 +    } \
   1.458 +}
   1.459 +
   1.460 +/** @deprecated ICU 2.4. Renamed to U8_BACK_1_UNSAFE, see utf_old.h. */
   1.461 +#define UTF8_BACK_1_UNSAFE(s, i) { \
   1.462 +    while(UTF8_IS_TRAIL((s)[--(i)])) {} \
   1.463 +}
   1.464 +
   1.465 +/** @deprecated ICU 2.4. Renamed to U8_BACK_N_UNSAFE, see utf_old.h. */
   1.466 +#define UTF8_BACK_N_UNSAFE(s, i, n) { \
   1.467 +    int32_t __N=(n); \
   1.468 +    while(__N>0) { \
   1.469 +        UTF8_BACK_1_UNSAFE(s, i); \
   1.470 +        --__N; \
   1.471 +    } \
   1.472 +}
   1.473 +
   1.474 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
   1.475 +#define UTF8_SET_CHAR_LIMIT_UNSAFE(s, i) { \
   1.476 +    UTF8_BACK_1_UNSAFE(s, i); \
   1.477 +    UTF8_FWD_1_UNSAFE(s, i); \
   1.478 +}
   1.479 +
   1.480 +/** @deprecated ICU 2.4. Use U8_PREV instead, see utf_old.h. */
   1.481 +#define UTF8_PREV_CHAR_SAFE(s, start, i, c, strict) { \
   1.482 +    (c)=(s)[--(i)]; \
   1.483 +    if((c)>=0x80) { \
   1.484 +        if((c)<=0xbf) { \
   1.485 +            (c)=utf8_prevCharSafeBody(s, start, &(i), c, strict); \
   1.486 +        } else { \
   1.487 +            (c)=UTF8_ERROR_VALUE_1; \
   1.488 +        } \
   1.489 +    } \
   1.490 +}
   1.491 +
   1.492 +/** @deprecated ICU 2.4. Renamed to U8_BACK_1, see utf_old.h. */
   1.493 +#define UTF8_BACK_1_SAFE(s, start, i) U8_BACK_1(s, start, i)
   1.494 +
   1.495 +/** @deprecated ICU 2.4. Renamed to U8_BACK_N, see utf_old.h. */
   1.496 +#define UTF8_BACK_N_SAFE(s, start, i, n) U8_BACK_N(s, start, i, n)
   1.497 +
   1.498 +/** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT, see utf_old.h. */
   1.499 +#define UTF8_SET_CHAR_LIMIT_SAFE(s, start, i, length) U8_SET_CP_LIMIT(s, start, i, length)
   1.500 +
   1.501 +/* Formerly utf16.h --------------------------------------------------------- */
   1.502 +
   1.503 +/** Is uchar a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h. */
   1.504 +#define UTF_IS_FIRST_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xd800)
   1.505 +
   1.506 +/** Is uchar a second/trail surrogate? @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h. */
   1.507 +#define UTF_IS_SECOND_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xdc00)
   1.508 +
   1.509 +/** Assuming c is a surrogate, is it a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_SURROGATE_LEAD and U16_IS_SURROGATE_LEAD, see utf_old.h. */
   1.510 +#define UTF_IS_SURROGATE_FIRST(c) (((c)&0x400)==0)
   1.511 +
   1.512 +/** Helper constant for UTF16_GET_PAIR_VALUE. @deprecated ICU 2.4. Renamed to U16_SURROGATE_OFFSET, see utf_old.h. */
   1.513 +#define UTF_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
   1.514 +
   1.515 +/** Get the UTF-32 value from the surrogate code units. @deprecated ICU 2.4. Renamed to U16_GET_SUPPLEMENTARY, see utf_old.h. */
   1.516 +#define UTF16_GET_PAIR_VALUE(first, second) \
   1.517 +    (((first)<<10UL)+(second)-UTF_SURROGATE_OFFSET)
   1.518 +
   1.519 +/** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */
   1.520 +#define UTF_FIRST_SURROGATE(supplementary) (UChar)(((supplementary)>>10)+0xd7c0)
   1.521 +
   1.522 +/** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */
   1.523 +#define UTF_SECOND_SURROGATE(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00)
   1.524 +
   1.525 +/** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */
   1.526 +#define UTF16_LEAD(supplementary) UTF_FIRST_SURROGATE(supplementary)
   1.527 +
   1.528 +/** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */
   1.529 +#define UTF16_TRAIL(supplementary) UTF_SECOND_SURROGATE(supplementary)
   1.530 +
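The surrogate arithmetic above is compact, so a round trip may help. The sketch below splits a supplementary code point into lead and trail surrogates and recombines them using the same constants as UTF_FIRST_SURROGATE, UTF_SECOND_SURROGATE, and UTF16_GET_PAIR_VALUE. All function and macro names in it are hypothetical, not ICU API.

```c
#include <stdint.h>

/* Same value as UTF_SURROGATE_OFFSET: it cancels the 0xd800 and 0xdc00
 * bases and adds back the 0x10000 that the encoding subtracts. */
#define SURROGATE_OFFSET ((0xd800 << 10) + 0xdc00 - 0x10000)

/* Lead surrogate: 0xd7c0 equals 0xd800 - (0x10000 >> 10). */
static uint16_t utf16_lead(int32_t supplementary) {
    return (uint16_t)((supplementary >> 10) + 0xd7c0);
}

/* Trail surrogate: the low 10 bits on top of 0xdc00. */
static uint16_t utf16_trail(int32_t supplementary) {
    return (uint16_t)((supplementary & 0x3ff) | 0xdc00);
}

/* Recombine a lead/trail pair into the supplementary code point. */
static int32_t utf16_pair_value(uint16_t first, uint16_t second) {
    return ((int32_t)first << 10) + second - SURROGATE_OFFSET;
}
```

For U+1F600 this gives the pair `D83D DE00`, and `utf16_pair_value(0xd83d, 0xde00)` returns `0x1f600` again.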
   1.531 +/** @deprecated ICU 2.4. Renamed to U16_IS_SINGLE, see utf_old.h. */
   1.532 +#define UTF16_IS_SINGLE(uchar) !UTF_IS_SURROGATE(uchar)
   1.533 +
   1.534 +/** @deprecated ICU 2.4. Renamed to U16_IS_LEAD, see utf_old.h. */
   1.535 +#define UTF16_IS_LEAD(uchar) UTF_IS_FIRST_SURROGATE(uchar)
   1.536 +
   1.537 +/** @deprecated ICU 2.4. Renamed to U16_IS_TRAIL, see utf_old.h. */
   1.538 +#define UTF16_IS_TRAIL(uchar) UTF_IS_SECOND_SURROGATE(uchar)
   1.539 +
   1.540 +/** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead, see utf_old.h. */
   1.541 +#define UTF16_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0xffff)
   1.542 +
   1.543 +/** @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h. */
   1.544 +#define UTF16_CHAR_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2)
   1.545 +
   1.546 +/** @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h. */
   1.547 +#define UTF16_MAX_CHAR_LENGTH 2
   1.548 +
   1.549 +/** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.550 +#define UTF16_ARRAY_SIZE(size) (size)
   1.551 +
   1.552 +/**
   1.553 + * Get a single code point from an offset that points to any
   1.554 + * of the code units that belong to that code point.
   1.555 + * Assume 0<=i<length.
   1.556 + *
   1.557 + * This could be used for iteration together with
   1.558 + * UTF16_CHAR_LENGTH() and UTF_IS_ERROR(),
   1.559 + * but the use of UTF16_NEXT_CHAR[_UNSAFE]() and
   1.560 + * UTF16_PREV_CHAR[_UNSAFE]() is more efficient for that.
   1.561 + * @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h.
   1.562 + */
   1.563 +#define UTF16_GET_CHAR_UNSAFE(s, i, c) { \
   1.564 +    (c)=(s)[i]; \
   1.565 +    if(UTF_IS_SURROGATE(c)) { \
   1.566 +        if(UTF_IS_SURROGATE_FIRST(c)) { \
   1.567 +            (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)+1]); \
   1.568 +        } else { \
   1.569 +            (c)=UTF16_GET_PAIR_VALUE((s)[(i)-1], (c)); \
   1.570 +        } \
   1.571 +    } \
   1.572 +}
   1.573 +
   1.574 +/** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. */
   1.575 +#define UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
   1.576 +    (c)=(s)[i]; \
   1.577 +    if(UTF_IS_SURROGATE(c)) { \
   1.578 +        uint16_t __c2; \
   1.579 +        if(UTF_IS_SURROGATE_FIRST(c)) { \
   1.580 +            if((i)+1<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)+1])) { \
   1.581 +                (c)=UTF16_GET_PAIR_VALUE((c), __c2); \
   1.582 +                /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
   1.583 +            } else if(strict) {\
   1.584 +                /* unmatched first surrogate */ \
   1.585 +                (c)=UTF_ERROR_VALUE; \
   1.586 +            } \
   1.587 +        } else { \
   1.588 +            if((i)-1>=(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \
   1.589 +                (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \
   1.590 +                /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
   1.591 +            } else if(strict) {\
   1.592 +                /* unmatched second surrogate */ \
   1.593 +                (c)=UTF_ERROR_VALUE; \
   1.594 +            } \
   1.595 +        } \
   1.596 +    } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
   1.597 +        (c)=UTF_ERROR_VALUE; \
   1.598 +    } \
   1.599 +}
   1.600 +
   1.601 +/** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */
   1.602 +#define UTF16_NEXT_CHAR_UNSAFE(s, i, c) { \
   1.603 +    (c)=(s)[(i)++]; \
   1.604 +    if(UTF_IS_FIRST_SURROGATE(c)) { \
   1.605 +        (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)++]); \
   1.606 +    } \
   1.607 +}
   1.608 +
   1.609 +/** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */
   1.610 +#define UTF16_APPEND_CHAR_UNSAFE(s, i, c) { \
   1.611 +    if((uint32_t)(c)<=0xffff) { \
   1.612 +        (s)[(i)++]=(uint16_t)(c); \
   1.613 +    } else { \
   1.614 +        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
   1.615 +        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
   1.616 +    } \
   1.617 +}
   1.618 +
   1.619 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */
   1.620 +#define UTF16_FWD_1_UNSAFE(s, i) { \
   1.621 +    if(UTF_IS_FIRST_SURROGATE((s)[(i)++])) { \
   1.622 +        ++(i); \
   1.623 +    } \
   1.624 +}
   1.625 +
   1.626 +/** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. */
   1.627 +#define UTF16_FWD_N_UNSAFE(s, i, n) { \
   1.628 +    int32_t __N=(n); \
   1.629 +    while(__N>0) { \
   1.630 +        UTF16_FWD_1_UNSAFE(s, i); \
   1.631 +        --__N; \
   1.632 +    } \
   1.633 +}
   1.634 +
   1.635 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */
   1.636 +#define UTF16_SET_CHAR_START_UNSAFE(s, i) { \
   1.637 +    if(UTF_IS_SECOND_SURROGATE((s)[i])) { \
   1.638 +        --(i); \
   1.639 +    } \
   1.640 +}
   1.641 +
   1.642 +/** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */
   1.643 +#define UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
   1.644 +    (c)=(s)[(i)++]; \
   1.645 +    if(UTF_IS_FIRST_SURROGATE(c)) { \
   1.646 +        uint16_t __c2; \
   1.647 +        if((i)<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)])) { \
   1.648 +            ++(i); \
   1.649 +            (c)=UTF16_GET_PAIR_VALUE((c), __c2); \
   1.650 +            /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
   1.651 +        } else if(strict) {\
   1.652 +            /* unmatched first surrogate */ \
   1.653 +            (c)=UTF_ERROR_VALUE; \
   1.654 +        } \
   1.655 +    } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
   1.656 +        /* unmatched second surrogate or other non-character */ \
   1.657 +        (c)=UTF_ERROR_VALUE; \
   1.658 +    } \
   1.659 +}
   1.660 +
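The control flow of UTF16_NEXT_CHAR_SAFE above can be sketched as an ordinary function. This is an illustration under stated assumptions, not the ICU implementation: the names `next_utf16_safe`, `is_lead`, `is_trail` and the sentinel `SKETCH_ERROR` are hypothetical, and the strict branch here rejects only unpaired surrogates, leaving out the noncharacter test that UTF_IS_UNICODE_CHAR performs in the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel for this sketch; the header uses UTF_ERROR_VALUE. */
#define SKETCH_ERROR (-1)

static int is_lead(uint32_t u)  { return (u & 0xfffffc00) == 0xd800; }
static int is_trail(uint32_t u) { return (u & 0xfffffc00) == 0xdc00; }

/* Hypothetical helper (not an ICU API): reads the code point starting at
 * s[*i] and advances *i past it. In strict mode an unpaired surrogate
 * yields SKETCH_ERROR; otherwise the lone surrogate is returned as-is. */
static int32_t next_utf16_safe(const uint16_t *s, int32_t *i,
                               int32_t length, int strict) {
    assert(*i >= 0 && *i < length);
    uint32_t c = s[(*i)++];
    if (is_lead(c)) {
        /* Only consume the next unit if it exists and is a trail surrogate. */
        if (*i < length && is_trail(s[*i])) {
            uint32_t trail = s[(*i)++];
            return (int32_t)(0x10000 + ((c - 0xd800) << 10) + (trail - 0xdc00));
        }
        return strict ? SKETCH_ERROR : (int32_t)c;  /* unmatched lead */
    }
    if (strict && is_trail(c)) {
        return SKETCH_ERROR;                        /* unmatched trail */
    }
    return (int32_t)c;
}
```

The bounds check before reading the trail unit is what distinguishes this from the unsafe variant, which reads `s[i]` unconditionally after a lead surrogate.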
   1.661 +/** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */
   1.662 +#define UTF16_APPEND_CHAR_SAFE(s, i, length, c) { \
   1.663 +    if((uint32_t)(c)<=0xffff) { \
   1.664 +        (s)[(i)++]=(uint16_t)(c); \
   1.665 +    } else if((uint32_t)(c)<=0x10ffff) { \
   1.666 +        if((i)+1<(length)) { \
   1.667 +            (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
   1.668 +            (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
   1.669 +        } else /* not enough space */ { \
   1.670 +            (s)[(i)++]=UTF_ERROR_VALUE; \
   1.671 +        } \
   1.672 +    } else /* c>0x10ffff, write error value */ { \
   1.673 +        (s)[(i)++]=UTF_ERROR_VALUE; \
   1.674 +    } \
   1.675 +}
   1.676 +
   1.677 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */
   1.678 +#define UTF16_FWD_1_SAFE(s, i, length) U16_FWD_1(s, i, length)
   1.679 +
   1.680 +/** @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. */
   1.681 +#define UTF16_FWD_N_SAFE(s, i, length, n) U16_FWD_N(s, i, length, n)
   1.682 +
   1.683 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. */
   1.684 +#define UTF16_SET_CHAR_START_SAFE(s, start, i) U16_SET_CP_START(s, start, i)
   1.685 +
   1.686 +/** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */
   1.687 +#define UTF16_PREV_CHAR_UNSAFE(s, i, c) { \
   1.688 +    (c)=(s)[--(i)]; \
   1.689 +    if(UTF_IS_SECOND_SURROGATE(c)) { \
   1.690 +        (c)=UTF16_GET_PAIR_VALUE((s)[--(i)], (c)); \
   1.691 +    } \
   1.692 +}
   1.693 +
   1.694 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */
   1.695 +#define UTF16_BACK_1_UNSAFE(s, i) { \
   1.696 +    if(UTF_IS_SECOND_SURROGATE((s)[--(i)])) { \
   1.697 +        --(i); \
   1.698 +    } \
   1.699 +}
   1.700 +
   1.701 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */
   1.702 +#define UTF16_BACK_N_UNSAFE(s, i, n) { \
   1.703 +    int32_t __N=(n); \
   1.704 +    while(__N>0) { \
   1.705 +        UTF16_BACK_1_UNSAFE(s, i); \
   1.706 +        --__N; \
   1.707 +    } \
   1.708 +}
   1.709 +
   1.710 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
   1.711 +#define UTF16_SET_CHAR_LIMIT_UNSAFE(s, i) { \
   1.712 +    if(UTF_IS_FIRST_SURROGATE((s)[(i)-1])) { \
   1.713 +        ++(i); \
   1.714 +    } \
   1.715 +}
   1.716 +
   1.717 +/** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. */
   1.718 +#define UTF16_PREV_CHAR_SAFE(s, start, i, c, strict) { \
   1.719 +    (c)=(s)[--(i)]; \
   1.720 +    if(UTF_IS_SECOND_SURROGATE(c)) { \
   1.721 +        uint16_t __c2; \
   1.722 +        if((i)>(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \
   1.723 +            --(i); \
   1.724 +            (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \
   1.725 +            /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
   1.726 +        } else if(strict) {\
   1.727 +            /* unmatched second surrogate */ \
   1.728 +            (c)=UTF_ERROR_VALUE; \
   1.729 +        } \
   1.730 +    } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
   1.731 +        /* unmatched first surrogate or other non-character */ \
   1.732 +        (c)=UTF_ERROR_VALUE; \
   1.733 +    } \
   1.734 +}
   1.735 +
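Backward iteration mirrors the forward case: UTF16_PREV_CHAR_SAFE above first steps back onto a code unit, then looks one unit further back only if the index is still above `start`. A minimal sketch of that logic follows, assuming the same hypothetical names as a standalone illustration (`prev_utf16_safe`, `SKETCH_ERROR`); the noncharacter part of the strict check is again left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel for this sketch; the header uses UTF_ERROR_VALUE. */
#define SKETCH_ERROR (-1)

static int is_lead(uint32_t u)  { return (u & 0xfffffc00) == 0xd800; }
static int is_trail(uint32_t u) { return (u & 0xfffffc00) == 0xdc00; }

/* Hypothetical helper (not an ICU API): reads the code point that ends
 * just before s[*i] and moves *i back to its first code unit. */
static int32_t prev_utf16_safe(const uint16_t *s, int32_t start,
                               int32_t *i, int strict) {
    assert(*i > start);
    uint32_t c = s[--(*i)];
    if (is_trail(c)) {
        /* Only step back again if a lead surrogate precedes within bounds. */
        if (*i > start && is_lead(s[*i - 1])) {
            uint32_t lead = s[--(*i)];
            return (int32_t)(0x10000 + ((lead - 0xd800) << 10) + (c - 0xdc00));
        }
        return strict ? SKETCH_ERROR : (int32_t)c;  /* unmatched trail */
    }
    if (strict && is_lead(c)) {
        return SKETCH_ERROR;                        /* unmatched lead */
    }
    return (int32_t)c;
}
```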
   1.736 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */
   1.737 +#define UTF16_BACK_1_SAFE(s, start, i) U16_BACK_1(s, start, i)
   1.738 +
   1.739 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */
   1.740 +#define UTF16_BACK_N_SAFE(s, start, i, n) U16_BACK_N(s, start, i, n)
   1.741 +
   1.742 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */
   1.743 +#define UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length)
   1.744 +
   1.745 +/* Formerly utf32.h --------------------------------------------------------- */
   1.746 +
   1.747 +/*
   1.748 +* Old documentation:
   1.749 +*
   1.750 +*   This file defines macros to deal with UTF-32 code units and code points.
   1.751 +*   Signatures and semantics are the same as for the similarly named macros
   1.752 +*   in utf16.h.
   1.753 +*   utf32.h is included by utf.h after unicode/umachine.h
   1.754 +*   and some common definitions.
   1.755 +*   <p><b>Usage:</b>  ICU coding guidelines for if() statements should be followed when using these macros.
   1.756 +*                  Compound statements (curly braces {}) must be used for if-else-while...
   1.757 +*                  bodies, and all macro statements should be terminated with a semicolon.</p>
   1.758 +*/
   1.759 +
   1.760 +/* internal definitions ----------------------------------------------------- */
   1.761 +
   1.762 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.763 +#define UTF32_IS_SAFE(c, strict) \
   1.764 +    (!(strict) ? \
   1.765 +        (uint32_t)(c)<=0x10ffff : \
   1.766 +        UTF_IS_UNICODE_CHAR(c))
   1.767 +
   1.768 +/*
   1.769 + * For the semantics of all of these macros, see utf16.h.
   1.770 + * The UTF-32 versions are trivial because any code point is
   1.771 + * encoded using exactly one code unit.
   1.772 + */
   1.773 +
   1.774 +/* single-code point definitions -------------------------------------------- */
   1.775 +
   1.776 +/* classes of code unit values */
   1.777 +
   1.778 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.779 +#define UTF32_IS_SINGLE(uchar) 1
   1.780 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.781 +#define UTF32_IS_LEAD(uchar) 0
   1.782 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.783 +#define UTF32_IS_TRAIL(uchar) 0
   1.784 +
   1.785 +/* number of code units per code point */
   1.786 +
   1.787 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.788 +#define UTF32_NEED_MULTIPLE_UCHAR(c) 0
   1.789 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.790 +#define UTF32_CHAR_LENGTH(c) 1
   1.791 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.792 +#define UTF32_MAX_CHAR_LENGTH 1
   1.793 +
   1.794 +/* average number of code units compared to UTF-16 */
   1.795 +
   1.796 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.797 +#define UTF32_ARRAY_SIZE(size) (size)
   1.798 +
   1.799 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.800 +#define UTF32_GET_CHAR_UNSAFE(s, i, c) { \
   1.801 +    (c)=(s)[i]; \
   1.802 +}
   1.803 +
   1.804 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.805 +#define UTF32_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
   1.806 +    (c)=(s)[i]; \
   1.807 +    if(!UTF32_IS_SAFE(c, strict)) { \
   1.808 +        (c)=UTF_ERROR_VALUE; \
   1.809 +    } \
   1.810 +}
   1.811 +
   1.812 +/* definitions with forward iteration --------------------------------------- */
   1.813 +
   1.814 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.815 +#define UTF32_NEXT_CHAR_UNSAFE(s, i, c) { \
   1.816 +    (c)=(s)[(i)++]; \
   1.817 +}
   1.818 +
   1.819 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.820 +#define UTF32_APPEND_CHAR_UNSAFE(s, i, c) { \
   1.821 +    (s)[(i)++]=(c); \
   1.822 +}
   1.823 +
   1.824 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.825 +#define UTF32_FWD_1_UNSAFE(s, i) { \
   1.826 +    ++(i); \
   1.827 +}
   1.828 +
   1.829 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.830 +#define UTF32_FWD_N_UNSAFE(s, i, n) { \
   1.831 +    (i)+=(n); \
   1.832 +}
   1.833 +
   1.834 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.835 +#define UTF32_SET_CHAR_START_UNSAFE(s, i) { \
   1.836 +}
   1.837 +
   1.838 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.839 +#define UTF32_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
   1.840 +    (c)=(s)[(i)++]; \
   1.841 +    if(!UTF32_IS_SAFE(c, strict)) { \
   1.842 +        (c)=UTF_ERROR_VALUE; \
   1.843 +    } \
   1.844 +}
   1.845 +
   1.846 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.847 +#define UTF32_APPEND_CHAR_SAFE(s, i, length, c) { \
   1.848 +    if((uint32_t)(c)<=0x10ffff) { \
   1.849 +        (s)[(i)++]=(c); \
   1.850 +    } else /* c>0x10ffff, write 0xfffd */ { \
   1.851 +        (s)[(i)++]=0xfffd; \
   1.852 +    } \
   1.853 +}
   1.854 +
   1.855 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.856 +#define UTF32_FWD_1_SAFE(s, i, length) { \
   1.857 +    ++(i); \
   1.858 +}
   1.859 +
   1.860 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.861 +#define UTF32_FWD_N_SAFE(s, i, length, n) { \
   1.862 +    if(((i)+=(n))>(length)) { \
   1.863 +        (i)=(length); \
   1.864 +    } \
   1.865 +}
   1.866 +
   1.867 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.868 +#define UTF32_SET_CHAR_START_SAFE(s, start, i) { \
   1.869 +}
   1.870 +
   1.871 +/* definitions with backward iteration -------------------------------------- */
   1.872 +
   1.873 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.874 +#define UTF32_PREV_CHAR_UNSAFE(s, i, c) { \
   1.875 +    (c)=(s)[--(i)]; \
   1.876 +}
   1.877 +
   1.878 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.879 +#define UTF32_BACK_1_UNSAFE(s, i) { \
   1.880 +    --(i); \
   1.881 +}
   1.882 +
   1.883 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.884 +#define UTF32_BACK_N_UNSAFE(s, i, n) { \
   1.885 +    (i)-=(n); \
   1.886 +}
   1.887 +
   1.888 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.889 +#define UTF32_SET_CHAR_LIMIT_UNSAFE(s, i) { \
   1.890 +}
   1.891 +
   1.892 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.893 +#define UTF32_PREV_CHAR_SAFE(s, start, i, c, strict) { \
   1.894 +    (c)=(s)[--(i)]; \
   1.895 +    if(!UTF32_IS_SAFE(c, strict)) { \
   1.896 +        (c)=UTF_ERROR_VALUE; \
   1.897 +    } \
   1.898 +}
   1.899 +
   1.900 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.901 +#define UTF32_BACK_1_SAFE(s, start, i) { \
   1.902 +    --(i); \
   1.903 +}
   1.904 +
   1.905 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.906 +#define UTF32_BACK_N_SAFE(s, start, i, n) { \
   1.907 +    (i)-=(n); \
   1.908 +    if((i)<(start)) { \
   1.909 +        (i)=(start); \
   1.910 +    } \
   1.911 +}
   1.912 +
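Because every UTF-32 code point occupies exactly one code unit, the only work left for the "safe" UTF-32 movement macros above is clamping the index to the string bounds. A small sketch of that clamping, with hypothetical function names that are not part of ICU and an added assumption that `n` is non-negative:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: advance i by n code points, stopping at length. */
static int32_t fwd_n_utf32(int32_t i, int32_t length, int32_t n) {
    assert(n >= 0);
    i += n;
    return i > length ? length : i;
}

/* Hypothetical helper: move i back by n code points, stopping at start. */
static int32_t back_n_utf32(int32_t i, int32_t start, int32_t n) {
    assert(n >= 0);
    i -= n;
    return i < start ? start : i;
}
```

Unlike the UTF-16 versions, neither function needs to look at the string contents at all.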
   1.913 +/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
   1.914 +#define UTF32_SET_CHAR_LIMIT_SAFE(s, i, length) { \
   1.915 +}
   1.916 +
   1.917 +/* Formerly utf.h, part 2 --------------------------------------------------- */
   1.918 +
   1.919 +/**
   1.920 + * Estimate the number of code units for a string based on the number of UTF-16 code units.
   1.921 + *
   1.922 + * @deprecated ICU 2.4. Obsolete, see utf_old.h.
   1.923 + */
   1.924 +#define UTF_ARRAY_SIZE(size) UTF16_ARRAY_SIZE(size)
   1.925 +
   1.926 +/** @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h. */
   1.927 +#define UTF_GET_CHAR_UNSAFE(s, i, c)                 UTF16_GET_CHAR_UNSAFE(s, i, c)
   1.928 +
   1.929 +/** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. */
   1.930 +#define UTF_GET_CHAR_SAFE(s, start, i, length, c, strict) UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict)
   1.931 +
   1.932 +
   1.933 +/** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */
   1.934 +#define UTF_NEXT_CHAR_UNSAFE(s, i, c)                UTF16_NEXT_CHAR_UNSAFE(s, i, c)
   1.935 +
   1.936 +/** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */
   1.937 +#define UTF_NEXT_CHAR_SAFE(s, i, length, c, strict)  UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict)
   1.938 +
   1.939 +
   1.940 +/** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */
   1.941 +#define UTF_APPEND_CHAR_UNSAFE(s, i, c)              UTF16_APPEND_CHAR_UNSAFE(s, i, c)
   1.942 +
   1.943 +/** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */
   1.944 +#define UTF_APPEND_CHAR_SAFE(s, i, length, c)        UTF16_APPEND_CHAR_SAFE(s, i, length, c)
   1.945 +
   1.946 +
   1.947 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */
   1.948 +#define UTF_FWD_1_UNSAFE(s, i)                       UTF16_FWD_1_UNSAFE(s, i)
   1.949 +
   1.950 +/** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */
   1.951 +#define UTF_FWD_1_SAFE(s, i, length)                 UTF16_FWD_1_SAFE(s, i, length)
   1.952 +
   1.953 +
   1.954 +/** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. */
   1.955 +#define UTF_FWD_N_UNSAFE(s, i, n)                    UTF16_FWD_N_UNSAFE(s, i, n)
   1.956 +
   1.957 +/** @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. */
   1.958 +#define UTF_FWD_N_SAFE(s, i, length, n)              UTF16_FWD_N_SAFE(s, i, length, n)
   1.959 +
   1.960 +
   1.961 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */
   1.962 +#define UTF_SET_CHAR_START_UNSAFE(s, i)              UTF16_SET_CHAR_START_UNSAFE(s, i)
   1.963 +
   1.964 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. */
   1.965 +#define UTF_SET_CHAR_START_SAFE(s, start, i)         UTF16_SET_CHAR_START_SAFE(s, start, i)
   1.966 +
   1.967 +
   1.968 +/** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */
   1.969 +#define UTF_PREV_CHAR_UNSAFE(s, i, c)                UTF16_PREV_CHAR_UNSAFE(s, i, c)
   1.970 +
   1.971 +/** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. */
   1.972 +#define UTF_PREV_CHAR_SAFE(s, start, i, c, strict)   UTF16_PREV_CHAR_SAFE(s, start, i, c, strict)
   1.973 +
   1.974 +
   1.975 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */
   1.976 +#define UTF_BACK_1_UNSAFE(s, i)                      UTF16_BACK_1_UNSAFE(s, i)
   1.977 +
   1.978 +/** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */
   1.979 +#define UTF_BACK_1_SAFE(s, start, i)                 UTF16_BACK_1_SAFE(s, start, i)
   1.980 +
   1.981 +
   1.982 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */
   1.983 +#define UTF_BACK_N_UNSAFE(s, i, n)                   UTF16_BACK_N_UNSAFE(s, i, n)
   1.984 +
   1.985 +/** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */
   1.986 +#define UTF_BACK_N_SAFE(s, start, i, n)              UTF16_BACK_N_SAFE(s, start, i, n)
   1.987 +
   1.988 +
   1.989 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
   1.990 +#define UTF_SET_CHAR_LIMIT_UNSAFE(s, i)              UTF16_SET_CHAR_LIMIT_UNSAFE(s, i)
   1.991 +
   1.992 +/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */
   1.993 +#define UTF_SET_CHAR_LIMIT_SAFE(s, start, i, length) UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length)
   1.994 +
   1.995 +/* Define default macros (UTF-16 "safe") ------------------------------------ */
   1.996 +
   1.997 +/**
   1.998 + * Does this code unit alone encode a code point (BMP, not a surrogate)?
   1.999 + * Same as UTF16_IS_SINGLE.
  1.1000 + * @deprecated ICU 2.4. Renamed to U_IS_SINGLE and U16_IS_SINGLE, see utf_old.h.
  1.1001 + */
  1.1002 +#define UTF_IS_SINGLE(uchar) U16_IS_SINGLE(uchar)
  1.1003 +
  1.1004 +/**
  1.1005 + * Is this code unit the first one of several (a lead surrogate)?
  1.1006 + * Same as UTF16_IS_LEAD.
  1.1007 + * @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h.
  1.1008 + */
  1.1009 +#define UTF_IS_LEAD(uchar) U16_IS_LEAD(uchar)
  1.1010 +
  1.1011 +/**
  1.1012 + * Is this code unit one of several but not the first one (a trail surrogate)?
  1.1013 + * Same as UTF16_IS_TRAIL.
  1.1014 + * @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h.
  1.1015 + */
  1.1016 +#define UTF_IS_TRAIL(uchar) U16_IS_TRAIL(uchar)
  1.1017 +
  1.1018 +/**
  1.1019 + * Does this code point require multiple code units (is it a supplementary code point)?
  1.1020 + * Same as UTF16_NEED_MULTIPLE_UCHAR.
  1.1021 + * @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead.
  1.1022 + */
  1.1023 +#define UTF_NEED_MULTIPLE_UCHAR(c) UTF16_NEED_MULTIPLE_UCHAR(c)
  1.1024 +
  1.1025 +/**
  1.1026 + * How many code units are used to encode this code point (1 or 2)?
  1.1027 + * Same as UTF16_CHAR_LENGTH.
  1.1028 + * @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h.
  1.1029 + */
  1.1030 +#define UTF_CHAR_LENGTH(c) U16_LENGTH(c)
  1.1031 +
  1.1032 +/**
  1.1033 + * How many code units are used at most for any Unicode code point (2)?
  1.1034 + * Same as UTF16_MAX_CHAR_LENGTH.
  1.1035 + * @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h.
  1.1036 + */
  1.1037 +#define UTF_MAX_CHAR_LENGTH U16_MAX_LENGTH
  1.1038 +
  1.1039 +/**
  1.1040 + * Set c to the code point that contains the code unit i.
  1.1041 + * i could point to the lead or the trail surrogate for the code point.
  1.1042 + * i is not modified.
  1.1043 + * Same as UTF16_GET_CHAR.
  1.1044 + * \pre 0<=i<length
  1.1045 + *
  1.1046 + * @deprecated ICU 2.4. Renamed to U16_GET, see utf_old.h.
  1.1047 + */
  1.1048 +#define UTF_GET_CHAR(s, start, i, length, c) U16_GET(s, start, i, length, c)
  1.1049 +
  1.1050 +/**
  1.1051 + * Set c to the code point that starts at code unit i
  1.1052 + * and advance i to beyond the code units of this code point (post-increment).
  1.1053 + * i must point to the first code unit of a code point.
  1.1054 + * Otherwise c is set to the trail unit (surrogate) itself.
  1.1055 + * Same as UTF16_NEXT_CHAR.
  1.1056 + * \pre 0<=i<length
  1.1057 + * \post 0<i<=length
  1.1058 + *
  1.1059 + * @deprecated ICU 2.4. Renamed to U16_NEXT, see utf_old.h.
  1.1060 + */
  1.1061 +#define UTF_NEXT_CHAR(s, i, length, c) U16_NEXT(s, i, length, c)
  1.1062 +
  1.1063 +/**
  1.1064 + * Append the code units of code point c to the string at index i
  1.1065 + * and advance i to beyond the new code units (post-increment).
  1.1066 + * The code units beginning at index i will be overwritten.
  1.1067 + * Same as UTF16_APPEND_CHAR.
  1.1068 + * \pre 0<=c<=0x10ffff
  1.1069 + * \pre 0<=i<length
  1.1070 + * \post 0<i<=length
  1.1071 + *
  1.1072 + * @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h.
  1.1073 + */
  1.1074 +#define UTF_APPEND_CHAR(s, i, length, c) UTF16_APPEND_CHAR_SAFE(s, i, length, c)
  1.1075 +
  1.1076 +/**
  1.1077 + * Advance i to beyond the code units of the code point that begins at i.
  1.1078 + * I.e., advance i by one code point.
  1.1079 + * Same as UTF16_FWD_1.
  1.1080 + * \pre 0<=i<length
  1.1081 + * \post 0<i<=length
  1.1082 + *
  1.1083 + * @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h.
  1.1084 + */
  1.1085 +#define UTF_FWD_1(s, i, length) U16_FWD_1(s, i, length)
  1.1086 +
  1.1087 +/**
  1.1088 + * Advance i to beyond the code units of the n code points where the first one begins at i.
  1.1089 + * I.e., advance i by n code points.
  1.1090 + * Same as UTF16_FWD_N.
  1.1091 + * \pre 0<=i<length
  1.1092 + * \post 0<i<=length
  1.1093 + *
  1.1094 + * @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h.
  1.1095 + */
  1.1096 +#define UTF_FWD_N(s, i, length, n) U16_FWD_N(s, i, length, n)
  1.1097 +
  1.1098 +/**
  1.1099 + * Take the random-access index i and adjust it so that it points to the beginning
  1.1100 + * of a code point.
  1.1101 + * The input index points to any code unit of a code point and is moved to point to
  1.1102 + * the first code unit of the same code point. i is never incremented.
  1.1103 + * In other words, if i points to a trail surrogate that is preceded by a matching
  1.1104 + * lead surrogate, then i is decremented. Otherwise it is not modified.
  1.1105 + * This can be used to start an iteration with UTF_NEXT_CHAR() from a random index.
  1.1106 + * Same as UTF16_SET_CHAR_START.
  1.1107 + * \pre start<=i<length
  1.1108 + * \post start<=i<length
  1.1109 + *
  1.1110 + * @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h.
  1.1111 + */
  1.1112 +#define UTF_SET_CHAR_START(s, start, i) U16_SET_CP_START(s, start, i)
  1.1113 +
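The adjustment described for UTF_SET_CHAR_START above amounts to a single conditional decrement. The sketch below illustrates the documented behavior with hypothetical names (`set_cp_start`, `is_lead`, `is_trail`); it is not the ICU macro itself and assumes `i` addresses a valid element of `s`.

```c
#include <assert.h>
#include <stdint.h>

static int is_lead(uint32_t u)  { return (u & 0xfffffc00) == 0xd800; }
static int is_trail(uint32_t u) { return (u & 0xfffffc00) == 0xdc00; }

/* Hypothetical helper (not an ICU API): returns i moved back by one if
 * s[i] is a trail surrogate preceded by a lead surrogate, else i. */
static int32_t set_cp_start(const uint16_t *s, int32_t start, int32_t i) {
    assert(i >= start);
    if (i > start && is_trail(s[i]) && is_lead(s[i - 1])) {
        --i;
    }
    return i;
}
```

An unpaired trail surrogate leaves the index unchanged, as does a trail surrogate sitting exactly at `start`.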
  1.1114 +/**
  1.1115 + * Set c to the code point that has code units before i
  1.1116 + * and move i backward (towards the beginning of the string)
  1.1117 + * to the first code unit of this code point (pre-decrement).
  1.1118 + * i must point to the first code unit after the last unit of a code point (i==length is allowed).
  1.1119 + * Same as UTF16_PREV_CHAR.
  1.1120 + * \pre start<i<=length
  1.1121 + * \post start<=i<length
  1.1122 + *
  1.1123 + * @deprecated ICU 2.4. Renamed to U16_PREV, see utf_old.h.
  1.1124 + */
  1.1125 +#define UTF_PREV_CHAR(s, start, i, c) U16_PREV(s, start, i, c)
  1.1126 +
  1.1127 +/**
  1.1128 + * Move i backward (towards the beginning of the string)
  1.1129 + * to the first code unit of the code point that has code units before i.
  1.1130 + * I.e., move i backward by one code point.
  1.1131 + * i must point to the first code unit after the last unit of a code point (i==length is allowed).
  1.1132 + * Same as UTF16_BACK_1.
  1.1133 + * \pre start<i<=length
  1.1134 + * \post start<=i<length
  1.1135 + *
  1.1136 + * @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h.
  1.1137 + */
  1.1138 +#define UTF_BACK_1(s, start, i) U16_BACK_1(s, start, i)
  1.1139 +
  1.1140 +/**
  1.1141 + * Move i backward (towards the beginning of the string)
  1.1142 + * to the first code unit of the n code points that have code units before i.
  1.1143 + * I.e., move i backward by n code points.
  1.1144 + * i must point to the first code unit after the last unit of a code point (i==length is allowed).
  1.1145 + * Same as UTF16_BACK_N.
  1.1146 + * \pre start<i<=length
  1.1147 + * \post start<=i<length
  1.1148 + *
  1.1149 + * @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h.
  1.1150 + */
  1.1151 +#define UTF_BACK_N(s, start, i, n) U16_BACK_N(s, start, i, n)
  1.1152 +
  1.1153 +/**
  1.1154 + * Take the random-access index i and adjust it so that it points beyond
  1.1155 + * a code point. The input index points beyond any code unit
  1.1156 + * of a code point and is moved to point beyond the last code unit of the same
  1.1157 + * code point. i is never decremented.
  1.1158 + * In other words, if i points to a trail surrogate that is preceded by a matching
  1.1159 + * lead surrogate, then i is incremented. Otherwise it is not modified.
  1.1160 + * This can be used to start an iteration with UTF_PREV_CHAR() from a random index.
  1.1161 + * Same as UTF16_SET_CHAR_LIMIT.
  1.1162 + * \pre start<i<=length
  1.1163 + * \post start<i<=length
  1.1164 + *
  1.1165 + * @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h.
  1.1166 + */
  1.1167 +#define UTF_SET_CHAR_LIMIT(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length)
  1.1168 +
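The counterpart adjustment described for UTF_SET_CHAR_LIMIT above is a single conditional increment: the index moves forward only when it would otherwise split a surrogate pair. A minimal sketch with hypothetical names (`set_cp_limit`, `is_lead`, `is_trail`), assuming an explicit non-negative `length`; it illustrates the documented behavior and is not the ICU macro itself.

```c
#include <assert.h>
#include <stdint.h>

static int is_lead(uint32_t u)  { return (u & 0xfffffc00) == 0xd800; }
static int is_trail(uint32_t u) { return (u & 0xfffffc00) == 0xdc00; }

/* Hypothetical helper (not an ICU API): returns i moved forward by one if
 * s[i-1] is a lead surrogate and s[i] is its trail surrogate, else i. */
static int32_t set_cp_limit(const uint16_t *s, int32_t start,
                            int32_t i, int32_t length) {
    assert(start <= i && i <= length);
    if (start < i && i < length && is_lead(s[i - 1]) && is_trail(s[i])) {
        ++i;
    }
    return i;
}
```

The `i < length` test keeps the function from reading past the end when the string ends in an unpaired lead surrogate.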
  1.1169 +#endif /* U_HIDE_DEPRECATED_API */
  1.1170 +
  1.1171 +#endif
  1.1172 +