intl/icu/source/common/rbbirpt.txt

Sat, 03 Jan 2015 20:18:00 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
date
Sat, 03 Jan 2015 20:18:00 +0100
branch
TOR_BUG_3246
changeset 7
129ffea94266
permissions
-rw-r--r--

Conditionally enable double key logic according to:
private browsing mode or privacy.thirdparty.isolate preference and
implement in GetCookieStringCommon and FindCookie where it counts...
With some reservations of how to convince FindCookie users to test
condition and pass a nullptr when disabling double key logic.

michael@0 1
michael@0 2 #*****************************************************************************
michael@0 3 #
michael@0 4 # Copyright (C) 2002-2003, International Business Machines Corporation and others.
michael@0 5 # All Rights Reserved.
michael@0 6 #
michael@0 7 #*****************************************************************************
michael@0 8 #
michael@0 9 # file: rbbirpt.txt
michael@0 10 # ICU Break Iterator Rule Parser State Table
michael@0 11 #
michael@0 12 # This state table is used when reading and parsing a set of RBBI rules
michael@0 13 # The rule parser uses a state machine; the data in this file define the
michael@0 14 # state transitions that occur for each input character.
michael@0 15 #
michael@0 16 # *** This file defines the RBBI rule grammar. This is it.
michael@0 17 # *** The determination of what is accepted is here.
michael@0 18 #
michael@0 19 # This file is processed by a Perl script "rbbicst.pl" to produce initialized C arrays
michael@0 20 # that are then built with the rule parser.
michael@0 21 #
michael@0 22
michael@0 23 #
michael@0 24 # Here is the syntax of the state definitions in this file:
michael@0 25 #
michael@0 26 #
michael@0 27 #StateName:
michael@0 28 #   input-char           n next-state           ^push-state     action
michael@0 29 #   input-char           n next-state           ^push-state     action
michael@0 30 #       |                |   |                      |             |
michael@0 31 #       |                |   |                      |             |--- action to be performed by state machine
michael@0 32 #       |                |   |                      |                  See function RBBIRuleScanner::doParseActions()
michael@0 33 #       |                |   |                      |
michael@0 34 #       |                |   |                      |--- Push this named state onto the state stack.
michael@0 35 #       |                |   |                              Later, when next state is specified as "pop",
michael@0 36 #       |                |   |                              the pushed state will become the current state.
michael@0 37 #       |                |   |
michael@0 38 #       |                |   |--- Transition to this state if the current input character matches the input
michael@0 39 #       |                |        character or char class in the left hand column. "pop" causes the next
michael@0 40 #       |                |        state to be popped from the state stack.
michael@0 41 #       |                |
michael@0 42 #       |                |--- When making the state transition specified on this line, advance to the next
michael@0 43 #       |                     character from the input only if 'n' appears here.
michael@0 44 #       |
michael@0 45 #       |--- Character or named character classes to test for. If the current character being scanned
michael@0 46 #            matches, perform the actions and go to the state specified on this line.
michael@0 47 #            The input character is tested sequentially, in the order written. The characters and
michael@0 48 #            character classes tested for do not need to be mutually exclusive. The first match wins.
michael@0 49 #
michael@0 50
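The row syntax described above can be illustrated with a small Python sketch that splits one transition row into its five fields. This is not how rbbicst.pl works internally; `Transition` and `parse_transition` are hypothetical names, and the sketch assumes that every row names a next state and that '#' never appears as a quoted input character.

```python
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    match: str                  # quoted character such as "'$'" or a class name such as "white_space"
    advance: bool               # True when the 'n' flag is present
    next_state: Optional[str]   # name of the next state, or "pop"
    push_state: Optional[str]   # state named after '^', if any
    action: Optional[str]       # action name such as "doExprStart", if any


def parse_transition(line: str) -> Transition:
    """Split one transition row into its fields (illustrative only)."""
    # Drop a trailing comment; assumes '#' is never a quoted input character.
    fields = line.split("#", 1)[0].split()
    match, rest = fields[0], fields[1:]

    # The optional 'n' flag, when present, is always the second field.
    advance = bool(rest) and rest[0] == "n"
    if advance:
        rest = rest[1:]

    next_state = push_state = action = None
    for token in rest:
        if token.startswith("^"):
            push_state = token[1:]
        elif next_state is None:
            next_state = token
        else:
            action = token
    return Transition(match, advance, next_state, push_state, action)
```

For example, the row `'$' scan-var-name ^assign-or-rule doExprStart` yields a transition that does not advance the input, goes to `scan-var-name`, pushes `assign-or-rule`, and runs `doExprStart`.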
michael@0 51
michael@0 52
michael@0 53
michael@0 54 #
michael@0 55 # start state, scan position is at the beginning of the rules file, or in between two rules.
michael@0 56 #
michael@0 57 start:
michael@0 58 escaped term ^break-rule-end doExprStart
michael@0 59 white_space n start
michael@0 60 '$' scan-var-name ^assign-or-rule doExprStart
michael@0 61 '!' n rev-option
michael@0 62 ';' n start # ignore empty rules.
michael@0 63 eof exit
michael@0 64 default term ^break-rule-end doExprStart
michael@0 65
michael@0 66 #
michael@0 67 # break-rule-end: Returned from doing a break-rule expression.
michael@0 68 #
michael@0 69 break-rule-end:
michael@0 70 ';' n start doEndOfRule
michael@0 71 white_space n break-rule-end
michael@0 72 default errorDeath doRuleError
michael@0 73
michael@0 74
michael@0 75 #
michael@0 76 # ! We've just scanned a '!', indicating either a !!key word flag or a
michael@0 77 # !Reverse rule.
michael@0 78 #
michael@0 79 rev-option:
michael@0 80 '!' n option-scan1
michael@0 81 default reverse-rule ^break-rule-end doReverseDir
michael@0 82
michael@0 83 option-scan1:
michael@0 84 name_start_char n option-scan2 doOptionStart
michael@0 85 default errorDeath doRuleError
michael@0 86
michael@0 87 option-scan2:
michael@0 88 name_char n option-scan2
michael@0 89 default option-scan3 doOptionEnd
michael@0 90
michael@0 91 option-scan3:
michael@0 92 ';' n start
michael@0 93 white_space n option-scan3
michael@0 94 default errorDeath doRuleError
michael@0 95
michael@0 96
michael@0 97 reverse-rule:
michael@0 98 default term ^break-rule-end doExprStart
michael@0 99
michael@0 100
michael@0 101 #
michael@0 102 # term. Eat through a single rule character, or a composite thing, which
michael@0 103 # could be a parenthesized expression, a variable name, or a Unicode Set.
michael@0 104 #
michael@0 105 term:
michael@0 106 escaped n expr-mod doRuleChar
michael@0 107 white_space n term
michael@0 108 rule_char n expr-mod doRuleChar
michael@0 109 '[' scan-unicode-set ^expr-mod
michael@0 110 '(' n term ^expr-mod doLParen
michael@0 111 '$' scan-var-name ^term-var-ref
michael@0 112 '.' n expr-mod doDotAny
michael@0 113 default errorDeath doRuleError
michael@0 114
michael@0 115
michael@0 116
michael@0 117 #
michael@0 118 # term-var-ref We've just finished scanning a reference to a $variable.
michael@0 119 # Check that the variable was defined.
michael@0 120 # The variable name scanning is in common with assignment statements,
michael@0 121 # so the check can't be done there.
michael@0 122 term-var-ref:
michael@0 123 default expr-mod doCheckVarDef
michael@0 124
michael@0 125
michael@0 126 #
michael@0 127 # expr-mod We've just finished scanning a term, now look for the optional
michael@0 128 # trailing '*', '?', '+'
michael@0 129 #
michael@0 130 expr-mod:
michael@0 131 white_space n expr-mod
michael@0 132 '*' n expr-cont doUnaryOpStar
michael@0 133 '+' n expr-cont doUnaryOpPlus
michael@0 134 '?' n expr-cont doUnaryOpQuestion
michael@0 135 default expr-cont
michael@0 136
michael@0 137
michael@0 138 #
michael@0 139 # expr-cont Expression, continuation. At a point where additional terms are
michael@0 140 # allowed, but not required.
michael@0 141 #
michael@0 142 expr-cont:
michael@0 143 escaped term doExprCatOperator
michael@0 144 white_space n expr-cont
michael@0 145 rule_char term doExprCatOperator
michael@0 146 '[' term doExprCatOperator
michael@0 147 '(' term doExprCatOperator
michael@0 148 '$' term doExprCatOperator
michael@0 149 '.' term doExprCatOperator
michael@0 150 '/' look-ahead doExprCatOperator
michael@0 151 '{' n tag-open doExprCatOperator
michael@0 152 '|' n term doExprOrOperator
michael@0 153 ')' n pop doExprRParen
michael@0 154 default pop doExprFinished
michael@0 155
michael@0 156
michael@0 157 #
michael@0 158 # look-ahead Scanning a '/', which identifies a break point, assuming that the
michael@0 159 # remainder of the expression matches.
michael@0 160 #
michael@0 161 # Generate a parse tree as if this was a special kind of input symbol
michael@0 162 # appearing in an otherwise normal concatenation expression.
michael@0 163 #
michael@0 164 look-ahead:
michael@0 165 '/' n expr-cont-no-slash doSlash
michael@0 166 default errorDeath
michael@0 167
michael@0 168
michael@0 169 #
michael@0 170 # expr-cont-no-slash Expression, continuation. At a point where additional terms are
michael@0 171 # allowed, but not required. Just like
michael@0 172 # expr-cont, above, except that no '/'
michael@0 173 # look-ahead symbol is permitted.
michael@0 174 #
michael@0 175 expr-cont-no-slash:
michael@0 176 escaped term doExprCatOperator
michael@0 177 white_space n expr-cont
michael@0 178 rule_char term doExprCatOperator
michael@0 179 '[' term doExprCatOperator
michael@0 180 '(' term doExprCatOperator
michael@0 181 '$' term doExprCatOperator
michael@0 182 '.' term doExprCatOperator
michael@0 183 '|' n term doExprOrOperator
michael@0 184 ')' n pop doExprRParen
michael@0 185 default pop doExprFinished
michael@0 186
michael@0 187
michael@0 188 #
michael@0 189 # tags scanning a '{', the opening delimiter for a tag that identifies
michael@0 190 # the kind of match. Scan the whole {dddd} tag, where d=digit
michael@0 191 #
michael@0 192 tag-open:
michael@0 193 white_space n tag-open
michael@0 194 digit_char tag-value doStartTagValue
michael@0 195 default errorDeath doTagExpectedError
michael@0 196
michael@0 197 tag-value:
michael@0 198 white_space n tag-close
michael@0 199 '}' tag-close
michael@0 200 digit_char n tag-value doTagDigit
michael@0 201 default errorDeath doTagExpectedError
michael@0 202
michael@0 203 tag-close:
michael@0 204 white_space n tag-close
michael@0 205 '}' n expr-cont-no-tag doTagValue
michael@0 206 default errorDeath doTagExpectedError
michael@0 207
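The three tag states above can be traced by hand with a short Python sketch. It starts just after the '{' has been consumed by the previous state. The function name `scan_tag` is hypothetical, digits are assumed to be ASCII, and the accumulation of the tag value is an assumption about what doStartTagValue and doTagDigit do, not a copy of the ICU code.

```python
def scan_tag(text: str, pos: int = 0):
    """Scan the part of a {dddd} tag that follows '{'.

    Returns (tag_value, position_after_closing_brace).
    Raises ValueError where the table would go to errorDeath.
    """
    state, value = "tag-open", 0
    while True:
        ch = text[pos] if pos < len(text) else ""   # "" stands in for end of input
        is_digit = "0" <= ch <= "9"                 # assumption: ASCII digits only

        if state == "tag-open":
            if ch.isspace():
                pos += 1                            # white_space n tag-open
            elif is_digit:
                state, value = "tag-value", 0       # doStartTagValue; digit is not consumed
            else:
                raise ValueError("tag expected")    # doTagExpectedError
        elif state == "tag-value":
            if ch.isspace():
                pos += 1                            # white_space n tag-close
                state = "tag-close"
            elif ch == "}":
                state = "tag-close"                 # '}' is not consumed here
            elif is_digit:
                value = value * 10 + int(ch)        # doTagDigit (assumed accumulation)
                pos += 1
            else:
                raise ValueError("tag expected")
        else:  # tag-close
            if ch.isspace():
                pos += 1                            # white_space n tag-close
            elif ch == "}":
                return value, pos + 1               # doTagValue, then expr-cont-no-tag
            else:
                raise ValueError("tag expected")
```

Note that white space after the first digit moves the machine to tag-close, so a tag such as `{12 3}` is rejected while `{ 42 }` is accepted.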
michael@0 208
michael@0 209
michael@0 210 #
michael@0 211 # expr-cont-no-tag Expression, continuation. At a point where additional terms are
michael@0 212 # allowed, but not required. Just like
michael@0 213 # expr-cont, above, except that no "{ddd}"
michael@0 214 # tagging is permitted.
michael@0 215 #
michael@0 216 expr-cont-no-tag:
michael@0 217 escaped term doExprCatOperator
michael@0 218 white_space n expr-cont-no-tag
michael@0 219 rule_char term doExprCatOperator
michael@0 220 '[' term doExprCatOperator
michael@0 221 '(' term doExprCatOperator
michael@0 222 '$' term doExprCatOperator
michael@0 223 '.' term doExprCatOperator
michael@0 224 '/' look-ahead doExprCatOperator
michael@0 225 '|' n term doExprOrOperator
michael@0 226 ')' n pop doExprRParen
michael@0 227 default pop doExprFinished
michael@0 228
michael@0 229
michael@0 230
michael@0 231
michael@0 232 #
michael@0 233 # Variable Name Scanning.
michael@0 234 #
michael@0 235 # The state that branched to here must have pushed a return state
michael@0 236 # to go to after completion of the variable name scanning.
michael@0 237 #
michael@0 238 # The current input character must be the $ that introduces the name.
michael@0 239 # The $ is consumed here rather than in the state that first detected it
michael@0 240 # so that the doStartVariableName action only needs to happen in one
michael@0 241 # place (here), and the other states don't need to worry about it.
michael@0 242 #
michael@0 243 scan-var-name:
michael@0 244 '$' n scan-var-start doStartVariableName
michael@0 245 default errorDeath
michael@0 246
michael@0 247
michael@0 248 scan-var-start:
michael@0 249 name_start_char n scan-var-body
michael@0 250 default errorDeath doVariableNameExpectedErr
michael@0 251
michael@0 252 scan-var-body:
michael@0 253 name_char n scan-var-body
michael@0 254 default pop doEndVariableName
michael@0 255
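The push and pop convention used by the variable name states can be sketched in Python as follows. The caller's pushed return state is modelled as a one-element stack. `scan_variable` is a hypothetical name, and name_start_char and name_char are approximated with Python's `isalpha` and `isalnum` plus underscore, which may not match the character classes ICU actually uses.

```python
def scan_variable(text: str, return_state: str):
    """Trace scan-var-name, scan-var-start and scan-var-body over text.

    Returns (name_without_dollar, position_after_name, state_popped_into).
    """
    stack = [return_state]      # stands for the caller's ^push-state
    state, pos, name = "scan-var-name", 0, ""

    while state.startswith("scan-var"):
        ch = text[pos] if pos < len(text) else ""   # "" stands in for end of input

        if state == "scan-var-name":
            if ch != "$":
                raise ValueError("errorDeath")
            pos += 1                                # '$' n scan-var-start
            state = "scan-var-start"
        elif state == "scan-var-start":
            if not (ch.isalpha() or ch == "_"):     # approximation of name_start_char
                raise ValueError("variable name expected")
            name += ch
            pos += 1                                # name_start_char n scan-var-body
            state = "scan-var-body"
        else:  # scan-var-body
            if ch.isalnum() or ch == "_":           # approximation of name_char
                name += ch
                pos += 1                            # name_char n scan-var-body
            else:
                state = stack.pop()                 # default pop; ch is not consumed
    return name, pos, state
```

Calling `scan_variable("$abc = x;", "assign-or-rule")` leaves the machine in `assign-or-rule` with the input position on the space after the name, which is where that state then looks for an '='.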
michael@0 256
michael@0 257
michael@0 258 #
michael@0 259 # scan-unicode-set Unicode Sets are parsed by the UnicodeSet class.
michael@0 260 # Within the RBBI parser, after finding the first character
michael@0 261 # of a Unicode Set, we just hand the rule input at that
michael@0 262 # point off to the Unicode Set constructor, then pick
michael@0 263 # up parsing after the close of the set.
michael@0 264 #
michael@0 265 # The action for this state invokes the UnicodeSet parser.
michael@0 266 #
michael@0 267 scan-unicode-set:
michael@0 268 '[' n pop doScanUnicodeSet
michael@0 269 'p' n pop doScanUnicodeSet
michael@0 270 'P' n pop doScanUnicodeSet
michael@0 271 default errorDeath
michael@0 272
michael@0 273
michael@0 274
michael@0 275
michael@0 276
michael@0 277
michael@0 278
michael@0 279 #
michael@0 280 # assign-or-rule. A $variable was encountered at the start of something, could be
michael@0 281 # either an assignment statement or a rule, depending on whether an '='
michael@0 282 # follows the variable name. We get to this state when the variable name
michael@0 283 # scanning does a return.
michael@0 284 #
michael@0 285 assign-or-rule:
michael@0 286 white_space n assign-or-rule
michael@0 287 '=' n term ^assign-end doStartAssign # variable was target of assignment
michael@0 288 default term-var-ref ^break-rule-end # variable was a term in a rule
michael@0 289
michael@0 290
michael@0 291
michael@0 292 #
michael@0 293 # assign-end This state is entered when the end of the expression on the
michael@0 294 # right hand side of an assignment is found. We get here via
michael@0 295 # a pop; this state is pushed when the '=' in an assignment is found.
michael@0 296 #
michael@0 297 # The only thing allowed at this point is a ';'. The RHS of an
michael@0 298 # assignment must look like a rule expression, and we come here
michael@0 299 # when what is being scanned no longer looks like an expression.
michael@0 300 #
michael@0 301 assign-end:
michael@0 302 ';' n start doEndAssign
michael@0 303 default errorDeath doRuleErrorAssignExpr
michael@0 304
michael@0 305
michael@0 306
michael@0 307 #
michael@0 308 # errorDeath. This state is specified as the next state whenever a syntax error
michael@0 309 # in the source rules is detected. Barring bugs, the state machine will never
michael@0 310 # actually get here, but will stop because of the action associated with the error.
michael@0 311 # But, just in case, this state asks the state machine to exit.
michael@0 312 errorDeath:
michael@0 313 default n errorDeath doExit
michael@0 314
michael@0 315
