parser/htmlparser/tests/mochitest/html5lib_tree_construction/README.md


- author: Michael Schloh von Bennewitz <michael@schloh.com>
- date: Wed, 31 Dec 2014 06:09:35 +0100
- changeset 0: 6474c204b198
- permissions: -rw-r--r--

Cloned upstream origin tor-browser at tor-browser-31.3.0esr-4.5-1-build1
revision ID fc1c9ff7c1b2defdbc039f12214767608f46423f for hacking purpose.

Tree Construction Tests
=======================

Each file containing tree construction tests consists of any number of
tests separated by two newlines (LF) and a single newline before the end
of the file. For instance:

    [TEST]LF
    LF
    [TEST]LF
    LF
    [TEST]LF
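
A minimal Python sketch of splitting a test file into individual tests, assuming the whole file has already been read into a string. The name `split_tests` is illustrative and not part of any existing harness. Because a `#data` section may itself contain empty lines, the sketch treats an empty line as a separator only when the next line is `#data`; input data that happened to contain an empty line followed by a literal `#data` line would defeat this heuristic.

```python
import re


def split_tests(text: str) -> list[str]:
    """Split a tree construction test file into individual tests.

    Assumes the layout above: tests separated by one empty line and a
    single LF at the end of the file. Since #data sections can contain
    empty lines themselves, only split where the next line is "#data".
    """
    # Drop the single LF that ends the file.
    if text.endswith("\n"):
        text = text[:-1]
    # A boundary is an empty line immediately followed by a "#data" line.
    return re.split(r"\n\n(?=#data\n)", text)
```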

Each [TEST] has the following format:

A test must begin with the string "\#data" followed by a newline (LF).
All subsequent lines until a line that says "\#errors" are the test data
and must be passed to the system being tested unchanged, except with the
final newline (on the last line) removed.
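
Reading the sections of one test and recovering the exact input string could look like the following sketch. It assumes the test text has no trailing LF, and it only knows the four headers described in this README; `HEADERS`, `parse_test`, and `input_data` are hypothetical names. Rejoining the data lines with LF naturally leaves off the newline that ended the last data line, which is the one the format says to remove.

```python
from typing import Optional

# Only the headers described in this README.
HEADERS = ("#data", "#errors", "#document-fragment", "#document")


def parse_test(test: str) -> dict[str, list[str]]:
    """Split one test (no trailing LF) into its sections, keyed by header."""
    sections: dict[str, list[str]] = {}
    current: Optional[str] = None
    for line in test.split("\n"):
        # Everything after "#data" up to "#errors" is raw input, even a
        # line that happens to look like another header.
        if line in HEADERS and (current != "#data" or line == "#errors"):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def input_data(test: str) -> str:
    """Return the string to feed to the parser under test."""
    # Joining with LF drops the newline that ended the last data line.
    return "\n".join(parse_test(test)["#data"])
```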

Then there must be a line that says "\#errors". It must be followed by
one line per parse error that a conformant checker would return. It
doesn't matter what those lines are, although they can't be
"\#document-fragment", "\#document", or empty; the only thing that
matters is that there be the right number of parse errors.
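
Since only the number of `#errors` lines is significant, checking a parser's errors reduces to a count comparison. In this sketch `reported` is a hypothetical count supplied by the checker under test, and the helper names are illustrative; the error messages themselves are never compared.

```python
# Lines that the format forbids inside the #errors section.
FORBIDDEN_ERROR_LINES = {"#document-fragment", "#document", ""}


def expected_error_count(error_lines: list[str]) -> int:
    """Number of parse errors a test expects; the line text is ignored."""
    for line in error_lines:
        if line in FORBIDDEN_ERROR_LINES:
            raise ValueError(f"not allowed in #errors: {line!r}")
    return len(error_lines)


def error_count_matches(error_lines: list[str], reported: int) -> bool:
    """True if the checker reported as many errors as the test lists."""
    return expected_error_count(error_lines) == reported
```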

Then there \*may\* be a line that says "\#document-fragment", which must
be followed by a newline (LF), followed by a string of characters that
indicates the context element, followed by a newline (LF). If this line
is present, the "\#data" must be parsed using the HTML fragment parsing
algorithm with the context element as context.
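
A sketch of choosing between the two parsing modes, assuming the test's sections are already available as a dict from header line to the lines under it (a hypothetical layout). The caller would then run either the full-document parser or the fragment parsing algorithm with the returned context; neither parser is shown, and `fragment_context` is an illustrative name.

```python
from typing import Optional


def fragment_context(sections: dict[str, list[str]]) -> Optional[str]:
    """Context element for a fragment test, or None for a whole document.

    `sections` maps each header line (e.g. "#data") to the lines under it.
    """
    lines = sections.get("#document-fragment")
    if lines is None:
        return None  # no fragment line: parse #data as a full document
    if len(lines) != 1 or not lines[0]:
        raise ValueError("#document-fragment must be followed by one context line")
    return lines[0]
```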

Then there must be a line that says "\#document", which must be followed
by a dump of the tree of the parsed DOM. Each node must be represented
by a single line. Each line must start with "| ", followed by two spaces
for each ancestor of the node other than the root document node.
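
The indentation rule amounts to a depth count. The following sketch models nodes as hypothetical `(label, children)` pairs whose labels are already formatted, and shows how depth maps to the prefix: children of the document get no extra spaces, their children get two, and so on.

```python
def dump(children, depth=0):
    """Yield tree-dump lines for a list of (label, children) pairs.

    The document's own children sit at depth 0; every further ancestor
    between a node and the document adds two spaces after "| ".
    """
    for label, kids in children:
        yield "| " + "  " * depth + label
        yield from dump(kids, depth + 1)
```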

- Element nodes must be represented by a "`<`", then the *tag name
  string*, then "`>`", and all the attributes must be given, sorted
  lexicographically by UTF-16 code unit according to their *attribute
  name string*, on subsequent lines, as if they were children of the
  element node.
- Attribute nodes must have the *attribute name string*, then an "="
  sign, then the attribute value in double quotes (").
- Text nodes must be the string, in double quotes. Newlines aren't
  escaped.
- Comments must be "`<`" then "`!-- `" then the data then "` -->`".
- DOCTYPEs must be "`<!DOCTYPE `", then the name, then, if either the
  system id or the public id is non-empty, a space, the public id in
  double quotes, another space, and the system id in double quotes, and
  then in any case "`>`".
- Processing instructions must be "`<?`", then the target, then a
  space, then the data and then "`>`". (The HTML parser cannot emit
  processing instructions, but scripts can, and the WebVTT to DOM
  rules can emit them.)
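
The rules in the list above can be sketched as a small Python serializer. The node classes (`Element`, `Text`, `Comment`, `Doctype`, `ProcessingInstruction`) and `dump_node` are hypothetical stand-ins for whatever DOM the implementation under test produces; `Element.name` and the attribute keys are assumed to already be the tag name string and attribute name string. Python's built-in string ordering compares code points, which differs from UTF-16 code unit order for characters outside the Basic Multilingual Plane, so the sketch sorts on the big-endian UTF-16 encoding instead.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Element:
    name: str  # assumed to be the *tag name string*, e.g. "p" or "svg circle"
    attrs: dict[str, str] = field(default_factory=dict)  # name string -> value
    children: list = field(default_factory=list)


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Doctype:
    name: str
    public_id: str = ""
    system_id: str = ""


@dataclass
class ProcessingInstruction:
    target: str
    data: str


Node = Union[Element, Text, Comment, Doctype, ProcessingInstruction]


def utf16_key(name: str) -> bytes:
    # Big-endian UTF-16 bytes sort in the same order as UTF-16 code units.
    return name.encode("utf-16-be", "surrogatepass")


def dump_node(node: Node, depth: int, out: list) -> None:
    """Append the dump line(s) for `node` and its subtree to `out`."""
    pad = "| " + "  " * depth
    if isinstance(node, Element):
        out.append(f"{pad}<{node.name}>")
        # Attributes go one level deeper, as if they were children.
        for name in sorted(node.attrs, key=utf16_key):
            out.append(f'{pad}  {name}="{node.attrs[name]}"')
        for child in node.children:
            dump_node(child, depth + 1, out)
    elif isinstance(node, Text):
        out.append(f'{pad}"{node.data}"')  # newlines are not escaped
    elif isinstance(node, Comment):
        out.append(f"{pad}<!-- {node.data} -->")
    elif isinstance(node, Doctype):
        ids = ""
        if node.public_id or node.system_id:
            ids = f' "{node.public_id}" "{node.system_id}"'
        out.append(f"{pad}<!DOCTYPE {node.name}{ids}>")
    elif isinstance(node, ProcessingInstruction):
        out.append(f"{pad}<?{node.target} {node.data}>")
    else:
        raise TypeError(f"unknown node type: {type(node).__name__}")
```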

The *tag name string* is the local name prefixed by a namespace
designator. For the HTML namespace, the namespace designator is the
empty string, i.e. there's no prefix. For the SVG namespace, the
namespace designator is "svg ". For the MathML namespace, the namespace
designator is "math ".
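
The element designators can be expressed as a lookup table. The namespace URIs below are the standard ones defined in the WHATWG Infra Standard; the table and function names are illustrative. Namespaces not listed in this README raise `KeyError`, since the format defines no designator for them.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Namespace designators for element names; HTML gets no prefix.
TAG_DESIGNATORS = {HTML_NS: "", SVG_NS: "svg ", MATHML_NS: "math "}


def tag_name_string(namespace: str, local_name: str) -> str:
    """Build the *tag name string* used in the tree dump."""
    return TAG_DESIGNATORS[namespace] + local_name
```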

The *attribute name string* is the local name prefixed by a namespace
designator. For no namespace, the namespace designator is the empty
string, i.e. there's no prefix. For the XLink namespace, the namespace
designator is "xlink ". For the XML namespace, the namespace designator
is "xml ". For the XMLNS namespace, the namespace designator is
"xmlns ". Note the difference between "xlink:href", which is an
attribute in no namespace with the local name "xlink:href", and
"xlink href", which is an attribute in the xlink namespace with the
local name "href".
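
The same table-driven approach works for attributes. In this sketch `None` stands for "no namespace" (an assumption about how the DOM under test reports it), and the function name is illustrative. The first assertion-style case below is the "xlink:href" versus "xlink href" distinction from the paragraph above.

```python
from typing import Optional

XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# None represents "no namespace", which gets no prefix at all.
ATTR_DESIGNATORS = {None: "", XLINK_NS: "xlink ", XML_NS: "xml ", XMLNS_NS: "xmlns "}


def attribute_name_string(namespace: Optional[str], local_name: str) -> str:
    """Build the *attribute name string* used in the tree dump."""
    return ATTR_DESIGNATORS[namespace] + local_name
```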

If there is also a "\#document-fragment", the bit following "\#document"
must be a representation of the HTML fragment serialization for the
context element given by "\#document-fragment".

For example:

    #data
    <p>One<p>Two
    #errors
    3: Missing document type declaration
    #document
    | <html>
    |   <head>
    |   <body>
    |     <p>
    |       "One"
    |     <p>
    |       "Two"
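
The example test can be checked end to end with a small self-contained harness. The tree passed in is built by hand to stand in for a real parser's output, and `reported_errors` stands in for the count a real checker would report; `sections_of`, `dump`, and `run_case` are illustrative names, not an existing API.

```python
HEADERS = ("#data", "#errors", "#document-fragment", "#document")

# The example test, verbatim, without the file's trailing LF.
EXAMPLE = '''#data
<p>One<p>Two
#errors
3: Missing document type declaration
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"
|     <p>
|       "Two"'''


def sections_of(test: str) -> dict[str, list[str]]:
    """Map each header line to the lines under it.

    Only "#errors" can end the #data section, since #data is raw input.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in test.split("\n"):
        if line in HEADERS and (current != "#data" or line == "#errors"):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def dump(children, depth=0):
    """Yield "| "-prefixed lines for a list of (label, children) pairs."""
    for label, kids in children:
        yield "| " + "  " * depth + label
        yield from dump(kids, depth + 1)


def run_case(test: str, parsed_tree, reported_errors: int) -> dict:
    """Compare (hypothetical) parser results with a test's expectations."""
    s = sections_of(test)
    return {
        "input": "\n".join(s["#data"]),  # final LF of #data dropped
        "errors_ok": reported_errors == len(s["#errors"]),
        "tree_ok": list(dump(parsed_tree)) == s["#document"],
    }


# Hand-built stand-in for a parser's output on "<p>One<p>Two".
tree = [("<html>", [
    ("<head>", []),
    ("<body>", [
        ("<p>", [('"One"', [])]),
        ("<p>", [('"Two"', [])]),
    ]),
])]

result = run_case(EXAMPLE, tree, reported_errors=1)
```

The hand-built tree reflects two parser behaviors: `<html>`, `<head>`, and `<body>` are created implicitly, and the second `<p>` start tag closes the first paragraph, which is why the two `<p>` elements are siblings.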
