
HTML

This document describes the complete handling of HTML in magellan. It covers the parsing process: how HTML is lexically analyzed and then interpreted. After the parsing process is discussed, we give a detailed analysis of each HTML tag: the attributes that are supported, the values for those attributes, and how the tag is treated by magellan.

Parsing

HTML is tokenized by an HTML scanner. The scanner is fed Unicode data to parse. Stream converters are used to translate from various encodings to Unicode. The scanner separates the input stream into tokens which consist of:

The HTML parsing engine uses the HTML scanner for lexical analysis. The parsing engine operates by attacking the input stream in a set of well-defined steps:
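As a rough illustration of the scanner stage described above, the following Python sketch converts a byte stream to Unicode and then splits it into tokens. The token kinds (`start_tag`, `end_tag`, `comment`, `text`), the function name `scan`, and the default encoding are assumptions made for illustration only; this document does not list the token set that magellan's scanner actually produces.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token kinds; the real scanner's token set is not given here.
TOKEN_RE = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<end_tag></[A-Za-z][^>]*>)"
    r"|(?P<start_tag><[A-Za-z][^>]*>)"
    r"|(?P<text>[^<]+|<)",
    re.DOTALL,
)


def scan(data: bytes, encoding: str = "latin-1") -> Iterator[Tuple[str, str]]:
    """Decode bytes to Unicode, then yield (kind, lexeme) pairs."""
    # Stand-in for a stream converter: translate the source encoding to Unicode.
    text = data.decode(encoding, errors="replace")
    for match in TOKEN_RE.finditer(text):
        # lastgroup names the alternative that matched.
        yield match.lastgroup, match.group()
```

A stray `<` that does not begin a tag is emitted as a one-character text token, so malformed input never stalls the scan.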

Tag Processing

The parser processes each tag by locating a "tag handler" for it. The HTML parser itself serves as the tag handler for all of the builtin tags documented below. Tag attributes are handled while tags are translated into content: this mapping translates the tag attributes into content data and into style data. The translation to style data is documented below by indicating the mapping from tag attributes to their CSS1 (plus extensions) equivalents.
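The handler lookup and the split of attributes into content data and style data might be sketched as follows. The names `HANDLERS`, `handle_div`, and `process_tag` are hypothetical, and the mapping of `align` to the CSS1 `text-align` property is an assumed example rather than magellan's documented mapping.

```python
from typing import Callable, Dict, Optional, Tuple

Attrs = Dict[str, str]
# A handler returns (content data, style data) for one tag.
Handler = Callable[[Attrs], Tuple[Attrs, Attrs]]


def handle_div(attrs: Attrs) -> Tuple[Attrs, Attrs]:
    """Split DIV attributes into content data and style data."""
    content: Attrs = {}
    style: Attrs = {}
    for name, value in attrs.items():
        if name == "align":
            # Assumed CSS1 equivalent of the align attribute.
            style["text-align"] = value.lower()
        else:
            content[name] = value
    return content, style


# Hypothetical registry of builtin tag handlers, keyed by upper-case tag name.
HANDLERS: Dict[str, Handler] = {"DIV": handle_div}


def process_tag(tag: str, attrs: Attrs) -> Optional[Tuple[Attrs, Attrs]]:
    """Locate the handler for a tag and run it; None if no handler exists."""
    handler = HANDLERS.get(tag.upper())
    if handler is None:
        return None
    return handler(attrs)
```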

Special Hacks

The following list describes hacks added to the magellan parsing engine to deal with navigator compatibility. These are just the parser hacks, not the layout or presentation hacks. Most hacks are introduced for HTML syntax error recovery. HTML doesn't specify much about how to handle those error conditions. Netscape has made a big effort to render pages with non-perfect HTML. For many reasons, new browsers need to stay compatible in this area.

TODO:

List of 6.0 features incompatible with 4.0

Tags (Categorically sorted)

All line breaks are conditional. If the x coordinate is at the current left margin then a soft line break does nothing. Hard line breaks are ignored if the last tag did a hard line break.
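The two rules above can be sketched as a small state machine. The class `LineState` and its fields are hypothetical names. The sketch also assumes that adding text or emitting a soft break clears the "last tag did a hard break" flag, which the text does not spell out.

```python
class LineState:
    """Tracks just enough layout state to decide whether a break is emitted."""

    def __init__(self, left_margin: int = 0) -> None:
        self.left_margin = left_margin
        self.x = left_margin
        self.last_was_hard_break = False
        self.breaks = 0  # number of line breaks actually emitted

    def _emit(self) -> None:
        self.breaks += 1
        self.x = self.left_margin
        self.last_was_hard_break = False

    def add_text(self, width: int) -> None:
        self.x += width
        self.last_was_hard_break = False

    def soft_break(self) -> bool:
        # A soft break does nothing when already at the left margin.
        if self.x == self.left_margin:
            return False
        self._emit()
        return True

    def hard_break(self) -> bool:
        # A hard break is ignored if the last tag already did one.
        if self.last_was_hard_break:
            return False
        self._emit()
        self.last_was_hard_break = True
        return True
```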

divalign = left | right | center | justify
alignparam = abscenter | left | right | texttop | absbottom | baseline | center | bottom | top | middle | absmiddle
colorspec = named-color | #xyz | #xxyyzz | #xxxyyyzzz | #xxxxyyyyzzzz
clip = [auto | value-or-pct-xy](1..4) (pct of width for even coordinates; pct of height for odd coordinates)
value-or-pct = an integer with an optional %; if the percent is present any following characters are ignored!
coord-list = XXX
whitespace-strip = remove leading and trailing and any embedded whitespace that is not an actual space (e.g. newlines)
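Three of these definitions are concrete enough to sketch in Python. The function names are hypothetical. Several behaviours are assumptions the definitions above do not state: accepting a leading sign, ignoring trailing characters when no `%` is present, and scaling every hex colour form to 8 bits per channel.

```python
import re
import string
from typing import Optional, Tuple


def parse_value_or_pct(text: str) -> Optional[Tuple[int, bool]]:
    """Return (value, is_percent), or None when no leading integer is found."""
    match = re.match(r"([+-]?\d+)(%?)", text.strip())
    if match is None:
        return None
    # Anything after the optional % is ignored, as the definition requires.
    return int(match.group(1)), match.group(2) == "%"


def whitespace_strip(text: str) -> str:
    """Trim both ends, then drop embedded whitespace other than plain spaces."""
    trimmed = text.strip()
    return "".join(ch for ch in trimmed if ch == " " or not ch.isspace())


def parse_hex_color(spec: str) -> Optional[Tuple[int, int, int]]:
    """Parse #xyz, #xxyyzz, #xxxyyyzzz or #xxxxyyyyzzzz into 8-bit (r, g, b)."""
    if not spec.startswith("#"):
        return None
    digits = spec[1:]
    if len(digits) not in (3, 6, 9, 12):
        return None
    if not all(ch in string.hexdigits for ch in digits):
        return None
    n = len(digits) // 3  # hex digits per channel
    max_value = 16 ** n - 1
    r, g, b = (
        int(digits[i * n:(i + 1) * n], 16) * 255 // max_value for i in range(3)
    )
    return r, g, b
```

Named colours are left out of the sketch because the table of names is not given here.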

Head objects:

TITLE
BASE
META
LINK
HEAD
HTML
STYLE
FRAMESET
FRAME
NOFRAMES

Body objects:

BODY
LAYER, ILAYER
NOLAYER
P
ADDRESS
PLAINTEXT, XMP
LISTING
PRE
NOBR
CENTER
DIV
H1-H6

A note regarding closing paragraphs: Any time a close paragraph is done (for any tag), if the top of the alignment stack has a tag named "P" then a conditional soft line break is done and the alignment is popped.
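The close-paragraph rule in the note above might look like this in outline. The representation of the alignment stack as a list of `(tag, alignment)` pairs and the `soft_break` callback are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def close_paragraph(
    alignment_stack: List[Tuple[str, str]],
    soft_break: Callable[[], None],
) -> bool:
    """Apply the close-paragraph rule; return True if an alignment was popped."""
    # Only act when the entry on top of the stack was pushed by a P tag.
    if alignment_stack and alignment_stack[-1][0] == "P":
        soft_break()  # conditional soft line break
        alignment_stack.pop()
        return True
    return False
```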


TABLE
TR
TH, TD
CAPTION
MULTICOL


BLOCKQUOTE
UL, OL, MENU, DIR
DL
LI
DD
DT


A
STRIKE, S, TT, CODE, SAMPLE, KBD, B, STRONG, I, EM, VAR, CITE, BLINK, BIG, SMALL, U, INLINEINPUT, SPELL
SUP, SUB
SPAN
FONT

A note regarding the style stack: The pop of the stack checks whether the top of the stack is an ANCHOR tag. If it is not an anchor, the top item is unconditionally popped. If the top of the style stack is an anchor tag, the code searches downward for either the bottom of the stack or the first style stack entry not created by an anchor tag. If that entry is followed by another entry, it is removed from the stack (an out-of-order pop, in other words). In this case the anchor style stack entry is left untouched.
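A minimal sketch of this pop rule follows, modelling the style stack as a list of the tag names that created each entry, with the last item as the top. The function name `pop_style` is hypothetical. The sketch assumes the stack is left unchanged when it holds only anchor entries, a case the note does not fully specify.

```python
from typing import List, Optional

ANCHOR = "A"


def pop_style(stack: List[str]) -> Optional[str]:
    """Pop one style entry; return the removed tag name, or None if none."""
    if not stack:
        return None
    if stack[-1] != ANCHOR:
        # Top is not an anchor: ordinary unconditional pop.
        return stack.pop()
    # Top is an anchor: search downward for the first non-anchor entry.
    # Any entry found here sits below the top, so it is always followed
    # by another entry and qualifies for the out-of-order removal.
    for index in range(len(stack) - 2, -1, -1):
        if stack[index] != ANCHOR:
            return stack.pop(index)
    # Only anchor entries remain: leave the stack untouched (assumption).
    return None
```

For example, popping from `["B", "I", "A"]` removes `"I"` and leaves `["B", "A"]`, so the anchor's style survives the close of the inner tag.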


text, entities
IMG, IMAGE
HR
BR
WBR
EMBED
NOEMBED
APPLET
PARAM
OBJECT
MAP
AREA
SERVER
SPACER


SCRIPT
NOSCRIPT


FORM
ISINDEX
INPUT
SELECT
OPTION
TEXTAREA
KEYGEN


BASEFONT

Unsupported

NSCP_CLOSE, NSCP_OPEN, NSCP_REBLOCK, MQUOTE, CELL, SUBDOC, CERTIFICATE, INLINEINPUTTHICK, INLINEINPUTDOTTED, COLORMAP, HYPE, SPELL, NSDT