class DOMTreeBuilder implements EventHandler

Create an HTML5 DOM tree from events.

This class attempts to build a DOM from the events emitted by a parser. It also attempts (but does not guarantee) to up-convert older HTML documents to HTML5 by applying HTML5's rules; it does not change the architecture of the document itself.

Many of the error-correction and quirks features suggested in the specification are implemented here, but not all of them. Because no graphical user agent is assumed, no presentation-specific logic is performed during tree building.

FIXME: The present tree builder does not exactly follow the state machine rules for insertion modes as outlined in the HTML5 spec. The processor needs to be rewritten to accommodate this. See, for example, the Go language HTML5 parser.
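To make the event-driven construction described above concrete, here is a minimal sketch built only on PHP's bundled DOM extension. The class name `SketchTreeBuilder` is hypothetical and this is not the library's implementation: it ignores insertion modes, namespaces, and error correction, and its `startTag()` returns nothing, whereas the documented method returns an int.

```php
<?php
// Hypothetical stand-in, NOT the library's DOMTreeBuilder: it only shows how
// start-tag, end-tag, and text events can drive PHP's built-in DOM extension.
final class SketchTreeBuilder
{
    private DOMDocument $doc;
    private DOMNode $current; // the current insertion point

    public function __construct()
    {
        $this->doc = new DOMDocument('1.0', 'UTF-8');
        $this->current = $this->doc;
    }

    public function startTag(string $name, array $attributes = [], bool $selfClosing = false): void
    {
        $el = $this->doc->createElement($name);
        foreach ($attributes as $attr => $value) {
            $el->setAttribute($attr, $value);
        }
        $this->current->appendChild($el);
        // A self-closing tag never becomes the insertion point.
        if (!$selfClosing) {
            $this->current = $el;
        }
    }

    public function endTag(string $name): void
    {
        // Walk up to the nearest open element with this name and close it.
        for ($node = $this->current; $node instanceof DOMElement; $node = $node->parentNode) {
            if ($node->tagName === $name) {
                $this->current = $node->parentNode;
                return;
            }
        }
        // No matching open element: the stray end tag is ignored.
    }

    public function text(string $data): void
    {
        $this->current->appendChild($this->doc->createTextNode($data));
    }

    public function document(): DOMDocument
    {
        return $this->doc;
    }
}

// Replay the events a parser might emit for a tiny document.
$builder = new SketchTreeBuilder();
$builder->startTag('html');
$builder->startTag('body');
$builder->startTag('p', ['class' => 'intro']);
$builder->text('Hello');
$builder->endTag('p');
$builder->startTag('br', [], true);
$builder->endTag('body');
$builder->endTag('html');

$html = $builder->document()->saveHTML();
```

The sketch keeps a single "current node" pointer; a spec-conformant builder would instead track a stack of open elements and an insertion mode, which is what the FIXME above refers to.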

Constants

NAMESPACE_HTML

Defined in http://www.w3.org/TR/html51/infrastructure.html#html-namespace-0

NAMESPACE_MATHML

NAMESPACE_SVG

NAMESPACE_XLINK

NAMESPACE_XML

NAMESPACE_XMLNS

OPT_DISABLE_HTML_NS

OPT_TARGET_DOC

OPT_IMPLICIT_NS

IM_INITIAL

Defined in section 8.2.5 of the HTML5 specification.

IM_BEFORE_HTML

IM_BEFORE_HEAD

IM_IN_HEAD

IM_IN_HEAD_NOSCRIPT

IM_AFTER_HEAD

IM_IN_BODY

IM_TEXT

IM_IN_TABLE

IM_IN_TABLE_TEXT

IM_IN_CAPTION

IM_IN_COLUMN_GROUP

IM_IN_TABLE_BODY

IM_IN_ROW

IM_IN_CELL

IM_IN_SELECT

IM_IN_SELECT_IN_TABLE

IM_AFTER_BODY

IM_IN_FRAMESET

IM_AFTER_FRAMESET

IM_AFTER_AFTER_BODY

IM_AFTER_AFTER_FRAMESET

IM_IN_SVG

IM_IN_MATHML

Methods

__construct($isFragment = false, array $options = array())

No description

document()

Get the document.

DOMDocumentFragment
fragment()

Get the DOM fragment for the body.

setInstructionProcessor( InstructionProcessor $proc)

Provide an instruction processor.

doctype( string $name, int $idType, string $id = null, boolean $quirks = false)

A doctype declaration.

int
startTag( string $name, array $attributes = array(), boolean $selfClosing = false)

Process the start tag.

endTag($name)

An end-tag.

comment($cdata)

A comment section (unparsed character data).

text($data)

A unit of parsed character data.

eof()

Indicates that the document has been entirely processed.

parseError($msg, $line, $col)

Emitted when the parser encounters an error condition.

getErrors()

No description

cdata( string $data)

A CDATA section.

processingInstruction( string $name, string $data = null)

This is a holdover from the XML spec.

Details

at line 162
__construct($isFragment = false, array $options = array())

Parameters

$isFragment
array $options

at line 206
document()

Get the document.

at line 221
DOMDocumentFragment fragment()

Get the DOM fragment for the body.

A fragment is returned rather than a single node because a fragment may have zero or more DOMNodes at its root.

Return Value

DOMDocumentFragment

See also

http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#concept-frag-parse-context
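The point about a fragment having zero or more root nodes can be seen with PHP's built-in DOM classes alone. The following is a small sketch that does not call the tree builder; the element names and text are made up for illustration.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');

// A fragment has no single root element: it can hold several siblings.
$fragment = $doc->createDocumentFragment();
$fragment->appendChild($doc->createElement('p', 'first'));
$fragment->appendChild($doc->createTextNode(' and '));
$fragment->appendChild($doc->createElement('p', 'second'));

// Record the number of top-level nodes before the fragment is inserted.
$rootCount = $fragment->childNodes->length;

// Appending a fragment moves its children into the target node.
$host = $doc->appendChild($doc->createElement('div'));
$host->appendChild($fragment);
$hostCount = $host->childNodes->length;
```

Because inserting a fragment transfers its children to the target, a caller that needs the nodes more than once would have to clone the fragment first.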

at line 232
setInstructionProcessor( InstructionProcessor $proc)

Provide an instruction processor.

This is used for handling processing instructions (PIs) as they are inserted. If omitted, PIs are inserted directly into the DOM tree.

Parameters

InstructionProcessor $proc

at line 237
doctype( string $name, int $idType, string $id = null, boolean $quirks = false)

A doctype declaration.

Parameters

string $name The name of the root element.
int $idType One of DOCTYPE_NONE, DOCTYPE_PUBLIC, or DOCTYPE_SYSTEM.
string $id The identifier. For DOCTYPE_PUBLIC, this is the public ID. For DOCTYPE_SYSTEM, this is the system ID.
boolean $quirks Indicates whether the builder should enter quirks mode.
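As an illustration of how `$idType` selects the meaning of `$id`, the sketch below maps the two onto PHP's built-in `DOMImplementation::createDocumentType()`. The function `sketchDoctype()` and the `SKETCH_DOCTYPE_*` constants are hypothetical; their numeric values are placeholders and may not match the library's constants, and quirks handling is left out.

```php
<?php
// Placeholder values for illustration; the library's actual constants may differ.
const SKETCH_DOCTYPE_NONE = 0;
const SKETCH_DOCTYPE_PUBLIC = 1;
const SKETCH_DOCTYPE_SYSTEM = 2;

// Hypothetical helper: route $id to the public or system slot based on $idType.
function sketchDoctype(string $name, int $idType, ?string $id = null): DOMDocumentType
{
    $publicId = $idType === SKETCH_DOCTYPE_PUBLIC ? (string) $id : '';
    $systemId = $idType === SKETCH_DOCTYPE_SYSTEM ? (string) $id : '';

    return (new DOMImplementation())->createDocumentType($name, $publicId, $systemId);
}

// <!DOCTYPE html>
$plain = sketchDoctype('html', SKETCH_DOCTYPE_NONE);

// <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
$legacy = sketchDoctype('html', SKETCH_DOCTYPE_PUBLIC, '-//W3C//DTD HTML 4.01//EN');

// <!DOCTYPE html SYSTEM "about:legacy-compat">
$compat = sketchDoctype('html', SKETCH_DOCTYPE_SYSTEM, 'about:legacy-compat');
```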

at line 259
int startTag( string $name, array $attributes = array(), boolean $selfClosing = false)

Process the start tag.

Parameters

string $name The tag name.
array $attributes An array with all of the tag's attributes.
boolean $selfClosing An indicator of whether or not this tag is self-closing (written with a trailing slash, as in <br/>).

Return Value

int One of the Tokenizer::TEXTMODE_* constants.

at line 460
endTag($name)

An end-tag.

Parameters

$name

at line 540
comment($cdata)

A comment section (unparsed character data).

Parameters

$cdata

at line 547
text($data)

A unit of parsed character data.

Entities in this text are already decoded.

Parameters

$data
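Since entities arrive already decoded, the text can be stored in a DOM text node as-is, and escaping happens again only on serialization. The sketch below shows this with PHP's built-in functions; the sample string is made up, and `html_entity_decode()` merely stands in for whatever decoding the tokenizer has already performed.

```php
<?php
// What the source markup contained, and what a text event would carry.
$raw = 'Fish &amp; chips';
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

$doc = new DOMDocument('1.0', 'UTF-8');
$p = $doc->appendChild($doc->createElement('p'));

// The decoded string is stored literally in the text node.
$p->appendChild($doc->createTextNode($decoded));
$stored = $p->textContent;

// Serializing the node escapes the ampersand again.
$serialized = $doc->saveHTML($p);
```

Passing still-encoded text to `createTextNode()` would double-escape it on output, which is why it matters that decoding happens before the event is emitted.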

at line 568
eof()

Indicates that the document has been entirely processed.

at line 573
parseError($msg, $line, $col)

Emitted when the parser encounters an error condition.

Parameters

$msg
$line
$col

at line 578
getErrors()

at line 583
cdata( string $data)

A CDATA section.

Parameters

string $data The unparsed character data.

at line 589
processingInstruction( string $name, string $data = null)

This is a holdover from the XML spec.

User agents do not receive PIs, but server-side code does.

Parameters

string $name The name of the processor (e.g. 'php').
string $data The unparsed data.
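When no instruction processor is set, a PI ends up as an ordinary node in the tree. A minimal sketch of that outcome, using only PHP's built-in DOM extension rather than the tree builder itself (the target and data values are illustrative):

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$body = $doc->appendChild($doc->createElement('body'));

// $name maps to the PI target, $data to its unparsed content.
$pi = $doc->createProcessingInstruction('php', 'echo "hi";');
$body->appendChild($pi);

$target = $pi->target;
$data = $pi->data;
$isPiNode = $body->firstChild->nodeType === XML_PI_NODE;
```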