HTML5 Parser

Introduction

With the 2.0.0 release, Dompdf incorporated the Masterminds/HTML5-PHP HTML5 parser library. The HTML5 parser is always enabled when ingesting an HTML document.

Previous releases of Dompdf bundled an older HTML5 parser, html5lib. In those releases the HTML5 parser was disabled by default and could be activated by setting \Dompdf\Options::$isHtml5ParserEnabled to true.
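For releases before 2.0.0, turning the parser on might look like the sketch below. It assumes Dompdf was installed with Composer (the vendor/autoload.php path is an assumption about your project layout) and uses the generic Options::set() method with the option name mentioned above; the sample HTML string is purely illustrative.

```php
<?php
// Assumption: Composer autoloader at this path; adjust to your project.
require 'vendor/autoload.php';

use Dompdf\Dompdf;
use Dompdf\Options;

// Only relevant for releases before 2.0.0, where the parser was opt-in.
$options = new Options();
$options->set('isHtml5ParserEnabled', true);

$dompdf = new Dompdf($options);
// Illustrative input: the paragraph is deliberately left unclosed.
$dompdf->loadHtml('<p>Hello, world');
$dompdf->render();
```

From 2.0.0 onward this setting is unnecessary, since the HTML5 parser always runs when an HTML document is loaded.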

What is an HTML5 parser?

An HTML parser is a library or program that reads HTML source code and translates it into a DOM tree.

The difference between a regular HTML parser and an HTML5 parser is that the latter knows how to deal with badly structured HTML, because the W3C specifications strictly define how each such case must be handled.

What does it mean for dompdf?

With an HTML5 parser, dompdf can handle poorly written HTML documents more reliably.

For example, a table element may contain rows without closing </tr> tags. A regular HTML parser (such as libxml, the one embedded in the PHP DOM extension) may not handle this well: it might, for example, ignore the row or append the following cells to the current row. An HTML5 parser handles the row as if the </tr> tag were present.
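To see how the regular parser treats such a table on your own system, a small script along these lines can help. It is a minimal sketch that uses only the PHP DOM extension; the function name rowsAsParsed and the sample markup are hypothetical, and the result depends on the installed libxml version, so no particular output is claimed here.

```php
<?php
// Hypothetical helper: parse HTML with libxml and list the cells found in each <tr>.
function rowsAsParsed(string $html): array
{
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = [];
        // Note: this also picks up cells of any rows nested inside this one.
        foreach ($row->getElementsByTagName('td') as $cell) {
            $cells[] = $cell->textContent;
        }
        $rows[] = $cells;
    }
    return $rows;
}

// The first row is never closed with </tr>.
$html = '<table><tr><td>A1</td><td>A2</td><tr><td>B1</td><td>B2</td></tr></table>';

foreach (rowsAsParsed($html) as $i => $cells) {
    echo "row $i: " . implode(', ', $cells) . PHP_EOL;
}
```

If the printed rows do not match the two rows of two cells the markup intends, that mismatch is the kind of problem the HTML5 parser is meant to avoid.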

Skipping the HTML5 parser

Though not recommended, it is possible to skip HTML5 parsing by feeding Dompdf a DOMDocument instance instead of an HTML document. To do so, call the loadDom method with a DOMDocument instance you have already created and populated.

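use Dompdf\Dompdf;

// Build the DOM yourself with the PHP DOM extension (libxml).
$doc = new DOMDocument("1.0", "UTF-8");
$doc->preserveWhiteSpace = true;
$doc->loadHTMLFile(...);
$doc->encoding = "UTF-8";

// Hand the ready-made DOM to Dompdf instead of an HTML document.
$dompdf = new Dompdf();
$dompdf->loadDom($doc);
$dompdf->render();
$dompdf->stream();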
