
XML parser student project

Rohan Chakravarthy edited this page Jan 10, 2015 · 4 revisions

Integrate an XML parser

Background information: An important part of loading web pages is turning HTML source into a DOM. Servo already has parsers that do this, but it cannot yet turn XHTML (which is based on XML) into a DOM. We need a parser that can read XML and build a tree of DOM nodes from it the same way the current HTML parser does; using the HTML parser as a model for the new one is encouraged.

Initial step: Build Servo, create a new Parser trait in hubbub_html_parser.rs, create a new HTMLParser struct that contains a hubbub::Parser<'a> member, and implement the Parser trait for HTMLParser. Make sure it builds and pages still load correctly.
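A minimal sketch of the initial step, assuming `parse_chunk` takes the next chunk as a `&str` and returns nothing (the page does not give its signature). The `HTMLParser` and `XMLParser` bodies are placeholders: the real `HTMLParser` would hold the `hubbub::Parser<'a>` and its tree handler, and the real `XMLParser` would drive RustyXML. The `uses_xml_parser` and `parser_for_content_type` helpers are hypothetical and only illustrate how a trait object lets parse_html choose a parser from the Content-Type header.

```rust
/// Common interface for anything that turns source text into a DOM.
pub trait Parser {
    /// Feed the next chunk of the document as it arrives.
    fn parse_chunk(&mut self, input: &str);
}

/// Placeholder for the hubbub-backed parser. The real struct would hold a
/// `hubbub::Parser<'a>` plus its tree handler instead of this buffer.
pub struct HTMLParser {
    pub buffered: String,
}

impl Parser for HTMLParser {
    fn parse_chunk(&mut self, input: &str) {
        // Real implementation: forward the chunk to hubbub.
        self.buffered.push_str(input);
    }
}

/// Placeholder for a RustyXML-backed parser.
pub struct XMLParser {
    pub buffered: String,
}

impl Parser for XMLParser {
    fn parse_chunk(&mut self, input: &str) {
        // Real implementation: feed RustyXML and handle the events it yields.
        self.buffered.push_str(input);
    }
}

/// Hypothetical helper: true when the Content-Type selects the XML parser.
/// Parameters such as `; charset=utf-8` are ignored, and the comparison is
/// case-insensitive.
pub fn uses_xml_parser(content_type: &str) -> bool {
    let mime = content_type.split(';').next().unwrap_or("").trim();
    mime.eq_ignore_ascii_case("application/xhtml+xml")
}

/// Hypothetical dispatch: everything that is not XHTML keeps using HTML.
pub fn parser_for_content_type(content_type: &str) -> Box<dyn Parser> {
    if uses_xml_parser(content_type) {
        Box::new(XMLParser { buffered: String::new() })
    } else {
        Box::new(HTMLParser { buffered: String::new() })
    }
}
```

Because the choice is made once and returned as `Box<dyn Parser>`, the rest of the loading code can keep feeding chunks without knowing which parser it holds.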

  • Create a Parser trait that has a parse_chunk method
  • Create an HTMLParser struct that contains the hubbub-related data in parse_html (such as the TreeHandler), and have it impl the Parser trait
  • Make RustyXML a dependency of the script crate. Learn more about Cargo, the dependency manager and build system that we use.
  • Create an XMLParser struct that contains the data necessary for using RustyXML and make it impl the Parser trait
  • Rewrite parse_html to create the right Parser based on the HTTP Content-Type header (use the XML parser for application/xhtml+xml; see an example)
  • Extend the XML parser to handle RustyXML's parsing events (https://github.com/Florob/RustyXML/blob/master/src/xml/lib.rs#L135) by performing the same actions the HTML parser does (see the callbacks in TreeHandler). Start by creating elements and setting attributes.
  • Support executing scripts by checking the elements being added and using the same "discovery" mechanism the HTML parser does (see js_script_listener)
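The event-handling steps above can be sketched as a small tree builder. Everything here is hypothetical: `XmlEvent` is only loosely modeled on the events RustyXML emits (it is not RustyXML's actual `Event` type), `Node` is a toy stand-in for Servo's DOM nodes, and recording finished `<script>` elements in `pending_scripts` is an assumed simplification of the discovery that js_script_listener performs for HTML.

```rust
use std::collections::HashMap;

/// Hypothetical event type, loosely modeled on RustyXML's events.
#[derive(Debug, Clone)]
pub enum XmlEvent {
    ElementStart { name: String, attributes: Vec<(String, String)> },
    ElementEnd { name: String },
    Characters(String),
}

/// Toy stand-in for a DOM node; text nodes have an empty `name`.
#[derive(Debug, Default)]
pub struct Node {
    pub name: String,
    pub attributes: HashMap<String, String>,
    pub text: String,
    pub children: Vec<usize>, // indices into TreeBuilder::nodes
}

/// Builds a tree from events, playing the role TreeHandler plays for HTML.
pub struct TreeBuilder {
    pub nodes: Vec<Node>,            // nodes[0] is the document root
    open: Vec<usize>,                // stack of currently open elements
    pub pending_scripts: Vec<usize>, // completed <script> elements, in order
}

impl TreeBuilder {
    pub fn new() -> Self {
        TreeBuilder { nodes: vec![Node::default()], open: vec![0], pending_scripts: Vec::new() }
    }

    /// Append a node as the last child of the current open element.
    fn append(&mut self, node: Node) -> usize {
        let id = self.nodes.len();
        self.nodes.push(node);
        let parent = *self.open.last().expect("root is never popped");
        self.nodes[parent].children.push(id);
        id
    }

    pub fn handle(&mut self, event: XmlEvent) {
        match event {
            XmlEvent::ElementStart { name, attributes } => {
                // Create the element, set its attributes, and make it current.
                let id = self.append(Node {
                    name,
                    attributes: attributes.into_iter().collect(),
                    ..Node::default()
                });
                self.open.push(id);
            }
            XmlEvent::ElementEnd { name } => {
                // Well-formed XML has matching tags, so simply close the current element.
                if self.open.len() > 1 {
                    let id = self.open.pop().unwrap();
                    debug_assert_eq!(self.nodes[id].name, name);
                    // Script discovery: a <script> is complete once its end tag arrives.
                    if self.nodes[id].name == "script" {
                        self.pending_scripts.push(id);
                    }
                }
            }
            XmlEvent::Characters(text) => {
                self.append(Node { text, ..Node::default() });
            }
        }
    }
}
```

Queuing a script at its end tag rather than its start tag assumes its inline contents must be fully parsed before it can run. Because XML requires matching tags, the builder can simply pop its stack on each end tag instead of reproducing HTML's error recovery.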