
Off main thread HTML parsing project

Josh Matthews edited this page Mar 13, 2017 · 3 revisions

Contact: Anthony Ramine

The HTML parser currently in use in Servo is synchronous. This means that parsing HTML content and evaluating JavaScript are often entangled, which prevents optimizations such as speculatively continuing to parse HTML content while a script is being evaluated. The goal of this project is to make it possible to run the HTML parser in a separate thread and process the results in the original thread at a later time.

Project breakdown

  • Make Servo's parser create a stream of events instead of executing actions immediately (see experiment)
  • Provide a synchronous or asynchronous interface to the parser
  • Implement checkpoints in the parser that allow re-parsing from a previous location and state
  • Support speculatively continuing to parse input while the parser is blocked waiting
  • Preload images encountered during speculative parsing
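
As a rough illustration of the first two items, here is a minimal sketch of a parser that emits a stream of events on one thread while another thread applies them later. Everything in it is an assumption made for illustration: `ParseEvent`, `toy_parse`, `parse_off_thread` and `apply_events` are hypothetical names, and the toy tokenizer only understands `<tag>`, `</tag>` and text. It is not Servo's parser and does not reflect html5ever's API.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical tree-building events (not Servo's or html5ever's types).
#[derive(Debug, Clone, PartialEq)]
enum ParseEvent {
    OpenElement(String),
    Text(String),
    CloseElement(String),
}

/// Toy stand-in for an HTML parser: recognizes only `<tag>`, `</tag>` and text,
/// and reports each piece through `emit` instead of acting on it directly.
fn toy_parse(input: &str, mut emit: impl FnMut(ParseEvent)) {
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            match after_lt.find('>') {
                Some(end) => {
                    let tag = &after_lt[..end];
                    match tag.strip_prefix('/') {
                        Some(name) => emit(ParseEvent::CloseElement(name.to_string())),
                        None => emit(ParseEvent::OpenElement(tag.to_string())),
                    }
                    rest = &after_lt[end + 1..];
                }
                None => {
                    // Unterminated tag: treat the remainder as text.
                    emit(ParseEvent::Text(rest.to_string()));
                    rest = "";
                }
            }
        } else {
            // Plain text runs until the next `<` or the end of the input.
            let end = rest.find('<').unwrap_or(rest.len());
            emit(ParseEvent::Text(rest[..end].to_string()));
            rest = &rest[end..];
        }
    }
}

/// Asynchronous interface: parse on a separate thread and hand back a channel
/// of events that the original thread can drain whenever it is ready.
fn parse_off_thread(input: String) -> mpsc::Receiver<ParseEvent> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        toy_parse(&input, |event| {
            // If the receiver has been dropped, the events are simply discarded.
            let _ = tx.send(event);
        });
    });
    rx
}

/// Consumer side: apply events in order. Here "applying" just renders an
/// indented outline; a real consumer would build DOM nodes instead.
fn apply_events(events: impl IntoIterator<Item = ParseEvent>) -> String {
    let mut out = String::new();
    let mut depth = 0usize;
    for event in events {
        match event {
            ParseEvent::OpenElement(name) => {
                out.push_str(&format!("{}<{}>\n", "  ".repeat(depth), name));
                depth += 1;
            }
            ParseEvent::Text(text) => {
                out.push_str(&format!("{}{:?}\n", "  ".repeat(depth), text));
            }
            ParseEvent::CloseElement(_) => depth = depth.saturating_sub(1),
        }
    }
    out
}

fn main() {
    let events = parse_off_thread("<p>Hello <b>world</b></p>".to_string());
    print!("{}", apply_events(events));
}
```

Because `toy_parse` only takes a callback, the same code can also be driven synchronously by collecting events into a `Vec` on the calling thread. Checkpoints and speculative parsing are not covered by this sketch.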

Reference:

Current implementation:

Suggestions for preparation

(Feel free to ask questions in #servo on irc.mozilla.org, or on our mailing list!)

  • Add println!() calls to the HTML parser actions to understand how they interact with parsing an HTML document
  • File issues about inadequate documentation related to the HTML parser if anything is unclear
  • Gain experience using Rust by solving an easy issue. Please leave a comment saying that you're working on it.