Html parser

A simple, general-purpose html/xhtml parser, available as both a library and a binary, built on Pest.

Features

Parse html & xhtml (not xml processing instructions)
Parse html-documents
Parse html-fragments
Parse empty documents
Parse with the same api for both documents and fragments
Parse custom, non-standard, elements; <cat/>, <Cat/> and <C4-t/>
Removes comments
Removes dangling elements
Iterate over all nodes in the dom tree
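
To make the last feature concrete, here is a minimal sketch of what iterating over every node of a parsed tree involves. The `Node` enum and the `walk`, `element_names`, `text_content` and `sample_tree` functions are hypothetical stand-ins written for this illustration only. They are not the crate's own types or API, and the crate's actual iterator may be implemented differently.

```rust
// Hypothetical stand-in for a parsed dom node; NOT the html_parser crate's type.
#[derive(Debug)]
enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Visit `root` and all of its descendants depth-first, in document order.
fn walk(root: &Node) -> Vec<&Node> {
    let mut visited = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        visited.push(node);
        if let Node::Element { children, .. } = node {
            // Push children in reverse so the first child is popped first.
            stack.extend(children.iter().rev());
        }
    }
    visited
}

/// Collect the names of all elements in the tree, skipping text nodes.
fn element_names(root: &Node) -> Vec<&str> {
    walk(root)
        .into_iter()
        .filter_map(|node| match node {
            Node::Element { name, .. } => Some(name.as_str()),
            Node::Text(_) => None,
        })
        .collect()
}

/// Concatenate the contents of all text nodes in the tree.
fn text_content(root: &Node) -> String {
    walk(root)
        .into_iter()
        .filter_map(|node| match node {
            Node::Text(text) => Some(text.as_str()),
            Node::Element { .. } => None,
        })
        .collect()
}

/// Build a small sample tree equivalent to <div><h1>Hello world</h1><p></p></div>.
fn sample_tree() -> Node {
    Node::Element {
        name: "div".to_string(),
        children: vec![
            Node::Element {
                name: "h1".to_string(),
                children: vec![Node::Text("Hello world".to_string())],
            },
            Node::Element {
                name: "p".to_string(),
                children: Vec::new(),
            },
        ],
    }
}
```

An explicit stack is used instead of recursion so that deeply nested markup cannot overflow the call stack. With the sample tree, `walk` visits the `div`, the `h1`, its text node, and then the `p`.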

What is it not

It's not a high-performance browser-grade parser
It's not suitable for html validation
It's not a parser that includes element selection or dom manipulation

If your requirements match any of the above, you're most likely looking for one of the crates below:

html5ever
kuchiki
scraper
or other crates using the html5ever parser

Examples bin

Parse html file

    html_parser index.html

Parse stdin with pretty output

    curl <website> | html_parser -p

Examples lib

Parse html document

    use html_parser::Dom;

    fn main() {
        let html = r#"
            <!doctype html>
            <html lang="en">
                <head>
                    <meta charset="utf-8">
                    <title>Html parser</title>
                </head>
                <body>
                    <h1 id="a" class="b c">Hello world</h1>
                    </h1> <!-- comments & dangling elements are ignored -->
                </body>
            </html>"#;

        assert!(Dom::parse(html).is_ok());
    }

Parse html fragment

    use html_parser::Dom;

    fn main() {
        let html = "<div id=cat />";
        assert!(Dom::parse(html).is_ok());
    }

Print to json

    use html_parser::{Dom, Result};

    fn main() -> Result<()> {
        let html = "<div id=cat />";
        let json = Dom::parse(html)?.to_json_pretty()?;
        println!("{}", json);
        Ok(())
    }
