httpspell

This is a spellchecker that recursively fetches HTML pages, converts them to plain text (using pandoc), and spellchecks them with hunspell. Unknown words will be printed to stdout, which makes the tool a good candidate for CI pipelines where you might want to take action when a spelling error is found on a web page.

Words that are not in the dictionary for the given language (inferred from the lang attribute of the HTML document's root element) can be added to a personal dictionary, which will mark the word as correctly spelled.

Usage

  • The following command will retrieve the HTML document at https://example.com, spellcheck it, and not print anything because there are no errors:

    $ httpspell https://example.com

    The exit code is 0.

  • The following command will spellcheck the README of this project as rendered by GitHub, and print a list of unknown words. Note that we set the language to en_US because GitHub declares 'en' as the document language, but the installed dictionaries usually refer to a specific language variant like en_US:

    $ httpspell https://github.com/suhlig/httpspell/blob/master/README.markdown --language en_US
    suhlig
    Permalink
    httpspell
    sloc
    pandoc
    hunspell
    ...

    The exit code is 1.
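
Because unknown words go to stdout and the exit code signals whether any were found, a CI step can wrap the command and decide what to do with the result. The following is a minimal sketch, not part of httpspell itself. It assumes httpspell is on the PATH and prints one unknown word per line, as in the example above. The names unknown_words and main, and the injectable run parameter, are made up for illustration.

```python
import subprocess


def unknown_words(url, language=None, run=subprocess.run):
    """Run httpspell on `url`; return (exit_code, list_of_unknown_words).

    `run` defaults to subprocess.run and can be swapped for a stub in tests.
    Assumption: httpspell prints one unknown word per line on stdout.
    """
    command = ["httpspell", url]
    if language:
        # Same flag as in the README example, e.g. --language en_US
        command += ["--language", language]
    result = run(command, capture_output=True, text=True)
    words = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return result.returncode, words


def main(argv):
    """Hypothetical CI entry point: argv is [url] or [url, language]."""
    code, words = unknown_words(*argv[:2])
    for word in words:
        print(f"unknown word: {word}")
    # Pass httpspell's exit code through so the CI job fails on unknown words.
    return code
```

A CI job could call sys.exit(main(sys.argv[1:])) so that the step fails whenever httpspell exits with a non-zero code.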

What is not checked

  • When spidering a site, httpspell will skip all responses with a content-type header other than text/html (unless it is pointed at a file, in which case it accepts anything).
  • Before converting, httpspell removes the following nodes from the HTML DOM as they are not a good target for spellchecking:
    • code
    • pre
    • Elements with spellcheck='false' (this is how HTML5 allows tagging elements as being a target for spellchecking or not)
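
To illustrate the node removal described above, here is a rough sketch using only Python's standard library. It is not httpspell's own code, and it assumes reasonably well-formed markup where every non-void element has a closing tag. The names SpellcheckTextExtractor and spellcheckable_text are invented for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements the README lists as poor targets for spellchecking.
SKIPPED_TAGS = {"code", "pre"}


class SpellcheckTextExtractor(HTMLParser):
    """Collects text, ignoring code, pre, and spellcheck='false' subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside a skipped subtree
        elif tag in SKIPPED_TAGS or (dict(attrs).get("spellcheck") or "").lower() == "false":
            self.skip_depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> open no subtree; nothing to track.
        return

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def spellcheckable_text(html):
    """Return the text of `html` with skipped subtrees removed."""
    parser = SpellcheckTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result is one clean line.
    return " ".join(" ".join(parser.chunks).split())
```

For instance, spellcheckable_text('<p>Hello <code>foo_bar</code> world</p>') returns 'Hello world', since the contents of the code element are dropped before any spellchecking would happen.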

Misc

If you produce content with kramdown (e.g. using Jekyll), setting spellcheck='false' for an element is as simple as adding this line after the element (e.g. a heading):

{: spellcheck="false"}
