jlieth/hocr-parser

hocr-parser

Python parser for hOCR files using lxml

hOCR is an open standard for representing the results of optical character recognition (OCR). The results of OCR (the recognized text, layout, styles, etc.) are represented in hOCR using XHTML. This Python module parses an existing hOCR file and gives easy access to the hOCR elements and their attributes.
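To make the idea concrete, here is a minimal sketch of what reading an hOCR document involves. It does not use this package's API (which is not shown here); it relies on the standard library's `xml.etree.ElementTree` so that it runs without extra dependencies. The sample document and the helper names `parse_title` and `extract_words` are made up for illustration. The sketch assumes the common hOCR conventions that words carry the class `ocrx_word` and that properties such as `bbox` and `x_wconf` are stored in the `title` attribute, separated by semicolons.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written hOCR fragment used only for this example.
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="ocr_page" id="page_1" title="bbox 0 0 800 600">
<span class="ocr_line" id="line_1" title="bbox 10 20 300 50">
<span class="ocrx_word" id="word_1" title="bbox 10 20 120 50; x_wconf 95">Hello</span>
<span class="ocrx_word" id="word_2" title="bbox 130 20 300 50; x_wconf 91">world</span>
</span>
</div>
</body>
</html>"""


def parse_title(title):
    """Split an hOCR title attribute into {property_name: [values]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


def extract_words(hocr_text):
    """Return a list of dicts describing every ocrx_word element."""
    root = ET.fromstring(hocr_text)
    words = []
    for el in root.iter():
        # The class attribute may hold several space-separated class names.
        if "ocrx_word" not in el.get("class", "").split():
            continue
        props = parse_title(el.get("title", ""))
        words.append({
            "id": el.get("id"),
            "text": "".join(el.itertext()).strip(),
            "bbox": tuple(int(v) for v in props.get("bbox", [])),
            "confidence": float(props["x_wconf"][0]) if "x_wconf" in props else None,
        })
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_HOCR):
        print(word["id"], word["text"], word["bbox"], word["confidence"])
```

A dedicated parser saves you from writing this kind of attribute handling by hand for every element type (pages, areas, paragraphs, lines, words).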

Install

Python 3.6+ is required, and you'll probably want to install this package into a virtual environment. Until I push the package to PyPI, you can install it directly from GitHub with pip:

pip install git+https://github.com/jlieth/hocr-parser

Similar projects

External links
