hohonu/padhana

Padhana

The Padhana framework lets you work with PDFs and other document types in a structured way. It combines a simple document format, based on a node hierarchy, with a set of parsers and document analysis tools: documents are parsed, and their content is then structured and annotated to enable rich interactions.
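
To make the idea of a node hierarchy concrete, here is a minimal sketch in plain Python. It does not use Padhana's API: the `Node` class, its fields, and the `tag_matching` helper are hypothetical names chosen for illustration, and the real framework's classes may look quite different.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical content node (illustrative only, not Padhana's class)."""
    node_type: str
    content: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)

    def add_child(self, node_type: str, content: Optional[str] = None) -> "Node":
        """Create a child node, attach it, and return it."""
        child = Node(node_type, content)
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants in depth-first order."""
        yield self
        for child in self.children:
            yield from child.walk()


def tag_matching(root: Node, keyword: str, tag: str) -> int:
    """Annotate every node whose content contains keyword; return the count."""
    count = 0
    for node in root.walk():
        if node.content and keyword.lower() in node.content.lower():
            node.tags.add(tag)
            count += 1
    return count


# Build a tiny document by hand (a parser would normally produce this).
document = Node("document")
page = document.add_child("page")
page.add_child("line", "Invoice Number: 12345")
page.add_child("line", "Total due: 99.00")

# Annotate the hierarchy, then query it.
tagged = tag_matching(document, "invoice", "invoice_header")
```

In this sketch, parsing produces the tree and analysis steps only add tags, so several annotation passes can run over the same structure without changing the content itself.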

Documentation & Examples

Documentation can be found here: https://hohonu.github.io/padhana-docs/

Set-up

Ensure you have Anaconda 3 or greater installed, then run:

conda env create -f conda.yml --force

Activate the padhana Conda environment with the command:

conda activate padhana

Additional Steps

To use the Tesseract parser, you also need to install Tesseract.

See https://github.com/tesseract-ocr/tesseract/wiki
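
One way to confirm that Tesseract is installed and reachable is to check for it from Python. The sketch below is not part of Padhana; `tesseract_version` is a hypothetical helper, and it assumes the executable is named `tesseract` and is expected on your `PATH`.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print version details to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown version"


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("Tesseract was not found on PATH; install it before using the Tesseract parser.")
    else:
        print(f"Found Tesseract: {version}")
```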
