Parsr Dependencies

This page lists all of the dependencies of Parsr and what they are used for.

1. Base Dependencies

The following required dependencies need to be installed for Parsr to work properly:

node.js : The JavaScript runtime on which the platform is built.
qpdf : For reading password-protected PDFs.
imagemagick : For converting between file formats.
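
As a rough way to verify the list above on a given machine, the following sketch looks for each tool on the `PATH`. The executable names (`node`/`nodejs`, `qpdf`, `magick`/`convert`) are assumptions based on common packaging and are not taken from this page, and `find_missing` is a hypothetical helper, not part of Parsr.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumed executable names for each base dependency. Distributions differ,
# so several candidates are tried per dependency.
BASE_DEPENDENCIES: Dict[str, List[str]] = {
    "node.js": ["node", "nodejs"],
    "qpdf": ["qpdf"],
    "imagemagick": ["magick", "convert"],
}


def find_missing(
    dependencies: Dict[str, List[str]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the dependencies for which no candidate executable is on PATH."""
    missing = []
    for name, candidates in dependencies.items():
        # A dependency counts as present if any one candidate resolves.
        if not any(which(exe) for exe in candidates):
            missing.append(name)
    return missing


# Report what appears to be missing on this machine.
for name in find_missing(BASE_DEPENDENCIES):
    print(f"missing base dependency: {name}")
```

The `which` parameter exists only so the lookup can be replaced in tests. Note that finding an executable named `convert` does not guarantee it is ImageMagick (Windows ships an unrelated `convert` utility), so treat the result as a hint rather than proof.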

2. Extraction Dependencies

Depending on the type of documents the platform will process, one or more of the following dependencies should be installed.

For simple PDFs containing digital (selectable) text, the pdfminer library needs to be installed.

For images (jpg, png, tiff, etc.), the tool supports the following two OCR-based solutions as the underlying extraction module:

tesseract : Google's Tesseract is a free, open-source, on-premise OCR solution with support for over 100 languages. However, it does not detect text formatting or tabular data.
ABBYY FineReader Server : Proprietary OCR solution with extremely high recognition accuracy, formatting recognition and tabular data extraction. It is an optional dependency.
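
The rule described in this section (pdfminer for PDFs with selectable text, an OCR engine for images) can be sketched as a small lookup. This is an illustration only: `pick_extractor`, the suffix list, and the returned labels are hypothetical and do not reflect Parsr's actual configuration format.

```python
from pathlib import Path

# Image suffixes assumed for illustration; the page lists "jpg, png, tiff, etc."
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pick_extractor(path: str, ocr: str = "tesseract") -> str:
    """Return a label for the extraction dependency a file would need.

    PDFs are assumed to contain selectable text; a scanned PDF would
    need OCR instead, which this sketch does not try to detect.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdfminer"
    if suffix in IMAGE_SUFFIXES:
        return ocr
    raise ValueError(f"no extractor known for {suffix!r} files")
```

For example, `pick_extractor("scan.png")` returns `"tesseract"`, and passing `ocr="abbyy"` would select the proprietary engine instead, if it is installed.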

3. Optional Dependencies

The following optional dependencies may be installed:

mupdf-tools : For error-correcting corrupt PDFs at input.
pandoc : For generating PDF files from the intermediate Markdown output produced after the cleaning operation in the pipeline.
