This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

OCR language

jflesch edited this page Sep 24, 2014 · 10 revisions

By default, Paperwork uses Tesseract for OCR. If Tesseract is unavailable, it falls back to Cuneiform.

To get better results, OCR tools need to know the language used in the document(s).

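The languages available in Paperwork's settings dialog are those understood by the automatically selected OCR tool (Tesseract or Cuneiform). If your language is not in the list, the OCR tool doesn't have the data required to read it. For Tesseract, install the matching language package for your distribution as listed below, replacing <lang> with the language's three-letter code (for example, fra for French).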
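To check which languages the installed Tesseract already has data for, you can use its `--list-langs` option. The sketch below wraps it in two small helper functions. The names `list_ocr_langs` and `has_ocr_lang` are made up for illustration and are not part of Paperwork or Tesseract, and `fra` is only an example language code. It assumes the first line of the output is a header line, which is skipped.

```shell
# List the language codes Tesseract currently has data for.
# Assumption: the first output line is a header ("List of available languages ..."),
# so it is skipped. Older Tesseract versions print the list on stderr, hence 2>&1.
list_ocr_langs() {
    tesseract --list-langs 2>&1 | tail -n +2
}

# Succeed if the given language code (e.g. "fra") appears in that list.
has_ocr_lang() {
    list_ocr_langs | grep -qx "$1"
}

# Example: check for French before looking for it in Paperwork's settings.
if has_ocr_lang fra; then
    echo "French data is installed"
else
    echo "French data is missing"
fi
```

If the language is reported as missing, installing the corresponding package from the sections below should make it appear in the list.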

Debian

# OCR (Tesseract)
$ sudo apt-get install tesseract-ocr tesseract-ocr-<lang>

Fedora

# OCR (Tesseract)
$ sudo yum install tesseract tesseract-langpack-<lang>

Ubuntu

# OCR (Tesseract)
$ sudo apt-get install tesseract-ocr tesseract-ocr-<lang>