Stefan Weil edited this page May 10, 2020 · 8 revisions

Welcome to the tesstrain wiki!

tesstrain (formerly ocrd-train) is a collection of scripts and documentation for training Tesseract's LSTM-based recognizer, which is supported by Tesseract 4 and newer releases.

Currently it includes a Makefile that supports training from real line images with ground truth (text transcriptions). Such data is available from a number of sources; see https://github.com/cneud/ocr-gt for a list.
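Ground truth for this kind of training consists of line images paired with their transcriptions, so a quick consistency check before training can save a failed run. The sketch below is a minimal illustration, assuming that each line image (`.tif` or `.png`) has a transcription file named `<name>.gt.txt` in the same directory, which is how tesstrain ground truth is commonly laid out; please confirm the expected layout against the current tesstrain README. The function `pair_ground_truth` is hypothetical and not part of tesstrain.

```python
import tempfile
from pathlib import Path

# Assumed image extensions for line images; adjust to your data set.
IMAGE_SUFFIXES = {".tif", ".png"}


def pair_ground_truth(gt_dir):
    """Pair each line image with its transcription (hypothetical helper).

    Assumes the naming convention <stem><image suffix> + <stem>.gt.txt
    inside a single directory. Returns a tuple (pairs, missing), where
    pairs is a list of (image file name, transcription text) and missing
    lists image file names that have no transcription file.
    """
    gt_dir = Path(gt_dir)
    pairs = []
    missing = []
    for image in sorted(gt_dir.iterdir()):
        # Skip transcription files and anything else that is not an image.
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        transcription = image.with_name(image.stem + ".gt.txt")
        if transcription.is_file():
            text = transcription.read_text(encoding="utf-8").strip()
            pairs.append((image.name, text))
        else:
            missing.append(image.name)
    return pairs, missing


# Usage: build a tiny throwaway directory with one complete pair
# and one image that lacks its transcription.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "line1.tif").write_bytes(b"")  # placeholder, not a real image
    Path(tmp, "line1.gt.txt").write_text("Hello world\n", encoding="utf-8")
    Path(tmp, "line2.png").write_bytes(b"")  # no matching .gt.txt
    pairs, missing = pair_ground_truth(tmp)
    print(pairs)    # [('line1.tif', 'Hello world')]
    print(missing)  # ['line2.png']
```

Reporting unpaired images instead of silently dropping them makes gaps in a downloaded ground truth set visible before any training time is spent.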

Training from synthetic images is supported by training scripts (Shell, Python) which are still part of the Tesseract code base.

Examples