pharos-alexandria/ocr-greek_cursive

ocr-greek_cursive

Training files for Greek cursive script (in early print)

This repository contains ground truth and pre-trained models for Kraken and for Calamari.

Ground truth

The folder gt contains ground truth made from two exemplars.

The subfolder Savile is based on the exemplar of the edition of John Chrysostom's works by Henry Savile, ed. Eton (John Norton), Tom. V, 1612, which is today at Bayerische Staatsbibliothek München, Res/2 P.gr. 55-5, and was made available in digitized form at http://mdz-nbn-resolving.de/urn:nbn:de:bvb:12-bsb10870413-4.

The subfolder CatenaLipsiensis is based on the exemplar of ΣΕΙΡΑ ΕΝΟΣ ΚΑΙ ΠΕΝΤΗΚΟΝΤΑ ΥΠΟΜΝΗΜΑΤΙΣΤΩΝ ΕΙΣ ΤΗΝ ΟΚΤΑΤΕΥΧΟΝ ΚΑΙ ΤΑ ΤΩΝ ΒΑΣΙΛΕΙΩΝ ΗΔΗ ΠΡΩΤΟΝ ΤΥΠΟΙΣ ΕΚΔΟΘΕΙΣΑ ... ΓΡΗΓΟΡΙΟΥ ΑΛΕΞΑΝΔΡΟΥ ΓΚΙΚΑ, ΤΟΜΟΣ ΠΡΩΤΟΣ, Leipzig 1772, which is today at Staatsbibliothek zu Berlin Preußischer Kulturbesitz, 2" B 1774-1, and was made available in digitized form at http://resolver.staatsbibliothek-berlin.de/SBB00028A5400000000. This ground truth is in NFC; there is also a subfolder with ground truth in NFD.

The photos were pre-processed with ScanTailor; in particular, the two columns of the Catena Lipsiensis were split manually.

For the transcripts of Savile, the text in the Patristic Text Archive was used; the transcripts of the Catena Lipsiensis were made by Janina Skóra and Karin Metzler and converted to plain text.

The models

The folder calamari-models contains an ensemble of models that was trained via calamari-cross-fold-train on the Catena Lipsiensis ground truth.

The folder kraken-models contains models that were trained on the Savile or the Catena Lipsiensis ground truth on several machines. The -nfc and -nfd models were trained on Unicode precomposed (NFC) and decomposed (NFD) text respectively.
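The difference between precomposed and decomposed text matters when comparing model output with ground truth, since the same Greek word has different code point sequences in the two forms. The following is a minimal sketch using only Python's standard unicodedata module; the helper names to_nfc and to_nfd and the sample word are illustrative assumptions and are not part of this repository.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode precomposed form (NFC)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Return text in Unicode decomposed form (NFD)."""
    return unicodedata.normalize("NFD", text)


# Hypothetical sample word (not taken from the ground truth):
# "ἄνθρωπος" with a precomposed first letter, U+1F04
# (alpha with psili and oxia).
sample = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"

nfc = to_nfc(sample)
nfd = to_nfd(sample)

# In NFD the first letter becomes a base alpha followed by two
# combining marks, so the string has more code points than in NFC.
print(len(nfc), len(nfd))

# The two forms differ as code point sequences but are equivalent
# after normalizing to a common form.
print(nfc == nfd, to_nfc(nfd) == nfc)
```

When using an -nfc model, it is probably safest to normalize any comparison text to NFC first, and likewise to NFD for an -nfd model, so that differences in encoding are not counted as recognition errors.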

Evaluation data is in the folder eval.