OCR-D/gt-metadata

 
 

gt-metadata

gt-metadata is a tool for collecting metadata about ground truth (GT) datasets. It records information about the dataset (including title, short description, project reference, and license) as well as references to OCR models. The metadata is saved in YAML format and can be stored automatically in the repository of the ground truth dataset and in the HTR-United catalog, which lists various GT datasets and OCR/HTR models.

By providing the gt-metadata tool, this project follows the HTR-Data Reuse Charter and its principles of Reciprocity, Interoperability, Citability, Openness, Stewardship and Trustworthiness. The tool builds on the HTR-United tool, which has been both reduced and extended for this purpose.

Metadata Schema (JSON format)

gt-metadata supports the current metadata schema.

Language support

  • German
  • English
  • French

Additions compared with HTR-United:

  • A German translation

The metadata schema has been extended with:

  • the ground truth type
  • the declaration of an OCR model
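
To make the two schema extensions more concrete, here is a minimal sketch of what a metadata record carrying them might look like. All field names (`gt-type`, `ocr-model`, and the required-field list) are hypothetical and chosen only for illustration; the actual gt-metadata schema may name and structure them differently, and the sketch does not use any gt-metadata API.

```python
import json

# Hypothetical field names for illustration only; the real gt-metadata
# schema may name and structure these differently.
REQUIRED_FIELDS = ("title", "description", "project", "license")


def build_record(title, description, project, license_name,
                 gt_type=None, ocr_model=None):
    """Assemble a metadata record as a plain dictionary."""
    record = {
        "title": title,
        "description": description,
        "project": project,
        "license": license_name,
    }
    # The two schema extensions are treated as optional here (an assumption).
    if gt_type is not None:
        record["gt-type"] = gt_type
    if ocr_model is not None:
        record["ocr-model"] = ocr_model
    return record


def missing_fields(record):
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values, not taken from any real dataset.
record = build_record(
    title="Example GT dataset",
    description="Line-level transcriptions of printed pages.",
    project="Example project",
    license_name="CC-BY-4.0",
    gt_type="line",
    ocr_model={"name": "example-model", "engine": "example-engine"},
)

print(missing_fields(record))
print(json.dumps(record, indent=2))
```

The sketch prints the record as JSON using only the standard library. Since YAML 1.2 is designed as a superset of JSON, a YAML parser should be able to read that output; producing block-style YAML like the files gt-metadata writes would need a YAML library such as PyYAML.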