Skip to content

NatLibFi/FintoAI-data-YSO

Repository files navigation

FintoAI-data-YSO

Configurations for maintaining the Annif projects with YSO vocabulary used at Finto AI service and the analysis notebook of Annif suggestions in Theseus repository.

The projects are trained and evaluated using a DVC (Data Version Control) pipeline defined in dvc.yaml. The training corpora that are public can be found from Annif-corpora repository.

The pipeline takes care of

  1. installing Annif in a venv,
  2. loading YSO vocabulary,
  3. training the projects,
  4. evaluating the projects.

When the necessary vocabulary and training corpora are in place the pipeline can be run using the command

dvc repro

For more information about using DVC with Annif projects see the DVC exercise of Annif tutorial.