centre-for-humanities-computing/odyCy

A general-purpose NLP pipeline for Ancient Greek.


Features 🗻

  • Part of speech tagging
  • Lemmatization
  • Dependency parsing
  • Morphological analysis
  • Named entity recognition (work in progress 🚧)

Installation 🌅

odyCy pipelines can be installed directly from Hugging Face:

# To install the transformer-based pipeline
pip install https://huggingface.co/chcaa/grc_odycy_joint_trf/resolve/main/grc_odycy_joint_trf-any-py3-none-any.whl
# To install the tok2vec-based small pipeline
pip install https://huggingface.co/chcaa/grc_odycy_joint_sm/resolve/main/grc_odycy_joint_sm-any-py3-none-any.whl
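Each wheel installs a regular Python package, so you can check which pipelines are available before loading one. The sketch below uses only the standard library. It assumes that each wheel installs an importable module with the same name as the pipeline, which is the usual convention for spaCy pipeline packages. The helper name `installed_odycy_pipelines` is hypothetical.

```python
import importlib.util

# Pipeline names taken from the wheel URLs above. We assume each wheel
# installs an importable module of the same name (the spaCy convention).
ODYCY_PIPELINES = ("grc_odycy_joint_trf", "grc_odycy_joint_sm")


def installed_odycy_pipelines() -> list[str]:
    """Return the odyCy pipeline packages that are importable in this environment."""
    return [
        name
        for name in ODYCY_PIPELINES
        # find_spec returns None (rather than raising) for a missing top-level module
        if importlib.util.find_spec(name) is not None
    ]


if __name__ == "__main__":
    found = installed_odycy_pipelines()
    if found:
        print("Installed odyCy pipelines:", ", ".join(found))
    else:
        print("No odyCy pipeline found; run one of the pip commands above.")
```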

Usage 🐳

odyCy pipelines can be loaded with spaCy:

import spacy

# Load ONE of the two pipelines, depending on which one you installed.

# Transformer-based pipeline (more accurate)
nlp = spacy.load("grc_odycy_joint_trf")

# Alternatively, a faster and smaller (but less accurate) tok2vec-based pipeline:
# nlp = spacy.load("grc_odycy_joint_sm")
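If you are not sure which pipeline is installed, a small helper can try them in order. This is our own illustration and not part of odyCy. `load_odycy` and `candidate_order` are hypothetical names. The sketch relies on `spacy.load` raising `OSError` when it cannot find a pipeline package.

```python
# Ordered from most to least accurate, following the comparison above.
PIPELINES_BY_ACCURACY = ("grc_odycy_joint_trf", "grc_odycy_joint_sm")


def candidate_order(prefer_speed: bool = False) -> tuple[str, ...]:
    """Return pipeline names in the order they should be tried."""
    return PIPELINES_BY_ACCURACY[::-1] if prefer_speed else PIPELINES_BY_ACCURACY


def load_odycy(prefer_speed: bool = False):
    """Load the first odyCy pipeline that is installed (hypothetical helper)."""
    import spacy  # deferred so this module can be imported without spaCy

    for name in candidate_order(prefer_speed):
        try:
            return spacy.load(name)
        except OSError:  # spaCy raises OSError when a pipeline package is missing
            continue
    raise OSError("No odyCy pipeline is installed; see the installation instructions.")
```

With this helper, `nlp = load_odycy()` prefers the transformer pipeline, and `load_odycy(prefer_speed=True)` tries the smaller tok2vec pipeline first.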

The pipelines can then be used like any other spaCy pipeline (see the spaCy documentation).

Check out our documentation on Basic Usage.
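odyCy pipelines return standard spaCy `Doc` objects, so the features listed above come from spaCy's usual `Token` attributes (`lemma_`, `pos_`, `dep_`, `head`, `morph`) and from `Doc.ents`. The sketch below is a minimal example that assumes the small pipeline is installed. `token_rows` and `annotate` are hypothetical helper names, and the example phrase is our own choice.

```python
def token_rows(doc):
    """Collect (text, lemma, POS tag, dependency label, head, morphology) per token.

    Works on any spaCy Doc, using spaCy's standard Token attributes.
    """
    return [
        (tok.text, tok.lemma_, tok.pos_, tok.dep_, tok.head.text, str(tok.morph))
        for tok in doc
    ]


def annotate(text, model="grc_odycy_joint_sm"):
    """Run an odyCy pipeline on text and print one tab-separated line per token."""
    import spacy  # imported lazily so token_rows() is usable without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    for row in token_rows(doc):
        print("\t".join(row))
    # Named entity recognition is still a work in progress in odyCy.
    for ent in doc.ents:
        print(ent.text, ent.label_)
    return doc
```

For example, `annotate("ἄνδρα μοι ἔννεπε, μοῦσα, πολύτροπον")` prints the annotations for the opening of the Odyssey. The exact tags depend on the pipeline version, so no output is shown here.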

Performance ⛵

odyCy achieves state-of-the-art performance on multiple tasks on unseen test data from the Universal Dependencies Perseus treebank. On the test set of the PROIEL treebank, it ranks second best on an even larger number of tasks. Its performance also appears relatively stable across the two evaluation datasets compared with other NLP pipelines.

For plots and tables on odyCy's performance, check out the Performance page of the documentation.