A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
A pipeline for machine translation (using OPUS-MT models) of parliamentary text collections in 30+ languages (ParlaMint corpora). The pipeline includes parsing TEI XML and CoNLL-U files, linguistic processing with the Stanza pipeline, machine translation, and word alignment with the Eflomal tool.
Count bigram frequencies in a CoNLL-U format corpus
Exploring and visualizing CoNLL-U files in Python
A tool for validating English CoNLL-U data files.
A package for manipulating Universal Dependencies trees
Repository for the paper "Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities"
A minimal, pure Python library to interface with CoNLL-U format files.
GitHub repository for Arc-Eager Transition-Based Parser
Tool for translating a corpus file from one language to another.
Analysing different text representations for genre identification. I parse CoNLL-U files and extract various representations of a text (running text, lemmas, part-of-speech tags), then train a fastText model on each to see which representation is most beneficial for the genre identification task.
Toolkit that simplifies corpus processing
ACoLi CoNLL libraries: Several tools for processing, manipulating and transforming TSV formats (CoNLL-RDF, CoNLL-Merge, CQP4RDF)
End-to-end integration of HuggingFace's models for sequence labeling.
Simple script to parse text with spaCy and print the output in CoNLL-U format.
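Several of the entries above involve reading CoNLL-U files and counting over their tokens. As a rough illustration of what that involves, here is a minimal sketch in plain Python that reads CoNLL-U text and counts bigrams of word forms within each sentence. It is not taken from any of the listed repositories; the names `SAMPLE`, `read_sentences`, and `count_bigrams` are hypothetical, and the sample sentences are made up for the example. It assumes the standard ten tab-separated columns, with the token ID in column 1 and the word form in column 2.

```python
from collections import Counter

# Made-up two-sentence sample in CoNLL-U layout (ten tab-separated columns).
SAMPLE = (
    "# text = The dog barks.\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
    "# text = The dog sleeps.\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)


def read_sentences(text):
    """Yield each sentence as a list of word forms (column 2)."""
    forms = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if forms:
                yield forms
                forms = []
            continue
        if line.startswith("#"):
            # Comment / metadata line such as "# text = ...".
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1");
        # only plain integer IDs are ordinary syntactic words.
        if not cols[0].isdigit():
            continue
        forms.append(cols[1])
    if forms:
        yield forms


def count_bigrams(text):
    """Count adjacent word-form pairs, never crossing sentence boundaries."""
    counts = Counter()
    for forms in read_sentences(text):
        counts.update(zip(forms, forms[1:]))
    return counts


if __name__ == "__main__":
    for bigram, n in count_bigrams(SAMPLE).most_common():
        print(n, *bigram)
```

To run it on a file rather than the inline sample, the file's contents can be passed in as a string, for example `count_bigrams(open(path, encoding="utf-8").read())`, where `path` is a placeholder for a local `.conllu` file. Counting lemmas instead of forms would mean taking `cols[2]` in place of `cols[1]`.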