A corpus builder for evaluation of plagiarism detection tools (PHP, updated Dec 12, 2016)
A golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Augmentation scripts for the bAbI Dialog Tasks dataset
A corpus of Ukrainian Twitter texts, with instructions for downloading and filtering the texts.
A clean Fusha Arabic tagged corpus.
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
A set of corpus-based sampling & analysis M4L devices
Generate pseudo-English sentences for research in semantic composition
Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
Bitextor generates translation memories from multilingual websites
Information Retrieval Lab
A full-text article retrieval pipeline for biomedical literature.
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource.
Create a corpus for fine-tuning an OpenAI model
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
A parser for annotated MuseScore 3 files.