A corpus builder for evaluation of plagiarism detection tools
Updated Dec 12, 2016 - PHP
Information Retrieval Lab
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.
A clean Fusha Arabic tagged corpus.
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
Generate pseudo-English sentences for research in semantic composition
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
Create a corpus for fine-tuning an OpenAI model
A set of corpus-based sampling & analysis M4L devices
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
A full-text article retrieval pipeline for biomedical literature.
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
A parser for annotated MuseScore 3 files.
A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.
Augmentation scripts for the bAbI Dialog Tasks dataset
Golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Bitextor generates translation memories from multilingual websites