corpus-tools

Here are 91 public repositories matching this topic...

Affenmilchmann / lingwiki

(Module under active development) Retrieves the parsed content of Wikipedia articles. Built for collecting text corpus data quickly and easily, but can be freely used for other purposes too.

parser wikipedia multithreading linguistics corpus-linguistics corpus-data corpus-tools article-extractor wikipedia-corpus

Updated Jan 3, 2023
Python

george-zip / ap_exam_to_corpus

The AP Exam Corpus Project is a Python application that generates corpora for AP exams.

corpus-linguistics corpus-tools

Updated May 14, 2021
Python

aitor-alvarez / emorabic

Tools for creating speech corpora by extracting audio from YouTube videos

audio speech speech-processing corpus-tools speech-corpora

Updated Aug 15, 2022
Python

CasparChou / srt2corpus

Converts SRT subtitle files into a CN-? parallel corpus.

parallel-corpus corpus-tools

Updated Mar 31, 2018
JavaScript

antcont / LEXB

Python scripts for the construction of the LEXB parallel corpus of South Tyrolean legislation (IT-DE).

machine-translation web-scraping corpus-tools corpus-processing tmx-parser tmx-cleaning

Updated Jan 23, 2022
Python

cognitive-metascience / word_sketch

Open source Python package to produce word sketches inspired by Sketch Engine (to make reproducible analyses)

corpus-linguistics corpus-tools collocation-extraction word-sketches

Updated Jan 19, 2023
GLSL

oroszgy / PyMPQA

Python API for extracting data from the MPQA corpus

python nlp natural-language-processing sentiment-analysis sentiment data-extraction sentiment-classification corpus-tools mpqa

Updated Jan 6, 2017
Python

petar-popovic-bg / Jerteh

This package provides Python utility classes and static methods that wrap third-party software commonly used in text processing, such as Unitex-GramLab, TreeTagger, Apache-Tika and Google-Tesseract.

nlp ocr text-processing corpus-linguistics nlp-parsing unitexgramlab corpus-tools treetagger corpus-processing

Updated Mar 4, 2022
Python

unhammer / gt-CorpusTools

Branches of https://victorio.uit.no/langtech/trunk/tools/CorpusTools used by Giellatekno.UiT.no for corpus gathering.

corpus-tools giellatekno

Updated Jun 29, 2015
Python

severinsimmler / forpus

Forpus is a Python library for processing plain text corpora to various corpus formats.

python-library python3 corpus-tools corpus-processing corpus-formats

Updated Mar 16, 2018
Python

aminraz / word_stats

Corpus analysis of plain text, providing the type-token ratio as well as some other statistics.

corpus-tools corpus-processing python-dictionaries

Updated Oct 30, 2023
Python

auromitamitra / Bengali_Word_Finder

Tool to generate lists of Bengali words and transcriptions matching given phonological descriptions

linguistics bengali bangla phonetics corpus-tools bengali-nlp

Updated Dec 3, 2021
Python

nlp-tlp / lexiclean

An open-source web-based application for multi-task lexical normalisation

nlp annotation mern multi-task corpus-tools lexical-normalisation

Updated Feb 24, 2022
JavaScript

techiaith / Paldaruo

Code for the Paldaruo speech corpus crowdsourcing app for iOS

speech crowdsourcing welsh corpus-tools

Updated Aug 3, 2017
Objective-C

Linguista / FreeLing-es_CL

Linguistic resources for adapting FreeLing to Chilean Spanish

spanish tagger corpus-linguistics lemmatizer chile corpus-tools chilean-spansh

Updated Jan 6, 2020
Makefile

CherokeeLanguage / CherokeeLemmatizer

Utility to guess some affix splits in Cherokee texts. Developed for use with the Moses Machine Translation software.

java neural-machine-translation statistical-machine-translation corpus-tools cherokee-language moses-machine-translation

Updated Aug 31, 2020
Java

TienZhao / suoyan.pro

Online parallel text alignment tool.

corpus parallel-corpus corpus-tools text-alignment

Updated Feb 17, 2021
TypeScript

Xenios91 / BinCorp-Generator

Analyzes binary executables and generates a test corpus for defined instruction paths, for each discovered function, or to reach every basic block detected in the non-library/shared-object parts of the binary's text section.

binary fuzzing binary-analysis corpus-tools

Updated May 17, 2024
Python

gederajeg / corplingr

Tidy concordances, collocates, and wordlist

corpus-linguistics corpus-data indonesian-language indonesian corpus-tools corpus-processing leipzig-corpora-collection leipzig-corpus-files indonesian-linguistics usage-based-linguistics

Updated Nov 12, 2021
R

LeviMatheus / tcc-readability-score-level

Repository providing pre-processed Wikipedia and Simple Wikipedia datasets, along with Python scripts for preprocessing and dataset generation.

python database wikipedia python3 weka readability corpus-data corpus-tools ingles wikipedia-corpus corpus-processing portuguese-brazilian base-de-dados legibilidade

Updated Jan 19, 2023
