#low-resource-languages

Here are 105 public repositories matching this topic...

csikasote / BembaSpeech

This is an ASR corpus for the Bemba language. It contains read speech from diverse publicly available Bemba sources: literature books, radio/TV show transcripts, YouTube video transcripts, and online sources. The corpus has 14,438 utterances totaling over 24 hours of speech.

automatic-speech-recognition low-resource-languages bemba

Updated May 23, 2024

nicolay-r / RuSentNE-LLM-Benchmark

This repository highlights the reasoning capabilities of LLMs (✨ Mistral / LLaMA-3 / Phi-3 / Gemma / Flan-T5 / GPT-4o ✨) in Targeted Sentiment Analysis on Russian mass-media texts and their English translations 📊

sentiment-analysis leaderboard prompt openai gemma zero-shot mistral reasoning fine-tuning low-resource-languages transformers-library low-resource-nlp gpt4 llm llms chain-of-thought llama3 gpt4o

Updated May 23, 2024
Python

generalpurposelab / instruct-global

Repo associated with the forthcoming paper 'Instruct-global: aligning language models to follow instructions in low-resource languages'. Instruct-global automates the process of generating instruction datasets in low-resource languages (LRLs).

fine-tuning low-resource-languages llms

Updated May 16, 2024
Python

luciusssss / ZhuangBench

Teaching Large Language Models an Unseen Language on the Fly

low-resource-languages zhuang low-resource-nlp large-language-models llm

Updated May 16, 2024
Python

cisnlp / GlotLID

GlotLID: Language Identification with Support for More Than 2000 Labels (EMNLP 2023).

language-detection multlingual language-detector language-recognition glot lid language-identification language-classification language-identification-toolkit low-resource-languages language-detection-library language-identifier language-detection-lib langid low-resource-nlp

Updated May 12, 2024
Python

cisnlp / GlotWeb

GlotWeb: Web Indexing for Low-Resource Languages (under construction).

multilingual dataset glot low-resource-languages news-dataset awsome-list

Updated May 10, 2024
Python

RichardLitt / low-resource-languages

Resources for conservation, development, and documentation of low-resource (human) languages.

nlp list natural-language-processing awesome natural-language language-learning awesome-list language-resources endangered-languages human-language language-documentation resourced-languages minority-language low-resource-languages lrls

Updated May 9, 2024
TeX

franciellevargas / HausaHate

HausaHate is a benchmark dataset for the Hausa hate speech detection task. It was extracted from West African Facebook pages and comprises 2,000 comments annotated with a binary class (offensive or non-offensive) and hate speech targets (race, gender, or none).

benchmark machine-learning natural-language-processing corpus dataset nlp-machine-learning offensive-language hate-speech low-resource-languages hausa-nlp

Updated May 2, 2024

Andrews2017 / africanlp-public-datasets

A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.

natural-language-processing african-languages datasets low-resource-languages

Updated Apr 26, 2024

kenza-ily / QuantHaLL_NLP

QuantHaLL: Quantifying Hallucination in machine translation for Low-resource Languages

machine-translation embeddings hallucination low-resource-languages llms

Updated Apr 22, 2024
Jupyter Notebook

cisnlp / GlotSparse

GlotSparse: Building Corpora in Under-Resourced Languages

multilingual dataset corpus-linguistics glot low-resource-languages news-dataset awsome-list

Updated Apr 18, 2024

ndamulelonemakh / zabantu-beta

ZaBantu is a fleet of lightweight Masked Language Models for Southern Bantu languages.

nlp zulu tshivenda low-resource-languages roberta sotho xlm-roberta tsonga

Updated Apr 15, 2024
Python

generalpurposelab / ede-data

The Ede Python library automates the generation of instruction fine-tuning datasets in low-resource languages.

fine-tuning low-resource-languages llms

Updated Apr 4, 2024
Python

kashubian-translator / pl-csb-data

This repository contains data and data preparation tools for a Polish-Kashubian translator.

data polish low-resource-languages kashubian

Updated Mar 31, 2024
PLSQL

kashubian-translator / pl-csb-model

This repository contains model training and BLEU calculation tools for a Polish-Kashubian translator.

translator machine-translation polish nlp-machine-learning low-resource-languages kashubian

Updated Mar 31, 2024
Python

csebuetnlp / xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

multilingual machine-learning deep-learning dataset text-summarization abstractive-text-summarization abstractive-summarization text-summarisation low-resource-languages multilinguality summarization-corpora summarization-dataset multilingual-text-summarization text-summarization-dataset text-summarization-model low-resource-summarization low-resource-text-summarizarion multilingual-summarization

Updated Mar 26, 2024
Python

LLM-low-resource-lang / LLM-low-resource-lang.github.io

LLMs for Low Resource Languages in Multilingual, Multimodal and Dialectal Settings

benchmarking zero-shot-learning dialects few-shot-learning low-resource-languages multilingual-models few-shot-classifcation zero-shot-classification llms

Updated Mar 23, 2024
HTML

ofdn / OpenSpeaks-Before-AI

A set of frameworks for creating the AI/ML building blocks for low-resource languages.

ai ml languages low-resource-languages

Updated Mar 21, 2024

cisnlp / GlotStoryBook

Children's storybooks for 180 languages.

multilingual storybook dataset glot low-resource-languages low-resource-nlp

Updated Mar 12, 2024
Jupyter Notebook

BatsResearch / LexC-Gen

Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.

multilingual sentiment-analysis topic-modeling synthetic-data synthetic-dataset-generation low-resource-languages lexicon-based multilingual-nlp llm

Updated May 1, 2024
Python
