russelbradley/Malay-NLP-Dataset

 
 

Bahasa Melayu Natural Language Processing (MelayuNLP) Resource

A collection of Bahasa Malaysia (Malay) Natural Language Processing (NLP) software libraries, dictionaries, and corpora. Pull requests are always welcome.

Bahasa Melayu NLP Libraries/Services

Natural Language Toolkit

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| Malaya | Natural-Language-Toolkit for Bahasa Malaysia | iPython | MIT License (MIT) | DevconX |

Natural Language Pipeline

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| polyglot | Polyglot is a natural language pipeline that supports massive multilingual applications such as Transliteration, NER, Sentiment Analysis, Morphological Analysis | Python | GPLv3 | aboSamoor |

Part of Speech Tagging (POS Tagging)

| API | Description | Programming Languages | License | Guide & Link |
| --- | --- | --- | --- | --- |
| Malay NLP | Frequency Based and Max-ent POS Taggers | | | Malay NLP Blog |
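
For readers unfamiliar with the term, a frequency-based tagger assigns each word the tag it most often carries in a tagged training set. The sketch below is a generic illustration of that idea and is not the Malay NLP implementation; the function names, the tag labels, and the tiny hand-made training set are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_frequency_tagger(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    # Unknown words fall back to the most frequent tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Tag each token by dictionary lookup, using the default for unseen words."""
    return [(t, lexicon.get(t.lower(), default_tag)) for t in tokens]


# Toy, hand-made training data (illustrative only, not taken from any corpus).
TRAIN = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("kucing", "NOUN"), ("makan", "VERB"), ("ikan", "NOUN")],
]

lexicon, default_tag = train_frequency_tagger(TRAIN)
print(tag(["dia", "makan", "ikan", "bakar"], lexicon, default_tag))
```

The unseen word "bakar" receives the fallback tag, which is the main weakness a maximum-entropy tagger tries to address by using context features.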

Morphology Analysis

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| hltdi-morphology | Mirror Repository for ParaMorfo, HornMorpho, AntiMorfo, and MorfoMelayu | | | LowResourceLanguages |

Dictionaries / Translation Pairs / Parallel Corpus

| Library | Description | Size | Features | License | Link |
| --- | --- | --- | --- | --- | --- |
| MALINDO_Morph | Morphological dictionary for Malay / Indonesian | | English-Malay, English-Indonesian | CC BY-NC-SA 4.0 | TH english |
| TALPCo | The TUFS Asian Language Parallel Corpus | | Japanese -> Malay | Creative Commons Attribution 4.0 International (CC BY 4.0) license | matbahasa |
| Open Parallel Corpus | OPUS is a growing collection of translated texts from the web. | | Malay <-> Many languages | Modified BSD License | OPUS |
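
Parallel corpora are commonly distributed as two plain-text files aligned line by line, one sentence per line in each language. The sketch below assumes that layout and shows one way to read such a pair into sentence tuples, skipping empty lines and pairs whose word counts differ wildly. The function name `read_parallel`, the `max_ratio` threshold, and the in-memory sample text are hypothetical and stand in for real downloaded files.

```python
import io


def read_parallel(src_handle, tgt_handle, max_ratio=3.0):
    """Pair up line-aligned sentences, dropping empty or badly mismatched pairs."""
    pairs = []
    for src, tgt in zip(src_handle, tgt_handle):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty: no usable pair
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # word counts differ too much: likely misaligned
        pairs.append((src, tgt))
    return pairs


# In-memory stand-ins for two aligned files (illustrative sentences only).
MALAY = io.StringIO("Selamat pagi\n\nTerima kasih\nYa\n")
ENGLISH = io.StringIO(
    "Good morning\nEmpty source line\nThank you\n"
    "Yes , that is absolutely correct indeed\n"
)

pairs = read_parallel(MALAY, ENGLISH)
for ms, en in pairs:
    print(ms, "|||", en)
```

With real files, the two `io.StringIO` objects would be replaced by `open(path, encoding="utf-8")` handles for the Malay and the other-language side.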

Pre-trained Word Vectors

| Pre-trained Model | Description | Size | Dimensions | License | Link |
| --- | --- | --- | --- | --- | --- |
| fastText | Skip-Gram model trained on Wikipedia using fastText | | 300 | CC BY-SA 3.0 | Facebook + Bin & Text + Text Only |
| wordvectors | Pre-trained word vectors of 30+ languages | 173MB | 100 | MIT License | Kyubyong |
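
As a rough guide to using text-format vectors, the sketch below parses the word2vec-style layout used by fastText `.vec` files: a header line with the vocabulary size and dimension, then one word per line followed by its values. It uses only the standard library; `load_vec`, `cosine`, and the three-dimensional sample data are made up for illustration, and other distributions in the table may use a different file format.

```python
import io
import math


def load_vec(handle, limit=None):
    """Parse a text vector file: 'count dim' header, then 'word v1 ... vd' rows."""
    header = handle.readline().split()
    _count, dim = int(header[0]), int(header[1])
    vectors = {}
    for i, line in enumerate(handle):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[word] = [float(v) for v in values]
    return vectors, dim


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny made-up sample in the same layout (real files have 100 or 300 dimensions).
SAMPLE = "3 3\nmakan 1.0 0.0 0.0\nminum 0.8 0.6 0.0\nrumah 0.0 0.0 1.0\n"

vectors, dim = load_vec(io.StringIO(SAMPLE))
print(dim, sorted(vectors))
print(cosine(vectors["makan"], vectors["minum"]))
```

For a real file, pass `open(path, encoding="utf-8")` instead of the `io.StringIO` sample, and consider setting `limit` so that only the most frequent words are loaded into memory.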

Not found? Try this.

Malay is currently a low-resource language with few NLP resources available. Because it closely resembles Bahasa Indonesia, resources built for Bahasa Indonesia may also be useful. A good place to start is: https://github.com/keyreply/Bahasa-Indo-NLP-Dataset
