Kurdish-G2P-dataset

Datasets for evaluating Central Kurdish grapheme-to-phoneme (G2P) conversion systems.

Format

Each line contains a Central Kurdish word in standard Arabic script and its corresponding phoneme string, separated by a tab character. The start of each syllable is marked with a full stop. For example: ئازادی .ʔa.za.dî
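As a minimal sketch of how this format might be read, the Python snippet below splits a line into the word, the phoneme string, and its syllables. The function names `parse_line` and `word_accuracy` and the `predict` callable are hypothetical, introduced here only for illustration. Exact-match word accuracy is one possible metric, not one prescribed by the dataset.

```python
def parse_line(line):
    """Split one dataset line into (word, phoneme string, syllable list)."""
    word, phonemes = line.rstrip("\n").split("\t", 1)
    phonemes = phonemes.strip()
    # Every syllable starts with a full stop, so the first split piece is empty.
    syllables = [s for s in phonemes.split(".") if s]
    return word, phonemes, syllables


def word_accuracy(lines, predict):
    """Fraction of words whose predicted phoneme string matches the gold one.

    `predict` is an assumed callable mapping a word to a phoneme string.
    """
    pairs = [parse_line(line)[:2] for line in lines if line.strip()]
    correct = sum(1 for word, gold in pairs if predict(word) == gold)
    return correct / len(pairs) if pairs else 0.0


sample = ["ئازادی\t.ʔa.za.dî\n"]
print(parse_line(sample[0])[2])  # ['ʔa', 'za', 'dî']
```

Lines read from either dataset file can be passed to `word_accuracy` together with the conversion function of the G2P system under test.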

Datasets

AsoSoft Kurdish Corpus most frequent tokens

Manually converted phoneme strings for the 5,000 most frequent words of the AsoSoft Kurdish Corpus, which is presented in:

Veisi, H., MohammadAmini, M., & Hosseini, H. (2019). “Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus”. Digital Scholarship in the Humanities.

@article{veisi2020toward,
  title={Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus},
  author={Veisi, Hadi and MohammadAmini, Mohammad and Hosseini, Hawre},
  journal={Digital Scholarship in the Humanities},
  volume={35},
  number={1},
  pages={176--193},
  year={2020},
  publisher={Oxford University Press}
}

Wergor dataset

Manually converted phoneme strings for the 5,041 unique words of the document presented at https://github.com/sinaahmadi/wergor:

Ahmadi, S. (2019). “A Rule-Based Kurdish Text Transliteration System”. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(2), 18.

@article{ahmadi2019rule,
  title={A Rule-Based Kurdish Text Transliteration System},
  author={Ahmadi, Sina},
  journal={ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)},
  volume={18},
  number={2},
  pages={18},
  year={2019},
  publisher={ACM}
}
