Code for the ACL 2020 Paper on Schwa Deletion in Hindi and Punjabi

aryamanarora/schwa-deletion

schwa-deletion

Machine learning models for schwa deletion in Hindi and Punjabi.

Pretrained models, which achieve state-of-the-art performance, are included in the models subfolder of each language's directory. They were built with scikit-learn's MLPClassifier and LogisticRegression and with XGBoost's XGBClassifier.

The results of this research are presented in the paper below:

"Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi", Aryaman Arora, Luke Gessler, and Nathan Schneider (2020). In Proceedings of ACL. Preprint: https://arxiv.org/abs/2004.10353

Usage

Ensure that you are using a recent version of Python 3.
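If it helps, the interpreter version can be checked from Python before installing anything. This is a minimal sketch: the (3, 6) floor and the names MIN_VERSION and python_is_recent are illustrative assumptions, since this README does not state a minimum version.

```python
import sys

# Assumption: (3, 6) is only an illustrative floor. The README asks for a
# recent Python 3 but does not name a specific minimum version.
MIN_VERSION = (3, 6)


def python_is_recent(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Example use:
#   if not python_is_recent():
#       raise SystemExit("Please upgrade to a newer Python 3 release.")
```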

Clone the repository and install the requirements:

git clone https://github.com/aryamanarora/schwa-deletion.git
cd schwa-deletion
pip install -r requirements.txt

Test the pretrained Hindi XGBoost model:

cd hindi
python test.py

See test.py for an example of how to use main.py as a module.
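test.py remains the authoritative reference for the functions main.py exposes. As a rough sketch, a script that lives outside the language directory could load main.py by file path and list its public names before calling anything. The helpers load_module_from_path and public_names are hypothetical and not part of this repository, and no function names inside main.py are assumed here.

```python
import importlib.util
from pathlib import Path


def load_module_from_path(name, path):
    """Load a Python source file as a module object (hypothetical helper)."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code, exactly as a normal import would.
    spec.loader.exec_module(module)
    return module


def public_names(module):
    """Return the sorted names in a module that do not start with an underscore."""
    return sorted(n for n in dir(module) if not n.startswith("_"))


# Example use from the repository root (the path is an assumption):
#   main = load_module_from_path("schwa_main", "hindi/main.py")
#   print(public_names(main))
```

If main.py opens model files through relative paths, it may need to be run from inside the language directory, as test.py is above; that depends on how the script is written.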