This is a backup repository for a Python project focused on testing ways to improve morphological segmentation in low-resource languages that have complicated, non-concatenative morphological patterns.
The code is admittedly obtuse, and I'm not currently working on it. If you are interested in running it, preprocessing_test.py and evaluation_test.py contain main functions that run over the pre-tokenized Tagalog data in the data/ directory.
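As a rough illustration of the kind of non-concatenative pattern this project targets, here is a toy sketch. It is not the Infixer method from the publication and does not use any code from this repository. It assumes only the two common Tagalog verbal infixes -um- and -in-, which are inserted after the first consonant of a consonant-initial root (sulat "write" becomes sumulat, sinulat). The function name split_infix is hypothetical.

```python
from typing import Optional, Tuple

# Toy illustration only: NOT the Infixer method from the paper.
# Assumption: only the infixes -um- and -in-, placed after the first
# consonant of a consonant-initial root (e.g. sulat -> sumulat, sinulat).
VOWELS = set("aeiou")
INFIXES = ("um", "in")


def split_infix(word: str) -> Optional[Tuple[str, str]]:
    """Return (infix, root) if word looks like C + infix + rest, else None."""
    word = word.lower()
    # Vowel-initial roots take um- as a prefix instead, so skip them here.
    if len(word) < 4 or word[0] in VOWELS:
        return None
    for infix in INFIXES:
        if word[1:1 + len(infix)] == infix:
            # Remove the infix and rejoin the first consonant with the rest.
            root = word[0] + word[1 + len(infix):]
            return infix, root
    return None


if __name__ == "__main__":
    for w in ["sumulat", "sinulat", "bumili", "sulat", "pinto"]:
        print(w, split_infix(w))
```

A purely string-based rule like this over-segments: it splits pinto ("door") into ('in', 'pto') even though pinto contains no infix, which is part of why segmenting infixed forms well is harder than stripping prefixes and suffixes.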
If you make use of this project in an academic or commercial environment, please cite the associated publication:
@article{butler2016infixer,
  title={Infixer: A Method for Segmenting Non-Concatenative Morphology in Tagalog},
  author={Butler, Steven R},
  year={2016}
}
Please contact me if you have any questions and I'll do my best to respond.