Multi-Task Deep Morph Analyzer

A multi-task CNN-RNN model that uses task-optimized phonetic features to predict the lemma, POS category, gender, number, person, case, and tense-aspect-mood (TAM) of Hindi words.

Framework

Getting started

Clone the repository

git clone git@github.com:Saurav0074/morph_analyzer.git
cd morph_analyzer

Provide the arguments

The file main.py takes the following command-line arguments:

Argument	Values	Required	Description
lang	hindi, urdu	Yes	Language to process.
mode	train, test, predict	Yes	Train, test, or predict (predict requires no gold labels).
phonetic	True/1/yes/y/t or False/0/no/n/f	No (default=`False`)	Whether to use MOO-driven phonological features.
freezing	True/1/yes/y/t or False/0/no/n/f	No (default=`False`)	Whether to use progressive freezing during training (see FreezeOut).

The train and test modes operate on the standard train-test split specified by the HDTB and UDTB datasets (see the datasets README), while predict uses text provided manually in src/[lang]_predict_data/.
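The boolean spellings accepted by phonetic and freezing suggest a permissive string-to-boolean converter. The following is a minimal sketch of how such arguments could be parsed with argparse. It is not taken from main.py, and the names str2bool and build_parser are hypothetical.

```python
import argparse


def str2bool(value):
    """Map the spellings listed in the argument table to a bool."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y", "t"}:
        return True
    if text in {"false", "0", "no", "n", "f"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the documented arguments (an assumption)."""
    parser = argparse.ArgumentParser(description="Morph analyzer arguments (sketch)")
    parser.add_argument("--lang", choices=["hindi", "urdu"], required=True)
    parser.add_argument("--mode", choices=["train", "test", "predict"], required=True)
    parser.add_argument("--phonetic", type=str2bool, default=False)
    parser.add_argument("--freezing", type=str2bool, default=False)
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--lang", "urdu", "--mode", "train", "--phonetic", "true", "--freezing", "n"]
)
print(args.lang, args.mode, args.phonetic, args.freezing)
```

Routing both flags through one converter keeps the accepted spellings in a single place, and an unrecognized value fails at parse time rather than being silently treated as truthy.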

Sample run commands:

Training:

python main.py --lang urdu --mode train --phonetic true --freezing true  # train

Testing:

python main.py --lang urdu --mode test --phonetic true --freezing true  # test

Predicting:

python main.py --lang urdu --mode predict --phonetic true --freezing true  # predict

For prediction, the plain text should be provided within src/[lang]_predict_data/test_data.txt.
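As a convenience, the input file can also be created programmatically. This sketch only assembles the documented path and writes UTF-8 text to it. The helper name write_predict_input is hypothetical, and the one-sentence-per-line layout is an assumption, since the expected layout of test_data.txt is not specified here.

```python
from pathlib import Path
import tempfile


def write_predict_input(repo_root, lang, sentences):
    """Write plain text to src/[lang]_predict_data/test_data.txt (hypothetical helper)."""
    target = Path(repo_root) / "src" / f"{lang}_predict_data" / "test_data.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: one sentence per line; adjust if the loader expects otherwise.
    target.write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return target


# Demonstration in a temporary directory so a real checkout is left untouched.
demo_root = tempfile.mkdtemp()
path = write_predict_input(demo_root, "hindi", ["यह एक वाक्य है ।"])
print(path.relative_to(demo_root))
```

In practice, repo_root would be the cloned morph_analyzer directory and lang one of hindi or urdu.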

Outputs

For the test mode:

The predicted roots and features, along with their gold-labelled counterparts, are written to separate files in output/[lang]/: roots.txt, feature_0.txt, ..., feature_6.txt.
Micro-averaged precision-recall graphs are stored in graph_outputs/[lang]/.

For the predict mode, all the predictions (i.e., roots + features) are written to: output/[lang]/predictions.txt.
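To check that a run produced everything described above, the expected paths can be enumerated. This is a small sketch based only on the file names listed in this section. The function expected_outputs is hypothetical, and nothing is assumed about the contents of the files.

```python
from pathlib import Path


def expected_outputs(lang, mode, n_features=7):
    """List the documented output paths; n_features=7 covers feature_0..feature_6."""
    out_dir = Path("output") / lang
    if mode == "test":
        names = ["roots.txt"] + [f"feature_{i}.txt" for i in range(n_features)]
    elif mode == "predict":
        names = ["predictions.txt"]
    else:
        raise ValueError("only the test and predict modes are documented as writing here")
    return [out_dir / name for name in names]


# Report which of the documented files exist relative to the current directory.
for p in expected_outputs("urdu", "test"):
    print(p.as_posix(), "exists" if p.exists() else "missing")
```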

Graph outputs

Micro-averaged precision-recall curves for each class, arranged by increasing F1 score.
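For readers unfamiliar with micro-averaging: true positives, false positives, and false negatives are pooled across all classes before precision and recall are computed, so frequent classes weigh more than rare ones. The sketch below illustrates this at a single score threshold in plain Python. It is not the repository's plotting code, and the function name and toy data are made up for illustration.

```python
def micro_precision_recall(y_true, y_score, threshold):
    """Pool TP/FP/FN over every (sample, class) cell, then compute P and R.

    y_true:  rows of 0/1 indicators, one column per class.
    y_score: rows of predicted scores with the same shape.
    """
    tp = fp = fn = 0
    for true_row, score_row in zip(y_true, y_score):
        for truth, score in zip(true_row, score_row):
            predicted = score >= threshold
            if predicted and truth:
                tp += 1
            elif predicted and not truth:
                fp += 1
            elif truth:
                fn += 1
    # Convention chosen here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: 3 samples, 3 classes (illustrative only).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_score = [[0.9, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
for t in (0.25, 0.5):
    print(t, micro_precision_recall(y_true, y_score, t))
```

Evaluating the function over a sweep of thresholds yields the (recall, precision) pairs that make up one micro-averaged curve.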

Citation

If this repo was helpful in your research, consider citing our work:

@article{jha2018multi,
  title={Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction},
  author={Jha, Saurav and Sudhakar, Akhilesh and Singh, Anil Kumar},
  journal={arXiv preprint arXiv:1811.08619},
  year={2018}
}
