GitHub - peermohtaram/Urdu-POS-Tagger: Part of Speech Tagger (POS) for Urdu Language with Hidden Markov Model (HMM) using Kneser-Ney Smoothing

This directory contains an implementation of a Hidden Markov Model (HMM) based Part-of-Speech (POS) tagger using Kneser-Ney smoothing. All the code is written in Python. The dataset, containing training, validation, and test data, is in the same directory.
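
As a rough illustration of the smoothing step, the sketch below estimates tag-to-tag transition probabilities with interpolated Kneser-Ney over bigrams. It is not taken from 'kn_pos.py'; the function name `kneser_ney_bigram`, the bigram order, and the discount of 0.75 are assumptions made for the example, and the actual script may differ.

```python
from collections import Counter, defaultdict


def kneser_ney_bigram(tag_sequence, discount=0.75):
    """Return prob(tag, prev), an interpolated Kneser-Ney bigram estimate.

    Hypothetical helper for illustration; the discount value is an assumption.
    """
    if len(tag_sequence) < 2:
        raise ValueError("need at least two tags to count bigrams")

    bigrams = Counter(zip(tag_sequence, tag_sequence[1:]))
    context_totals = Counter()    # how often each tag occurs as a context
    followers = defaultdict(set)  # distinct tags seen after a context
    preceders = defaultdict(set)  # distinct contexts seen before a tag
    for (prev, tag), count in bigrams.items():
        context_totals[prev] += count
        followers[prev].add(tag)
        preceders[tag].add(prev)
    bigram_types = len(bigrams)   # number of distinct bigrams

    def prob(tag, prev):
        # Continuation probability: in how many distinct contexts does tag appear?
        continuation = len(preceders.get(tag, ())) / bigram_types
        total = context_totals[prev]
        if total == 0:
            # Unseen context: fall back to the continuation estimate alone.
            return continuation
        discounted = max(bigrams[(prev, tag)] - discount, 0) / total
        backoff_weight = discount * len(followers.get(prev, ())) / total
        return discounted + backoff_weight * continuation

    return prob
```

Calling `kneser_ney_bigram(tags)` on a list of training tags returns a function, so `prob("NN", "JJ")` gives the smoothed probability of NN following JJ. For a context seen in training, the estimates over all observed tags sum to 1.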

Use 'kn_pos.py' to train the model and tag the test data. It writes a 'tagged_output.txt' file containing the word, tag pairs for the test data in tab-separated form (word tag), one pair per line.

To run the 'kn_pos.py' file, use the following command:
python kn_pos.py path/to/trainfile path/to/testfile

It writes 'tagged_output.txt' to the directory where 'kn_pos.py' is located.

To evaluate the tags generated by the tagger against the correct tags, use the 'evaluation.py' file. Run the following command:
python evaluation.py tagged_output.txt path/to/validationfile

It will print out the accuracy.
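
For orientation, token-level accuracy can be computed along the following lines. This is a minimal sketch, not the contents of 'evaluation.py': the function name `tagging_accuracy` is hypothetical, and it assumes the tagged output and the validation file are aligned line by line, with the tag as the last whitespace-separated field.

```python
def tagging_accuracy(predicted_lines, gold_lines):
    """Fraction of aligned lines whose predicted tag matches the gold tag.

    Assumes both inputs list one "word<TAB>tag" pair per line, in the same order.
    """
    correct = 0
    total = 0
    for predicted, gold in zip(predicted_lines, gold_lines):
        predicted_fields = predicted.split()
        gold_fields = gold.split()
        # Skip blank or malformed lines that lack a word and a tag.
        if len(predicted_fields) < 2 or len(gold_fields) < 2:
            continue
        total += 1
        if predicted_fields[-1] == gold_fields[-1]:
            correct += 1
    return correct / total if total else 0.0
```

Both arguments can be open file objects, for example `tagging_accuracy(open("tagged_output.txt", encoding="utf-8"), open("path/to/validationfile", encoding="utf-8"))`.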

DATA FORMATS:

Training data should be in tab-separated word, tag format, one pair per line:

ٹریور NN
ٹینک NN
مختلف JJ
قسم NN
کی PSP
چڑیوں NN
جیسے PRR
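
A file in this format can be loaded into (word, tag) pairs roughly as follows. The helper name `read_tagged_pairs` is hypothetical and the loader in 'kn_pos.py' may work differently; the sketch splits on the last run of whitespace, so it accepts either a tab or a space between word and tag.

```python
def read_tagged_pairs(lines):
    """Parse "word<TAB>tag" lines into a list of (word, tag) tuples."""
    pairs = []
    for line in lines:
        # Split once from the right so the tag is always the final field.
        fields = line.strip().rsplit(None, 1)
        if len(fields) != 2:
            continue  # skip blank lines and lines without a tag
        word, tag = fields
        pairs.append((word, tag))
    return pairs
```

With a file on disk, `read_tagged_pairs(open("path/to/trainfile", encoding="utf-8"))` would return the pairs in file order; opening with UTF-8 encoding matters because the words are in Urdu script.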

Validation data should be in this format:

ابتدائی JJ
نقصان NN
کے PSP
بعد NN
معین NNP
علی NNP
اور CC
مورگن NNP
نے PSP

Test data should contain one word per line, with no tag:

ابتدائی
نقصان
کے
بعد
معین
علی
اور
