ud-pos-tagger

Hidden Markov Model (HMM)-based part-of-speech (POS) tagging for 60+ languages using Universal Dependencies (UD) data

https://medium.com/@prannerta100/universal-dependencies-a-hidden-markov-quest-drem-yol-lok-2ca930ffc94f

Introduction

In this notebook (HMM_UD.ipynb), we will use the Pomegranate library to build a simple Hidden Markov Model for part-of-speech tagging.
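This is a library-independent illustration of what such a tagger computes, written as a minimal plain-Python sketch. It is not the notebook's code and does not use the Pomegranate API. The function names (`train_hmm`, `viterbi`) and the toy sentences are hypothetical. The sketch estimates start, transition, and emission probabilities by relative frequency from tagged sentences (supervised maximum-likelihood training) and decodes with the Viterbi algorithm in log space. No smoothing is applied, so a word never seen with a tag gets probability zero (log-probability -inf) for that tag.

```python
import math
from collections import Counter, defaultdict


def _log(p):
    # math.log(0) raises ValueError, so map zero probability to -inf
    return math.log(p) if p > 0 else float("-inf")


def _normalize(counter):
    # Turn raw counts into relative frequencies
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def train_hmm(tagged_sents):
    """MLE estimates for a bigram HMM from lists of (word, tag) pairs."""
    start_counts = Counter()
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for sent in tagged_sents:
        if not sent:
            continue
        tags = [tag for _, tag in sent]
        start_counts[tags[0]] += 1
        for prev, cur in zip(tags, tags[1:]):
            trans_counts[prev][cur] += 1
        for word, tag in sent:
            emit_counts[tag][word] += 1
    start = _normalize(start_counts)
    trans = {tag: _normalize(c) for tag, c in trans_counts.items()}
    emit = {tag: _normalize(c) for tag, c in emit_counts.items()}
    return start, trans, emit


def viterbi(words, start, trans, emit):
    """Most likely tag sequence for `words` under the trained HMM."""
    if not words:
        return []
    tags = list(emit)
    # best[i][t]: log-prob of the best path that ends in tag t at position i
    best = [{t: _log(start.get(t, 0)) + _log(emit[t].get(words[0], 0))
             for t in tags}]
    back = [{}]  # back[i][t]: predecessor tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + _log(trans.get(p, {}).get(t, 0)))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + _log(emit[t].get(words[i], 0))
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]
print(viterbi(["a", "dog", "sleeps"], *train_hmm(train)))  # ['DET', 'NOUN', 'VERB']
```

Working in log space avoids the numerical underflow that would otherwise occur when many small probabilities are multiplied over a long sentence.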

The goal here is breadth rather than depth: we want to cover as many languages in the UD treebanks as possible, so we did not implement additional features such as:

Laplace smoothing (see Wikipedia)
Backoff smoothing (see Speech and Language Processing, Ch. 4, 9, 10)
Extension to trigrams (see the trigram paper)
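The first item can be pinned down with two formulas. With relative-frequency (maximum-likelihood) estimation, which a tagger without smoothing presumably uses, transition and emission probabilities are plain count ratios, where C(·) counts occurrences in the training data. Laplace (add-one) smoothing adds 1 to every emission count and |V| (the vocabulary size) to the denominator. As a result, a word never seen with a tag gets a small nonzero probability instead of zero:

```latex
\begin{aligned}
P(t_i \mid t_{i-1}) &= \frac{C(t_{i-1}, t_i)}{\sum_{t} C(t_{i-1}, t)}, \qquad
P(w_i \mid t_i) = \frac{C(t_i, w_i)}{C(t_i)}, \\
P_{\text{Laplace}}(w_i \mid t_i) &= \frac{C(t_i, w_i) + 1}{C(t_i) + |V|}
\end{aligned}
```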

In this notebook, we carry out part-of-speech (POS) tagging for various languages using a Hidden Markov Model (HMM).

The Pomegranate library is used here.

Our goal is breadth rather than depth, so the following features are not included:

Laplace smoothing (Wikipedia)
Backoff smoothing (book, chapters 4, 9, 10)
Trigrams (research paper)

Preprocessing and imports

We scan the entire UD folder to collect the names of all language subdirectories, and drop any dataset that lacks a training set. A missing dev set is tolerated: because our HMM is not trained iteratively, there is nothing to tune on held-out data, so any dev set is simply merged into the training set.
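The scanning step can be sketched as follows, assuming the standard UD release layout (for example `UD_English-EWT/en_ewt-ud-train.conllu` plus matching `-dev` and `-test` files). This is a hypothetical stand-in, not the actual code in data_prep.py. The function name `find_treebanks` and the returned dictionary shape are illustrative. The train pattern also matches treebanks whose training data is split across several `-ud-train-*.conllu` files.

```python
from pathlib import Path


def find_treebanks(ud_root):
    """Map each UD subdirectory name to its train (+dev) and test files.

    Subdirectories without a training file are dropped; a missing dev
    file is allowed, and any dev file is grouped with the training data.
    """
    treebanks = {}
    for subdir in sorted(Path(ud_root).iterdir()):
        if not subdir.is_dir():
            continue
        train = sorted(subdir.glob("*-ud-train*.conllu"))
        if not train:
            continue  # no train set: prune this dataset
        dev = sorted(subdir.glob("*-ud-dev.conllu"))
        test = sorted(subdir.glob("*-ud-test.conllu"))
        treebanks[subdir.name] = {
            "train": train + dev,  # dev is fused into the training data
            "test": test,
        }
    return treebanks
```

Keeping file paths (rather than parsed sentences) in the result leaves it to a later step to read the files.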

We need the following Python packages:

Pomegranate
NumPy
collections (part of the Python standard library; no installation needed)
pyconll

In addition, helper functions are found in data_prep.py and hmm_utils.py. Make sure you have these files in the same directory as this notebook!
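pyconll is presumably what reads the treebanks' .conllu files. This dependency-free sketch shows what that step extracts; it is not pyconll's API, and the name `read_conllu` is hypothetical. It turns CoNLL-U text into lists of (FORM, UPOS) pairs, the input an HMM tagger needs. It follows the CoNLL-U conventions: `#` comment lines, blank lines between sentences, ten tab-separated columns. It also skips multiword-token ranges (IDs like `2-3`) and empty nodes (IDs like `8.1`), which carry no UPOS tag of their own.

```python
def read_conllu(text):
    """Split CoNLL-U text into sentences of (FORM, UPOS) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node: no own UPOS
        current.append((cols[1], cols[3]))  # FORM is column 2, UPOS column 4
    if current:
        sentences.append(current)  # the file may not end with a blank line
    return sentences
```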
