LSTM_POS_Tagger

A simple POS tagger built with a Bidirectional LSTM in Keras and trained on the Brown Corpus.

Reference paper: Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network

See DetailedDescription.pdf for a detailed description of the whole project.

Video Explanation: A video explaining the whole project can be found here

TL;DR:

The code does the following:

Extracts POS tagging training data from the Brown corpus (extract_data.py)
Converts a text file of GloVe vectors into a pickle file (make_glove_pickle.py)
Trains a Bidirectional LSTM on the vectors and data (make_model.py)
Predicts POS tags for new sentences fed in through model_evaluation.py
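
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in extract_data.py or make_glove_pickle.py. It assumes the raw Brown corpus format, in which each token is written as word/tag, and the GloVe text format, in which each line is a word followed by its vector components. The function names, the lowercasing, and the `save_pickle` helper are assumptions made for the example.

```python
import pickle


def parse_brown_line(line):
    """Split one line of raw Brown text ("word/tag word/tag ...") into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST slash so that words containing "/" stay intact.
        word, _, tag = token.rpartition("/")
        if not word or not tag:
            continue  # skip tokens that do not look like word/tag
        # Lowercasing is an assumption, chosen to match the lowercase sample output.
        pairs.append((word.lower(), tag.lower()))
    return pairs


def parse_glove_lines(lines):
    """Map each word to its vector (a list of floats) from GloVe-style text lines."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def save_pickle(obj, path):
    """Write obj to path with pickle (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


sample_pairs = parse_brown_line("The/at jury/nn said/vbd ./.")
sample_vectors = parse_glove_lines(["the 0.1 -0.2 0.3", "jury 0.4 0.5 -0.6"])
```

Loading a pickled dictionary is much faster than re-parsing a large text file on every run, which is presumably why the vectors are converted once up front.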

Uses Keras with the TensorFlow backend.

The GloVe vectors are not included. Download glove.6B.zip, unzip it, and copy the .txt files into the current directory. The scripts handle the rest.

Setup

Use environment.yml to set up the environment with Anaconda.

Sample prediction output (model trained for 2 epochs):

(LSTM_POS_Tagger) D:\Projects\LSTM_POS_Tagger>python model_evaluation.py
Using TensorFlow backend.

The sentence is  ['i', 'want', 'to', 'dance', 'with', 'a', 'girl']
The tokenized sentence is  [[46187  7416  3956 31382 30171 28645 35332]]
The padded tokenized sentence is  [[    0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0 46187  7416  3956
  31382 30171 28645 35332]]
['i', 'want', 'to', 'dance', 'with', 'a', 'girl']
ppss
vb-hl
in-hl
vb
in
at
nn
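
The padded sentence in the output above is consistent with left-padding: zeros are placed in front of the token ids until the sequence reaches a fixed length of 100. This is also the default behaviour of Keras `pad_sequences` (`padding='pre'`, `truncating='pre'`). The sketch below reproduces the idea in plain Python. The function name `pad_pre` is hypothetical, and it assumes that id 0 is reserved for padding and that `max_len` is at least 1.

```python
def pad_pre(token_ids, max_len=100, pad_value=0):
    """Left-pad (or left-truncate) a list of token ids to exactly max_len items."""
    kept = token_ids[-max_len:]  # keep only the last max_len ids if the input is too long
    return [pad_value] * (max_len - len(kept)) + kept


sentence_ids = [46187, 7416, 3956, 31382, 30171, 28645, 35332]
padded = pad_pre(sentence_ids)
print(len(padded))                  # 100
print(padded[-7:] == sentence_ids)  # True
```

Padding every sentence to the same length lets a whole batch be stored as one rectangular array, which matches the `X_train shape (36634, 100)` line in the training log.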

Example training output

(LSTM_POS_Tagger) D:\Projects\LSTM_POS_Tagger>python make_model.py
Using TensorFlow backend.
TOTAL TAGS 471
TOTAL WORDS 49511
We have 36634 TRAINING samples
We have 9159 VALIDATION samples
We have 11449 TEST samples
Total 400000 word vectors.
Embedding matrix shape (49512, 100)
X_train shape (36634, 100)

model fitting - Bidirectional LSTM
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 100)               0
_________________________________________________________________
embedding_1 (Embedding)      (None, 100, 100)          4951200
_________________________________________________________________
bidirectional_1 (Bidirection (None, 100, 128)          84480
_________________________________________________________________
time_distributed_1 (TimeDist (None, 100, 472)          60888
=================================================================
Total params: 5,096,568
Trainable params: 5,096,568
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
1144/1144 [==============================] - 675s 590ms/step - loss: 0.2088 - acc: 0.9579 - val_loss: 0.0578 - val_acc: 0.9851
Epoch 2/2
1144/1144 [==============================] - 701s 613ms/step - loss: 0.0482 - acc: 0.9870 - val_loss: 0.0453 - val_acc: 0.9879
MODEL SAVED in Models/ as model.h5
TEST LOSS 0.043562
TEST ACCURACY: 0.987889
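
The parameter counts in the model summary above can be checked by hand. The sketch below assumes a 64-unit LSTM in each direction (inferred from the 128-wide output of the bidirectional layer) and a Dense layer with 472 outputs inside the TimeDistributed wrapper. Neither detail is stated explicitly in the log, but these assumptions reproduce the reported numbers.

```python
vocab_rows, embed_dim = 49512, 100  # embedding matrix shape from the log
lstm_units = 64                     # assumed: 128 outputs / 2 directions
num_classes = 472                   # last dimension of the final layer

embedding_params = vocab_rows * embed_dim
# One LSTM has 4 gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (embed_dim + lstm_units + 1) * lstm_units
bilstm_params = 2 * lstm_params
# Dense layer applied at every time step: weights plus biases.
dense_params = (2 * lstm_units) * num_classes + num_classes

total_params = embedding_params + bilstm_params + dense_params
print(embedding_params, bilstm_params, dense_params, total_params)
# 4951200 84480 60888 5096568
```

The embedding has one more row than the 49511 words and the output layer one more class than the 471 tags, presumably to account for the padding index 0. Almost all parameters sit in the embedding layer, which is listed as trainable.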
