roomylee/rcnn-text-classification

Recurrent Convolutional Neural Network for Text Classification

TensorFlow implementation of "Recurrent Convolutional Neural Networks for Text Classification".

[Figure: rcnn]

Data: Movie Review

  • Movie reviews with one sentence per review. Classification involves detecting positive/negative reviews (Pang and Lee, 2005).
  • Download "sentence polarity dataset v1.0" at the Official Download Page.
  • Located in "data/rt-polaritydata/" in my repository.
  • rt-polarity.pos contains 5331 positive snippets.
  • rt-polarity.neg contains 5331 negative snippets.

Implementation of Recurrent Structure

[Figure: recurrent structure]

  • A bidirectional RNN (Bi-RNN) is used to implement the left and right context vectors.
  • Each context vector is created by shifting the output of the Bi-RNN and concatenating a zero state that marks the start of the context.
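
The shift-and-pad step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the repository's TensorFlow code: the function name `shift_contexts`, the tensor layout `(batch, seq_len, hidden)`, and the one-step shift are assumptions made for the example.

```python
import numpy as np

def shift_contexts(output_fw, output_bw):
    """Build left/right context tensors from Bi-RNN outputs.

    Both inputs are assumed to have shape (batch, seq_len, hidden).
    """
    batch, _, hidden_fw = output_fw.shape
    hidden_bw = output_bw.shape[2]
    # Zero states marking the start of each context.
    zero_fw = np.zeros((batch, 1, hidden_fw), dtype=output_fw.dtype)
    zero_bw = np.zeros((batch, 1, hidden_bw), dtype=output_bw.dtype)
    # Left context of word i: forward output at i-1 (zeros for the first word).
    c_left = np.concatenate([zero_fw, output_fw[:, :-1]], axis=1)
    # Right context of word i: backward output at i+1 (zeros for the last word).
    c_right = np.concatenate([output_bw[:, 1:], zero_bw], axis=1)
    return c_left, c_right

# Toy inputs: 2 sentences, 4 words, hidden size 3, embedding size 5.
rng = np.random.default_rng(0)
fw = rng.normal(size=(2, 4, 3))
bw = rng.normal(size=(2, 4, 3))
embedded = rng.normal(size=(2, 4, 5))

c_left, c_right = shift_contexts(fw, bw)
# Word representation: [left context; word embedding; right context].
x = np.concatenate([c_left, embedded, c_right], axis=2)
```

With this layout, each word's representation never includes its own RNN output, only what the network has seen to its left and right.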

Usage

Train

  • Positive data is located in "data/rt-polaritydata/rt-polarity.pos".

  • Negative data is located in "data/rt-polaritydata/rt-polarity.neg".

  • "GoogleNews-vectors-negative300" is used as the pre-trained word2vec model.

  • Display help message:

     $ python train.py --help
  • Train Example:

     $ python train.py --cell_type "lstm" \
     --pos_dir "data/rt-polaritydata/rt-polarity.pos" \
     --neg_dir "data/rt-polaritydata/rt-polarity.neg" \
     --word2vec "GoogleNews-vectors-negative300.bin"

Evaluation

  • The Movie Review dataset has no separate test set.

  • To evaluate, build a test set from the training data or use cross-validation. Cross-validation is not implemented in this project.

  • The example below simply evaluates on the full rt-polarity dataset, i.e. the same data used for training.

  • Evaluation Example:

     $ python eval.py \
     --pos_dir "data/rt-polaritydata/rt-polarity.pos" \
     --neg_dir "data/rt-polaritydata/rt-polarity.neg" \
     --checkpoint_dir "runs/1523902663/checkpoints"
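
Since the dataset ships without a test set, one option is to hold out part of the data before training. The following is a rough sketch using only the standard library. The helper `holdout_split`, its parameters, and the toy sentences are hypothetical and not part of this repository.

```python
import random

def holdout_split(pos, neg, test_fraction=0.1, seed=42):
    """Shuffle labeled sentences and split them into (train, test) lists."""
    # Label positives as 1 and negatives as 0.
    labeled = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    # A seeded generator keeps the split reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(labeled)
    n_test = int(len(labeled) * test_fraction)
    return labeled[n_test:], labeled[:n_test]

# Toy stand-ins for the lines of rt-polarity.pos and rt-polarity.neg.
pos = [f"good {i}" for i in range(50)]
neg = [f"bad {i}" for i in range(50)]

train, test = holdout_split(pos, neg)
print(len(train), len(test))
```

To use such a split with the commands above, the held-out sentences would presumably need to be written to separate positive and negative files and passed through `--pos_dir` and `--neg_dir`.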

Result

  • Comparison between the Recurrent Convolutional Neural Network and a Convolutional Neural Network.
  • dennybritz's cnn-text-classification-tf is used as the CNN model for comparison.
  • The same pre-trained word2vec model is used for both models.

Accuracy for validation set

[Figure: validation accuracy]

Loss for validation set

[Figure: validation loss]

Reference

  • S. Lai et al., "Recurrent Convolutional Neural Networks for Text Classification" (AAAI 2015).
