Sentiment Classification of Movie Reviews

Last updated: 2022-May-21

A neural network model that classifies a movie review as positive or negative. The reviews are written in English and obtained from IMDB.

Please contact me at sky@u.nus.edu if you have any questions.

The repository is publicly available at https://github.com/yulonglong/SentimentClassifier

The stable release, including the PDF report, is at https://github.com/yulonglong/SentimentClassifier/releases/tag/v1.4

This repository has since been updated to more recent versions of PyTorch and other libraries. To replicate the results reported in the PDF documentation, use tag v1.4.

TLDR:

Assuming you are on Ubuntu 22.04 with Python 3.10.4, open a terminal and run:
- ./setup.sh (install packages), or ./setup_global.sh (install packages globally; requires sudo access)
- cd data && ./run.sh && cd .. (download and prepare the dataset)
- ./run_train.sh 0 GTX1070 (train the model on GPU 0, a GTX1070)
- ./run_test.sh (test the model on CPU)

Requirements and Environment:

Ubuntu 22.04
Python 3.10.4 (default on Ubuntu 22.04)

Python Library Required (requirements.txt):

torch (PyTorch)
h5py
numpy
scikit-learn
scipy
nltk
pydot
matplotlib

The Python libraries above can be installed via pip3 install -r requirements.txt.
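
After installing, it can help to confirm that every library is importable before starting a long training run. The sketch below is not part of the repository: the helper name find_missing is made up for illustration, and it assumes the usual import names (scikit-learn is imported as sklearn; the others match their pip names).

```python
import importlib.util

# Import names for the packages listed in requirements.txt.
# Assumption: scikit-learn is imported as "sklearn"; the rest match their pip names.
REQUIRED_MODULES = [
    "torch", "h5py", "numpy", "sklearn", "scipy", "nltk", "pydot", "matplotlib",
]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

An empty result only shows that the modules can be located; it does not check that the installed versions match requirements.txt.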

Python libraries setup:

If you are running a UNIX-based machine, you can run the shell script ./setup.sh to install all the required Python libraries, assuming pip is already installed. To install them manually instead, refer to the list of libraries above.

Dataset:

The IMDB Large Movie Review Dataset is obtained from http://ai.stanford.edu/~amaas/data/sentiment/
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
If you are setting up for the first time:
- Download and process the dataset from scratch by running the run.sh script in the data folder
- i.e., execute the following command: cd data && ./run.sh
- The data will be automatically downloaded and preprocessed into training, validation, and test sets
- If you do not have the Java Development Kit (JDK) installed, install it by running this command:
  - sudo apt update && sudo apt install default-jdk
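
For orientation, the raw dataset stores one review per text file under aclImdb/<split>/<label>/, with files named <id>_<rating>.txt (unlabeled reviews carry a rating of 0). The sketch below only illustrates that naming scheme; the helper parse_review_path is hypothetical, and it does not reproduce whatever preprocessing data/run.sh performs.

```python
from pathlib import Path


def parse_review_path(path):
    """Split a review path such as .../train/pos/123_8.txt into (split, label, id, rating)."""
    p = Path(path)
    review_id, rating = p.stem.split("_")
    # The label is the containing folder (pos, neg, or unsup); the split is its parent.
    return p.parent.parent.name, p.parent.name, int(review_id), int(rating)


if __name__ == "__main__":
    print(parse_review_path("data/aclImdb/train/unsup/74_0.txt"))
    # ('train', 'unsup', 74, 0)
```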

To train the model

Execute ./run_train.sh <GPU_Number> <GPU_Name>, e.g., $> ./run_train.sh 0 TITANX
NVIDIA CUDA must be installed to train the model on a GPU.
For more details on the training arguments, refer to the sample run_train.sh shell script.
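
Before launching training, one way to check whether PyTorch can see a CUDA device is sketched below. This is an illustrative helper (cuda_status is a made-up name, not part of the repository) that relies only on torch.cuda.is_available() and torch.cuda.device_count().

```python
def cuda_status():
    """Describe whether PyTorch can see a CUDA device, without failing if torch is absent."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA is available, {torch.cuda.device_count()} device(s) visible"
    return "CUDA is not available to PyTorch"


if __name__ == "__main__":
    print(cuda_status())
```

If the GPU number passed to run_train.sh is an index into the visible devices (an assumption; see the script itself), it should be smaller than the device count reported here.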

Pre-trained word embeddings

To train your own word embeddings from the provided unsupervised movie reviews, execute ./run_word2vec.sh
To download pre-trained word embeddings such as GloVe, execute cd word2vec/vectors && ./run.sh

To test the model

For quick testing, execute ./run_test.sh
There are three arguments for testing:
- -v : vocab path, the path to the vocabulary files saved during training
- -m : model path, the path to the best model saved during training
- -ts : test path, the path to the text file containing the movie review to be evaluated
Sample command: python test.py -v saved_model/vocab_v50000.pkl -m saved_model/best_model_weights.h5 -ts data/aclImdb/train/unsup/74_0.txt
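
To make the three flags concrete, here is a minimal sketch of how such a command line could be parsed with argparse. It is an assumption about the interface, not the repository's actual test script: the destination names (vocab_path, model_path, test_path) and the choice to make every flag required are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the three documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Classify a single movie review.")
    parser.add_argument("-v", dest="vocab_path", required=True,
                        help="path to the vocabulary file saved during training")
    parser.add_argument("-m", dest="model_path", required=True,
                        help="path to the best model saved during training")
    parser.add_argument("-ts", dest="test_path", required=True,
                        help="path to the text file containing the review to evaluate")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.vocab_path, args.model_path, args.test_path)
```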

To run web demo

For quick testing: python3 web_demo.py
For production (using pm2): pm2 start web_demo.py --name sentiment_classifier --interpreter python3
Visit the web UI at 127.0.0.1:5021
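
If the page does not load, a quick way to tell whether anything is listening on that address is a plain TCP connection attempt. The sketch below is a generic check using the standard library; is_port_open is a hypothetical helper and says nothing about the demo's routes or responses.

```python
import socket


def is_port_open(host="127.0.0.1", port=5021, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Web demo at 127.0.0.1:5021 is {state}")
```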
