GitHub - rangwani-harsh/char-cnn-char-rnn-sentiment-analysis

Character Level Models For Sentiment Analysis

The char CNN is Yoon Kim's CNN for sentence classification, applied at the character level instead of the word level. The char RNN is a GRU-based model.
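To make "Kim's CNN at character level" concrete, here is a minimal plain-Python sketch of the feature extractor. Each character is mapped to an id and an embedding vector. Filters of several widths slide over the sequence, ReLU is applied, and each filter's outputs are max-pooled over time. The pooled values are then concatenated. The alphabet (string.printable), maximum length of 140, embedding size, filter widths (3, 4, 5) and random weights are all assumptions for illustration, not the repository's settings. A real model learns these weights in PyTorch and adds a fully connected softmax layer on top of the features.

```python
import random
import string

# Assumed alphabet and maximum length; the repository may use other values.
ALPHABET = string.printable
PAD, UNK = 0, 1  # reserved ids: padding and out-of-alphabet characters
CHAR_TO_ID = {c: i + 2 for i, c in enumerate(ALPHABET)}
MAX_LEN = 140


def encode(text, max_len=MAX_LEN):
    """Map raw characters to ids (no preprocessing), then truncate or pad."""
    ids = [CHAR_TO_ID.get(c, UNK) for c in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def conv_relu_maxpool(embedded, filt, bias):
    """Slide one filter over time; return max_t ReLU(w . x[t:t+k] + b)."""
    width = len(filt)
    best = 0.0  # starting at 0.0 applies the ReLU: max(0, max_t s_t)
    for t in range(len(embedded) - width + 1):
        s = bias
        for j in range(width):
            s += sum(w * x for w, x in zip(filt[j], embedded[t + j]))
        best = max(best, s)
    return best


def kim_features(text, embed, filters):
    """Concatenate one max-pooled value per filter, as in Kim's CNN."""
    embedded = [embed[i] for i in encode(text)]
    return [conv_relu_maxpool(embedded, filt, bias) for filt, bias in filters]


# Toy, randomly initialised parameters; all sizes are hypothetical.
rng = random.Random(0)
EMB_DIM, WIDTHS, FILTERS_PER_WIDTH = 8, (3, 4, 5), 4
embed = [[rng.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
         for _ in range(len(ALPHABET) + 2)]
embed[PAD] = [0.0] * EMB_DIM  # padding positions add nothing to the dot products
filters = [
    ([[rng.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for _ in range(w)], 0.0)
    for w in WIDTHS
    for _ in range(FILTERS_PER_WIDTH)
]

features = kim_features("thanks for the quick reply!", embed, filters)
print(len(features))  # 12 = 3 widths x 4 filters per width
```

The char RNN consumes the same kind of id sequence, but it feeds the embeddings through a GRU instead of convolution and pooling.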

Dataset

The Twitter US Sentiment Analysis Dataset, split into three separate files of negative, positive and neutral tweets.
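A minimal sketch of turning the three per-class files into labelled examples follows, using the label ids listed with the results (0 = negative, 1 = neutral, 2 = positive). The file names negative.txt, neutral.txt and positive.txt are hypothetical, and so is the one-tweet-per-line layout; the actual files under data/ may be named or formatted differently.

```python
import tempfile
from pathlib import Path

# Label ids as listed in this README: 0 = negative, 1 = neutral, 2 = positive.
LABELS = {"negative": 0, "neutral": 1, "positive": 2}


def load_split_files(data_dir):
    """Return (text, label) pairs from one file per class, one tweet per line.

    The file names <label>.txt are hypothetical; adjust them to the files in data/.
    """
    examples = []
    for name, label in LABELS.items():
        path = Path(data_dir) / f"{name}.txt"
        with path.open(encoding="utf-8") as f:
            for line in f:
                text = line.rstrip("\n")
                if text:  # skip empty lines; the tweet text itself is untouched
                    examples.append((text, label))
    return examples


# Usage with throwaway files containing made-up tweets.
with tempfile.TemporaryDirectory() as d:
    Path(d, "negative.txt").write_text("worst service ever\nstill waiting\n", encoding="utf-8")
    Path(d, "neutral.txt").write_text("what time do you open?\n", encoding="utf-8")
    Path(d, "positive.txt").write_text("loved it\n\n", encoding="utf-8")
    examples = load_split_files(d)

print(examples)
```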

Implementation Details:

No preprocessing.
Two models: a char CNN and a char RNN.
Evaluation metric: macro F1.
A checkpoint is saved whenever validation_macro_f1 > best_macro_f1.
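The metric and the checkpoint rule above can be sketched in plain Python. Macro F1 is the unweighted mean of the per-class F1 scores, so the smaller neutral and positive classes count as much as the negative one. The BestCheckpoint helper and its save_fn callback are hypothetical names. In a PyTorch training loop, save_fn could wrap torch.save, but how the repository actually saves its snapshots is not shown here.

```python
def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # 0.0 if class never seen
    return sum(scores) / num_classes


class BestCheckpoint:
    """Calls save_fn only when the validation macro F1 beats the best so far."""

    def __init__(self, save_fn):
        self.best_macro_f1 = float("-inf")
        self.save_fn = save_fn

    def update(self, validation_macro_f1, state):
        if validation_macro_f1 > self.best_macro_f1:
            self.best_macro_f1 = validation_macro_f1
            self.save_fn(state)
            return True
        return False


print(macro_f1([0, 0, 1, 2, 2], [0, 1, 1, 2, 0]))  # 11/18, about 0.611

saved = []
ckpt = BestCheckpoint(saved.append)  # stand-in for a real save call
for epoch, f1 in enumerate([0.60, 0.68, 0.66, 0.73]):  # made-up validation scores
    ckpt.update(f1, state=epoch)
print(saved)  # [0, 1, 3]
```

Using a strict ">" means a later epoch that only ties the best score does not overwrite the saved checkpoint.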

Requirements

Install the requirements with:

pip install -r requirements.txt

To evaluate the validation accuracy:

python main.py --test --snapshot saved-models/best-cnn.pt
python main.py --test --snapshot saved-models/best-rnn.pt --rnn

To predict labels for a file of sentences:

python predict.py --input input_file --output output_file

Write the instances to be classified in the input file, one per line.
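A sketch of the line-by-line contract of the prediction step follows: one input line in, one label out, in the same order. The classify callable is a hypothetical stand-in for the trained char CNN or char RNN. Writing label names rather than numeric ids is an assumption, since predict.py may write ids instead.

```python
import tempfile
from pathlib import Path

# Writing names instead of ids is an assumption about the output format.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}


def predict_file(input_path, output_path, classify):
    """Classify every input line and write one label per line, in order.

    classify is a hypothetical callable (text -> label id) standing in for
    the trained char CNN / char RNN.
    """
    lines = Path(input_path).read_text(encoding="utf-8").splitlines()
    labels = [LABEL_NAMES[classify(line)] for line in lines]
    Path(output_path).write_text("".join(f"{lab}\n" for lab in labels), encoding="utf-8")
    return labels


def toy_classify(text):
    """Keyword rule used only to exercise the file I/O; not a real model."""
    return 2 if "love" in text else 0


with tempfile.TemporaryDirectory() as d:
    src, dst = Path(d, "input.txt"), Path(d, "output.txt")
    src.write_text("I love this\nnever again\n", encoding="utf-8")
    predict_file(src, dst, toy_classify)
    written = dst.read_text(encoding="utf-8")

print(written, end="")  # "positive" and "negative", one per line
```

Keeping exactly one output line per input line, including for empty input lines, means the output file can be aligned with the input file by line number.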

Suggestions for improving the model

Hyperparameter tuning
Preprocessing
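Apply the char CNN and char LSTM at the token level (one character sequence per word) instead of to the whole tweet as a single sequence.
Explore a class_weight parameter so that the classifier works equally well on all classes; in the validation set, neutral (306) and positive (247) tweets are much less frequent than negative (911) ones.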
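One common way to choose class weights is the "balanced" heuristic used by scikit-learn, w_c = N / (K * n_c), where N is the number of examples, K the number of classes and n_c the count of class c. The sketch below applies it to the validation supports from the reports below, purely for illustration. In practice the counts should come from the training labels. In PyTorch, the resulting list can be passed as a tensor to the weight argument of torch.nn.CrossEntropyLoss.

```python
def balanced_class_weights(counts):
    """w_c = N / (K * n_c): a class half as frequent gets twice the weight."""
    total = sum(counts)
    return [total / (len(counts) * n) for n in counts]


# Validation supports from the reports (0 = negative, 1 = neutral, 2 = positive).
weights = balanced_class_weights([911, 306, 247])
print([round(w, 4) for w in weights])  # [0.5357, 1.5948, 1.9757]
```

Each class's weight times its count equals N / K. After weighting, all three classes therefore contribute the same total weight to the loss.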

For the char CNN, the best validation macro F1 is about 73%.
              precision    recall  f1-score   support

           0     0.8567    0.8858    0.8710       911
           1     0.5943    0.6176    0.6058       306
           2     0.7990    0.6599    0.7228       247

   micro avg     0.7917    0.7917    0.7917      1464
   macro avg     0.7500    0.7211    0.7332      1464
weighted avg     0.7921    0.7917    0.7906      1464
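The summary rows can be recomputed from the per-class rows, which shows what each average means. "macro avg" is the unweighted mean over classes, and this is the metric used for checkpointing. "weighted avg" weights each class by its support. For single-label classification, "micro avg" equals overall accuracy. The numbers below are copied from the char CNN report.

```python
# recall, f1-score and support per class, copied from the char CNN report.
rows = {
    0: (0.8858, 0.8710, 911),  # negative
    1: (0.6176, 0.6058, 306),  # neutral
    2: (0.6599, 0.7228, 247),  # positive
}

total = sum(n for _, _, n in rows.values())
macro_f1 = sum(f1 for _, f1, _ in rows.values()) / len(rows)
weighted_f1 = sum(f1 * n for _, f1, n in rows.values()) / total
# Correct predictions per class = recall * support; their sum / total is accuracy.
correct = sum(round(r * n) for r, _, n in rows.values())

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.7332 0.7906
print(correct, round(correct / total, 4))          # 1159 0.7917
```

The gap between the weighted F1 (0.7906) and the macro F1 (0.7332) comes from the large negative class scoring much higher than the neutral class.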

For the char RNN, the best validation macro F1 is about 70%.
              precision    recall  f1-score   support

           0     0.8448    0.8606    0.8526       911
           1     0.5569    0.5915    0.5737       306
           2     0.7251    0.6194    0.6681       247

   micro avg     0.7637    0.7637    0.7637      1464
   macro avg     0.7090    0.6905    0.6982      1464
weighted avg     0.7645    0.7637    0.7632      1464

Class labels: 0 = negative, 1 = neutral, 2 = positive.

Training was done on a Titan Xp GPU. Because the training and validation sets are determined at run time, I have fixed the random seeds. This reproduces the results on my system, but I haven't used deterministic CUDA, so runs on other machines may not match exactly.
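A minimal sketch of a run-time split that stays the same across runs is shown below. It shuffles a copy of the data with a private random.Random seeded generator, then slices off a validation part. The 10% validation fraction and the seed of 42 are assumptions, not the repository's values. On the GPU side, PyTorch's usual knobs are torch.manual_seed, torch.cuda.manual_seed_all, torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False.

```python
import random


def seeded_split(examples, val_fraction=0.1, seed=42):
    """Shuffle a copy with a private RNG, then slice off the validation part."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # same seed -> same order on each run
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


data = list(range(100))  # stand-in for (text, label) pairs
train_a, val_a = seeded_split(data)
train_b, val_b = seeded_split(data)
print(val_a == val_b, len(train_a), len(val_a))  # True 90 10
```

A dedicated random.Random instance keeps the split independent of any other code that draws from the global random module. The order is only guaranteed to repeat with the same Python version and the same input order.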

The boilerplate code was taken from the repository https://github.com/srviest/char-cnn-text-classification-pytorch.

If you need assistance, feel free to email me at harsh.rangwani.cse15@iitbhu.ac.in

