Final Natural Language Processing project on sentiment analysis. Most of the project is written in Python; the vocabulary generator is written in Java.
Data Cleaning
Data Cleaner - CleaningFile.py
Data Extractor - Extract_Data.py
Vocabulary Generator - GenerateVocab.java
Initial Methods
Naive Bayes - NaiveBayesClassifier.py
VADER - VaderYelp.py
TextBlob - TestBlob.py
Opinion Lexicon - OpinionLexicon.py
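As an illustration of the simplest method in this list, a lexicon-based classifier can be sketched as below. This is a minimal sketch and not the code in OpinionLexicon.py: the word sets `POSITIVE` and `NEGATIVE` are tiny hypothetical stand-ins for a real opinion lexicon, and the function name `lexicon_label` and the tie-breaking rule are assumptions.

```python
import re

# Hypothetical miniature lexicon; a real opinion lexicon has thousands of entries.
POSITIVE = {"good", "great", "delicious", "friendly", "love"}
NEGATIVE = {"bad", "terrible", "rude", "slow", "hate"}


def lexicon_label(text):
    """Label text 'pos' or 'neg' by counting lexicon hits (ties go to 'pos')."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"
```

Because the score only counts word matches, negation ("not good") and words missing from the lexicon are ignored in this sketch.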
Datasets
500 Entries - Clean_sample.txt
1,000 Entries - Development1000.txt
5,000 Entries - Development5000.txt
10,000 Entries - Development10000.txt
Combination Methods
Majority Voting - MajorityVoting.py
Accuracy Weighting - AccuracyWeighting.py
Error Analysis - ErrorAnalysis.py
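The two voting schemes listed above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the contents of MajorityVoting.py or AccuracyWeighting.py: the function names, the "pos"/"neg" labels, and the tie-breaking behavior are hypothetical, and each classifier's weight is assumed to be its measured accuracy.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def accuracy_weighted_vote(labels, accuracies):
    """Each classifier's vote counts in proportion to its accuracy."""
    totals = defaultdict(float)
    for label, acc in zip(labels, accuracies):
        totals[label] += acc
    # Pick the label with the largest accumulated weight.
    return max(totals, key=totals.get)


# Votes from four classifiers (hypothetical), weighted by the sample
# accuracies reported in this README purely for illustration.
votes = ["pos", "neg", "neg", "pos"]
weights = [0.8342, 0.8444, 0.8816, 0.5521]
print(majority_vote(votes))                    # tie, so the first label seen wins
print(accuracy_weighted_vote(votes, weights))  # the more accurate pair wins
```

With four classifiers a plain majority vote can tie, which is one reason to weight votes by accuracy.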
Sample Output
On the dataset of 5,000 reviews:
Naive Bayes: Accuracy: 0.8342
VADER: Accuracy: 0.8444
TextBlob: Accuracy: 0.8816
Opinion Lexicon: Accuracy: 0.5521
Majority Voting: Accuracy: 0.8452
Accuracy Weighting: Accuracy: 0.8492
Error Analysis: Accuracy: 0.8843
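Accuracy figures like the ones above are typically the fraction of reviews whose predicted label matches the gold label. A minimal sketch, assuming predictions and gold labels are stored as parallel lists (the function name `accuracy` is hypothetical and the scripts in this project may compute it differently):

```python
def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```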