Sentiment analysis tools benchmarking

Description

Sentiment analysis is the use of natural language processing, text analysis,
computational linguistics, and biometrics to systematically identify, extract,
quantify, and study affective states and subjective information.

The aim of this project is to measure the accuracy of the most popular sentiment analysis tools.

The benchmark uses the following datasets:

IMDb dataset: movie reviews labeled as positive or negative. It contains the 25k most popular reviews. (source)
Twitter US Airlines dataset: tweets about each of the major US airlines since February 2015. Each tweet is classified as positive, negative, or neutral. (source)
Sentiment140 dataset: tweet records, each including the polarity, the date, and the tweet text. (source)

At a minimum, each record in these datasets contains:

the text to analyze
the correct (expected) sentiment for that text
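As a rough illustration of what a dataset reader produces, the sketch below turns Sentiment140-style rows into records that hold just these two fields. It assumes the widely distributed header-less six-column layout (polarity, id, date, query, user, text) with polarity coded as 0 = negative, 2 = neutral, 4 = positive. The names `Record` and `read_sentiment140` and the lowercase label strings are hypothetical, not taken from the project's dataset_readers module.

```python
import csv
import io
from typing import Iterator, NamedTuple

# Standardized labels shared by every reader (naming is an assumption).
POSITIVE, NEGATIVE, NEUTRAL = "positive", "negative", "neutral"

# Sentiment140 polarity codes mapped to the standardized labels.
SENTIMENT140_POLARITY = {"0": NEGATIVE, "2": NEUTRAL, "4": POSITIVE}


class Record(NamedTuple):
    """The minimum information each benchmark record carries."""
    text: str      # the text to analyze
    expected: str  # the correct (gold) sentiment label


def read_sentiment140(lines) -> Iterator[Record]:
    """Yield standardized records from Sentiment140-style CSV rows.

    Assumed layout (no header): polarity, id, date, query, user, text.
    """
    for row in csv.reader(lines):
        polarity, text = row[0], row[5]
        yield Record(text=text, expected=SENTIMENT140_POLARITY[polarity])


if __name__ == "__main__":
    # Two made-up rows in the assumed layout.
    sample = io.StringIO(
        '"4","1","Mon Apr 06 2009","NO_QUERY","alice","I love this"\n'
        '"0","2","Mon Apr 06 2009","NO_QUERY","bob","worst day ever"\n'
    )
    for record in read_sentiment140(sample):
        print(record)
```

With the real file, opening it with `open(path, encoding="latin-1", newline="")` is commonly needed, since the published Sentiment140 CSV is not UTF-8 encoded.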

Overview:

The project has a single entry point, make_benchmark.py, whose command-line options select the dataset and the sentiment analysis tool to benchmark. The entry point then calls two underlying layers:

dataset reader: a module that reads the dataset and pre-processes/standardizes the data for the sentiment analysis core procedure
sentiment matcher: the sentiment analysis core procedure. It also post-processes the tool's results, interpreting them and returning them to the caller in a standard form, so that each prediction can be counted as a sentiment hit or a sentiment miss. Each sentiment matcher knows the correct sentiment of every record, which is what makes this comparison possible.
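A minimal sketch of the sentiment matcher idea, assuming a tool that returns a polarity score in [-1, 1]: each score is mapped to a standardized label and compared with the correct sentiment to count hits. The function names and the +/-0.05 neutral band are assumptions for illustration, not the project's actual sentiment_matchers code.

```python
from typing import Callable, Iterable, Tuple

POSITIVE, NEGATIVE, NEUTRAL = "positive", "negative", "neutral"


def label_from_score(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a standardized label.

    The +/-0.05 cut-off mirrors the thresholds commonly used with VADER's
    compound score; it is an assumption here, not the project's setting.
    """
    if score >= threshold:
        return POSITIVE
    if score <= -threshold:
        return NEGATIVE
    return NEUTRAL


def count_hits(records: Iterable[Tuple[str, str]],
               score: Callable[[str], float]) -> Tuple[int, int]:
    """Score each (text, expected) record and return (hits, analyzed_rows).

    `score` stands in for a concrete tool such as VADER or TextBlob.
    """
    hits = rows = 0
    for text, expected in records:
        predicted = label_from_score(score(text))
        if predicted == expected:
            hits += 1  # sentiment hit
        rows += 1      # every scored record counts as an analyzed row
    return hits, rows
```

With the vaderSentiment package, `score` could be `lambda t: analyzer.polarity_scores(t)["compound"]` where `analyzer = SentimentIntensityAnalyzer()`; with TextBlob, `lambda t: TextBlob(t).sentiment.polarity`. Cloud services such as Amazon Comprehend return a label directly, so a matcher for them would skip the thresholding step.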

Commands

-dataset: selects one of the supported datasets for the benchmark. Possible values are:

imdb: the IMDb dataset
twitter: the Twitter US Airlines dataset
sentiment140: the Sentiment140 dataset

-tool: selects the tool that performs the sentiment analysis predictions. Possible values are:

vader: VADER
textblob: TextBlob
azure: the Azure text-language-engine
aws: Amazon Comprehend
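The two options map naturally onto Python's argparse with restricted choices. This is a sketch of how such parsing could look, not the project's actual make_benchmark.py code; `parse_args`, `DATASETS` and `TOOLS` are illustrative names.

```python
import argparse

DATASETS = ("imdb", "twitter", "sentiment140")
TOOLS = ("vader", "textblob", "azure", "aws")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the -dataset and -tool options (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Sentiment analysis tools benchmark")
    # argparse accepts single-dash long options; dest becomes "dataset" / "tool".
    parser.add_argument("-dataset", required=True, choices=DATASETS,
                        help="dataset to benchmark against")
    parser.add_argument("-tool", required=True, choices=TOOLS,
                        help="sentiment analysis tool to evaluate")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"benchmarking {args.tool} on {args.dataset}")
```

Restricting `choices` makes argparse reject anything outside the documented values (for example a file name) with a usage error instead of failing later inside a reader.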

For example, to run the benchmark on the sentiment140 dataset with the Azure text-language-engine as the sentiment analysis tool:

python make_benchmark.py -dataset sentiment140 -tool azure

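An example of the output: the progress lines show running counts of positives, negatives, and neutrals, and the last line reports the number of sentiment hits and the total number of analyzed rows: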

.
...
.....
positives 739 ### negatives 574 ### neutrals 1039
positives 739 ### negatives 574 ### neutrals 1040
positives 739 ### negatives 574 ### neutrals 1041
positives 739 ### negatives 574 ### neutrals 1041
hits: 523, analyzed rows: 2470
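The last line gives enough to derive the accuracy and the number of misses. A small helper, applied to the numbers from the sample output above (the function name is illustrative):

```python
def accuracy(hits: int, analyzed_rows: int) -> float:
    """Fraction of analyzed rows whose predicted sentiment matched the correct one."""
    if analyzed_rows <= 0:
        raise ValueError("analyzed_rows must be positive")
    return hits / analyzed_rows


hits, rows = 523, 2470  # values from the sample output
print(f"accuracy: {accuracy(hits, rows):.1%}, misses: {rows - hits}")
# prints: accuracy: 21.2%, misses: 1947
```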

Usage Examples

Benchmark the vader tool on the imdb dataset:

python .\make_benchmark.py -dataset imdb -tool vader

Benchmark the textblob tool on the imdb dataset:

python .\make_benchmark.py -dataset imdb -tool textblob

Benchmark the textblob tool on the sentiment140 dataset:

python .\make_benchmark.py -dataset sentiment140 -tool textblob

Benchmark the aws tool on the sentiment140 dataset:

python .\make_benchmark.py -dataset sentiment140 -tool aws

Benchmark the azure tool on the twitter dataset:

python .\make_benchmark.py -dataset twitter -tool azure
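To run every dataset/tool combination in one go, a small driver can loop over the documented option values. This is a hypothetical helper, not part of the repository; it assumes it is run from the project root, and the azure and aws runs need the credentials described in the next section.

```python
import itertools
import subprocess
import sys

DATASETS = ("imdb", "twitter", "sentiment140")
TOOLS = ("vader", "textblob", "azure", "aws")


def benchmark_commands(datasets=DATASETS, tools=TOOLS):
    """Build one make_benchmark.py invocation per (dataset, tool) pair."""
    return [
        [sys.executable, "make_benchmark.py", "-dataset", d, "-tool", t]
        for d, t in itertools.product(datasets, tools)
    ]


if __name__ == "__main__":
    for cmd in benchmark_commands():
        print("running:", " ".join(cmd[1:]))
        # check=False so one failing run (e.g. missing cloud credentials)
        # does not stop the remaining ones.
        subprocess.run(cmd, check=False)
```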

How to configure credentials for the AWS and Azure services

AWS

Create a .aws folder under your user home directory, e.g. C:/Users/user/ on Windows (~/.aws on Linux and macOS)
Create the file .aws/credentials
Put the following content in the credentials file:

[default]
aws_access_key_id = ...
aws_secret_access_key = ...

The Amazon Comprehend client reads this credentials file automatically when it opens the connection.
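For reference, a Comprehend call through boto3 (the AWS SDK for Python) could look like the sketch below. It assumes boto3 is installed, that the region is us-east-1 (the credentials file holds no region), and that Comprehend's MIXED label is folded into neutral. The function and mapping names are hypothetical, and the project's own aws matcher may differ.

```python
POSITIVE, NEGATIVE, NEUTRAL = "positive", "negative", "neutral"

# Amazon Comprehend returns POSITIVE, NEGATIVE, NEUTRAL or MIXED.
# Folding MIXED into neutral is an assumption of this sketch.
COMPREHEND_TO_STANDARD = {
    "POSITIVE": POSITIVE,
    "NEGATIVE": NEGATIVE,
    "NEUTRAL": NEUTRAL,
    "MIXED": NEUTRAL,
}


def comprehend_sentiment(text: str, region_name: str = "us-east-1") -> str:
    """Ask Amazon Comprehend for the sentiment of `text` (needs boto3 and credentials)."""
    import boto3  # imported lazily so the mapping is usable without boto3

    # boto3 reads the [default] profile from the credentials file on its own;
    # the region is passed explicitly because that file does not contain one.
    client = boto3.client("comprehend", region_name=region_name)
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    return COMPREHEND_TO_STANDARD[response["Sentiment"]]
```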

Azure

The azure_sentiment_matcher.py module reads the Azure endpoint key from an environment variable called AZURE_KEY. To set it up:

Get the endpoint key from the Azure resource manager portal
Create the AZURE_KEY environment variable and set it to the key value
You are now ready to run azure_sentiment_matcher.py
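A sketch of reading the key with a clear failure message, plus a sentiment call through the azure-ai-textanalytics package. The `endpoint` argument, the helper names and the use of that SDK are assumptions; the project's azure_sentiment_matcher.py may call the service differently.

```python
import os


def get_azure_key(env_var: str = "AZURE_KEY") -> str:
    """Return the endpoint key stored in AZURE_KEY, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set: copy the endpoint key from the Azure portal "
            "and store it in this environment variable first"
        )
    return key


def azure_sentiment(text: str, endpoint: str) -> str:
    """Return 'positive', 'negative', 'neutral' or 'mixed' for `text`.

    `endpoint` is the resource URL listed next to the key in the portal
    (an assumption: the README only mentions the key).
    """
    # Imported lazily so get_azure_key works without the Azure SDK installed.
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    client = TextAnalyticsClient(endpoint=endpoint,
                                 credential=AzureKeyCredential(get_azure_key()))
    result = client.analyze_sentiment([text])[0]
    if result.is_error:
        raise RuntimeError(f"Azure returned an error: {result.error}")
    return result.sentiment
```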

leo-capvano/sentiment_analysis_benchmarking

Folders and files

Latest commit

History

Repository files navigation

Sentiment analysis tools benchmarking

Description

Overview:

Commands

Usage Examples

How to configure credential for AWS and Azure services

AWS

Azure

About

Topics

Resources

Stars

Watchers

Forks

Languages