Analysing how travellers in February 2015 expressed their feelings on Twitter.

AmritK10/Twitter_US_Airline_Sentiment_Analysis

Twitter_US_Airline_Sentiment_Analysis

Dataset

This data originally came from Crowdflower's Data for Everyone library: http://www.crowdflower.com/data-for-everyone which states:
A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service").

It records whether the sentiment of each tweet was positive, neutral, or negative for six US airlines.

Features

The CSV file has been added to the repo as Tweets_data.csv. It contains the following features (columns):
tweet_id
airline_sentiment
airline_sentiment_confidence
negativereason
negativereason_confidence
airline
airline_sentiment_gold
name
negativereason_gold
retweet_count
text
tweet_coord
tweet_created
tweet_location
user_timezone
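
As a quick check of the columns listed above, the following is a minimal sketch that tallies airline_sentiment per airline using only the Python standard library. It assumes Tweets_data.csv has a header row with the column names listed above; the function name sentiment_counts is illustrative and not part of the repo.

```python
import csv
from collections import Counter, defaultdict


def sentiment_counts(csv_file):
    """Count tweets per airline and sentiment label from an open CSV file.

    Assumes a header row containing 'airline' and 'airline_sentiment'.
    """
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["airline"]][row["airline_sentiment"]] += 1
    return counts


# Usage (assumes Tweets_data.csv is in the working directory):
#
#     with open("Tweets_data.csv", newline="", encoding="utf-8") as f:
#         for airline, tally in sorted(sentiment_counts(f).items()):
#             print(airline, dict(tally))
```

The function takes an open file object rather than a path, so it can also be run on an in-memory sample.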

Implementation

The data was cleaned using Natural Language Toolkit (NLTK).
For the analysis, Multinomial Naive Bayes and Support Vector Machine classifiers were used.
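
The exact cleaning steps are not listed here, so the following is only a sketch of what NLTK-based tweet cleaning could look like. The choice of TweetTokenizer and PorterStemmer, the URL regex, and the small inline stop-word set are all assumptions made for illustration; the stop-word set stands in for nltk.corpus.stopwords, which needs a one-time download.

```python
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# Hypothetical stand-in for nltk.corpus.stopwords.words("english"),
# which requires a one-time nltk.download("stopwords").
STOPWORDS = {"a", "an", "and", "the", "is", "was", "to", "my", "i", "of"}

URL_PATTERN = re.compile(r"https?://\S+")

# Lowercase tokens, drop @handles, and shorten long character runs.
_tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
_stemmer = PorterStemmer()


def clean_tweet(text):
    """Return a list of stemmed, lowercased word tokens for one tweet."""
    text = URL_PATTERN.sub(" ", text)  # remove links before tokenizing
    tokens = _tokenizer.tokenize(text)
    # Keep purely alphabetic tokens, which drops punctuation and numbers.
    words = [t for t in tokens if t.isalpha() and t not in STOPWORDS]
    return [_stemmer.stem(w) for w in words]
```

The returned tokens can be joined back into a string with " ".join(...) before being passed to a vectorizer.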

Multinomial Naive Bayes Results

MultinomialNB classifier from sklearn was used.
Training Accuracy: 80.87%
Testing Accuracy: 77.18%
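
As an illustration of how such a classifier can be fitted and scored, here is a minimal sketch using MultinomialNB on bag-of-words counts. The CountVectorizer features, the default hyperparameters, and the toy sentences are assumptions made for the example and are not taken from the repo, so the accuracies above will not be reproduced by it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def fit_nb(train_texts, train_labels):
    """Fit a word-count + Multinomial Naive Bayes pipeline."""
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_texts, train_labels)
    return model


# Toy data invented for illustration only (not from Tweets_data.csv).
train_texts = [
    "great flight thanks",
    "love the crew",
    "flight delayed again",
    "lost my bag terrible",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = fit_nb(train_texts, train_labels)
train_accuracy = accuracy_score(train_labels, model.predict(train_texts))
```

On the real data, the same accuracy_score call would be applied to a held-out test split to obtain a testing accuracy.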

Support Vector Machine Results

SVC classifier from sklearn was used.
Training Accuracy: 87.94%
Training Weighted Average F1-score: 0.88
Testing Accuracy: 78.79%
Testing Weighted Average F1-score: 0.79
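
The accuracy and weighted-average F1 figures above can be computed with sklearn.metrics. The sketch below shows one way to do so for an SVC; the TF-IDF features, the linear kernel, and the helper name svc_scores are assumptions, since the feature extraction and kernel used in the repo are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def svc_scores(train_texts, train_labels, test_texts, test_labels):
    """Fit TF-IDF + SVC and return accuracy and weighted F1 per split."""
    # Linear kernel is an assumption; the repo's kernel is not stated.
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
    model.fit(train_texts, train_labels)

    scores = {}
    splits = (
        ("train", train_texts, train_labels),
        ("test", test_texts, test_labels),
    )
    for name, texts, labels in splits:
        predicted = model.predict(texts)
        scores[name] = {
            "accuracy": accuracy_score(labels, predicted),
            # Per-class F1 averaged with weights equal to class support.
            "weighted_f1": f1_score(labels, predicted, average="weighted"),
        }
    return scores
```

The weighted average is a reasonable summary when the three sentiment classes are unevenly represented, because each class contributes in proportion to its number of tweets.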
