In this exercise I used Python (pandas, NumPy, matplotlib.pyplot, Tweepy, seaborn, datetime, and VADER), JSON traversal, and Twitter's API to perform a sentiment analysis of the news mood based on tweets from five news organizations: BBC, CBS, CNN, Fox News, and The New York Times.
-
The scatterplot of sentiment for the one hundred most recent tweets from each of the five major news organizations was highly variable, ranging from roughly -0.95 to +0.95 on the VADER (Valence Aware Dictionary and sEntiment Reasoner) compound scale, where -1 is the most negative sentiment and +1 the most positive. From the scatterplot alone it was difficult to tell which news organizations had the most positive or negative sentiment.
-
Numerous points on the scatterplot sat at y = 0. My first assumption was that these points simply represented an overwhelming number of tweets with neutral sentiment, but a closer look at the tweet text showed that several of these "neutral" points were tweets in languages other than English, which VADER cannot evaluate and therefore scores with a compound value of 0. I added a filter to my code so that only English tweets were collected and scored, but a few tweets in other languages still slipped through.
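One way to tighten that filter, sketched here on made-up statuses: besides passing `lang='en'` to the search call, each status returned by Twitter carries its own machine-detected `lang` field (`"und"` for undetermined), which can be checked per tweet before scoring. Because that field is itself machine-detected, a few misclassified tweets can still get through.

```python
def english_only(statuses):
    """Keep only statuses Twitter itself tagged as English."""
    return [t for t in statuses if t.get("lang") == "en"]

# Hypothetical sample of API statuses (only the fields used here)
sample = [
    {"text": "Breaking news tonight", "lang": "en"},
    {"text": "Noticias de última hora", "lang": "es"},
    {"text": "Short tweet", "lang": "und"},
]
print([t["text"] for t in english_only(sample)])  # → ['Breaking news tonight']
```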
-
A bar plot of the mean tweet sentiment made it easier to judge whether the overall sentiment for each news organization at a given time was more positive or more negative. That said, the sentiment means for the same news organization varied tremendously from hour to hour and day to day (data not shown). When I ran my code two days ago, which coincided with the release of Michael Wolff's book "Fire and Fury: Inside the Trump White House," every news organization showed a negative mean sentiment. The bar plot below reflects an analysis performed Sunday night (01/07/2018), with positive mean sentiment for BBC, CBS, and the NY Times (ranging from +0.06 to +0.09), a slightly negative mean for Fox News (-0.03), and a negative mean for CNN (-0.1). Several of the tweets concerned the Golden Globe Awards, which may partially explain the boost in sentiment that evening compared with earlier in the day. Overall, it would be better to sample tweets over a couple of months, or a year, to get a more reliable picture of each news organization's sentiment on Twitter.
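That longer-horizon idea could be sketched as follows (with made-up dates and scores, not the live Twitter data): bucket the per-tweet compound scores by day, so swings like the ones described above become visible as a time series rather than a single snapshot.

```python
import pandas as pd

# Hypothetical tweet-level data: one compound score per tweet
df = pd.DataFrame({
    "Media": ["BBC", "BBC", "CNN", "CNN"],
    "Date": pd.to_datetime([
        "2018-01-06", "2018-01-07", "2018-01-06", "2018-01-07",
    ]),
    "Compound": [0.2, -0.4, -0.1, 0.3],
})

# Daily mean compound sentiment per outlet; over months this traces a trend
daily = df.groupby(["Media", df["Date"].dt.date])["Compound"].mean()
print(daily)
```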
# Import dependencies
import tweepy
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import json
import numpy as np
from IPython.display import display
from datetime import datetime
# Import and Initialize Sentiment Analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
# Set up and load the config document
import yaml
TWITTER_CONFIG_FILE = 'auth.yaml'
with open(TWITTER_CONFIG_FILE, 'r') as config_file:
    config = yaml.safe_load(config_file)
#print(type(config))
# Twitter API Keys
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']
consumer_key = config['twitter']['consumer_key']
consumer_secret = config['twitter']['consumer_secret']
#print(access_token, access_token_secret, consumer_key, consumer_secret)
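The code above expects `auth.yaml` to look roughly like this (placeholder values, not real keys):

```yaml
twitter:
  consumer_key: "YOUR_CONSUMER_KEY"
  consumer_secret: "YOUR_CONSUMER_SECRET"
  access_token: "YOUR_ACCESS_TOKEN"
  access_token_secret: "YOUR_ACCESS_TOKEN_SECRET"
```

Keeping the credentials in a separate YAML file (excluded from version control) avoids hard-coding API keys in the notebook.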
# Setup Tweepy API Authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())
# Target Search Term
news_orgs = ("BBC", "CBS", "CNN", "FoxNews", "nytimes")
# Create lists to hold sentiments for all news organizations
all_sentiments = []
sentiment_means = []
# Loop through all target news organizations
for org in news_orgs:
    # Reset counter for each news_org loop
    counter = 1
    # Variables for holding sentiments
    compound_list = []
    positive_list = []
    negative_list = []
    neutral_list = []
    # Run the search for the 100 most recent English tweets
    public_tweets = api.search(org, count=100, result_type="recent", lang='en')
    #print(json.dumps(public_tweets["statuses"], indent=4, sort_keys=True, separators=(',', ': ')))
    # Loop through all tweets
    for tweet in public_tweets["statuses"]:
        # Run VADER analysis once per tweet and unpack the four scores
        scores = analyzer.polarity_scores(tweet["text"])
        compound = scores["compound"]
        pos = scores["pos"]
        neu = scores["neu"]
        neg = scores["neg"]
        # Add each value to the appropriate lists above
        compound_list.append(compound)
        positive_list.append(pos)
        negative_list.append(neg)
        neutral_list.append(neu)
        # Append the scores for this tweet to the overall list
        all_sentiments.append({" Media": org,
                               "Date": tweet["created_at"],
                               "Compound": compound,
                               "Positive": pos,
                               "Neutral": neu,
                               "Negative": neg,
                               "Tweets_Ago": counter
                               })
        # Add 1 to counter
        counter += 1
    # Store the average sentiments for this news organization
    # (note: Neutral must average neutral_list and Negative negative_list)
    sentiment_means.append({" Media": org,
                            "Compound_Mean": np.mean(compound_list),
                            "Positive": np.mean(positive_list),
                            "Neutral": np.mean(neutral_list),
                            "Negative": np.mean(negative_list),
                            "Count": len(compound_list)
                            })
# Convert all_sentiments to DataFrame
all_sentiments_pd = pd.DataFrame.from_dict(all_sentiments)
all_sentiments_pd.to_csv("sentiments_array_pd.csv")
display(all_sentiments_pd)
#print(all_sentiments_pd.dtypes)
# Convert sentiment_means to DataFrame
sentiment_means_pd = pd.DataFrame.from_dict(sentiment_means)
display(sentiment_means_pd)
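As a cross-check on the loop above, the same per-source means can be recomputed directly from the tweet-level DataFrame with a pandas groupby, sketched here on a tiny made-up frame (mirroring `all_sentiments_pd`'s columns, including the leading space in `" Media"`) rather than the live Twitter data:

```python
import pandas as pd

# Hypothetical tweet-level rows with the same columns as all_sentiments_pd
tweets = pd.DataFrame({
    " Media": ["BBC", "BBC", "CNN", "CNN"],
    "Compound": [0.5, -0.1, -0.3, -0.5],
    "Positive": [0.2, 0.0, 0.0, 0.1],
    "Neutral": [0.8, 0.9, 0.8, 0.6],
    "Negative": [0.0, 0.1, 0.2, 0.3],
})

# One mean per media source for each sentiment column
means = tweets.groupby(" Media")[["Compound", "Positive", "Neutral", "Negative"]].mean()
print(means)
```

A groupby like this avoids maintaining four parallel lists by hand and cannot accidentally swap which list feeds which column.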
  | Media | Compound | Date | Negative | Neutral | Positive | Tweets_Ago
---|---|---|---|---|---|---|---
0 | BBC | 0.0000 | Mon Jan 08 07:04:28 +0000 2018 | 0.000 | 1.000 | 0.000 | 1 |
1 | BBC | 0.5719 | Mon Jan 08 07:04:27 +0000 2018 | 0.000 | 0.850 | 0.150 | 2 |
2 | BBC | -0.6597 | Mon Jan 08 07:04:27 +0000 2018 | 0.306 | 0.694 | 0.000 | 3 |
3 | BBC | 0.7906 | Mon Jan 08 07:04:26 +0000 2018 | 0.000 | 0.750 | 0.250 | 4 |
4 | BBC | 0.0000 | Mon Jan 08 07:04:25 +0000 2018 | 0.000 | 1.000 | 0.000 | 5 |
5 | BBC | 0.5994 | Mon Jan 08 07:04:25 +0000 2018 | 0.075 | 0.717 | 0.208 | 6 |
6 | BBC | -0.5423 | Mon Jan 08 07:04:25 +0000 2018 | 0.218 | 0.691 | 0.091 | 7 |
7 | BBC | 0.5719 | Mon Jan 08 07:04:24 +0000 2018 | 0.000 | 0.575 | 0.425 | 8 |
8 | BBC | 0.5719 | Mon Jan 08 07:04:23 +0000 2018 | 0.000 | 0.850 | 0.150 | 9 |
9 | BBC | 0.6369 | Mon Jan 08 07:04:23 +0000 2018 | 0.000 | 0.755 | 0.245 | 10 |
10 | BBC | 0.6369 | Mon Jan 08 07:04:23 +0000 2018 | 0.000 | 0.802 | 0.198 | 11 |
11 | BBC | -0.4404 | Mon Jan 08 07:04:22 +0000 2018 | 0.253 | 0.642 | 0.106 | 12 |
12 | BBC | 0.5106 | Mon Jan 08 07:04:22 +0000 2018 | 0.092 | 0.683 | 0.225 | 13 |
13 | BBC | -0.4767 | Mon Jan 08 07:04:22 +0000 2018 | 0.256 | 0.744 | 0.000 | 14 |
14 | BBC | 0.0000 | Mon Jan 08 07:04:20 +0000 2018 | 0.000 | 1.000 | 0.000 | 15 |
15 | BBC | 0.5719 | Mon Jan 08 07:04:20 +0000 2018 | 0.000 | 0.850 | 0.150 | 16 |
16 | BBC | -0.2263 | Mon Jan 08 07:04:19 +0000 2018 | 0.087 | 0.913 | 0.000 | 17 |
17 | BBC | 0.0000 | Mon Jan 08 07:04:19 +0000 2018 | 0.000 | 1.000 | 0.000 | 18 |
18 | BBC | 0.3612 | Mon Jan 08 07:04:19 +0000 2018 | 0.000 | 0.898 | 0.102 | 19 |
19 | BBC | 0.0000 | Mon Jan 08 07:04:18 +0000 2018 | 0.000 | 1.000 | 0.000 | 20 |
20 | BBC | 0.5719 | Mon Jan 08 07:04:18 +0000 2018 | 0.000 | 0.850 | 0.150 | 21 |
21 | BBC | 0.5719 | Mon Jan 08 07:04:18 +0000 2018 | 0.000 | 0.850 | 0.150 | 22 |
22 | BBC | -0.2617 | Mon Jan 08 07:04:18 +0000 2018 | 0.127 | 0.785 | 0.088 | 23 |
23 | BBC | -0.6486 | Mon Jan 08 07:04:18 +0000 2018 | 0.374 | 0.485 | 0.141 | 24 |
24 | BBC | -0.3595 | Mon Jan 08 07:04:18 +0000 2018 | 0.161 | 0.839 | 0.000 | 25 |
25 | BBC | -0.2732 | Mon Jan 08 07:04:17 +0000 2018 | 0.123 | 0.877 | 0.000 | 26 |
26 | BBC | 0.0000 | Mon Jan 08 07:04:17 +0000 2018 | 0.000 | 1.000 | 0.000 | 27 |
27 | BBC | 0.0000 | Mon Jan 08 07:04:16 +0000 2018 | 0.000 | 1.000 | 0.000 | 28 |
28 | BBC | 0.5267 | Mon Jan 08 07:04:15 +0000 2018 | 0.094 | 0.644 | 0.262 | 29 |
29 | BBC | -0.1027 | Mon Jan 08 07:04:15 +0000 2018 | 0.123 | 0.877 | 0.000 | 30 |
... | ... | ... | ... | ... | ... | ... | ... |
470 | nytimes | 0.5719 | Mon Jan 08 07:03:39 +0000 2018 | 0.000 | 0.861 | 0.139 | 71 |
471 | nytimes | -0.8519 | Mon Jan 08 07:03:39 +0000 2018 | 0.283 | 0.717 | 0.000 | 72 |
472 | nytimes | 0.2732 | Mon Jan 08 07:03:39 +0000 2018 | 0.107 | 0.741 | 0.152 | 73 |
473 | nytimes | 0.2732 | Mon Jan 08 07:03:39 +0000 2018 | 0.000 | 0.806 | 0.194 | 74 |
474 | nytimes | 0.0000 | Mon Jan 08 07:03:39 +0000 2018 | 0.000 | 1.000 | 0.000 | 75 |
475 | nytimes | -0.4767 | Mon Jan 08 07:03:38 +0000 2018 | 0.147 | 0.853 | 0.000 | 76 |
476 | nytimes | 0.0000 | Mon Jan 08 07:03:38 +0000 2018 | 0.000 | 1.000 | 0.000 | 77 |
477 | nytimes | 0.0000 | Mon Jan 08 07:03:37 +0000 2018 | 0.000 | 1.000 | 0.000 | 78 |
478 | nytimes | -0.4215 | Mon Jan 08 07:03:37 +0000 2018 | 0.109 | 0.891 | 0.000 | 79 |
479 | nytimes | 0.0000 | Mon Jan 08 07:03:36 +0000 2018 | 0.000 | 1.000 | 0.000 | 80 |
480 | nytimes | 0.0000 | Mon Jan 08 07:03:35 +0000 2018 | 0.000 | 1.000 | 0.000 | 81 |
481 | nytimes | -0.1695 | Mon Jan 08 07:03:35 +0000 2018 | 0.180 | 0.702 | 0.118 | 82 |
482 | nytimes | -0.3182 | Mon Jan 08 07:03:35 +0000 2018 | 0.091 | 0.909 | 0.000 | 83 |
483 | nytimes | 0.4939 | Mon Jan 08 07:03:35 +0000 2018 | 0.000 | 0.802 | 0.198 | 84 |
484 | nytimes | 0.0000 | Mon Jan 08 07:03:35 +0000 2018 | 0.000 | 1.000 | 0.000 | 85 |
485 | nytimes | 0.0000 | Mon Jan 08 07:03:33 +0000 2018 | 0.000 | 1.000 | 0.000 | 86 |
486 | nytimes | 0.0000 | Mon Jan 08 07:03:31 +0000 2018 | 0.000 | 1.000 | 0.000 | 87 |
487 | nytimes | 0.3818 | Mon Jan 08 07:03:31 +0000 2018 | 0.000 | 0.885 | 0.115 | 88 |
488 | nytimes | -0.5106 | Mon Jan 08 07:03:31 +0000 2018 | 0.320 | 0.680 | 0.000 | 89 |
489 | nytimes | -0.5122 | Mon Jan 08 07:03:31 +0000 2018 | 0.212 | 0.788 | 0.000 | 90 |
490 | nytimes | -0.4215 | Mon Jan 08 07:03:30 +0000 2018 | 0.109 | 0.891 | 0.000 | 91 |
491 | nytimes | 0.0000 | Mon Jan 08 07:03:30 +0000 2018 | 0.000 | 1.000 | 0.000 | 92 |
492 | nytimes | 0.5719 | Mon Jan 08 07:03:30 +0000 2018 | 0.000 | 0.861 | 0.139 | 93 |
493 | nytimes | -0.5574 | Mon Jan 08 07:03:29 +0000 2018 | 0.375 | 0.625 | 0.000 | 94 |
494 | nytimes | 0.0000 | Mon Jan 08 07:03:29 +0000 2018 | 0.000 | 1.000 | 0.000 | 95 |
495 | nytimes | 0.0000 | Mon Jan 08 07:03:28 +0000 2018 | 0.000 | 1.000 | 0.000 | 96 |
496 | nytimes | -0.0516 | Mon Jan 08 07:03:28 +0000 2018 | 0.239 | 0.606 | 0.155 | 97 |
497 | nytimes | -0.4215 | Mon Jan 08 07:03:27 +0000 2018 | 0.109 | 0.891 | 0.000 | 98 |
498 | nytimes | 0.0000 | Mon Jan 08 07:03:27 +0000 2018 | 0.000 | 1.000 | 0.000 | 99 |
499 | nytimes | 0.4767 | Mon Jan 08 07:03:25 +0000 2018 | 0.000 | 0.795 | 0.205 | 100 |
500 rows × 7 columns
  | Media | Compound_Mean | Count | Negative | Neutral | Positive
---|---|---|---|---|---|---
0 | BBC | 0.083353 | 100 | 0.06990 | 0.84301 | 0.08707
1 | CBS | 0.061578 | 100 | 0.05553 | 0.86744 | 0.07701
2 | CNN | -0.107187 | 100 | 0.10897 | 0.82274 | 0.06833
3 | FoxNews | -0.028154 | 100 | 0.09015 | 0.82891 | 0.08093
4 | nytimes | 0.057799 | 100 | 0.05773 | 0.86923 | 0.07304
# Create a scatterplot of compound sentiment vs. tweets ago, per media source
all_sentiments_pd.set_index('Tweets_Ago', inplace=True)
all_sentiments_pd.groupby(' Media')['Compound'].plot(legend=True, marker='o', linewidth=0)
# Customize scatterplot features
plt.style.use('ggplot')
plt.axhline(0, c='k', alpha=0.2, linestyle='dashed')
plt.xlim(0, 101)
plt.ylim(-1.1, 1.1)
plt.xlabel("Tweets Ago", fontsize=15)
plt.ylabel("Tweet Polarity", fontsize=15)
plt.legend(loc=(1.0, 0.75), edgecolor='black')
plt.grid(True, ls='dashed')
plt.title("Sentiment Analysis per Media Source (" + datetime.now().strftime('%m/%d/%Y') + ")")
plt.savefig("Sentiment Analysis of Media Tweets.png", bbox_inches='tight')
plt.show()
# Create a barplot of mean compound sentiment per media source
ax = sns.barplot(x=' Media', y='Compound_Mean', data=sentiment_means_pd)
# Customize barplot features
ax.set_xlabel('Media', fontsize=15)
ax.set_ylabel('Tweet Polarity', fontsize=15)
ax.set_title("Overall Media Sentiment based on Twitter (" + datetime.now().strftime('%m/%d/%Y') + ")")
ax.set_ylim(-0.12, 0.12)
ax.grid(True, ls='dashed')
ax.hlines(0, -1, 10, colors='k', alpha=0.4)
plt.savefig("Overall Sentiment based on Twitter.png")
plt.show()