News Sentiment Analysis

In this exercise I used the Python libraries pandas, NumPy, matplotlib.pyplot, tweepy, seaborn, datetime, and VADER, along with JSON traversal and Twitter's API, to perform a sentiment analysis of the news mood based on tweets from five news organizations: BBC, CBS, CNN, Fox News, and The New York Times.

Three observable trends based on the data below:

  1. The scatterplot of the sentiment of the one hundred most recent tweets from each of five major news organizations was highly variable, ranging from roughly -0.95 to +0.95 on the VADER (Valence Aware Dictionary and sEntiment Reasoner) scale, where -1 is the most negative sentiment and +1 the most positive. Visually, it was difficult to determine from the scatterplot alone which news organizations had the most positive or negative sentiment.
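As background on the -1 to +1 scale mentioned above: VADER sums the valence scores of a text's tokens and squashes that sum into (-1, 1) with the normalization x / sqrt(x² + α), with α = 15 in the reference implementation. A minimal sketch of that normalization (the function name `normalize` is mine, not from the VADER package):

```python
import math

# VADER's compound score: the summed token valences are normalized into
# (-1, 1) via x / sqrt(x^2 + alpha); alpha defaults to 15 in the reference
# implementation.
def normalize(score_sum, alpha=15):
    return score_sum / math.sqrt(score_sum ** 2 + alpha)

print(normalize(0.0))     # a zero valence sum stays exactly neutral
print(normalize(4.0))     # a strongly positive sum approaches +1
print(normalize(-4.0))    # symmetric for negative sums
```

This is why the scores cluster well inside the open interval: even very large valence sums only approach, and never reach, ±1.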

  2. Numerous points on the scatterplot sat at y = 0. My first assumption was that these simply represented tweets with neutral sentiment, but a closer look at the tweet text showed that several of these “neutral” points were tweets in languages other than English, which VADER cannot evaluate and therefore scores with a compound value of 0. I added a language filter to my code so that only English tweets were counted and evaluated, but a few tweets in other languages still came through in my analysis.
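In addition to passing `lang='en'` to the search call, the statuses can be filtered again after the fact, since each status dict returned by the Twitter API carries a `lang` field (`"en"`, `"es"`, `"und"` for undetermined, etc.). A small sketch with hypothetical stand-in statuses:

```python
# Post-hoc language filter, assuming each status dict carries the "lang"
# field the Twitter API attaches to tweets. The sample data is hypothetical.
sample_statuses = [
    {"text": "Breaking news tonight", "lang": "en"},
    {"text": "Últimas noticias de hoy", "lang": "es"},
    {"text": "12345", "lang": "und"},  # "und" = language undetermined
]

english_statuses = [s for s in sample_statuses if s.get("lang") == "en"]
print(len(english_statuses))  # 1
```

Tweets whose language Twitter could not determine are dropped too, which is usually what you want before handing text to an English-only lexicon.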

  3. A bar plot of the mean tweet sentiment made it easier to interpret the overall sentiment for each news organization at a specific time as more positive or more negative. That said, the sentiment means for the same news organization varied tremendously from hour to hour and day to day (data not shown). When I ran my code two days ago, for example, which coincided with the release of the book “Fire and Fury: Inside the Trump White House” by Michael Wolff, all news organizations presented a negative mean sentiment. The bar plot below represents an analysis performed Sunday night (01/07/2018), with positive mean sentiment values for BBC, CBS, and the NY Times (ranging from +0.06 to +0.09), a slightly negative mean for Fox News (-0.03), and a negative mean for CNN (-0.1). I noticed that several of the tweets were about the Golden Globe Awards, which may partially explain the overall boost in tweet sentiment that evening compared to earlier in the day. Overall, it would be best to sample tweets over a couple of months or a year to get a better idea of the overall sentiment for each news organization on Twitter.
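Sampling over a longer window, as suggested above, reduces to averaging compound scores per source per day. A minimal sketch with hypothetical scores, using the same column names as the DataFrame built later in this notebook:

```python
import pandas as pd

# Hypothetical compound scores illustrating per-source, per-day averaging;
# real data would come from repeated daily pulls of the Twitter API.
df = pd.DataFrame({
    "Media": ["BBC", "BBC", "CNN", "CNN"],
    "Date": pd.to_datetime(["2018-01-06", "2018-01-07",
                            "2018-01-06", "2018-01-07"]),
    "Compound": [0.21, -0.35, -0.12, 0.05],
})

# One mean compound score per (source, day) pair
daily_means = df.groupby(["Media", "Date"])["Compound"].mean()
print(daily_means)
```

With weeks of such daily means, a line plot per source would show whether a single night's reading (like the Golden Globes evening) is an outlier or the norm.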

# Import dependencies
import tweepy
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import json
import numpy as np
from IPython.display import display
from datetime import datetime

# Import and Initialize Sentiment Analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
#Set up and call config document
import yaml
TWITTER_CONFIG_FILE = 'auth.yaml'

with open(TWITTER_CONFIG_FILE, 'r') as config_file:
    config = yaml.safe_load(config_file)  # safe_load avoids arbitrary object construction
#print(type(config))
# Twitter API Keys
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']
consumer_key = config['twitter']['consumer_key']
consumer_secret = config['twitter']['consumer_secret']
#print(access_token, access_token_secret, consumer_key, consumer_secret)
# Setup Tweepy API Authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())

# Target Search Term
news_orgs = ("BBC", "CBS", "CNN","FoxNews","nytimes")
    
# Create arrays to hold sentiments for all news organizations
all_sentiments=[]
sentiment_means=[]

# Loop through all target news organizations
for org in news_orgs:
    
    # Reset counter for each news_org loop
    counter=1
    
    # Variables for holding sentiments
    compound_list = []
    positive_list = []
    negative_list = []
    neutral_list = []
    
    # Run search for each tweet
    public_tweets = api.search(org, count=100, result_type="recent",lang='en')       
    #print(json.dumps(public_tweets["statuses"], indent=4, sort_keys=True, separators=(',',': ')))   
    
    # Loop through all tweets
    for tweet in public_tweets["statuses"]:

        # Run VADER analysis once per tweet and reuse the scores dict
        scores = analyzer.polarity_scores(tweet["text"])
        compound = scores["compound"]
        pos = scores["pos"]
        neu = scores["neu"]
        neg = scores["neg"]

        # Add each value to the appropriate arrays above
        compound_list.append(compound)
        positive_list.append(pos)
        negative_list.append(neg)
        neutral_list.append(neu)  
        #print(org)
        #print (compound_list, tweets_ago)
        #print(" ")
        
        # Append all sentiments to an array
        all_sentiments.append({" Media" : org,
                           "Date": tweet["created_at"], 
                           "Compound": compound,
                           "Positive": pos,
                           "Neutral": neu,
                           "Negative": neg,
                           "Tweets_Ago": counter
                            })  
        # Add 1 to counter    
        counter+=1
        
    # Store the average sentiments in the array created above
    # (note: Neutral must average neutral_list and Negative negative_list)
    sentiment_means.append({" Media": org,
                    "Compound_Mean": np.mean(compound_list),
                    "Positive": np.mean(positive_list),
                    "Neutral": np.mean(neutral_list),
                    "Negative": np.mean(negative_list),
                    "Count": len(compound_list)
                    })

# Convert all_sentiments to DataFrame
all_sentiments_pd = pd.DataFrame.from_dict(all_sentiments)
all_sentiments_pd.to_csv("sentiments_array_pd.csv")
display(all_sentiments_pd)
#print(all_sentiments_pd.dtypes)

# Convert sentiment_means to DataFrame 
sentiment_means_pd = pd.DataFrame.from_dict(sentiment_means) 
display(sentiment_means_pd)
Media Compound Date Negative Neutral Positive Tweets_Ago
0 BBC 0.0000 Mon Jan 08 07:04:28 +0000 2018 0.000 1.000 0.000 1
1 BBC 0.5719 Mon Jan 08 07:04:27 +0000 2018 0.000 0.850 0.150 2
2 BBC -0.6597 Mon Jan 08 07:04:27 +0000 2018 0.306 0.694 0.000 3
3 BBC 0.7906 Mon Jan 08 07:04:26 +0000 2018 0.000 0.750 0.250 4
4 BBC 0.0000 Mon Jan 08 07:04:25 +0000 2018 0.000 1.000 0.000 5
5 BBC 0.5994 Mon Jan 08 07:04:25 +0000 2018 0.075 0.717 0.208 6
6 BBC -0.5423 Mon Jan 08 07:04:25 +0000 2018 0.218 0.691 0.091 7
7 BBC 0.5719 Mon Jan 08 07:04:24 +0000 2018 0.000 0.575 0.425 8
8 BBC 0.5719 Mon Jan 08 07:04:23 +0000 2018 0.000 0.850 0.150 9
9 BBC 0.6369 Mon Jan 08 07:04:23 +0000 2018 0.000 0.755 0.245 10
10 BBC 0.6369 Mon Jan 08 07:04:23 +0000 2018 0.000 0.802 0.198 11
11 BBC -0.4404 Mon Jan 08 07:04:22 +0000 2018 0.253 0.642 0.106 12
12 BBC 0.5106 Mon Jan 08 07:04:22 +0000 2018 0.092 0.683 0.225 13
13 BBC -0.4767 Mon Jan 08 07:04:22 +0000 2018 0.256 0.744 0.000 14
14 BBC 0.0000 Mon Jan 08 07:04:20 +0000 2018 0.000 1.000 0.000 15
15 BBC 0.5719 Mon Jan 08 07:04:20 +0000 2018 0.000 0.850 0.150 16
16 BBC -0.2263 Mon Jan 08 07:04:19 +0000 2018 0.087 0.913 0.000 17
17 BBC 0.0000 Mon Jan 08 07:04:19 +0000 2018 0.000 1.000 0.000 18
18 BBC 0.3612 Mon Jan 08 07:04:19 +0000 2018 0.000 0.898 0.102 19
19 BBC 0.0000 Mon Jan 08 07:04:18 +0000 2018 0.000 1.000 0.000 20
20 BBC 0.5719 Mon Jan 08 07:04:18 +0000 2018 0.000 0.850 0.150 21
21 BBC 0.5719 Mon Jan 08 07:04:18 +0000 2018 0.000 0.850 0.150 22
22 BBC -0.2617 Mon Jan 08 07:04:18 +0000 2018 0.127 0.785 0.088 23
23 BBC -0.6486 Mon Jan 08 07:04:18 +0000 2018 0.374 0.485 0.141 24
24 BBC -0.3595 Mon Jan 08 07:04:18 +0000 2018 0.161 0.839 0.000 25
25 BBC -0.2732 Mon Jan 08 07:04:17 +0000 2018 0.123 0.877 0.000 26
26 BBC 0.0000 Mon Jan 08 07:04:17 +0000 2018 0.000 1.000 0.000 27
27 BBC 0.0000 Mon Jan 08 07:04:16 +0000 2018 0.000 1.000 0.000 28
28 BBC 0.5267 Mon Jan 08 07:04:15 +0000 2018 0.094 0.644 0.262 29
29 BBC -0.1027 Mon Jan 08 07:04:15 +0000 2018 0.123 0.877 0.000 30
... ... ... ... ... ... ... ...
470 nytimes 0.5719 Mon Jan 08 07:03:39 +0000 2018 0.000 0.861 0.139 71
471 nytimes -0.8519 Mon Jan 08 07:03:39 +0000 2018 0.283 0.717 0.000 72
472 nytimes 0.2732 Mon Jan 08 07:03:39 +0000 2018 0.107 0.741 0.152 73
473 nytimes 0.2732 Mon Jan 08 07:03:39 +0000 2018 0.000 0.806 0.194 74
474 nytimes 0.0000 Mon Jan 08 07:03:39 +0000 2018 0.000 1.000 0.000 75
475 nytimes -0.4767 Mon Jan 08 07:03:38 +0000 2018 0.147 0.853 0.000 76
476 nytimes 0.0000 Mon Jan 08 07:03:38 +0000 2018 0.000 1.000 0.000 77
477 nytimes 0.0000 Mon Jan 08 07:03:37 +0000 2018 0.000 1.000 0.000 78
478 nytimes -0.4215 Mon Jan 08 07:03:37 +0000 2018 0.109 0.891 0.000 79
479 nytimes 0.0000 Mon Jan 08 07:03:36 +0000 2018 0.000 1.000 0.000 80
480 nytimes 0.0000 Mon Jan 08 07:03:35 +0000 2018 0.000 1.000 0.000 81
481 nytimes -0.1695 Mon Jan 08 07:03:35 +0000 2018 0.180 0.702 0.118 82
482 nytimes -0.3182 Mon Jan 08 07:03:35 +0000 2018 0.091 0.909 0.000 83
483 nytimes 0.4939 Mon Jan 08 07:03:35 +0000 2018 0.000 0.802 0.198 84
484 nytimes 0.0000 Mon Jan 08 07:03:35 +0000 2018 0.000 1.000 0.000 85
485 nytimes 0.0000 Mon Jan 08 07:03:33 +0000 2018 0.000 1.000 0.000 86
486 nytimes 0.0000 Mon Jan 08 07:03:31 +0000 2018 0.000 1.000 0.000 87
487 nytimes 0.3818 Mon Jan 08 07:03:31 +0000 2018 0.000 0.885 0.115 88
488 nytimes -0.5106 Mon Jan 08 07:03:31 +0000 2018 0.320 0.680 0.000 89
489 nytimes -0.5122 Mon Jan 08 07:03:31 +0000 2018 0.212 0.788 0.000 90
490 nytimes -0.4215 Mon Jan 08 07:03:30 +0000 2018 0.109 0.891 0.000 91
491 nytimes 0.0000 Mon Jan 08 07:03:30 +0000 2018 0.000 1.000 0.000 92
492 nytimes 0.5719 Mon Jan 08 07:03:30 +0000 2018 0.000 0.861 0.139 93
493 nytimes -0.5574 Mon Jan 08 07:03:29 +0000 2018 0.375 0.625 0.000 94
494 nytimes 0.0000 Mon Jan 08 07:03:29 +0000 2018 0.000 1.000 0.000 95
495 nytimes 0.0000 Mon Jan 08 07:03:28 +0000 2018 0.000 1.000 0.000 96
496 nytimes -0.0516 Mon Jan 08 07:03:28 +0000 2018 0.239 0.606 0.155 97
497 nytimes -0.4215 Mon Jan 08 07:03:27 +0000 2018 0.109 0.891 0.000 98
498 nytimes 0.0000 Mon Jan 08 07:03:27 +0000 2018 0.000 1.000 0.000 99
499 nytimes 0.4767 Mon Jan 08 07:03:25 +0000 2018 0.000 0.795 0.205 100

500 rows × 7 columns

Media Compound_Mean Count Negative Neutral Positive
0 BBC 0.083353 100 0.06990 0.84301 0.08707
1 CBS 0.061578 100 0.05553 0.86744 0.07701
2 CNN -0.107187 100 0.10897 0.82274 0.06833
3 FoxNews -0.028154 100 0.09015 0.82891 0.08093
4 nytimes 0.057799 100 0.05773 0.86923 0.07304
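As a sanity check on the manual accumulator lists above, the same per-source means can be derived directly from the tweet-level DataFrame with a single `groupby`. A sketch with tiny stand-in data, mirroring the notebook's column names (including the leading space in `" Media"`):

```python
import pandas as pd

# Stand-in tweet-level data; the real frame is all_sentiments_pd built above.
all_sentiments_pd = pd.DataFrame({
    " Media": ["BBC", "BBC", "CNN"],
    "Compound": [0.4, -0.2, -0.1],
    "Positive": [0.3, 0.0, 0.1],
    "Neutral": [0.7, 0.8, 0.8],
    "Negative": [0.0, 0.2, 0.1],
})

# One row of means per source, without any manual list bookkeeping
means = all_sentiments_pd.groupby(" Media")[
    ["Compound", "Positive", "Neutral", "Negative"]
].mean()
print(means)
```

Because each column is averaged under its own name, this approach also makes the kind of Neutral/Negative key swap that manual dict construction invites much harder to commit.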
# Create a scatterplot
all_sentiments_pd.set_index('Tweets_Ago', inplace=True)
all_sentiments_pd.groupby(' Media')['Compound'].plot(legend=True, marker = 'o', linewidth=0)

# Customize scatterplot features
plt.style.use('ggplot')
plt.axhline(c='k', alpha=0.2, linestyle='dashed')
plt.xlim(0, 101)
plt.ylim(-1, 1)
plt.xlabel("Tweets Ago", fontsize=15)
plt.ylabel("Tweet Polarity", fontsize=15)
plt.legend(loc=(1.0, 0.75),edgecolor='black')
plt.grid(True, ls='dashed')
plt.title("Sentiment Analysis per Media Source" + " "+ "(" + datetime.now().strftime('%m/%d/%Y') + ")")
plt.savefig("Sentiment Analysis of Media Tweets.png",bbox_inches='tight')
plt.show()

[Figure: Sentiment Analysis per Media Source scatterplot, saved as "Sentiment Analysis of Media Tweets.png"]

# Create a barplot
ax=sns.barplot(x=' Media', y='Compound_Mean', data=sentiment_means_pd)

# Customize barplot features
ax.set_xlabel('Media', fontsize=15)
ax.set_ylabel('Tweet Polarity', fontsize=15)
ax.set_title("Overall Media Sentiment based on Twitter"+ " "+ "(" + datetime.now().strftime('%m/%d/%Y') + ")")
ax.set_ylim(-0.12, 0.12)
ax.grid(True, ls='dashed')
ax.hlines(0, -1, 10, colors='k', alpha=0.4)
plt.savefig("Overall Sentiment based on Twitter.png")
plt.show()

[Figure: Overall Media Sentiment bar plot, saved as "Overall Sentiment based on Twitter.png"]
