nickfboyd1/twitter-sentiment-analysis-etl

Overview

The goal of this post is to demonstrate how to use Twitter's API and Python's main natural-language processing (NLP) libraries to analyze the overall sentiment of recent tweets on any given subject.

Additionally, I'll show how to load this data into a SQL database or an S3 bucket on AWS. This type of ETL or data pipeline would likely be necessary if I wanted to continuously maintain and analyze a larger list of tweets over time, or in a professional setting where multiple analysts and engineers need access to the data.

The following topics are covered in the included Jupyter notebook file:

  • Extracting tweets from the Twitter API using a Python library called Tweepy
  • Transforming the tweets data into a pandas dataframe for ease of analysis
  • Cleaning and pre-processing unstructured, unlabeled text data
  • Loading the cleaned tweet dataframe into a SQL database or Amazon S3 bucket
  • Executing sentiment analysis using libraries like scikit-learn and transformers
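
To give a feel for the cleaning and loading steps listed above, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the cleaning rules, the `clean_tweet` and `load_tweets` helper names, and the `tweets` table schema are all assumptions made for illustration, and an in-memory SQLite database stands in for whatever SQL database the pipeline targets.

```python
import re
import sqlite3

# Illustrative patterns only; the notebook's real cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)  # crude: also drops apostrophes and non-ASCII letters
    return " ".join(text.split())       # collapse repeated whitespace


def load_tweets(conn: sqlite3.Connection, tweets: list[str]) -> int:
    """Insert raw and cleaned text into a hypothetical `tweets` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, raw_text TEXT, clean_text TEXT)"
    )
    rows = [(t, clean_tweet(t)) for t in tweets]
    conn.executemany(
        "INSERT INTO tweets (raw_text, clean_text) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real SQL database
    load_tweets(conn, ["RT @user: Loving the new #Python release!! https://t.co/abc123"])
    for row in conn.execute("SELECT clean_text FROM tweets"):
        print(row[0])
```

Storing the raw text next to the cleaned text is a deliberate choice in this sketch: it lets the cleaning rules change later without re-extracting tweets from the API.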