Tweets Collection and Analysis Pipeline

This project implements a data pipeline using Docker. Tweets about sustainability are streamed via a Tweepy listener and stored in a MongoDB database. The ETL job performs live sentiment analysis (using VADER) on the stored tweets and loads them, together with their sentiment scores, into a PostgreSQL database. Finally, the tweets with the most positive sentiment are posted to Slack using a webhook.
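
The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the code in etl_job: the function names (vader_score, transform, most_positive) and the field names text and sentiment are assumptions made for the example. It relies on the vaderSentiment package, whose compound score ranges from -1 (most negative) to +1 (most positive).

```python
from typing import Callable, Dict, Iterable, List


def vader_score(text: str) -> float:
    """Return VADER's compound sentiment score (between -1 and 1) for a text."""
    # Imported lazily so the rest of the sketch can run without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # A real job would likely create the analyzer once and reuse it.
    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]


def transform(
    tweets: Iterable[Dict],
    score: Callable[[str], float] = vader_score,
) -> List[Dict]:
    """Attach a sentiment score to each tweet (assumed to carry a 'text' key)."""
    return [{"text": t["text"], "sentiment": score(t["text"])} for t in tweets]


def most_positive(rows: List[Dict]) -> Dict:
    """Pick the row with the highest sentiment score."""
    return max(rows, key=lambda row: row["sentiment"])
```

Passing the scoring function as a parameter keeps the transformation easy to test without MongoDB, PostgreSQL, or VADER installed; in the pipeline, the rows returned by transform would be the ones written to PostgreSQL.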

Docker Compose necessities

Setting up local environment variables

Twitter API Access (via https://developer.twitter.com/):
- TWITTER_API_KEY
- TWITTER_API_SECRET
- TWITTER_ACCESS_TOKEN
- TWITTER_ACCESS_TOKEN_SECRET
PostgreSQL Credentials for your database:
- POSTGRES_USER
- POSTGRES_PASSWORD
- POSTGRES_DB
Slack API Access (via https://api.slack.com/apps):
- SLACK_WEBHOOK (e.g. https://hooks.slack.com/services/...)
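
Before starting the containers, it can help to confirm that every variable listed above is set. The following is a small sketch of such a check; the helper name missing_variables is hypothetical and not part of the repository.

```python
import os
from typing import List, Mapping, Optional

# The variable names listed in this README.
REQUIRED_VARS = [
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "SLACK_WEBHOOK",
]


def missing_variables(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_variables() with no arguments inspects the current shell environment; an empty list means all eight variables are present.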

Changing streaming filter

The file get_tweets.py contains the Tweepy tweet listener. The topic filter is at the end of the file: stream.filter(track=['sustainable'], languages=['en']). To follow a different topic, change the terms in the track list.
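
As an alternative to editing the file for every topic change, the filter arguments could be built from an environment variable. This is only a sketch of that idea: the variable TRACK_TERMS and the helper filter_kwargs are assumptions and do not exist in the repository. The resulting dictionary uses the same track and languages arguments as the call above.

```python
import os
from typing import Dict, List, Mapping, Optional

DEFAULT_TERMS = ["sustainable"]


def filter_kwargs(environ: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    """Build the keyword arguments for stream.filter from TRACK_TERMS.

    TRACK_TERMS is a hypothetical comma-separated list, e.g. "solar,wind".
    """
    env = os.environ if environ is None else environ
    raw = env.get("TRACK_TERMS", "")
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    terms = [term.strip() for term in raw.split(",") if term.strip()]
    return {"track": terms or list(DEFAULT_TERMS), "languages": ["en"]}


# In get_tweets.py the hard-coded call could then become:
# stream.filter(**filter_kwargs())
```

If TRACK_TERMS is unset or contains no usable terms, the sketch falls back to the original filter, so the default behavior stays the same.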

mfriebel/tweet_bot
