Big Data Processing Pipeline

This pipeline was implemented using Python, the Twitter API, Kafka, MongoDB, and Tableau. Refer to the report for further implementation details:
View Report

Architecture

Overview:

The Twitter API supplies the tweets to be processed
Kafka ingests the tweets and connects the other components of the pipeline
MongoDB stores the collected tweets for later analysis
Tableau creates visualizations from the stored tweets
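
The hand-off from Kafka to MongoDB described above can be sketched roughly as follows. This is a minimal illustration, not the code in twitterProducer.py or mongodbConsumer.py: the function names, the topic name `tweets`, the database and collection names, and the local server addresses are all assumptions. It also assumes the kafka-python and pymongo packages, which are imported only inside the functions that talk to real services.

```python
import json
from typing import Any, Dict, Iterable


def serialize_tweet(tweet: Dict[str, Any]) -> bytes:
    """Encode a tweet dict as UTF-8 JSON bytes, ready to send to Kafka."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def deserialize_tweet(payload: bytes) -> Dict[str, Any]:
    """Decode a Kafka message value back into a tweet dict."""
    return json.loads(payload.decode("utf-8"))


def store_tweets(payloads: Iterable[bytes], collection: Any) -> int:
    """Insert each decoded tweet with collection.insert_one; return the count."""
    stored = 0
    for payload in payloads:
        collection.insert_one(deserialize_tweet(payload))
        stored += 1
    return stored


def publish_tweets(tweets: Iterable[Dict[str, Any]],
                   topic: str = "tweets",               # assumed topic name
                   servers: str = "localhost:9092") -> None:  # assumed broker
    """Producer side: send each tweet to a Kafka topic (needs kafka-python)."""
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers)
    for tweet in tweets:
        producer.send(topic, serialize_tweet(tweet))
    producer.flush()


def run_consumer(topic: str = "tweets",                 # assumed topic name
                 servers: str = "localhost:9092",       # assumed broker
                 mongo_uri: str = "mongodb://localhost:27017") -> None:
    """Consumer side: read from Kafka and write to MongoDB (needs pymongo)."""
    from kafka import KafkaConsumer
    from pymongo import MongoClient

    # Database and collection names are placeholders for illustration.
    collection = MongoClient(mongo_uri)["twitter"]["tweets"]
    consumer = KafkaConsumer(topic, bootstrap_servers=servers)
    store_tweets((message.value for message in consumer), collection)
```

Keeping the encoding and storage helpers separate from the Kafka and MongoDB clients means they can be exercised with any object that exposes `insert_one`, without a running broker or database.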

Results:

Upon examining the visualizations, we see a relative concentration of tweets containing the COVID hashtag in the Americas, Europe, and Southern Asia. This seems to line up with expectations for areas that have both high adoption of Twitter and many COVID-19 cases. Further work is needed to validate this conclusion, though.
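
As a starting point for that validation, the geographic concentration could also be checked numerically rather than only by eye. The sketch below counts stored tweets carrying a given hashtag per country. It is not part of the repository: the function names are hypothetical, the field layout (`entities.hashtags[].text`, `place.country_code`) assumes tweet documents shaped like the Twitter API v1.1 tweet object, and the tag `covid` is an assumed spelling of the hashtag.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def has_hashtag(tweet: Dict[str, Any], tag: str) -> bool:
    """Return True if the tweet lists `tag` among its hashtags (case-insensitive)."""
    hashtags = (tweet.get("entities") or {}).get("hashtags") or []
    return any((h.get("text") or "").lower() == tag.lower() for h in hashtags)


def count_by_country(tweets: Iterable[Dict[str, Any]],
                     tag: str = "covid") -> Counter:
    """Count tweets containing `tag`, grouped by place.country_code.

    Tweets without place information are grouped under "unknown".
    """
    counts: Counter = Counter()
    for tweet in tweets:
        if not has_hashtag(tweet, tag):
            continue
        place = tweet.get("place") or {}
        counts[place.get("country_code") or "unknown"] += 1
    return counts
```

Raw counts like these would still need to be normalized, for example by Twitter usage per country, before they could support or contradict the pattern seen in the visualizations.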
