Realtime social media data analytics with Apache Spark, Python, Kafka, Pandas, etc

ErwinPP/Realtime-Data-Analytics-Using-Spark

 
 

Realtime Data Analytics Using Apache Spark

Description

The project uses Apache Spark components (Spark SQL, Spark Streaming, MLlib) in two stages. A slow batch-processing path trains machine learning models on historical data, and a fast Spark Streaming path then applies the trained models to incoming data to predict new output.
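
The two-stage pattern described above can be illustrated without a Spark cluster. The following is a minimal pure-Python sketch, not the project's implementation: a simple word-count scorer stands in for an MLlib model, a Python generator stands in for a Spark Streaming source, and the function names and sample posts are hypothetical. It assumes Python 3.

```python
from collections import Counter, defaultdict


def train_batch(labeled_posts):
    """Slow path: count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in labeled_posts:
        word_counts[label].update(text.lower().split())
    return dict(word_counts)


def predict(model, text):
    """Score a post per label by summing the training counts of its words."""
    words = text.lower().split()
    scores = {
        label: sum(counts[w] for w in words)
        for label, counts in model.items()
    }
    # Sorting the labels first makes ties resolve alphabetically.
    return max(sorted(scores), key=lambda label: scores[label])


def apply_stream(model, stream):
    """Fast path: label each incoming post as it arrives."""
    for text in stream:
        yield text, predict(model, text)


# Hypothetical historical data, invented for illustration only.
history = [
    ("spark streaming is fast", "tech"),
    ("kafka moves messages", "tech"),
    ("the match ended in a draw", "sports"),
    ("great goal in the final match", "sports"),
]

model = train_batch(history)                      # batch step, run rarely
incoming = iter(["kafka and spark", "what a match"])
results = list(apply_stream(model, incoming))     # streaming step, run continuously
print(results)
# [('kafka and spark', 'tech'), ('what a match', 'sports')]
```

The point of the split is that training can take as long as it needs, while the per-record prediction stays cheap enough to keep up with the stream.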

Data MashUp

We combine historical and streaming data from several social media networks, collected through the APIs that each network provides.
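
A mashup of this kind usually needs each network's records mapped to one common schema before they can be combined. The sketch below shows one way this might look in plain Python. The raw record shapes, field names, and function names are assumptions made for illustration and do not reflect the actual API responses or the project's code.

```python
from datetime import datetime, timezone


def from_twitter(raw):
    # Assumed shape: {"user": str, "text": str, "ts": epoch seconds}
    return {
        "source": "twitter",
        "author": raw["user"],
        "text": raw["text"],
        "created_at": datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    }


def from_github(raw):
    # Assumed shape: {"login": str, "message": str, "created": ISO 8601 UTC string}
    created = datetime.strptime(raw["created"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "source": "github",
        "author": raw["login"],
        "text": raw["message"],
        "created_at": created.replace(tzinfo=timezone.utc),
    }


def mash_up(historical, streaming):
    """Merge normalized records from both paths, ordered by creation time."""
    return sorted(list(historical) + list(streaming), key=lambda r: r["created_at"])


# Hypothetical sample records.
historical = [
    from_twitter({"user": "alice", "text": "hello spark", "ts": 0}),
    from_twitter({"user": "bob", "text": "kafka time", "ts": 60}),
]
streaming = [
    from_github({"login": "carol", "message": "fix build", "created": "1970-01-01T00:00:30Z"}),
]

combined = mash_up(historical, streaming)
print([r["source"] for r in combined])
# ['twitter', 'github', 'twitter']
```

Normalizing timestamps to timezone-aware UTC values before merging avoids ordering errors when networks report time in different formats.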

Tools

  • Databricks Community Edition
  • Anaconda Python 2.7 Distro (Pandas, etc)
  • Apache Spark (Spark SQL, Spark Streaming, Spark MLlib, GraphX)
  • Apache Kafka (real-time distributed messaging system)
  • Persistent Data Store (RDBMS: MySQL, Columnar: CSV, Cassandra, Document: MongoDB)

Required Libraries

pip install Twitter
pip install PyGithub
pip install

Associated Project - R3levancy!

R3levancy! discovers what everyone is whispering about on social media. It is a tool for finding what is really trending across networks and for surfacing hot topics.

  • Delivers real-time news, events, and alerts tailored to users' needs and interests.
  • Searches Twitter, Facebook, and Google+ for keywords.
  • Batch-processes the results with Spark.
  • Presents results on web pages and sends alerts and push notifications to users.
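
The keyword search and alerting steps listed above can be sketched in a few lines of plain Python. This is an illustrative approximation only: the function names, the sample posts, and the mention threshold are hypothetical, matching is done on whitespace-separated words without punctuation handling, and the real project would run the counting step in Spark.

```python
from collections import Counter


def count_keyword_mentions(posts, keywords):
    """Count, per keyword, how many posts mention it at least once."""
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for post in posts:
        words = set(post.lower().split())
        counts.update(wanted & words)
    return counts


def trending_alerts(counts, threshold):
    """Build an alert message for every keyword at or above the threshold."""
    return [
        "'{}' is trending ({} mentions)".format(kw, n)
        for kw, n in counts.most_common()
        if n >= threshold
    ]


# Hypothetical posts gathered from several networks.
posts = [
    "Spark is great",
    "I love spark streaming",
    "Kafka rocks",
    "spark and kafka together",
]

counts = count_keyword_mentions(posts, ["spark", "kafka", "flink"])
alerts = trending_alerts(counts, threshold=3)
print(alerts)
# ["'spark' is trending (3 mentions)"]
```

Counting each keyword once per post, rather than once per occurrence, keeps a single repetitive post from triggering an alert on its own.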

Languages

  • Jupyter Notebook 100.0%