
Big Data Management project: data collection from a network of sensors was simulated (Kafka), and the data then had to be processed (Spark) and stored (CassandraDB) in a distributed and efficient way.

MRColorR/supreme-pancake


supreme-pancake

Repository for the Big Data Management project.

Three components were created in this project: a producer/data collector (Kafka), a distributed database (CassandraDB) and a consumer/data processor (Spark).
Data collection from a network of sensors was simulated, and the data then had to be processed and stored in a distributed and efficient way. The data collected (or generated) through Kafka was processed by Spark and saved in CassandraDB for long-term archiving.
ZeroTier was used to make the connection between the PCs simple and scalable.
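The producer, processor and storage flow described above can be illustrated with a minimal, hypothetical sketch. It uses only the Python standard library: plain functions stand in for the Kafka producer, the Spark job and the CassandraDB table, so no Kafka, Spark or Cassandra API appears here and this is not the project's actual code. The field names (`sensor_id`, `ts`, `temperature`), the JSON message format, the windowed average and the partition/clustering layout are all assumptions chosen for illustration.

```python
import json
import random
from collections import defaultdict
from statistics import mean


def simulate_readings(num_sensors, readings_per_sensor, seed=0):
    """Stand-in for the Kafka producer: emit JSON-encoded sensor readings."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    messages = []
    for ts in range(readings_per_sensor):
        for s in range(num_sensors):
            reading = {
                "sensor_id": f"sensor-{s}",  # hypothetical sensor naming
                "ts": ts,  # hypothetical integer time step
                "temperature": round(rng.uniform(15.0, 30.0), 2),
            }
            messages.append(json.dumps(reading))
    return messages


def aggregate(messages, window):
    """Stand-in for the Spark job: mean temperature per sensor per time window."""
    groups = defaultdict(list)
    for msg in messages:
        r = json.loads(msg)
        groups[(r["sensor_id"], r["ts"] // window)].append(r["temperature"])
    return [
        {"sensor_id": sid, "window_start": w * window,
         "avg_temperature": round(mean(values), 2)}
        for (sid, w), values in groups.items()
    ]


def store(rows):
    """Stand-in for CassandraDB: partition rows by sensor_id, sort by window_start."""
    table = defaultdict(list)
    for row in rows:
        table[row["sensor_id"]].append(row)
    for partition in table.values():
        partition.sort(key=lambda r: r["window_start"])
    return dict(table)


if __name__ == "__main__":
    messages = simulate_readings(num_sensors=3, readings_per_sensor=10)
    table = store(aggregate(messages, window=5))
    for sensor_id in sorted(table):
        print(sensor_id, [r["window_start"] for r in table[sensor_id]])
```

Grouping rows by `sensor_id` and keeping them sorted by `window_start` mirrors how a Cassandra table with a partition key and a clustering column lays out data. In a real deployment the aggregation would run in Spark (for example with Structured Streaming reading from a Kafka topic) and the rows would be written to Cassandra through a connector.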

  • Leave a star ⭐ if you like this project 🙂 Thank you!

What's inside

  • Kafka module
  • CassandraDB module
  • Spark module
  • Data cleaning scripts
  • Distributed job start and stop scripts
  • Project runme script
  • Project document with details
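The list does not say which checks the data cleaning scripts perform. As a hedged illustration only, a cleaning pass over JSON sensor readings might drop malformed messages, readings with missing or mistyped fields, implausible values and duplicates. The function name `clean`, the field names and the temperature range below are hypothetical and are not taken from the project.

```python
import json
import math

# Hypothetical plausibility bounds for a temperature reading.
TEMP_MIN, TEMP_MAX = -50.0, 60.0


def clean(raw_messages):
    """Return well-formed, in-range, de-duplicated readings (hypothetical rules)."""
    seen = set()
    cleaned = []
    for raw in raw_messages:
        try:
            r = json.loads(raw)
        except json.JSONDecodeError:
            continue  # drop malformed JSON
        if not isinstance(r, dict):
            continue  # drop messages that are not JSON objects
        sensor_id, ts, temp = r.get("sensor_id"), r.get("ts"), r.get("temperature")
        if not isinstance(sensor_id, str) or not isinstance(ts, int):
            continue  # drop readings with missing or mistyped keys
        if isinstance(temp, bool) or not isinstance(temp, (int, float)):
            continue  # drop missing or non-numeric temperatures
        if not math.isfinite(temp) or not TEMP_MIN <= temp <= TEMP_MAX:
            continue  # drop NaN/inf and implausible values
        if (sensor_id, ts) in seen:
            continue  # drop duplicate readings for the same sensor and time step
        seen.add((sensor_id, ts))
        cleaned.append(r)
    return cleaned


if __name__ == "__main__":
    raw = [
        '{"sensor_id": "sensor-0", "ts": 0, "temperature": 21.5}',
        'not json',
        '{"sensor_id": "sensor-0", "ts": 0, "temperature": 21.5}',
        '{"sensor_id": "sensor-1", "ts": 0, "temperature": 999}',
        '{"sensor_id": "sensor-2", "ts": 0}',
        '{"sensor_id": "sensor-3", "ts": 0, "temperature": NaN}',
    ]
    print(clean(raw))
```

Running the cleaning step before aggregation keeps bad readings from skewing averages and avoids storing the same measurement twice.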