spark-mllib

Here are 205 public repositories matching this topic...

FlightAnalysis

This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.

  • Updated Jul 28, 2023
  • Jupyter Notebook
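
The lambda architecture named in the FlightAnalysis description combines a batch view, recomputed over all historical data, with a speed view that is updated as new events arrive. The following is a minimal sketch of that idea in plain Python, not the repository's code: the flight records, the carrier codes, and the helper names `count_cancellations` and `query` are all hypothetical, and in-memory counters stand in for Spark batch jobs and Spark Streaming.

```python
from collections import Counter

# Hypothetical (carrier, cancelled) records; not data from the repository.
historical = [("AA", True), ("AA", False), ("DL", False), ("DL", True), ("DL", True)]
recent = [("AA", True), ("UA", True), ("DL", False)]


def count_cancellations(records):
    """Count cancelled flights per carrier."""
    return Counter(carrier for carrier, cancelled in records if cancelled)


# Batch layer: recomputed periodically over the full historical dataset.
batch_view = count_cancellations(historical)

# Speed layer: updated incrementally as events arrive after the last batch run.
speed_view = Counter()
for record in recent:
    speed_view.update(count_cancellations([record]))


def query(carrier):
    """Serving layer: merge the batch view with the real-time view."""
    return batch_view[carrier] + speed_view[carrier]


print(query("AA"))
```

Merging at query time lets the batch view stay simple and fully recomputable, while the speed view only has to cover events since the last batch run.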

This repository contains a web application connected to a product recommendation system built with MongoDB, PySpark, and Apache Kafka. The system was developed on the Amazon Review Data (2018) dataset, which comprises nearly 233.1 million records and occupies approximately 128 GB of storage.

  • Updated Jun 26, 2023
  • Jupyter Notebook
