apache-hadoop

Here are 73 public repositories matching this topic...

This repository builds a basic search engine that uses Hadoop's MapReduce framework to index and process large text corpora efficiently. The dataset is a 5.2 GB subset of the English Wikipedia dump. The project implements a naive search algorithm to address challenges in information.

  • Updated Mar 31, 2024
  • Jupyter Notebook
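
As a rough illustration of the indexing idea described above, the following is a minimal single-process sketch of building an inverted index in map and reduce steps and answering a naive AND query. It is not taken from the repository: the function names (`map_phase`, `reduce_phase`, `search`), the tokenizer, and the sample documents are all hypothetical, and a real Hadoop job would distribute the map and reduce steps across a cluster rather than run them in one Python process.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # assumed tokenizer: lowercase alphanumeric runs


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in the document."""
    for term in set(TOKEN.findall(text.lower())):
        yield term, doc_id


def reduce_phase(pairs):
    """Group document ids by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Naive search: return ids of documents containing every query term."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample corpus standing in for the Wikipedia subset.
docs = {
    1: "Hadoop runs MapReduce jobs",
    2: "MapReduce builds an index",
    3: "A Wikipedia dump",
}
pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
index = reduce_phase(pairs)
```

With this sample corpus, `search(index, "mapreduce index")` matches only document 2, since it is the only document containing both terms.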

FlightAnalysis

This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.

  • Updated Jul 28, 2023
  • Jupyter Notebook