Implement a Hive data warehouse to store meaningful data, and apply machine-learning techniques such as clustering or regression to business problems.
hadoop-cos (the CosN filesystem) integrates with big-data frameworks such as Apache Hadoop, Spark, and Tez, allowing data stored on Tencent Cloud COS to be read and written as if it were on HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.
Some simple, introductory projects based on Apache Hadoop, intended as guides to make the MapReduce model look less weird or boring.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
The implementation of Apache Spark (combine with PySpark, Jupyter Notebook) on top of Hadoop cluster using Docker
This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.
An example of installation Apache Spark on AWS
This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information retrieval.
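The core of such a MapReduce search engine is an inverted index: the map phase emits (term, document-id) pairs, and the reduce phase groups document ids by term. A minimal pure-Python sketch of those two phases (illustrative only, not the repository's actual code; document ids and texts are made up):

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for each word in the document."""
    for term in text.lower().split():
        yield term, doc_id

def reduce_phase(pairs):
    """Reduce: group document ids by term to build the inverted index."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return index

# Toy corpus standing in for the Wikipedia subset.
docs = {1: "hadoop mapreduce tutorial", 2: "spark and hadoop"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
# index["hadoop"] now maps to both documents: {1, 2}
```

In a real Hadoop job the grouping between map and reduce is done by the framework's shuffle phase rather than by an in-memory dict.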
In this project we will use Hadoop MapReduce to implement a very basic “Sentiment Analysis” using the review text in the Yelp Academic Dataset as training data.
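A very basic sentiment analysis over review text can be expressed as a word-count job: the mapper labels each word by its review's star rating and the reducer sums the counts, yielding per-word polarity statistics. A minimal pure-Python sketch of that idea (the labeling threshold and the tiny sample reviews are assumptions, not the project's actual data):

```python
from collections import Counter

def mapper(review_text, stars):
    """Map: tag each word with the review's polarity (>= 4 stars -> pos)."""
    label = "pos" if stars >= 4 else "neg"
    for word in review_text.lower().split():
        yield (word, label), 1

def reducer(pairs):
    """Reduce: sum counts per (word, label) to build a polarity table."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts

reviews = [("great food and great service", 5), ("terrible service", 1)]
pairs = [p for text, stars in reviews for p in mapper(text, stars)]
counts = reducer(pairs)
# counts[("great", "pos")] is 2; "service" appears under both labels
```

With Hadoop Streaming, the mapper and reducer would instead read tab-separated records from stdin and write them to stdout, but the logic is the same.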
The goal of this project is to learn data processing using Spark with practical examples on datasets and also apply programming with Scala.
This project aims to establish a data streaming pipeline with storage, processing, and visualization.
Kubernetes operator for managing the lifecycle of Apache Hadoop Yarn Tasks on Kubernetes.
Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification
Data Science Project - for 'Advanced Topics in Database Systems' M.Sc. Course ECE @ntua
Export Hadoop YARN (resource-manager) metrics in prometheus format
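The YARN ResourceManager exposes cluster metrics as JSON via its REST API (`/ws/v1/cluster/metrics`); an exporter reformats those numeric fields into Prometheus text exposition format. A minimal sketch of just the reformatting step, assuming a `yarn_` metric prefix and treating every metric as a gauge (the actual exporter's naming and type scheme may differ):

```python
import json

def to_prometheus(cluster_metrics: dict, prefix: str = "yarn") -> str:
    """Render numeric YARN cluster metrics as Prometheus text-format lines."""
    lines = []
    for name, value in sorted(cluster_metrics.items()):
        if isinstance(value, (int, float)):
            lines.append(f"# TYPE {prefix}_{name} gauge")
            lines.append(f"{prefix}_{name} {value}")
    return "\n".join(lines)

# Shape mirrors the ResourceManager's /ws/v1/cluster/metrics response
# (illustrative values, not real cluster output).
sample = json.loads('{"clusterMetrics": {"appsRunning": 3, "availableMB": 8192}}')
text = to_prometheus(sample["clusterMetrics"])
# text contains lines such as "yarn_appsRunning 3"
```

A full exporter would fetch the JSON over HTTP on each scrape and serve the rendered text on its own `/metrics` endpoint.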
Simplified Hadoop Setup and Configuration Automation
Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
MapReduce, Spark, Java, and Scala for Data Algorithms Book