Implement a Hive data warehouse to store meaningful data, and apply machine-learning techniques such as clustering or regression to business problems.
hadoop-cos (the CosN filesystem) integrates with big-data frameworks such as Apache Hadoop, Spark, and Tez, allowing data stored on Tencent Cloud COS to be read and written as if it were on HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.
Some simple, introductory projects based on Apache Hadoop, intended as guides to make the MapReduce model look less weird or boring.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
The implementation of Apache Spark (combine with PySpark, Jupyter Notebook) on top of Hadoop cluster using Docker
This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.
An example of installation Apache Spark on AWS
This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information retrieval.
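The core of such a MapReduce search engine is an inverted index: the map phase emits (term, document-id) pairs, and the reduce phase groups document ids by term. A minimal pure-Python sketch of those two phases (illustrative only, not the repository's actual code; document ids and texts are made up):

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for each word in the document."""
    for term in text.lower().split():
        yield term, doc_id

def reduce_phase(pairs):
    """Reduce: group document ids by term to build the inverted index."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return index

# Toy corpus standing in for the Wikipedia subset.
docs = {1: "hadoop mapreduce tutorial", 2: "spark and hadoop"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
# index["hadoop"] now maps to both documents: {1, 2}
```

In a real Hadoop job the grouping between map and reduce is done by the framework's shuffle phase rather than by an in-memory dict.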
In this project we will use Hadoop MapReduce to implement a very basic “Sentiment Analysis” using the review text in the Yelp Academic Dataset as training data.
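A very basic sentiment analysis over review text can be expressed as a word-count job: the mapper labels each word by its review's star rating and the reducer sums the counts, yielding per-word polarity statistics. A minimal pure-Python sketch of that idea (the labeling threshold and the tiny sample reviews are assumptions, not the project's actual data):

```python
from collections import Counter

def mapper(review_text, stars):
    """Map: tag each word with the review's polarity (>= 4 stars -> pos)."""
    label = "pos" if stars >= 4 else "neg"
    for word in review_text.lower().split():
        yield (word, label), 1

def reducer(pairs):
    """Reduce: sum counts per (word, label) to build a polarity table."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts

reviews = [("great food and great service", 5), ("terrible service", 1)]
pairs = [p for text, stars in reviews for p in mapper(text, stars)]
counts = reducer(pairs)
# counts[("great", "pos")] is 2; "service" appears under both labels
```

With Hadoop Streaming, the mapper and reducer would instead read tab-separated records from stdin and write them to stdout, but the logic is the same.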
The goal of this project is to learn data processing using Spark with practical examples on datasets and also apply programming with Scala.
This project aims to establish a data streaming pipeline with storage, processing, and visualization.
Kubernetes operator for managing the lifecycle of Apache Hadoop Yarn Tasks on Kubernetes.
Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification
Data Science Project - for 'Advanced Topics in Database Systems' M.Sc. Course ECE @ntua
Export Hadoop YARN (resource-manager) metrics in prometheus format
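The YARN ResourceManager exposes cluster metrics as JSON via its REST API (`/ws/v1/cluster/metrics`); an exporter reformats those numeric fields into Prometheus text exposition format. A minimal sketch of just the reformatting step, assuming a `yarn_` metric prefix and treating every metric as a gauge (the actual exporter's naming and type scheme may differ):

```python
import json

def to_prometheus(cluster_metrics: dict, prefix: str = "yarn") -> str:
    """Render numeric YARN cluster metrics as Prometheus text-format lines."""
    lines = []
    for name, value in sorted(cluster_metrics.items()):
        if isinstance(value, (int, float)):
            lines.append(f"# TYPE {prefix}_{name} gauge")
            lines.append(f"{prefix}_{name} {value}")
    return "\n".join(lines)

# Shape mirrors the ResourceManager's /ws/v1/cluster/metrics response
# (illustrative values, not real cluster output).
sample = json.loads('{"clusterMetrics": {"appsRunning": 3, "availableMB": 8192}}')
text = to_prometheus(sample["clusterMetrics"])
# text contains lines such as "yarn_appsRunning 3"
```

A full exporter would fetch the JSON over HTTP on each scrape and serve the rendered text on its own `/metrics` endpoint.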
Simplified Hadoop Setup and Configuration Automation
Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
MapReduce, Spark, Java, and Scala for Data Algorithms Book