hadoop-mapreduce

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

python big-data spark apache-spark hadoop etl xml python3 xml-parsing pyspark data-pipeline datalake hadoop-mapreduce spark-sql etl-framework hadoop-hdfs etl-pipeline etl-components

Updated May 6, 2023
Python

jmaister / wordcount

Hadoop MapReduce word counting with Java

java maven wordcount hadoop-mapreduce

Updated Feb 13, 2020
Java

arshdeepbahga / cloud-computing-solutions-architect-book-code

Source code for the examples in the book Cloud Computing Solutions Architect: A Hands-On Approach by Arshdeep Bahga and Vijay Madisetti

Updated Jul 5, 2019
CSS

seraogianluca / k-means-mapreduce

K-Means algorithm implementation with Hadoop and Spark for the Cloud Computing course of the MSc AIDE at the University of Pisa.

spark hadoop machine-learning-algorithms iteration clustering-algorithm hadoop-mapreduce kmeans-algorithm k-means-clustering centroids-initialization

Updated Oct 16, 2020
Java

Keerthivasan13 / CSCI572-Information_Retrieval_And_Web_Search_Engines

Search Engine projects

Updated May 20, 2020
Java

benedekh / bigdata-projects

Student projects in the Big Data field.

big-data spark apache-spark hadoop bigdata mapreduce hadoop-mapreduce

Updated Mar 18, 2024
Java

caizkun / mapreduce-examples

A collection of MapReduce problems and solutions

mapreduce hadoop-mapreduce

Updated Oct 9, 2017
Java

pfisterer / apache-hadoop-helm

Helm chart for Apache Hadoop using multi-arch docker images

docker kubernetes hadoop helm hadoop-filesystem hadoop-mapreduce hadoop-hdfs helm-chart

Updated Mar 21, 2022
Dockerfile

BGI-flexlab / SOAPgaea

spark hadoop variant-calling hadoop-mapreduce bioinfo

Updated Dec 2, 2022
Java

MoustafaAMahmoud / BigDataInDepth

Data Engineering Course

distributed-systems scala kafka spark hadoop dwh hadoop-mapreduce distrubted-systems

Updated May 5, 2024
TeX

absnaik810 / CloudComputing

Projects done in the Cloud Computing course.

hadoop nosql pagerank hbase hdfs inverted-index hadoop-mapreduce

Updated May 23, 2018
Java

Christianivh / data_repo

Data repository

python spark hadoop jupyter-notebook hdfs hadoop-mapreduce

Updated Apr 7, 2024
Jupyter Notebook

thedatasociety / lab-hadoop

hive hadoop hbase flume sqoop hadoop-mapreduce hadoop-streaming mrjob hadoop-hdfs hadoop-yarn

Updated Jan 19, 2024
PLpgSQL

sloopstash / kickstart-hadoop

Our Hadoop starter-kit repository contains the Hadoop configurations, OCI image templates, Kubernetes YAML templates, AWS CloudFormation templates, Chef cookbooks, and shell scripts needed to automate and run Hadoop cluster nodes as both containerized and non-containerized workloads.

Updated Aug 13, 2021
Ruby

drexly / movie140reviewcorpus

Raw per-movie rating data for Spark, covering the movies with 140-character reviews out of 164,397 Naver Movie entries

data-science natural-language-processing database big-data spark corpus movie-database korean movie-reviews hadoop-mapreduce python-crawler

Updated Oct 26, 2017

prabaprakash / Hadoop-2.3

Hadoop 2.3 for Windows x64

java hadoop hadoop-mapreduce

Updated Sep 24, 2014
CSS

Here are 767 public repositories matching this topic...

mahmoudparsian / data-algorithms-book

anjalysam / Hadoop

bytedance / CloudShuffleService

maniram-yadav / Big_DataHadoop_Projects

vim89 / datapipelines-essentials-python

jmaister / wordcount

arshdeepbahga / cloud-computing-solutions-architect-book-code

seraogianluca / k-means-mapreduce

Keerthivasan13 / CSCI572-Information_Retrieval_And_Web_Search_Engines

benedekh / bigdata-projects

caizkun / mapreduce-examples

pfisterer / apache-hadoop-helm

BGI-flexlab / SOAPgaea

MoustafaAMahmoud / BigDataInDepth

absnaik810 / CloudComputing

Christianivh / data_repo

thedatasociety / lab-hadoop

sloopstash / kickstart-hadoop

drexly / movie140reviewcorpus

prabaprakash / Hadoop-2.3
