# amazon-emr

Here are 31 public repositories matching this topic...

robertgv / Data_Lake_in_AWS

Udacity Data Engineering Nanodegree Program

udacity apache-spark amazon-emr udacity-nanodegree amazon-s3 udacity-data-engineer-nanodegree

Updated Jun 1, 2020
Python

Lostefra / SparkTemplate

A simple Java-Scala mixed project template for Apache Spark

java scala spark sbt intellij amazon-emr

Updated May 11, 2020
Scala

Sampsonyu / Data_Lake_with_Spark

Data Lake with Spark

aws spark amazon-emr python3 data-lake elt spark-sql amazon-s3

Updated Jun 9, 2021
Jupyter Notebook

Mohammed-siddiq / Page-Rank-In-Spark

PageRank implementation in Spark that ranks authors and venues by their publications in the DBLP dataset.

scala spark sbt amazon-emr pagerank-algorithm amazon-s3 dblp-dataset

Updated Apr 21, 2019
Scala

Faisal-AlDhuwayhi / Data-Lake

Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark

aws sql big-data spark amazon-emr pyspark data-engineering data-lake cloud-computing amazon-s3 etl-pipeline big-data-processing

Updated Dec 23, 2022
Python

tmusabbir / emr-with-custom-metrics

Amazon EMR Automatic Scaling using Custom Metrics

emr bigdata cloudwatch amazon-emr amazon-web-services emr-cluster

Updated Oct 2, 2020
Shell

cmeb45 / fuzzyjoin

amazon-emr aws-emr map-reduce mapreduce string-matching string-similarity

Updated Mar 26, 2016
Java

DarthVi / knn-ncc-spark

An implementation in Scala of kNN and NCC based on Spark

machine-learning scala spark amazon-emr knn ncc

Updated Oct 21, 2019
Scala

MrBenA / Udacity_Capstone-ETL_Pipeline

Udacity Data Engineering Capstone project

python apache-spark amazon-emr relational-databases amazon-web-services data-modeling amazon-redshift extract-transform-load amazon-s3

Updated Oct 11, 2021
Python

timchansdp / Churn-Prediction-with-PySpark

A machine learning model built with PySpark on Amazon EMR that helps a fictitious music streaming service predict customer churn from user click data.

big-data amazon-emr pyspark churn-prediction

Updated Dec 7, 2021
Jupyter Notebook

DeepHiveMind / Amazon-EMR-on-Amazon-EKS-Spark-job-with-AWS-Step-Functions

Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions

aws spark amazon-emr aws-step-functions amazon-eks

Updated Apr 4, 2021

esakik / data-engineering-essentials

Samples related to data engineering, e.g. spark, embulk, airflow, etc.

apache-spark protocol-buffers amazon-emr data-engineering digdag fluentd apache-beam embulk apache-avro mrjob apache-airflow cloud-dataflow apache-hadoop cloud-dataproc

Updated Dec 8, 2022
Python

aws-samples / amazon-emr-yarn-capacity-scheduler

Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads

amazon-emr aws-cloudformation apache-hadoop-yarn fair-scheduler fifo-scheduler capacity-scheduler

Updated Aug 11, 2022
Shell

jaceyca / Rankmaniac

Used Amazon Elastic MapReduce to find the top 20 nodes by PageRank in graphs with over 100,000 nodes. See http://courses.cms.caltech.edu/cs144/homeworks/rankmaniac.pdf

amazon-emr mapreduce

Updated Jul 26, 2020
Python

cameres / emr-spark-jupyter

📓 Repository/Tutorial for initializing a Jupyter Notebook and Spark cluster on Amazon EMR

emr tutorial spark jupyter cluster jupyter-notebook amazon-emr spark-clusters

Updated Dec 4, 2016
Python

garystafford / emr-superset-demo

Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.

aws superset amazon-emr apache-superset

Updated Dec 29, 2020
Python

build-on-aws / ci-cd-serverless-spark

Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.

aws spark apache-spark serverless amazon-emr github-actions

Updated Apr 7, 2023
Python

aws-samples / amazon-s3-access-points-for-cross-account-integration-samples

This repo provides cross-account integration code samples using Amazon S3 Access Points

amazon-emr amazon-s3 amazon-s3-access-points aws-cross-account-s3-integration

Updated Dec 28, 2021
Java

WorksApplications / ansible_aws_emr

Unofficial Ansible module for Amazon EMR

amazon-emr ansible-modules emr-management

Updated Feb 19, 2019
Python

awslabs / amazon-emr-vscode-toolkit

A VS Code Extension to make it easier to manage and develop Spark jobs on EMR

python apache-spark amazon-emr pyspark

Updated May 17, 2024
TypeScript
