Udacity Data Engineering Nanodegree Program
Updated Jun 1, 2020 - Python
PageRank implementation in Spark that ranks authors and venues based on their publications in the DBLP dataset.
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
Amazon EMR Automatic Scaling using Custom Metrics
An implementation in Scala of kNN and NCC based on Spark
Udacity Data Engineering Capstone project
A model built with Amazon EMR and the machine learning techniques supported by PySpark to help a fictitious music streaming service predict customer churn from user click data.
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions
Samples related to data engineering, e.g. spark, embulk, airflow, etc.
Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads
Uses Amazon Elastic MapReduce to rank the top 20 nodes by PageRank in graphs with over 100,000 nodes: http://courses.cms.caltech.edu/cs144/homeworks/rankmaniac.pdf
📓 Repository/Tutorial for initializing Jupyter Notebook and Spark cluster on Amazon EMR
Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
This repo provides cross-account integration code samples using Amazon S3 Access Points
Unofficial Ansible module for Amazon EMR
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR