Terraform module to create AWS EMR resources 🇺🇦
Updated May 4, 2024 - HCL
Implements a random forest classifier using PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.
PySpark RDD and DataFrame Examples
A cloud-based Reddit stock sentiment analyzer that computes the overall sentiment for each stock across a configurable selection of stock subreddits. The architecture uses AWS MSK (Kafka), AWS EMR (PySpark) and AWS Lambda (Python 3) for scalability, and the OpenAI API for sentiment analysis through prompt engineering.
A scalable prototype of an image recognition engine deployed on AWS.
Realtime data pipeline
A CNN is deployed in AWS to extract image features in the context of distributed computing.
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.
Predicting customer churn for the music app, Sparkify, using PySpark on AWS EMR clusters
Define a big data architecture and perform distributed machine learning calculations on an EMR cluster using AWS
With this app, you can see what programming skills are most in-demand in the current job market.
Stand-alone Scala & Java tool to anonymize OOXML Documents (DOCX)
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
In this repo, I build a LogisticRegression prediction model with Dask and PySpark and initialize an AWS EMR cluster to run the entire pipeline.
Credit defaults cause large losses for banks and other credit lenders, and the success of the banking industry depends on the ability to understand risk. This project uses big data technologies such as MapReduce and HDFS, along with PySpark and AWS, to analyze credit history and predict defaults.
EMR + Hadoop to Redshift ELT workflow using the Spark steps API, orchestrated by Apache Airflow, which ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift.