aws-glue

Here are 194 public repositories matching this topic...

ricardolsmendes / aws-glue-ci-cd-blueprint

Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post

aws devops terraform ci-cd dataops infrastructure-as-code aws-glue iac-terraform

Updated May 24, 2024
HCL

aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Updated May 24, 2024
Python

data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws data-science data aws-s3 redshift etl-framework aws-glue aws-lake-formation lakehouse lakeformation

Updated May 24, 2024
Python

aws-samples / transactional-datalake-using-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS

apache-spark aws-athena aws-glue aws-dms apache-iceberg

Updated May 22, 2024
Python

cloudposse / terraform-aws-glue

Terraform modules for provisioning and managing AWS Glue resources

aws workflow etl glue aws-glue etl-job

Updated May 21, 2024
HCL

ev2900 / MongoDB_Streams_Glue_Iceberg

Process MongoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date

python glue mondodb mongodb-change-streams aws-glue apache-iceberg

Updated May 21, 2024
Python

ev2900 / Glue_Examples

PySpark code samples designed for AWS Glue

aws glue pyspark aws-glue

Updated May 21, 2024
Python

ev2900 / Glue_Hudi

Apache Hudi examples designed to be run on AWS Glue via Glue Jobs

aws glue aws-glue hudi apache-hudi hudi-examples

Updated May 21, 2024
Python

lindsaygelle / AWSComprehend

AWS Comprehend is an event-driven, serverless data processing pipeline that leverages AWS services to perform natural language processing and analysis on user-submitted text files.

aws serverless terraform aws-s3 aws-sqs aws-step-functions aws-glue terraform-aws aws-comprehend terraform-project aws-eventbridge

Updated May 21, 2024
HCL

mihirkudale / youtube-analysis-data-engineering-project

This project aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.

python aws aws-lambda aws-s3 data-engineering aws-athena aws-redshift aws-glue aws-quicksight youtube-analysis

Updated May 20, 2024
Python

ccao-data / model-sales-val

Heuristics for detecting outlier and non-arms-length sales

python model aws-s3 aws-glue

Updated May 16, 2024
Python

ritesh-ojha / Data-Engineering

End to End Data Engineering Projects

python docker airflow apache-spark aws-s3 data-engineering aws-ec2 apache-kafka aws-glue

Updated May 13, 2024
Python

aws-samples / monitoring-apache-iceberg-table-metadata-layer

Sample code to collect Apache Iceberg metrics for table monitoring

aws apache-spark monitoring aws-lambda aws-cloudwatch data-quality aws-glue sam-cli apache-iceberg pyiceberg

Updated May 12, 2024
Python

jrabuffetti / Taxis-en-NYC-Sostenibilidad-y-Eficiencia

A project in which we automate the collection, exploration, optimization, and visualization of data, as well as the training of machine learning models, using Amazon Web Services (AWS)

python aws aws-lambda aws-s3 scikit-learn pandas seaborn matplotlib aws-ec2 taxis aws-glue aws-sagemaker streamlit fastappi

Updated May 2, 2024
Jupyter Notebook

dashmug / glue-devtools

Glue Development Tools

python boilerplate pyspark aws-glue

Updated May 24, 2024
Python

shubhamjais40 / AWS-Data-Pipeline-Project-Implementing-Data-Validation-Using-Lambda-based-Gluecrawler-v1.0

This project demonstrates a technology shift at an automobile firm to resolve the data engineering challenge of manual data ops. It uses the following AWS services: an S3 bucket as lake storage for incoming batches, a Lambda Python script that automates the validation function call, and a Glue Crawler that generates a relational table after successful testing.

aws-lambda aws-s3 python3 aws-athena aws-glue data-pipeline-monitoring

Updated Apr 27, 2024
Python

aws-samples / cloud-experiments

Open innovation with 60-minute cloud experiments on AWS

data-science machine-learning notebooks amazon-rekognition amazon-s3 amazon-athena aws-cloud amazon-sagemaker amazon-comprehend aws-glue

Updated Apr 22, 2024
Jupyter Notebook

hackolade / glue

Hackolade plugin for AWS Glue Data Catalog

hive nosql glue data-catalog data-modeling data-models er-diagram schema-design entity-relationship-diagram aws-glue

Updated Apr 22, 2024
JavaScript

aws-samples / streamlit-application-deployment-on-aws

Streamlit EDA Dashboard Powered by AWS Cloud

aws aws-cognito aws-cloudformation aws-athena aws-glue aws-sagemaker streamlit-dashboard

Updated Apr 18, 2024
Python

DimaKuriptya / RedditETL

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

python docker redis airflow aws-s3 postgresql pandas celery aws-athena aws-redshift aws-glue

Updated Apr 11, 2024
Python
