Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post
Updated May 24, 2024 (HCL)
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
Process DynamoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
Apache Hudi examples designed to be run on AWS Glue via Glue jobs
An event-driven, serverless data processing pipeline that uses Amazon Comprehend and other AWS services to perform natural language processing and analysis on user-submitted text files.
This project aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.
End-to-End Data Engineering Projects
Sample code to collect Apache Iceberg metrics for table monitoring
A project that automates the process of collecting, exploring, optimizing, and visualizing data, as well as training Machine Learning models, using Amazon Web Services (AWS).
This project demonstrates a technology shift at an automobile firm to resolve the data engineering challenge of manual data operations. The AWS services used are an S3 bucket for lake storage of incoming batches, a Lambda Python script that automates the validation function call, and a Glue crawler that generates a relational table once testing succeeds.
Open innovation with 60-minute cloud experiments on AWS
Hackolade plugin for AWS Glue Data Catalog
Streamlit EDA Dashboard Powered by AWS Cloud
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
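The DynamoDB change-stream project listed above keeps an S3 copy of a collection current by replaying change events. As a rough illustration only (not that repository's code), here is a minimal Python sketch of the replay step. It assumes records in the DynamoDB Streams JSON shape (`eventName` of `INSERT`, `MODIFY`, or `REMOVE`, plus `dynamodb.Keys` and `dynamodb.NewImage`) and a stream view type that includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`). The helper names `to_plain` and `apply_stream_records` are hypothetical. The Glue job and the Iceberg table write are out of scope; an in-memory dict stands in for the table.

```python
from typing import Any, Dict, List, Tuple

Snapshot = Dict[Tuple[Tuple[str, Any], ...], Dict[str, Any]]


def to_plain(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed value (e.g. {"N": "42"}) to a plain Python value.

    Hypothetical helper covering only a subset of type tags (S, N, BOOL, NULL, M, L).
    """
    type_tag, value = next(iter(attr.items()))
    if type_tag == "S":
        return value
    if type_tag == "N":
        # Numbers arrive as strings; keep integers as int, everything else as float.
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: to_plain(v) for k, v in value.items()}
    if type_tag == "L":
        return [to_plain(v) for v in value]
    raise ValueError(f"unsupported type tag: {type_tag}")


def to_item(image: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a full DynamoDB image (attribute name -> typed value) to a plain dict."""
    return {name: to_plain(value) for name, value in image.items()}


def apply_stream_records(snapshot: Snapshot, records: List[Dict[str, Any]]) -> Snapshot:
    """Replay change records, in order, onto a keyed snapshot of the collection."""
    for record in records:
        ddb = record["dynamodb"]
        # Key the snapshot by the primary-key attributes, sorted for a stable tuple.
        key = tuple(sorted(to_item(ddb["Keys"]).items()))
        event = record["eventName"]
        if event in ("INSERT", "MODIFY"):
            # Upsert: the new image replaces whatever version we held before.
            snapshot[key] = to_item(ddb["NewImage"])
        elif event == "REMOVE":
            snapshot.pop(key, None)
    return snapshot


# Usage: insert two items, update one, delete the other.
records = [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "a"}},
                                         "NewImage": {"id": {"S": "a"}, "qty": {"N": "1"}}}},
    {"eventName": "MODIFY", "dynamodb": {"Keys": {"id": {"S": "a"}},
                                         "NewImage": {"id": {"S": "a"}, "qty": {"N": "5"}}}},
    {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "b"}},
                                         "NewImage": {"id": {"S": "b"}, "qty": {"N": "2"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "b"}}}},
]
snapshot = apply_stream_records({}, records)
print(snapshot)  # {(('id', 'a'),): {'id': 'a', 'qty': 5}}
```

Replaying records strictly in order is what makes the upsert-or-delete logic correct. In a real pipeline, the same per-key "latest state wins" rule would be expressed as a merge into the target table rather than as dict updates.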