# datalake

Here are 224 public repositories matching this topic...

Team5-S20-Cohort / Project_5_My_first_Data_Lake

MSc Data Engineering project at Data ScienceTech Institute (DSTI)

python aws data big-data aws-s3 bigdata ci-cd data-engineering aws-ec2 datalake dataengineering covid19-data

Updated Mar 8, 2021
HTML

dbbatalha / human-resources-analytics

Repository for storing the project's code.

python docker machine-learning airflow minio datalake human-resources pycaret

Updated Dec 2, 2021
Python

jszafran / personal-aws-data-lake

Personal cloud-based (AWS) data lake for experimenting with cloud services.

aws data cloud etl terraform data-engineering datalake dataengineering

Updated Mar 6, 2022
HCL

MalondaClement / DataLake

DataLake project 💾

mysql python3 datalake

Updated Nov 6, 2022
Python

cirograu / cgs_tweets

Solution for fetching tweets with a given hashtag and storing them in Parquet format.

python docker airflow spark docker-compose docker-image datalake jypyternotebook

Updated Feb 14, 2023
Jupyter Notebook

islajd / DataLake

Serverless data lake solution using AWS serverless services, built with AWS S3, Glue, Athena, and Firehose.

nodejs aws s3 glue datalake firehose cdk

Updated Jun 15, 2022
JavaScript

dd-Splunk / splunk-datalake

How to combine SmartStore and Ingest Actions for a data lake use case.

splunk s3 ingest datalake smartstore

Updated Jan 15, 2024
Python

shubhambhardwaj007 / Ansible-Hadoop-JobTracker-Role

An Ansible role to configure and set up a Hadoop JobTracker node.

ansible ansible-playbook hadoop bigdata ansible-role hadoop-cluster ansible-roles elt ansible-galaxy datalake hadoop-mapreduce dataanalysis hadoop-clusters

Updated May 18, 2021
Jinja

snhaider9977 / Azure-Storage-Account-Size-Calculator

This script calculates the size of each folder within an Azure Storage container and summarizes the folder sizes. The results are exported to a CSV file and displayed in the console for easy reference.

powershell azure datalake storageaccount

Updated Jul 12, 2023
PowerShell

mxdara / Data-lake-with-pyspark-in-S3

I build an ETL pipeline that extracts data from S3, processes it using Spark, and loads it into a new S3 bucket as a set of dimensional tables.

s3-bucket pyspark datalake

Updated Jun 20, 2023
Python

KirillZhul / de-project-sprint-7

PySpark, DataLake

python airflow pyspark datalake

Updated Dec 19, 2023
Python

hbuddana / Azure_Data_Factory_COVID-19_Reporting

Data engineering project on COVID-19 reporting using Azure Data Factory, Databricks, and HDInsight: an end-to-end ETL pipeline plus a Power BI report dashboard.

sql azure databricks datalake

Updated Feb 20, 2024
Jupyter Notebook

pprzetacznik / datalake-aws

Sample data lake pipeline on AWS implemented using Terraform

python aws csv terraform parquet datalake

Updated Oct 26, 2023
HCL

donjude / data-lakes-with-spark

This project builds a data lake and an ETL pipeline in Spark that loads data from Amazon S3, processes it into analytics tables, and loads those tables back into S3.

python spark apache-spark hadoop ec2 s3 aws-cli hdfs mapreduce amazon-web-services datalake aws-athena spark-sql emr-cluster etl-pipeline

Updated Jun 15, 2021
Python

javi-domi / aws-datalake

Data lake on AWS.

python spark aws-lambda aws-s3 aws-emr datalake etl-pipeline

Updated Oct 18, 2022
Python

aboudnik / ariadne

A new data lake: virtual data platform, catalog, and resource-driven processing.

bigdata datalake spark-sql

Updated Jun 17, 2022
Java

mataram / genos

General cloud data lake platform.

datalake gobuffalo

Updated Oct 24, 2018
CSS

mathias-mike / Project-Sparkify

Big data solutions for Sparkify (an online music-streaming startup).

data spark etl analytics redshift datawarehouse datalake dataengineering

Updated Mar 5, 2022
Jupyter Notebook

epomatti / az-data-services

End-to-end scenario for Azure data services.

data azure terraform data-engineering databricks synapse datalake lake

Updated Nov 7, 2023
HCL

SEPHIRONOVA / Data_Engineering_Projects

Udacity Data Engineering Nanodegree

aws airflow etl data-engineering datalake

Updated Aug 3, 2021
Jupyter Notebook
