1x-eng/aws_real_time_datawarehouse

Real-time data lakehouse in AWS - 100% serverless stack.

Prerequisites:

  • Have an AWS account.
  • Have an admin role created for Terraform.
  • Have Terraform installed (v0.12+ required).
  • On macOS, prepare the ~/.aws/credentials file to contain:
[default]
aws_access_key_id=<access_key_id>
aws_secret_access_key=<secret_access_key>
region=<aws_region>

[test-datawarehouse]
role_arn = <arn for aws account named test>
source_profile = default

[sandbox-datawarehouse]
role_arn = <arn for aws account named sandbox>
source_profile = default

PS: test and sandbox are two different accounts, used to create a logical separation. You don't need two accounts to get started, though; if you have just one account, make sure its settings in the credentials file above are adjusted accordingly.
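
A quick way to confirm the profiles resolve correctly before running Terraform is an STS call per profile. The snippet below is a sanity-check sketch, assuming Python 3 and boto3 are installed; it is not part of this repo.

import boto3

# Profile names match the ~/.aws/credentials example above.
for profile in ("test-datawarehouse", "sandbox-datawarehouse"):
    session = boto3.Session(profile_name=profile)
    identity = session.client("sts").get_caller_identity()
    print(profile, "->", identity["Account"], identity["Arn"])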

How to run?

  • With the prerequisites satisfied, run make tf/init from the root folder.
  • make tf/plan
  • make tf/apply

Components created

  1. AWS Kinesis Firehose - the event bus (see the producer sketch after this list).
  2. AWS Lambda - pre-processes aggregated data in Firehose (see the transform sketch below).
  3. AWS S3 - transient storage.
  4. AWS Glue Crawler - crawls the transient storage and catalogs the data arriving from Firehose.
  5. AWS CloudWatch Event - listens for successful completion of the crawler from step 4 and triggers the Lambda in step 6.
  6. AWS Lambda - triggers the Glue ETL (PySpark) job (see the trigger sketch below).
  7. AWS Glue ETL - PySpark ETL job; sink = the S3 data lake.
  8. AWS Glue Crawler - catalogs the post-ETL data.
  9. AWS S3 - the data lake.
  10. AWS Athena - enables SQL over the S3 data lake from step 9 (see the query sketch below).
  11. AWS X-Ray - for monitoring.
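
For step 1, events enter the pipeline through the Firehose delivery stream. A minimal producer sketch using boto3; the stream name is hypothetical and should match the one created by the Terraform stack:

import json
import boto3

firehose = boto3.client("firehose")

def send_event(event: dict, stream_name: str = "rt-dw-firehose") -> None:
    # Firehose expects raw bytes; a trailing newline keeps records
    # line-delimited once Firehose batches them into S3 objects.
    firehose.put_record(
        DeliveryStreamName=stream_name,  # hypothetical name
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )

send_event({"user_id": 42, "action": "page_view"})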
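
For step 2, a Firehose transformation Lambda receives base64-encoded records and must return each one with a recordId, a result, and re-encoded data. The enrichment below is a placeholder sketch of that standard contract, not the actual logic in this repo:

import base64
import json

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # placeholder pre-processing
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}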
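
For steps 5 and 6, the CloudWatch Event rule fires on the crawler's state change and invokes a Lambda that starts the Glue ETL job. A sketch, with a hypothetical job name:

import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Invoked by the CloudWatch Event rule on crawler success (step 5).
    run = glue.start_job_run(JobName="rt-dw-etl-job")  # hypothetical name
    return {"JobRunId": run["JobRunId"]}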
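
For step 10, once the second crawler has cataloged the post-ETL data, Athena can query the lake with plain SQL. A sketch via boto3; the database, table, and results-bucket names are hypothetical:

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT * FROM events LIMIT 10",  # hypothetical table
    QueryExecutionContext={"Database": "datalake_db"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Started Athena query:", response["QueryExecutionId"])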
