Slowly changing dimensions using delta.io and Spark

dg-hub/spark-delta-scd


Spark Delta SCD

A Scala package for automating Slowly Changing Dimensions via the Delta Lake storage format.

See the Delta Lake website for more information: https://delta.io/

License

MIT - Copyright (c) 2019 Daniel G - https://github.com/dg-hub

Features

  • createDeltaTable() - Creates a target Delta table with attribute columns derived from a DataFrame
  • optimise() - Compacts the target Delta table's data into a single file (numFiles = 1)
  • execute() - Executes an SCD merge into the target path
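To illustrate the Type 2 semantics that an SCD merge like execute() applies (expire changed rows, insert new versions), here is a plain-Scala sketch with no Spark dependency. The record shape and column names (isCurrent, startDate, endDate) are illustrative assumptions, not the package's actual schema:

```scala
// Illustrative SCD Type 2 record; field names are assumptions, not the
// package's actual attribute columns.
case class Scd2Row(id: Int, value: String, isCurrent: Boolean,
                   startDate: String, endDate: Option[String])

// Merge a batch of updates (id -> new value) into a target history.
def scd2Merge(target: Seq[Scd2Row], updates: Map[Int, String],
              today: String): Seq[Scd2Row] = {
  // 1. Expire current rows whose value has changed in the update batch.
  val expired = target.map { r =>
    if (r.isCurrent && updates.get(r.id).exists(_ != r.value))
      r.copy(isCurrent = false, endDate = Some(today))
    else r
  }
  // 2. Insert a new current row for each new key or changed value.
  val current = target.filter(_.isCurrent).map(r => r.id -> r.value).toMap
  val inserts = updates.collect {
    case (id, v) if current.get(id).forall(_ != v) =>
      Scd2Row(id, v, isCurrent = true, startDate = today, endDate = None)
  }
  expired ++ inserts
}
```

The package performs the equivalent operation as a Delta Lake merge on the target path, which is what makes the history queryable and ACID-safe.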

Feature requests

Please use GitHub issues to request new features:

https://github.com/dg-hub/spark-delta-scd/issues

Release Notes

Version 1.8 (December 24, 2019) - Adds initial code to build and execute a merge into the target

Maven Dependency

<dependency>
  <groupId>nz.co.glidden</groupId>
  <artifactId>spark-delta-scd</artifactId>
  <version>1.8</version>
</dependency>
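For sbt builds, the equivalent dependency line would be (assuming the artifact is published to Maven Central, as the Usage section suggests):

```scala
libraryDependencies += "nz.co.glidden" % "spark-delta-scd" % "1.8"
```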

Usage

Include the library from Maven Central:
bash> spark-shell --packages nz.co.glidden:spark-delta-scd:1.8
Create a Target Delta Table
scala> Scd2.createDeltaTable(source_dataframe,"/tmp/delta-target-location",true)
Execute SCD Merge into Target Delta Table
scala> Scd2.execute(updates_dataframe,"/tmp/delta-target-location",Seq("id"),true)
Read the target Table
scala> spark.read.format("delta").load("/tmp/delta-target-location").show()
