Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCD Type 2 from periodic snapshots #2009

Open
plaflamme opened this issue Jan 23, 2024 · 2 comments
Open

SCD Type 2 from periodic snapshots #2009

plaflamme opened this issue Jan 23, 2024 · 2 comments
Labels
Improvement Improves existing functionality

Comments

@plaflamme
Copy link
Contributor

This PR: #1997 adds a new way of maintaining a SCD Type 2 model from detecting changes to the source table's columns.

This issue is for tracking the idea of extending this behaviour to build a SCD Type 2 from a table that contains periodic snapshots of the source data.

Imagine a source table that looks like this:

name,price
foo,20

And some periodic process that takes snapshot of this data and makes those snapshots available in another table, e.g.:

name,price,snapshot_date
foo,20,2024-01-01
foo,20,2024-01-02
foo,21,2024-01-03
foo,21,2024-01-04
food,21,2024-01-05

These snaphsots allow tracking the changes that were made to the individual rows (by comparing the values), but it also contains a timestamp that can be used to determine when those changes occured. As such, a SCD Type 2 dimension can be built from this data which might look like this:

name,price,valid_from,valid_to
foo,20,2024-01-01,2024-01-02
foo,21,2024-01-03,2024-01-04
food,21,2024-01-05,NULL

Ideally, the new SCD_TYPE_2_BY_COLUMN model kind would allow specifying a column (snapshot_date in this case) as the timestamp to use for determining when a row has changed instead of using execution_time.

@treysp treysp added the Improvement Improves existing functionality label Jan 24, 2024
@gavin-dawn-capital
Copy link

Adding a +1, would be ideal to have a 3rd model that combines the two SCD2 model types where I can specify the date (in this case a snapshot date or loaded at) and the columns to check for changes, when changes are detected the date column is used if not row is ignored

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Improvement Improves existing functionality
Projects
None yet
Development

No branches or pull requests

4 participants