Motivation
We completed the tech spikes around our data infrastructure. As an outcome, we're going to implement our ETL pipeline in DuckDB while maintaining our key constraints:
We can test everything end-to-end (pytest, duckdb, plots)
We can reduce requirements on servers/infra/ops by using an in-memory, embedded, on-disk DB (DuckDB)
We can continue doing distributed computing (ray)
The DB store can grow elastically with LVM, EFS, or Filestore
Outline
Our first goal is to take the current ETL workflow and update it end-to-end.
Shelved Deliverables
CLOSED TICKET - Add ETL checkpoint to enforce SLAs, and process data incrementally. #694
Reason: We're going to instead implement a build step that leverages a simple SQL strategy w/ temp tables, such that we can enforce SLAs in a clean manner.
Discovered some issues related to data fetching on the main branch. Because multiple things are rewritten in the ETL flow, I'll leave them here in case they got solved along the way; if not, this may be the place to solve them:
Columns don't match: "timestamp" and "tx_id" - solved if the file is deleted
Error on saving data - solved if the gql command is rerun
DoD
[First Deliverable - Update Ingestion + Load]
[Core System Updates]
[Update ETL Deliverables]
`etl_` view for ETL build steps. Downstream bronze and silver tables require data from both `live_` and `build_` tables. #810
[ETL CLI Deliverables]
`pdr analytics describe, query, validate, resume` CLI command #883
[Cleanup Deliverables]
`parquet_dir`: use `lake_dir` instead #770
[Ratchet Integration]
#1109
[Post-DuckDB Merge - Core Functionality]
[Post-DuckDB - Peripheral Functionality]
These are frozen. Do not start/complete until DuckDB review/work is complete.
`fill` becomes `insert`, `override` becomes `upsert`