Refactor idea: Move queries of separate data artifacts into a single dbt model #277

Open
atvaccaro opened this issue Aug 31, 2023 · 0 comments

Currently, the generate_reports_data.py script queries several different tables (e.g. fct_monthly_reports_site_organization_gtfs_vendors and fct_daily_reports_site_organization_scheduled_service_summary). Their results are processed separately and only "joined" by being written into the same output folders. Rather than trying to combine these artifacts after the fact and/or layering validation (e.g. with Pydantic) on top of the existing queries, it should be possible to create a single dbt model whose grain is year-month-itp_id, so that rows map 1:1 to final report pages. BigQuery rows can contain JSON and arrays to represent the nested parts of this data.
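To make the proposed grain concrete, here is a small Python sketch of the reshaping such a model would do. In BigQuery this would more likely be a GROUP BY with ARRAY_AGG / ARRAY_AGG(STRUCT(...)); the Python only illustrates the target shape. All column names (year, month, itp_id, vendor_name, service_date, service_hours) and the function name build_report_rows are hypothetical and are not the actual schemas of the two tables above.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def build_report_rows(
    vendor_rows: Iterable[Mapping[str, Any]],
    service_rows: Iterable[Mapping[str, Any]],
) -> List[Dict[str, Any]]:
    """Collapse rows from two separate sources into one record per (year, month, itp_id).

    Hypothetical sketch: column names are assumptions, not the real table schemas.
    """
    reports: Dict[Tuple[int, int, int], Dict[str, Any]] = {}

    def report_for(year: int, month: int, itp_id: int) -> Dict[str, Any]:
        # Create the report record the first time a key is seen.
        key = (year, month, itp_id)
        if key not in reports:
            reports[key] = {
                "year": year,
                "month": month,
                "itp_id": itp_id,
                "vendors": [],        # array column, akin to ARRAY_AGG(vendor_name)
                "daily_service": [],  # array of records, akin to ARRAY_AGG(STRUCT(...))
            }
        return reports[key]

    # Monthly source: already at the year-month-itp_id grain.
    for row in vendor_rows:
        report_for(row["year"], row["month"], row["itp_id"])["vendors"].append(row["vendor_name"])

    # Daily source: roll each day up to its month and nest it as an array element.
    for row in service_rows:
        day = row["service_date"]
        report_for(day.year, day.month, row["itp_id"])["daily_service"].append(
            {"service_date": day.isoformat(), "service_hours": row["service_hours"]}
        )

    # One record per final report page, in a stable (year, month, itp_id) order.
    return [reports[key] for key in sorted(reports)]
```

Each returned record corresponds to exactly one report page, with the nested data carried inside it instead of in sibling files.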

If this model is implemented, the "data generation" script could consist of just querying this single model and writing a single artifact (with some additional fields, such as RT feed URLs, added post-query because they are more difficult to compute in BigQuery).
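A minimal sketch of what the slimmed-down script's write step might look like, assuming the query result is already available as an iterable of dict-like rows. The function name write_report_artifact, the JSON Lines output format, and passing the RT feed URLs in as a plain itp_id-to-list mapping are all assumptions made for illustration; how the real script would look up those URLs is not specified here.

```python
import json
from pathlib import Path
from typing import Any, Iterable, List, Mapping


def write_report_artifact(
    rows: Iterable[Mapping[str, Any]],
    rt_feed_urls: Mapping[int, List[str]],
    out_path: Path,
) -> int:
    """Write every row of the single report model to one JSON Lines file.

    Hypothetical sketch: rt_feed_urls stands in for the post-query enrichment
    step. Returns the number of rows written.
    """
    out_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_path.open("w", encoding="utf-8") as f:
        for row in rows:
            record = dict(row)  # copy so the query result itself is not mutated
            # Post-query field that is awkward to compute in BigQuery.
            record["rt_feed_urls"] = rt_feed_urls.get(record["itp_id"], [])
            f.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
    return count
```

Because every row already has the report-page grain, the script no longer has to coordinate several per-table outputs; it only enriches and serializes.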
