[WIP][CHIP] compile.py improvements: object linking #700

Open
cristianfr opened this issue Feb 28, 2024 · 0 comments

Problem Statement

Our compile API tends to limit its scope to the conf being compiled. This has the negative effect of encouraging append-only materialized configs, and it makes deprecating group bys quite a challenge. Deleting a group by can affect joins that are already materialized, and since users are expected to handle deletion themselves without going through the API, it is prone to mistakes.

Similarly, when users build a new group by on a source already used by other group bys, they commonly need similar job tuning settings (such as a minimum number of cores so that the streaming job processes all partitions). Pointing users to similar group bys during compile could avoid duplicate group bys, and could even improve feature discoverability.

Requirements

  • A delete mode for group bys, so that uploads and streaming jobs whose output is no longer consumed stop running.
  • A simple reference to similar group bys (i.e. group bys on the same source table and the same key) when developing a group by.

Verification

  • Unit tests

Approach

  • One approach is to compile the state of the production folder before compiling the conf in question.
  • This makes it simple to point out similar group bys by linking objects together during compilation. It could also be the basis for an explore.py-like API, or even the backend for a Python API discoverability UI. Linking objects would also make the delete mode fairly trivial. This is probably worth another CHIP, but a compiler that links objects would allow decomposing a feature name into its first principles.
  • Alternative approaches - and the reason for discarding those approaches.
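
To make the linking idea concrete, here is a minimal sketch of what an object index built from the production folder could look like. Everything in it is an assumption for illustration: `GroupBy`, `Join`, `ConfIndex` and their fields are hypothetical stand-ins, not the actual compiled objects or any existing compile.py code. It links group bys by source table and key (for the "similar group bys" hint) and links joins to the group bys they consume (for the delete check).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupBy:
    # Hypothetical, simplified view of a compiled group by.
    name: str
    source_table: str
    keys: tuple


@dataclass(frozen=True)
class Join:
    # Hypothetical, simplified view of a compiled join.
    name: str
    group_bys: tuple  # names of the group bys this join consumes


class ConfIndex:
    """Links already-compiled objects so compile can reason across confs."""

    def __init__(self, group_bys, joins):
        # (source table, sorted keys) -> names of group bys sharing them
        self.by_source_and_keys = defaultdict(list)
        for gb in group_bys:
            self.by_source_and_keys[self._link(gb)].append(gb.name)
        # group by name -> names of joins that consume it
        self.consumers = defaultdict(list)
        for join in joins:
            for gb_name in join.group_bys:
                self.consumers[gb_name].append(join.name)

    @staticmethod
    def _link(gb):
        # Sort keys so that key order does not affect the match.
        return (gb.source_table, tuple(sorted(gb.keys)))

    def similar_to(self, gb):
        """Existing group bys on the same source table and the same keys."""
        names = self.by_source_and_keys.get(self._link(gb), [])
        return [name for name in names if name != gb.name]

    def blockers_for_delete(self, gb_name):
        """Joins that still consume the group by; empty means safe to delete."""
        return list(self.consumers.get(gb_name, []))
```

With an index like this, compiling a new group by could print the result of `similar_to` as a hint, and a delete mode could refuse to proceed while `blockers_for_delete` is non-empty.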

User API (when required)

  • To delete: compile.py --conf production/group_bys/... --delete
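
As a rough illustration of how the flag could behave, the sketch below parses `--conf` and `--delete` with `argparse` and refuses to delete while joins still consume the group by. Only the two flag names come from this proposal; `build_parser`, `resolve_action` and the `consuming_joins` argument are hypothetical, and the real compile.py may parse its arguments differently.

```python
import argparse


def build_parser():
    # Stand-alone sketch; not the actual compile.py entry point.
    parser = argparse.ArgumentParser(prog="compile.py")
    parser.add_argument("--conf", required=True, help="path to the conf to compile")
    parser.add_argument(
        "--delete",
        action="store_true",
        help="delete the conf instead of compiling it",
    )
    return parser


def resolve_action(args, consuming_joins):
    """Decide what to do with the conf.

    consuming_joins is assumed to come from linking the production folder:
    the names of joins that still reference this group by.
    """
    if not args.delete:
        return "compile"
    if consuming_joins:
        joins = ", ".join(consuming_joins)
        raise SystemExit(f"cannot delete {args.conf}: still used by {joins}")
    return "delete"
```

Failing loudly when a consumer still exists is one possible design; an alternative would be to list the affected joins and ask for confirmation.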

Planning

TBD

@cristianfr cristianfr changed the title [WIP][CHIP] compile.py [delete mode + similar group bys] [WIP][CHIP] compile.py improvements: object linking Feb 28, 2024