Optimise clustering of event store #642

Open
cortadocodes opened this issue Apr 11, 2024 · 0 comments
Labels: decision needed, tech-debt

Feature request

Use Case

We need to decide which fields to cluster on in the BigQuery event store and whether to pull the event kind out as a column.

Current state

The event kind is stored inside the event JSON field. It can be queried there but not ordered by (I don't think we need to order by it). We're currently clustering on ["sender", "question_uuid"], in that order. Clustering is order-dependent: for a filter on a clustered field to take advantage of the clustering, the query must also filter on every higher-priority clustered field (those to its left).
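To make the left-to-right rule concrete, here is a minimal sketch in plain Python. It does not talk to BigQuery; it only models which leading clustered fields a given set of filters could make use of. The function name `usable_clustering_prefix` is made up for illustration, and the model is a simplification of what the query engine actually does.

```python
# Current clustering order, as stated in this issue.
CURRENT_CLUSTERING = ["sender", "question_uuid"]


def usable_clustering_prefix(clustering_fields, filtered_fields):
    """Return the leading clustered fields that a query's filters cover.

    Simplified model: walk the clustered fields from left to right and
    stop at the first one the query does not filter on. Anything after
    that point is treated as getting no benefit from clustering.
    """
    filtered = set(filtered_fields)
    prefix = []
    for field in clustering_fields:
        if field not in filtered:
            break
        prefix.append(field)
    return prefix


# Filtering on the first field alone still uses the clustering.
print(usable_clustering_prefix(CURRENT_CLUSTERING, ["sender"]))
# Filtering only on the second field skips the first, so nothing is usable.
print(usable_clustering_prefix(CURRENT_CLUSTERING, ["question_uuid"]))
# The order of the filters in the query does not matter, only the clustering order.
print(usable_clustering_prefix(CURRENT_CLUSTERING, ["question_uuid", "sender"]))
```

Under this model, a query that filters only on question_uuid gets nothing from the current clustering, which is the kind of case the ordering decision needs to account for.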

@thclark says: "We’d need to cluster on event_kind otherwise you’d have to process (for example) all the log rows every time you want to query for input or output values (remember it’s column based storage so the filters aren’t like conventional SQL, it’ll process all rows in order to apply a filter). Also, regardless of clustering I think (??) it may be more efficient to filter directly on a column than on a JSONField."
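For comparison, the two filtering styles mentioned in the quote might look roughly like the sketch below. Several things here are assumptions rather than facts about our store: the table name is a placeholder, the kind is assumed to live at the `$.kind` path inside the event JSON, and `event_kind` is the name the extracted column would have if we went that way. `JSON_VALUE` and `@kind` named parameters are standard BigQuery SQL.

```python
# Placeholder table name, not the real event store.
TABLE = "my_dataset.question_events"

# Option A: filter on the kind inside the JSON field (what is possible today).
# Assumes the kind is stored at the "$.kind" path of the event JSON.
JSON_FILTER_QUERY = (
    f"SELECT event FROM `{TABLE}` "
    "WHERE JSON_VALUE(event, '$.kind') = @kind"
)

# Option B: filter on a dedicated column. "event_kind" is hypothetical and
# would only exist if the event kind were pulled out as a field.
COLUMN_FILTER_QUERY = (
    f"SELECT event FROM `{TABLE}` "
    "WHERE event_kind = @kind"
)


def build_kind_query(use_column):
    """Return the SQL for filtering events by kind, for either storage option."""
    return COLUMN_FILTER_QUERY if use_column else JSON_FILTER_QUERY


print(build_kind_query(use_column=False))
print(build_kind_query(use_column=True))
```

Both queries take the kind as a query parameter rather than interpolating it into the SQL. Whether option B is actually cheaper would need checking against the bytes processed reported for each query on real data.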

Proposed Solution

Discuss and choose:

  • Whether to pull the event kind out as a field
  • The fields to cluster on and in what order
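To help compare candidates during the discussion, here is a rough sketch that renders the BigQuery DDL for a given clustering order. Everything specific in it is an assumption: the table name is a placeholder, the column types are guesses, `event_kind` only exists if we extract it, and the order shown is just one candidate, not a recommendation. The checks reflect my understanding that BigQuery allows at most four clustering columns and does not allow clustering on a JSON column, which would be worth confirming in the docs.

```python
# Guessed schema for illustration only. "event_kind" is hypothetical.
COLUMNS = {
    "sender": "STRING",
    "question_uuid": "STRING",
    "event_kind": "STRING",
    "event": "JSON",
}


def render_create_table(table, columns, clustering_fields):
    """Render a CREATE TABLE ... CLUSTER BY statement for one candidate order."""
    # Assumed BigQuery limit: between one and four clustering columns.
    if not 1 <= len(clustering_fields) <= 4:
        raise ValueError("Expected between 1 and 4 clustering fields.")

    for field in clustering_fields:
        if field not in columns:
            raise ValueError(f"Unknown column: {field}")
        # Assumed BigQuery restriction: JSON columns cannot be clustered on.
        if columns[field] == "JSON":
            raise ValueError(f"Cannot cluster on JSON column: {field}")

    column_sql = ", ".join(f"{name} {type_}" for name, type_ in columns.items())
    cluster_sql = ", ".join(clustering_fields)
    return f"CREATE TABLE `{table}` ({column_sql}) CLUSTER BY {cluster_sql}"


# One candidate order among several that could be discussed.
candidate = ["sender", "question_uuid", "event_kind"]
print(render_create_table("my_dataset.question_events", COLUMNS, candidate))
```

If the JSON restriction holds, clustering on the event kind is only possible once it has been pulled out as its own column, so the two decisions above are linked.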