feat: Add projection for batch exports on inserted_at #21839

Closed
wants to merge 4 commits into from


tomasfarias
Contributor

Problem

The most confusing thing for batch export users is data missing from their exports because it was ingested with a delay. A proper projection that sorts the data by COALESCE(inserted_at, _timestamp) would allow us to remove the timestamp bounds that exclude delayed events, thus solving the problem. It would also remove the need for the list of exceptions set via UNCONSTRAINED_TIMESTAMP_TEAM_IDS.
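To make the failure mode concrete, here is a minimal in-memory sketch of the two filters, not the actual export query. The `Event` class, the 4-day lookback, and the sample rows are all hypothetical; only the column names and the `COALESCE(inserted_at, _timestamp)` key come from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a row; field names mirror the columns in this PR."""
    uuid: str
    timestamp: datetime
    _timestamp: datetime
    inserted_at: Optional[datetime]


def export_key(e: Event) -> datetime:
    # Python equivalent of COALESCE(inserted_at, _timestamp).
    return e.inserted_at if e.inserted_at is not None else e._timestamp


def select_bounded(events: List[Event], start: datetime, end: datetime,
                   lookback: timedelta) -> List[str]:
    # Interval filter on the export key plus an extra bound on `timestamp`.
    # The lookback is an assumed value; any finite bound drops events whose
    # ingestion delay exceeds it.
    return [e.uuid for e in events
            if start <= export_key(e) < end
            and start - lookback <= e.timestamp < end]


def select_unbounded(events: List[Event], start: datetime, end: datetime) -> List[str]:
    # Interval filter on the export key only, returned in export-key order,
    # which is the order the proposed projection would store rows in.
    selected = [e for e in events if start <= export_key(e) < end]
    return [e.uuid for e in sorted(selected, key=export_key)]


# Hypothetical sample data: one on-time event, one ingested 11 days late,
# and one row without inserted_at that falls back to _timestamp.
START = datetime(2024, 5, 1, 10, 0)
END = datetime(2024, 5, 1, 11, 0)
EVENTS = [
    Event("on-time", datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 1),
          datetime(2024, 5, 1, 10, 1)),
    Event("delayed", datetime(2024, 4, 20, 10, 0), datetime(2024, 5, 1, 10, 30),
          datetime(2024, 5, 1, 10, 30)),
    Event("legacy", datetime(2024, 5, 1, 10, 10), datetime(2024, 5, 1, 10, 15), None),
]

bounded = select_bounded(EVENTS, START, END, lookback=timedelta(days=4))
unbounded = select_unbounded(EVENTS, START, END)
```

With this sample data, the bounded filter silently drops the delayed event, while the unbounded filter returns all three rows ordered by the export key.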

Changes

Adds the projection and materializes it for the last month. Materializing only the last month should be enough for ongoing real-time exports; we'll worry about backfills later.
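As a rough illustration of the two steps, the sketch below builds ClickHouse `ADD PROJECTION` and `MATERIALIZE PROJECTION ... IN PARTITION` statements as strings. It is not the migration in this PR: the table name, projection name, and partition value are hypothetical placeholders, and the `ORDER BY` uses only the expression named in the description. The column list is the one shown in the review comment below.

```python
from typing import List, Sequence

# Fixed column list, as in the review snippet of this PR.
COLUMNS = (
    "uuid", "event", "properties", "timestamp", "team_id", "distinct_id",
    "elements_chain", "created_at", "person_id", "inserted_at", "_timestamp",
    "person_created_at", "person_properties",
)


def projection_statements(table: str, projection: str,
                          columns: Sequence[str], partition: str) -> List[str]:
    """Return [add, materialize] statements for a sorted projection.

    `table`, `projection`, and `partition` are caller-supplied placeholders;
    none of them are taken from the real migration.
    """
    column_list = ",\n        ".join(columns)
    add = (
        f"ALTER TABLE {table} ADD PROJECTION IF NOT EXISTS {projection} (\n"
        f"    SELECT\n        {column_list}\n"
        f"    ORDER BY COALESCE(inserted_at, _timestamp)\n"
        f")"
    )
    # Materializing a single partition limits the rewrite to recent data.
    materialize = (
        f"ALTER TABLE {table} MATERIALIZE PROJECTION {projection} "
        f"IN PARTITION {partition}"
    )
    return [add, materialize]


# Hypothetical names; the partition value assumes monthly partitions.
statements = projection_statements(
    table="events",
    projection="hypothetical_inserted_at_projection",
    columns=COLUMNS,
    partition="202404",
)
```

Restricting `MATERIALIZE PROJECTION` to one partition is what keeps the initial cost small; older partitions would stay without the projection until a later backfill.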

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

Takes a second to create and materialize in ClickHouse cloud, what could go wrong?

Comment on lines 10 to 22
uuid,
event,
properties,
timestamp,
team_id,
distinct_id,
elements_chain,
created_at,
person_id,
inserted_at,
_timestamp,
person_created_at,
person_properties
Contributor Author

Selecting a fixed set of columns instead of * should make this lighter to materialize.

@tomasfarias tomasfarias force-pushed the feat/batch-exports-inserted_at-projection branch from 287c91c to 22339b4 Compare May 2, 2024 14:13
@tomasfarias tomasfarias closed this May 2, 2024
@tomasfarias
Contributor Author

tomasfarias commented May 2, 2024

Not going forward with this solution due to storage usage concerns and ClickHouse projections being unreliable. Exploring a different solution that involves writing the data somewhere else in a different order (which is basically what a projection is, but in a more fitting storage).
