New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add APPROX_HISTOGRAM_K Operation #735
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments. But the approach looks sound overall.
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala
Outdated
Show resolved
Hide resolved
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
Outdated
Show resolved
Hide resolved
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala
Outdated
Show resolved
Hide resolved
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala
Outdated
Show resolved
Hide resolved
ec04140
to
c7769f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a test to spark/test/FetcherTest.scala
Looks good otherwise. Mostly non-blocking comments.
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala
Outdated
Show resolved
Hide resolved
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala
Outdated
Show resolved
Hide resolved
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala
Outdated
Show resolved
Hide resolved
Appreciate you taking a look. Will address your comments and some from our internal repo shortly. |
Summary
Adds an APPROX_HISTOGRAM_K operation based on the FrequentItems Sketch:
https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
Why / Goal
The current histogram operation uses unbounded memory and isn't stable for production use cases.
Test Plan
We've end to end tested everything through backfills/CLI on the batch side on our end.
Checklist
Reviewers