Performance: ValueAggregator state capability #13019

Open
davecromberge opened this issue Apr 27, 2024 · 0 comments


Summary
Some value aggregators are resource-intensive when they perform pairwise merges during index creation.
This issue proposes enhancing the ValueAggregator interface so that implementers can keep intermediate state while merging aggregated records with raw values.

Why it is important
Some operations involving sketches perform poorly when intermediate bookkeeping structures are re-created for every merge operation. Using a Merge/Union object as state instead allows raw values to be merged more efficiently, with the final result produced only once, when it is stored in an index such as the StarTree. This should speed up resource-intensive segment merges and index creation in large-scale production clusters.

Proposal
Extend the existing interface with a state type parameter S:

public interface ValueAggregator<R, S, A>

A new method is added to realise the aggregate value A from S:

A getFinalAggregatedValue(S stateValue);

All existing methods that merge records should operate on the state S instead of the aggregated value A.
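A minimal sketch of how the proposed shape could look, under stated assumptions. The interface name StatefulValueAggregator, the method getInitialState and the DistinctValuesAggregator class are hypothetical and reduced to the merge path only; this is not the actual Pinot interface. A plain HashSet stands in for a sketch Union object, and an unmodifiable Set stands in for the compact value stored in the index.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified interface (not the real Pinot one).
// R = raw value type, S = intermediate state type, A = final aggregated type.
interface StatefulValueAggregator<R, S, A> {
  // Creates the state from the first raw value.
  S getInitialState(R rawValue);

  // Folds a raw value into the existing state.
  S applyRawValue(S state, R rawValue);

  // Folds an already aggregated value into the existing state.
  S applyAggregatedValue(S state, A aggregatedValue);

  // Realises the final aggregate from the state, as in the proposal.
  A getFinalAggregatedValue(S state);
}

// Toy implementation: the mutable HashSet plays the role of a sketch Union,
// the unmodifiable Set plays the role of the compact, stored aggregate.
final class DistinctValuesAggregator
    implements StatefulValueAggregator<String, Set<String>, Set<String>> {

  @Override
  public Set<String> getInitialState(String rawValue) {
    Set<String> state = new HashSet<>();
    state.add(rawValue);
    return state;
  }

  @Override
  public Set<String> applyRawValue(Set<String> state, String rawValue) {
    // The state is reused across merges; nothing is rebuilt per call.
    state.add(rawValue);
    return state;
  }

  @Override
  public Set<String> applyAggregatedValue(Set<String> state, Set<String> aggregatedValue) {
    state.addAll(aggregatedValue);
    return state;
  }

  @Override
  public Set<String> getFinalAggregatedValue(Set<String> state) {
    // The only place where the final, read-only value is materialised.
    return Collections.unmodifiableSet(new HashSet<>(state));
  }
}
```

With this shape a caller would hold one state object per aggregated record, call applyRawValue or applyAggregatedValue for every incoming row, and call getFinalAggregatedValue once before the record is written.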

Alternatives
In my branch I have kept the current interface structure but changed the aggregated type to an opaque Object. This lets me switch dynamically between the intermediate state and the final aggregate for sketches such as the ThetaSketch - see this file. This yields significant performance improvements over the current implementation.
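A rough illustration of the opaque Object idea, not the code from the branch. The class OpaqueDistinctAggregator and the method toFinalValue are hypothetical; a TreeSet stands in for the mutable union state and a sorted long[] stands in for the compact final sketch. The value passed around is an Object that may be in either form, and each method promotes it to the working state on demand.

```java
import java.util.TreeSet;

// Hypothetical toy: the aggregated value is an opaque Object that is either
// a mutable TreeSet<Long> (working state) or a sorted long[] (final, compact form).
final class OpaqueDistinctAggregator {

  // Promotes either representation to the mutable working state.
  @SuppressWarnings("unchecked")
  private static TreeSet<Long> toState(Object value) {
    if (value instanceof TreeSet) {
      return (TreeSet<Long>) value; // already state, reuse it as is
    }
    TreeSet<Long> state = new TreeSet<>();
    for (long v : (long[]) value) {
      state.add(v);
    }
    return state;
  }

  Object getInitialAggregatedValue(long rawValue) {
    TreeSet<Long> state = new TreeSet<>();
    state.add(rawValue);
    return state;
  }

  Object applyRawValue(Object value, long rawValue) {
    TreeSet<Long> state = toState(value);
    state.add(rawValue);
    return state; // stays in state form so the next merge is cheap
  }

  Object applyAggregatedValue(Object value, Object other) {
    TreeSet<Long> state = toState(value);
    state.addAll(toState(other));
    return state;
  }

  // Converts to the compact form only when the value has to be stored.
  long[] toFinalValue(Object value) {
    if (value instanceof long[]) {
      return (long[]) value;
    }
    return toState(value).stream().mapToLong(Long::longValue).toArray();
  }
}
```

The trade-off compared with a typed state parameter is that the compiler no longer distinguishes state from final value, so every method has to check the runtime type.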
