New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(query): Quantile query implementation using DDSketch #817
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HimaVarsha94 this looks like a great first step towards adding DDSketch.
I think we will need to implement the serialization, otherwise we could not transmit intermediate quantile results across nodes. We can discuss that offline....
@@ -324,6 +327,91 @@ final case class MaxHistogram(innerHist: MutableHistogram, max: Double) extends | |||
} | |||
} | |||
|
|||
final case class DDSHistogram(relativeAccuracy: Double) extends HistogramWithBuckets { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add ScalaDoc here with description.... what are the units or bounds of relativeAccuracy? what are typical values?
getValueAtQuantile(q, getCount) | ||
} | ||
|
||
def getValueAtQuantile(quantile: Double, count: Long): Double = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you had to implement this due to some reason you could not use the underlying DDSketch's own code? Please add comment
indexMapping.value(bin.getIndex) | ||
} | ||
|
||
def compare(other: DDSHistogram): Int = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to only compare logGamma and not the buckets? Not sure of the purpose, and why we return an Int (as opposed to Boolean)
def serialize(intoBuf: Option[MutableDirectBuffer] = None): MutableDirectBuffer = ??? | ||
|
||
val logGamma: Double = Math.log((1 + relativeAccuracy) / (1 - relativeAccuracy)) | ||
val maxIndexedValue = maxIndexableValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a self-reference. I see later there is a method named the same, but for clarify you should rename the method, or since this is just initialized once, you could probably move the init code to up here.
var tdig = TDigest.createArrayDigest(100) | ||
val row = new QuantileAggTransientRow() | ||
class QuantileHolder(var timestamp: Long = 0L, | ||
var h: bv.DDSHistogram = bv.DDSHistogram.empty(relativeAccuracy)) extends AggregateHolder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line up the columns?
The PR is WIP but sending out to get rough comments about the implementation.
TODO:
Pull Request checklist
Current behavior :
The current quantile query uses T-Digest.
New behavior :
The new quantile query implements DDSketch. The primary advantage is to guarantee relative accuracy.