Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): Quantile query implementation using DDSketch #817

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

HimaVarsha94
Copy link
Contributor

@HimaVarsha94 HimaVarsha94 commented Jul 13, 2020

The PR is WIP but sending out to get rough comments about the implementation.
TODO:

  • Handle negative values with another sketch
  • Clean code

Pull Request checklist

  • The commit(s) message(s) follows the contribution guidelines ?
  • Tests for the changes have been added (for bug fixes / features) ?
  • Docs have been added / updated (for bug fixes / features) ?

Current behavior :
The current quantile query uses T-Digest.

New behavior :
The new quantile query implements DDSketch. The primary advantage is to guarantee relative accuracy.

@HimaVarsha94 HimaVarsha94 changed the title WIP: feat(query): Quantile query implementation using DDSketch feat(query): Quantile query implementation using DDSketch Jul 13, 2020
Copy link
Member

@velvia velvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HimaVarsha94 this looks like a great first step towards adding DDSketch.

I think we will need to implement the serialization, otherwise we could not transmit intermediate quantile results across nodes. We can discuss that offline....

@@ -324,6 +327,91 @@ final case class MaxHistogram(innerHist: MutableHistogram, max: Double) extends
}
}

final case class DDSHistogram(relativeAccuracy: Double) extends HistogramWithBuckets {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add ScalaDoc here with description.... what are the units or bounds of relativeAccuracy? what are typical values?

getValueAtQuantile(q, getCount)
}

def getValueAtQuantile(quantile: Double, count: Long): Double = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you had to implement this due to some reason you could not use the underlying DDSketch's own code? Please add comment

indexMapping.value(bin.getIndex)
}

def compare(other: DDSHistogram): Int = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to only compare logGamma and not the buckets? Not sure of the purpose, and why we return an Int (as opposed to Boolean)

def serialize(intoBuf: Option[MutableDirectBuffer] = None): MutableDirectBuffer = ???

val logGamma: Double = Math.log((1 + relativeAccuracy) / (1 - relativeAccuracy))
val maxIndexedValue = maxIndexableValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a self-reference. I see later there is a method named the same, but for clarify you should rename the method, or since this is just initialized once, you could probably move the init code to up here.

var tdig = TDigest.createArrayDigest(100)
val row = new QuantileAggTransientRow()
class QuantileHolder(var timestamp: Long = 0L,
var h: bv.DDSHistogram = bv.DDSHistogram.empty(relativeAccuracy)) extends AggregateHolder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line up the columns?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants