
feat(query): Adding quota parameters for individual query limits and further reporting. #1548

Merged
merged 3 commits into filodb:develop on Apr 7, 2023

Conversation

@kvpetrov (Contributor) commented on Apr 1, 2023:

Adding quota parameters for individual query limits and further reporting.

  • The commit message(s) follow the contribution guidelines?
  • Tests for the changes have been added (for bug fixes / features)?
  • Docs have been added / updated (for bug fixes / features)?

Current behavior:
Currently, some limits are applied during query processing; however, they are either hard-coded defaults in the Scala code or FiloDB runtime properties.

New behavior:
We want to be able to pass these limits as part of the query, via PlannerParams.
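For illustration, a hedged sketch of what passing per-query limits through PlannerParams might look like; the names warnQuota, enforcedQuota, IndividualQuota and plannerParams are taken from this PR's diff and review comments, while the import path and exact signatures are assumptions that may differ from the merged code.

import filodb.core.query.{IndividualQuota, PlannerParams, QueryContext}

// Sketch only: carry both warn-level and enforced per-query limits on the
// query itself instead of relying on global defaults or runtime properties.
val params = PlannerParams(
  warnQuota     = IndividualQuota(warnDefaults = true),       // breach => report/log only
  enforcedQuota = IndividualQuota(execPlanSamples = 2000000)  // breach => fail the query
)
val queryContext = QueryContext(plannerParams = params)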

BREAKING CHANGES
The protos defining PlannerParams are redefined in a way that is not backward compatible. This should be acceptable, since our Scala POJOs are normally not backward compatible either.

Other information:

@alextheimer (Contributor) previously approved these changes on Apr 3, 2023 and left a comment:


Gotta love those ExecPlan::printTree comparison tests 😛
Approved-- please consider some of the comments I left before you merge.

Comment on lines 25 to 46
case class IndividualQuota(
  execPlanSamples: Int = 1000000,               // Limit on ExecPlan results in samples, default is 1M
  execPlanResultBytes: Long = 18000000,         // Limit on ExecPlan results in bytes, default is 18MB
  groupByCardinality: Int = 100000,             // Limit on "group by" clause results, default is 100K
  joinQueryCardinality: Int = 100000,           // Limit on binary join results, default is 100K
  timeSeriesSamplesScanBytes: Long = 300000000) // Limit on max data scanned per shard, default is 300MB

object IndividualQuota {
  // Lower defaults used for the warn-level quota; the no-arg defaults above are the enforced quota.
  def apply(warnDefaults: Boolean): IndividualQuota = {
    if (warnDefaults) {
      IndividualQuota(
        execPlanSamples = 50000,
        execPlanResultBytes = 15000000,
        groupByCardinality = 50000,
        joinQueryCardinality = 50000,
        timeSeriesSamplesScanBytes = 150000
      )
    } else {
      IndividualQuota()
    }
  }
}
Contributor:

Are these separate "warning" thresholds needed? Is it sufficient to warn at X% of the limit?

If the separate thresholds are needed, a Threshold(warning, limit) class is another option to consider (this way we wouldn't need something like apply(warnDefaults)).
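As a minimal sketch of that Threshold(warning, limit) alternative (the Threshold and QueryThresholds names are hypothetical; the numbers are simply the warn/enforce defaults quoted in the hunk above):

// Hypothetical alternative: pair each warn threshold with its enforced limit,
// so a single object replaces the two separate IndividualQuota instances.
final case class Threshold(warning: Long, limit: Long) {
  require(warning <= limit, "warning threshold must not exceed the enforced limit")
}

final case class QueryThresholds(
  execPlanSamples: Threshold            = Threshold(50000, 1000000),
  execPlanResultBytes: Threshold        = Threshold(15000000, 18000000),
  groupByCardinality: Threshold         = Threshold(50000, 100000),
  joinQueryCardinality: Threshold       = Threshold(50000, 100000),
  timeSeriesSamplesScanBytes: Threshold = Threshold(150000, 300000000)
)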

Contributor:

How did we come up with these warn thresholds? Isn't 150 KB too low for a warn threshold when the absolute limit is 300 MB? I like @alextheimer's idea of using a static % as the warn limit.

Author (@kvpetrov):

I'm only copying whatever was already in the existing properties. This PR essentially groups the limits into an object, allows two sets of quotas/limits, (1) enforced and (2) warn, and allows these parameters to be passed from the QueryService, but it DOES NOT change the existing behavior.

Author (@kvpetrov):

The warning piece is fundamental to this PR. The reason we do not enforce the limits (we have very high limits) is that it is impossible for us to enforce them right now: a number of users' queries would stop working, and we simply cannot let that happen (imagine alerts or recording rules). Instead, we need to:
A) establish quotas that we believe are reasonable at the WARN level
B) find all the users/cases where the limits are breached
C) establish SPECIFIC HIGH limits for recording rules/alerts/some API queries
D) enforce reasonable limits for EVERYBODY except the specific cases above
E) notify the users with specific high limits that their RR/alerts will be disabled in X days if they don't take care of it.

At this stage I think it's reasonable to have flexibility in defining what the warn and enforce levels are. The ratio is hard to come up with by gazing into a crystal ball; maybe it's 5x, maybe 2x. We will figure this out as part of an experiment around steps A/B/C.
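To make the static-percentage idea from the review comment above concrete, here is a hedged sketch; the warnAtFraction helper is illustrative and not part of this PR, and the 0.5 ratio is only a placeholder for whatever the experiment above settles on.

// Hypothetical helper: derive the warn-level quota as a fixed fraction of the
// enforced quota, instead of maintaining a second set of hard-coded defaults.
def warnAtFraction(enforced: IndividualQuota, fraction: Double = 0.5): IndividualQuota =
  IndividualQuota(
    execPlanSamples            = (enforced.execPlanSamples * fraction).toInt,
    execPlanResultBytes        = (enforced.execPlanResultBytes * fraction).toLong,
    groupByCardinality         = (enforced.groupByCardinality * fraction).toInt,
    joinQueryCardinality       = (enforced.joinQueryCardinality * fraction).toInt,
    timeSeriesSamplesScanBytes = (enforced.timeSeriesSamplesScanBytes * fraction).toLong
  )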

Comment on lines 66 to 68
object PlannerParams {
  def apply(constSpread: Option[SpreadProvider], sampleLimit: Int): PlannerParams =
-   PlannerParams(spreadOverride = constSpread, sampleLimit = sampleLimit)
+   PlannerParams(spreadOverride = constSpread, enforcedQuota = IndividualQuota(execPlanSamples = sampleLimit))
Contributor:

I'm not sure where we use this specific apply, but it might construct PlannerParams with some warning limits less than the enforced limits.

Author (@kvpetrov):

This is expected, isn't it? The warning limits are smaller than the enforced ones, i.e. at 2MB I warn you, at 10MB I won't let your query run.
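For clarity, a hedged sketch of how the two levels could interact at execution time; the checkResultBytes helper is illustrative, and the scalalogging and BadQueryException imports are assumptions about the project's existing dependencies.

import com.typesafe.scalalogging.StrictLogging
import filodb.query.BadQueryException

// Illustrative two-level check: log when the warn quota is breached,
// fail the query only when the enforced quota is breached.
object ResultSizeCheck extends StrictLogging {
  def checkResultBytes(resultBytes: Long, warn: IndividualQuota, enforced: IndividualQuota): Unit = {
    if (resultBytes > enforced.execPlanResultBytes)
      throw new BadQueryException(
        s"Result size of $resultBytes bytes exceeds the enforced limit of ${enforced.execPlanResultBytes} bytes")
    else if (resultBytes > warn.execPlanResultBytes)
      logger.warn(s"Result size of $resultBytes bytes exceeds the warn quota of ${warn.execPlanResultBytes} bytes")
  }
}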

Comment on lines 76 to 81
if (
result.size > queryContext.plannerParams.enforcedQuota.joinQueryCardinality &&
cardinality == Cardinality.OneToOne
) => throw new BadQueryException(s"The join in this query has input cardinality of ${result.size} which" +
s" is more than limit of ${queryContext.plannerParams.enforcedQuota.joinQueryCardinality}." +
s" Try applying more filters or reduce time range.")
Contributor:

Where will warnQuota be used?

Author (@kvpetrov):

Please see the comment above; I will also update the description in the PR.

@@ -22,26 +22,50 @@ case class PromQlQueryParams(promQl: String, startSecs: Long, stepSecs: Long, en

case object UnavailablePromQlQueryParams extends TsdbQueryParams

case class IndividualQuota(
Member:

Should we rename to PerQueryLimits?

Contributor:

+1
IndividualQuota sounds like stateful quotas for a given user.

Author (@kvpetrov):

Can we do it later, in a second PR that will do nothing but rename the object?

Contributor:

I'll leave it to you and @vishramachandran; the later we do it, the more work we will have to do to preserve backward compatibility.

core/src/main/scala/filodb.core/query/QueryContext.scala (outdated comment, resolved)
@amolnayak311 (Contributor) left a comment:

Please review the comments and incorporate the requested changes. Thanks for putting this together.


groupByCardLimit: Int = 100000,
joinQueryCardLimit: Int = 100000,
resultByteLimit: Long = 18000000, // 18MB
enforcedQuota: IndividualQuota = IndividualQuota(),
Contributor:

I find quotas, user principal, origin details, etc. in PlannerParams inappropriate. I look at PlannerParams as the parameters necessary during query planning. We don't use quotas/limits in query planning; they are more of a runtime concern (post planning, during query execution). So we should not put them in a class called PlannerParams; they belong more in something like QueryContext. I understand you have picked up the existing fields, added them to an object, and added more fields, and I'm not necessarily expecting you to change it now.

@vishramachandran @alextheimer what's your view?

Author (@kvpetrov):

I think we can move them to QueryContext too, as part of a different PR; originally the parameters were in PlannerParams.
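For illustration, a hedged sketch of what that follow-up move might look like; this is not code from the PR, and only the fields relevant to this discussion are shown.

// Illustrative only: per-query limits living on QueryContext rather than on
// PlannerParams, keeping planning-time and execution-time concerns separate.
case class QueryContext(
  plannerParams: PlannerParams = PlannerParams(),                    // planning-time knobs only
  warnQuota: IndividualQuota = IndividualQuota(warnDefaults = true),
  enforcedQuota: IndividualQuota = IndividualQuota()
  // ...plus the existing QueryContext fields
)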

core/src/main/scala/filodb.core/query/QueryContext.scala Outdated Show resolved Hide resolved
@@ -180,13 +180,28 @@ object ProtoConverters {

implicit class PlannerParamsToProtoConverter(pp: PlannerParams) {
  def toProto: GrpcMultiPartitionQueryService.PlannerParams = {
    val enforcedQuotaBuilder = GrpcMultiPartitionQueryService.IndividualQuota.newBuilder()
Contributor:

Use the IndividualQuota <-> proto implicit class. Do not duplicate the object creation logic.

Author (@kvpetrov):

There is one defined, at line 243:
implicit class IndividualQuotaFromProtoConverter(giq: GrpcMultiPartitionQueryService.IndividualQuota) {

Contributor:

That one just converts the proto to the case class (fromProto); we need a toProto to convert the case class to the gRPC object. Anywhere we deal with converting to and from proto, we should use these implicit conversions rather than copying the object-building logic as this PR does.

Author (@kvpetrov):

Got you, I will add a toProto (only fromProto is defined now).
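For reference, a hedged sketch of the symmetric converter pair being discussed; the proto getter/setter names below assume the generated GrpcMultiPartitionQueryService.IndividualQuota message mirrors the case-class fields, which may not match the actual .proto definition.

// Sketch only: keep conversion logic in one place by pairing a toProto
// implicit class with the fromProto one, instead of rebuilding the proto
// inline inside PlannerParamsToProtoConverter.
implicit class IndividualQuotaToProtoConverter(iq: IndividualQuota) {
  def toProto: GrpcMultiPartitionQueryService.IndividualQuota =
    GrpcMultiPartitionQueryService.IndividualQuota.newBuilder()
      .setExecPlanSamples(iq.execPlanSamples)
      .setExecPlanResultBytes(iq.execPlanResultBytes)
      .setGroupByCardinality(iq.groupByCardinality)
      .setJoinQueryCardinality(iq.joinQueryCardinality)
      .setTimeSeriesSamplesScanBytes(iq.timeSeriesSamplesScanBytes)
      .build()
}

implicit class IndividualQuotaFromProtoConverter(giq: GrpcMultiPartitionQueryService.IndividualQuota) {
  def fromProto: IndividualQuota =
    IndividualQuota(
      execPlanSamples            = giq.getExecPlanSamples,
      execPlanResultBytes        = giq.getExecPlanResultBytes,
      groupByCardinality         = giq.getGroupByCardinality,
      joinQueryCardinality       = giq.getJoinQueryCardinality,
      timeSeriesSamplesScanBytes = giq.getTimeSeriesSamplesScanBytes
    )
}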

query/src/main/scala/filodb/query/ProtoConverters.scala (two outdated comments, resolved)
@amolnayak311 (Contributor):

Please update the description to give details of the PR and commit as per these guidelines.

@kvpetrov changed the title from "Adding quota parameters for individual query quotas and further repor…" to "Adding quota parameters for individual query limits and further reporting." on Apr 7, 2023
@kvpetrov changed the title from "Adding quota parameters for individual query limits and further reporting." to "feat(query): Adding quota parameters for individual query limits and further reporting." on Apr 7, 2023
@amolnayak311 (Contributor) left a comment:

Thanks @kvpetrov for the changes. LGTM. I have retriggered the PRB which was failing.

@kvpetrov merged commit c8ab83f into filodb:develop on Apr 7, 2023. 1 check passed.