feat(query): Adding quota parameters for individual query limits and further reporting. #1548
Conversation
Gotta love those ExecPlan::printTree comparison tests 😛
Approved; please consider some of the comments I left before you merge.
case class IndividualQuota(
  execPlanSamples: Int = 1000000,               // Limit on ExecPlan results in samples, default is 1M
  execPlanResultBytes: Long = 18000000,         // Limit on ExecPlan results in bytes, default is 18MB
  groupByCardinality: Int = 100000,             // Limit on "group by" clause results, default is 100K
  joinQueryCardinality: Int = 100000,           // Limit on binary join results, default is 100K
  timeSeriesSamplesScanBytes: Long = 300000000) // Limit on max data scanned per shard, default is 300MB

object IndividualQuota {
  def apply(warnDefaults: Boolean): IndividualQuota = {
    if (warnDefaults) {
      IndividualQuota(
        execPlanSamples = 50000,
        execPlanResultBytes = 15000000,
        groupByCardinality = 50000,
        joinQueryCardinality = 50000,
        timeSeriesSamplesScanBytes = 150000
      )
    } else {
      IndividualQuota()
    }
  }
}
Are these separate "warning" thresholds needed? Is it sufficient to warn at X% of the limit?
If the separate thresholds are needed, a Threshold(warning, limit) class is another option to consider (this way we wouldn't need something like apply(warnDefaults)).
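The suggested shape could look something like the sketch below. `Threshold` and `check` are hypothetical names for illustration only, not part of the FiloDB codebase:

```scala
// Hypothetical sketch of the suggested Threshold(warning, limit) class.
final case class Threshold(warning: Long, limit: Long) {
  require(warning <= limit, "warning threshold must not exceed the hard limit")

  // None below the warn level, a warn message between the two levels,
  // and a limit-exceeded message above the hard limit.
  def check(value: Long): Option[String] =
    if (value > limit) Some(s"limit exceeded: $value > $limit")
    else if (value > warning) Some(s"warning: $value > $warning")
    else None
}
```

A quota case class could then hold one `Threshold` per dimension, e.g. `execPlanSamples: Threshold = Threshold(50000, 1000000)`, which would remove the need for the `apply(warnDefaults)` flag entirely.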
How did we come up with these warn thresholds? Isn't 150K too low for a warn threshold when the absolute limit is 300MB? I like @alextheimer's idea of using a static % as a warn limit.
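The static-percentage idea could be sketched as follows; `warnLimitFrom`, `WarnFraction`, and the 50% value are assumptions for illustration, not existing FiloDB APIs:

```scala
// Sketch: derive warn limits as a fixed fraction of the enforced limits,
// instead of maintaining a second, independent set of warn defaults.
object WarnQuotaDerivation {
  val WarnFraction = 0.5 // warn at 50% of the enforced limit (assumed value)

  def warnLimitFrom(enforced: Long): Long = (enforced * WarnFraction).toLong
}
```

With this approach a 300MB enforced scan limit would yield a 150MB warn limit, avoiding inconsistencies like a 150KB warn threshold next to a 300MB hard limit.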
I'm only copying whatever was already in the existing properties. This PR essentially groups the limits in an object, allows us to have two sets of quotas/limits: (1) enforced, (2) warn, and allows passing these parameters from the QueryService, but it DOES NOT change the existing behavior.
The warning piece is fundamental to the PR. The reason we do not enforce the limits (we have very high limits) is that it's impossible for us to enforce them today. A number of users' queries would stop working, and we simply cannot let that happen (imagine alerts or recording rules). Instead, we need to:
A) establish quotas that we believe are reasonable at WARN level
B) find all the users/cases where the limits are breached
C) establish SPECIFIC HIGH limits for recording rules/alerts/some API queries
D) enforce reasonable limits for EVERYBODY except the specific cases above
E) notify the users with specific high limits to let them know that their RR/alerts will be disabled in X days if they don't take care of it.
At this stage I think it's reasonable to have more flexibility in defining what the warn and enforce levels are. The ratio is hard to come up with by gazing into a crystal ball; maybe it's 5x, maybe 2x. We will figure this out as part of an experiment around steps A/B/C.
object PlannerParams {
  def apply(constSpread: Option[SpreadProvider], sampleLimit: Int): PlannerParams =
-   PlannerParams(spreadOverride = constSpread, sampleLimit = sampleLimit)
+   PlannerParams(spreadOverride = constSpread, enforcedQuota = IndividualQuota(execPlanSamples = sampleLimit))
I'm not sure where we use this specific apply, but it might construct PlannerParams with some warning limits less than the enforced limits.
This is expected, isn't it? The warning limits are smaller than the enforced ones, i.e. at 2MB I warn you, at 10MB I won't let your query run.
if (
  result.size > queryContext.plannerParams.enforcedQuota.joinQueryCardinality &&
  cardinality == Cardinality.OneToOne
) => throw new BadQueryException(s"The join in this query has input cardinality of ${result.size} which" +
  s" is more than limit of ${queryContext.plannerParams.enforcedQuota.joinQueryCardinality}." +
  s" Try applying more filters or reduce time range.")
Where will warnQuota be used?
Please see the comment above; I will also update the description in the PR.
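As described in the comment above, the warn quota would gate the same checks as the enforced quota but report instead of failing the query. A self-contained sketch, where `Quota` and `checkJoinCardinality` are hypothetical stand-ins for the PR's types (the real code throws BadQueryException on the enforced limit):

```scala
// Sketch: the same join-cardinality check applied at two levels.
final case class Quota(joinQueryCardinality: Int)

def checkJoinCardinality(resultSize: Int, warn: Quota, enforced: Quota): Option[String] =
  if (resultSize > enforced.joinQueryCardinality)
    // Enforced level: the query is rejected outright.
    throw new IllegalArgumentException(
      s"join cardinality $resultSize exceeds enforced limit ${enforced.joinQueryCardinality}")
  else if (resultSize > warn.joinQueryCardinality)
    // Warn level: in the PR's model this would be reported (logged/metered), not thrown.
    Some(s"join cardinality $resultSize exceeds warn limit ${warn.joinQueryCardinality}")
  else None
```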
@@ -22,26 +22,50 @@ case class PromQlQueryParams(promQl: String, startSecs: Long, stepSecs: Long, en

case object UnavailablePromQlQueryParams extends TsdbQueryParams

case class IndividualQuota(
Should we rename to PerQueryLimits?
+1, IndividualQuota sounds like stateful quotas for a given user.
Can we do it later, in a second PR that does nothing but rename the object?
I leave it to you and @vishramachandran; the later we do it, the more we will have to work on preserving backward compatibility.
Please see the comments and incorporate the changes requested. Thanks for putting this together.
groupByCardLimit: Int = 100000,
joinQueryCardLimit: Int = 100000,
resultByteLimit: Long = 18000000, // 18MB
enforcedQuota: IndividualQuota = IndividualQuota(),
I find quotas, user principal, origin details etc. in planner params inappropriate. I look at PlannerParams as the parameters necessary during query planning. We don't use quotas/limits in query planning; they are more a runtime concern (post planning, during query execution). So we should not call the class PlannerParams; it's more like a QueryContext. I understand you have picked up the existing fields, grouped them into an object, and added more fields, and I'm not necessarily expecting you to change it now.
@vishramachandran @alextheimer what's your view?
I think we can move them to QueryContext too as part of a different PR; originally the parameters were in PlannerParams.
@@ -180,13 +180,28 @@ object ProtoConverters {

implicit class PlannerParamsToProtoConverter(pp: PlannerParams) {
  def toProto: GrpcMultiPartitionQueryService.PlannerParams = {
    val enforcedQuotaBuilder = GrpcMultiPartitionQueryService.IndividualQuota.newBuilder()
Use the IndividualQuota <-> proto implicit class. Do not duplicate the object creation logic.
There is one defined at line 243:
implicit class IndividualQuotaFromProtoConverter(giq: GrpcMultiPartitionQueryService.IndividualQuota) {
That one only converts the proto to the case class (fromProto); we need a toProto to convert the case class to the gRPC object. Anywhere we deal with converting to and from proto, we should use these implicit conversions rather than copying the object-building logic, as this PR does.
Got you, will add toProto (only fromProto is defined now).
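The symmetric direction could look like the sketch below. Since the generated gRPC builder isn't reproduced here, `ProtoQuota` is a hypothetical stand-in for the generated GrpcMultiPartitionQueryService.IndividualQuota message; the field names mirror the PR's case class:

```scala
// Sketch of the requested toProto implicit, mirroring the existing fromProto,
// so the object-building logic lives in one place instead of being copied.
final case class IndividualQuota(execPlanSamples: Int = 1000000,
                                 groupByCardinality: Int = 100000)

// Stand-in for the generated gRPC message type (assumption for this sketch).
final case class ProtoQuota(execPlanSamples: Int, groupByCardinality: Int)

object QuotaProtoConverters {
  implicit class IndividualQuotaToProtoConverter(iq: IndividualQuota) {
    def toProto: ProtoQuota =
      ProtoQuota(iq.execPlanSamples, iq.groupByCardinality)
  }
}
```

With the real generated type, the body would instead call the message's newBuilder()/setters/build(), but callers would still just write `quota.toProto`.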
Please update the description to give details of the PR, and commit as per these guidelines.
Thanks @kvpetrov for the changes. LGTM. I have retriggered the PRB which was failing.
Adding quota parameters for individual query limits and further reporting.
Current behavior:
Currently, some limits are established for query processing; however, these limits are either defaults in the Scala code or properties of the FiloDB runtime.
New behavior:
We want to be able to pass these limits as part of the query, utilizing PlannerParams.
BREAKING CHANGES
The protos defining PlannerParams are redefined in a manner which is not backwards compatible. This should be OK, as our Scala POJOs are normally not backwards compatible either.
Other information: