Hi community, we have potentially skewed, low-latency traffic targeting a CPU-bound Knative service. With concurrency-based autoscaling, we see high p90+ latency. After we manually increase min-scale to an overprovisioned level, p90+ latency returns to normal. We suspect this indicates the autoscaler is underprovisioning, and we want to understand why and explore potential solutions.
Hypothetical traffic pattern & example service settings:
We receive one request every 10ms. In addition, at the start of each second, we receive 10 extra requests in parallel.
The service is CPU-bound and can only process one request at a time (i.e. containerConcurrency=1); additional requests have to wait in a queue. Each request takes 10ms to process.
Expected behavior: the autoscaler scales the service up to 11 pods (or higher, considering the target utilization percentage).
Actual behavior: the autoscaler underprovisions the service, and p90+ latency is high.
We studied the autoscaler logic for the concurrency-based metric, and here is our understanding (please correct us if we are wrong): what the autoscaler actually tracks is AverageConcurrency. Using the hypothetical traffic above, for each second:
```go
// https://github.com/knative/serving/blob/main/vendor/knative.dev/networking/pkg/http/stats/request.go#L96-L104
func (s *RequestStats) compute(now time.Time) {
	if durationSinceChange := now.Sub(s.lastChange); durationSinceChange > 0 {
		durationSecs := durationSinceChange.Seconds()
		s.secondsInUse += durationSecs // this will be 1 second after accumulation
		// this will be 11*0.01 + 10*0.01 + ... + 2*0.01 + (1*0.01)*90 = 65*0.01 + 90*0.01 = 1.55
		s.computedConcurrency += s.concurrency * durationSecs
		s.computedProxiedConcurrency += s.proxiedConcurrency * durationSecs
		s.lastChange = now
	}
}
```

```go
// https://github.com/knative/serving/blob/main/vendor/knative.dev/networking/pkg/http/stats/request.go#L144-L147
if s.secondsInUse > 0 {
	report.AverageConcurrency = s.computedConcurrency / s.secondsInUse // this will be 1.55
	report.AverageProxiedConcurrency = s.computedProxiedConcurrency / s.secondsInUse
}
```
With AverageConcurrency=1.55, it looks like the autoscaler will scale up to only 2 pods, even though we have a peak concurrency of 11. In other words, the autoscaler underprovisions from the perspective of peak concurrency (though the result certainly makes sense for average concurrency).
Questions:
Is our above understanding correct?
I understand that average concurrency is a good default in most cases, but I am curious whether there is any way to make autoscaling more reactive to such low-latency, uneven traffic. Ideally there would be a per-service/per-revision toggle to tune the sensitivity of the concurrency metric; for example, if both average and peak concurrency were reported, a configurable ratio between them could tune autoscaling sensitivity.
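As a partial workaround (not the peak-aware toggle asked about), Knative does expose per-revision autoscaling annotations that trade steady-state efficiency for reactiveness, such as lowering the target utilization percentage or shortening the stable window. A sketch, with illustrative values and a hypothetical service name:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example # hypothetical
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "concurrency"
        # Treat each pod as "full" at 60% of containerConcurrency,
        # effectively overprovisioning relative to average concurrency.
        autoscaling.knative.dev/target-utilization-percentage: "60"
        # Shorten the stable window (default 60s) so short bursts
        # weigh more heavily in the averaged metric.
        autoscaling.knative.dev/window: "6s"
```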
TIA for any insights and help!