Autoscaler underprovisions for uneven low latency traffic #15000

Open
Peilun-Li opened this issue Mar 11, 2024 · 0 comments
Labels
kind/question Further information is requested

Comments


Peilun-Li commented Mar 11, 2024

Ask your question here:

Hi community, we have potentially skewed, low-latency traffic targeting a CPU-bound Knative service. With concurrency-based autoscaling, we are seeing high p90+ latency. After we manually increase min-scale to an overprovisioned level, the p90+ latency goes back to a normal level. We suspect this indicates the autoscaler is underprovisioning, and we want to understand why and explore potential solutions.

Hypothetical traffic pattern & example service settings:

  1. We receive one request every 10ms. Plus, at the start tick of each second, we receive 10 requests in parallel.
  2. The service is CPU-bound and can only process one request at a time (i.e., containerConcurrency=1); additional requests have to wait in the queue. Each request takes 10ms to process.

Expected behavior: the autoscaler scales the service up to 11 pods (or higher, accounting for the target utilization percentage).
Actual behavior: the autoscaler underprovisions the service, resulting in higher p90+ latency.
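
For illustration, here is a small sketch of our own (not Knative code) of the pod count we would expect if the autoscaler reacted to the peak concurrency of the burst, assuming containerConcurrency=1 and the default 70% target utilization (container-concurrency-target-percentage):

package main

import (
	"fmt"
	"math"
)

func main() {
	// Assumptions for illustration only (not taken from Knative source):
	// the burst of 10 parallel requests plus the 1 steady request gives a
	// peak concurrency of 11 at the start of each second.
	peakConcurrency := 11.0
	containerConcurrency := 1.0
	targetUtilization := 0.7 // assumed default container-concurrency-target-percentage of 70%

	// Naive "scale for the peak" expectation: enough pods so that each pod
	// stays at or below its per-pod target during the burst.
	desiredPods := math.Ceil(peakConcurrency / (containerConcurrency * targetUtilization))
	fmt.Printf("expected pods for the burst: %v\n", desiredPods) // 16
}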

We studied the autoscaler logic for the concurrency-based metric a bit and here's our understanding (definitely correct us if we are wrong): the way the autoscaler tracks concurrency is actually AverageConcurrency. Using the above hypothetical traffic example, for each second:

// https://github.com/knative/serving/blob/main/vendor/knative.dev/networking/pkg/http/stats/request.go#L96-L104 
func (s *RequestStats) compute(now time.Time) {
	if durationSinceChange := now.Sub(s.lastChange); durationSinceChange > 0 {
		durationSecs := durationSinceChange.Seconds()
		s.secondsInUse += durationSecs // this will be 1 second after accumulation 
		s.computedConcurrency += s.concurrency * durationSecs // this will be 11*0.01+10*0.01+...+2*0.01+(1*0.01)*90=65*0.01+90*0.01=1.55
		s.computedProxiedConcurrency += s.proxiedConcurrency * durationSecs
		s.lastChange = now
	}
}

// https://github.com/knative/serving/blob/main/vendor/knative.dev/networking/pkg/http/stats/request.go#L144-L147
	if s.secondsInUse > 0 {
		report.AverageConcurrency = s.computedConcurrency / s.secondsInUse // this will be 1.55
		report.AverageProxiedConcurrency = s.computedProxiedConcurrency / s.secondsInUse
	}

With that (AverageConcurrency=1.55), it looks like the autoscaler will try to scale up to 2, even though we have a peak concurrency of 11, i.e., the autoscaler underprovisions from the perspective of peak concurrency (though the behavior certainly makes sense from the perspective of average concurrency).
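
To make the arithmetic above concrete, here is a standalone sketch of ours (not Knative code) that replays one second of the hypothetical traffic in 10ms ticks and compares the time-weighted average concurrency the autoscaler sees against the peak concurrency:

package main

import "fmt"

func main() {
	// Replay one second of the hypothetical pattern in 10ms ticks.
	// At the start of the second the queue holds the burst of 10 plus the
	// steady request (11 in flight); one request finishes per 10ms tick and,
	// once the burst has drained, the steady traffic keeps concurrency at 1.
	const tick = 0.01 // seconds
	var weighted, elapsed, peak float64
	for i := 0; i < 100; i++ {
		concurrency := 1.0
		if i < 10 {
			concurrency = float64(11 - i) // 11, 10, ..., 2 while the burst drains
		}
		if concurrency > peak {
			peak = concurrency
		}
		weighted += concurrency * tick
		elapsed += tick
	}
	fmt.Printf("average concurrency: %.2f\n", weighted/elapsed) // 1.55
	fmt.Printf("peak concurrency:    %.0f\n", peak)             // 11
}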

Questions:

  1. Is our above understanding correct?
  2. I understand that average concurrency is desirable in most cases and provides a good balance, but we're curious whether there's any way to make it more reactive to such a low-latency, uneven traffic pattern. Ideally we could have a per-service/revision toggle to tune the sensitivity of the concurrency metric, e.g., if both average concurrency and peak concurrency were reported, a config ratio could tune the autoscaling sensitivity (see the sketch after this list):
autoscaler_concurrency = (1 - sensitivity_ratio) * average_concurrency + sensitivity_ratio * peak_concurrency
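
As a sketch of what we have in mind (names like sensitivityRatio are hypothetical, not existing Knative config), the blended metric would interpolate between the two signals:

package main

import "fmt"

// blendedConcurrency is a hypothetical metric: sensitivityRatio=0 reproduces
// today's average-concurrency behavior, while sensitivityRatio=1 scales for the peak.
func blendedConcurrency(avg, peak, sensitivityRatio float64) float64 {
	return (1-sensitivityRatio)*avg + sensitivityRatio*peak
}

func main() {
	avg, peak := 1.55, 11.0
	for _, r := range []float64{0, 0.25, 0.5, 1} {
		fmt.Printf("ratio=%.2f -> concurrency=%.2f\n", r, blendedConcurrency(avg, peak, r))
	}
}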

TIA for any insights and help!

Peilun-Li added the kind/question label Mar 11, 2024