
Polling khstate from K8s API-server every 5sec for khcheck leading to client side throttling and high latency #1061

Open
sid8489 opened this issue May 13, 2022 · 6 comments
Labels: do-not-close (prevents issues from being automatically closed if stale)

sid8489 commented May 13, 2022

Currently, the kuberhealthy operator creates a resource called khstate for every khcheck, and it polls each khstate from the Kubernetes API server every 5 seconds (here). This results in a large number of requests to the API server.
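
For illustration only, the behavior described above amounts to a per-check loop like the following sketch; `fetchKHState` and the other names are placeholders, not Kuberhealthy's actual code:

```go
// Illustrative sketch only, not Kuberhealthy's actual code: one request to
// the API server per check, every 5 seconds, which adds up across checks.
package main

import (
	"context"
	"log"
	"time"
)

// fetchKHState stands in for whatever client call fetches a khstate
// resource from the Kubernetes API server.
func fetchKHState(ctx context.Context, namespace, name string) (string, error) {
	return "ok", nil
}

func main() {
	ctx := context.Background()
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		state, err := fetchKHState(ctx, "kuberhealthy", "example-check")
		if err != nil {
			log.Println("error fetching khstate:", err)
			continue
		}
		log.Println("khstate:", state)
	}
}
```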

We are seeing a lot of `Waited for Xs due to client-side throttling, not priority and fairness.` messages in the kuberhealthy-operator pods.
[Screenshot: client-side throttling messages in the kuberhealthy-operator logs]

We are also seeing high latency on requests served by the Kuberhealthy server (P95 ≈ 53 s for 2xx responses; only 50% of requests are served in under 1 s). These metrics were generated from the standard metrics exposed by the Istio sidecar.
[Screenshot: request latency metrics from the Istio sidecar]

Note: fewer than 50 checks are running in the cluster. All of the above snapshots are from the master pod.

@integrii What is the motivation behind storing khstate as a CRD instead of in memory (for example, as a concurrent hash map)?

@integrii integrii self-assigned this May 13, 2022
integrii (Collaborator) commented

This makes sense, and repeatedly polling the API server is not optimal. It seems like Kuberhealthy needs a refactor to a reflector cache for khstate and khcheck resources. Then, when a khstate is needed, Kuberhealthy will not even make a call to the API server.

The reason that khstate exists is to provide a persistent store of state that all Kuberhealthy pods can agree on. This makes Kuberhealthy highly available without complex clustering algorithms. By default Kuberhealthy comes with two pods in a deployment. The pod that comes alphabetically first is always the master pod, but either Kuberhealthy pod can service requests for status because they both just return the data from the khstate resources.
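
As a toy illustration of the election rule just described (the helper and pod names are placeholders, not Kuberhealthy's implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// isMaster applies the "alphabetically first pod is master" rule described
// above; podNames would come from listing the Kuberhealthy pods.
func isMaster(myPodName string, podNames []string) bool {
	if len(podNames) == 0 {
		return false
	}
	sort.Strings(podNames)
	return podNames[0] == myPodName
}

func main() {
	pods := []string{"kuberhealthy-7d9f-abc", "kuberhealthy-7d9f-xyz"}
	fmt.Println(isMaster("kuberhealthy-7d9f-abc", pods)) // true
}
```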

Implementing a reflector would stop the API spam, improve the speed of Kuberhealthy, and prevent the issue you are seeing here.
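
A minimal sketch of what a reflector-backed cache could look like using a client-go dynamic informer; the GroupVersionResource and namespace below are assumptions, not necessarily what Kuberhealthy will use:

```go
// Minimal sketch (not Kuberhealthy's actual code) of serving khstate reads
// from a local reflector/informer cache instead of hitting the API server.
// The group/version/resource and namespace below are assumptions.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GVR for the khstate custom resource.
	khStateGVR := schema.GroupVersionResource{
		Group:    "comcast.github.io",
		Version:  "v1",
		Resource: "khstates",
	}

	// The informer's reflector does one LIST, then WATCHes for changes,
	// keeping an in-memory store up to date instead of polling every 5s.
	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(client, 0, "kuberhealthy", nil)
	informer := factory.ForResource(khStateGVR).Informer()

	stop := make(chan struct{})
	defer close(stop)
	go informer.Run(stop)

	if !cache.WaitForCacheSync(stop, informer.HasSynced) {
		panic("timed out waiting for khstate cache to sync")
	}

	// Reads now come from the local store, not the API server.
	obj, exists, err := informer.GetStore().GetByKey("kuberhealthy/example-check")
	fmt.Println(obj, exists, err)
}
```

With a cache like this, serving a status request becomes a read from local memory rather than a round trip to the API server.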

integrii (Collaborator) commented

I just did some work on this... I am refactoring this to use the existing cache for khstate resources... Progress is in the khstate-cache branch.

sid8489 (Author) commented Oct 10, 2022

> I just did some work on this... I am refactoring this to use the existing cache for khstate resources... Progress is in the khstate-cache branch.

@integrii When can we expect a release for this? Also, I would like to contribute to this change in case anything is still pending.


jplouis commented Jan 18, 2024

I'm trying to understand the code base. Could the StateReflector be passed into external.Check, either instead of or along with the khstate client? (Looking at this line.) That way, ext.podHasReportedInAfterTime could use the reflector cache.
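
A rough sketch of that idea; the store key format, the `status.lastRun` field path, and the function shape are assumptions rather than Kuberhealthy's actual types:

```go
// Rough sketch only: answering "has this pod reported in after time X?"
// from a local reflector store instead of a GET against the API server.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
)

func podHasReportedInAfterTime(store cache.Store, namespace, checkName string, after time.Time) (bool, error) {
	obj, exists, err := store.GetByKey(namespace + "/" + checkName)
	if err != nil || !exists {
		return false, err
	}
	u, ok := obj.(*unstructured.Unstructured)
	if !ok {
		return false, fmt.Errorf("unexpected object type %T", obj)
	}
	// Assumed field path for the last report time on the khstate resource.
	lastRun, found, err := unstructured.NestedString(u.Object, "status", "lastRun")
	if err != nil || !found {
		return false, err
	}
	t, err := time.Parse(time.RFC3339, lastRun)
	if err != nil {
		return false, err
	}
	return t.After(after), nil
}

func main() {
	// In practice the store would come from the StateReflector's informer.
	store := cache.NewStore(cache.MetaNamespaceKeyFunc)
	ok, err := podHasReportedInAfterTime(store, "kuberhealthy", "example-check", time.Now().Add(-time.Minute))
	fmt.Println(ok, err)
}
```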


jplouis commented Feb 28, 2024

@integrii I did a quick-and-dirty pass at using the state reflector store to alleviate some API requests. It improved throughput a little; I don't have the numbers handy.

master...jplouis:kuberhealthy:master


github-actions bot commented: This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment on the issue, or it will be closed in 15 days.

@github-actions github-actions bot added the Stale label Mar 30, 2024
@integrii integrii added do-not-close Prevents issues from being automatically closed if stale and removed Stale labels Apr 11, 2024