
Polling khstate from K8s API-server every 5sec for khcheck leading to client side throttling and high latency #1061

Open
sid8489 opened this issue May 13, 2022 · 6 comments
Labels: do-not-close (prevents issues from being automatically closed if stale)

sid8489 commented May 13, 2022

Currently, the kuberhealthy operator creates a resource called khstate for every khcheck, and it polls each khstate from the Kubernetes API server every 5 seconds (here). This results in a large number of requests to the API server.
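
For illustration only, the behavior described above amounts to a per-check loop like the following sketch; `fetchKHState` and the other names are placeholders, not Kuberhealthy's actual code:

```go
// Illustrative sketch only, not Kuberhealthy's actual code: one request to
// the API server per check, every 5 seconds, which adds up across checks.
package main

import (
	"context"
	"log"
	"time"
)

// fetchKHState stands in for whatever client call fetches a khstate
// resource from the Kubernetes API server.
func fetchKHState(ctx context.Context, namespace, name string) (string, error) {
	return "ok", nil
}

func main() {
	ctx := context.Background()
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		state, err := fetchKHState(ctx, "kuberhealthy", "example-check")
		if err != nil {
			log.Println("error fetching khstate:", err)
			continue
		}
		log.Println("khstate:", state)
	}
}
```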

We are seeing a lot of `Waited for Xs due to client-side throttling, not priority and fairness.` messages in the kuberhealthy-operator pods.
[Screenshot: client-side throttling messages in the kuberhealthy-operator logs]

We are also seeing high latency on requests served by the Kuberhealthy server (P95 ≈ 53 s for 2xx responses; only 50% of requests are served in under 1 s). These metrics were generated from the standard metrics exposed by the Istio sidecar.
[Screenshot: request latency metrics from the Istio sidecar]

Note: fewer than 50 checks are running in the cluster. All of the above snapshots are from the master pod.

@integrii What is the motivation behind storing khstate as a CRD instead of in memory (for example, as a concurrent hash map)?

@integrii integrii self-assigned this May 13, 2022
integrii (Collaborator) commented

This makes sense, and repeatedly polling the API server is not optimal. It seems like Kuberhealthy needs a refactor to a reflector cache for khstate and khcheck resources. Then, when a khstate is needed, Kuberhealthy will not even make a call to the API server.

The reason that khstate exists is to provide a persistent store of state that all Kuberhealthy pods can agree on. This makes Kuberhealthy highly available without complex clustering algorithms. By default Kuberhealthy comes with two pods in a deployment. The pod that comes alphabetically first is always the master pod, but either Kuberhealthy pod can service requests for status because they both just return the data from the khstate resources.
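
As a toy illustration of the election rule just described (the helper and pod names are placeholders, not Kuberhealthy's implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// isMaster applies the "alphabetically first pod is master" rule described
// above; podNames would come from listing the Kuberhealthy pods.
func isMaster(myPodName string, podNames []string) bool {
	if len(podNames) == 0 {
		return false
	}
	sort.Strings(podNames)
	return podNames[0] == myPodName
}

func main() {
	pods := []string{"kuberhealthy-7d9f-abc", "kuberhealthy-7d9f-xyz"}
	fmt.Println(isMaster("kuberhealthy-7d9f-abc", pods)) // true
}
```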

Implementing a reflector would stop the API spam, improve the speed of Kuberhealthy, and prevent the issue you are seeing here.
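
A minimal sketch of what a reflector-backed cache could look like using a client-go dynamic informer; the GroupVersionResource and namespace below are assumptions, not necessarily what Kuberhealthy will use:

```go
// Minimal sketch (not Kuberhealthy's actual code) of serving khstate reads
// from a local reflector/informer cache instead of hitting the API server.
// The group/version/resource and namespace below are assumptions.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GVR for the khstate custom resource.
	khStateGVR := schema.GroupVersionResource{
		Group:    "comcast.github.io",
		Version:  "v1",
		Resource: "khstates",
	}

	// The informer's reflector does one LIST, then WATCHes for changes,
	// keeping an in-memory store up to date instead of polling every 5s.
	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(client, 0, "kuberhealthy", nil)
	informer := factory.ForResource(khStateGVR).Informer()

	stop := make(chan struct{})
	defer close(stop)
	go informer.Run(stop)

	if !cache.WaitForCacheSync(stop, informer.HasSynced) {
		panic("timed out waiting for khstate cache to sync")
	}

	// Reads now come from the local store, not the API server.
	obj, exists, err := informer.GetStore().GetByKey("kuberhealthy/example-check")
	fmt.Println(obj, exists, err)
}
```

With a cache like this, serving a status request becomes a read from local memory rather than a round trip to the API server.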

integrii (Collaborator) commented

I just did some work on this... I am refactoring this to use the existing cache for khstate resources... Progress is in the khstate-cache branch.

sid8489 (Author) commented Oct 10, 2022

> I just did some work on this... I am refactoring this to use the existing cache for khstate resources... Progress is in the khstate-cache branch.

@integrii When can we expect a release for this? Also, I would like to contribute to this change in case anything is still pending.


jplouis commented Jan 18, 2024

I'm trying to understand the code base. Could the StateReflector be passed into external.Check, either instead of or along with the khstate client? (Looking at this line.) That way, ext.podHasReportedInAfterTime could use the reflector cache.
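
A rough sketch of that idea; the store key format, the `status.lastRun` field path, and the function shape are assumptions rather than Kuberhealthy's actual types:

```go
// Rough sketch only: answering "has this pod reported in after time X?"
// from a local reflector store instead of a GET against the API server.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
)

func podHasReportedInAfterTime(store cache.Store, namespace, checkName string, after time.Time) (bool, error) {
	obj, exists, err := store.GetByKey(namespace + "/" + checkName)
	if err != nil || !exists {
		return false, err
	}
	u, ok := obj.(*unstructured.Unstructured)
	if !ok {
		return false, fmt.Errorf("unexpected object type %T", obj)
	}
	// Assumed field path for the last report time on the khstate resource.
	lastRun, found, err := unstructured.NestedString(u.Object, "status", "lastRun")
	if err != nil || !found {
		return false, err
	}
	t, err := time.Parse(time.RFC3339, lastRun)
	if err != nil {
		return false, err
	}
	return t.After(after), nil
}

func main() {
	// In practice the store would come from the StateReflector's informer.
	store := cache.NewStore(cache.MetaNamespaceKeyFunc)
	ok, err := podHasReportedInAfterTime(store, "kuberhealthy", "example-check", time.Now().Add(-time.Minute))
	fmt.Println(ok, err)
}
```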


jplouis commented Feb 28, 2024

@integrii I did a quick-and-dirty pass at using the state reflector store to alleviate some API requests. It improved throughput a little; I don't have the numbers handy.

master...jplouis:kuberhealthy:master


github-actions bot commented: This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment on the issue, or it will be closed in 15 days.

@github-actions github-actions bot added the Stale label Mar 30, 2024
@integrii integrii added do-not-close Prevents issues from being automatically closed if stale and removed Stale labels Apr 11, 2024