The endpoint is lost when the APIServer is restored. #124547

Black-max12138 · 2024-04-26T06:32:26Z

What happened?

When the apiserver goes down and comes back up a few minutes later, some endpoints are left with notReadyAddresses and do not recover. It happens intermittently.
The cause is that the endpoint obtained from the informer is not the latest. In the syncService method of endpoints_controller.go:
currentEndpoints, err := e.endpointsLister.Endpoints(service.Namespace).Get(service.Name)
Because the obtained endpoint is stale, the controller determines that the current and desired endpoints are the same. As a result, the endpoint is not updated.
I added logging to print the endpoint and confirmed this behavior.
This is what the log shows:
I0425 08:25:57.715142 11 endpoints_controller.go:423] "About to update endpoints for service" service="manager/service-mchiroer"
I0425 08:25:57.715216 11 endpoints_controller.go:516] "endpoints are equal, skipping update" service="manager/service-mchiroer"
I0425 08:25:57.715225 11 endpoints_controller.go:389] "Finished syncing service endpoints" service="manager/service-mchiroer" startTime="83.332µs"
So I think the cache in informer is not caching the latest data, which is a bug.
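
To illustrate the sequence described above, here is a minimal sketch in plain Go. It is not the real controller code: the `Endpoints` struct, `needsUpdate`, and the address `10.0.0.5` are hypothetical stand-ins, and the real comparison in endpoints_controller.go covers more fields. It only models the assumption that the cached copy still lists the pod as ready while the object on the server lists it as not ready.

```go
package main

import "fmt"

// Endpoints is a simplified, hypothetical stand-in for the Kubernetes
// Endpoints object; it is not the real v1.Endpoints type.
type Endpoints struct {
	ResourceVersion   int
	Addresses         []string
	NotReadyAddresses []string
}

// equalStrings reports whether two slices hold the same strings in order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// needsUpdate models the "endpoints are equal, skipping update" decision:
// it compares the cached object with the desired state and ignores
// ResourceVersion, so it cannot tell that the cache is behind the server.
func needsUpdate(cached, desired Endpoints) bool {
	return !(equalStrings(cached.Addresses, desired.Addresses) &&
		equalStrings(cached.NotReadyAddresses, desired.NotReadyAddresses))
}

func main() {
	// What the API server actually stores (assumed): the pod is not ready.
	onServer := Endpoints{ResourceVersion: 2, NotReadyAddresses: []string{"10.0.0.5"}}
	// What the informer cache still holds (assumed): an older, ready version.
	staleCache := Endpoints{ResourceVersion: 1, Addresses: []string{"10.0.0.5"}}
	// What the controller computes from the now-ready pod.
	desired := Endpoints{Addresses: []string{"10.0.0.5"}}

	fmt.Println("stale cache, update sent:", needsUpdate(staleCache, desired))
	fmt.Println("fresh cache, update sent:", needsUpdate(onServer, desired))
}
```

With the stale copy the sketch reports that no update is sent, which matches the "endpoints are equal, skipping update" log line; with the up-to-date copy an update would be sent.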

What did you expect to happen?

The notReadyAddresses of the endpoint should be changed to addresses when the pod status is updated.

How can we reproduce it (as minimally and precisely as possible)?

1. Stop the apiserver service of the cluster.
2. Restore the apiserver service after a few minutes.
Repeat the preceding steps. The problem will recur.

Anything else we need to know?

No response

Kubernetes version

1.28

Cloud provider

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)


k8s-ci-robot · 2024-04-26T06:32:35Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2024-04-26T06:35:37Z

@Black-max12138: The label(s) sig/area/controller-manager cannot be applied, because the repository doesn't have them.

In response to this:

/sig area/controller-manager

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Black-max12138 · 2024-04-26T06:36:32Z

/area controller-manager

neolit123 · 2024-04-26T11:19:42Z

> So I think the cache in informer is not caching the latest data, which is a bug.

/sig api-machinery

Black-max12138 · 2024-05-07T03:23:12Z

Our latest finding is that this is not a caching issue. It is a delay in the delivery of update events.
Therefore, we added an UpdateFunc to the endpointsInformer. When an update event is received, the endpoint is refreshed.
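
A rough sketch of that workaround, written in plain Go so it runs without client-go. The `Endpoints` struct, `queue`, and `onEndpointsUpdate` are hypothetical names for illustration only; in client-go the corresponding hook is the `UpdateFunc` field of `cache.ResourceEventHandlerFuncs`. Skipping events whose ResourceVersion is unchanged is an assumption borrowed from common controller practice, not something stated in this issue.

```go
package main

import "fmt"

// Endpoints is a simplified, hypothetical stand-in for the real object.
type Endpoints struct {
	Namespace       string
	Name            string
	ResourceVersion string
}

// queue is a minimal de-duplicating work queue keyed by "namespace/name".
// It stands in for the controller's real work queue.
type queue struct {
	pending map[string]bool
	order   []string
}

func newQueue() *queue { return &queue{pending: map[string]bool{}} }

// Add enqueues a key unless it is already waiting to be processed.
func (q *queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Len returns the number of keys waiting in the queue.
func (q *queue) Len() int { return len(q.order) }

// onEndpointsUpdate returns an update handler that re-enqueues the service
// owning the Endpoints object, so a late update event triggers a new sync.
func onEndpointsUpdate(q *queue) func(old, cur Endpoints) {
	return func(old, cur Endpoints) {
		// Assumption: an unchanged ResourceVersion means a periodic
		// resync with no real change, so there is nothing to do.
		if old.ResourceVersion == cur.ResourceVersion {
			return
		}
		q.Add(cur.Namespace + "/" + cur.Name)
	}
}

func main() {
	q := newQueue()
	update := onEndpointsUpdate(q)

	// A delayed update event finally arrives from the API server.
	update(
		Endpoints{Namespace: "manager", Name: "service-mchiroer", ResourceVersion: "1"},
		Endpoints{Namespace: "manager", Name: "service-mchiroer", ResourceVersion: "2"},
	)
	fmt.Println("services queued for resync:", q.order)
}
```

The point of the design is that the next sync runs after the cache has caught up, so the comparison is made against the current object rather than the stale one.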

Black-max12138 added the kind/bug label on Apr 26, 2024

k8s-ci-robot added the needs-sig and needs-triage labels on Apr 26, 2024

k8s-ci-robot added the area/controller-manager label on Apr 26, 2024

k8s-ci-robot added the sig/api-machinery label and removed the needs-sig label on Apr 26, 2024
