Metrics server 0.6.4 returns 404 for service discovery #1410

Open
johnmreynolds opened this issue Jan 19, 2024 · 7 comments
Comments

@johnmreynolds

What happened:

Upgraded metrics-server from 0.6.3 to 0.6.4 on k8s 1.27 on EKS. After doing so, apiservice discovery failed with the metrics server returning a 404 to the probe. Reverting to 0.6.3 resolved the issue.

What you expected to happen:

Metrics server continues to function.

Anything else we need to know?:

Instead, service discovery from the apiservice fails with:
failing or missing response from https://10.252.25.64:10250/apis/metrics.k8s.io/v1beta1: bad status from https://10.252.25.64:10250/apis/metrics.k8s.io/v1beta1: 404

After reverting to 0.6.3 it's fine again.

Environment:

  • Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.):

EKS.

  • Container Network Setup (flannel, calico, etc.):

Amazon EKS CNI.

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.10-eks-e71965b", GitCommit:"59abc9f4aa3073d5b8283627e77e98703ef4ad97", GitTreeState:"clean", BuildDate:"2023-11-14T10:00:51Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.9-eks-5e0fdde", GitCommit:"3f8ed3d5017d988600f597734a4851930eda35a6", GitTreeState:"clean", BuildDate:"2024-01-02T20:34:38Z", GoVersion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"}

  • Metrics Server manifest

Unchanged chart 3.9.0 from https://kubernetes-sigs.github.io/metrics-server/

  • Kubelet config:

Fargate defaults.

  • Metrics server logs:

Server seems to be running fine.

  • Status of Metrics API:

FailedDiscovery due to bad status (a quick way to inspect this is sketched below).
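
For reference, a quick way to inspect that discovery failure from the cluster side (a sketch assuming the standard v1beta1.metrics.k8s.io APIService name registered by metrics-server):

```shell
# Availability of the metrics APIService (shows Available=False with reason FailedDiscovery)
kubectl get apiservice v1beta1.metrics.k8s.io

# Full status, including the "bad status ... 404" message and the probed endpoint
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
```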

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 19, 2024
@johnmreynolds
Author

No idea if this is useful or easy to debug, but I thought I'd report it in case anyone else is having this issue.

I've reverted to 0.6.3 for now and it all seems fine.

@yangjunmyfm192085
Contributor

This is very strange. metrics-server v0.6.3 and v0.6.4 are basically unchanged:
https://github.com/kubernetes-sigs/metrics-server/releases/tag/v0.6.4

By the way, why do you use port 10250 to access the apiserver? That port should be the kubelet port by default.

@johnmreynolds
Author

I honestly have no idea about the port issue; I haven't changed anything from the defaults there as far as I know. That was just the error message from the k8s apiservice that I was quoting. Possibly that's where the port number gets filled in.

@dgrisonnet
Member

/assign @yangjunmyfm192085
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 25, 2024
@yamagai

yamagai commented Feb 2, 2024

Exact same issue with metrics-server v0.7.0 on EKS Fargate (the k8s version is 1.26).

However, this problem did not occur with metrics-server v0.6.4.

@jmansar

jmansar commented Feb 4, 2024

I had exactly the same issue using both helm charts 3.11.0 and 3.10.0 with metrics-server 0.6.4 and 0.6.3 running on EKS Fargate.

The default container port was changed from 4443 to 10250 in chart 3.10.0 in the following commit.

Restoring the metrics-server container port back to 4443 in the helm chart strangely resolved the issue:

containerPort: 4443
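
For anyone applying the same workaround, a minimal sketch of the override via the chart's containerPort value (the release name and kube-system namespace are illustrative; check your own install):

```shell
# Chart repo mentioned earlier in the issue
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/

# Pin the container port back to 4443 so it no longer collides with the kubelet's 10250
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set containerPort=4443
```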

@elovelan

elovelan commented Apr 24, 2024

See #1026 for more details on why this change was made.

Fargate is one of the few situations (along with hostNetwork: true) where this comes into play, because unlike most nodes, a Fargate node shares its IP with the pod running on it (and doesn't allow hostPort remapping). This is a common issue for Fargate and software that intentionally adopts the pattern of using the kubelet port by default to make firewalling easier (e.g. see the warning in the cert-manager docs).
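
A quick check to confirm you're in that situation (a sketch; the namespace and label selector are the usual chart defaults and may differ in your install):

```shell
# On Fargate the metrics-server pod IP is the node IP, so 10250 is already taken by the kubelet.
# Compare the pod IP below with the node's INTERNAL-IP.
kubectl -n kube-system get pods -l app.kubernetes.io/name=metrics-server -o wide
kubectl get nodes -o wide
```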

IMO this pattern is becoming common enough that AWS should add it to its Fargate documentation. In the meantime, similar to the cert-manager doc mentioned above, it feels like it might be worth adding this workaround to the EKS section of the Known Issues doc (I'm happy to open this PR).
