"Failed to scrape node" err="Get \"https://10.100.93.58:10250/metrics/resource\": context deadline exceeded" #1352
Comments
Are you using any CNI plugin (calico, weave, vpc-cni, etc.)? If so, setting hostNetwork: true on the metrics-server Deployment may help.
Yes, using AWS vpc-cni. Let me try setting hostNetwork: true in the metrics-server Deployment and update the observation here.
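For reference, a minimal sketch of that override with the official metrics-server Helm chart (the thread later confirms a Helm install; `hostNetwork.enabled` is the chart value assumed to toggle this, and the release/namespace names are assumptions):

```sh
# Hypothetical example: enable hostNetwork on an existing Helm release so the
# pod uses the node's network namespace instead of the CNI-managed pod network.
helm upgrade metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --reuse-values \
  --set hostNetwork.enabled=true
```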
@MahiraTechnology Let me know how it goes - we're on EKS and noticed in newer versions of the vpc-cni plugin, there was a communication breakdown somewhere between the pod VPC (where metrics-server runs), the node VPC, and the control plane. After setting hostNetwork: true, the scrape errors went away for us.
@brosef I tried deploying the metrics-server with hostNetwork: true and started seeing the issue below:
panic: failed to create listener: failed to listen on 0.0.0.0:10250: listen tcp 0.0.0.0:10250: bind: address already in use
You probably have to change the port to something else; 10250 will clash with the kubelet API port when running on the host network. Try setting the container port to something like 4443.
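A minimal sketch of that change via chart values (assuming the chart's `containerPort` value feeds the --secure-port flag, which is how the stock chart is wired):

```sh
# Hypothetical example: move metrics-server off 10250 so it does not
# collide with the kubelet once hostNetwork is enabled.
helm upgrade metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --reuse-values \
  --set containerPort=4443
```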
@brosef I deployed with port 4443 and am still seeing the same issue in the metrics-server pod.
I1026 18:09:39.580732 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
Check your security group and firewall rules. Ensure TCP 10250 is open between nodes, since metrics-server scrapes each node's kubelet on that port.
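One way to verify reachability is to probe the kubelet port from a throwaway pod; a TLS or 401/403 response still proves the TCP path is open, while a hang mirrors the "context deadline exceeded" scrape errors. The node IP below is a placeholder taken from the logs:

```sh
# Hypothetical check: curl a kubelet's resource-metrics endpoint from inside
# the cluster. Any response (even a TLS/auth error) means port 10250 is
# reachable; a 5-second timeout points at the security group or CNI path.
kubectl run netcheck --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -sk -m 5 https://10.100.55.152:10250/metrics/resource
```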
@brosef
that could mean one of many things. try running through this article: https://repost.aws/knowledge-center/eks-cni-plugin-troubleshooting |
@brosef I went through the link shared above; everything looks OK. I see the below error message on the HPA: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready). The metrics-server continues to print the same logs as before.
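A quick way to see whether the metrics API itself is serving data, independent of the HPA, is to query it directly; empty results here would explain the HPA error. The namespace below is a placeholder:

```sh
# Hypothetical check: query the aggregated metrics API for pod metrics.
# If this returns no items (or errors), the HPA cannot compute utilization.
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods" | head
kubectl top pods -n default
```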
/assign @CatherineF-dev @dgrisonnet |
Connecting to the node hostname or IP from within the metrics-server Pod is the problem for me too. I'm facing the same issue when using flannel.
@MahiraTechnology After opening ports 10250 and 443 on the node security group with the VPC CIDR as the source range, the issue was fixed.
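For reference, a sketch of those ingress rules with the AWS CLI; the security-group ID and VPC CIDR below are placeholders:

```sh
# Hypothetical example: allow kubelet (10250) and HTTPS (443) traffic from the
# VPC range on the node security group. The group ID and CIDR are placeholders.
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 10250 --cidr 10.100.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 10.100.0.0/16
```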
Same as @vgokul984
What happened:
It looks like pods are not scaling based on load, which is causing the pods to restart.
What you expected to happen:
HPA should scale based on the load
Anything else we need to know?:
Environment:
Running on EKS 1.27
metrics-server 0.6.3
Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.):
Container Network Setup (flannel, calico, etc.):
Kubernetes version (use kubectl version):
Metrics Server manifest
spoiler for Metrics Server manifest:
Using helm chart
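The manifest section only notes that the Helm chart was used; for completeness, a hedged sketch of how chart 3.10.0 / app 0.6.3 (the versions reported below) would typically be installed. The repo alias is an assumption:

```sh
# Hypothetical install matching the chart/app versions reported in this issue.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system --version 3.10.0
```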
spoiler for Kubelet config:
spoiler for Metrics Server logs:
I1025 19:47:59.616036 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E1025 19:48:28.004348 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.55.152:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-55-152.ca-central-1.compute.internal"
E1025 19:48:58.004680 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.48.155:10250/metrics/resource\": dial tcp 10.100.48.155:10250: i/o timeout" node="ip-10-100-48-155.ca-central-1.compute.internal"
E1025 19:49:13.005190 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.48.155:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-48-155.ca-central-1.compute.internal"
E1025 19:49:28.003975 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.48.155:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-48-155.ca-central-1.compute.internal"
E1025 19:53:29.599618 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.78.163:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-100-78-163.ca-central-1.compute.internal"
E1025 19:54:44.588439 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.50.210:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-100-50-210.ca-central-1.compute.internal"
E1025 19:55:28.004773 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.73.41:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-73-41.ca-central-1.compute.internal"
spoiler for Status of Metrics API:
kubectl describe apiservices v1beta1.metrics.k8s.io
Name: v1beta1.metrics.k8s.io
Namespace:
Labels: app.kubernetes.io/instance=metrics-server
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=metrics-server
app.kubernetes.io/version=0.6.3
helm.sh/chart=metrics-server-3.10.0
Annotations: meta.helm.sh/release-name: metrics-server
meta.helm.sh/release-namespace: kube-system
API Version: apiregistration.k8s.io/v1
Kind: APIService
Metadata:
Creation Timestamp: 2023-07-03T08:59:27Z
Resource Version: 82108513
UID: de273b86-9ba6-4d8d-929c-b972d87717e1
Spec:
Group: metrics.k8s.io
Group Priority Minimum: 100
Insecure Skip TLS Verify: true
Service:
Name: metrics-server
Namespace: kube-system
Port: 443
Version: v1beta1
Version Priority: 100
Status:
Conditions:
Last Transition Time: 2023-10-25T19:48:23Z
Message: all checks passed
Reason: Passed
Status: True
Type: Available
Events:
/kind bug