
EKS Fargate Metrics-server fails to scrape itself #1422

Open · Paddy-CH opened this issue Feb 16, 2024 · 9 comments
Labels: kind/support, triage/accepted

@Paddy-CH

What happened:
Logs from the metrics-server pod show this repeatedly:
E0216 11:45:59.265624 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.6.194.69:10250/metrics/resource\": dial tcp 10.6.194.69:10250: connect: connection refused" node="fargate-ip-10-6-194-69.eu-west-2.compute.internal"

What you expected to happen:
To be able to scrape itself.

Anything else we need to know?:
The secure port and container port are set to 4443. If I change them to 10250, the port the scrape targets, the error changes to 'forbidden'. I also get 'error: Metrics API not available' from kubectl when I try to use the Metrics API.
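
For reference, a quick way to check the state of the Metrics API (a sketch; the APIService name matches the manifest below):

kubectl get apiservice v1beta1.metrics.k8s.io        # Available should be True; the message hints at why it is not
kubectl -n kube-system logs deploy/metrics-server    # the scrape errors above come from here
kubectl top nodes                                     # returns 'Metrics API not available' while the APIService is down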

Environment:

  • Kubernetes distribution: EKS Fargate
  • Kubernetes version: 1.29
  • Metrics Server manifest:

spoiler for Metrics Server manifest:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

  • Kubelet config:
    spoiler for Kubelet config:
  • Metrics Server logs:
    spoiler for Metrics Server logs:
  • Status of Metrics API:
    spoiler for Status of Metrics API:
    kubectl describe apiservice v1beta1.metrics.k8s.io

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 16, 2024
@yangjunmyfm192085 (Contributor)

"Failed to scrape node" err="Get "[https://10.6.194.69:10250/metrics/resource\](https://10.6.194.69:10250/metrics/resource%5C)":
This error represents an exception when metrics-server accesses the metrics/resource endpoint of kubelet.
Please check whether the firewall blocks access to the kubelet 10250 port, or is the kubelet listening port not 10250?
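
One way to check both from inside the cluster (a sketch using the node name and IP from the log above; the probe pod name is arbitrary and assumes the cluster will schedule a test pod):

# Port the kubelet advertises for this node
kubectl get node fargate-ip-10-6-194-69.eu-west-2.compute.internal \
  -o jsonpath='{.status.daemonEndpoints.kubeletEndpoint.Port}'

# Raw connectivity probe: a 401/403 still proves the port is reachable,
# while "connection refused" points at a network or policy block
kubectl run kubelet-probe --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://10.6.194.69:10250/metrics/resource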

@Paddy-CH (Author)

Hi,
Initially I had it set to 4443. When I saw the error, I changed it to 10250; the error then changed to 'forbidden' when scraping itself, and kubectl still returned 'Metrics API not available'.

@yangjunmyfm192085 (Contributor)

Could you use the command kubectl get node fargate-ip-10-6-194-69.eu-west-2.compute.internal -oyaml to check the value of kubeletEndpoint?

@Paddy-CH (Author)

It returns:

daemonEndpoints:
  kubeletEndpoint:
    Port: 10250

@yangjunmyfm192085 (Contributor)

Hi @Paddy-CH, it looks like in this EKS environment metrics-server cannot reach the kubelet's port 10250. This does not appear to be an issue with metrics-server itself. Please also check the security policy of the environment.
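
For example, one way to inspect that from the AWS side (a sketch assuming the AWS CLI is configured; my-cluster is a placeholder for the real cluster name) is to check that the cluster security group allows inbound TCP 10250 within the cluster:

CLUSTER_SG=$(aws eks describe-cluster --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)
# Look for a rule allowing TCP 10250 (or all traffic) from the pod/node security group or subnets
aws ec2 describe-security-groups --group-ids "$CLUSTER_SG" \
  --query 'SecurityGroups[0].IpPermissions'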

@yangjunmyfm192085 (Contributor)

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Feb 20, 2024
@yangjunmyfm192085 (Contributor)

/remove-kind bug

@k8s-ci-robot k8s-ci-robot removed the kind/bug Categorizes issue or PR as related to a bug. label Feb 20, 2024
@dashpole

/assign @yangjunmyfm192085
/triage accepted

@honarkhah

Related to aws/containers-roadmap#1798
