[BUG] kyverno initContainer fail to start when external.metrics.k8s.io/v1beta1 return empty array for resources #1490

Closed
nathanwang-comp opened this issue Jan 22, 2021 · 35 comments · Fixed by #1494

@nathanwang-comp

Software version numbers

  • Kubernetes version: 1.16
  • Kyverno version: v1.3.0 and v1.3.1

Describe the bug
The Kyverno init container fails to start when the external metrics API is enabled.

To Reproduce
Steps to reproduce the behavior:

  1. Install Prometheus external metrics.
  2. Install Kyverno v1.3.0 or v1.3.1.
  3. The Kyverno pod status is Init:CrashLoopBackOff.
  4. See the error:
    I0121 23:21:03.940127 1 main.go:140] "msg"="Using in-cluster configuration"
    E0121 23:21:03.967938 1 memcache.go:206] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
    E0121 23:21:03.968323 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.968343 1 client.go:286] dclient "msg"="schema not found" "kind"="ValidatingWebhookConfiguration"
    E0121 23:21:03.968727 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.968755 1 client.go:286] dclient "msg"="schema not found" "kind"="ValidatingWebhookConfiguration"
    E0121 23:21:03.970418 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.970443 1 client.go:286] dclient "msg"="schema not found" "kind"="MutatingWebhookConfiguration"
    E0121 23:21:03.970578 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.970592 1 client.go:286] dclient "msg"="schema not found" "kind"="MutatingWebhookConfiguration"
    E0121 23:21:03.971480 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.971498 1 client.go:286] dclient "msg"="schema not found" "kind"="ValidatingWebhookConfiguration"
    E0121 23:21:03.971788 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.971799 1 client.go:286] dclient "msg"="schema not found" "kind"="ValidatingWebhookConfiguration"
    E0121 23:21:03.972535 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.972554 1 client.go:286] dclient "msg"="schema not found" "kind"="MutatingWebhookConfiguration"
    E0121 23:21:03.973374 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.973395 1 client.go:286] dclient "msg"="schema not found" "kind"="MutatingWebhookConfiguration"
    E0121 23:21:03.973791 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.973804 1 client.go:286] dclient "msg"="schema not found" "kind"="Namespace"
    E0121 23:21:03.974495 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.974519 1 client.go:286] dclient "msg"="schema not found" "kind"="ClusterPolicyReport"
    E0121 23:21:03.974961 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.974996 1 client.go:286] dclient "msg"="schema not found" "kind"="ReportChangeRequest"
    E0121 23:21:03.975570 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
    I0121 23:21:03.975586 1 client.go:286] dclient "msg"="schema not found" "kind"="ClusterReportChangeRequest"
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x115b350]

    goroutine 25 [running]:
    main.removeReportChangeRequest(0xc00007f220, 0x13bfe86, 0x13, 0xc000082901, 0xc000397ef0)
        /home/runner/work/kyverno/kyverno/cmd/initContainer/main.go:312 +0x110
    main.executeRequest(0xc00007f220, 0x13bfe86, 0x13, 0x13b48e8, 0x0, 0x0, 0x0)
        /home/runner/work/kyverno/kyverno/cmd/initContainer/main.go:128 +0x179
    main.process.func1(0xc0000829c0, 0xc000082960, 0xc00007f220, 0xc000082900, 0x158b720, 0xc000325ec0, 0xc0000825a0)
        /home/runner/work/kyverno/kyverno/cmd/initContainer/main.go:188 +0xfb
    created by main.process
        /home/runner/work/kyverno/kyverno/cmd/initContainer/main.go:184 +0xd7
  5. Run kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq, which returns:
    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": []
    }
  6. Delete the external metrics API and redeploy Kyverno; it then starts successfully.
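
For readers who want to check the response from step 5 programmatically, here is a minimal Go sketch using only the standard library. The `apiResource` and `apiResourceList` types and the `parseResourceList` helper are assumptions made for illustration; they mirror only the fields visible in the kubectl output and are not Kyverno or client-go types. The point is that a body with `"resources": []` decodes cleanly into a list with zero entries, so it is a well-formed response rather than a malformed one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResource and apiResourceList are local stand-ins (assumptions) that
// mirror only the fields shown in the kubectl output from step 5.
type apiResource struct {
	Name string `json:"name"`
	Kind string `json:"kind"`
}

type apiResourceList struct {
	Kind         string        `json:"kind"`
	APIVersion   string        `json:"apiVersion"`
	GroupVersion string        `json:"groupVersion"`
	Resources    []apiResource `json:"resources"`
}

// sampleBody is the response body reported in step 5.
const sampleBody = `{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}`

// parseResourceList (hypothetical helper) decodes a discovery response body.
func parseResourceList(body []byte) (*apiResourceList, error) {
	var list apiResourceList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	return &list, nil
}

func main() {
	list, err := parseResourceList([]byte(sampleBody))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// An empty list is valid: the group/version exists but exposes no resources.
	fmt.Printf("%s: %d resources\n", list.GroupVersion, len(list.Resources))
}
```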

Expected behavior
The external metrics API returns an empty resource list, which is correct when there are no metrics. Kyverno shouldn't fail on an empty resource list.
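
One way to get the expected behavior is to treat a discovery error for a single group/version as partial, keep whatever was listed successfully, and check lookup errors before dereferencing the result. The Go sketch below illustrates that idea with the standard library only. Everything in it (`groupDiscoveryFailed`, `resource`, `discover`, `findKind`) is a hypothetical stand-in, and it is not necessarily how the fix in #1494 works. In real code, client-go's `discovery.IsGroupDiscoveryFailedError` helper serves the same purpose as the `errors.As` check here.

```go
package main

import (
	"errors"
	"fmt"
)

// groupDiscoveryFailed is a hypothetical stand-in for a partial discovery
// error: it records which group/versions could not be listed.
type groupDiscoveryFailed struct {
	Groups map[string]error
}

func (e *groupDiscoveryFailed) Error() string {
	return fmt.Sprintf("unable to retrieve the complete list of server APIs: %d group(s) failed", len(e.Groups))
}

// resource is a simplified, hypothetical record for one discovered kind.
type resource struct {
	GroupVersion string
	Kind         string
}

// discover simulates a discovery call that returns the resources it could
// list together with a partial-failure error for one group/version.
func discover() ([]resource, error) {
	found := []resource{
		{GroupVersion: "v1", Kind: "Namespace"},
		{GroupVersion: "admissionregistration.k8s.io/v1", Kind: "MutatingWebhookConfiguration"},
	}
	return found, &groupDiscoveryFailed{Groups: map[string]error{
		"external.metrics.k8s.io/v1beta1": errors.New("got empty response"),
	}}
}

// findKind tolerates partial discovery failures: it aborts only on errors
// that are not group-level failures, then searches the partial results.
func findKind(kind string) (*resource, error) {
	found, err := discover()
	var partial *groupDiscoveryFailed
	if err != nil && !errors.As(err, &partial) {
		return nil, err // total failure: nothing usable was returned
	}
	for i := range found {
		if found[i].Kind == kind {
			return &found[i], nil
		}
	}
	return nil, fmt.Errorf("schema not found for kind %q", kind)
}

func main() {
	for _, kind := range []string{"Namespace", "ReportChangeRequest"} {
		res, err := findKind(kind)
		if err != nil {
			// res is nil here, so check the error before dereferencing it.
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("found:", res.GroupVersion, res.Kind)
	}
}
```

In this sketch the unrelated `external.metrics.k8s.io/v1beta1` failure no longer hides `Namespace`, and the missing `ReportChangeRequest` kind is reported as an error instead of being dereferenced as a nil pointer.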

@nathanwang-comp nathanwang-comp added the bug Something isn't working label Jan 22, 2021
realshuting added a commit to realshuting/kyverno that referenced this issue Jan 22, 2021
Signed-off-by: Shuting Zhao <shutting06@gmail.com>
@realshuting realshuting self-assigned this Jan 22, 2021
@realshuting realshuting added this to the Kyverno Release 1.3.2 milestone Jan 22, 2021
@realshuting
Member

realshuting commented Jan 22, 2021

@nathanwang-comp I've made a fix for the crash issue in this PR. Can you test with tag v1.3.1-7-g62a4a3a7 for both the init and kyverno containers?

I also found a similar issue for this error. Can you please check whether the workaround solves the problem?

@nathanwang-comp
Author

nathanwang-comp commented Jan 22, 2021

@realshuting I will test it tomorrow and let you know if the fix works. The workaround doesn't work for me: the external metrics API has a backend service, it just doesn't have any metrics, so I can't delete it.

@nathanwang-comp
Author

nathanwang-comp commented Jan 22, 2021

@realshuting I just tested it; the pod is in CrashLoopBackOff:
I0122 03:28:31.135110 1 version.go:17] "msg"="Kyverno" "Version"="v1.3.1-7-g62a4a3a7"
I0122 03:28:31.135147 1 version.go:18] "msg"="Kyverno" "BuildHash"="main/62a4a3a7da84ea9040da9f38557e039982501e47"
I0122 03:28:31.135158 1 version.go:19] "msg"="Kyverno" "BuildTime"="2021-01-22_02:59:17AM"
I0122 03:28:31.135243 1 config.go:92] CreateClientConfig "msg"="Using in-cluster configuration"
I0122 03:28:31.138247 1 reflector.go:175] Starting reflector *unstructured.Unstructured (0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
E0122 03:28:31.169166 1 memcache.go:206] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
E0122 03:28:31.169538 1 client.go:337] dclient "msg"="failed to get registered preferred resources" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"
I0122 03:28:31.169560 1 client.go:286] dclient "msg"="schema not found" "kind"="ClusterPolicy"
E0122 03:28:31.169583 1 util.go:71] CRDInstalled "msg"="failed to check CRD status" "error"="unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1" "kind"="ClusterPolicy"
E0122 03:28:31.169615 1 main.go:127] setup "msg"="Failed to access Kyverno CRDs" "error"="CRDs not installed"

@realshuting
Member

Have you tried this workaround I mentioned earlier - helm/helm#6361 (comment)?

@nathanwang-comp
Author

The workaround doesn't fit my problem. We can't delete the API, since we have the external metrics adapter.

@nathanwang-comp
Author

@realshuting We have installed an external metrics service. When no HPA is defined, the resource list returned by the external metrics API is an empty array, which is correct behavior. We can't delete the external metrics API, since it will be used for our HPA.

@nathanwang-comp
Author

This is the same as issue #1324.

@nathanwang-comp
Author

@realshuting any workaround or patch I can try? Thanks!

@nathanwang-comp
Author

@realshuting Do you have a container tag I can use to test your fix?

@realshuting
Member

Sorry, it was closed by mistake. @JimBugwadia is working on the fix; we will update you once it is ready.

@realshuting realshuting reopened this Jan 23, 2021
@nathanwang-comp
Author

Thanks!

@realshuting
Member

@nathanwang-comp Here's the image tag v1.3.1-9-g05da4190 for your testing.

@nathanwang-comp
Author

@realshuting I just tested the image and it works. Can you create an RC tag? Thanks!

realshuting added a commit that referenced this issue Jan 30, 2021
…ller) (#1500)

* skip sending API request for filtered resource

* fix PR comment

Signed-off-by: Shuting Zhao <shutting06@gmail.com>

* fixes #1490

Signed-off-by: Shuting Zhao <shutting06@gmail.com>

* fix bug - namespace is not returned properly

Signed-off-by: Shuting Zhao <shutting06@gmail.com>

* reduce throttling - list resource using lister

* refactor resource cache

* fix test

Signed-off-by: Shuting Zhao <shutting06@gmail.com>

* fix label selector

Signed-off-by: Shuting Zhao <shutting06@gmail.com>

* fix build failure

Signed-off-by: Shuting Zhao <shutting06@gmail.com>
JimBugwadia pushed a commit that referenced this issue Feb 1, 2021
…ller) (#1500)

@Amr-Aly

Amr-Aly commented Nov 21, 2021

Having the same issue, any ETA on this?

@JimBugwadia
Member

JimBugwadia commented Nov 21, 2021

I installed prometheus-adapter and Kyverno 1.5.2-rc1 and the pod security policies:

kubectl apply -f https://raw.githubusercontent.com/kyverno/kyverno/release-1.5/definitions/install.yaml
kustomize build https://github.com/kyverno/policies/pod-security | kubectl apply -f -

I then tried to run an insecure pod, and it was blocked as expected:

λ kubectl run nginx --image=nginx
Error from server: admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/default/nginx was blocked due to the following policies

require-run-as-non-root:
  check-containers: 'validation error: Running as root is not allowed. The fields
    spec.securityContext.runAsNonRoot, spec.containers[*].securityContext.runAsNonRoot,
    and spec.initContainers[*].securityContext.runAsNonRoot must be `true`. Rule check-containers[0]
    failed at path /spec/securityContext/runAsNonRoot/. Rule check-containers[1] failed
    at path /spec/containers/0/securityContext/.'

In my logs, I do see errors about not being able to fetch API server metadata:

E1121 22:26:30.483584       1 crdSync.go:68]  "msg"="failed to update in-cluster api versions" "error"="unable to fetch apiResourceLists: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1"
E1121 22:26:30.713210       1 crdSync.go:107]  "msg"="sync failed, unable to update in-cluster api versions" "error"="unable to fetch apiResourceLists: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1"
E1121 22:26:30.829755       1 crdSync.go:107]  "msg"="sync failed, unable to update in-cluster api versions" "error"="unable to fetch apiResourceLists: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1"

However, I am not seeing any other issues as Kyverno is able to run and apply policies. What am I missing?

@roeelandesman

roeelandesman commented Dec 2, 2021

I'd like to +1 this in hopes of (admittedly, selfishly) pushing the fix out asap as I'm seeing the same errors.

@JimBugwadia
Member

@roeelandesman - to clarify, are you seeing Kyverno not starting or operating properly?

Or, is Kyverno functional but there are errors shown in the logs as noted here?

@roeelandesman

roeelandesman commented Dec 2, 2021

The Kyverno pod logs (k get pods -n kyverno) show the same error relating to the external.metrics.k8s.io/v1beta1 API

E1201 17:23:16.185106       1 crdSync.go:68]  "msg"="failed to update in-cluster api versions" "error"="unable to fetch apiResourceLists: unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1"

In trying to apply a sample policy to the cluster I see an identical error:

Error from server: error when creating "scripts/kyverno-sample.yaml": admission webhook "validate-policy.kyverno.svc" denied the request: unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1

@JimBugwadia
Member

@roeelandesman - thanks! Does the policy work as expected?

@roeelandesman

Nope, I think the admission webhook blocks it from registration in the cluster.

If I run k get ClusterPolicy -A nothing comes up

@JimBugwadia
Member

Interesting! That is different from what I am seeing.

Can you please share the policy that cannot be configured? Also, can you try setting policy.spec.schemaValidation to false?

@roeelandesman

I grabbed the YAML directly from a sample policy here: https://kyverno.io/policies/other/limit_containers_per_pod/limit_containers_per_pod/

Setting schemaValidation to false outputs the same log error :/

@JimBugwadia
Member

> Setting schemaValidation to false outputs the same log error :/

Does the policy get created with schemaValidation: false?

@roeelandesman

I don't believe so

❯ k get clusterpolicies.kyverno.io -A
No resources found

@JimBugwadia
Member

Thanks @roeelandesman! I tried again in 1.5.1 and was able to reproduce what you are seeing.

This was fixed in 1.6.x (main) via: #2634.

We will merge this fix for 1.5.2.

@JimBugwadia JimBugwadia assigned vyankyGH and unassigned realshuting Dec 2, 2021
@JimBugwadia
Member

@vyankd - can you please create a PR for release-1.5 via a cherry-pick or merge of changes in #2634?

@roeelandesman

Thanks @JimBugwadia, much appreciated!

@realshuting
Member

Closed via #2784.

@roeelandesman

Can confirm that release v2.1.4-rc3 did fix these issues for me and I was able to install policies to the cluster again. Thank you all!
