Failed Operator will leave Serving installation in a partial state. #1756

Open
mmisztal1980 opened this issue Mar 23, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@mmisztal1980

In what area(s)?

/area autoscale

What version of Knative?

0.11.x

kn version
Version:      v1.11.0
Build Date:   2023-07-27 07:42:56
Git Revision: b7508e67
Supported APIs:
* Serving
  - serving.knative.dev/v1 (knative-serving v1.11.0)
* Eventing
  - sources.knative.dev/v1 (knative-eventing v1.11.0)
  - eventing.knative.dev/v1 (knative-eventing v1.11.0)

Expected Behavior

Using kn service create 'hello-example' --image ghcr.io/knative/helloworld-go:latest --env TARGET="First", I expect to deploy a hello-world example so I can start experimenting with Knative.

Actual Behavior

kn service create 'hello-example' --image ghcr.io/knative/helloworld-go:latest --env TARGET="First"
Creating service 'hello-example' in namespace 'default':

  0.072s The Route is still working to reflect the latest desired specification.
  0.072s Configuration "hello-example" is waiting for a Revision to become ready.
  0.072s ...
  1.153s Revision "hello-example-00001" failed with message: Failed to create new replica set "hello-example-00001-deployment-7b56748d46": Unauthorized.
  1.166s Configuration "hello-example" does not have any ready Revision.
  1.176s ...
  1.179s Configuration "hello-example" is waiting for a Revision to become read

The process starts but doesn't complete. The pod is successfully scheduled in the default namespace and is ready; however, the kn service is not.

k get pods
NAME                                              READY   STATUS    RESTARTS   AGE
hello-example-00001-deployment-7b56748d46-mt5kk   2/2     Running   0          31s

Steps to Reproduce the Problem

  • On macOS, install Docker Desktop and enable its Kubernetes (docker-desktop) cluster
    • Engine 25.03, k8s v1.29.1
  • Install operator-sdk using instructions found here
  • Install OLM using operator-sdk olm install
  • Install an operator using kubectl create -f https://operatorhub.io/install/knative-operator.yaml or apply this manifest:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: knative-operator
  namespace: operators
spec:
  channel: stable
  name: knative-operator
  source: operatorhubio-catalog
  sourceNamespace: olm
  • Apply the k8s manifests to enable serving:
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
  • Run kn service create 'hello-example' --image ghcr.io/knative/helloworld-go:latest --env TARGET="First"

Any additional details and the investigation so far can be found on CNCF Slack here

@mmisztal1980 mmisztal1980 added the kind/bug Categorizes issue or PR as related to a bug. label Mar 23, 2024
@dprotaso dprotaso transferred this issue from knative/serving Mar 23, 2024
@dprotaso dprotaso changed the title from "kn service create doesn't complete" to "Failed Operator will leave Serving installation in a partial state." Mar 23, 2024
@dprotaso
Member

Following up here: it looks like the default installation expects Istio. When Istio is not installed, the operator fails with Ready=False and reports that the Istio resources are not present.

This halts the installation of the other manifests and leaves Serving in a partial state. For example, in the case above the mutating and validating webhooks were not installed. This allowed the user to create a Knative Service and everything reconciled, but when it created the PodAutoscaler it did not default an annotation that is required to select which autoscaler to use.
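One way to spot this symptom is to look for PodAutoscalers that lack the class annotation. The following is a minimal sketch, not something taken from the operator. It assumes the annotation in question is `autoscaling.knative.dev/class` (the comment above does not name it) and that the input is the parsed JSON from `kubectl get podautoscalers.autoscaling.internal.knative.dev -A -o json`. The helper name and the sample data are hypothetical.

```python
# Assumption: the annotation the webhook normally defaults is
# "autoscaling.knative.dev/class"; the issue text does not name it.
CLASS_ANNOTATION = "autoscaling.knative.dev/class"


def podautoscalers_missing_class(pa_list: dict) -> list[str]:
    """Return "namespace/name" for each PodAutoscaler without a class annotation."""
    missing = []
    for item in pa_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        if not annotations.get(CLASS_ANNOTATION):
            missing.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return missing


# Hypothetical sample shaped like a `kubectl get ... -o json` list response.
sample = {
    "items": [
        {"metadata": {"namespace": "default", "name": "hello-example-00001"}},
        {
            "metadata": {
                "namespace": "default",
                "name": "other-00001",
                "annotations": {CLASS_ANNOTATION: "kpa.autoscaling.knative.dev"},
            }
        },
    ]
}
print(podautoscalers_missing_class(sample))
```

An empty result would suggest the defaulting webhook did its job. Any entry returned is worth checking against the operator's status.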

Ideally it would be good to try to apply all the resources in the manifest and then report all errors the operator installation encounters.
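The apply-everything-then-report idea can be sketched generically. This is an illustration of the pattern only, not the operator's actual code. `apply_all`, `fake_apply`, and the resource names are hypothetical.

```python
from typing import Callable, Iterable


def apply_all(resources: Iterable[str], apply: Callable[[str], None]) -> list[str]:
    """Apply every resource, collecting failures instead of stopping at the first."""
    errors = []
    for resource in resources:
        try:
            apply(resource)
        except Exception as exc:  # keep going so later resources still get applied
            errors.append(f"{resource}: {exc}")
    return errors


# Hypothetical stand-in for a real apply call; the resource names are made up.
applied = []


def fake_apply(resource: str) -> None:
    if resource.startswith("istio/"):
        raise RuntimeError("Istio resources are not present")
    applied.append(resource)


errors = apply_all(
    ["istio/gateway", "webhook/defaulting", "webhook/validation"], fake_apply
)
print(applied)  # the webhooks are still applied despite the Istio failure
print(errors)   # every failure is reported together at the end
```

With this shape, a missing dependency would still surface as an error, but it would no longer prevent unrelated resources such as the webhooks from being installed.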

But since the operator did report the failure, I think we could simply document how to check the installation status in the docs.
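Such a check could look roughly like the sketch below. It assumes the KnativeServing resource from the reproduction steps (name and namespace `knative-serving`) and that its status exposes the usual Kubernetes-style `conditions` list. The function names are hypothetical, and the sample condition data is illustrative rather than copied from a real cluster.

```python
import json
import subprocess


def fetch_knative_serving(
    name: str = "knative-serving", namespace: str = "knative-serving"
) -> dict:
    """Fetch the KnativeServing resource as a dict (needs kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "knativeserving", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def unready_conditions(resource: dict) -> list[str]:
    """Return "Type=Status: message" for each status condition that is not True."""
    conditions = resource.get("status", {}).get("conditions", [])
    if not conditions:
        return ["no status conditions reported yet"]
    return [
        f"{c.get('type')}={c.get('status')}: {c.get('message', '')}"
        for c in conditions
        if c.get("status") != "True"
    ]


# Hypothetical sample; the condition types and message are illustrative only.
sample = {
    "status": {
        "conditions": [
            {"type": "ExampleCondition", "status": "True"},
            {
                "type": "Ready",
                "status": "False",
                "message": "Istio resources are not present",
            },
        ]
    }
}
print(unready_conditions(sample))
```

Against a live cluster, `unready_conditions(fetch_knative_serving())` returning an empty list would be the signal to proceed with `kn service create`.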

I'll leave this issue open for @houshengbo to close out and open a docs issue.
