
Upgrade to 1.6.1 prevents applying of Serving and Eventing CRs using server-side apply #1184

Open
tshak opened this issue Aug 17, 2022 · 16 comments
Labels: kind/bug, lifecycle/frozen

Comments

@tshak commented Aug 17, 2022

After upgrading to Operator 1.6.1, some Serving and Eventing CRs can no longer be applied using server-side apply. This appears to happen on clusters that at one point had pre-v1beta1 CRs.

Repro Steps

  1. Install Operator 1.2.2
  2. Apply a Serving or Eventing CR at v1alpha1 using server-side apply (e.g. kubectl apply --server-side=true; see the Go sketch after this list)
  3. Upgrade to Operator version 1.5.3 (note that this also reproduces when upgrading with 1.3.2 and 1.4.1 as intermediate steps)
  4. Update the CRs to v1beta1 using server-side apply
  5. Upgrade to Operator version 1.6.1
  6. Perform a server-side apply on any CR (no changes necessary)
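
For reference, here is a minimal Go sketch (not part of the original report) of what the server-side apply in steps 2, 4, and 6 amounts to when done with the dynamic client; it is the programmatic equivalent of kubectl apply --server-side=true. The namespace/name (knative-serving) and the field manager string are assumptions based on a default Knative Serving install.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a dynamic client from the local kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The KnativeServing CR, served at v1beta1 after the upgrade
	// (v1alpha1 when step 2 runs against Operator 1.2.2).
	gvr := schema.GroupVersionResource{
		Group:    "operator.knative.dev",
		Version:  "v1beta1",
		Resource: "knativeservings",
	}
	manifest := []byte(`{
		"apiVersion": "operator.knative.dev/v1beta1",
		"kind": "KnativeServing",
		"metadata": {"name": "knative-serving", "namespace": "knative-serving"}
	}`)

	// types.ApplyPatchType is the programmatic form of kubectl apply --server-side.
	force := true
	_, err = client.Resource(gvr).Namespace("knative-serving").Patch(
		context.Background(), "knative-serving", types.ApplyPatchType, manifest,
		metav1.PatchOptions{FieldManager: "ssa-repro", Force: &force})
	if err != nil {
		// On affected clusters this is where the conversion error surfaces.
		fmt.Println("server-side apply failed:", err)
		return
	}
	fmt.Println("server-side apply succeeded")
}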

Expected

Server-side apply succeeds.

Actual

Server-side apply fails with the following error:

Error from server: request to convert CR to an invalid group/version: operator.knative.dev/v1alpha1

Additional Information

This was reproduced on live clusters that have used the Operator since at least release v1.0. This appears to be due to a lingering metadata.managedFields entry that references operator.knative.dev/v1alpha1. You can find a script that easily reproduces this issue in this repo.
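
For anyone checking their own clusters, below is a small hedged Go sketch (the package and function names are invented for illustration, and it assumes the same knative-serving namespace/name as above) that fetches the CR and prints each metadata.managedFields entry. On an affected cluster one entry still reports operator.knative.dev/v1alpha1 even though the object is now served at v1beta1.

package repro

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// printManagedFields lists the manager, operation, and apiVersion recorded in
// every metadata.managedFields entry of the KnativeServing CR.
func printManagedFields(ctx context.Context, client dynamic.Interface) error {
	gvr := schema.GroupVersionResource{
		Group:    "operator.knative.dev",
		Version:  "v1beta1",
		Resource: "knativeservings",
	}
	obj, err := client.Resource(gvr).Namespace("knative-serving").
		Get(ctx, "knative-serving", metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, mf := range obj.GetManagedFields() {
		// A leftover "operator.knative.dev/v1alpha1" here is the smoking gun.
		fmt.Printf("%-30s %-10s %s\n", mf.Manager, mf.Operation, mf.APIVersion)
	}
	return nil
}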

tshak added the kind/bug label Aug 17, 2022
@houshengbo (Contributor) commented Aug 17, 2022

tshak/knoperator-v1beta1-ssa-repro#1 This is the change I requested: upgrade the operator incrementally, one minor version at a time.

Allow more time (e.g. 30s) between each installation and before applying the v1beta1 CR.
@tshak

I tested it with docker-desktop, and it worked.

Regarding the managedFields, I am not sure whether waiting longer before applying the v1beta1 CR will resolve your issue.

@tshak (Author) commented Aug 17, 2022

I do not think it's a timing issue: our live-cluster automation retries applying the CR, and it was still failing many hours after the Operator upgrade to v1.6.1. For the synthetic repro, I ran your PR and I'm still seeing the issue in a brand-new Kind cluster.

@houshengbo (Contributor) commented

@evankanderson @dprotaso

Here is what we could do with the migrateresource function in knative/pkg. Currently we do

_, err := client.Namespace(item.GetNamespace()).
			Patch(ctx, item.GetName(), types.MergePatchType, []byte("{}"), metav1.PatchOptions{})

to migrate the existing CRs from the old version to the new one.
Can we change it to:

_, err := client.Namespace(item.GetNamespace()).
			Patch(ctx, item.GetName(), types.MergePatchType, []byte(`{"metadata":{"managedFields": [{}]}}`), metav1.PatchOptions{})

to clear the managedFields, in order to solve the issue described here, where a managedFields entry referencing the old version blocks applying the CR at the new version?

@tshak (Author) commented Aug 18, 2022

This should solve the issue. I'm not sure if there are any unintended consequences of deleting all managedFields entries. The best solution is probably to remove only the operator.knative.dev/v1alpha1 entries. I'm not very opinionated either way, though.
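
As a concrete illustration of that narrower option, here is a hypothetical Go sketch (not the actual knative/pkg migrateresource code; the package, constant, and function names are invented) that drops only the managedFields entries recorded against operator.knative.dev/v1alpha1 and leaves all other field ownership intact. It falls back to the full reset patch proposed above only when every entry is stale.

package migration

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

const staleVersion = "operator.knative.dev/v1alpha1"

// dropStaleManagedFields removes only the managedFields entries that were
// recorded against the retired v1alpha1 API, keeping every other entry.
func dropStaleManagedFields(ctx context.Context, client dynamic.ResourceInterface, name string) error {
	obj, err := client.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	var kept []metav1.ManagedFieldsEntry
	for _, mf := range obj.GetManagedFields() {
		if mf.APIVersion != staleVersion {
			kept = append(kept, mf)
		}
	}
	switch {
	case len(kept) == len(obj.GetManagedFields()):
		return nil // nothing stale to remove
	case len(kept) == 0:
		// Writing back an empty list would be ignored by the API server, so
		// fall back to the full reset patch proposed earlier in this thread.
		_, err = client.Patch(ctx, name, types.MergePatchType,
			[]byte(`{"metadata":{"managedFields":[{}]}}`), metav1.PatchOptions{})
		return err
	}

	// A non-apply write that sets managedFields explicitly may rewrite them,
	// so an ordinary update with the filtered list is sufficient.
	obj.SetManagedFields(kept)
	_, err = client.Update(ctx, obj, metav1.UpdateOptions{})
	return err
}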

@dprotaso (Member) commented

My guess is the post-install job isn't being run after installing 1.5.3

https://github.com/knative/operator/blob/knative-v1.5.3/config/post-install/storage-version-migrator.yaml

This should migrate the CRD to a new storage version

@houshengbo (Contributor) commented

@dprotaso I actually moved the implementation into the operator controller itself in 1.5.3, so we do not need to run the migration job; the operator does it automatically when it launches. It is equivalent to running the post-install job.
What I am proposing here is to clear the managedFields, or to remove only operator.knative.dev/v1alpha1 as @tshak suggested.

@tshak (Author) commented Aug 18, 2022

It is correct that we did not run the post-install job. My contention is that requiring it goes against the biggest value proposition of the Operator pattern: automating resource lifecycle issues such as this one.

@dprotaso (Member) commented Aug 18, 2022

This seems like a k8s bug - let me see if I can repro with a trivial example

@tshak (Author) commented Aug 19, 2022

I agree that it's probably a k8s bug. If a managedFields entry has an invalid apiVersion, it should not prevent a server-side apply. There may be another bug around API conversions missing the managedFields entry in the first place. Regardless, I still think that the Operator should perform either of the workarounds proposed by @houshengbo.

@houshengbo (Contributor) commented

@tshak Another option that avoids touching knative/pkg would be for the Knative Operator to access only the Serving and Eventing CRs and fix the managedFields for both of them, if they exist.

@dprotaso (Member) commented

Created the upstream issue based on this Slack discussion:

kubernetes/kubernetes#111937

@github-actions (bot) commented

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label Nov 18, 2022
@knative-prow-robot (Contributor) commented

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

github-actions bot removed the lifecycle/stale label Dec 19, 2022
@github-actions (bot) commented

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label Mar 21, 2023
@dprotaso (Member) commented

/reopen
/lifecycle frozen

The issue isn't resolved upstream

knative-prow bot reopened this Apr 20, 2023
@knative-prow (bot) commented Apr 20, 2023

@dprotaso: Reopened this issue.

In response to this:

/reopen
/lifecycle frozen

The issue isn't resolved upstream

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow bot added the lifecycle/frozen label and removed the lifecycle/stale label Apr 20, 2023