Subscription is stuck when installing same operator multiple times into different namespaces at different dates #3210

Open
pgodowski opened this issue Apr 24, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Bug Report

This one is really odd, and may be related to how bundle unpack Job names are generated from a hash of the bundle (and possibly the namespace?).

I installed an operator into one namespace and then, 20 days later, tried to install the same operator into a different namespace. The 2nd Subscription hung and was never reconciled.

What did you do?

  • On Apr 4th, installed operand-deployment-lifecycle-manager.v4.0.0 into namespace cp30test. All good there

  • On Apr 24th, attempted to install the same package (same catalogsource, same channel, same packagename) but into namespace cp46test

    • different namespace, but very similar name
  • The Subscription in cp46test is hung, i.e. it is never fully reconciled; only its status field is updated to report that all the catalog sources are healthy

  • In namespace openshift-operator-lifecycle-manager, the catalog-operator-8586f5974d-khh7g Pod logs error messages like:

    • E0424 20:12:47.382632 1 queueinformer_operator.go:319] sync "cp46test" failed: bundle unpacking failed with an error: jobs.batch "8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6" already exists
  • Indeed, a Job named 8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6 exists in the openshift-marketplace namespace

    • BUT, the Job was created on Apr 4th and completed on Apr 4th (and today is Apr 24th)
  • So it seems that OLM fails to install a 2nd instance of the same operator if there is a collision between the hash-based names of the bundle unpack Jobs
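To illustrate the suspected cause, here is a minimal Python sketch. It is not OLM's actual code: it assumes, hypothetically, that the unpack Job name is a SHA-256 hex digest of the bundle reference truncated to 63 characters (the maximum length of a Kubernetes label value, and also the length of the Job name in the log above). The function names and the bundle reference are made up for illustration. Under that assumption, the Subscription's namespace never enters the name, so both installs map to the same Job in openshift-marketplace.

```python
import hashlib

# Maximum length of a Kubernetes label value; the Job name from the log is this long.
MAX_NAME_LEN = 63

# Job name copied verbatim from the catalog-operator error message.
JOB_NAME_FROM_ISSUE = "8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6"


def unpack_job_name(bundle_ref: str) -> str:
    """Hypothetical naming scheme: truncated SHA-256 hex of the bundle reference."""
    digest = hashlib.sha256(bundle_ref.encode("utf-8")).hexdigest()
    return digest[:MAX_NAME_LEN]


def unpack_job_key(bundle_ref: str, catalog_namespace: str) -> tuple:
    """Identity of the Job: the namespace it is created in plus its name.

    Note that the namespace of the Subscription is not an input here.
    """
    return (catalog_namespace, unpack_job_name(bundle_ref))


# Placeholder bundle reference, not the real image from the cluster.
bundle = "registry.example.com/odlm-bundle@sha256:placeholder"

first = unpack_job_key(bundle, "openshift-marketplace")   # install for cp30test
second = unpack_job_key(bundle, "openshift-marketplace")  # install for cp46test

# Same bundle, same catalog namespace: the second create would hit "already exists".
print(first == second)
```

If the naming really works this way, any later install of the same bundle would find the completed Job from the first install still present, which matches the "already exists" error.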

I have the relevant YAML resources and a generated OCP must-gather to attach, but the package is 176MB, which is above the size limit. I can submit a requested subset of files, or you can ping me to get access to the whole package.

What did you expect to see?

I would expect the 2nd installation of the operator, in a separate namespace, to work just fine.

What did you see instead? Under which circumstances?

As above, the install hung.

Environment

  • OCP 4.14.20
  • clusterId: 98256eec-eacc-43a0-bf1c-00be2aa0aa85
  • Kubernetes cluster kind: OCP
  • Kubernetes version information: v1.27.11+ec42b99

Possible Solution

A mitigation is to manually remove the Job which completed on Apr 4th; the installation then proceeds.
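The manual cleanup can be sketched as a small Python helper. This is only an illustration under assumptions: the helper names are hypothetical, the regular expression is written against the single error line quoted in this report, and the namespace defaults to openshift-marketplace because that is where the stale Job was found here. It only builds the `kubectl delete job` command (with `--dry-run=client` by default) and prints it, so nothing is deleted unless you run the printed command yourself.

```python
import re

# Matches the "already exists" error as it appears in the catalog-operator log above.
ALREADY_EXISTS = re.compile(r'jobs\.batch "(?P<job>[0-9a-f]+)" already exists')

# Error line copied from the report.
LOG_LINE = (
    'E0424 20:12:47.382632 1 queueinformer_operator.go:319] sync "cp46test" failed: '
    'bundle unpacking failed with an error: jobs.batch '
    '"8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6" already exists'
)


def stale_job_from_log(line: str):
    """Return the Job name from an 'already exists' error line, or None."""
    match = ALREADY_EXISTS.search(line)
    return match.group("job") if match else None


def delete_job_command(job: str, namespace: str = "openshift-marketplace",
                       dry_run: bool = True) -> list:
    """Build the kubectl argv for removing the stale Job.

    With dry_run=True the command only reports what it would delete.
    """
    cmd = ["kubectl", "delete", "job", job, "-n", namespace]
    if dry_run:
        cmd.append("--dry-run=client")
    return cmd


if __name__ == "__main__":
    job = stale_job_from_log(LOG_LINE)
    if job is not None:
        # Print the command for review instead of executing it.
        print(" ".join(delete_job_command(job)))
```

Before deleting, it is worth confirming that the Job really is the old, completed one (as in this case, created and completed on Apr 4th) and not an unpack that is currently running.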

Additional context

N/A

@pgodowski pgodowski added the kind/bug Categorizes issue or PR as related to a bug. label Apr 24, 2024