ODS pipeline fails on labeling because of non-related resources #949

Open
SHoen opened this issue Oct 12, 2022 · 10 comments
Labels
bug Something isn't working

Comments

@SHoen
Contributor

SHoen commented Oct 12, 2022

Describe the bug
The following command for labeling the components fails because resources unrelated to ODS exist in the namespace.

Command
oc label --overwrite all -l app=myproject-my-app -n myproject-dev app.kubernetes.io/name=my-app app.kubernetes.io/instance- app.kubernetes.io/part-of- app.kubernetes.io/managed-by=tailor app.openshift.io/runtime-version- helm.sh/chart- app.opendevstack.org/project=myproject

Error
Error from server (Forbidden): nopermission.some.resource.com is forbidden: User "system:serviceaccount:myproject-cd:jenkins" cannot list resource "nopermission" in API group "some.resource.com" in the namespace "myproject-dev"

To Reproduce
Steps to reproduce the behavior:

  1. Provision an ODS project and an ODS component
  2. Make sure that the deployment with ODS is working and deploy to the -dev namespace
  3. Deploy resources in your project-dev namespace that the Jenkins service account has no permission to read or edit.
  4. Try to deploy your ods-component again.

Expected behavior
ODS should only try to label ODS-related components and should not fail if the namespace contains resources for which Jenkins has no permission.

Affected version (please complete the following information):

  • OpenShift: 4
  • OpenDevStack 4.x
@SHoen SHoen added the bug Something isn't working label Oct 12, 2022
@clemensutschig
Member

this is a must fix ... otherwise we will not be able to deploy anything outside the scope of an ods component (e.g. operator) @michaelsauter @jafarre-bi @metmajer

@michaelsauter
Member

otherwise we will not be able to deploy anything outside the scope of an ods component (e.g. operator)

Meaning you do not want to label the resources of the operator? Or even that the operator resources themselves should somehow be deployed by other means, outside ODS?

I'd think that if an operator is installed and provides a CRD, you'd want to deploy that CRD somehow via ODS. For that, the deploying serviceaccount needs permissions. If you have the permissions, you should also be able to list the resources, and, if it has an app=myproject label, apply all the other labels to the resource. @SHoen what is the exact situation where this error occurs? Is there some operator resource that you do not want to roll out with ODS?

Still, the all in the labelling command is tricky, and that feels related to @serverhorror's comment in the Helm PR on labelling, see #916 (comment).

@SHoen
Contributor Author

SHoen commented Oct 12, 2022

@michaelsauter Thank you for your comment.
Yes, an operator installed some CRDs that are not related to an ODS project, and we don't want to deploy them via ODS.
Unfortunately, these CRDs blocked ODS on the cluster: Jenkins did not have permission to list (?) some of these components, so the pipeline failed on the labeling command shown in the Command section of my bug report.
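
One way to see which resource kinds are affected is to ask the cluster what the current user may list. The following is a minimal sketch, not what ODS does: the function name `allowed_kinds` and the overridable `OC` variable are hypothetical, and it assumes that `oc auth can-i` exits with a non-zero status when the action is not allowed.

```shell
#!/bin/sh
# OC is overridable (assumption for illustration) so the function can be
# exercised without a cluster.
OC="${OC:-oc}"

# Hypothetical helper: prints, comma-separated, those of the given kinds
# that the current user is allowed to list in the given namespace.
allowed_kinds() {
  namespace="$1"
  shift
  result=""
  for kind in "$@"; do
    # Relies on the exit status of "auth can-i"; its output is discarded.
    if "$OC" auth can-i list "$kind" -n "$namespace" >/dev/null 2>&1; then
      result="${result:+$result,}$kind"
    fi
  done
  printf '%s\n' "$result"
}
```

Run as the Jenkins service account, `allowed_kinds myproject-dev deployment nopermission.some.resource.com service` would be expected to leave out the kind from the error message above.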

@michaelsauter
Member

Yes, an operator installed some CRDs that are not related to an ODS project, and we don't want to deploy them via ODS.

I think that somehow goes against the implicit assumption in ODS that it manages everything that is deployed. How does this play together with the docs generated by the release manager? Do the resources deployed by the operator not show up at all? Is that intended?

If it is intended, I think the smallest change possible is listing explicitly which resources to apply labels to.
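
A minimal sketch of what listing the resources explicitly could look like, replacing `all` with a comma-separated list of kinds. The helper name `build_label_cmd`, the variable `ODS_LABEL_KINDS`, and the default list of kinds are assumptions for illustration, not the list ODS would actually use. The function only prints the command, so it can be reviewed before anything is run against a cluster.

```shell
#!/bin/sh
# Assumed default list of kinds to label; adjust to what the pipeline deploys.
ODS_LABEL_KINDS="${ODS_LABEL_KINDS:-deployment,deploymentconfig,service,route,configmap,secret}"

# Hypothetical helper: prints an "oc label" command that targets an explicit
# list of kinds instead of "all".
build_label_cmd() {
  selector="$1"   # e.g. app=myproject-my-app
  namespace="$2"  # e.g. myproject-dev
  shift 2
  printf 'oc label --overwrite %s -l %s -n %s' \
    "$ODS_LABEL_KINDS" "$selector" "$namespace"
  # Remaining arguments are label operations, passed through unchanged.
  for op in "$@"; do
    printf ' %s' "$op"
  done
  printf '\n'
}
```

For example, `build_label_cmd app=myproject-my-app myproject-dev app.kubernetes.io/name=my-app helm.sh/chart-` prints a command that never touches kinds outside the list, so a custom resource the Jenkins service account cannot list is not queried at all.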

@SHoen
Contributor Author

SHoen commented Oct 13, 2022

Indeed, this is a very good question. Think of some resources that might already be qualified in a different way. Including them could be a new feature, but for now the issue is that the pipeline was failing on them.

@serverhorror
Contributor

TL;DR: You're right. I think the issue is that we label things from the client side. I don't know of a good way to resolve this reliably.

Thoughts/Discussion Points below -- just as a starting point for my perspective:

@michaelsauter

Still, the all in the labelling command is tricky, and that feels related to @serverhorror's comment in the Helm PR on labelling, see #916 (comment).

Absolutely agree. Labeling from the client side is always going to be a problem. By client side I mean: as long as we use oc commands in any pipeline and rely on them, there is no good and reliable way to even assume that things are managed by the framework.

It's akin to having some API and a JS-driven frontend where the only input validation happens on the JS side and nothing is verified on the server side.

goes against the implicit assumption in ODS that it manages everything that is deployed

While this assumption exists, and I am in favor of it being actually true, that's all it is: An assumption.

People are using oc directly and ODS provides no way to have these resources under control.

If it is intended, I think the smallest change possible is listing explicitly which resources to apply labels to.

Will that solve our problem though?

We will change fewer things

@SHoen

Unfortunately these CRDs blocked ODS on the cluster because jenkins didn't have the permission to "list"? some of these

I think that's kind of expected. Currently the only point that "manages" what is in the cluster, from an ODS perspective, is Jenkins. It's all client side and we have no server side verification that something is compliant.


In General

As long as we overwrite well-known labels we will stay in the trouble zone. Those labels are well-known, meaning that most authors of Kubernetes resources know them and will -- nay are supposed to -- make use of them.

If we change these labels, we MUST also change every other occurrence of that string so that it matches and fulfills any kind of contract that expects the label values to match.

As soon as the labeling happens

oc label --overwrite all -l app=myproject-my-app \
  -n myproject-dev \
  app.kubernetes.io/name=my-app \
  app.kubernetes.io/instance- \
  app.kubernetes.io/part-of- \
  app.kubernetes.io/managed-by=tailor \
  app.openshift.io/runtime-version- \
  helm.sh/chart- \
  app.opendevstack.org/project=myproject

I expect these problems to happen in case of any kind of helm chart or externally maintained resources:

  • labels and labelSelectors will not match up any longer leading to orphaned resources (pods mostly, in case of CRDs it might be other things)
  • we might run into immutable field problems
  • we might break the deployment completely because we remove labels that might be required
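
The first point can be sketched with a small check. This is an illustration only: `selector_matches` is a hypothetical helper, both arguments are comma-separated `key=value` lists, and the label values are made up. It returns success only if every pair in the selector is still present in the labels.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if every key=value pair of the selector
# (first argument) appears in the labels (second argument).
selector_matches() {
  selector="$1"
  labels=",$2,"
  old_ifs="$IFS"
  IFS=','
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                       # pair found, keep checking
      *) IFS="$old_ifs"; return 1 ;;        # pair missing, selector broken
    esac
  done
  IFS="$old_ifs"
  return 0
}
```

With a selector of `app.kubernetes.io/name=chart-app`, labels of `app.kubernetes.io/name=chart-app,app=myproject-my-app` match, while the same labels after an overwrite to `app.kubernetes.io/name=my-app` do not, which is the situation in which a pod would no longer be selected by its owner.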

@michaelsauter
Member

If it is intended, I think the smallest change possible is listing explicitly which resources to apply labels to.

Will that solve our problem though?

No, it won't solve the general problem that you describe (and I agree with your view). But it may unblock us for now, leaving the current system with its flaws in place. Making the small change should buy time to work on the general topic on the side.

@metmajer
Member

@SHoen what are you trying to achieve? @michaelsauter's observation about ODS' Release Manager managing the deployment of an entire system is spot-on. Unless you plan to install supporting services into the -cd namespace, this requires a larger discussion.

@clemensutschig
Member

@metmajer @michaelsauter - assuming an operator is involved which deploys custom resources, roles etc .. I doubt everything managed by ODS will solve this .. or am I lost somewhere

@michaelsauter
Member

@clemensutschig Not sure I get your comment correctly. I tried to say that the release manager has the implicit assumption that it manages everything that is deployed. An operator managing its own resources conflicts with this idea in my view. A middle ground may be that the operator potentially provides an option to specify resource labels. Still, the documentation that would be produced by the RM would not reflect what was actually deployed ...

5 participants