
ServiceL2Status follow-up: create resources in main namespace with speaker pod as OwnerRef #2311

Open
2 tasks done
oribon opened this issue Mar 7, 2024 · 10 comments · May be fixed by #2351

@oribon
Member

oribon commented Mar 7, 2024

Is your feature request related to a problem?

#2158 is merged, but there's a corner case we don't handle: if a speaker is deleted permanently (either the node is gone, or the user no longer wants it to act as a speaker), the L2 statuses it created are left dangling.

Describe the solution you'd like

Since cross-namespace owner refs are not allowed, we think a reasonable approach would be to create the statuses in the same namespace as the speakers. When we do that, we can no longer use servicename-node as the resource's name, because services with the same name can exist in different namespaces.
So the idea is to create the statuses with a generated name, e.g. GenerateName: <node-name>-, in the same namespace as the speaker, with the speaker pod as an ownerRef of the resource. Once a speaker is restarted or gone, Kubernetes will handle the deletion for us.
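A minimal sketch of what this could look like, assuming a Go speaker that knows its own pod object; the function name, label keys, and the unstructured GVK are illustrative assumptions, not MetalLB's actual code:

```go
package status

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// l2StatusFor builds a ServiceL2Status-like object in the speaker's own
// namespace. GenerateName yields a unique "<node-name>-xxxxx" name, and the
// owner reference to the speaker pod lets Kubernetes garbage-collect the
// status once the pod (and thus the speaker) is gone.
func l2StatusFor(speakerPod *corev1.Pod, nodeName, svcName, svcNamespace string) *unstructured.Unstructured {
	u := &unstructured.Unstructured{}
	// Hypothetical GVK; the real one is whatever the ServiceL2Status CRD declares.
	u.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "metallb.io",
		Version: "v1beta1",
		Kind:    "ServiceL2Status",
	})
	u.SetGenerateName(nodeName + "-")
	u.SetNamespace(speakerPod.Namespace)
	u.SetOwnerReferences([]metav1.OwnerReference{{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       speakerPod.Name,
		UID:        speakerPod.UID,
	}})
	// Since the name no longer encodes the service, record which service this
	// status refers to; these label keys are illustrative, not MetalLB's.
	u.SetLabels(map[string]string{
		"example.io/service-name":      svcName,
		"example.io/service-namespace": svcNamespace,
	})
	return u
}
```

Labels like these would keep the service identity queryable even though the resource name no longer encodes it.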

Additional context

No response

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.
@oribon
Member Author

oribon commented Mar 7, 2024

cc @lwabish :)

fedepaol added a commit to fedepaol/metallb that referenced this issue Mar 7, 2024
Because of metallb#2311 we are going
to move the status instances to the metallb namespace. This might
require a change in the permissions too (from cluster to namespaced) so
the new version of MetalLB might not be able to delete the "legacy"
instances of the CRD because they belong to namespaces the new metallb
might not have permissions on.

Because of this, we hide the feature behind a flag, effectively
disabling it until the issue is fixed.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
fedepaol added a commit to fedepaol/metallb that referenced this issue Mar 7, 2024
We must be able to skip those tests until
metallb#2311 is fixed.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
@lwabish
Contributor

lwabish commented Mar 8, 2024

Indeed.
Here are several things that came to mind:

  1. I have run into situations like this before. We used a helm hook to trigger a job that garbage-collects the CRs, because in my case the only way to install is from a helm chart.
  2. Is it possible to detect speaker pod deletion with a mutating webhook, which could do some GC?
  3. When I implemented the layer2status feature, it felt natural to put the status CR in the same namespace as the service, because from a user perspective I may want to check the service and the status CR simultaneously without switching namespaces back and forth. Of course, this is a quick summary of my personal opinion, and I look forward to hearing your thoughts.

@fedepaol
Member

fedepaol commented Mar 8, 2024

Indeed. Here are several things that came to mind:

If we can set an owner reference from the speaker pod itself to the status instance, then the GC will happen within Kubernetes.

1. I have run into situations like this before. We used a helm hook to trigger a job that garbage-collects the CRs, because in my case the only way to install is from a helm chart.

2. Is it possible to detect speaker pod deletion with a mutating webhook, which could do some GC?

This is not possible. It'd mean setting a webhook on the path of all pods (because webhooks are not namespaced, IIRC), which is not acceptable. Also, we learned to stay away from webhooks as much as possible (see #1597).

3. When I implemented the layer2status feature, it felt natural to put the status CR in the same namespace as the service, because from a user perspective I may want to check the service and the status CR simultaneously without switching namespaces back and forth. Of course, this is a quick summary of my personal opinion, and I look forward to hearing your thoughts.

I agree that this feels more natural. However, having the status of what MetalLB does inside MetalLB's namespace doesn't sound like too much of a stretch, especially if it solves the problem from the previous point.

@lwabish
Contributor

lwabish commented Mar 11, 2024

Filtering pods from webhooks should not be a problem with the help of label selectors or namespace selectors. But I do agree that webhooks are tricky sometimes.

I'd love to follow your final advice and continue working on this improvement, but I may start a few days later, if that's acceptable.

@fedepaol
Member

@lwabish just checking if you are still interested in helping with this. There is obviously no rush!

@lwabish
Contributor

lwabish commented Mar 26, 2024

@lwabish just checking if you are still interested in helping with this. There is obviously no rush!

Yes, I'd love to keep working on this.

@lwabish
Contributor

lwabish commented Mar 26, 2024

I'll implement this, maybe this weekend.

@fedepaol
Member

Thanks a lot!

@jhoblitt

Is it the weekend yet? :)

@fedepaol
Member

Is it the weekend yet? :)

The PR was filed, as you can see above :)
