
ServiceL2Status follow-up: create resources in main namespace with speaker pod as OwnerRef #2311

Open
2 tasks done
oribon opened this issue Mar 7, 2024 · 10 comments · May be fixed by #2351

@oribon
Member

oribon commented Mar 7, 2024

Is your feature request related to a problem?

#2158 is merged, but there's a corner case we don't handle: if a speaker is deleted permanently (either the node is gone, or the user no longer wants it to act as a speaker), the L2 statuses it created are left dangling.

Describe the solution you'd like

Since cross-namespace owner refs are not allowed, we think a reasonable approach would be to create the statuses in the same namespace as the speakers. When we do that, we can no longer use servicename-node as the resource's name, because services with the same name can exist in different namespaces.
So the idea is to create the statuses with a generated name, e.g. GenerateName: <node-name>-, in the same namespace as the speaker, with the speaker pod as an ownerRef of the resource. Once a speaker is restarted or gone, Kubernetes will handle the deletion for us.
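A minimal sketch of what this could look like, assuming a Go speaker that knows its own pod object; the function name, label keys, and the unstructured GVK are illustrative assumptions, not MetalLB's actual code:

```go
package status

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// l2StatusFor builds a ServiceL2Status-like object in the speaker's own
// namespace. GenerateName yields a unique "<node-name>-xxxxx" name, and the
// owner reference to the speaker pod lets Kubernetes garbage-collect the
// status once the pod (and thus the speaker) is gone.
func l2StatusFor(speakerPod *corev1.Pod, nodeName, svcName, svcNamespace string) *unstructured.Unstructured {
	u := &unstructured.Unstructured{}
	// Hypothetical GVK; the real one is whatever the ServiceL2Status CRD declares.
	u.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "metallb.io",
		Version: "v1beta1",
		Kind:    "ServiceL2Status",
	})
	u.SetGenerateName(nodeName + "-")
	u.SetNamespace(speakerPod.Namespace)
	u.SetOwnerReferences([]metav1.OwnerReference{{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       speakerPod.Name,
		UID:        speakerPod.UID,
	}})
	// Since the name no longer encodes the service, record which service this
	// status refers to; these label keys are illustrative, not MetalLB's.
	u.SetLabels(map[string]string{
		"example.io/service-name":      svcName,
		"example.io/service-namespace": svcNamespace,
	})
	return u
}
```

Labels like these would keep the service identity queryable even though the resource name no longer encodes it.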

Additional context

No response

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.
@oribon
Member Author

oribon commented Mar 7, 2024

cc @lwabish :)

fedepaol added a commit to fedepaol/metallb that referenced this issue Mar 7, 2024
Because of metallb#2311 we are going
to move the status instances to the metallb namespace. This might
require a change in the permissions too (from cluster to namespaced) so
the new version of MetalLB might not be able to delete the "legacy"
instances of the CRD because they belong to namespaces the new metallb
might not have permissions on.

Because of this, we hide the feature behind a flag, effectively
disabling it until the issue is fixed.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
fedepaol added a commit to fedepaol/metallb that referenced this issue Mar 7, 2024
We must be able to skip those tests until
metallb#2311 is fixed.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
@lwabish
Contributor

lwabish commented Mar 8, 2024

Indeed.
Here are several things that came to mind:

  1. I have run into situations like this before. We used a helm hook to trigger a job that garbage-collects the CRs, because in my case the only way to install is from a helm chart.
  2. Is it possible to detect speaker pod deletion with a mutating webhook, which could do some GC?
  3. When I implemented the layer2status feature, it felt natural to put the status CR in the same namespace as the service, because from a user perspective I may want to check the service and the status CR simultaneously without switching namespaces back and forth. Of course, this is a quick summary of my personal opinion, and I look forward to hearing your thoughts.

@fedepaol
Member

fedepaol commented Mar 8, 2024

Indeed. Here are several things that came to mind:

If we can set an owner reference from the speaker pod itself to the status instance, then the GC will happen within Kubernetes.

1. I have run into situations like this before. We used a helm hook to trigger a job that garbage-collects the CRs, because in my case the only way to install is from a helm chart.

2. Is it possible to detect speaker pod deletion with a mutating webhook, which could do some GC?

This is not possible. It'd mean setting a webhook on the path of all pods (because webhooks are not namespaced, IIRC), which is not acceptable. Also, we learned to stay away from webhooks as much as possible (see #1597).

3. When I implemented the layer2status feature, it felt natural to put the status CR in the same namespace as the service, because from a user perspective I may want to check the service and the status CR simultaneously without switching namespaces back and forth. Of course, this is a quick summary of my personal opinion, and I look forward to hearing your thoughts.

I agree that this feels more natural. However, having the status of what MetalLB does inside MetalLB's namespace doesn't sound like too much of a stretch, especially if it solves the problem from the previous point.

@lwabish
Contributor

lwabish commented Mar 11, 2024

Filtering pods from webhooks should not be a problem with the help of label selectors or namespace selectors. But I do agree that webhooks are tricky sometimes.

I'd love to follow your final advice and continue working on this improvement, but I may start a few days later, if that's acceptable.

@fedepaol
Member

@lwabish just checking if you are still interested in helping with this. There is obviously no rush!

@lwabish
Contributor

lwabish commented Mar 26, 2024

@lwabish just checking if you are still interested in helping with this. There is obviously no rush!

Yes, I'd love to keep working on this.

@lwabish
Contributor

lwabish commented Mar 26, 2024

I'll implement this, maybe this weekend.

@fedepaol
Member

Thanks a lot!

@jhoblitt

Is it the weekend yet? :)

@fedepaol
Member

Is it the weekend yet? :)

The PR was filed, as you can see above :)
