Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move allocated pci deletion in defer of del cmd #278

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mlguerrero12
Copy link
Contributor

The allocated pci is supposed to be deleted when the del command is completed. However, placing this logic at the end of the function does not always guaranteed its execution. For instance, if ResetVfConfig or ReleaseVF fail, the err variable is overwritten which causes the cache to be deleted in the defer function and the next del retry to exit without deleting the allocated pci.
By placing this logic along the deletion of the cache will ensure that all operations in the del command are done.

The allocated pci is supposed to be deleted when the del command
is completed. However, placing this logic at the end of the function
does not always guaranteed its execution. For instance, if ResetVfConfig
or ReleaseVF fail, the err variable is overwritten which causes the cache
to be deleted in the defer function and the next del retry to exit
without deleting the allocated pci.
By placing this logic along the deletion of the cache will ensure that
all operations in the del command are done.
Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general there is a protection for that in the allocation function.
if the namespace doesn't exist we will clean the allocation for new pods

if we failed on finishing the Del function and the namespace still exist we will have a generic problem in cmdAdd for a new pod saying just failed to find the vf to mode

@mlguerrero12
Copy link
Contributor Author

First of all, it is not correct to rely on protections in the add command. We need to delete the allocation when the del command finishes for all cases. That's with the aim of having consistency. It is not correct to delete for some cases and for others not.

Also, note that the error when the delete allocation fails is useless, the err var is not overwritten which will lead a retry that will surely fail (most likely in ReleaseVf). The cache will be deleted and the next retry will exit again without deleting the allocation.

Second, by introducing this synchronization logic (not to run add before del), we are seeing new issues when for some reason the namespace is not deleted but all networking inside is removed. Before, the plugin simply continue but now it stays stuck in the add command, even though, it is safe to continue.

This won't solve the above, but it will help reduce it by having the previous behavior when the del command correctly finishes for all cases.

@SchSeba
Copy link
Collaborator

SchSeba commented Aug 1, 2023

Hi @mlguerrero12 thanks for the comment!

the thing is that if we failed on moving back the vf because of an issue (and we already encounter some of them) it will not be right to delete the lock because again the next pod that wants to use this will not work.

I agree we can improve when to remove the allocation file if the VF is not in the namespace anymore but for some bug the network namespace is still preset we can remove the lock.

WDYT?

@mlguerrero12
Copy link
Contributor Author

@SchSeba, I think we are mixing up the issues here. I agree that we need to find a way to inspect a namespace that failed to be deleted due to a bug to see if the networking has been removed.

The protection that is currently present in the add command is supposed to be when the del command fails to complete. When the del command correctly terminates, the lock is supposed to be deleted. However, this is not always the case since the deletion of the lock is done at the end of the function. For instance, the del command terminates in line 205, 226 and 254 and the lock is not deleted. Do you have a valid reason for this? Also, if the lock deletion fails, the next retry will fail in ReleaseVF for no dpdk drivers. So, it's useless.

What I want to achieve in this PR is to consistently delete the lock for all cases that the del command terminates and not rely on the protection in the add command which is supposed to be only when the del command is skipped.

@SchSeba
Copy link
Collaborator

SchSeba commented Aug 8, 2023

I get your point. but for this to work I think we need also another step we need to change the places where we are creating a new err variable like

if _, err := utils.GetVfid(netConf.DeviceID, netConf.Master); err != nil {
		return fmt.Errorf("cmdDel() error obtaining VF ID: %q", err)
	}

...
if err := sm.ResetVFConfig(netConf); err != nil {
		return fmt.Errorf("cmdDel() error reseting VF: %q", err)
	}
...

this way we will really clean this up only if there was not error in the Delete function no?

@mlguerrero12
Copy link
Contributor Author

Based on the comments around the pci allocation code. The main purpose is to have a synchronization mechanism to not execute an add command before a del command.

In config.go, line 41.

" // This is to prevent issues where kubelet request to delete a pod and in the same time a new pod using the same
// vf is started. we can have an issue where the cmdDel of the old pod is called AFTER the cmdAdd of the new one
// This will block the new pod creation until the cmdDel is done."

Line 71,

" // To prevent a locking of a PCI address for every pciAddress file we also add the netns path where it's been used
// This way if for some reason the cmdDel command was not called but the pod namespace doesn't exist anymore
// we release the PCI address"

Now, it seems you want to use this as well to detect errors in the del command and force a check in the add command. This is not mentioned anywhere. I thought the protection was only when the del command is not executed.

@SchSeba
Copy link
Collaborator

SchSeba commented Mar 18, 2024

Hi @mlguerrero12 thanks for the comment!

you are right maybe we can add a better comment for this.
another use-case is the protection if for some reason the cniDel failed.

also we saw use-cases where crio is not able to remove the network namespace and the VF remains on that namespace instead of moving back into the host network.

in the case without this block, we will get again the generic error in the new pod creation saying enable to find vf interface

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants