-
Notifications
You must be signed in to change notification settings - Fork 116
Add reference tracking to netdev and release netdev reference in nondl on shutdown #255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Netdev refcounting was added a couple years back in 5.17 [1], but onload only uses the untracked versions. This patch adds support to the ones that are low-hanging fruit. For sfc, kernel compatibility for netdevice_tracker is already provided by its kernel_compat.h. Tracking in tc_encap_actions.c is already upstream [2] so this is backporting from there. For rx_common.c I just sent a patch upstream [3] and backported it here. Outside sfc I added the same support to ci/driver/kernel_compat.h. [1] https://lore.kernel.org/netdev/20211205042217.982127-4-eric.dumazet@gmail.com/ [2] torvalds/linux@7e5e7d800011ad#diff-9ffb1b01a8a11d0d7b6976a10eede3bebdcfaf256ef26d0d95714fe8aa03c89fR156 [3] https://lore.kernel.org/netdev/20241217224717.1711626-1-zhuyifei@google.com/T/ Signed-off-by: YiFei Zhu <zhuyifei@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for your PR!
Overall, I think the netdev reference counting is a helpful addition. Once your upstream changes are accepted and have been pulled into out our sfc-linux-net repo, we will import the net driver with your changes this way.
As for your changes to the reboot notifier, it makes sense to me but I think there is an opportunity for this to race with the delete_module
syscall (this is the only place I could find in the kernel source where mod->exit() is called). You have mentioned that shutdown shouldn't be concerned with this but I imagine the race would cause kernel warnings at the least. I'd like some of the other members of my team to weigh in on this.
I had a bit of consideration for this. Both cleanup functions hold RTNL lock: onload/src/driver/linux_resource/nondl_driver.c Lines 95 to 100 in 467ee38
onload/src/driver/linux_resource/nondl_resource.c Lines 90 to 109 in 467ee38
onload/src/driver/linux_resource/nondl_resource.c Lines 193 to 214 in 467ee38
So data races aren't possible. Both functions are also in the format of
So both functions are essentially no-ops if they have already been called. The only place I see that it could cause crash / warn is |
fdbe123
to
2958798
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points.
Regarding EFRM_ASSERT(nondl_driver == driver);
, you could check if nondl_driver == NULL
and exit early.
On shutdown / reboot, if netdevs don't have a refcount of 1, the kernel loops with message: unregister_netdevice: waiting for eth0 to become free. Usage count = 3 unregister_netdevice: waiting for eth0 to become free. Usage count = 3 unregister_netdevice: waiting for eth0 to become free. Usage count = 3 [...] With netdev refcount tracking, the traces are visible: unregister_netdevice: waiting for eth0 to become free. Usage count = 3 ref_tracker: eth%d@ff3cf768c8344548 has 1/2 users at efrm_nic_add+0x29b/0x460 [sfc_resource] efrm_nondl_add_device+0x124/0x1c0 [sfc_resource] efrm_nondl_try_add_device+0x27/0xf0 [sfc_resource] efrm_nondl_register_netdev+0x106/0x160 [sfc_resource] nondl_register_store+0x174/0x1e0 [sfc_resource] kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x308/0x420 ksys_write+0x5f/0xe0 do_syscall_64+0x7b/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e ref_tracker: eth%d@ff3cf768c8344548 has 1/2 users at efrm_nondl_register_netdev+0xa9/0x160 [sfc_resource] nondl_register_store+0x174/0x1e0 [sfc_resource] kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x308/0x420 ksys_write+0x5f/0xe0 do_syscall_64+0x7b/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e This patch makes sfc_resource register a reboot notifier that calls efrm_nondl_unregister & efrm_nondl_shutdown that handles each of the two refcounts. Considering that the only other caller of those functions is the cleanup_sfc_resource module exit function, I don't think shutdown should be concerned about module unload racing under normal circumstances. In any case, efrm_nondl_unregister_driver is made to exit early before the assert, in the event that the race does end up happening. Signed-off-by: YiFei Zhu <zhuyifei@google.com>
2958798
to
c632bb4
Compare
Done. |
Hi all. Please take another look at this fix when you have time. Thanks. |
Hi @zhuyifei1999, I was doing some testing of this change but there seems to be a kernel crash when rebooting. I've been testing using a rebase of this branch onto master: https://github.com/ligallag-amd/onload/tree/yifeinetdevref (I managed to get your sfc_linux_net changes backported). I see the following fault on
fault:
|
Weird. What's the line number of the crash? It should show up in the line that has
but the line is truncated. (I'll take a look when I'm more awake) |
Good spot - I was lazy when copy pasting from my terminal and must not have noticed this! I've updated my comment with the correct log. |
Huh, that would be onload/src/lib/efrm/driver_object.c Lines 178 to 179 in bbfd98d
This means that by the time reboot is happening, onload is still being used. But systemd-shutdown literally broadcasts SIGKILL so nothing should be able to survive: Is there some sort of lingering state somewhere? I'll try to reproduce. |
I've tested again and I still see the panic. The machine has just come back up after reboot so should be in a clean state. All I've done prior to installing onload is load netconsole so the paired machine can see the dmesg output. Testing on master, I don't see this issue. |
Hmm, I think the reason I don't repro is because my assertions are off... I build with onload/src/include/ci/efrm/debug_linux.h Lines 96 to 113 in bbfd98d
Anyways I looked at the code to see how it works. The list entry is removed upon the last reference to onload/src/lib/efrm/driver_object.c Line 497 in bbfd98d
so I added tracking to when the reference count gets increased / decreased: diff --git a/src/lib/efrm/driver_object.c b/src/lib/efrm/driver_object.c
index 7e74afc6..437214d6 100644
--- a/src/lib/efrm/driver_object.c
+++ b/src/lib/efrm/driver_object.c
@@ -346,6 +346,8 @@ static struct efrm_client_callbacks efrm_null_callbacks = {
static void efrm_client_init_from_nic(struct efrm_nic *rnic,
struct efrm_client *client)
{
+ pr_emerg("efrm_client_init_from_nic: %px\n", client);
+ dump_stack();
client->nic = &rnic->efhw_nic;
client->ref_count = 1;
client->flags = 0;
@@ -489,6 +491,8 @@ void efrm_client_put(struct efrm_client *client)
EFRM_ASSERT(client->ref_count > 0);
spin_lock_bh(&efrm_nic_tablep->lock);
+ pr_emerg("efrm_client_put: %px %d->%d\n", client, client->ref_count, client->ref_count - 1);
+ dump_stack();
if (--client->ref_count > 0)
client = NULL;
else
@@ -503,6 +507,8 @@ void efrm_client_add_ref(struct efrm_client *client)
{
EFRM_ASSERT(client->ref_count > 0);
spin_lock_bh(&efrm_nic_tablep->lock);
+ pr_emerg("efrm_client_add_ref: %px %d->%d\n", client, client->ref_count, client->ref_count + 1);
+ dump_stack();
++client->ref_count;
spin_unlock_bh(&efrm_nic_tablep->lock);
} Normally, the initial reference happens with stack, when the nic is registered wit the sysfs file:
And the final reference is when the onload module, not the sfc module, is unloaded:
Because this patch only concerns itself with Anyhow, I was able to repro after adding
|
What somewhat worries me is that, where And then maybe I could consider going for an
It seems that the unregister would fail if the device is up: onload/src/driver/linux_resource/nondl_resource.c Lines 178 to 179 in bbfd98d
... yet when I grep for |
We are using onload in AF_XDP mode, and one thing we noticed was that as long as onload was loaded, shutdown / reboot would loop with console message:
This is because module cleanup functions only get called in the delete_module syscall and not reboot. I added reference counting and fixed the issue by adding a reboot notifier that calls the appropriate cleanup functions necessary to release the netdev references.