Skip to content

Add reference tracking to netdev and release netdev reference in nondl on shutdown #255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

zhuyifei1999
Copy link
Contributor

We are using onload in AF_XDP mode, and one thing we noticed was that as long as onload was loaded, shutdown / reboot would loop with console message:

unregister_netdevice: waiting for eth0 to become free. Usage count = 3
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
unregister_netdevice: waiting for eth0 to become free. Usage count = 3

This is because module cleanup functions only get called in the delete_module syscall and not reboot. I added reference counting and fixed the issue by adding a reboot notifier that calls the appropriate cleanup functions necessary to release the netdev references.

Netdev refcounting was added a couple years back in 5.17 [1], but
onload only uses the untracked versions. This patch adds support
to the ones that are low-hanging fruit.

For sfc, kernel compatibility for netdevice_tracker is already
provided by its kernel_compat.h. Tracking in tc_encap_actions.c
is already upstream [2] so this is backporting from there. For
rx_common.c I just sent a patch upstream [3] and backported it
here.

Outside sfc I added the same support to ci/driver/kernel_compat.h.

[1] https://lore.kernel.org/netdev/20211205042217.982127-4-eric.dumazet@gmail.com/
[2] torvalds/linux@7e5e7d800011ad#diff-9ffb1b01a8a11d0d7b6976a10eede3bebdcfaf256ef26d0d95714fe8aa03c89fR156
[3] https://lore.kernel.org/netdev/20241217224717.1711626-1-zhuyifei@google.com/T/

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
@zhuyifei1999 zhuyifei1999 requested a review from a team as a code owner December 18, 2024 23:43
Copy link
Contributor

@ligallag-amd ligallag-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your PR!

Overall, I think the netdev reference counting is a helpful addition. Once your upstream changes are accepted and have been pulled into out our sfc-linux-net repo, we will import the net driver with your changes this way.

As for your changes to the reboot notifier, it makes sense to me but I think there is an opportunity for this to race with the delete_module syscall (this is the only place I could find in the kernel source where mod->exit() is called). You have mentioned that shutdown shouldn't be concerned with this but I imagine the race would cause kernel warnings at the least. I'd like some of the other members of my team to weigh in on this.

@ligallag-amd ligallag-amd requested a review from a team December 20, 2024 16:03
@zhuyifei1999
Copy link
Contributor Author

You have mentioned that shutdown shouldn't be concerned with this but I imagine the race would cause kernel warnings at the least.

I had a bit of consideration for this. Both cleanup functions hold RTNL lock:

extern void efrm_nondl_unregister(void)
{
#ifdef EFHW_HAS_AF_XDP
efrm_nondl_unregister_driver(&efrm_nondl_driver);
#endif
}

/* Unregister a driver from the non-driverlink resource manager. */
void efrm_nondl_unregister_driver(struct efrm_nondl_driver *driver)
{
struct efrm_nondl_device *device, *device_n;
rtnl_lock();
EFRM_ASSERT(nondl_driver == driver);
list_for_each_entry_safe_reverse(device, device_n, &driver->devices,
driver_node)
efrm_nondl_del_device(device);
BUG_ON(!list_empty(&driver->devices));
nondl_driver = NULL;
rtnl_unlock();
}
EXPORT_SYMBOL(efrm_nondl_unregister_driver);

void efrm_nondl_shutdown(void)
{
struct efrm_nondl_device *device, *device_n;
/* If we are being unloaded then any module which depends on
* us should have been unloaded already, so our driver list
* should be empty. If not then something went badly wrong.
*
* We still might have devices registered with us, though. If
* so they need to be cleaned up. */
rtnl_lock();
BUG_ON(nondl_driver);
list_for_each_entry_safe_reverse(device, device_n, &nondl_device_list, node)
efrm_nondl_cleanup_netdev(device);
BUG_ON(!list_empty(&nondl_device_list));
rtnl_unlock();
}

So data races aren't possible. Both functions are also in the format of

list_for_each_entry_safe_reverse(...)
   cleanup(...)
BUG_ON(!list_empty(...));

So both functions are essentially no-ops if they have already been called. The only place I see that it could cause crash / warn is efrm_nondl_unregister_driver calling EFRM_ASSERT(nondl_driver == driver);, which I just didn't bother fixing in the initial implementation, considering a normal shutdown PID 1 typically kills all processes before calling reboot syscall, so no one would be there to call delete_module syscall. And I'm not sure, if I were to fix this, what the best way to fix it would be.

Copy link
Contributor

@ligallag-amd ligallag-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points.

Regarding EFRM_ASSERT(nondl_driver == driver);, you could check if nondl_driver == NULL and exit early.

On shutdown / reboot, if netdevs don't have a refcount of 1, the
kernel loops with message:
  unregister_netdevice: waiting for eth0 to become free. Usage count = 3
  unregister_netdevice: waiting for eth0 to become free. Usage count = 3
  unregister_netdevice: waiting for eth0 to become free. Usage count = 3
  [...]

With netdev refcount tracking, the traces are visible:
  unregister_netdevice: waiting for eth0 to become free. Usage count = 3
  ref_tracker: eth%d@ff3cf768c8344548 has 1/2 users at
       efrm_nic_add+0x29b/0x460 [sfc_resource]
       efrm_nondl_add_device+0x124/0x1c0 [sfc_resource]
       efrm_nondl_try_add_device+0x27/0xf0 [sfc_resource]
       efrm_nondl_register_netdev+0x106/0x160 [sfc_resource]
       nondl_register_store+0x174/0x1e0 [sfc_resource]
       kernfs_fop_write_iter+0x128/0x1c0
       vfs_write+0x308/0x420
       ksys_write+0x5f/0xe0
       do_syscall_64+0x7b/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

  ref_tracker: eth%d@ff3cf768c8344548 has 1/2 users at
       efrm_nondl_register_netdev+0xa9/0x160 [sfc_resource]
       nondl_register_store+0x174/0x1e0 [sfc_resource]
       kernfs_fop_write_iter+0x128/0x1c0
       vfs_write+0x308/0x420
       ksys_write+0x5f/0xe0
       do_syscall_64+0x7b/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

This patch makes sfc_resource register a reboot notifier that calls
efrm_nondl_unregister & efrm_nondl_shutdown that handles each of the two
refcounts. Considering that the only other caller of those functions
is the cleanup_sfc_resource module exit function, I don't think shutdown
should be concerned about module unload racing under normal circumstances.
In any case, efrm_nondl_unregister_driver is made to exit early before
the assert, in the event that the race does end up happening.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
@zhuyifei1999
Copy link
Contributor Author

Regarding EFRM_ASSERT(nondl_driver == driver);, you could check if nondl_driver == NULL and exit early.

Done.

@wdebruij
Copy link
Contributor

Hi all. Please take another look at this fix when you have time. Thanks.

@ligallag-amd
Copy link
Contributor

ligallag-amd commented Apr 30, 2025

Hi @zhuyifei1999, I was doing some testing of this change but there seems to be a kernel crash when rebooting. I've been testing using a rebase of this branch onto master: https://github.com/ligallag-amd/onload/tree/yifeinetdevref (I managed to get your sfc_linux_net changes backported). I see the following fault on rhel 9.0 Linux 5.14.0-70.13.1.el9_0.x86_64 and rhel8.10 Linux 4.18.0-553.el8_10.x86_64 when built and loaded with:

$ HAVE_AF_XDP=1 make -j $(nproc)
$ HAVE_SFC=0 HAVE_AF_XDP=1 make -C "$(mmaketool --toppath)/build/$(mmaketool --userbuild)" -j $(nproc
$ sudo "$(mmaketool --toppath)/build/$(mmaketool --driverbuild)/driver/linux/load.sh" onload_xdp

fault:

[   87.962944] systemd-shutdown[1]: Rebooting.
[   87.963118] kvm: exiting hardware virtualization
[   87.963353] [sfc efrm] efrm_nondl_unregister_device: unregister ens1f1
[   87.963565] [sfc efrm] efrm_nic_del_device:
[   87.963732] [sfc efrm] efrm_nic_shutdown:
[   87.963885] [sfc efrm] efrm_vi_rm_delayed_free: 00000000bf228592
[   87.964133] ------------[ cut here ]------------
[   87.964291] kernel BUG at /home/ligallag/Dev/onload/src/driver/linux_resource/../../lib/efrm/driver_object.c:179!
[   87.964699] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   87.964875] CPU: 5 PID: 1 Comm: systemd-shutdow Kdump: loaded Tainted: G           OE    --------- ---  5.14.0-70.13.1.el9_0.x86_64 #1
[   87.965320] Hardware name: Dell Inc. PowerEdge R250/0YVNH0, BIOS 1.6.3 02/22/2023
[   87.965619] RIP: 0010:efrm_nic_dtor+0x7f/0x90 [sfc_resource]
[   87.965826] Code: e8 16 c3 ff ff 48 8b bb 08 02 00 00 e8 fa 90 ff ff 48 8b bb 10 02 00 00 e8 6e 90 2e c1 48 c7 83 10 02 00 00 00 00 00 00 5b c3 <0f> 0b 0f 0b 0f 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[   87.966510] RSP: 0018:ffffb7868006fd58 EFLAGS: 00010287
[   87.966688] RAX: ffff99f94e6f85f8 RBX: ffff99f94e6f8400 RCX: 0000000000000002
[   87.966975] RDX: ffff99f94cb1b788 RSI: 0000000000000282 RDI: ffff99f94e6f8400
[   87.967245] RBP: ffff99f94e6f8400 R08: 0000000000000001 R09: ffff99fc1fb694a8
[   87.967510] R10: 000000000000002f R11: 000000000000027d R12: 0000000000000001
[   87.967770] R13: 0000000000000000 R14: ffffffff83260200 R15: 00000000fffffff9
[   87.968090] FS:  00007f0e07584b40(0000) GS:ffff99fc1fb40000(0000) knlGS:0000000000000000
[   87.968447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   87.968657] CR2: 00007f0e082584a0 CR3: 0000000121828002 CR4: 0000000000770ee0
[   87.968926] PKRU: 55555554
[   87.969045] Call Trace:
[   87.969140]  linux_efrm_nic_dtor+0x1a/0x50 [sfc_resource]
[   87.969350]  efrm_nic_shutdown+0x40/0x74 [sfc_resource]
[   87.969571]  efrm_nic_del_device.cold+0x75/0x7f [sfc_resource]
[   87.969791]  efrm_nondl_del_device+0x2b/0xb0 [sfc_resource]
[   87.969980]  efrm_nondl_unregister_driver+0x3a/0x70 [sfc_resource]
[   87.978430]  sfc_resource_shutdown_notify+0xa/0x20 [sfc_resource]
[   87.986428]  blocking_notifier_call_chain+0x5a/0x80
[   87.994197]  __do_sys_reboot.cold+0x16/0x5b
[   88.002031]  ? vfs_writev+0xc7/0x170
[   88.010695]  ? __call_rcu+0x110/0x320
[   88.018722]  ? do_writev+0x6b/0x110
[   88.026602]  do_syscall_64+0x38/0x90
[   88.035651]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   88.043603] RIP: 0033:0x7f0e081a7947
[   88.051390] Code: 0a 00 f7 d8 64 89 02 b8 ff ff ff ff eb b8 0f 1f 44 00 00 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f4 0a 00 f7 d8 64 89 02 b8
[   88.067964] RSP: 002b:00007ffe1b15baf8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[   88.076970] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0e081a7947
[   88.085901] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[   88.095289] RBP: 00007ffe1b15bd10 R08: 0000000000000000 R09: 00007ffe1b15aef0
[   88.104669] R10: 00007ffe1b15b0b0 R11: 0000000000000202 R12: 0000000000000000
[   88.114078] R13: 00007ffe1b15bb60 R14: 00007ffe1b15be48 R15: 0000000000000000
[   88.122455] Modules linked in: onload(OE) sfc_char(OE) sfc_resource(OE) sfc(OE) nf_flow_table nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c mtdblock mtd_blkdevs 8021q garp mrp stp llc mdio hwmon_vid crc32_generic mii netconsole nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_pmc_core x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support dcdbas kvm_intel ipmi_ssif kvm irqbypass intel_cstate intel_uncore mei_me i2c_i801 pcspkr i2c_smbus joydev acpi_ipmi mei ipmi_si acpi_power_meter acpi_pad fuse ext4 mbcache jbd2 sd_mod t10_pi sg drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul cec crc32_pclmul ahci crc32c_intel libahci drm libata tg3 ghash_clmulni_intel i2c_algo_bit mtd video pinctrl_tigerlake ipmi_devintf ipmi_msghandler [last unloaded: sfc_resource]

@zhuyifei1999
Copy link
Contributor Author

zhuyifei1999 commented Apr 30, 2025

Weird. What's the line number of the crash? It should show up in the line that has

[   87.964291] kernel BUG at /home/ligallag/Dev/onload/src/driver/linux_resource/../../lib/efrm

but the line is truncated.

(I'll take a look when I'm more awake)

@ligallag-amd
Copy link
Contributor

Good spot - I was lazy when copy pasting from my terminal and must not have noticed this! I've updated my comment with the correct log.

@zhuyifei1999
Copy link
Contributor Author

Huh, that would be

/* Things have gone very wrong if there are any driver clients */
EFRM_ASSERT(list_empty(&efrm_nic->clients));

This means that by the time reboot is happening, onload is still being used. But systemd-shutdown literally broadcasts SIGKILL so nothing should be able to survive:

https://github.com/systemd/systemd/blob/366dd4a662dac18c1a5beabb8f0c4c576a378206/src/shutdown/shutdown.c#L454

Is there some sort of lingering state somewhere?

I'll try to reproduce.

@ligallag-amd
Copy link
Contributor

Is there some sort of lingering state somewhere?

I've tested again and I still see the panic. The machine has just come back up after reboot so should be in a clean state. All I've done prior to installing onload is load netconsole so the paired machine can see the dmesg output.

Testing on master, I don't see this issue.

@zhuyifei1999
Copy link
Contributor Author

zhuyifei1999 commented May 1, 2025

Hmm, I think the reason I don't repro is because my assertions are off... I build with sudo ./scripts/onload_install --no-sfc and I see my compilation command a lot of -DNDEBUG, which toggles the asserts off:

#ifndef NDEBUG
#define EFRM_ASSERT(cond) BUG_ON((cond) == 0)
#define _EFRM_ASSERT(cond, file, line) \
do { \
if (unlikely(!(cond))) { \
EFRM_ERR("assertion \"%s\" failed at %s %d", \
#cond, file, line); \
BUG(); \
} \
} while (0)
#define EFRM_DO_DEBUG(expr) expr
#define EFRM_VERIFY_EQ(expr, val) EFRM_ASSERT((expr) == (val))
#else
#define EFRM_ASSERT(cond) do {} while(0)
#define EFRM_DO_DEBUG(expr) do {} while(0)
#define EFRM_VERIFY_EQ(expr, val) expr
#endif

Anyways I looked at the code to see how it works.

The list entry is removed upon the last reference to struct efrm_client:

list_del(&client->link);

so I added tracking to when the reference count gets increased / decreased:

diff --git a/src/lib/efrm/driver_object.c b/src/lib/efrm/driver_object.c
index 7e74afc6..437214d6 100644
--- a/src/lib/efrm/driver_object.c
+++ b/src/lib/efrm/driver_object.c
@@ -346,6 +346,8 @@ static struct efrm_client_callbacks efrm_null_callbacks = {
 static void efrm_client_init_from_nic(struct efrm_nic *rnic,
                                      struct efrm_client *client)
 {
+       pr_emerg("efrm_client_init_from_nic: %px\n", client);
+       dump_stack();
        client->nic = &rnic->efhw_nic;
        client->ref_count = 1;
        client->flags = 0;
@@ -489,6 +491,8 @@ void efrm_client_put(struct efrm_client *client)
        EFRM_ASSERT(client->ref_count > 0);
 
        spin_lock_bh(&efrm_nic_tablep->lock);
+       pr_emerg("efrm_client_put: %px %d->%d\n", client, client->ref_count, client->ref_count - 1);
+       dump_stack();
        if (--client->ref_count > 0)
                client = NULL;
        else
@@ -503,6 +507,8 @@ void efrm_client_add_ref(struct efrm_client *client)
 {
        EFRM_ASSERT(client->ref_count > 0);
        spin_lock_bh(&efrm_nic_tablep->lock);
+       pr_emerg("efrm_client_add_ref: %px %d->%d\n", client, client->ref_count, client->ref_count + 1);
+       dump_stack();
        ++client->ref_count;
        spin_unlock_bh(&efrm_nic_tablep->lock);
 }

Normally, the initial reference happens with stack, when the nic is registered wit the sysfs file:

[  +0.005770] efrm_client_init_from_nic: ff11e05fe0172780
[  +0.005261] CPU: 50 PID: 152937 Comm: load-onload-and Kdump: loaded Tainted: G           O       6.9.0-gcp-tp #1
[  +0.011563] Hardware name: Google Google Compute Engine/izumi, BIOS 0.20241203.0-6 12/03/2024
[  +0.008545] Call Trace:
[  +0.002494]  <TASK>
[  +0.003504]  dump_stack_lvl+0x51/0x70
[  +0.003704]  efrm_client_get_by_foo+0xd7/0x190 [sfc_resource]
[  +0.007151]  oo_nic_add+0x37/0x140 [onload]
[  +0.004242]  oo_nic_probe+0x8c/0xa0 [onload]
[  +0.005686]  efrm_nondl_register_netdev+0xf1/0x150 [sfc_resource]
[  +0.006138]  nondl_register_store+0x174/0x1d0 [sfc_resource]
[  +0.004995]  ? do_syscall_64+0x87/0x160
[  +0.000005]  kernfs_fop_write_iter+0x128/0x1c0
[  +0.016014]  vfs_write+0x308/0x420
[  +0.003440]  ksys_write+0x5f/0xe0
[  +0.004720]  do_syscall_64+0x7b/0x160
[...]

And the final reference is when the onload module, not the sfc module, is unloaded:

[19937.132090] [onload] oo_nic_remove: ifindex=2 oo_index=0
[19937.137433] efrm_client_put: ff11e05fe0172780 1->0
[19937.142246] CPU: 22 PID: 153100 Comm: modprobe Kdump: loaded Tainted: G           O       6.9.0-gcp-tp #1
[19937.151827] Hardware name: Google Google Compute Engine/izumi, BIOS 0.20241203.0-6 12/03/2024
[19937.160362] Call Trace:
[19937.162835]  <TASK>
[19937.164956]  dump_stack_lvl+0x51/0x70
[19937.168646]  efrm_client_put+0x36/0xa0 [sfc_resource]
[19937.173745]  oo_nic_shutdown+0x62/0x90 [onload]
[19937.178336]  onload_module_exit+0xa9/0xf0 [onload]
[19937.183154]  __do_sys_delete_module.constprop.0+0x176/0x310
[19937.188745]  ? syscall_trace_enter+0x11a/0x1b0
[19937.193207]  do_syscall_64+0x7b/0x160
[...]

Because this patch only concerns itself with sfc_resource.ko module cleanup rather than onload.ko module cleanup, onload is still technically "using" sfc_resource... I'm not exactly sure what's the best way to fix this. I could also make onload perform reboot notifier cleanup maybe? I'll need strict ordering for the cleanups, but considering sfc_resource.ko has a dependency on onload.ko this should be fine.

Anyhow, I was able to repro after adding --debug to the onload_install command:

[  793.179502] systemd-shutdown[1]: Rebooting.
[  793.183790] [sfc efrm] efrm_nondl_unregister_device: unregister eth0
[  793.190164] [sfc efrm] efrm_nic_del_device:
[  793.291150] idpf 0000:05:00.0 eth0: XDPSQ sharing disabled
[  793.300709] [sfc efrm] efrm_nic_shutdown:
[  793.304752] [sfc efrm] efrm_vi_rm_delayed_free: 0000000036a34784
[  793.310840] ------------[ cut here ]------------
[  793.315475] kernel BUG at /usr/src/onload/src/driver/linux_resource/../../lib/efrm/driver_object.c:177!
[  793.324892] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  793.330136] CPU: 71 PID: 1 Comm: systemd-shutdow Kdump: loaded Tainted: G           O       6.9.0-gcp-tp #1
[  793.339887] Hardware name: Google Google Compute Engine/izumi, BIOS 0.20241203.0-6 12/03/2024
[  793.348425] RIP: 0010:efrm_nic_dtor+0x83/0x90 [sfc_resource]
[  793.354135] Code: ff 48 8b bb f8 01 00 00 e8 ea 7b ff ff 48 8b bb 00 02 00 00 e8 8e 2d 1a f8 48 c7 83 00 02 00 00 00 00 00 00 5b c3 cc cc cc cc <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90
[  793.372901] RSP: 0018:ff1ebc348006fc58 EFLAGS: 00010287
[  793.378144] RAX: ff16e7633ef6e5e8 RBX: ff16e7633ef6e400 RCX: ff16e6d49c192868
[  793.385295] RDX: ff16e7344676bea8 RSI: ff16e6d49c192858 RDI: ff16e7633ef6e400
[  793.392441] RBP: ff16e7633ef6e400 R08: ff16e7633f1b5f70 R09: ff16e7633f1b5f70
[  793.399592] R10: 00000000000001e0 R11: 0000000000000000 R12: ff16e6d5412c6000
[  793.406743] R13: 0000000000000000 R14: 00000000fffffffc R15: 0000000000000000
[  793.413889] FS:  00007f8a68b1bb40(0000) GS:ff16e7633f180000(0000) knlGS:0000000000000000
[  793.421996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  793.427761] CR2: 00007f8a68dfa680 CR3: 0000006094a52004 CR4: 0000000000771ef0
[  793.434909] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  793.442059] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  793.449210] PKRU: 55555554
[  793.451936] Call Trace:
[  793.454414]  <TASK>
[  793.456544]  ? die+0x33/0x90
[  793.459456]  ? do_trap+0xdf/0x110
[  793.462798]  ? efrm_nic_dtor+0x83/0x90 [sfc_resource]
[  793.467881]  ? do_error_trap+0x65/0x80
[  793.471659]  ? efrm_nic_dtor+0x83/0x90 [sfc_resource]
[  793.476743]  ? exc_invalid_op+0x4e/0x70
[  793.480597]  ? efrm_nic_dtor+0x83/0x90 [sfc_resource]
[  793.485681]  ? asm_exc_invalid_op+0x16/0x20
[  793.489894]  ? efrm_nic_dtor+0x83/0x90 [sfc_resource]
[  793.494977]  linux_efrm_nic_dtor+0x1a/0x50 [sfc_resource]
[  793.500404]  efrm_nic_shutdown+0x3e/0x80 [sfc_resource]
[  793.505656]  efrm_nic_del_device+0x7e/0x120 [sfc_resource]
[  793.511165]  efrm_nondl_del_device+0x2b/0xb0 [sfc_resource]
[  793.516770]  efrm_nondl_unregister_driver+0x3a/0x70 [sfc_resource]
[  793.522980]  sfc_resource_shutdown_notify+0xa/0x20 [sfc_resource]
[  793.529102]  notifier_call_chain+0x57/0xd0
[  793.533227]  blocking_notifier_call_chain+0x3d/0x60
[  793.538132]  kernel_restart+0x1d/0x90
[  793.541823]  __do_sys_reboot+0x19b/0x220
[...]

@zhuyifei1999
Copy link
Contributor Author

zhuyifei1999 commented May 1, 2025

I could also make onload perform reboot notifier cleanup maybe?

What somewhat worries me is that, where cleanup_sfc_resource has the invocations I'm looking for, efrm_nondl_unregister and efrm_nondl_shutdown, early in the body of cleanup_sfc_resource, onload_module_exit calls oo_nic_shutdown very late in that function. I'm not fully confident that I can just simply call oo_nic_remove without the other functions first...

And then maybe I could consider going for an unregister code path, but

$ echo eth0 | sudo tee /sys/module/sfc_resource/afxdp/unregister 
eth0
tee: /sys/module/sfc_resource/afxdp/unregister: Device or resource busy

It seems that the unregister would fail if the device is up:

if(device->is_up)
return -EBUSY;

... yet when I grep for is_up I don't see anywhere this attribute can ever become false :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants