
kernel crash after "unregister_netdevice: waiting for lo to become free. Usage count = 3" #5618

Closed
tankywoo opened this issue May 6, 2014 · 539 comments


@tankywoo

tankywoo commented May 6, 2014

This happens when I log in to the container, and I can't quit with Ctrl-C.

My system is Ubuntu 12.04, kernel is 3.8.0-25-generic.

docker version:

root@wutq-docker:~# docker version
Client version: 0.10.0
Client API version: 1.10
Go version (client): go1.2.1
Git commit (client): dc9c28f
Server version: 0.10.0
Server API version: 1.10
Git commit (server): dc9c28f
Go version (server): go1.2.1
Last stable version: 0.10.0

I have used the script https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh to check my configuration, and everything looked fine.
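A minimal sketch (not part of the original report) of fetching and running that checker script, using the URL cited above:

# download the checker script and run it against the current host
wget https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh
bash check-config.sh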

I watched the syslog and found these messages:

May  6 11:30:33 wutq-docker kernel: [62365.889369] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:44 wutq-docker kernel: [62376.108277] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:54 wutq-docker kernel: [62386.327156] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:02 wutq-docker kernel: [62394.423920] INFO: task docker:1024 blocked for more than 120 seconds.
May  6 11:31:02 wutq-docker kernel: [62394.424175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  6 11:31:02 wutq-docker kernel: [62394.424505] docker          D 0000000000000001     0  1024      1 0x00000004
May  6 11:31:02 wutq-docker kernel: [62394.424511]  ffff880077793cb0 0000000000000082 ffffffffffffff04 ffffffff816df509
May  6 11:31:02 wutq-docker kernel: [62394.424517]  ffff880077793fd8 ffff880077793fd8 ffff880077793fd8 0000000000013f40
May  6 11:31:02 wutq-docker kernel: [62394.424521]  ffff88007c461740 ffff880076b1dd00 000080d081f06880 ffffffff81cbbda0
May  6 11:31:02 wutq-docker kernel: [62394.424526] Call Trace:                                                         
May  6 11:31:02 wutq-docker kernel: [62394.424668]  [<ffffffff816df509>] ? __slab_alloc+0x28a/0x2b2
May  6 11:31:02 wutq-docker kernel: [62394.424700]  [<ffffffff816f1849>] schedule+0x29/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424705]  [<ffffffff816f1afe>] schedule_preempt_disabled+0xe/0x10
May  6 11:31:02 wutq-docker kernel: [62394.424710]  [<ffffffff816f0777>] __mutex_lock_slowpath+0xd7/0x150
May  6 11:31:02 wutq-docker kernel: [62394.424715]  [<ffffffff815dc809>] ? copy_net_ns+0x69/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424719]  [<ffffffff815dc0b1>] ? net_alloc_generic+0x21/0x30
May  6 11:31:02 wutq-docker kernel: [62394.424724]  [<ffffffff816f038a>] mutex_lock+0x2a/0x50
May  6 11:31:02 wutq-docker kernel: [62394.424727]  [<ffffffff815dc82c>] copy_net_ns+0x8c/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424733]  [<ffffffff81084851>] create_new_namespaces+0x101/0x1b0
May  6 11:31:02 wutq-docker kernel: [62394.424737]  [<ffffffff81084a33>] copy_namespaces+0xa3/0xe0
May  6 11:31:02 wutq-docker kernel: [62394.424742]  [<ffffffff81057a60>] ? dup_mm+0x140/0x240
May  6 11:31:02 wutq-docker kernel: [62394.424746]  [<ffffffff81058294>] copy_process.part.22+0x6f4/0xe60
May  6 11:31:02 wutq-docker kernel: [62394.424752]  [<ffffffff812da406>] ? security_file_alloc+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424758]  [<ffffffff8119d118>] ? get_empty_filp+0x88/0x180
May  6 11:31:02 wutq-docker kernel: [62394.424762]  [<ffffffff81058a80>] copy_process+0x80/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424766]  [<ffffffff81058b7c>] do_fork+0x9c/0x230
May  6 11:31:02 wutq-docker kernel: [62394.424769]  [<ffffffff816f277e>] ? _raw_spin_lock+0xe/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424774]  [<ffffffff811b9185>] ? __fd_install+0x55/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424777]  [<ffffffff81058d96>] sys_clone+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424782]  [<ffffffff816fb939>] stub_clone+0x69/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424786]  [<ffffffff816fb5dd>] ? system_call_fastpath+0x1a/0x1f
May  6 11:31:04 wutq-docker kernel: [62396.466223] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:14 wutq-docker kernel: [62406.689132] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:25 wutq-docker kernel: [62416.908036] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:35 wutq-docker kernel: [62427.126927] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:45 wutq-docker kernel: [62437.345860] unregister_netdevice: waiting for lo to become free. Usage count = 3

After this happened, I opened another terminal and killed the process, and then restarted docker, but it hung.

I rebooted the host, and it still displayed those messages for several minutes during shutdown:
(screenshot: screen shot 2014-05-06 at 11 49 27)

@drpancake

I'm seeing a very similar issue with eth0, also on Ubuntu 12.04.

I have to power cycle the machine. From /var/log/kern.log:

May 22 19:26:08 box kernel: [596765.670275] device veth5070 entered promiscuous mode
May 22 19:26:08 box kernel: [596765.680630] IPv6: ADDRCONF(NETDEV_UP): veth5070: link is not ready
May 22 19:26:08 box kernel: [596765.700561] IPv6: ADDRCONF(NETDEV_CHANGE): veth5070: link becomes ready
May 22 19:26:08 box kernel: [596765.700628] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:08 box kernel: [596765.700638] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:19 box kernel: [596777.386084] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=170 DF PROTO=TCP SPT=51615 DPT=13162 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:21 box kernel: [596779.371993] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=549 DF PROTO=TCP SPT=46878 DPT=12518 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:23 box kernel: [596780.704031] docker0: port 7(veth5070) entered forwarding state
May 22 19:27:13 box kernel: [596831.359999] docker0: port 7(veth5070) entered disabled state
May 22 19:27:13 box kernel: [596831.361329] device veth5070 left promiscuous mode
May 22 19:27:13 box kernel: [596831.361333] docker0: port 7(veth5070) entered disabled state
May 22 19:27:24 box kernel: [596841.516039] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:34 box kernel: [596851.756060] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:44 box kernel: [596861.772101] unregister_netdevice: waiting for eth0 to become free. Usage count = 1

@egasimus

egasimus commented Jun 4, 2014

Hey, this just started happening for me as well.

Docker version:

Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99
Go version (server): go1.2.1
Last stable version: 0.11.1

Kernel log: http://pastebin.com/TubCy1tG

System details:
Running Ubuntu 14.04 LTS with patched kernel (3.14.3-rt4). Yet to see it happen with the default linux-3.13.0-27-generic kernel. What's funny, though, is that when this happens, all my terminal windows freeze, letting me type a few characters at most before that. The same fate befalls any new ones I open, too - and I end up needing to power cycle my poor laptop just like the good doctor above. For the record, I'm running fish shell in urxvt or xterm in xmonad. Haven't checked if it affects plain bash.

@egasimus

egasimus commented Jun 5, 2014

This might be relevant:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1065434#yui_3_10_3_1_1401948176063_2050

Copying a fairly large amount of data over the network inside a container
and then exiting the container can trigger a missing decrement in the per
cpu reference count on a network device.

Sure enough, one of the times this happened for me was right after apt-getting a package with a ton of dependencies.

@drpancake

Upgrading from Ubuntu 12.04.3 to 14.04 fixed this for me without any other changes.

@csabahenk

I experience this on RHEL7, 3.10.0-123.4.2.el7.x86_64

@egasimus

I've noticed the same thing happening with my VirtualBox virtual network interfaces when I'm running 3.14-rt4. It's supposed to be fixed in vanilla 3.13 or something.

@spiffytech

@egasimus Same here - I pulled in hundreds of MB of data before killing the container, then got this error.

@spiffytech

I upgraded to Debian kernel 3.14 and the problem appears to have gone away. It looks like the problem existed in some kernels < 3.5, was fixed in 3.5, regressed in 3.6, and was patched again somewhere between 3.12 and 3.14. https://bugzilla.redhat.com/show_bug.cgi?id=880394

@egasimus

@spiffytech Do you have any idea where I can report this regarding the realtime kernel flavour? I think they're only releasing a RT patch for every other version, and would really hate to see 3.16-rt come out with this still broken. :/

EDIT: Filed it at kernel.org.

@ibuildthecloud
Contributor

I'm getting this on Ubuntu 14.10 running a 3.18.1 kernel. The kernel log shows:

Dec 21 22:49:31 inotmac kernel: [15225.866600] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:40 inotmac kernel: [15235.179263] INFO: task docker:19599 blocked for more than 120 seconds.
Dec 21 22:49:40 inotmac kernel: [15235.179268]       Tainted: G           OE  3.18.1-031801-generic #201412170637
Dec 21 22:49:40 inotmac kernel: [15235.179269] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 22:49:40 inotmac kernel: [15235.179271] docker          D 0000000000000001     0 19599      1 0x00000000
Dec 21 22:49:40 inotmac kernel: [15235.179275]  ffff8802082abcc0 0000000000000086 ffff880235c3b700 00000000ffffffff
Dec 21 22:49:40 inotmac kernel: [15235.179277]  ffff8802082abfd8 0000000000013640 ffff8800288f2300 0000000000013640
Dec 21 22:49:40 inotmac kernel: [15235.179280]  ffff880232cf0000 ffff8801a467c600 ffffffff81f9d4b8 ffffffff81cd9c60
Dec 21 22:49:40 inotmac kernel: [15235.179282] Call Trace:
Dec 21 22:49:40 inotmac kernel: [15235.179289]  [<ffffffff817af549>] schedule+0x29/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179292]  [<ffffffff817af88e>] schedule_preempt_disabled+0xe/0x10
Dec 21 22:49:40 inotmac kernel: [15235.179296]  [<ffffffff817b1545>] __mutex_lock_slowpath+0x95/0x100
Dec 21 22:49:40 inotmac kernel: [15235.179299]  [<ffffffff8168d5c9>] ? copy_net_ns+0x69/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179302]  [<ffffffff817b15d3>] mutex_lock+0x23/0x37
Dec 21 22:49:40 inotmac kernel: [15235.179305]  [<ffffffff8168d5f8>] copy_net_ns+0x98/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179308]  [<ffffffff810941f1>] create_new_namespaces+0x101/0x1b0
Dec 21 22:49:40 inotmac kernel: [15235.179311]  [<ffffffff8109432b>] copy_namespaces+0x8b/0xa0
Dec 21 22:49:40 inotmac kernel: [15235.179315]  [<ffffffff81073458>] copy_process.part.28+0x828/0xed0
Dec 21 22:49:40 inotmac kernel: [15235.179318]  [<ffffffff811f157f>] ? get_empty_filp+0xcf/0x1c0
Dec 21 22:49:40 inotmac kernel: [15235.179320]  [<ffffffff81073b80>] copy_process+0x80/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179323]  [<ffffffff81073ca2>] do_fork+0x62/0x280
Dec 21 22:49:40 inotmac kernel: [15235.179326]  [<ffffffff8120cfc0>] ? get_unused_fd_flags+0x30/0x40
Dec 21 22:49:40 inotmac kernel: [15235.179329]  [<ffffffff8120d028>] ? __fd_install+0x58/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179331]  [<ffffffff81073f46>] SyS_clone+0x16/0x20
Dec 21 22:49:40 inotmac kernel: [15235.179334]  [<ffffffff817b3ab9>] stub_clone+0x69/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179336]  [<ffffffff817b376d>] ? system_call_fastpath+0x16/0x1b
Dec 21 22:49:41 inotmac kernel: [15235.950976] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:51 inotmac kernel: [15246.059346] unregister_netdevice: waiting for lo to become free. Usage count = 2

I'll send docker version/info once the system isn't frozen anymore :)

@sbward

sbward commented Dec 23, 2014

We're seeing this issue as well. Ubuntu 14.04, 3.13.0-37-generic

@jbalonso

On Ubuntu 14.04 server, my team has found that downgrading from 3.13.0-40-generic to 3.13.0-32-generic "resolves" the issue. Given @sbward's observation, that would put the regression after 3.13.0-32-generic and before (or including) 3.13.0-37-generic.

I'll add that, in our case, we sometimes see a negative usage count.

@rsampaio
Contributor

FWIW, we hit this bug running LXC on the trusty kernel (3.13.0-40-generic #69-Ubuntu); the message appears in dmesg, followed by this stack trace:

[27211131.602869] INFO: task lxc-start:26342 blocked for more than 120 seconds.
[27211131.602874]       Not tainted 3.13.0-40-generic #69-Ubuntu
[27211131.602877] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27211131.602881] lxc-start       D 0000000000000001     0 26342      1 0x00000080
[27211131.602883]  ffff88000d001d40 0000000000000282 ffff88001aa21800 ffff88000d001fd8
[27211131.602886]  0000000000014480 0000000000014480 ffff88001aa21800 ffffffff81cdb760
[27211131.602888]  ffffffff81cdb764 ffff88001aa21800 00000000ffffffff ffffffff81cdb768
[27211131.602891] Call Trace:
[27211131.602894]  [<ffffffff81723b69>] schedule_preempt_disabled+0x29/0x70
[27211131.602897]  [<ffffffff817259d5>] __mutex_lock_slowpath+0x135/0x1b0
[27211131.602900]  [<ffffffff811a2679>] ? __kmalloc+0x1e9/0x230
[27211131.602903]  [<ffffffff81725a6f>] mutex_lock+0x1f/0x2f
[27211131.602905]  [<ffffffff8161c2c1>] copy_net_ns+0x71/0x130
[27211131.602908]  [<ffffffff8108f889>] create_new_namespaces+0xf9/0x180
[27211131.602910]  [<ffffffff8108f983>] copy_namespaces+0x73/0xa0
[27211131.602912]  [<ffffffff81065b16>] copy_process.part.26+0x9a6/0x16b0
[27211131.602915]  [<ffffffff810669f5>] do_fork+0xd5/0x340
[27211131.602917]  [<ffffffff810c8e8d>] ? call_rcu_sched+0x1d/0x20
[27211131.602919]  [<ffffffff81066ce6>] SyS_clone+0x16/0x20
[27211131.602921]  [<ffffffff81730089>] stub_clone+0x69/0x90
[27211131.602923]  [<ffffffff8172fd2d>] ? system_call_fastpath+0x1a/0x1f

@MrMMorris

Ran into this on Ubuntu 14.04 and Debian jessie w/ kernel 3.16.x.

Docker command:

docker run -t -i -v /data/sitespeed.io:/sitespeed.io/results company/dockerfiles:sitespeed.io-latest --name "Superbrowse"

This seems like a pretty bad issue...

@MrMMorris

@jbalonso even with 3.13.0-32-generic I get the error after only a few successful runs 😭

@rsampaio
Contributor

@MrMMorris could you share a reproducer script using publicly available images?

@unclejack
Contributor

Everyone who's seeing this error on their system is running a package of the Linux kernel on their distribution that's far too old and lacks the fixes for this particular problem.

If you run into this problem, make sure you run apt-get update && apt-get dist-upgrade -y and reboot your system. If you're on Digital Ocean, you also need to select the kernel version which was just installed during the update because they don't use the latest kernel automatically (see https://digitalocean.uservoice.com/forums/136585-digitalocean/suggestions/2814988-give-option-to-use-the-droplet-s-own-bootloader).

CentOS/RHEL/Fedora/Scientific Linux users need to keep their systems updated using yum update and reboot after installing the updates.
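A minimal sketch of the update-and-verify flow described above; the package commands are the ones from this comment, and uname -r is only there to confirm which kernel is actually running after the reboot:

# Debian/Ubuntu
apt-get update && apt-get dist-upgrade -y
reboot

# CentOS/RHEL/Fedora/Scientific Linux
yum update
reboot

# after the reboot, confirm the running kernel version
uname -r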

When reporting this problem, please make sure your system is fully patched and up to date with the latest stable updates (no manually installed experimental/testing/alpha/beta/rc packages) provided by your distribution's vendor.

@MrMMorris

@unclejack

I ran apt-get update && apt-get dist-upgrade -y

ubuntu 14.04 3.13.0-46-generic

Still get the error after only one docker run

I can create an AMI for reproducing if needed

@unclejack
Contributor

@MrMMorris Thank you for confirming it's still a problem with the latest kernel package on Ubuntu 14.04.

@MrMMorris

Anything else I can do to help, let me know! 😄

@rsampaio
Contributor

@MrMMorris if you can provide a reproducer, there is a bug open for Ubuntu where it would be much appreciated: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152

@MrMMorris

@rsampaio if I have time today, I will definitely get that for you!

@fxposter

This problem also appears on 3.16(.7) on both Debian 7 and Debian 8: #9605 (comment). Rebooting the server is the only way to fix this for now.

@chrisjstevenson

Seeing this issue on RHEL 6.6 with kernel 2.6.32-504.8.1.el6.x86_64 when starting some Docker containers (not all containers):
kernel:unregister_netdevice: waiting for lo to become free. Usage count = -1

Again, rebooting the server seems to be the only solution at this time

@popsikle

Also seeing this on CoreOS (647.0.0) with kernel 3.19.3.

Rebooting is also the only solution I have found.

@fxposter

Tested Debian jessie with sid's kernel (4.0.2) - the problem remains.

@popsikle

Anyone seeing this issue running non-ubuntu containers?

@fxposter

Yes. Debian ones.
On 19 June 2015 at 19:01, user "popsikle" notifications@github.com wrote:

Anyone seeing this issue running non-ubuntu containers?



@SuperSandro2000

@steelcowboy You can configure rsyslog to discard only these specific annoying messages instead of all emergency messages, which is more desirable.

I wrote the following into /etc/rsyslog.d/40-unregister-netdevice.conf and restarted rsyslog with systemctl restart rsyslog.

# match frequent, irrelevant emergency messages generated by Docker when transferring large amounts of data through the network
:msg,contains,"unregister_netdevice: waiting for lo to become free. Usage count = 1" /dev/null

# discard matching messages
& stop
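A small, optional sanity check (not from the original comment) before restarting the daemon: rsyslogd can validate its configuration, including the new drop rule, without touching the running service.

# validate the rsyslog configuration (level 1 checks syntax and rule loading)
rsyslogd -N1

# then apply it
systemctl restart rsyslog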

@hedza06

hedza06 commented Oct 20, 2020

Any news here?

@StruggleYang

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Any news here?

@w-simon

w-simon commented Dec 29, 2020

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Any news here?

This patch has fixed this problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee60ad219f5c7c4fb2f047f88037770063ef785f

We analyzed the issue at the link below, where the problem could also be reproduced:
https://github.com/w-simon/kernel_debug_notes/blob/master/reproduce_a_leaked_dst_entry

@Xyaren

Xyaren commented Feb 5, 2021

This is still happening for me on Ubuntu kernel 5.8.0-41.46-generic (5.8.18).

@rseffner

For me, this first happened when going from kernel 5.10.37 to 5.10.38 with Debian 10.9 amd64, on different machines.

@tsjk

tsjk commented May 22, 2021

I saw this for the first time on a Gentoo system with kernel v5.4.120, just upgraded from kernel v5.4.117. Kernel sources used: sys-kernel/gentoo-sources.

I get

unregister_netdevice: waiting for ip6_vti0 to become free. Usage count = 1

every 10 seconds or so.

@rantala

rantala commented May 24, 2021

I saw this for the first time on a Gentoo system with kernel v5.4.120, just upgraded from kernel v5.4.117. Kernel sources used: sys-kernel/gentoo-sources.

I get

unregister_netdevice: waiting for ip6_vti0 to become free. Usage count = 1

every 10 seconds or so.

Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

@nivseg2

nivseg2 commented Oct 6, 2021

Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

Hello, do you have any more info on which specific commits introduced and fixed the issue as mentioned above?

@rantala

rantala commented Oct 6, 2021

Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

Hello, do you have any more info on which specific commits introduced and fixed the issue as mentioned above?

See the commits authored by Eric Dumazet:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.4.121

It was specifically about ip6_vti interfaces: 5.4.120 added https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.4.121&id=98ebeb87b2cf26663270e8e656fe599a32e4c96d which introduced the regression.

(If I remember right, same issue was seen in some other stable/LTS kernel versions as well.)
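A minimal sketch, assuming a local clone of the linux-stable tree, of how to list exactly what 5.4.121 added on top of 5.4.120 (the two versions named above):

# commits that went into 5.4.121 but not 5.4.120
git log --oneline v5.4.120..v5.4.121

# narrow the list to the author mentioned above
git log --oneline --author="Eric Dumazet" v5.4.120..v5.4.121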

@nivseg2

nivseg2 commented Oct 6, 2021

Ah, thanks!

@truman369

truman369 commented Dec 21, 2021

I have the same issue on kernels 5.10.70 and 5.14.9. This happens when I restart IPv6 containers.
Does anyone know of any solutions other than suppressing the output?

@tconrado

I have some reports that Ubuntu 20.04 LTS HWE did not present this issue.

The workaround is to never close the namespace, or to guarantee that the namespace is free of network interfaces before closing it (see the sketch below).
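A minimal sketch of the second workaround using iproute2, with a hypothetical named namespace ns0 and interface veth0 (container runtimes typically use anonymous namespaces, so this only illustrates the idea):

# list the interfaces still present in the namespace
ip netns exec ns0 ip -brief link

# remove everything except loopback before tearing the namespace down
ip netns exec ns0 ip link delete veth0

# only then delete the namespace itself
ip netns del ns0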

@coolljt0725
Contributor

The Linux kernel is adding a reference count tracking mechanism: https://lwn.net/ml/netdev/20211205042217.982127-1-eric.dumazet@gmail.com/. Hopefully, with this mechanism, it will be easier to find and fix this kind of reference counting bug in the future.
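A hedged note: in released kernels the tracker sits behind build-time options, so a .config fragment along these lines should enable it for net device reference counts (option names assumed from memory; verify against your kernel's Kconfig):

# kernel .config fragment (names assumed; check your kernel version)
CONFIG_REF_TRACKER=y
CONFIG_NET_DEV_REFCNT_TRACKER=y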

@nivseg2

nivseg2 commented Dec 23, 2021

Have been trying to test it.

Unfortunately, I don't have a reproducer yet.

@fserve

fserve commented Jan 8, 2022

I have this same problem; the VM is running Docker and IPv6. I have other VMs like this one that don't get this problem, so maybe it's related to a running container. It stops after a reboot but comes back after a few days.

unregister_netdevice: waiting for lo to become free. Usage count = 1

Linux xxx 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

linux-headers-generic-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-headers-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-hwe-5.11-headers-5.11.0-43/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-43.47~20.04.2 all [installed,automatic]
linux-hwe-5.11-headers-5.11.0-44/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-44.48~20.04.2 all [installed,automatic]
linux-image-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed]

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

docker version
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.7-0ubuntu5~20.04.2
 Built:             Mon Nov  1 00:34:17 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0ubuntu5~20.04.2
  Built:            Fri Oct 22 00:45:53 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.5-0ubuntu3~20.04.1
  GitCommit:
 runc:
  Version:          1.0.1-0ubuntu2~20.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:

@tconrado

tconrado commented Jan 8, 2022 via email

@fserve

fserve commented Jan 11, 2022

Yes, Ubuntu focal 20.04 HWE, as you can see here:

Linux xxx 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

linux-headers-generic-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-headers-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-hwe-5.11-headers-5.11.0-43/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-43.47~20.04.2 all [installed,automatic]
linux-hwe-5.11-headers-5.11.0-44/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-44.48~20.04.2 all [installed,automatic]
linux-image-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed]

@tconrado

tconrado commented Jan 11, 2022 via email

@truman369

It looks like the issue is resolved in kernel 5.15.5. Anyway, I stopped getting error messages after switching to this version.

@heyjoke

heyjoke commented Apr 6, 2023

Sorry to bring up this old topic, but has anyone seen a huge impact on other system processes at the same time? SSH was impacted as well for me. Kernel version is 5.3.18.

@polarathene
Contributor

polarathene commented May 6, 2023

The last notice (July 2018) advising users to 👍 / subscribe if they don't have helpful information to contribute regarding crashes is no longer visible.

I think this comment I've put together summarizes the issue well enough for anyone who wants to dig through it. I also have the impression that many of the past discussions resolved the issue for the majority, and that the shared reproductions no longer trigger the failure or the log message.

I would suggest closing / locking this issue. It has been 9 years to the day since it was opened. It would be better to create a new issue for anyone still affected to follow, where more recent and relevant information can be tracked.

System details

This is not a kernel crash report, but the result of an attempt to go through the 600+ items in this issue, looking for any useful information and reproductions (especially reproductions confirmed by multiple users). Reproduction was not possible.

Info requested:

  • Kernel: 6.2.0
  • Distribution: Ubuntu 23.04 (Vultr)
  • Network: Bridge with IPv4 + IPv6 IP addresses assigned to enp1s0
    • NOTE: enp1s0 is created and configured via cloud-init => netplan => systemd-networkd. Each veth Docker creates on container start/stop triggers a cloud-init udev rule that needlessly recreates enp1s0, which can reset/undo kernel settings like proxy_ndp and break the IPv6 GUA addresses of other containers (see the sketch after this list). This may have affected some of the users who reported here.
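A quick way (not part of the original report) to spot the reset described in the note above is to read the sysctl before and after a container start/stop; a flip from 1 back to 0 indicates the cloud-init/netplan rerun undid the setting:

# proxy_ndp on the bridge uplink; interface name taken from the report above
sysctl net.ipv6.conf.enp1s0.proxy_ndp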
daemon.json
{
  "userland-proxy": false,
  "experimental": true,
  "ipv6": true,
  "fixed-cidr-v6": "fd00:feed:face:f001::/64"
}
docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.3
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 343
  Running: 50
  Paused: 0
  Stopped: 293
 Images: 5
 Server Version: 23.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.2.0-19-generic
 Operating System: Ubuntu 23.04
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 952.3MiB
 Name: ipv6-test
 ID: 6027f905-4930-4f94-bdad-1587cbcdf0ef
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Reproductions shared

None of these were reproducible for me, with userland-proxy: false and the IPv6 Docker bridge enabled.

Presumably the issue has been resolved since (as some hint at, like the docker-stress comments, and kernel commits cited in early 2019).

Cherry-picked comments

Many other comments cited CentOS or similar systems with very dated kernels, or did not provide much helpful information. Another large group appeared to be related to IPv6, and some to UDP / conntrack.

Various fixes related to networking (for IPv6 and UDP) have been made in both Docker and the kernel over this duration. Activity within the issue has also decreased significantly, implying the main causes have been resolved.

@neersighted
Member

I'm going to close this for now as stale; if we ever see a manifestation of this issue with a good reproducer, please open a new issue and link back here.

Thank you very much for the deep dive, @polarathene!

@moby moby locked as resolved and limited conversation to collaborators Jul 1, 2023