
kernel crash after "unregister_netdevice: waiting for lo to become free. Usage count = 3" #5618

Closed
tankywoo opened this issue May 6, 2014 · 539 comments


@tankywoo

tankywoo commented May 6, 2014

This happens when I log in to the container, and I can't quit with Ctrl-C.

My system is Ubuntu 12.04, kernel is 3.8.0-25-generic.

docker version:

root@wutq-docker:~# docker version
Client version: 0.10.0
Client API version: 1.10
Go version (client): go1.2.1
Git commit (client): dc9c28f
Server version: 0.10.0
Server API version: 1.10
Git commit (server): dc9c28f
Go version (server): go1.2.1
Last stable version: 0.10.0

I have used the script https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh to check my configuration, and everything looked fine.
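A minimal sketch (not part of the original report) of fetching and running that checker script, using the URL cited above:

# download the checker script and run it against the current host
wget https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh
bash check-config.sh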

I watched the syslog and found these messages:

May  6 11:30:33 wutq-docker kernel: [62365.889369] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:44 wutq-docker kernel: [62376.108277] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:54 wutq-docker kernel: [62386.327156] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:02 wutq-docker kernel: [62394.423920] INFO: task docker:1024 blocked for more than 120 seconds.
May  6 11:31:02 wutq-docker kernel: [62394.424175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  6 11:31:02 wutq-docker kernel: [62394.424505] docker          D 0000000000000001     0  1024      1 0x00000004
May  6 11:31:02 wutq-docker kernel: [62394.424511]  ffff880077793cb0 0000000000000082 ffffffffffffff04 ffffffff816df509
May  6 11:31:02 wutq-docker kernel: [62394.424517]  ffff880077793fd8 ffff880077793fd8 ffff880077793fd8 0000000000013f40
May  6 11:31:02 wutq-docker kernel: [62394.424521]  ffff88007c461740 ffff880076b1dd00 000080d081f06880 ffffffff81cbbda0
May  6 11:31:02 wutq-docker kernel: [62394.424526] Call Trace:                                                         
May  6 11:31:02 wutq-docker kernel: [62394.424668]  [<ffffffff816df509>] ? __slab_alloc+0x28a/0x2b2
May  6 11:31:02 wutq-docker kernel: [62394.424700]  [<ffffffff816f1849>] schedule+0x29/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424705]  [<ffffffff816f1afe>] schedule_preempt_disabled+0xe/0x10
May  6 11:31:02 wutq-docker kernel: [62394.424710]  [<ffffffff816f0777>] __mutex_lock_slowpath+0xd7/0x150
May  6 11:31:02 wutq-docker kernel: [62394.424715]  [<ffffffff815dc809>] ? copy_net_ns+0x69/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424719]  [<ffffffff815dc0b1>] ? net_alloc_generic+0x21/0x30
May  6 11:31:02 wutq-docker kernel: [62394.424724]  [<ffffffff816f038a>] mutex_lock+0x2a/0x50
May  6 11:31:02 wutq-docker kernel: [62394.424727]  [<ffffffff815dc82c>] copy_net_ns+0x8c/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424733]  [<ffffffff81084851>] create_new_namespaces+0x101/0x1b0
May  6 11:31:02 wutq-docker kernel: [62394.424737]  [<ffffffff81084a33>] copy_namespaces+0xa3/0xe0
May  6 11:31:02 wutq-docker kernel: [62394.424742]  [<ffffffff81057a60>] ? dup_mm+0x140/0x240
May  6 11:31:02 wutq-docker kernel: [62394.424746]  [<ffffffff81058294>] copy_process.part.22+0x6f4/0xe60
May  6 11:31:02 wutq-docker kernel: [62394.424752]  [<ffffffff812da406>] ? security_file_alloc+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424758]  [<ffffffff8119d118>] ? get_empty_filp+0x88/0x180
May  6 11:31:02 wutq-docker kernel: [62394.424762]  [<ffffffff81058a80>] copy_process+0x80/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424766]  [<ffffffff81058b7c>] do_fork+0x9c/0x230
May  6 11:31:02 wutq-docker kernel: [62394.424769]  [<ffffffff816f277e>] ? _raw_spin_lock+0xe/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424774]  [<ffffffff811b9185>] ? __fd_install+0x55/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424777]  [<ffffffff81058d96>] sys_clone+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424782]  [<ffffffff816fb939>] stub_clone+0x69/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424786]  [<ffffffff816fb5dd>] ? system_call_fastpath+0x1a/0x1f
May  6 11:31:04 wutq-docker kernel: [62396.466223] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:14 wutq-docker kernel: [62406.689132] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:25 wutq-docker kernel: [62416.908036] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:35 wutq-docker kernel: [62427.126927] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:45 wutq-docker kernel: [62437.345860] unregister_netdevice: waiting for lo to become free. Usage count = 3

After this happened, I opened another terminal and killed the process, and then restarted docker, but it hung.

I rebooted the host, and it still displayed those messages for several minutes during shutdown:
(screenshot: screen shot 2014-05-06 at 11 49 27)

@drpancake

I'm seeing a very similar issue with eth0, also on Ubuntu 12.04.

I have to power cycle the machine. From /var/log/kern.log:

May 22 19:26:08 box kernel: [596765.670275] device veth5070 entered promiscuous mode
May 22 19:26:08 box kernel: [596765.680630] IPv6: ADDRCONF(NETDEV_UP): veth5070: link is not ready
May 22 19:26:08 box kernel: [596765.700561] IPv6: ADDRCONF(NETDEV_CHANGE): veth5070: link becomes ready
May 22 19:26:08 box kernel: [596765.700628] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:08 box kernel: [596765.700638] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:19 box kernel: [596777.386084] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=170 DF PROTO=TCP SPT=51615 DPT=13162 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:21 box kernel: [596779.371993] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=549 DF PROTO=TCP SPT=46878 DPT=12518 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:23 box kernel: [596780.704031] docker0: port 7(veth5070) entered forwarding state
May 22 19:27:13 box kernel: [596831.359999] docker0: port 7(veth5070) entered disabled state
May 22 19:27:13 box kernel: [596831.361329] device veth5070 left promiscuous mode
May 22 19:27:13 box kernel: [596831.361333] docker0: port 7(veth5070) entered disabled state
May 22 19:27:24 box kernel: [596841.516039] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:34 box kernel: [596851.756060] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:44 box kernel: [596861.772101] unregister_netdevice: waiting for eth0 to become free. Usage count = 1

@egasimus

egasimus commented Jun 4, 2014

Hey, this just started happening for me as well.

Docker version:

Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99
Go version (server): go1.2.1
Last stable version: 0.11.1

Kernel log: http://pastebin.com/TubCy1tG

System details:
Running Ubuntu 14.04 LTS with patched kernel (3.14.3-rt4). Yet to see it happen with the default linux-3.13.0-27-generic kernel. What's funny, though, is that when this happens, all my terminal windows freeze, letting me type a few characters at most before that. The same fate befalls any new ones I open, too - and I end up needing to power cycle my poor laptop just like the good doctor above. For the record, I'm running fish shell in urxvt or xterm in xmonad. Haven't checked if it affects plain bash.

@egasimus

egasimus commented Jun 5, 2014

This might be relevant:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1065434#yui_3_10_3_1_1401948176063_2050

Copying a fairly large amount of data over the network inside a container
and then exiting the container can trigger a missing decrement in the per
cpu reference count on a network device.

Sure enough, one of the times this happened for me was right after apt-getting a package with a ton of dependencies.

@drpancake

Upgrading from Ubuntu 12.04.3 to 14.04 fixed this for me without any other changes.

@csabahenk

I experience this on RHEL7, 3.10.0-123.4.2.el7.x86_64

@egasimus

I've noticed the same thing happening with my VirtualBox virtual network interfaces when I'm running 3.14-rt4. It's supposed to be fixed in vanilla 3.13 or something.

@spiffytech

@egasimus Same here - I pulled in hundreds of MB of data before killing the container, then got this error.

@spiffytech

I upgraded to Debian kernel 3.14 and the problem appears to have gone away. It looks like the problem existed in some kernels < 3.5, was fixed in 3.5, regressed in 3.6, and was patched again somewhere between 3.12 and 3.14. https://bugzilla.redhat.com/show_bug.cgi?id=880394

@egasimus

@spiffytech Do you have any idea where I can report this regarding the realtime kernel flavour? I think they're only releasing a RT patch for every other version, and would really hate to see 3.16-rt come out with this still broken. :/

EDIT: Filed it at kernel.org.

@ibuildthecloud
Contributor

I'm getting this on Ubuntu 14.10 running a 3.18.1 kernel. The kernel log shows:

Dec 21 22:49:31 inotmac kernel: [15225.866600] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:40 inotmac kernel: [15235.179263] INFO: task docker:19599 blocked for more than 120 seconds.
Dec 21 22:49:40 inotmac kernel: [15235.179268]       Tainted: G           OE  3.18.1-031801-generic #201412170637
Dec 21 22:49:40 inotmac kernel: [15235.179269] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 22:49:40 inotmac kernel: [15235.179271] docker          D 0000000000000001     0 19599      1 0x00000000
Dec 21 22:49:40 inotmac kernel: [15235.179275]  ffff8802082abcc0 0000000000000086 ffff880235c3b700 00000000ffffffff
Dec 21 22:49:40 inotmac kernel: [15235.179277]  ffff8802082abfd8 0000000000013640 ffff8800288f2300 0000000000013640
Dec 21 22:49:40 inotmac kernel: [15235.179280]  ffff880232cf0000 ffff8801a467c600 ffffffff81f9d4b8 ffffffff81cd9c60
Dec 21 22:49:40 inotmac kernel: [15235.179282] Call Trace:
Dec 21 22:49:40 inotmac kernel: [15235.179289]  [<ffffffff817af549>] schedule+0x29/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179292]  [<ffffffff817af88e>] schedule_preempt_disabled+0xe/0x10
Dec 21 22:49:40 inotmac kernel: [15235.179296]  [<ffffffff817b1545>] __mutex_lock_slowpath+0x95/0x100
Dec 21 22:49:40 inotmac kernel: [15235.179299]  [<ffffffff8168d5c9>] ? copy_net_ns+0x69/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179302]  [<ffffffff817b15d3>] mutex_lock+0x23/0x37
Dec 21 22:49:40 inotmac kernel: [15235.179305]  [<ffffffff8168d5f8>] copy_net_ns+0x98/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179308]  [<ffffffff810941f1>] create_new_namespaces+0x101/0x1b0
Dec 21 22:49:40 inotmac kernel: [15235.179311]  [<ffffffff8109432b>] copy_namespaces+0x8b/0xa0
Dec 21 22:49:40 inotmac kernel: [15235.179315]  [<ffffffff81073458>] copy_process.part.28+0x828/0xed0
Dec 21 22:49:40 inotmac kernel: [15235.179318]  [<ffffffff811f157f>] ? get_empty_filp+0xcf/0x1c0
Dec 21 22:49:40 inotmac kernel: [15235.179320]  [<ffffffff81073b80>] copy_process+0x80/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179323]  [<ffffffff81073ca2>] do_fork+0x62/0x280
Dec 21 22:49:40 inotmac kernel: [15235.179326]  [<ffffffff8120cfc0>] ? get_unused_fd_flags+0x30/0x40
Dec 21 22:49:40 inotmac kernel: [15235.179329]  [<ffffffff8120d028>] ? __fd_install+0x58/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179331]  [<ffffffff81073f46>] SyS_clone+0x16/0x20
Dec 21 22:49:40 inotmac kernel: [15235.179334]  [<ffffffff817b3ab9>] stub_clone+0x69/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179336]  [<ffffffff817b376d>] ? system_call_fastpath+0x16/0x1b
Dec 21 22:49:41 inotmac kernel: [15235.950976] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:51 inotmac kernel: [15246.059346] unregister_netdevice: waiting for lo to become free. Usage count = 2

I'll send docker version/info once the system isn't frozen anymore :)

@sbward

sbward commented Dec 23, 2014

We're seeing this issue as well. Ubuntu 14.04, 3.13.0-37-generic

@jbalonso

On Ubuntu 14.04 server, my team has found that downgrading from 3.13.0-40-generic to 3.13.0-32-generic "resolves" the issue. Given @sbward's observation, that would put the regression after 3.13.0-32-generic and before (or including) 3.13.0-37-generic.

I'll add that, in our case, we sometimes see a negative usage count.

@rsampaio
Contributor

FWIW, we hit this bug running LXC on the trusty kernel (3.13.0-40-generic #69-Ubuntu); the message appears in dmesg, followed by this stack trace:

[27211131.602869] INFO: task lxc-start:26342 blocked for more than 120 seconds.
[27211131.602874]       Not tainted 3.13.0-40-generic #69-Ubuntu
[27211131.602877] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27211131.602881] lxc-start       D 0000000000000001     0 26342      1 0x00000080
[27211131.602883]  ffff88000d001d40 0000000000000282 ffff88001aa21800 ffff88000d001fd8
[27211131.602886]  0000000000014480 0000000000014480 ffff88001aa21800 ffffffff81cdb760
[27211131.602888]  ffffffff81cdb764 ffff88001aa21800 00000000ffffffff ffffffff81cdb768
[27211131.602891] Call Trace:
[27211131.602894]  [<ffffffff81723b69>] schedule_preempt_disabled+0x29/0x70
[27211131.602897]  [<ffffffff817259d5>] __mutex_lock_slowpath+0x135/0x1b0
[27211131.602900]  [<ffffffff811a2679>] ? __kmalloc+0x1e9/0x230
[27211131.602903]  [<ffffffff81725a6f>] mutex_lock+0x1f/0x2f
[27211131.602905]  [<ffffffff8161c2c1>] copy_net_ns+0x71/0x130
[27211131.602908]  [<ffffffff8108f889>] create_new_namespaces+0xf9/0x180
[27211131.602910]  [<ffffffff8108f983>] copy_namespaces+0x73/0xa0
[27211131.602912]  [<ffffffff81065b16>] copy_process.part.26+0x9a6/0x16b0
[27211131.602915]  [<ffffffff810669f5>] do_fork+0xd5/0x340
[27211131.602917]  [<ffffffff810c8e8d>] ? call_rcu_sched+0x1d/0x20
[27211131.602919]  [<ffffffff81066ce6>] SyS_clone+0x16/0x20
[27211131.602921]  [<ffffffff81730089>] stub_clone+0x69/0x90
[27211131.602923]  [<ffffffff8172fd2d>] ? system_call_fastpath+0x1a/0x1f

@MrMMorris

Ran into this on Ubuntu 14.04 and Debian jessie w/ kernel 3.16.x.

Docker command:

docker run -t -i -v /data/sitespeed.io:/sitespeed.io/results company/dockerfiles:sitespeed.io-latest --name "Superbrowse"

This seems like a pretty bad issue...

@MrMMorris

@jbalonso even with 3.13.0-32-generic I get the error after only a few successful runs 😭

@rsampaio
Contributor

@MrMMorris could you share a reproducer script using publicly available images?

@unclejack
Contributor

Everyone who's seeing this error on their system is running a package of the Linux kernel on their distribution that's far too old and lacks the fixes for this particular problem.

If you run into this problem, make sure you run apt-get update && apt-get dist-upgrade -y and reboot your system. If you're on Digital Ocean, you also need to select the kernel version which was just installed during the update because they don't use the latest kernel automatically (see https://digitalocean.uservoice.com/forums/136585-digitalocean/suggestions/2814988-give-option-to-use-the-droplet-s-own-bootloader).

CentOS/RHEL/Fedora/Scientific Linux users need to keep their systems updated using yum update and reboot after installing the updates.
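A minimal sketch of the update-and-verify flow described above; the package commands are the ones from this comment, and uname -r is only there to confirm which kernel is actually running after the reboot:

# Debian/Ubuntu
apt-get update && apt-get dist-upgrade -y
reboot

# CentOS/RHEL/Fedora/Scientific Linux
yum update
reboot

# after the reboot, confirm the running kernel version
uname -r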

When reporting this problem, please make sure your system is fully patched and up to date with the latest stable updates (no manually installed experimental/testing/alpha/beta/rc packages) provided by your distribution's vendor.

@MrMMorris

@unclejack

I ran apt-get update && apt-get dist-upgrade -y

ubuntu 14.04 3.13.0-46-generic

Still get the error after only one docker run

I can create an AMI for reproducing if needed

@unclejack
Contributor

@MrMMorris Thank you for confirming it's still a problem with the latest kernel package on Ubuntu 14.04.

@MrMMorris

Anything else I can do to help, let me know! 😄

@rsampaio
Contributor

@MrMMorris if you can provide a reproducer, there is a bug open for Ubuntu where it would be much appreciated: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152

@MrMMorris

@rsampaio if I have time today, I will definitely get that for you!

@fxposter

This problem also appears on 3.16(.7) on both Debian 7 and Debian 8: #9605 (comment). Rebooting the server is the only way to fix this for now.

@chrisjstevenson

Seeing this issue on RHEL 6.6 with kernel 2.6.32-504.8.1.el6.x86_64 when starting some Docker containers (not all containers):
kernel:unregister_netdevice: waiting for lo to become free. Usage count = -1

Again, rebooting the server seems to be the only solution at this time

@popsikle

Also seeing this on CoreOS (647.0.0) with kernel 3.19.3.

Rebooting is also the only solution I have found.

@fxposter

Tested Debian jessie with sid's kernel (4.0.2) - the problem remains.

@popsikle

Anyone seeing this issue running non-ubuntu containers?

@fxposter

Yes. Debian ones.
On 19 June 2015 at 19:01, user "popsikle" notifications@github.com wrote:

Anyone seeing this issue running non-ubuntu containers?



@SuperSandro2000

@steelcowboy You can configure rsyslog to discard only these specific annoying messages instead of all emergency messages, which is more desirable.

I wrote the following into /etc/rsyslog.d/40-unregister-netdevice.conf and restarted rsyslog with systemctl restart rsyslog.

# match frequent, irrelevant emergency messages generated by Docker when transferring large amounts of data through the network
:msg,contains,"unregister_netdevice: waiting for lo to become free. Usage count = 1" /dev/null

# discard matching messages
& stop
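A small, optional sanity check (not from the original comment) before restarting the daemon: rsyslogd can validate its configuration, including the new drop rule, without touching the running service.

# validate the rsyslog configuration (level 1 checks syntax and rule loading)
rsyslogd -N1

# then apply it
systemctl restart rsyslog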

@hedza06

hedza06 commented Oct 20, 2020

Any news here?

@StruggleYang

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Any news here?

@w-simon

w-simon commented Dec 29, 2020

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Any news here?

This patch has fixed this problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee60ad219f5c7c4fb2f047f88037770063ef785f

We analyzed the issue at the link below, where the problem could also be reproduced:
https://github.com/w-simon/kernel_debug_notes/blob/master/reproduce_a_leaked_dst_entry

@Xyaren

Xyaren commented Feb 5, 2021

This is still happening for me on Ubuntu kernel 5.8.0-41.46-generic (5.8.18).

@rseffner

For me, this first happened when going from kernel 5.10.37 to 5.10.38 with Debian 10.9 amd64, on different machines.

@tsjk

tsjk commented May 22, 2021

I saw this for the first time on a Gentoo system with kernel v5.4.120, just upgraded from kernel v5.4.117. Kernel sources used: sys-kernel/gentoo-sources.

I get

unregister_netdevice: waiting for ip6_vti0 to become free. Usage count = 1

every 10 seconds or so.

@rantala

rantala commented May 24, 2021

I saw this for the first time on a Gentoo system with kernel v5.4.120, just upgraded from kernel v5.4.117. Kernel sources used: sys-kernel/gentoo-sources.

I get

unregister_netdevice: waiting for ip6_vti0 to become free. Usage count = 1

every 10 seconds or so.

Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

@nivseg2

nivseg2 commented Oct 6, 2021

Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

Hello, do you have any more info on which specific commits introduced and fixed the issue as mentioned above?

@rantala

rantala commented Oct 6, 2021

Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

Hello, do you have any more info on which specific commits introduced and fixed the issue as mentioned above?

See the commits authored by Eric Dumazet:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.4.121

It was specifically about ip6_vti interfaces: 5.4.120 added https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.4.121&id=98ebeb87b2cf26663270e8e656fe599a32e4c96d which introduced the regression.

(If I remember right, same issue was seen in some other stable/LTS kernel versions as well.)
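A minimal sketch, assuming a local clone of the linux-stable tree, of how to list exactly what 5.4.121 added on top of 5.4.120 (the two versions named above):

# commits that went into 5.4.121 but not 5.4.120
git log --oneline v5.4.120..v5.4.121

# narrow the list to the author mentioned above
git log --oneline --author="Eric Dumazet" v5.4.120..v5.4.121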

@nivseg2

nivseg2 commented Oct 6, 2021

Ah, thanks!

@truman369

truman369 commented Dec 21, 2021

I have the same issue on kernels 5.10.70 and 5.14.9. This happens when I restart IPv6 containers.
Does anyone know of any solutions other than suppressing the output?

@tconrado

I have some reports that Ubuntu 20.04 LTS HWE did not present this issue.

The workaround is to never close the namespace, or to guarantee that the namespace is free of network interfaces before closing it (see the sketch below).
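A minimal sketch of the second workaround using iproute2, with a hypothetical named namespace ns0 and interface veth0 (container runtimes typically use anonymous namespaces, so this only illustrates the idea):

# list the interfaces still present in the namespace
ip netns exec ns0 ip -brief link

# remove everything except loopback before tearing the namespace down
ip netns exec ns0 ip link delete veth0

# only then delete the namespace itself
ip netns del ns0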

@coolljt0725
Contributor

The Linux kernel is adding a reference count tracking mechanism: https://lwn.net/ml/netdev/20211205042217.982127-1-eric.dumazet@gmail.com/. Hopefully, with this mechanism, it will be easier to find and fix this kind of reference counting bug in the future.
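A hedged note: in released kernels the tracker sits behind build-time options, so a .config fragment along these lines should enable it for net device reference counts (option names assumed from memory; verify against your kernel's Kconfig):

# kernel .config fragment (names assumed; check your kernel version)
CONFIG_REF_TRACKER=y
CONFIG_NET_DEV_REFCNT_TRACKER=y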

@nivseg2

nivseg2 commented Dec 23, 2021

Have been trying to test it.

Unfortunately, I don't have a reproducer yet.

@fserve

fserve commented Jan 8, 2022

I have this same problem; the VM is running Docker and IPv6. I have other VMs like this one that don't get this problem, so maybe it's related to a running container. It stops after a reboot but comes back after a few days.

unregister_netdevice: waiting for lo to become free. Usage count = 1

Linux xxx 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

linux-headers-generic-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-headers-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-hwe-5.11-headers-5.11.0-43/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-43.47~20.04.2 all [installed,automatic]
linux-hwe-5.11-headers-5.11.0-44/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-44.48~20.04.2 all [installed,automatic]
linux-image-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed]

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

docker version
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.7-0ubuntu5~20.04.2
 Built:             Mon Nov  1 00:34:17 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0ubuntu5~20.04.2
  Built:            Fri Oct 22 00:45:53 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.5-0ubuntu3~20.04.1
  GitCommit:
 runc:
  Version:          1.0.1-0ubuntu2~20.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:

@tconrado

tconrado commented Jan 8, 2022 via email

@fserve

fserve commented Jan 11, 2022

Yes, Ubuntu focal 20.04 HWE, as you can see here:

Linux xxx 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

linux-headers-generic-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-headers-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-hwe-5.11-headers-5.11.0-43/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-43.47~20.04.2 all [installed,automatic]
linux-hwe-5.11-headers-5.11.0-44/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-44.48~20.04.2 all [installed,automatic]
linux-image-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed]

@tconrado

tconrado commented Jan 11, 2022 via email

@truman369

It looks like the issue is resolved in kernel 5.15.5. Anyway, I stopped getting error messages after switching to this version.

@heyjoke

heyjoke commented Apr 6, 2023

Sorry to bring up this old topic, but has anyone seen a huge impact on other system processes at the same time? SSH was impacted as well for me. Kernel version is 5.3.18.

@polarathene
Contributor

polarathene commented May 6, 2023

The last notice (July 2018) advising users to 👍 / subscribe if they don't have helpful information to contribute regarding crashes is no longer visible.

I think this comment I've put together summarizes the issue well enough for anyone who wants to dig through it. I also have the impression that many of the past discussions resolved the issue for the majority, and that the shared reproductions no longer trigger the failure or the log message.

I would suggest closing / locking this issue. It has been 9 years to the day since it was opened. It would be better to create a new issue for anyone still affected to follow, where more recent and relevant information can be tracked.

System details

This is not a kernel crash report, but the result of an attempt to go through the 600+ items in this issue, looking for any useful information and reproductions (especially reproductions confirmed by multiple users). Reproduction was not possible.

Info requested:

  • Kernel: 6.2.0
  • Distribution: Ubuntu 23.04 (Vultr)
  • Network: Bridge with IPv4 + IPv6 IP addresses assigned to enp1s0
    • NOTE: enp1s0 is created and configured via cloud-init => netplan => systemd-networkd. Each veth Docker creates on container start/stop triggers a cloud-init udev rule that needlessly recreates enp1s0, which can reset/undo kernel settings like proxy_ndp and break the IPv6 GUA addresses of other containers (see the sketch after this list). This may have affected some of the users who reported here.
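A quick way (not part of the original report) to spot the reset described in the note above is to read the sysctl before and after a container start/stop; a flip from 1 back to 0 indicates the cloud-init/netplan rerun undid the setting:

# proxy_ndp on the bridge uplink; interface name taken from the report above
sysctl net.ipv6.conf.enp1s0.proxy_ndp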
daemon.json
{
  "userland-proxy": false,
  "experimental": true,
  "ipv6": true,
  "fixed-cidr-v6": "fd00:feed:face:f001::/64"
}
docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.3
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 343
  Running: 50
  Paused: 0
  Stopped: 293
 Images: 5
 Server Version: 23.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.2.0-19-generic
 Operating System: Ubuntu 23.04
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 952.3MiB
 Name: ipv6-test
 ID: 6027f905-4930-4f94-bdad-1587cbcdf0ef
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Reproductions shared

None of these were reproducible for me, with userland-proxy: false and the IPv6 Docker bridge enabled.

Presumably the issue has been resolved since (as some hint at, like the docker-stress comments, and kernel commits cited in early 2019).

Cherry-picked comments

Many other comments cited CentOS or similar systems with very dated kernels, or did not provide much helpful information. Another large group appeared to be related to IPv6, and some to UDP / conntrack.

Various fixes related to networking (for IPv6 and UDP) have been made in both Docker and the kernel over this duration. Activity within the issue has also decreased significantly, implying the main causes have been resolved.

@neersighted
Member

I'm going to close this for now as stale; if we ever see a manifestation of this issue with a good reproducer, please open a new issue and link back here.

Thank you very much for the deep dive, @polarathene!

@moby moby locked as resolved and limited conversation to collaborators Jul 1, 2023