kernel crash after "unregister_netdevice: waiting for lo to become free. Usage count = 3" #5618
Comments
I'm seeing a very similar issue for eth0. Ubuntu 12.04 also. I have to power cycle the machine. From
|
Hey, this just started happening for me as well. Docker version:
Kernel log: http://pastebin.com/TubCy1tG System details: |
This might be relevant:
Sure enough, one of the times this happened for me was right after |
Upgrading from Ubuntu 12.04.3 to 14.04 fixed this for me without any other changes. |
I experience this on RHEL7, 3.10.0-123.4.2.el7.x86_64 |
I've noticed the same thing happening with my VirtualBox virtual network interfaces when I'm running 3.14-rt4. It's supposed to be fixed in vanilla 3.13 or something. |
@egasimus Same here - I pulled in hundreds of MB of data before killing the container, then got this error. |
I upgraded to Debian kernel 3.14 and the problem appears to have gone away. It looks like the problem existed in some kernels < 3.5, was fixed in 3.5, regressed in 3.6, and was patched somewhere in 3.12-3.14. https://bugzilla.redhat.com/show_bug.cgi?id=880394 |
@spiffytech Do you have any idea where I can report this regarding the realtime kernel flavour? I think they're only releasing a RT patch for every other version, and would really hate to see 3.16-rt come out with this still broken. :/ EDIT: Filed it at kernel.org. |
I'm getting this on Ubuntu 14.10 running a 3.18.1. Kernel log shows
I'll send |
We're seeing this issue as well. Ubuntu 14.04, 3.13.0-37-generic |
On Ubuntu 14.04 server, my team has found that downgrading from 3.13.0-40-generic to 3.13.0-32-generic "resolves" the issue. Given @sbward's observation, that would put the regression after 3.13.0-32-generic and before (or including) 3.13.0-37-generic. I'll add that, in our case, we sometimes see a negative usage count. |
FWIW, we hit this bug running lxc on the trusty kernel (3.13.0-40-generic #69-Ubuntu). The message appears in dmesg, followed by this stack trace:
|
Ran into this on Ubuntu 14.04 and Debian jessie w/ kernel 3.16.x. Docker command:
This seems like a pretty bad issue... |
@jbalonso even with 3.13.0-32-generic I get the error after only a few successful runs 😭 |
@MrMMorris could you share a reproducer script using publicly available images? |
Everyone who's seeing this error on their system is running a package of the Linux kernel on their distribution that's far too old and lacks the fixes for this particular problem. If you run into this problem, make sure you run CentOS/RHEL/Fedora/Scientific Linux users need to keep their systems updated using When reporting this problem, please make sure your system is fully patched and up to date with the latest stable updates (no manually installed experimental/testing/alpha/beta/rc packages) provided by your distribution's vendor. |
I ran ubuntu 14.04 3.13.0-46-generic Still get the error after only one I can create an AMI for reproducing if needed |
@MrMMorris Thank you for confirming it's still a problem with the latest kernel package on Ubuntu 14.04. |
Anything else I can do to help, let me know! 😄 |
@MrMMorris if you can provide a reproducer there is a bug opened for Ubuntu and it will be much appreciated: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152 |
@rsampaio if I have time today, I will definitely get that for you! |
This problem also appears on 3.16(.7) on both Debian 7 and Debian 8: #9605 (comment). Rebooting the server is the only way to fix this for now. |
Seeing this issue on RHEL 6.6 with kernel 2.6.32-504.8.1.el6.x86_64 when starting some docker containers (not all containers). Again, rebooting the server seems to be the only solution at this time. |
Also seeing this on CoreOS (647.0.0) with kernel 3.19.3. Rebooting is also the only solution I have found. |
Tested Debian jessie with sid's kernel (4.0.2) - the problem remains. |
Anyone seeing this issue running non-ubuntu containers? |
Yes. Debian ones.
|
@steelcowboy You can configure rsyslog to discard only those annoying messages instead of all emergency messages, which is more desirable. I wrote the following into
|
Any news here? |
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1 Any news here? |
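For anyone who wants to count how often this message shows up in their logs, here is a minimal Python sketch that pulls the device name and usage count out of a log line. It assumes the message has the wording quoted in this thread; the names `StuckDevice` and `parse_unregister_line` are made up for illustration. The count is allowed to be negative because an earlier comment reports seeing negative usage counts.

```python
import re
from typing import NamedTuple, Optional

# Matches the kernel message as quoted in this thread, e.g.
# "unregister_netdevice: waiting for lo to become free. Usage count = 1"
_PATTERN = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>-?\d+)"
)


class StuckDevice(NamedTuple):
    """Hypothetical container for the two fields of interest."""
    device: str
    usage_count: int


def parse_unregister_line(line: str) -> Optional[StuckDevice]:
    """Return the device and usage count, or None if the line does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return StuckDevice(match.group("dev"), int(match.group("count")))
```

Feeding it each line of `dmesg` or syslog output and grouping by `device` gives a rough picture of which interface is stuck and whether the count ever changes.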
This patch has fixed this problem: We have analyzed in the following link, and this problem could also be reproduced: |
This is still happening for me on kernel Ubuntu 5.8.0-41.46-generic 5.8.18 |
For me this first happened when going from kernel 5.10.37 to 5.10.38 with Debian 10.9 amd64, on different machines. |
I saw this for the first time on a Gentoo system with kernel v5.4.120, just upgraded from kernel v5.4.117. Kernel sources used: sys-kernel/gentoo-sources. I get
every 10 seconds or so. |
Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121. |
Hello, do you have any more info on which specific commits introduced |
See the commits authored by Eric Dumazet: It was specifically about ip6_vti interfaces: 5.4.120 added https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.4.121&id=98ebeb87b2cf26663270e8e656fe599a32e4c96d which introduced the regression. (If I remember right, same issue was seen in some other stable/LTS kernel versions as well.) |
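A quick way to check whether a machine is on the release named above is to compare its kernel release string against it. This is only a sketch: the set `KNOWN_REGRESSED` contains just the one version confirmed in this thread (5.4.120, fixed in 5.4.121), and the function names are hypothetical. Note that distribution kernels such as Ubuntu's report an ABI string (for example `5.8.0-41-generic`) that does not reflect the upstream stable patch level, so this check is only meaningful for kernels whose release string starts with the upstream version.

```python
import platform
import re
from typing import Optional, Tuple

# Only the version confirmed in this thread; the regression was fixed in 5.4.121.
KNOWN_REGRESSED = {(5, 4, 120)}


def parse_kernel_version(release: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a string like '5.4.120-gentoo'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)), int(match.group(3) or 0))


def is_known_regressed(release: str) -> bool:
    """True if the release string starts with a version listed in KNOWN_REGRESSED."""
    return parse_kernel_version(release) in KNOWN_REGRESSED


if __name__ == "__main__":
    running = platform.release()
    print(running, "known regressed:", is_known_regressed(running))
```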
Ah, thanks! |
I have the same issue on kernels 5.10.70 and 5.14.9. This happens when I restart IPv6 containers. |
I have some reports that Ubuntu 20.04 LTS HWE did not exhibit this issue. The workaround is to never close the namespace, or to guarantee that the namespace is free of networks before closing it. |
The Linux kernel is adding a reference count tracing mechanism: https://lwn.net/ml/netdev/20211205042217.982127-1-eric.dumazet@gmail.com/ Hopefully this mechanism will make it easier to find and fix this kind of reference-counting bug in the future. |
Have been trying to test it. Unfortunately, I don't have a reproducer yet. |
I have the same problem. This VM is running Docker with IPv6. I have other VMs like this one that don't show the problem, so maybe it's related to a running container. It stops after a reboot but comes back after a few days.
|
Did you try the HWE release? We have not seen this error message since we started using Ubuntu Focal 20.04 HWE.
|
Yes, ubuntu focal 20.04 hwe, as you can see here:
|
Sorry to hear. This is a real problem. I guess that many scenarios lead to this issue. On previous versions, the way we managed that was by preventing the namespace deletion.
|
It looks like the issue is resolved in kernel 5.15.5. Anyway, I stopped getting error messages after switching to this version. |
Sorry to revive this old topic. Has anyone else seen a large impact on other system processes at the same time? For example, SSH was affected as well. Kernel version is 5.3.18. |
The last notice (July 2018) advising users to 👍 / subscribe if they don't have helpful information to contribute regarding crashes, is no longer visible. I think this comment I've put together summarizes the issue well enough for anyone that wants to dig through it. I also have the impression that many of the past discussions have resolved the issue experienced by the majority, along with reproductions shared not creating the failure or log message.
I would suggest closing / locking this issue. 9 years to this day since it was opened. It would be better to create a new issue for anyone else still affected to follow, and where more recent / relevant information can be tracked?
System details
This is not a kernel crash report, but from an attempt to go through over 600 items of this issue, looking for any useful information and reproductions (especially reproductions confirmed by multiple users). Reproduction was not possible.
daemon.json
{
  "userland-proxy": false,
  "experimental": true,
  "ipv6": true,
  "fixed-cidr-v6": "fd00:feed:face:f001::/64"
}
docker info
Reproductions shared
None of these were reproducible for me. Presumably the issue has been resolved since (as some hint at, like the
Cherry-picked comments
Many other comments cited CentOS or similar systems with very dated kernels most of the time, or did not provide much helpful information. Another bulk appeared to be related to IPv6, and some with UDP / conntrack. Various fixes related to networking (for IPv6 and UDP) have been made in both Docker and the kernel over this duration. Activity within the issue has also decreased significantly, implying the main causes have been resolved. |
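Since several reports in this thread involve IPv6, it may help to sanity-check the IPv6 part of a `daemon.json` like the one in the comment above before trying to reproduce. The sketch below uses only the Python standard library and does not talk to Docker at all; `check_ipv6_settings` is a hypothetical helper, and the checks it performs (IPv6 enabled, CIDR parses as an IPv6 network with no host bits set) are my own assumptions about what is worth verifying, not something Docker documents as required.

```python
import ipaddress
import json

# The daemon.json contents quoted in the comment above.
DAEMON_JSON = """
{
  "userland-proxy": false,
  "experimental": true,
  "ipv6": true,
  "fixed-cidr-v6": "fd00:feed:face:f001::/64"
}
"""


def check_ipv6_settings(raw: str):
    """Parse the config text and return the fixed-cidr-v6 network object."""
    config = json.loads(raw)
    if not config.get("ipv6"):
        raise ValueError('"ipv6" is not enabled in this config')
    # ip_network() is strict by default: it raises ValueError if host bits are set.
    network = ipaddress.ip_network(config["fixed-cidr-v6"])
    if network.version != 6:
        raise ValueError('"fixed-cidr-v6" is not an IPv6 network')
    return network


network = check_ipv6_settings(DAEMON_JSON)
print(network, "prefix length:", network.prefixlen, "private:", network.is_private)
```

The `is_private` flag is a convenient way to confirm the prefix sits in the unique local address range rather than in globally routed space.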
I'm going to close this for now as stale; if we ever see a manifestation of this issue with a good reproducer, please open a new issue and link back here. Thank you very much for the deep dive, @polarathene! |
This happens when I log in to the container and can't quit with Ctrl-C.
My system is Ubuntu 12.04, kernel is 3.8.0-25-generic. Docker version:
I used the script https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh to check, and everything is all right.
I watched the syslog and found this message:
After this happened, I opened another terminal and killed the process, then restarted docker, but the restart hung.
I rebooted the host, and it still displayed that message for some minutes during shutdown: