
IPv6_rpfilter=yes breaks IPv6 connectivity over bridges when br_netfilter is in use #1235

Open
GigabyteProductions opened this issue Nov 7, 2023 · 23 comments
Labels
can't fix: Can't fix. Likely due to technical reasons.

Comments

@GigabyteProductions

Background:

IPv4 rpfilter happens in the kernel IP stack. The Linux kernel doesn't do rpfilter in the IPv6 stack at all, so IPv6 rpfilter has to be implemented as a firewall rule that drops the problematic traffic before it reaches the host IPv6 stack. firewalld implements this as an ip6tables -A PREROUTING -m rpfilter --invert -j DROP rule, or an nft meta nfproto ipv6 fib saddr . mark . iif oif missing drop rule.

Ethernet bridges under Linux do not normally (or at least by default) operate as transparent firewalls. In order for bridged IPv6 traffic to be filtered by the kernel, the br_netfilter module must be loaded, and either the system-wide /proc/sys/net/bridge/bridge-nf-call-ip6tables tunable must be set to 1 OR the interface-specific /sys/devices/virtual/net/${interface_name}/bridge/nf_call_ip6tables setting must be 1. Right now, the kernel's system-wide tunable is set to 1 upon loading br_netfilter.
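
For reference, these knobs can be inspected and toggled from a shell like so (a sketch; the bridge name virbr0 is only an example):

# is br_netfilter loaded at all?
lsmod | grep br_netfilter

# system-wide: are bridged IPv6 frames being passed to ip6tables/nftables?
sysctl net.bridge.bridge-nf-call-ip6tables

# disable the hook system-wide, or only for a single bridge
sudo sysctl -w net.bridge.bridge-nf-call-ip6tables=0
echo 0 | sudo tee /sys/devices/virtual/net/virbr0/bridge/nf_call_ip6tables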

The problem:

The above background creates a difference in functionality between IPv4 and IPv6 crossing a bridge.

For IPv4, reverse path filtering is only relevant for traffic directed at the host kernel (frames destined for the bridge interface's MAC address). IPv4 hosts can communicate with each other over the bridge as long as their packets aren't dropped in the filter/FORWARD chain. Multiple subnets may be used without requiring the bridge host to have prior knowledge of each subnet.

For IPv6, the firewall rule -m rpfilter --invert -j DROP (or its nft equivalent) matches all IPv6 traffic not matched by a route on the bridge host, which happens to be ALL traffic when the bridge interface is intentionally not assigned an IP address. For example, this completely breaks IPv6 between two VMs communicating over a libvirt "isolated" network, despite libvirt adding a -A LIBVIRT_FWX -i virbr0 -o virbr0 -j ACCEPT rule.

Note that br_netfilter is loaded by libvirt if any libvirt nwfilter is in use. Also note that a completely working IPv6 network will immediately come to a halt if nwfilter is added for the first time to an existing libvirt host whose bridge interface doesn't happen to have an IPv6 address in the subnets in use (this is how we discovered this bug...).

Discussion:

Of course, we could disable the rpfilter rule altogether by setting IPv6_rpfilter=no, but this is not an acceptable answer, as it has negative security implications in other contexts.

I was hoping to modify the rule to add -m physdev ! --physdev-is-bridged to distinguish between frames that would cross the bridge and packets that would be forwarded between subnets, but --physdev-is-bridged doesn't know the difference in PREROUTING, and -m rpfilter only works in PREROUTING. Furthermore, while I don't know whether this same limitation applies to nft's equivalent fib rule, nft has no native equivalent for physdev checks, so the iptables rpfilter version would be required anyway...

I'm also guessing firewalld wants to stay away from utilizing marks, since there's no telling how else they may already be used or matched by the system. That being said, I am able to work around the issue and get behavior similar to IPv4's rp_filter=1 using IPv6_rpfilter=no and the following firewalld direct rules:

# set mark bit 31 if packet is being bridge-forwarded rather than routed

sudo firewall-cmd --permanent --direct --add-rule ipv4 mangle FORWARD 10 -m physdev --physdev-is-bridged -j MARK --set-mark 0x80000000/0x80000000
sudo firewall-cmd --permanent --direct --add-rule ipv6 mangle FORWARD 10 -m physdev --physdev-is-bridged -j MARK --set-mark 0x80000000/0x80000000

# set mark bit 30 if IPv6 packet violates reverse path filtering rules

sudo firewall-cmd --permanent --direct --add-rule ipv6 mangle PREROUTING 10 -m rpfilter --validmark --invert -j MARK --set-mark 0x40000000/0x40000000

# drop input IPv6 packet if marked for rpfilter

sudo firewall-cmd --permanent --direct --add-rule ipv6 mangle INPUT 10 -m mark --mark 0x40000000/0x40000000 -j DROP

# drop forwarded IPv6 packet if marked for rpfilter but not bridge-forwarding

sudo firewall-cmd --permanent --direct --add-rule ipv6 mangle FORWARD 11 -m mark --mark 0x40000000/0xc0000000 -j DROP

Here's the permanent /etc/firewalld/direct.xml representation:

<?xml version="1.0" encoding="utf-8"?>
<direct>

  <!-- set mark bit 31 if packet is being bridge-forwarded rather than routed -->
  <rule ipv="ipv4" table="mangle" chain="FORWARD" priority="10">-m physdev --physdev-is-bridged -j MARK --set-mark 0x80000000/0x80000000</rule>
  <rule ipv="ipv6" table="mangle" chain="FORWARD" priority="10">-m physdev --physdev-is-bridged -j MARK --set-mark 0x80000000/0x80000000</rule>

  <!-- set mark bit 30 if IPv6 packet violates reverse path filtering rules -->
  <rule ipv="ipv6" table="mangle" chain="PREROUTING" priority="10">-m rpfilter --validmark --invert -j MARK --set-mark 0x40000000/0x40000000</rule>

  <!-- drop input IPv6 packet if marked for rpfilter -->
  <rule ipv="ipv6" table="mangle" chain="INPUT" priority="10">-m mark --mark 0x40000000/0x40000000 -j DROP</rule>

  <!-- drop forwarded IPv6 packet if marked for rpfilter but not bridge-forwarding -->
  <rule ipv="ipv6" table="mangle" chain="FORWARD" priority="11">-m mark --mark 0x40000000/0xc0000000 -j DROP</rule>

</direct>

Here's the relevant part of nft list ruleset for the above:

table ip6 mangle {
	chain PREROUTING {
		type filter hook prerouting priority mangle; policy accept;
		fib saddr . mark . iif oif 0 counter packets 3 bytes 192 meta mark set mark or 0x40000000 
	}

	chain INPUT {
		type filter hook input priority mangle; policy accept;
		mark and 0x40000000 == 0x40000000 counter packets 0 bytes 0 drop
	}

	chain FORWARD {
		type filter hook forward priority mangle; policy accept;
		# PHYSDEV match --physdev-is-bridged counter packets 6 bytes 384 meta mark set mark or 0x80000000 
		mark and 0xc0000000 == 0x40000000 counter packets 0 bytes 0 drop
	}
...
}

Even though the above workaround is written in iptables format, it works with both FirewallBackend=iptables and FirewallBackend=nftables (at least on Rocky Linux 8).

@erig0
Collaborator

erig0 commented Nov 7, 2023

Hrm. I don't think this is unexpected. Loading br_netfilter is saying "send bridged frames to netfilter, and filter them like they're layer 3 (IP/IPv6) packets"... and that's exactly what is happening.

I'm guessing the difference is that the rp_filter sysctl is not "activated" on these bridged frames. The frames never actually reach the IP stack (as they're bridged). This explains why the IPv4 sysctl only "works" for packets that are destined to the bridge's IP address (i.e. destined to the host).

I'm not sure we can do anything about this on the firewalld side.

@GigabyteProductions
Author

There is no rp_filter sysctl for IPv6. It is the firewall rule utilizing rpfilter in ip6tables or fib in nft that is now responsible for the implementation.

However, while IPv4 rp_filter=1 does not implicitly affect bridge traffic, the firewalld version of the rpfilter/fib rule does, and bridge forwarding is exactly where routing-table-based reverse path filtering does not make sense.

See: https://bugzilla.kernel.org/show_bug.cgi?id=6998

@erig0
Copy link
Collaborator

erig0 commented Nov 7, 2023

> There is no rp_filter sysctl for IPv6. It is the firewall rule utilizing rpfilter in ip6tables or fib in nft that is now responsible for the implementation.

Right.

> However, while IPv4 rp_filter=1 does not implicitly affect bridge traffic, the firewalld version of the rpfilter/fib rule does

It affects IPv6, yes.

> and bridge forwarding is exactly where routing-table-based reverse path filtering does not make sense.

Agree. You're also explicitly asking for something that "does not make sense", i.e. layer3 filtering bridged packets. As a consequence of this, you also get hit with the IPv6 reverse path filtering checks. There is no way to opt into one, but not the other; you get both.


Your options are:

  1. disable IPv6_rpfilter in /etc/firewalld/firewalld.conf

  2. use FirewallBackend=iptables with the direct rules you have above (see the config sketch below)
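
For reference, both are keys in /etc/firewalld/firewalld.conf (a sketch; firewalld needs a restart to pick up changes to this file):

# option 1: disable the IPv6 reverse path filter rule entirely
IPv6_rpfilter=no

# option 2: switch backends so the iptables direct rules take effect
FirewallBackend=iptables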

As you said, there is no solution to this with the nftables backend. That's largely because nftables has true bridge family support.

@erig0 added the "can't fix" label Nov 7, 2023
@GigabyteProductions
Author

> > There is no rp_filter sysctl for IPv6. It is the firewall rule utilizing rpfilter in ip6tables or fib in nft that is now responsible for the implementation.

> Right.

> > However, while IPv4 rp_filter=1 does not implicitly affect bridge traffic, the firewalld version of the rpfilter/fib rule does

> It affects IPv6, yes.

Can you clarify? I think you're suggesting the /proc/sys/net/ipv4/conf/*/rp_filter settings affect IPv6, but they do not.

> > and bridge forwarding is exactly where routing-table-based reverse path filtering does not make sense.

> Agree. You're also explicitly asking for something that "does not make sense", i.e. layer3 filtering bridged packets. As a consequence of this, you also get hit with the IPv6 reverse path filtering checks. There is no way to opt into one, but not the other; you get both.

Upper layer firewalling on bridge traffic is not something that "does not make sense". This is how "transparent firewalls", such as libvirt nwfilter, are implemented.

It is specifically the reverse path filtering that does not make sense to apply to bridge traffic.

> Your options are:
>
> 1. disable `IPv6_rpfilter` in `/etc/firewalld/firewalld.conf`
>
> 2. use `FirewallBackend=iptables` with the direct rules you have above

Please note that the direct rules can work with FirewallBackend=nftables thanks to the iptables-to-nftables wrappers.

> As you said, there is no solution to this with the nftables backend. That's largely because nftables has true bridge family support.

I didn't say that there was "no solution" to this problem, nor do I believe that there is actually "no solution". For example, would you be open to a patch implementing the same workaround as a built-in option, but with user-configurable marks/masks?

However, if firewalld "won't" rely on iptables-to-nftables wrappers or marks, then I suppose this issue "should" be brought to the attention of upstream nftables/kernel developers in pursuit of a 100% nftables-native and mark-free solution.

@erig0
Collaborator

erig0 commented Nov 8, 2023

> > > There is no rp_filter sysctl for IPv6. It is the firewall rule utilizing rpfilter in ip6tables or fib in nft that is now responsible for the implementation.

> > Right.

> > > However, while IPv4 rp_filter=1 does not implicitly affect bridge traffic, the firewalld version of the rpfilter/fib rule does

> > It affects IPv6, yes.

> Can you clarify? I think you're suggesting the /proc/sys/net/ipv4/conf/*/rp_filter settings affect IPv6, but they do not.

I am not suggesting that. I'm saying yes to "the firewalld version of rpfilter affects bridged traffic if using br_netfilter". But I'm also saying that I think that is expected and that you'll have to set IPv6_rpfilter=no.

> > > and bridge forwarding is exactly where routing-table-based reverse path filtering does not make sense.

> > Agree. You're also explicitly asking for something that "does not make sense", i.e. layer3 filtering bridged packets. As a consequence of this, you also get hit with the IPv6 reverse path filtering checks. There is no way to opt into one, but not the other; you get both.

> Upper layer firewalling on bridge traffic is not something that "does not make sense". This is how "transparent firewalls", such as libvirt nwfilter, are implemented.

> It is specifically the reverse path filtering that does not make sense to apply to bridge traffic.

Okay. Then you can disable it.

> > Your options are:
> >
> > 1. disable `IPv6_rpfilter` in `/etc/firewalld/firewalld.conf`
> >
> > 2. use `FirewallBackend=iptables` with the direct rules you have above

> Please note that the direct rules can work with FirewallBackend=nftables thanks to the iptables-to-nftables wrappers.

Right, it would work with FirewallBackend=nftables as well.

That works regardless of whether iptables is iptables-legacy or iptables-nft.

> > As you said, there is no solution to this with the nftables backend. That's largely because nftables has true bridge family support.

> I didn't say that there was "no solution" to this problem, nor do I believe that there is actually "no solution". For example, would you be open to a patch implementing the same workaround as a built-in option, but with user-configurable marks/masks?

No to using marks. Firewalld deliberately avoids using them to not conflict with users and other entities.

> However, if firewalld "won't" rely on iptables-to-nftables wrappers or marks, then I suppose this issue "should" be brought to the attention of upstream nftables/kernel developers in pursuit of a 100% nftables-native and mark-free solution.

nftables already has a solution. They'll say to use the bridge family.

@GigabyteProductions
Author

> > I didn't say that there was "no solution" to this problem, nor do I believe that there is actually "no solution". For example, would you be open to a patch implementing the same workaround as a built-in option, but with user-configurable marks/masks?

> No to using marks. Firewalld deliberately avoids using them to not conflict with users and other entities.

Alright, I thought that might be the case. It's too bad that rpfilter and physdev can't be combined without marks.

> > However, if firewalld "won't" rely on iptables-to-nftables wrappers or marks, then I suppose this issue "should" be brought to the attention of upstream nftables/kernel developers in pursuit of a 100% nftables-native and mark-free solution.

> nftables already has a solution. They'll say to use the bridge family.

Good point. That is a nice improvement over the original iptables+br_netfilter way of doing it.

@andreaskaris

andreaskaris commented Dec 11, 2023

This exact same issue bit me too, and it cost me an hour to find this thread and the solution. It would have cost me even more time had this issue not been opened.

I personally also think that this is a departure from the IPv4 behavior; GigabyteProductions describes the issue exactly (IPv4 packets with rp_filter=1 pass the bridge without issues, but the same packets over IPv6 do not pass firewalld's RP filter).

IPv6_rpfilter=no seems like a pretty big hammer for this very specific problem.

There's also no way from firewall-cmd to see that this is in place, or am I wrong?
firewall-cmd --list-all does not show anything that would indicate that this implicit filter is on. As a user, is my only option for figuring out that firewalld is configured this way to look into the firewalld configuration file (other than looking into the nft ruleset, of course)? Or can I somehow see this with the firewall-cmd CLI?

@erig0
Collaborator

erig0 commented Dec 11, 2023

> I personally also think that this is a departure from the IPv4 behavior

It's a departure because the IPv4 rp_filter sysctl is implemented in the IP stack, not netfilter. firewalld's IPv6_rpfilter is implemented in the firewall/netfilter stack. An IPv6 rp_filter sysctl was never implemented in the kernel IP stack.

So when something loads br_netfilter, that's saying "send all bridged frames to netfilter". But this does not affect IPv4's reverse path filtering, since rp_filter works in the IP stack.

> There's also no way from firewall-cmd to see that this is in place, or am I wrong?

No. It's not exposed in the CLI; only in /etc/firewalld/firewalld.conf. None of the knobs in the config file are shown in the CLI.
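
So today the only check is against the file itself, e.g.:

grep -E '^(IPv6_rpfilter|FirewallBackend)' /etc/firewalld/firewalld.conf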

@erig0
Collaborator

erig0 commented Dec 11, 2023

IMO, the only "user friendly" thing we can do here is have firewalld detect whether br_netfilter is loaded and, if IPv6_rpfilter=yes, log an INFO/WARN. But that may trigger a false-positive warning response for some users.

@vidlb

vidlb commented Jan 19, 2024

I faced critical issues with docker + firewalld: every packet was dropped during a docker pull, with the kernel message rpfilter_DROP for IPv6 addresses / Docker bridges.
Is it related to this issue?
The only way I found to fix it was to set IPv6_rpfilter=false, but it seems overkill.

@erig0
Collaborator

erig0 commented Jan 19, 2024

> Is it related to this issue?

You can check by seeing whether br_netfilter is loaded via lsmod output. I think at least some Docker versions load it by default.

@vidlb

vidlb commented Jan 19, 2024

I'm using the latest Docker on Debian 12. Yes, the module is loaded:

lsmod | grep br_
Module                  Size  Used by
br_netfilter           32768  0
bridge                311296  1 br_netfilter

@GigabyteProductions
Author

If it is loaded, and you haven't taken any special steps to set /proc/sys/net/bridge/bridge-nf-call-*tables to 0, then your bridge traffic is going through "ip6tables"/nftables as if it is being routed, which is why the IPv6 rpfilter rule is affecting your bridge traffic.

Disabling the rpfilter rule is one approach, but can affect network security. If that is unacceptable, you may want to consider the mark-oriented direct rules shown earlier in this thread. If you're not concerned about the Docker containers communicating with the Docker host, another approach may be to ensure that the bridge in question itself has an address in each IPv6 subnet that you're using, so the frames pass the reverse path checks imposed by the rpfilter rule.
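
As a sketch of that last approach (the bridge name docker0 and the ULA subnet fd00:1234::/64 are hypothetical placeholders for whatever your containers actually use):

# give the bridge itself an address in the containers' subnet, so the host
# has a matching route and the rpfilter/fib lookup succeeds
sudo ip -6 addr add fd00:1234::1/64 dev docker0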

@GigabyteProductions
Author

I forgot to mention that I only assumed the module was loaded because its functionality is required for your configuration. If this is not the case, preventing the module from loading, or disabling its functionality through those sysctls will also avoid the issue.
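
For example (a sketch; only sensible if nothing on the host actually needs bridged traffic to be firewalled):

# prevent the module from being loaded at all ("install ... /bin/false" also
# blocks loading via dependency, unlike a plain "blacklist" entry)
echo 'install br_netfilter /bin/false' | sudo tee /etc/modprobe.d/disable-br_netfilter.conf

# or leave it loaded but disable its hooks through the sysctls
sudo sysctl -w net.bridge.bridge-nf-call-iptables=0
sudo sysctl -w net.bridge.bridge-nf-call-ip6tables=0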

@vidlb

vidlb commented Jan 19, 2024

I haven't changed any config regarding bridge traffic, nor anything to activate the netfilter module.
I guess it is loaded by default with Docker; I'm not sure I can disable it without issues.
TBH I'm at the limit of my network knowledge.

IPv6 is enabled on the host, but my Docker app (Nextcloud) isn't actually using it.
Still, Docker is using it when querying DNS or pulling from the registry. I was unable to deactivate it, despite ipv6=false being set in my Docker config. I don't understand whether that only applies to newly created Docker networks/bridges; in any case it doesn't affect the daemon's behaviour on the default bridge docker0 for operations like docker pull.

I'm unsure whether I should completely disable IPv6 on the host instead of disabling rpfilter. I wanted to try, but since IPv6 is still unstable in Nextcloud (and experimental in Docker), maybe that's the safest option.

@vidlb

vidlb commented Jan 19, 2024

> I forgot to mention that I only assumed the module was loaded because its functionality is required for your configuration. If this is not the case, preventing the module from loading, or disabling its functionality through those sysctls will also avoid the issue.

I searched the /etc/modules* and /lib/modules* load directories and could not find any setting related to the netfilter module.
I also checked my GRUB config and /etc/sysctl.d to be sure there's nothing I changed manually.

I don't understand which config file Docker is using (if it's indeed due to Docker) to load this kernel module.
Maybe it's loaded by default for my kernel version?

I struggle to understand all the security implications of disabling IPv6_rpfilter in /etc/firewalld/firewalld.conf; could someone clarify this please?
In my case I'd say it's not a problem since the exposed app isn't using IPv6, but could it be a problem for the host, or for firewall rules other than the Docker ones?

@GigabyteProductions
Author

Here is an article demonstrating the security issue of disabling reverse path filtering, from an IPv4 point of view: https://www.theurbanpenguin.com/rp_filter-and-lpic-3-linux-security/

Without reverse path filtering, a host can receive a packet with a spoofed source address on an untrusted network, and respond to it on the trusted network. This isn't to say that it is easy to establish a bidirectional TCP connection from outside a trusted network, but it may expose things like internal DNS servers to attacks from the outside.

The module will be loaded automatically when certain ip/ip6tables rules are added. I'm not sure if that's what's happening with Docker.

I don't have a lot of Docker experience, myself. Are you using a bridge network to connect containers directly to the host network?

Last time I touched it, the out-of-the-box behavior was for the Docker host to set up a bridge, assign it its own internal IP addresses, and attach containers to it with veth pairs; the purpose of the bridge was actually routing between the internal Docker network and the host network. The only reason I can think of for Docker to load br_netfilter in this configuration is to firewall the containers from each other, because br_netfilter is unnecessary in the context of routing alone. Can you clarify whether your problem is that containers can't talk to each other, or that they can't talk to the IPv6 Internet?

Personally, I think disabling IPv6 on the host machine is also overkill, but I can see how it may be a simpler solution for your use case than learning about how IPv6 works.

Also note that, according to this documentation, you probably have to restart docker to reload your network configuration. I'm guessing that if containers aren't restarted, they'll retain the IPv6 configuration they already had: https://github.com/nextcloud/all-in-one/blob/main/docker-ipv6-support.md

Ultimately, if turning off IPv6_rpfilter in firewalld restores IPv6 connectivity, then you must have IPv6 frames being switched from one bridge slave to another, on a bridge that doesn't have an IP address in the same subnet.

@GigabyteProductions
Author

If you'd like some help identifying the details of the Docker networking and exactly how firewalld is involved in the issue, and you aren't afraid to show internal IP/MAC addresses for your system, please paste the output of ip link | grep master, ip address, ip -4 route, and ip -6 route for both the Docker host and a troublesome container.

@vidlb

vidlb commented Jan 19, 2024

Thanks for the explanation.
Yes, this is still what Docker is doing: I have a default bridge, then for each network Docker creates there is a bridge, plus veth interfaces for each container, communicating through the bridge.
What is unexpected is that all those interfaces have inet6 addresses even though I did not configure Docker for IPv6, so it seems that's done by default now.

Regarding the app, it's strange: although I saw IPv6 DNS query errors in the log (I guess it falls back to IPv4), everything worked.
This morning I tried to pull a lot of images and it completely messed up the host: hundreds of packets were dropped, overloading the kernel and freezing every network connection on the host!

I must say I found this Docker + firewall stuff to be a nightmare. First I discovered that docker + ufw was simply doing nothing, then I switched to firewalld, trying to follow all the docs and tutorials, using a default/recommended config and everything, but obviously something is wrong, and I can't tell where.

The Docker docs don't really help.

@vidlb

vidlb commented Jan 19, 2024

I don't really feel comfortable posting all those addresses here, but thanks for the help.
I'm starting to believe this is more a Docker problem than a firewalld problem, so I won't pollute this issue anymore.
Or maybe it's just a misconfiguration regarding subnets.
Anyway: I need to learn IPv6.

@vidlb

vidlb commented Jan 20, 2024

Well, after re-installing firewalld on my server I found out some interesting things. I still believe this is also a Docker issue, but it may be linked to this one. I'll try to keep it short:

  • Everything is working fine (Nextcloud, docker pull) after enabling the firewall with default settings. Docker interfaces are added to a dedicated zone as expected.
  • Set LogDenied=all and everything falls apart. It seems the kernel gets overloaded with packet-drop messages. I can't be sure whether this only breaks the network or is a complete system failure, but I do know that I lose the SSH connection for about a minute, and the monitoring tool that should collect metrics is missing data for that period.
    => It only occurs when pulling big Docker images; for example, pulling alpine:latest does not provoke this.

So it's strange that the kernel is dropping hundreds of packets when I use docker pull, while everything apparently works as expected (as long as you don't log denied packets).
Here is an example of the rpfilter drop that is flooding journalctl:

janv. 19 23:33:22 hostname kernel: rpfilter_DROP: IN=enp1s0f0 OUT= MAC=... SRC=... DST=... LEN=1420 TC=0 HOPLIMIT=53 FLOWLBL=36022 PROTO=TCP SPT=443 DPT=36706 W>
... (a few hundred more)

Those are IPv6 addresses, but I also have other rejected packets on some IPv4 ones, port 5355.

After disabling the firewalld log again, I can see more clearly the kernel messages that appear while using docker pull:

dst_alloc: 1101 callbacks suppressed
Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
... (a dozen similar messages)

I searched for this warning and found out the default route cache size (4096) isn't enough for the modern web. But it doesn't look like a big problem, except for performance.
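
If I understand correctly, the knob from the kernel message can be raised like this (16384 is only a test value; it would need persisting under /etc/sysctl.d/ if it helps):

sudo sysctl -w net.ipv6.route.max_size=16384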

FYI this bare-metal server is running Debian 12, with firewalld 1.3.3 and Docker 25.
Another interesting thing: I have a similar config on a VPS running IPv6, Ubuntu 22 and Docker 25, but the default firewalld version there is 1.1 and I cannot reproduce the issue. I'm not sure it's firewalld; it could also be the kernel (Linux 5.15 on Ubuntu, Linux 6.1 on Debian).

Is it possible that something is wrong in my IPv6 config, or is it due to this issue and a change since firewalld 1.3?
In any case, it's strange that Docker produces that many packet drops with its default behaviour.
It's hard to believe them when they say "Docker now fully integrates with firewalld" while looking at this mess.
Enabling firewall logs shouldn't break a system.

Edit: br_netfilter is also loaded on the Ubuntu server.

@vidlb

vidlb commented Jan 20, 2024

Here is one of the stack traces of the system failure that appear in the logs after the eth interface has been reset.
janv. 19 21:48:06 kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
janv. 19 21:48:06 kernel: ixgbe 0000:01:00.0 enp1s0f0: initiating reset due to tx timeout
janv. 19 21:48:06 kernel: rcu:         0-....: (0 ticks this GP) idle=11bc/1/0x4000000000000000 softirq=626145/626145 fqs=2033
janv. 19 21:48:06 kernel:         (detected by 1, t=5256 jiffies, g=2012617, q=166405 ncpus=12)
janv. 19 21:48:06 kernel: ixgbe 0000:01:00.0 enp1s0f0: Reset adapter
janv. 19 21:48:06 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: P        W  OE      6.1.0-17-amd64 #1  Debian 6.1.69-1
janv. 19 21:48:06 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U2-2T, BIOS L4.03J 03/24/2023
janv. 19 21:48:06 kernel: RIP: 0010:native_safe_halt+0xb/0x10
janv. 19 21:48:06 kernel: Code: 80 48 02 20 48 8b 00 a8 08 75 c0 e9 7c ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d b9 59 5d 00 fb f4 <e9> e0 bf 3c 00 eb 07 0f 00 2d a9 59 5d 00 f4 e9 d1 bf 3c 00 cc 0f
janv. 19 21:48:06 kernel: RSP: 0018:ffffc0964015fe68 EFLAGS: 00000246
janv. 19 21:48:06 kernel: RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000000
janv. 19 21:48:06 kernel: RDX: ffff9f72fea40000 RSI: ffff9f640167f400 RDI: ffff9f640167f464
janv. 19 21:48:16 dockerd[501680]: time="2024-01-19T21:48:16.512509876+01:00" level=warning msg="Health check for container 36fa4ed593d09015837bb7447d05b5ac18d200bc5bc52903c6d0be7340133a74 error: timed out starting health check for container 36fa4ed593d09015837>
janv. 19 21:48:16 dockerd[501680]: time="2024-01-19T21:48:16.512718417+01:00" level=error msg="stream copy error: reading from a closed fifo"
janv. 19 21:48:25 kernel: ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: None
janv. 19 21:48:25 kernel: RBP: ffff9f640167f464 R08: ffffffffbb9a9a20 R09: 00000000238e3a2f
janv. 19 21:48:25 kernel: R10: 0000000000000008 R11: 0000000000000f9c R12: ffffffffbb9a9a20
janv. 19 21:48:25 kernel: R13: ffffffffbb9a9aa0 R14: 0000000000000001 R15: 0000000000000000
janv. 19 21:48:25 kernel: NMI watchdog: Watchdog detected hard LOCKUP on cpu 11
janv. 19 21:48:25 kernel: Modules linked in: xt_tcpudp xt_conntrack xt_addrtype xt_MASQUERADE nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nf_conntrack_netlink bluetooth jitterentr>
janv. 19 21:48:25 kernel:  drm_vram_helper zcommon(POE) drm_ttm_helper crypto_simd snd_timer acpi_ipmi ttm cryptd znvpair(POE) evdev ipmi_si snd sp5100_tco drm_kms_helper ipmi_devintf wmi_bmof rapl soundcore ccp i2c_algo_bit k10temp watchdog spl(OE) ipmi_msghan>
janv. 19 21:48:25 kernel: CPU: 11 PID: 16 Comm: rcu_preempt Tainted: P        W  OE      6.1.0-17-amd64 #1  Debian 6.1.69-1
janv. 19 21:48:25 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U2-2T, BIOS L4.03J 03/24/2023
janv. 19 21:48:25 kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x2a0
janv. 19 21:48:25 kernel: Code: 00 77 75 f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 51 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 5d 41 5c 41 5d e9 5e
janv. 19 21:48:25 kernel: RSP: 0018:ffffc0964014fe08 EFLAGS: 00000002
janv. 19 21:48:25 kernel: RAX: 00000000002c0101 RBX: ffffffffbb8d6b00 RCX: 00000000000a1dd1
janv. 19 21:48:25 kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffbb8d6b00
janv. 19 21:48:25 kernel: RBP: ffff9f64003e6600 R08: 0000000000000000 R09: 0000000000000000
janv. 19 21:48:25 kernel: R10: 000000000000000b R11: 0000000000000001 R12: ffffffffbb8d6b00
janv. 19 21:48:25 kernel: R13: ffffffffb9f2f390 R14: 0000000000031c00 R15: ffffffffb9f30cf0
janv. 19 21:48:25 kernel: FS:  0000000000000000(0000) GS:ffff9f72fecc0000(0000) knlGS:0000000000000000
janv. 19 21:48:25 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
janv. 19 21:48:25 kernel: CR2: 00007f0da07509f8 CR3: 0000000191302000 CR4: 0000000000350ee0
janv. 19 21:48:25 kernel: Call Trace:
janv. 19 21:48:25 kernel:  <NMI>
janv. 19 21:48:25 kernel:  ? watchdog_overflow_callback.cold+0x1e/0x70
janv. 19 21:48:25 kernel:  ? __perf_event_overflow+0xe5/0x2a0
janv. 19 21:48:25 kernel:  ? x86_pmu_handle_irq+0x100/0x170
janv. 19 21:48:25 kernel:  ? amd_pmu_handle_irq+0x2f/0x90
janv. 19 21:48:25 kernel:  ? perf_event_nmi_handler+0x26/0x50
janv. 19 21:48:25 kernel:  ? nmi_handle+0x5d/0x120
janv. 19 21:48:25 kernel:  ? default_do_nmi+0x40/0x130
janv. 19 21:48:25 kernel:  ? exc_nmi+0x132/0x170
janv. 19 21:48:25 kernel:  ? end_repeat_nmi+0x16/0x67
janv. 19 21:48:25 kernel:  ? rcu_gp_cleanup+0x460/0x460
janv. 19 21:48:25 kernel:  ? rcu_check_boost_fail+0x170/0x170
janv. 19 21:48:25 kernel:  ? native_queued_spin_lock_slowpath+0x69/0x2a0
janv. 19 21:48:25 kernel:  ? native_queued_spin_lock_slowpath+0x69/0x2a0
janv. 19 21:48:25 kernel:  ? native_queued_spin_lock_slowpath+0x69/0x2a0
janv. 19 21:48:25 kernel:  </NMI>
janv. 19 21:48:25 kernel:  <TASK>
janv. 19 21:48:25 kernel:  ? rcu_check_boost_fail+0x170/0x170
janv. 19 21:48:25 kernel:  _raw_spin_lock_irqsave+0x39/0x50
janv. 19 21:48:25 kernel:  force_qs_rnp+0xf6/0x230
janv. 19 21:48:25 kernel:  ? prepare_to_swait_event+0x6f/0x120
janv. 19 21:48:25 kernel:  ? rcu_gp_cleanup+0x460/0x460
janv. 19 21:48:25 kernel:  rcu_gp_fqs_loop+0x3b5/0x550
janv. 19 21:48:25 kernel:  rcu_gp_kthread+0xd0/0x190
janv. 19 21:48:25 kernel:  kthread+0xda/0x100
janv. 19 21:48:25 kernel:  ? kthread_complete_and_exit+0x20/0x20
janv. 19 21:48:25 kernel:  ret_from_fork+0x22/0x30
janv. 19 21:48:25 kernel:  </TASK>
janv. 19 21:48:25 kernel: FS:  0000000000000000(0000) GS:ffff9f72fea40000(0000) knlGS:0000000000000000
janv. 19 21:48:25 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
janv. 19 21:48:25 kernel: CR2: 00007f6ab500a480 CR3: 00000003cd466000 CR4: 0000000000350ee0
janv. 19 21:48:25 kernel: Call Trace:
janv. 19 21:48:25 kernel:  <IRQ>
janv. 19 21:48:25 kernel:  ? rcu_dump_cpu_stacks+0xa4/0xe0
janv. 19 21:48:25 kernel:  ? rcu_sched_clock_irq.cold+0x4c/0x459
janv. 19 21:48:25 kernel:  ? recalibrate_cpu_khz+0x10/0x10
janv. 19 21:48:25 kernel:  ? update_process_times+0x70/0xb0
janv. 19 21:48:25 kernel:  ? perf_trace_run_bpf_submit+0x52/0xc0
janv. 19 21:48:25 kernel:  ? tick_sched_handle+0x22/0x60
janv. 19 21:48:25 kernel:  ? tick_sched_timer+0x63/0x80
janv. 19 21:48:25 kernel:  ? tick_sched_do_timer+0xa0/0xa0
janv. 19 21:48:25 kernel:  ? __hrtimer_run_queues+0x112/0x2b0
janv. 19 21:48:25 kernel:  ? hrtimer_interrupt+0xf4/0x210
janv. 19 21:48:25 kernel:  ? __sysvec_apic_timer_interrupt+0x5d/0x110
janv. 19 21:48:25 kernel:  ? sysvec_apic_timer_interrupt+0x69/0x90
janv. 19 21:48:25 kernel:  </IRQ>
janv. 19 21:48:25 kernel:  <TASK>
janv. 19 21:48:25 kernel:  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
janv. 19 21:48:25 kernel:  ? native_safe_halt+0xb/0x10
janv. 19 21:48:25 kernel:  acpi_idle_do_entry+0x4e/0x60
janv. 19 21:48:25 kernel:  acpi_idle_enter+0x80/0xd0
janv. 19 21:48:25 kernel:  cpuidle_enter_state+0x8c/0x420
janv. 19 21:48:25 kernel:  cpuidle_enter+0x29/0x40
janv. 19 21:48:25 kernel:  do_idle+0x202/0x2a0
janv. 19 21:48:25 kernel:  cpu_startup_entry+0x26/0x30
janv. 19 21:48:25 kernel:  start_secondary+0x12a/0x150
janv. 19 21:48:25 kernel:  secondary_startup_64_no_verify+0xe5/0xeb
janv. 19 21:48:25 kernel:  </TASK>

@GigabyteProductions
Author

This line stands out to me:

ixgbe 0000:01:00.0 enp1s0f0: initiating reset due to tx timeout

Flooding the kernel log buffer is a great way to prevent the kernel from doing other things, and you're doing exactly that with the LogDenied=all setting (see #439). That is likely causing your enp1s0f0 timeout in the ixgbe driver code, along with the inability to SSH to that machine.
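
If the log flood is indeed the trigger, the quickest mitigation is to turn the denied-packet logging back off from the CLI (as far as I know this changes both the runtime and permanent configuration):

sudo firewall-cmd --set-log-denied=off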

That being said, I didn't look into your "Route cache is full" error until now because I assumed it was unrelated to hanging IPv6 connections. It looks like you should only have IPv6 route cache entries due to PMTU exceptions [1]. That error is unrelated to the IPv6_rpfilter=yes issue, since the rpfilter rule DROPs packets rather than sending ICMPv6 "Packet Too Big" responses. You're probably getting the error from a combination of having a smaller MTU on the host interface than in the Docker network, and one of your Docker containers reaching out to a lot of IPv6 hosts.

[1]: See: https://linux.debian.kernel.narkive.com/5zMeCHhc/bug-861115-please-consider-increasing-net-ipv6-route-max-size-default-value#post3
