network between kubernetes PODs is down after one flanneld is stopped and datastore can't be reached #636

Dieken · 2017-03-12T16:17:23Z

I'm experimenting with various kinds of failure in a Kubernetes cluster, and I found a strange problem.

My steps:

1. Start a 4-node Kubernetes cluster with kubeadm and Vagrant, using flannel vxlan for the network among PODs.
2. Stop all etcd containers so that the Kubernetes apiserver stops working, as expected.
3. Suppose node01 (192.168.200.201) has POD CIDR 172.16.0.0/24 and node04 (192.168.200.204) has POD CIDR 172.16.3.0/24. After etcd and the apiserver stop working, flanneld on node01 and node04 still runs, which is excellent. I can ping a container (172.16.3.10) on node04 from the node01 host and also from a container on node01. Both work, very good!
4. Now I stop flanneld on node04 and leave flanneld on node01 running. Ping from the node01 host to the container (172.16.3.10) on node04 still works, but ping from the container on node01 to 172.16.3.10 on node04 doesn't work any more. Why?

Ping from node01 to containers on node02 and node03 still works.

I assumed flanneld is only the network control plane, so its exit shouldn't interrupt the data plane: the flannel.1 vxlan interface and the cni0 bridge on node04 still exist after flanneld on node04 exits.

I tried setting rp_filter to 0 on all interfaces of node01 and node04, but it didn't help.
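
For anyone repeating the rp_filter check: the kernel uses the larger of the "all" value and the per-interface value, so it is worth looking at every entry rather than only "all". Below is a minimal read-only sketch that prints the current values. The helper name show_rp_filter is made up for illustration, and the optional directory argument is only there so the function can be pointed at a copy of the tree.

```shell
#!/bin/sh
# show_rp_filter: hypothetical helper, prints "<interface>=<value>" for every
# rp_filter entry under the given conf directory (default: the live /proc tree).
show_rp_filter() {
    confdir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$confdir"/*/rp_filter; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        iface=${f%/rp_filter}            # strip the trailing file name
        iface=${iface##*/}               # keep only the interface directory name
        printf '%s=%s\n' "$iface" "$(cat "$f")"
    done
}

# Usage on a node:
#   show_rp_filter
```

Any interface that still prints 1 or 2 is being source-validated regardless of what "all" is set to.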

The host OS is the latest Ubuntu 16.04, the ubuntu/xenial box shipped by Vagrant.
Kubernetes v1.5.4 and Flannel v0.7.0.

Dieken · 2017-03-12T17:36:49Z

bad network from container on node01 to container on node04, iptables trace in /var/log/syslog on node04:

Mar 12 17:21:14 node04 kernel: [ 3452.909716] TRACE: raw:PREROUTING:policy:2 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909724] TRACE: nat:PREROUTING:rule:1 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909732] TRACE: nat:KUBE-SERVICES:return:6 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909736] TRACE: nat:PREROUTING:policy:3 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909742] TRACE: filter:FORWARD:rule:1 IN=flannel.1 OUT=cni0 MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909746] TRACE: filter:DOCKER-ISOLATION:return:1 IN=flannel.1 OUT=cni0 MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909750] TRACE: filter:FORWARD:policy:6 IN=flannel.1 OUT=cni0 MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909753] TRACE: nat:POSTROUTING:rule:1 IN= OUT=cni0 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909756] TRACE: nat:KUBE-POSTROUTING:return:2 IN= OUT=cni0 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909759] TRACE: nat:POSTROUTING:rule:3 IN= OUT=cni0 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909761] TRACE: nat:POSTROUTING:policy:6 IN= OUT=cni0 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909799] TRACE: raw:PREROUTING:policy:2 IN=cni0 OUT= PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909808] TRACE: filter:FORWARD:rule:1 IN=cni0 OUT=flannel.1 PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909812] TRACE: filter:DOCKER-ISOLATION:return:1 IN=cni0 OUT=flannel.1 PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node04 kernel: [ 3452.909816] TRACE: filter:FORWARD:policy:6 IN=cni0 OUT=flannel.1 PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1

Mar 12 17:21:20 node04 kernel: [ 3458.914902] TRACE: raw:OUTPUT:policy:2 IN= OUT=cni0 SRC=172.16.3.1 DST=172.16.3.12 LEN=112 TOS=0x00 PREC=0xC0 TTL=64 ID=11995 PROTO=ICMP TYPE=3 CODE=1 [SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1 ]
Mar 12 17:21:20 node04 kernel: [ 3458.914925] TRACE: filter:OUTPUT:rule:1 IN= OUT=cni0 SRC=172.16.3.1 DST=172.16.3.12 LEN=112 TOS=0x00 PREC=0xC0 TTL=64 ID=11995 PROTO=ICMP TYPE=3 CODE=1 [SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1 ]
Mar 12 17:21:20 node04 kernel: [ 3458.914934] TRACE: filter:KUBE-SERVICES:return:1 IN= OUT=cni0 SRC=172.16.3.1 DST=172.16.3.12 LEN=112 TOS=0x00 PREC=0xC0 TTL=64 ID=11995 PROTO=ICMP TYPE=3 CODE=1 [SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1 ]
Mar 12 17:21:20 node04 kernel: [ 3458.914940] TRACE: filter:OUTPUT:rule:2 IN= OUT=cni0 SRC=172.16.3.1 DST=172.16.3.12 LEN=112 TOS=0x00 PREC=0xC0 TTL=64 ID=11995 PROTO=ICMP TYPE=3 CODE=1 [SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1 ]
Mar 12 17:21:20 node04 kernel: [ 3458.914947] TRACE: filter:KUBE-FIREWALL:return:2 IN= OUT=cni0 SRC=172.16.3.1 DST=172.16.3.12 LEN=112 TOS=0x00 PREC=0xC0 TTL=64 ID=11995 PROTO=ICMP TYPE=3 CODE=1 [SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1 ]
Mar 12 17:21:20 node04 kernel: [ 3458.914954] TRACE: filter:OUTPUT:policy:3 IN= OUT=cni0 SRC=172.16.3.1 DST=172.16.3.12 LEN=112 TOS=0x00 PREC=0xC0 TTL=64 ID=11995 PROTO=ICMP TYPE=3 CODE=1 [SRC=172.16.3.12 DST=172.16.0.7 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=27538 PROTO=ICMP TYPE=0 CODE=0 ID=492 SEQ=1 ]

trace on node01, no ICMP reply:

Mar 12 17:21:14 node01 kernel: [ 3453.096969] TRACE: raw:PREROUTING:policy:2 IN=cni0 OUT= PHYSIN=vethf912d941 MAC=0a:58:ac:10:00:01:0a:58:ac:10:00:07:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.096976] TRACE: nat:PREROUTING:rule:1 IN=cni0 OUT= PHYSIN=vethf912d941 MAC=0a:58:ac:10:00:01:0a:58:ac:10:00:07:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.096985] TRACE: nat:KUBE-SERVICES:return:6 IN=cni0 OUT= PHYSIN=vethf912d941 MAC=0a:58:ac:10:00:01:0a:58:ac:10:00:07:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.096989] TRACE: nat:PREROUTING:policy:3 IN=cni0 OUT= PHYSIN=vethf912d941 MAC=0a:58:ac:10:00:01:0a:58:ac:10:00:07:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.097006] TRACE: filter:FORWARD:rule:1 IN=cni0 OUT=flannel.1 PHYSIN=vethf912d941 MAC=0a:58:ac:10:00:01:0a:58:ac:10:00:07:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.097011] TRACE: filter:DOCKER-ISOLATION:return:1 IN=cni0 OUT=flannel.1 PHYSIN=vethf912d941 MAC=0a:58:ac:10:00:01:0a:58:ac:10:00:07:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.097014] TRACE: filter:FORWARD:policy:6 IN=cni0 OUT=flannel.1 PHYSIN=vethf912d941 MAC=0a:58:ac:10:00:01:0a:58:ac:10:00:07:08:00 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.097017] TRACE: nat:POSTROUTING:rule:2 IN= OUT=flannel.1 PHYSIN=vethf912d941 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.097021] TRACE: nat:KUBE-POSTROUTING:return:2 IN= OUT=flannel.1 PHYSIN=vethf912d941 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.097023] TRACE: nat:POSTROUTING:rule:3 IN= OUT=flannel.1 PHYSIN=vethf912d941 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
Mar 12 17:21:14 node01 kernel: [ 3453.097026] TRACE: nat:POSTROUTING:policy:6 IN= OUT=flannel.1 PHYSIN=vethf912d941 SRC=172.16.0.7 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=56818 DF PROTO=ICMP TYPE=8 CODE=0 ID=492 SEQ=1
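
The traces above are hard to scan by eye. A small filter that reduces each TRACE line to chain, interfaces, addresses and ICMP type/code makes the pattern easier to see. This is only a sketch that assumes the syslog layout shown in this thread; summarize_trace is a hypothetical name.

```shell
#!/bin/sh
# summarize_trace: hypothetical helper. Reads kernel TRACE lines on stdin and
# prints: <table:chain:...> in=<IN> out=<OUT> <SRC> -> <DST> icmp=<TYPE>/<CODE>
summarize_trace() {
    awk '
    function dash(s) { return s == "" ? "-" : s }
    /TRACE:/ {
        split("", f)                                   # reset the per-line field map
        for (i = 1; i <= NF; i++) {
            if ($i == "TRACE:") { f["chain"] = $(i + 1); continue }
            n = index($i, "=")
            if (n == 0) continue
            key = substr($i, 1, n - 1)
            # Keep the first occurrence only, so the packet quoted inside
            # [ ... ] of an ICMP error does not overwrite the outer header.
            if (!(key in f)) f[key] = substr($i, n + 1)
        }
        printf "%s in=%s out=%s %s -> %s icmp=%s/%s\n",
            f["chain"], dash(f["IN"]), dash(f["OUT"]),
            f["SRC"], f["DST"], dash(f["TYPE"]), dash(f["CODE"])
    }'
}

# Usage on a node:
#   grep TRACE: /var/log/syslog | summarize_trace
```

Run over the node04 trace, the lines at 17:21:20 come out as icmp=3/1, which is ICMP "host unreachable" sent from 172.16.3.1 back to the replying container 172.16.3.12. So the echo reply reaches the forwarding path towards flannel.1 but node04 cannot deliver it any further.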

good network from node01 host to container on node04, iptables trace in /var/log/syslog on node04:

Mar 12 17:26:36 node04 kernel: [ 3774.709092] TRACE: raw:PREROUTING:policy:2 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709099] TRACE: nat:PREROUTING:rule:1 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709107] TRACE: nat:KUBE-SERVICES:return:6 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709154] TRACE: nat:PREROUTING:policy:3 IN=flannel.1 OUT= MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709162] TRACE: filter:FORWARD:rule:1 IN=flannel.1 OUT=cni0 MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709166] TRACE: filter:DOCKER-ISOLATION:return:1 IN=flannel.1 OUT=cni0 MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709169] TRACE: filter:FORWARD:policy:6 IN=flannel.1 OUT=cni0 MAC=fe:bf:08:13:d7:4a:ea:f6:36:35:fd:f9:08:00 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709172] TRACE: nat:POSTROUTING:rule:1 IN= OUT=cni0 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709176] TRACE: nat:KUBE-POSTROUTING:return:2 IN= OUT=cni0 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709178] TRACE: nat:POSTROUTING:rule:3 IN= OUT=cni0 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709181] TRACE: nat:POSTROUTING:policy:6 IN= OUT=cni0 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709218] TRACE: raw:PREROUTING:policy:2 IN=cni0 OUT= PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709226] TRACE: filter:FORWARD:rule:1 IN=cni0 OUT=flannel.1 PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709230] TRACE: filter:DOCKER-ISOLATION:return:1 IN=cni0 OUT=flannel.1 PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node04 kernel: [ 3774.709234] TRACE: filter:FORWARD:policy:6 IN=cni0 OUT=flannel.1 PHYSIN=veth63e31496 MAC=0a:58:ac:10:03:01:0a:58:ac:10:03:0c:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1

trace on node01, got ICMP reply:

Mar 12 17:26:36 node01 kernel: [ 3774.885906] TRACE: raw:OUTPUT:policy:2 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885914] TRACE: nat:OUTPUT:rule:1 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885919] TRACE: nat:KUBE-SERVICES:return:6 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885922] TRACE: nat:OUTPUT:policy:3 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885925] TRACE: filter:OUTPUT:rule:1 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885928] TRACE: filter:KUBE-SERVICES:return:1 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885931] TRACE: filter:OUTPUT:rule:2 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885934] TRACE: filter:KUBE-FIREWALL:return:2 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885936] TRACE: filter:OUTPUT:policy:3 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885939] TRACE: nat:POSTROUTING:rule:2 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885942] TRACE: nat:KUBE-POSTROUTING:return:2 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885945] TRACE: nat:POSTROUTING:rule:3 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.885948] TRACE: nat:POSTROUTING:policy:6 IN= OUT=flannel.1 SRC=172.16.0.0 DST=172.16.3.12 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=40486 DF PROTO=ICMP TYPE=8 CODE=0 ID=27983 SEQ=1 UID=0 GID=0
Mar 12 17:26:36 node01 kernel: [ 3774.897055] TRACE: raw:PREROUTING:policy:2 IN=flannel.1 OUT= MAC=ea:f6:36:35:fd:f9:fe:bf:08:13:d7:4a:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node01 kernel: [ 3774.897063] TRACE: filter:INPUT:rule:1 IN=flannel.1 OUT= MAC=ea:f6:36:35:fd:f9:fe:bf:08:13:d7:4a:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node01 kernel: [ 3774.897067] TRACE: filter:KUBE-FIREWALL:return:2 IN=flannel.1 OUT= MAC=ea:f6:36:35:fd:f9:fe:bf:08:13:d7:4a:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1
Mar 12 17:26:36 node01 kernel: [ 3774.897070] TRACE: filter:INPUT:policy:2 IN=flannel.1 OUT= MAC=ea:f6:36:35:fd:f9:fe:bf:08:13:d7:4a:08:00 SRC=172.16.3.12 DST=172.16.0.0 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=31680 PROTO=ICMP TYPE=0 CODE=0 ID=27983 SEQ=1

Dieken · 2017-03-13T02:29:16Z

Here is the network topology. It is probably not very accurate, especially the relationship between the veth devices and cni0, but it should be enough to understand the network data flow.

Dieken · 2017-03-13T13:31:52Z

I got it. When flanneld on node04 stopped, it couldn't start again because the Kubernetes apiserver couldn't work without etcd. That left nobody to populate the ARP table of the flannel vxlan interface on node04 (this was flanneld's job) with entries mapping node01's POD IPs to the MAC of node01's flannel vxlan interface. So PODs on every node other than node04 couldn't be reached from node04, due to ARP misses. This can be confirmed by running this command on node04:

sudo arp -i flannel.1 -s 172.16.0.3  MAC-of-flannel.1-on-node01

Then ping from 172.16.0.3 to 172.16.3.10 works.
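
The same check can be done with iproute2 instead of arp. The sketch below only prints the command unless RUN=1 is set, so it is safe to paste. pin_neigh is a hypothetical helper name. The MAC ea:f6:36:35:fd:f9 is read off the source half of the MAC= field in the node04 traces earlier in this thread, so treat it as an assumption about node01's flannel.1.

```shell
#!/bin/sh
# Read-only inspection on node04 (needs iproute2):
#   ip neigh show dev flannel.1      # ARP/neighbour entries on the vxlan device
#   bridge fdb show dev flannel.1    # MAC -> remote VTEP forwarding entries

# pin_neigh DEV ADDR MAC: hypothetical helper. Prints the iproute2 command that
# installs a permanent neighbour entry; with RUN=1 it executes it (needs root).
pin_neigh() {
    dev=$1 addr=$2 mac=$3
    set -- ip neigh replace "$addr" lladdr "$mac" dev "$dev" nud permanent
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Dry run with the addresses from this thread (MAC is an assumption, see above).
pin_neigh flannel.1 172.16.0.3 ea:f6:36:35:fd:f9
```

Using "replace" rather than "add" means the command also succeeds when a stale or failed entry for the address already exists.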

Dieken · 2017-03-13T14:42:00Z

I feel it would be better if flanneld checked the bridge fdb and the subnet lease before exiting because of a broken k8s apiserver. If the fdb and subnet lease are still valid, flanneld could do its best to keep injecting ARP table entries.

tomdee · 2017-11-03T23:16:24Z

The vxlan code was significantly changed in the last couple of releases, so I don't think this is still a problem.

Dieken · 2017-11-05T09:19:04Z

@tomdee

Thank you very much!!! That's so awesome!!! I just verified: flanneld now injects permanent ARP table entries for the pod subnet of every other node, so the exit of flanneld won't affect communication among pods any more.

I have 8 nodes; the picture was captured from a node with pod subnet 172.29.2.0/24.

root@k8s-dev-a04:~# uname -a
Linux k8s-dev-a04 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@k8s-dev-a04:~# lsb_release -a
LSB Version:	core-9.20160110ubuntu0.2-amd64:core-9.20160110ubuntu0.2-noarch:security-9.20160110ubuntu0.2-amd64:security-9.20160110ubuntu0.2-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.3 LTS
Release:	16.04
Codename:	xenial
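
To repeat the verification described above without a screenshot, the neighbour table of flannel.1 can be listed and the permanent entries counted. This is a rough sketch: count_permanent is a hypothetical helper, and it assumes the state is the last word on each line of "ip neigh show" output.

```shell
#!/bin/sh
# count_permanent: hypothetical helper. Reads "ip neigh show" output on stdin
# and prints how many entries are in state PERMANENT.
count_permanent() {
    awk '$NF == "PERMANENT" { n++ } END { print n + 0 }'
}

# Usage on a node (needs iproute2):
#   ip neigh show dev flannel.1 | count_permanent
```

If flanneld installs one permanent entry per remote pod subnet, as reported here, an 8-node cluster would be expected to show about one entry for each of the other nodes.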

Dieken changed the title ~~network between kubernetes PODs is down after one flanned is stopped~~ network between kubernetes PODs is down after one flanneld is stopped Mar 12, 2017

tomdee changed the title ~~network between kubernetes PODs is down after one flanneld is stopped~~ network between kubernetes PODs is down after one flanneld is stopped and datastore can't be reached Mar 22, 2017

tomdee added the components/backend/vxlan label Mar 22, 2017

tomdee closed this as completed Nov 3, 2017
