
v1.11 backports 2023-06-08 (ipsec fixes) #26021

Merged
merged 24 commits into cilium:v1.11 from pr/v1.11-backport-2023-06-08-ipsec on Jun 13, 2023

Conversation

joamaki
Contributor

@joamaki joamaki commented Jun 8, 2023

Once this PR is merged, you can update the PR labels via:

$ for pr in 25724 25744 25784 25735 25936 25953 26072 26093; do contrib/backporting/set-labels.py $pr done 1.11; done

@joamaki joamaki added kind/backports This PR provides functionality previously merged into master. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. release-blocker/1.11 This issue will prevent the release of the next version of Cilium. backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. labels Jun 8, 2023
@joamaki

This comment was marked as outdated.

@joamaki joamaki requested a review from pchaigno June 8, 2023 10:06
@joamaki
Contributor Author

joamaki commented Jun 8, 2023

This PR replaces #25895. The only change here is the addition of the last two commits.

@joamaki
Contributor Author

joamaki commented Jun 8, 2023

Waiting on #25993 to be merged to fix test runs.

@joamaki joamaki force-pushed the pr/v1.11-backport-2023-06-08-ipsec branch from a0a33d1 to a5ab01c on June 8, 2023 14:50
@michi-covalent michi-covalent mentioned this pull request Jun 8, 2023
44 tasks
@pchaigno
Member

pchaigno commented Jun 9, 2023

I think we should merge this without waiting for #25993.

@joamaki joamaki marked this pull request as ready for review June 9, 2023 11:26
@joamaki joamaki requested a review from a team as a code owner June 9, 2023 11:26
@gandro
Member

gandro commented Jun 12, 2023

I have pushed two more PRs to this branch, since those PRs also need to be backported as part of the IPSec fixes to v1.11 and depend on other commits. I will update the PR description accordingly.

@gandro gandro requested a review from pchaigno June 12, 2023 13:20
@gandro gandro mentioned this pull request Jun 12, 2023
1 task
@gandro

This comment was marked as outdated.

@qmonnet
Member

qmonnet commented Jun 13, 2023

@qmonnet

This comment was marked as outdated.

pchaigno and others added 7 commits June 13, 2023 14:14
[ upstream commit 600c7d4 ]

The XFRM IN policies and states didn't change so we should never need
to remove any stale XFRM IN configs. Let's thus simplify the logic to
find stale policies and states accordingly.

I would expect this incorrect removal to cause a few drops on agent
restart, but after multiple attempts to reproduce on small (3 nodes)
and larger (20) clusters (EKS & GKE) with a drop-sensitive application
(migrate-svc), I'm not able to see such drops. I'm guessing this is
because we reinstall the XFRM IN configs right after we removed them so
there isn't really much time for a packet to be received and dropped.

Fixes: 688dc9a ("ipsec: Remove stale XFRM states and policies")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit bbc50a3 ]

This small bit of refactoring will make later changes a bit easier.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 2ada282 ]

No functional changes. Best viewed with git show -b or the equivalent on
GitHub to not show space-only changes.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 64f4c23 ]

No functional changes. Best viewed with git show -b or the equivalent on
GitHub to not show space-only changes.

Same as the previous commit but on a different set of conditions.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 66c45ac ]

This is a partial revert of commit 3e59b68 ("ipsec: Per-node XFRM
states & policies for EKS & AKS").

One change that commit 3e59b68 ("ipsec: Per-node XFRM states &
policies for EKS & AKS") made on EKS and AKS was to switch from using
NodeInternalIPs to CiliumInternalIPs for the outer IPsec (ESP) IP
addresses. That made the logic more consistent with the logic we use for
other IPAM schemes (e.g., GKE).

It however causes serious connectivity issues on upgrades and
downgrades. This is mostly because typically not all nodes are updated
to the new Cilium version at the same time. If we consider two pods on
nodes A and B trying to communicate, then node A may be using the old
NodeInternalIPs while node B is already on the new CiliumInternalIPs.
When node B sends traffic to node A, node A doesn't have the XFRM state
IN necessary to decrypt it. The same happens in the other direction.

This commit reintroduces the NodeInternalIPs for EKS and AKS. Subsequent
commits will introduce additional changes to enable a proper migration
path from NodeInternalIPs to CiliumInternalIPs.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 6b3b50d ]

On EKS and AKS, we currently use NodeInternalIPs for the IPsec tunnels.
A subsequent commit will allow us to change that to switch to using
CiliumInternalIPs (as done on GKE).

For that to be possible without breaking inter-node connectivity for the
whole duration of the switch, we need an intermediate mode where both
CiliumInternalIPs and NodeInternalIPs are accepted on ingress.

The idea is that we will then have a two-step migration from
NodeInternalIP to CiliumInternalIP:

  1. All nodes are using NodeInternalIP.
  2. Upgrade to the version of Cilium that supports both NodeInternalIP
     and CiliumInternalIP and encapsulates IPsec traffic with
     NodeInternalIP.
  3. Via an agent flag, tell Cilium to switch to encapsulating IPsec
     traffic with CiliumInternalIP.
  4. All nodes are using CiliumInternalIP.

This commit implements the logic for step 2 above. To that end, we will
duplicate the XFRM IN states such that we have both:

    src 0.0.0.0 dst [NodeInternalIP]   # existing
    src 0.0.0.0 dst [CiliumInternalIP] # new

thus matching and being able to receive IPsec packets with an outer
destination IP of either NodeInternalIP or CiliumInternalIP.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
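The duplicated-IN-state idea above can be sketched with a toy model (illustrative Python, not Cilium code; the IP addresses are placeholders taken from the examples in this PR):

```python
# Toy model of the dual XFRM IN states described above: during the
# migration window, ingress must accept ESP packets whose outer
# destination is either the NodeInternalIP or the CiliumInternalIP.
# The addresses below are placeholders, not real cluster IPs.
NODE_INTERNAL_IP = "192.168.134.181"
CILIUM_INTERNAL_IP = "10.24.1.77"

def xfrm_in_states(node_ip, cilium_ip):
    """Return the duplicated IN states (src 0.0.0.0 dst <outer IP>)."""
    return [
        {"src": "0.0.0.0", "dst": node_ip},    # existing state
        {"src": "0.0.0.0", "dst": cilium_ip},  # new, duplicated state
    ]

def can_decrypt(states, outer_dst):
    """A packet is decryptable if some IN state matches its outer dst."""
    return any(s["dst"] == outer_dst for s in states)
```

With both states installed, traffic encapsulated with either outer destination IP is accepted, which is what makes step 2 of the migration safe.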
[ upstream commit 963e45b ]

On EKS and AKS, IPsec used NodeInternalIPs for the encapsulation. This
commit introduces a new flag to allow switching from NodeInternalIPs to
CiliumInternalIPs; it defaults to the former.

This new flag allows for step 3 of the migration plan defined in the
previous commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
pchaigno and others added 16 commits June 13, 2023 14:14
[ upstream commit e880002 ]

reinitializeIPSec only runs the interface detection if EncryptInterface
is empty. Since it sets it after detecting interfaces, it will only run
the detection once.

Let's change that to run the detection even if the EncryptInterface list
isn't empty. That will allow us to rerun the detection when new ENI
devices are added on EKS.

One consequence of this change is that we will now attach to all
interfaces even if the user configured --encrypt-interface. That is fine
because --encrypt-interface shouldn't actually be used in ENI mode. In
ENI mode, we want to attach to all interfaces as we don't have a
guarantee on which interface the IPsec traffic will come in.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit bf0940b ]

If IPsec is enabled along with the ENI IPAM mode we
need to load the bpf_network program onto new ENI devices
when they're added at runtime.

To fix this, we subscribe to netlink link updates to detect when
new (non-veth) devices are added and reinitialize IPsec to load the BPF
program onto the devices. The compilation of the bpf_netowrk program
has been moved to Reinitialize() to avoid spurious recompilation on
reinitialize.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 3201a5e ]

This commit simply refactors some existing code into a new
getNodeIDForNode function. This function will be called from elsewhere
in a subsequent commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 9cc8a89 ]

Our logic to clean up old XFRM configs on node deletion currently relies
on the destination IP to identify the configs to remove. That doesn't
work with ENI and Azure IPAMs, but until recently, it didn't need to. On
ENI and Azure IPAMs we didn't have per-node XFRM configs.

That changed in commit 3e59b68 ("ipsec: Per-node XFRM states &
policies for EKS & AKS"). We now need to clean up per-node XFRM configs
for ENI and Azure IPAM modes as well, and we can't rely on the
destination IP for that because the XFRM policies don't match on that
destination IP.

Instead, since commit 73c36d4 ("ipsec: Match OUT XFRM states &
policies using node IDs"), we match the per-node XFRM configs using node
IDs encoded in the packet mark. The good news is that this is true for
all IPAM modes (whether Azure, ENI, cluster-pool, or something else).

So our cleanup logic can now rely on the node ID of the deleted node to
clean up its XFRM states and policies. This commit implements that.

Fixes: 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 3e898f2 ]

This commit lowers the priority of the catch-all default-drop XFRM OUT
policies, from 1 to 100. For context, 0 is the highest possible
priority.

This change will allow us to introduce several levels of priorities for
XFRM OUT policies in subsequent commits.

Diff before/after this patch:

      src 0.0.0.0/0 dst 0.0.0.0/0
    -     dir out action block priority 1
    +     dir out action block priority 100
          mark 0xe00/0xf00

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit a11d088 ]

This is a revert, or rather a reimplementation, of commit 688dc9a
("ipsec: Remove stale XFRM states and policies").

In that commit, we would remove the old XFRM OUT policies and states
because they conflict with the new ones and prevent the installation to
proceed. This removal however causes a short window of packet drops on
upgrade, between the time the old XFRM configs are removed and the new
ones are added. These drops would show up as XfrmOutPolBlock because
packets then match the catch-all default-drop XFRM policy.

Instead of removing the old XFRM configs, a better, less-disruptive
approach is to deprioritize them and add the new ones in front. To that
end, we "lower" the priority of the old XFRM OUT policy from 0 to 50 (0
is the highest-possible priority). By doing this the XFRM OUT state is
also indirectly deprioritized because it is selected by the XFRM OUT
policy.

As with the code from commit 688dc9a ("ipsec: Remove stale XFRM
states and policies"), this whole logic can be removed in v1.15, once
we are sure that nobody is upgrading with the old XFRM configs in place.
At that point, we will be able to completely clean up those old XFRM
configs.

The priority of 50 was chosen arbitrarily, to be between the priority of
new XFRM OUT configs (0) and the priority of the catch-all default-drop
policy (100), while leaving space if we need to add additional rules of
different priorities.

Diff before/after upgrade (v1.13.0 -> this patch, GKE):

      src 10.24.1.0/24 dst 10.24.2.0/24
    -     dir out priority 0
    +     dir out priority 50
          mark 0x3e00/0xff00
          tmpl src 10.24.1.77 dst 10.24.2.207
              proto esp spi 0x00000003 reqid 1 mode tunnel

Fixes: 688dc9a ("ipsec: Remove stale XFRM states and policies")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
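The resulting priority ladder can be sketched as follows (constant names are illustrative, not Cilium's; the kernel applies the matching policy with the lowest numeric priority value):

```python
# Illustrative model of the XFRM OUT policy priorities described above.
# 0 is the highest possible priority in the kernel's XFRM subsystem.
PRIO_NEW_OUT = 0         # new per-node XFRM OUT policies
PRIO_OLD_OUT = 50        # deprioritized pre-upgrade OUT policies
PRIO_DEFAULT_DROP = 100  # catch-all default-drop policy

def applied_policy(matching_policies):
    """Of all policies matching a packet, the kernel applies the one
    with the lowest priority value (i.e., the highest priority)."""
    return min(matching_policies, key=lambda p: p["priority"])
```

So while both old and new OUT policies are installed, the new one wins; if only the old one matches, traffic is still encrypted rather than hitting the default-drop policy.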
[ upstream commit 0af2303 ]

This commit reverts the bpf_host changes of commit 4c7cce1 ("bpf:
Remove IP_POOLS IPsec code").

The IP_POOLS IPsec code was a hack to avoid having one XFRM OUT policy
and state per remote node. Instead, we had a single XFRM OUT policy and
state that would encrypt traffic as usual, but encapsulate it with
placeholder IP addresses, such as 0.0.0.0 -> 192.168.0.0. Those outer IP
addresses would then be rewritten to the proper IPs in bpf_host. To that
end, bpf_lxc would pass the destination IP address, the tunnel endpoint,
to bpf_host via a skb->cb slot. The source IP address was hardcoded in
the object file.

Commit 4c7cce1 ("bpf: Remove IP_POOLS IPsec code") thus got rid of
that hack to instead have per-node XFRM OUT policies and states. The
kernel therefore directly writes the proper outer IP addresses.

Unfortunately, the transition from one implementation to the other isn't
so simple. If we simply remove the old IP_POOLS code as done in commit
4c7cce1, then we will have drops on upgrade. We have two cases,
depending on which of bpf_lxc or bpf_host is reloaded first:

1. If bpf_host is reloaded before the new bpf_lxc is loaded, then it
   won't rewrite the outer IP addresses anymore. In that case, we end up
   with packets of the form 0.0.0.0 -> 192.168.0.0 leaving on the wire.
   Obviously, they don't go far and end up dropped.
2. If bpf_lxc is reloaded before the new bpf_host, then it will reuse
   skb->cb for something else and the XFRM layer will handle the outer
   IP addresses. But because bpf_host is still on the old
   implementation, it will try to use skb->cb to rewrite the outer IP
   addresses. We thus end up with gibberish outer destination IP
   addresses.

One way to fix this is to have bpf_host support both implementations.
This is what this commit does. The logic to rewrite the outer IP
addresses is reintroduced in bpf_host, but it is only executed if the
outer source IP address is 0.0.0.0. That way, we will only rewrite the
outer IP addresses if bpf_lxc is on the old implementation and the XFRM
layer didn't write the proper outer IPs.

Fixes: 4c7cce1 ("bpf: Remove IP_POOLS IPsec code")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit ca9c056 ]

As explained in the previous commit, we need to switch our IPsec
logic from one implementation to another. This implementation requires
some synchronized work between bpf_lxc and bpf_host. To enable this
switch without causing drops, the previous commit made bpf_host support
both implementations.

This isn't quite enough, though. For this to work, we need to ensure
that bpf_host is always reloaded before any bpf_lxc is loaded. That is,
we need to load the bpf_host program that supports both implementations
before we actually start the switch from one implementation to the
other.

This commit makes that change in the order of BPF program reloads.
Instead of regenerating the bpf_host program (i.e., the host endpoint's
datapath) in a goroutine like other BPF programs, we will regenerate it
first, as a blocking operation.

Regenerating the host endpoint's datapath separately like this will
delay the agent startup. This regeneration was measured to take around 1
second on an EKS cluster (though it can probably grow to a few seconds
depending on the node type and current load). That should stay fairly
small compared to the overall duration of the agent startup (around 30
seconds). Nevertheless, this separate regeneration is only performed
when we actually need it: for IPsec with the EKS or AKS IPAM modes.

Fixes: 4c7cce1 ("bpf: Remove IP_POOLS IPsec code")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit c0d9b8c ]

Commit 73c36d4 ("ipsec: Match OUT XFRM states & policies using node
IDs") changed our XFRM states to match on packet marks of the form
0xXXXXYe00/0xffffff00 where XXXX is the node ID and Y is the SPI. The
previous format for the packet mark in XFRM states was 0xYe00/0xff00.

According to the Linux kernel these two states conflict (because
0xXXXXYe00/0xffffff00 ∈ 0xYe00/0xff00). That means we can't add the new
state while the old one is around.

Thus, in commit ddd491b ("ipsec: Custom check for XFRM state
existence"), we removed any old conflicting XFRM state before adding
the new ones. That however causes packet drops on upgrades because we
may remove the old XFRM state before bpf_lxc has been updated to use the
new 0xXXXXYe00/0xffffff00 mark. Instead, we would need both XFRM state
formats to coexist for the duration of the upgrade.

Impossible, you say! Don't despair. Things are actually a bit more
complicated (it's IPsec and Linux after all).

While Linux doesn't allow us to add 0xXXXXYe00/0xffffff00 when
0xYe00/0xff00 exists, it does allow adding in the reverse order. That
seems to be because 0xXXXXYe00/0xffffff00 ∈ 0xYe00/0xff00 but
0xYe00/0xff00 ∉ 0xXXXXYe00/0xffffff00 [1].

Therefore, to have both XFRM states coexist, we can remove the old
state, add the new one, then re-add the old state. That is allowed
because we never try to add the new state when the old is present.

During the short period of time when we have removed the old XFRM
state, we can have packet drops due to the missing state. These drops
should be limited to the specific node pair this XFRM state is
handling. This will also only happen on upgrades. Finally, this
shouldn't happen with ENI and Azure IPAM modes because they don't have
such old conflicting states.

I tested this upgrade path on a 20-nodes GKE cluster running our
drop-sensitive application, migrate-svc, scaled up to 50 clients and 30
backends. I didn't get a single packet drop despite the application
consistently sending packets back and forth between nodes. Thus, I think
the window for drops to happen is really small.

Diff before/after the upgrade (v1.13.0 -> this patch, GKE):

      src 10.24.1.77 dst 10.24.2.207
          proto esp spi 0x00000003 reqid 1 mode tunnel
          replay-window 0
          mark 0x3e00/0xff00 output-mark 0xe00/0xf00
          aead rfc4106(gcm(aes)) 0xfc2d0c4e646b87ff2d0801b57997e3598eab0d6b 128
    -     anti-replay context: seq 0x0, oseq 0x2c, bitmap 0x00000000
    +     anti-replay context: seq 0x0, oseq 0x16, bitmap 0x00000000
          sel src 0.0.0.0/0 dst 0.0.0.0/0
    + src 10.24.1.77 dst 10.24.2.207
    +     proto esp spi 0x00000003 reqid 1 mode tunnel
    +     replay-window 0
    +     mark 0x713f3e00/0xffffff00 output-mark 0xe00/0xf00
    +     aead rfc4106(gcm(aes)) 0xfc2d0c4e646b87ff2d0801b57997e3598eab0d6b 128
    +     anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    +     sel src 0.0.0.0/0 dst 0.0.0.0/0

We can notice that the counters for the existing XFRM state also
changed (decreased). That's expected since the state got recreated.

1 - I think this is because XFRM states don't have priorities. So when
    two XFRM states would match a given packet (in our case a packet
    with mark XXXXYe00), the oldest XFRM state is taken. Thus, by not
    allowing a more specific match to be added after a more generic
    one, the kernel ensures that the more specific match is always
    taken when both match a packet. That likely corresponds to user
    expectations.
    That is, if both 0xXXXXYe00/0xffffff00 and 0xYe00/0xff00 match a
    packet, we would probably expect 0xXXXXYe00/0xffffff00 to be used.
Fixes: ddd491b ("ipsec: Custom check for XFRM state existence")
Fixes: 73c36d4 ("ipsec: Match OUT XFRM states & policies using node IDs")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
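The mark arithmetic in this commit message can be checked with a short sketch (the function names are invented; the constants follow the mark formats quoted above, and the example values come from the diff):

```python
OLD_MASK = 0x0000ff00   # old state format: mark 0xYe00/0xff00
NEW_MASK = 0xffffff00   # new format: 0xXXXXYe00/0xffffff00

def new_state_mark(node_id, spi):
    """Node ID in bits 16-31, SPI nibble in bits 12-15, 0xe00 below."""
    return (node_id << 16) | (spi << 12) | 0xe00

def state_matches(pkt_mark, state_mark, state_mask):
    """A state matches a packet if the masked packet mark equals the
    state's mark, as the kernel's XFRM lookup does."""
    return (pkt_mark & state_mask) == state_mark
```

For node ID 0x713f and SPI 3 this yields mark 0x713f3e00, matching the new state in the diff; the old state (0x3e00/0xff00) still matches that packet mark, while the reverse does not hold, which is the containment relation the commit relies on.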
[ upstream commit 4704655 ]

Now that the code is reloading the bpf_network program at runtime we
should not fatal if we fail to reload the program since this may be caused
by ongoing interface changes (e.g. interface was being removed). Change
the log.Fatal into log.Error and keep loading to other interfaces.

Fixes: bf0940b ("loader: Reinitialize IPsec on device changes on ENI")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 592777d ]

reloadIPSecOnLinkChanges() did not ignore veth device updates causing
reload to be triggered when new endpoints were created. Ignore any
updates with "veth" as device type.

The draining of updates during settle wait was broken due to unintentional
breaking out of the loop. Removed the break.

Fixes: bf0940b ("loader: Reinitialize IPsec on device changes on ENI")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit ac54f29 ]

We recently changed our XFRM configuration to have one XFRM OUT policy
per remote node, regardless of the IPAM mode being used. In doing so, we
also moved the XFRM FWD policy to be installed once per remote node.

With ENI and Azure IPAM modes, this wouldn't cause any issue because the
XFRM FWD policy is the same regardless of the remote node. On other IPAM
modes, however, the XFRM FWD policy is for some reason different
depending on the remote node that triggered the installation. As a
result, for those IPAM modes, one FWD policy is installed per remote
node. And the deletion logic triggered on node deletions wasn't updated
to take that into account. We thus have a leak of XFRM FWD policies.

In the end, our FWD policy just needs to allow everything through
without encrypting it. It doesn't need to be specific to any remote
node. We can simply completely wildcard the match, to look like:

    src 0.0.0.0/0 dst 0.0.0.0/0
        dir fwd priority 2975 ptype main
        tmpl src 0.0.0.0 dst 192.168.134.181
            proto esp reqid 1 mode tunnel
            level use

So we match all packets regardless of source and destination IPs. We
don't match on the packet mark.

There's a small implementation hurdle here. Because we used to install
FWD policies of the form "src 0.0.0.0/0 dst 10.0.1.0/24", the kernel was
able to deduce which IP family we are matching against and would adapt
the 0.0.0.0/0 source CIDR to ::/0 as needed. Now that we are matching on
0/0 for both CIDRs, it cannot deduce this anymore. So instead, we must
detect the IP family ourselves and use the proper CIDRs.

In addition to changing the XFRM FWD policy to the above, we can also
stop installing it once per remote node. It's enough to install it when
we receive the event for the local node, once.

Fixes: 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
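Picking the family-appropriate wildcard selector could look like this sketch (using Python's `ipaddress` for family detection; the helper name is invented and not Cilium's API):

```python
import ipaddress

def fwd_wildcard(local_ip):
    """Return the all-zero src/dst CIDRs for the FWD policy selector,
    in the same IP family as the local tunnel endpoint, since the
    kernel can no longer infer the family from a 0/0-only selector."""
    if ipaddress.ip_address(local_ip).version == 4:
        return ("0.0.0.0/0", "0.0.0.0/0")
    return ("::/0", "::/0")
```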
[ upstream commit d0ab559 ]

We use this wildcard IPv6 CIDR in the catch-all default-drop OUT policy
as well as in the FWD policy. It was incorrectly set to ::/128 instead
of ::/0 and would therefore not match anything instead of matching
everything. This commit fixes it.

Fixes: e802c29 ("ipsec: Refactor wildcard IP variables")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
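The difference between the two prefix lengths is easy to demonstrate (a sketch with Python's `ipaddress`; the sample address is arbitrary):

```python
import ipaddress

# ::/128 matches only the single unspecified address ::, whereas
# ::/0 matches every IPv6 address -- the intended wildcard.
wrong = ipaddress.ip_network("::/128")
right = ipaddress.ip_network("::/0")

sample = ipaddress.ip_address("fd00::1234")
```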
[ upstream commit 25064d1 ]

After applying a backport of 9cc8a89 ("ipsec: Fix leak of XFRM
policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were
getting occasional spikes of "no inbound state" xfrm errors
(XfrmInNoStates). These lead to packet loss and brief outages for
applications sending traffic to the node on which the spikes occur.

I noticed that the "No node ID found for node." logline would appear at
the time of these spikes and from the code this is logged when the node
ID cannot be resolved. Looking a bit further the call to
`DeleteIPsecEndpoint` will end up deleting the xfrm state for any state
that matches the node id as derived from the mark in the state.

The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP
has a mark of `0xd00` which when shifted >> 16 in
`getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets
deleted and the kernel drops all the inbound traffic as it no longer
matches a state.

This commit updates that logic to skip the XFRM state and policy
deletion when the node ID is zero.

Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs")
Signed-off-by: Steven Johnson <sjdot@protonmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
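The guard described in this commit can be sketched as follows (function names are illustrative; `0xd00` is the IN-state mark quoted above):

```python
def node_id_from_mark(mark):
    """The node ID is encoded in the upper 16 bits of the XFRM mark."""
    return (mark & 0xffff0000) >> 16

def should_delete(state_mark, deleted_node_id):
    """Skip cleanup entirely when the node ID is zero, so global
    states such as the 0.0.0.0/0 -> node IP IN state (mark 0xd00)
    are never torn down by mistake."""
    if deleted_node_id == 0:
        return False
    return node_id_from_mark(state_mark) == deleted_node_id
```

Without the zero check, an unresolvable node ID (0) matches the IN state's mark `0xd00 >> 16 == 0` and deletes it, which is exactly the XfrmInNoStates outage described above.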
[ upstream commit 57eac9d ]

With commit 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and
Azure IPAMs") we rely on the node ID to find XFRM states and policies
that belong to remote nodes, to clean them up when remote nodes are
deleted.

This commit makes sure that we only do this for XFRM states and policies
that actually match on these node IDs. That is only the case if the
mark mask matches on the node ID bits. It should thus look like
0xffffff00 (matches on node ID, SPI, and encryption bit) or 0xffff0f00
(matches on node ID and encryption bit).

Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
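A sketch of that mask check (the two masks come straight from the text above; the function name is invented):

```python
# Only masks whose upper 16 bits are fully set actually match on the
# node ID bits; only those states/policies are safe to clean up by
# node ID.
NODE_ID_MASKS = {0xffffff00, 0xffff0f00}

def cleanup_by_node_id(mark_mask):
    """True if a state/policy with this mark mask may be deleted
    based on the node ID encoded in its mark."""
    return mark_mask in NODE_ID_MASKS
```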
[ upstream commit 1e1e2f7 ]

Commit 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS &
AKS") changed the XFRM config to have one state and policy per remote
node in IPAM modes ENI and Azure. The IPsec cleanup logic was therefore
also updated to call deleteIPsec() whenever a remote node is deleted.

However, we missed that the cleanup logic also tries to remove the
per-node IP route. In case of IPAM modes ENI and Azure, the IP route
however stays as before: we have a single route for all remote nodes. We
therefore don't have anything to cleanup.

Because of this unnecessary IP route cleanup attempt, an error message
was printed for every remote node deletion:

    Unable to delete the IPsec route OUT from the host routing table

This commit fixes it to avoid attempting this unnecessary cleanup.

Fixes: 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@pchaigno pchaigno force-pushed the pr/v1.11-backport-2023-06-08-ipsec branch from d52c308 to a331e32 on June 13, 2023 12:16
On kernels before 4.19, the XFRM output mark is not fully supported.
Thus, when comparing XFRM states, if we compare the output marks, the
existing states will never match the new state. The new state will have
an output mark, but the states installed in the kernel don't have it
(because the kernel ignored it).

As a result, our current logic will assume that the state we want to
install doesn't already exist, it will try to install it, fail because
it already exists, assume there's a conflicting state, throw an error,
remove the conflicting state, and install the new (but identical) one.

The end result is therefore the same: the new state is in place in the
kernel. But on the way to installing it, we will emit an unnecessary
error and temporarily remove the state (potentially causing packet
drops).

Instead, we can safely ignore the output-marks when comparing states. We
don't expect any states with same IPs, SPI, and marks, but different
output-marks anyway. The only way this could happen is if someone
manually added such a state. Even if they did, the only impact would be
that we wouldn't overwrite the manually-added state with the different
output-mark.

This patch is only necessary on v1.12 and earlier versions of Cilium
because v1.13 dropped support for Linux <4.19.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
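Ignoring the output-mark when comparing states can be sketched as a toy comparison over dicts (field names are illustrative, not the agent's actual types):

```python
def states_equal(a, b):
    """Compare two XFRM states, deliberately ignoring output_mark:
    pre-4.19 kernels drop the output-mark on installed states, so
    including it would make existing states never match new ones."""
    keys = ("src", "dst", "spi", "mark")  # no "output_mark" here
    return all(a.get(k) == b.get(k) for k in keys)
```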
@pchaigno
Member

pchaigno commented Jun 13, 2023

/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed:


Test Name

K8sDatapathConfig Host firewall With VXLAN and endpoint routes

Failure Output

FAIL: Failed to reach 192.168.56.11:80 from testserver-host-2nx98

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.22-kernel-4.19/65/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.18-kernel-4.9' failed:


Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Unable to download helm chart v1.10 from GitHub

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.18-kernel-4.9/61/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.18-kernel-4.9 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@gandro
Member

gandro commented Jun 13, 2023

  • test-1.18-4.9 failed extracting the Helm chart (FAIL: Unable to download helm chart v1.10 from GitHub), which is an unrelated failure (see above for link); restarting
  • test-1.22-4.19 hit CI: Host firewall: unable to upgrade connection #15455, which has been observed on a previous v1.11 backport PR too
  • ConformanceKind1.19 hit an issue with 1.1.1.1 which is happening on other PRs too and is unrelated.

Restarting two Jenkins pipelines.

@gandro
Member

gandro commented Jun 13, 2023

/test-1.18-4.9

@gandro
Member

gandro commented Jun 13, 2023

/test-1.22-4.19

@pchaigno
Member

I reviewed the last three backported PRs from me and also ran manual tests to cover the last version of this backport PR.

@gandro gandro removed the request for review from pchaigno June 13, 2023 18:34
@gandro
Member

gandro commented Jun 13, 2023

All CI except ConformanceKind1.19 has passed. The failures in ConformanceKind1.19 are due to 1.1.1.1 rejecting our traffic and unrelated to this PR (as it's happening on other PRs and branches too). All other CI and manual testing give a green signal, so let's merge.

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 13, 2023
@gandro gandro merged commit c3ffd99 into cilium:v1.11 Jun 13, 2023
52 of 53 checks passed