Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: fix same insn cannot be used with different pointers in proxy.h #32169

Closed

Conversation

mhofstetter
Copy link
Member

@mhofstetter mhofstetter commented Apr 24, 2024

With the following helm config enabled, Cilium Agent fails to load BPF programs on main.

The error only occurs if verbose datapath logging is enabled.

Tested on a local kind cluster

make kind
make kind-debug

helm values (in addition to default ones)

debug:
  enabled: true
  verbose: "datapath"
bpf:
  tproxy: true

error:

Verifier error: program cil_to_host: load program: invalid argument: same insn cannot be used with different pointers (356 line(s) omitted)
Verifier log: load program: invalid argument:
...
; sk = sk_lookup_udp(ctx, tuple, len, BPF_F_CURRENT_NETNS, 0);
829: (bf) r1 = r6
830: (b4) w3 = 36
831: (b7) r4 = -1
832: (b7) r5 = 0
833: (85) call bpf_sk_lookup_udp#85
last_idx 833 first_idx 827
regs=8 stack=0 before 832: (b7) r5 = 0
regs=8 stack=0 before 831: (b7) r4 = -1
regs=8 stack=0 before 830: (b4) w3 = 36
834: (b4) w7 = -178
; if (!sk)
835: (15) if r0 == 0x0 goto pc+38
 R0_w=sock(id=0,ref_obj_id=29,off=0,imm=0) R6=ctx(id=0,off=0,imm=0) R7_w=inv4294967118 R8=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv(id=22,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-16=???mmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm fp-72=????mmmm fp-80=mmmmmmmm fp-88=mmmmmmmm refs=29
;
836: (61) r1 = *(u32 *)(r6 +16)
837: (63) *(u32 *)(r10 -104) = r1
838: (61) r7 = *(u32 *)(r0 +4)
same insn cannot be used with different pointers
processed 1261 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 67 mark_read 19

This breaks datapath connectivitiy in the cluster (e.g. coredns crash-loops and is unable to connect to k8s api server)

This commit fixes this issue by extracting a variable family.

Related to: #31418 (Bisected it down to this PR (e4316b9d044ec2fe437ffc1b1094a4cb4c5a20d6 doesn't work - af9bfba4e8fa4c80bd75e4d46f667ad757e6b073 works)

Note: I'm not sure whether this is THE right and only fix or whether the related PR introduced some other issues that might arise for other configs.

@mhofstetter mhofstetter added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. release-note/misc This PR makes changes that have no direct user impact. labels Apr 24, 2024
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/tproxy-debug-insn branch from ece9e4d to cec8aec Compare April 24, 2024 16:14
Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm not sure whether this is THE right and only fix or whether the related PR introduced some other issues that might arise for other configs.

If it works, it works :). We have tests for various configs. Apparently yours was not covered. Could you add the needed coverage in your PR? If we have all known configs covered, we don't need to worry about one fix breaking it elsewhere.

bpf/lib/proxy.h Outdated Show resolved Hide resolved
With the following helm config enabled, Cilium Agent fails to load BPF programs.

The error only occurs if verbose datapath logging is enabled.

```
debug:
  enabled: true
  verbose: "datapath"
bpf:
  tproxy: true
```

error:

```
; sk = sk_lookup_udp(ctx, tuple, len, BPF_F_CURRENT_NETNS, 0);
829: (bf) r1 = r6
830: (b4) w3 = 36
831: (b7) r4 = -1
832: (b7) r5 = 0
833: (85) call bpf_sk_lookup_udp#85
last_idx 833 first_idx 827
regs=8 stack=0 before 832: (b7) r5 = 0
regs=8 stack=0 before 831: (b7) r4 = -1
regs=8 stack=0 before 830: (b4) w3 = 36
834: (b4) w7 = -178
; if (!sk)
835: (15) if r0 == 0x0 goto pc+38
 R0_w=sock(id=0,ref_obj_id=29,off=0,imm=0) R6=ctx(id=0,off=0,imm=0) R7_w=inv4294967118 R8=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv(id=22,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-16=???mmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm fp-72=????mmmm fp-80=mmmmmmmm fp-88=mmmmmmmm refs=29
;
836: (61) r1 = *(u32 *)(r6 +16)
837: (63) *(u32 *)(r10 -104) = r1
838: (61) r7 = *(u32 *)(r0 +4)
same insn cannot be used with different pointers
processed 1261 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 67 mark_read 19
```

This commit fixes this issue by extracting a variable `family`.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
@gentoo-root
Copy link
Contributor

/test

gentoo-root added a commit that referenced this pull request Apr 24, 2024
Run the verifier tests on the debug configurations, because it may
expose verifier errors.

Ref: #32169
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
gentoo-root added a commit that referenced this pull request Apr 25, 2024
Run the verifier tests on the debug configurations, because it may
expose verifier errors.

Ref: #32169
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
gentoo-root added a commit that referenced this pull request May 3, 2024
When DEBUG is enabled, Clang 17 optimizes the final part of
assign_socket_tcp and assign_socket_udp by keeping only one instance of
the code and jumping to it from two places. It means that the same
struct bpf_sock *sk pointer can point to the return value of either
skc_lookup_tcp or sk_lookup_udp. Unfortunately, for the verifier these
pointer types are different (PTR_TO_SOCK_COMMON and PTR_TO_SOCKET), and
it doesn't allow such pattern with the following error:

; sk = sk_lookup_udp(ctx, tuple, len, BPF_F_CURRENT_NETNS, 0);
829: (bf) r1 = r6
830: (b4) w3 = 36
831: (b7) r4 = -1
832: (b7) r5 = 0
833: (85) call bpf_sk_lookup_udp#85
last_idx 833 first_idx 827
regs=8 stack=0 before 832: (b7) r5 = 0
regs=8 stack=0 before 831: (b7) r4 = -1
regs=8 stack=0 before 830: (b4) w3 = 36
834: (b4) w7 = -178
; if (!sk)
835: (15) if r0 == 0x0 goto pc+38
 R0_w=sock(id=0,ref_obj_id=29,off=0,imm=0) R6=ctx(id=0,off=0,imm=0) R7_w=inv4294967118 R8=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv(id=22,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-16=???mmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm fp-72=????mmmm fp-80=mmmmmmmm fp-88=mmmmmmmm refs=29
;
836: (61) r1 = *(u32 *)(r6 +16)
837: (63) *(u32 *)(r10 -104) = r1
838: (61) r7 = *(u32 *)(r0 +4)
same insn cannot be used with different pointers
processed 1261 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 67 mark_read 19

Work around it by adding READ_ONCE to the problematic dereference.
Although it doesn't guarantee that the same problem won't happen, it
helps in this case.

Ref: #32169
Fixes: 673d481 ("images: Update LLVM to 17.0.6")
Reported-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Co-developed-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
gentoo-root added a commit that referenced this pull request May 3, 2024
When DEBUG is enabled, Clang 17 optimizes the final part of
assign_socket_tcp and assign_socket_udp by keeping only one instance of
the code and jumping to it from two places. It means that the same
struct bpf_sock *sk pointer can point to the return value of either
skc_lookup_tcp or sk_lookup_udp. Unfortunately, for the verifier these
pointer types are different (PTR_TO_SOCK_COMMON and PTR_TO_SOCKET), and
it doesn't allow such pattern with the following error:

; sk = sk_lookup_udp(ctx, tuple, len, BPF_F_CURRENT_NETNS, 0);
829: (bf) r1 = r6
830: (b4) w3 = 36
831: (b7) r4 = -1
832: (b7) r5 = 0
833: (85) call bpf_sk_lookup_udp#85
last_idx 833 first_idx 827
regs=8 stack=0 before 832: (b7) r5 = 0
regs=8 stack=0 before 831: (b7) r4 = -1
regs=8 stack=0 before 830: (b4) w3 = 36
834: (b4) w7 = -178
; if (!sk)
835: (15) if r0 == 0x0 goto pc+38
 R0_w=sock(id=0,ref_obj_id=29,off=0,imm=0) R6=ctx(id=0,off=0,imm=0) R7_w=inv4294967118 R8=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv(id=22,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-16=???mmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm fp-72=????mmmm fp-80=mmmmmmmm fp-88=mmmmmmmm refs=29
;
836: (61) r1 = *(u32 *)(r6 +16)
837: (63) *(u32 *)(r10 -104) = r1
838: (61) r7 = *(u32 *)(r0 +4)
same insn cannot be used with different pointers
processed 1261 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 67 mark_read 19

Work around it by adding READ_ONCE to the problematic dereference.
Although it doesn't guarantee that the same problem won't happen, it
helps in this case.

Ref: #32169
Fixes: 673d481 ("images: Update LLVM to 17.0.6")
Reported-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Co-developed-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
gentoo-root added a commit that referenced this pull request May 3, 2024
Run the verifier tests on the debug configurations, because it may
expose verifier errors.

Ref: #32169
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
gentoo-root added a commit that referenced this pull request May 3, 2024
When DEBUG is enabled, Clang 17 optimizes the final part of
assign_socket_tcp and assign_socket_udp by keeping only one instance of
the code and jumping to it from two places. It means that the same
struct bpf_sock *sk pointer can point to the return value of either
skc_lookup_tcp or sk_lookup_udp. Unfortunately, for the verifier these
pointer types are different (PTR_TO_SOCK_COMMON and PTR_TO_SOCKET), and
it doesn't allow such pattern with the following error:

; sk = sk_lookup_udp(ctx, tuple, len, BPF_F_CURRENT_NETNS, 0);
829: (bf) r1 = r6
830: (b4) w3 = 36
831: (b7) r4 = -1
832: (b7) r5 = 0
833: (85) call bpf_sk_lookup_udp#85
last_idx 833 first_idx 827
regs=8 stack=0 before 832: (b7) r5 = 0
regs=8 stack=0 before 831: (b7) r4 = -1
regs=8 stack=0 before 830: (b4) w3 = 36
834: (b4) w7 = -178
; if (!sk)
835: (15) if r0 == 0x0 goto pc+38
 R0_w=sock(id=0,ref_obj_id=29,off=0,imm=0) R6=ctx(id=0,off=0,imm=0) R7_w=inv4294967118 R8=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv(id=22,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-16=???mmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm fp-72=????mmmm fp-80=mmmmmmmm fp-88=mmmmmmmm refs=29
;
836: (61) r1 = *(u32 *)(r6 +16)
837: (63) *(u32 *)(r10 -104) = r1
838: (61) r7 = *(u32 *)(r0 +4)
same insn cannot be used with different pointers
processed 1261 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 67 mark_read 19

Work around it by adding READ_ONCE to the problematic dereference.
Although it doesn't guarantee that the same problem won't happen, it
helps in this case.

Ref: #32169
Fixes: 673d481 ("images: Update LLVM to 17.0.6")
Reported-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Co-developed-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
gentoo-root added a commit that referenced this pull request May 3, 2024
Run the verifier tests on the debug configurations, because it may
expose verifier errors.

Ref: #32169
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
@mhofstetter
Copy link
Member Author

Closing in favor of #32170

@mhofstetter mhofstetter closed this May 6, 2024
@mhofstetter mhofstetter deleted the pr/mhofstetter/tproxy-debug-insn branch May 6, 2024 08:32
github-merge-queue bot pushed a commit that referenced this pull request May 7, 2024
When DEBUG is enabled, Clang 17 optimizes the final part of
assign_socket_tcp and assign_socket_udp by keeping only one instance of
the code and jumping to it from two places. It means that the same
struct bpf_sock *sk pointer can point to the return value of either
skc_lookup_tcp or sk_lookup_udp. Unfortunately, for the verifier these
pointer types are different (PTR_TO_SOCK_COMMON and PTR_TO_SOCKET), and
it doesn't allow such pattern with the following error:

; sk = sk_lookup_udp(ctx, tuple, len, BPF_F_CURRENT_NETNS, 0);
829: (bf) r1 = r6
830: (b4) w3 = 36
831: (b7) r4 = -1
832: (b7) r5 = 0
833: (85) call bpf_sk_lookup_udp#85
last_idx 833 first_idx 827
regs=8 stack=0 before 832: (b7) r5 = 0
regs=8 stack=0 before 831: (b7) r4 = -1
regs=8 stack=0 before 830: (b4) w3 = 36
834: (b4) w7 = -178
; if (!sk)
835: (15) if r0 == 0x0 goto pc+38
 R0_w=sock(id=0,ref_obj_id=29,off=0,imm=0) R6=ctx(id=0,off=0,imm=0) R7_w=inv4294967118 R8=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv(id=22,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-16=???mmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm fp-72=????mmmm fp-80=mmmmmmmm fp-88=mmmmmmmm refs=29
;
836: (61) r1 = *(u32 *)(r6 +16)
837: (63) *(u32 *)(r10 -104) = r1
838: (61) r7 = *(u32 *)(r0 +4)
same insn cannot be used with different pointers
processed 1261 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 67 mark_read 19

Work around it by adding READ_ONCE to the problematic dereference.
Although it doesn't guarantee that the same problem won't happen, it
helps in this case.

Ref: #32169
Fixes: 673d481 ("images: Update LLVM to 17.0.6")
Reported-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Co-developed-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request May 7, 2024
Run the verifier tests on the debug configurations, because it may
expose verifier errors.

Ref: #32169
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants