aya: fix tc name limit #728

pooladkhay · 2023-08-08T22:47:47Z

The buffer for attributes should be sized to hold at least 256 bytes, based on CLS_BPF_NAME_LEN = 256 from the kernel:
https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28

This pull request also includes integration tests with an eBPF program of type classifier with a 256-byte-long name.

Fixes: #610

netlify · 2023-08-08T22:47:52Z

✅ Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`9606690`
🔍 Latest deploy log	https://app.netlify.com/sites/aya-rs-docs/deploys/656a23481a434f00080c2ef5
😎 Deploy Preview	https://deploy-preview-728--aya-rs-docs.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

tamird

Please squash these commits.

tamird · 2023-08-08T22:55:04Z

test/integration-ebpf/src/tc_name_limit.rs

+use aya_bpf::{macros::classifier, programs::TcContext};
+
+// This macro generates a function with arbitrary name
+macro_rules! generate_ebpf_function {


why is this macro needed? is this something forward-looking?

Since the name of that function is the name of the ebpf program and we are trying to use a 256-byte-long name, the macro is just a way to avoid writing this:

#[classifier] pub fn a..[256 times]..a(_ctx: TcContext) -> i32 { 0 }

I wonder if there is a way to dynamically generate that name inside the macro in this #![no_std] #![no_main] environment, like what we would do in userspace side: "a".repeat(256)

But you still write it =/

I just found it cleaner to pass 256 'a's to a macro than writing a function with that name directly. If that's an issue, I can remove it.

@tamird Macro was replaced by just a function.

tamird · 2023-08-08T22:56:13Z

aya/src/sys/netlink.rs

+    // The buffer for attributes should be sized to hold at least 256 bytes,
+    // based on `CLS_BPF_NAME_LEN = 256` from the kernel:
+    // https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
+    attrs: [u8; 512],


at least 256, but then why 512?

Because we currently use around ~30 bytes of attributes in addition to name.
Rather than picking a "right sized buffer" for the payload (which is of varying length anyway) we use the next largest power of 2.
I've got netlink changes planned soon, so will look at making the buffer size variable then.

Can the comment say that please?

Sure, will update the comment.

@tamird Updated the comment.

tamird · 2023-08-08T22:56:39Z

test/integration-test/src/tests/load.rs

@@ -13,6 +15,27 @@ use aya::{
 const MAX_RETRIES: usize = 100;
 const RETRY_DURATION: time::Duration = time::Duration::from_millis(10);

+#[test]
+fn tc_name_limit() {
+    let _ = qdisc_add_clsact("lo");


why is it ok to drop this value?

We need the clsact qdisc attached to the interface before we can proceed, Actually a better approach would be a function that checks that so we can call qdisc_add_clsact("lo") only when that interface doesn't have that qdisc.
I can add that and update the code...

Is there a different type of probe we can use that doesn't have this requirement?

@tamird I have added a function to check if clsact qdisc exists before trying to actually add it.
Now the pipeline is failing due to the change to the aya's public api.
I'd really appreciate it if you review the code and help me on that.

Is there a different type of probe we can use that doesn't have this requirement?

By requirement, Do you mean the need to call qdisc_add_clsact("lo")?
If yes, I don't think so.
As far as I know we can either use ingress or clsact if attaching to ingress path and to attach an ebpf program to egress path, the only option is to use clsact:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f211a1b929c804100e138c5d3d656992cfd5622

tamird · 2023-08-08T22:57:33Z

test/integration-ebpf/src/tc_name_limit.rs

+256 is the maximum length allowed by the kernel:
+https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
+*/
+generate_ebpf_function!(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa);


what happens if the program name is 257 bytes long?

I don't remember the exact error right now but the kernel rejects that ebpf program and returns an error.
I can send the exact error in a few hours.

@tamird Sorry for the delay
This is the error you get when the name is more that 256 bytes:

Error: netlink error while attaching ebpf program to tc Caused by: Invalid argument (os error 22)

should we test that as well?

That's a good idea, Macro is probably more useful now.

@tamird An integration test was added to make sure programs with a name longer than 256 bytes will fail to attach.

dave-tucker · 2023-08-09T09:50:01Z

I'm happy with the change to up the buffer to the next nearest power of 2.
Tamir's comments need addressing before merge.

tamird · 2023-08-13T13:01:02Z

test/integration-test/src/tests/load.rs

@@ -13,6 +15,27 @@ use aya::{
 const MAX_RETRIES: usize = 100;
 const RETRY_DURATION: time::Duration = time::Duration::from_millis(10);

+#[test]
+fn tc_name_limit() {
+    let _ = qdisc_add_clsact("lo");


Is there a different type of probe we can use that doesn't have this requirement?

tamird · 2023-08-13T13:01:05Z

test/integration-ebpf/src/tc_name_limit.rs

+256 is the maximum length allowed by the kernel:
+https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
+*/
+generate_ebpf_function!(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa);


should we test that as well?

tamird · 2023-08-23T16:43:44Z

aya/src/sys/netlink.rs

+    collections::HashMap,
+    ffi::CStr,
+    io,
+    mem::{self},


{self} is redundant

tamird · 2023-08-23T16:44:08Z

aya/src/sys/netlink.rs

+    // The buffer for attributes should be sized to hold at least 256 bytes,
+    // based on `CLS_BPF_NAME_LEN = 256` from the kernel:
+    // https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
+    attrs: [u8; 512],


Can the comment say that please?

tamird · 2023-08-23T16:44:31Z

test/integration-ebpf/src/tc_name_limit.rs

+use aya_bpf::{macros::classifier, programs::TcContext};
+
+// This macro generates a function with arbitrary name
+macro_rules! generate_ebpf_function {


But you still write it =/

tamird · 2023-08-23T16:45:23Z

test/integration-test/src/tests/load.rs

+
+    let mut bpf = Bpf::load(crate::TC_NAME_LIMIT_TEST).unwrap();
+
+    // This magic number (256) is derived from the fact that the kernel


this comment is somewhat misplaced; this 256 is just defined by the fact that this is the name used in the probe code.

The idea is to ensure that Aya does not impose a limit and allows names up to the size allowed by the kernel.
I'll add this sentence as well to be more specific about the goal.

mergify · 2023-09-14T23:53:02Z

@pooladkhay, this pull request is now in conflict and requires a rebase.

alessandrod · 2023-11-19T03:45:50Z

@pooladkhay ping?

pooladkhay · 2023-11-19T21:54:15Z

@pooladkhay ping?

@alessandrod pong! 😃

Life hasn't been easy lately, I should have notified earlier.
I'll get back to this asap, please let it be open...

alessandrod · 2023-11-23T20:20:46Z

@pooladkhay ping?

@alessandrod pong! 😃

Life hasn't been easy lately, I should have notified earlier. I'll get back to this asap, please let it be open...

hugs ❤️ and sorry if I made you feel rushed it wasn't my intention

pooladkhay · 2023-11-24T23:40:26Z

@pooladkhay ping?

@alessandrod pong! 😃
Life hasn't been easy lately, I should have notified earlier. I'll get back to this asap, please let it be open...

hugs ❤️ and sorry if I made you feel rushed it wasn't my intention

No worries at all 🙋‍♂️

mergify · 2023-11-24T23:47:00Z

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

tamird

Reviewed 1 of 5 files at r3, 1 of 2 files at r4, 6 of 6 files at r5, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

-- commits line 2 at r5:
can you please squash these commits? I don't understand why 12 commits are needed.

aya/src/sys/netlink.rs line 83 at r5 (raw file):

}

pub(crate) unsafe fn netlink_clsact_qdisc_exists(if_index: i32) -> Result<bool, io::Error> {

i'm not familiar enough with netlink, @dave-tucker should review this.

test/integration-ebpf/src/tc_name_limit.rs line 6 at r5 (raw file):

use aya_bpf::{macros::classifier, programs::TcContext};

/*

I don't think we use this comment style in this project. Can you please use // .... style instead?

test/integration-ebpf/src/tc_name_limit_exceeded.rs line 6 at r5 (raw file):

use aya_bpf::{macros::classifier, programs::TcContext};

/*

ditto

test/integration-test/src/tests/load.rs line 27 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

The idea is to ensure that Aya does not impose a limit and allows names up to the size allowed by the kernel.
I'll add this sentence as well to be more specific about the goal.

I understand. I'm just saying that this repeats the comment that is present in the probe code. Here we just need to match the name of the probe. I think this comment can just be removed?

test/integration-test/src/tests/load.rs line 65 at r5 (raw file):

    let mut bpf = Bpf::load(crate::TC_NAME_LIMIT_EXCEEDED_TEST).unwrap();

    /*

ditto about this comment

test/integration-test/src/tests/load.rs line 87 at r5 (raw file):

    // as a result, all arms except the desired Err() should be unreachable!().
    match program.attach("lo", TcAttachType::Ingress) {
        Ok(_) => unreachable!(),

unreachable? this is just a test failure, isn't it? can you instead write:

assert_matches!(
  program.attach("lo", TcAttachType::Ingress),
  Err(ProgramError::TcError(TcError::NetlinkError { io_error })) => {
    assert_eq!(io_error.raw_or_error(), Some(22))
  }
);

tamird

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

#[test]
fn tc_name_limit() {

Looks like these tests are failing:

failures:

---- tests::load::tc_name_limit stdout ----
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:31:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- tests::load::tc_name_limit_exceeded stdout ----
thread 'tests::load::tc_name_limit_exceeded' panicked at test/integration-test/src/tests/load.rs:60:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }

It's the qdisc_add_clsact("lo").unwrap(); calls that seem to be failing.

The buffer for attributes should be sized to hold at least 256 bytes, based on `CLS_BPF_NAME_LEN = 256` from the kernel: https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28 Fixes: aya-rs#610 aya: add netlink_clsact_qdisc_exists function public-api: add clsact_qdisc_exists integration-test: add tc_name_limit integration-test: add tc_name_limit_exceeded Signed-off-by: Mohammad Javad Pooladkhay <m.pooladkhay@gmail.com>

mergify · 2023-12-01T18:18:15Z

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

pooladkhay · 2023-12-01T18:26:56Z

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):
#[test]
fn tc_name_limit() {
Looks like these tests are failing:
failures:

---- tests::load::tc_name_limit stdout ----
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:31:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- tests::load::tc_name_limit_exceeded stdout ----
thread 'tests::load::tc_name_limit_exceeded' panicked at test/integration-test/src/tests/load.rs:60:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
It's the qdisc_add_clsact("lo").unwrap(); calls that seem to be failing.

That's true,
Two commands are being used to run the tests:
cargo xtask integration-test local which succeeds, and cargo xtask integration-test vm <KERNEL_IMAGE> which is the faulty one. Running it on my local machine fails too.

Is it possible that the loop back interface which I'm trying to attach an eBPF program to, is not available on that image? (hence the Os { code: 2, kind: NotFound, message: "No such file or directory" } error)

tamird

Reviewed 7 of 7 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

That's true,
Two commands are being used to run the tests:
cargo xtask integration-test local which succeeds, and cargo xtask integration-test vm <KERNEL_IMAGE> which is the faulty one. Running it on my local machine fails too.

Is it possible that the loop back interface which I'm trying to attach an eBPF program to, is not available on that image? (hence the Os { code: 2, kind: NotFound, message: "No such file or directory" } error)

Yep, seems likely that it is not available. Can we create a network namespace so that we aren't reliant on ambient resources?

pooladkhay · 2023-12-02T11:36:00Z

Reviewed 7 of 7 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

Yep, seems likely that it is not available. Can we create a network namespace so that we aren't reliant on ambient resources?

Could you please give me more hints, and maybe an example?

tamird

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

Could you please give me more hints, and maybe an example?

See

aya/test/integration-test/src/tests/xdp.rs

Line 137 in b176967

let _netns = NetNsGuard::new();

pooladkhay · 2023-12-10T17:50:37Z

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

Could you please give me more hints, and maybe an example?

See

aya/test/integration-test/src/tests/xdp.rs

Line 137 in b176967

let _netns = NetNsGuard::new();

I tried that. Namespace is created but the test still fails with the same error:

test tests::load::tc_name_limit ... Entered network namespace aya-test-168-0
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:623:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exited network namespace aya-test-168-0
FAILED

I wonder what is the purpose of this step in testing?
What is the main difference between cargo xtask integration-test vm and cargo xtask integration-test local if the tests are running on a linux machine?
Is this related to kernel versions? maybe being able to test on a specific version?

I took a look and the CI file and realised runs-on: ubuntu-22.04 is set at the top but there are also checks like this one: if: runner.os == 'macOS'.
Considering those checks checking runner.os, maybe cargo xtask integration-test vm is used in a scenario where the runner is not a linux image?
If that's the case, I'm unable to make sense of it because of runs-on: ubuntu-22.04.

Or maybe I'm missing something here...

tamird

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

I tried that. Namespace is created but the test still fails with the same error:
test tests::load::tc_name_limit ... Entered network namespace aya-test-168-0
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:623:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exited network namespace aya-test-168-0
FAILED
I wonder what is the purpose of this step in testing?
What is the main difference between cargo xtask integration-test vm and cargo xtask integration-test local if the tests are running on a linux machine?
Is this related to kernel versions? maybe being able to test on a specific version?

I took a look and the CI file and realised runs-on: ubuntu-22.04 is set at the top but there are also checks like this one: if: runner.os == 'macOS'.
Considering those checks checking runner.os, maybe cargo xtask integration-test vm is used in a scenario where the runner is not a linux image?
If that's the case, I'm unable to make sense of it because of runs-on: ubuntu-22.04.

Or maybe I'm missing something here...

Yes, this is both for testing on non-Linux and for future testing on different kernel versions.

As for the error you're seeing, I don't know how to help you. It seems we explicitly set up a loopback device in the network namespace:

aya/test/integration-test/src/utils.rs

Lines 41 to 53 in 41351ec

    
           let lo = CString::new("lo").unwrap(); 
        
           unsafe { 
        
               let idx = if_nametoindex(lo.as_ptr()); 
        
               if idx == 0 { 
        
                   panic!( 
        
                       "Interface `lo` not found in netns {}: {}", 
        
                       netns.name, 
        
                       io::Error::last_os_error() 
        
                   ); 
        
               } 
        
               netlink_set_link_up(idx as i32) 
        
                   .unwrap_or_else(|e| panic!("Failed to set `lo` up in netns {}: {e}", netns.name)); 
        
           }

mergify · 2024-02-06T12:37:37Z

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

mergify bot added aya This is about aya (userspace) test A PR that improves test cases or CI labels Aug 8, 2023

tamird requested a review from dave-tucker August 8, 2023 22:57

tamird requested changes Aug 8, 2023

View reviewed changes

pooladkhay force-pushed the main branch from 1fe6a17 to 4859705 Compare August 12, 2023 23:16

pooladkhay requested a review from tamird August 12, 2023 23:24

tamird reviewed Aug 23, 2023

View reviewed changes

mergify bot added the needs-rebase label Sep 14, 2023

mergify bot removed the needs-rebase label Nov 19, 2023

mergify bot requested a review from alessandrod November 24, 2023 23:47

mergify bot added the api/needs-review Makes an API change that needs review label Nov 24, 2023

tamird requested changes Nov 25, 2023

View reviewed changes

pooladkhay closed this Nov 30, 2023

pooladkhay force-pushed the main branch from df59ff3 to b176967 Compare November 30, 2023 20:28

pooladkhay reopened this Dec 1, 2023

pooladkhay requested a review from tamird December 1, 2023 18:29

tamird requested changes Dec 1, 2023

View reviewed changes

tamird requested changes Dec 6, 2023

View reviewed changes

tamird requested changes Dec 11, 2023

View reviewed changes


		let mut bpf = Bpf::load(crate::TC_NAME_LIMIT_TEST).unwrap();

		// This magic number (256) is derived from the fact that the kernel

	let lo = CString::new("lo").unwrap();
	unsafe {
	let idx = if_nametoindex(lo.as_ptr());
	if idx == 0 {
	panic!(
	"Interface `lo` not found in netns {}: {}",
	netns.name,
	io::Error::last_os_error()
	);
	}
	netlink_set_link_up(idx as i32)
	.unwrap_or_else(\|e\| panic!("Failed to set `lo` up in netns {}: {e}", netns.name));
	}

aya: fix tc name limit #728

Are you sure you want to change the base?

aya: fix tc name limit #728

Conversation

pooladkhay commented Aug 8, 2023

netlify bot commented Aug 8, 2023 • edited

✅ Deploy Preview for aya-rs-docs ready!

tamird left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pooladkhay Nov 25, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pooladkhay Aug 23, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dave-tucker commented Aug 9, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mergify bot commented Sep 14, 2023

alessandrod commented Nov 19, 2023

pooladkhay commented Nov 19, 2023

alessandrod commented Nov 23, 2023

pooladkhay commented Nov 24, 2023

mergify bot commented Nov 24, 2023

tamird left a comment

Choose a reason for hiding this comment

tamird left a comment

Choose a reason for hiding this comment

mergify bot commented Dec 1, 2023

pooladkhay commented Dec 1, 2023 • edited

tamird left a comment

Choose a reason for hiding this comment

pooladkhay commented Dec 2, 2023

tamird left a comment

Choose a reason for hiding this comment

pooladkhay commented Dec 10, 2023

tamird left a comment

Choose a reason for hiding this comment

mergify bot commented Feb 6, 2024

netlify bot commented Aug 8, 2023 •

edited

pooladkhay Nov 25, 2023 •

edited

pooladkhay Aug 23, 2023 •

edited

pooladkhay commented Dec 1, 2023 •

edited