Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aya: fix tc name limit #728

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

aya: fix tc name limit #728

wants to merge 1 commit into from

Conversation

pooladkhay
Copy link

The buffer for attributes should be sized to hold at least 256 bytes, based on CLS_BPF_NAME_LEN = 256 from the kernel:
https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28

This pull request also includes integration tests with an eBPF program of type classifier with a 256-byte-long name.

Fixes: #610

@netlify
Copy link

netlify bot commented Aug 8, 2023

Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 9606690
🔍 Latest deploy log https://app.netlify.com/sites/aya-rs-docs/deploys/656a23481a434f00080c2ef5
😎 Deploy Preview https://deploy-preview-728--aya-rs-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mergify mergify bot added aya This is about aya (userspace) test A PR that improves test cases or CI labels Aug 8, 2023
@tamird tamird requested a review from dave-tucker August 8, 2023 22:57
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash these commits.

use aya_bpf::{macros::classifier, programs::TcContext};

// This macro generates a function with arbitrary name
macro_rules! generate_ebpf_function {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this macro needed? is this something forward-looking?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the name of that function is the name of the ebpf program and we are trying to use a 256-byte-long name, the macro is just a way to avoid writing this:

#[classifier]
pub fn a..[256 times]..a(_ctx: TcContext) -> i32 {
    0
}

I wonder if there is a way to dynamically generate that name inside the macro in this #![no_std] #![no_main] environment, like what we would do in userspace side: "a".repeat(256)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you still write it =/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found it cleaner to pass 256 'a's to a macro than writing a function with that name directly. If that's an issue, I can remove it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird Macro was replaced by just a function.

// The buffer for attributes should be sized to hold at least 256 bytes,
// based on `CLS_BPF_NAME_LEN = 256` from the kernel:
// https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
attrs: [u8; 512],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least 256, but then why 512?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we currently use around ~30 bytes of attributes in addition to name.
Rather than picking a "right sized buffer" for the payload (which is of varying length anyway) we use the next largest power of 2.
I've got netlink changes planned soon, so will look at making the buffer size variable then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the comment say that please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will update the comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird Updated the comment.

@@ -13,6 +15,27 @@ use aya::{
const MAX_RETRIES: usize = 100;
const RETRY_DURATION: time::Duration = time::Duration::from_millis(10);

#[test]
fn tc_name_limit() {
let _ = qdisc_add_clsact("lo");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it ok to drop this value?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the clsact qdisc attached to the interface before we can proceed, Actually a better approach would be a function that checks that so we can call qdisc_add_clsact("lo") only when that interface doesn't have that qdisc.
I can add that and update the code...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a different type of probe we can use that doesn't have this requirement?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird I have added a function to check if clsact qdisc exists before trying to actually add it.
Now the pipeline is failing due to the change to the aya's public api.
I'd really appreciate it if you review the code and help me on that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a different type of probe we can use that doesn't have this requirement?

By requirement, Do you mean the need to call qdisc_add_clsact("lo")?
If yes, I don't think so.
As far as I know we can either use ingress or clsact if attaching to ingress path and to attach an ebpf program to egress path, the only option is to use clsact:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f211a1b929c804100e138c5d3d656992cfd5622

256 is the maximum length allowed by the kernel:
https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
*/
generate_ebpf_function!(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if the program name is 257 bytes long?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the exact error right now but the kernel rejects that ebpf program and returns an error.
I can send the exact error in a few hours.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird Sorry for the delay
This is the error you get when the name is more that 256 bytes:

Error: netlink error while attaching ebpf program to tc

Caused by:
    Invalid argument (os error 22)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we test that as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, Macro is probably more useful now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird An integration test was added to make sure programs with a name longer than 256 bytes will fail to attach.

@dave-tucker
Copy link
Member

I'm happy with the change to up the buffer to the next nearest power of 2.
Tamir's comments need addressing before merge.

@@ -13,6 +15,27 @@ use aya::{
const MAX_RETRIES: usize = 100;
const RETRY_DURATION: time::Duration = time::Duration::from_millis(10);

#[test]
fn tc_name_limit() {
let _ = qdisc_add_clsact("lo");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a different type of probe we can use that doesn't have this requirement?

256 is the maximum length allowed by the kernel:
https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
*/
generate_ebpf_function!(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we test that as well?

collections::HashMap,
ffi::CStr,
io,
mem::{self},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{self} is redundant

// The buffer for attributes should be sized to hold at least 256 bytes,
// based on `CLS_BPF_NAME_LEN = 256` from the kernel:
// https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
attrs: [u8; 512],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the comment say that please?

use aya_bpf::{macros::classifier, programs::TcContext};

// This macro generates a function with arbitrary name
macro_rules! generate_ebpf_function {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you still write it =/


let mut bpf = Bpf::load(crate::TC_NAME_LIMIT_TEST).unwrap();

// This magic number (256) is derived from the fact that the kernel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is somewhat misplaced; this 256 is just defined by the fact that this is the name used in the probe code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to ensure that Aya does not impose a limit and allows names up to the size allowed by the kernel.
I'll add this sentence as well to be more specific about the goal.

@mergify
Copy link

mergify bot commented Sep 14, 2023

@pooladkhay, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Sep 14, 2023
@alessandrod
Copy link
Collaborator

@pooladkhay ping?

@pooladkhay
Copy link
Author

@pooladkhay ping?

@alessandrod pong! 😃

Life hasn't been easy lately, I should have notified earlier.
I'll get back to this asap, please let it be open...

@mergify mergify bot removed the needs-rebase label Nov 19, 2023
@alessandrod
Copy link
Collaborator

@pooladkhay ping?

@alessandrod pong! 😃

Life hasn't been easy lately, I should have notified earlier. I'll get back to this asap, please let it be open...

hugs ❤️ and sorry if I made you feel rushed it wasn't my intention

@pooladkhay
Copy link
Author

@pooladkhay ping?

@alessandrod pong! 😃
Life hasn't been easy lately, I should have notified earlier. I'll get back to this asap, please let it be open...

hugs ❤️ and sorry if I made you feel rushed it wasn't my intention

No worries at all 🙋‍♂️

Copy link

mergify bot commented Nov 24, 2023

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

@mergify mergify bot requested a review from alessandrod November 24, 2023 23:47
@mergify mergify bot added the api/needs-review Makes an API change that needs review label Nov 24, 2023
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r3, 1 of 2 files at r4, 6 of 6 files at r5, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)


-- commits line 2 at r5:
can you please squash these commits? I don't understand why 12 commits are needed.


aya/src/sys/netlink.rs line 83 at r5 (raw file):

}

pub(crate) unsafe fn netlink_clsact_qdisc_exists(if_index: i32) -> Result<bool, io::Error> {

i'm not familiar enough with netlink, @dave-tucker should review this.


test/integration-ebpf/src/tc_name_limit.rs line 6 at r5 (raw file):

use aya_bpf::{macros::classifier, programs::TcContext};

/*

I don't think we use this comment style in this project. Can you please use // .... style instead?


test/integration-ebpf/src/tc_name_limit_exceeded.rs line 6 at r5 (raw file):

use aya_bpf::{macros::classifier, programs::TcContext};

/*

ditto


test/integration-test/src/tests/load.rs line 27 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

The idea is to ensure that Aya does not impose a limit and allows names up to the size allowed by the kernel.
I'll add this sentence as well to be more specific about the goal.

I understand. I'm just saying that this repeats the comment that is present in the probe code. Here we just need to match the name of the probe. I think this comment can just be removed?


test/integration-test/src/tests/load.rs line 65 at r5 (raw file):

    let mut bpf = Bpf::load(crate::TC_NAME_LIMIT_EXCEEDED_TEST).unwrap();

    /*

ditto about this comment


test/integration-test/src/tests/load.rs line 87 at r5 (raw file):

    // as a result, all arms except the desired Err() should be unreachable!().
    match program.attach("lo", TcAttachType::Ingress) {
        Ok(_) => unreachable!(),

unreachable? this is just a test failure, isn't it? can you instead write:

assert_matches!(
  program.attach("lo", TcAttachType::Ingress),
  Err(ProgramError::TcError(TcError::NetlinkError { io_error })) => {
    assert_eq!(io_error.raw_or_error(), Some(22))
  }
);

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)


test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

#[test]
fn tc_name_limit() {

Looks like these tests are failing:

failures:

---- tests::load::tc_name_limit stdout ----
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:31:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- tests::load::tc_name_limit_exceeded stdout ----
thread 'tests::load::tc_name_limit_exceeded' panicked at test/integration-test/src/tests/load.rs:60:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }

It's the qdisc_add_clsact("lo").unwrap(); calls that seem to be failing.

The buffer for attributes should be sized to hold at least 256 bytes, based on `CLS_BPF_NAME_LEN = 256` from the kernel:
https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28

Fixes: aya-rs#610

aya: add netlink_clsact_qdisc_exists function

public-api: add clsact_qdisc_exists

integration-test: add tc_name_limit

integration-test: add tc_name_limit_exceeded

Signed-off-by: Mohammad Javad Pooladkhay <m.pooladkhay@gmail.com>
@pooladkhay pooladkhay reopened this Dec 1, 2023
Copy link

mergify bot commented Dec 1, 2023

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

@pooladkhay
Copy link
Author

pooladkhay commented Dec 1, 2023

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

#[test]
fn tc_name_limit() {

Looks like these tests are failing:

failures:

---- tests::load::tc_name_limit stdout ----
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:31:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- tests::load::tc_name_limit_exceeded stdout ----
thread 'tests::load::tc_name_limit_exceeded' panicked at test/integration-test/src/tests/load.rs:60:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }

It's the qdisc_add_clsact("lo").unwrap(); calls that seem to be failing.

That's true,
Two commands are being used to run the tests:
cargo xtask integration-test local which succeeds, and cargo xtask integration-test vm <KERNEL_IMAGE> which is the faulty one. Running it on my local machine fails too.

Is it possible that the loop back interface which I'm trying to attach an eBPF program to, is not available on that image? (hence the Os { code: 2, kind: NotFound, message: "No such file or directory" } error)

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)


test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

That's true,
Two commands are being used to run the tests:
cargo xtask integration-test local which succeeds, and cargo xtask integration-test vm <KERNEL_IMAGE> which is the faulty one. Running it on my local machine fails too.

Is it possible that the loop back interface which I'm trying to attach an eBPF program to, is not available on that image? (hence the Os { code: 2, kind: NotFound, message: "No such file or directory" } error)

Yep, seems likely that it is not available. Can we create a network namespace so that we aren't reliant on ambient resources?

@pooladkhay
Copy link
Author

Reviewed 7 of 7 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

Yep, seems likely that it is not available. Can we create a network namespace so that we aren't reliant on ambient resources?

Could you please give me more hints, and maybe an example?

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)


test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

Could you please give me more hints, and maybe an example?

See

let _netns = NetNsGuard::new();

@pooladkhay
Copy link
Author

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)

test/integration-test/src/tests/load.rs line 19 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

Could you please give me more hints, and maybe an example?

See

let _netns = NetNsGuard::new();

I tried that. Namespace is created but the test still fails with the same error:

test tests::load::tc_name_limit ... Entered network namespace aya-test-168-0
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:623:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exited network namespace aya-test-168-0
FAILED

I wonder what is the purpose of this step in testing?
What is the main difference between cargo xtask integration-test vm and cargo xtask integration-test local if the tests are running on a linux machine?
Is this related to kernel versions? maybe being able to test on a specific version?

I took a look and the CI file and realised runs-on: ubuntu-22.04 is set at the top but there are also checks like this one: if: runner.os == 'macOS'.
Considering those checks checking runner.os, maybe cargo xtask integration-test vm is used in a scenario where the runner is not a linux image?
If that's the case, I'm unable to make sense of it because of runs-on: ubuntu-22.04.

Or maybe I'm missing something here...

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)


test/integration-test/src/tests/load.rs line 19 at r2 (raw file):

Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…

I tried that. Namespace is created but the test still fails with the same error:

test tests::load::tc_name_limit ... Entered network namespace aya-test-168-0
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:623:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exited network namespace aya-test-168-0
FAILED

I wonder what is the purpose of this step in testing?
What is the main difference between cargo xtask integration-test vm and cargo xtask integration-test local if the tests are running on a linux machine?
Is this related to kernel versions? maybe being able to test on a specific version?

I took a look and the CI file and realised runs-on: ubuntu-22.04 is set at the top but there are also checks like this one: if: runner.os == 'macOS'.
Considering those checks checking runner.os, maybe cargo xtask integration-test vm is used in a scenario where the runner is not a linux image?
If that's the case, I'm unable to make sense of it because of runs-on: ubuntu-22.04.

Or maybe I'm missing something here...

Yes, this is both for testing on non-Linux and for future testing on different kernel versions.

As for the error you're seeing, I don't know how to help you. It seems we explicitly set up a loopback device in the network namespace:

let lo = CString::new("lo").unwrap();
unsafe {
let idx = if_nametoindex(lo.as_ptr());
if idx == 0 {
panic!(
"Interface `lo` not found in netns {}: {}",
netns.name,
io::Error::last_os_error()
);
}
netlink_set_link_up(idx as i32)
.unwrap_or_else(|e| panic!("Failed to set `lo` up in netns {}: {e}", netns.name));
}

Copy link

mergify bot commented Feb 6, 2024

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api/needs-review Makes an API change that needs review aya This is about aya (userspace) test A PR that improves test cases or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

long tc program names are not truncated, causing a netlink error
4 participants