Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aya+ebpf: Implement read+write methods for PerfEventArray #649

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

TheElectronWill
Copy link

@TheElectronWill TheElectronWill commented Jul 12, 2023

This allows to read and write a PerfEventArray, from userspace and kernel.
Thanks to these modifications, one can read perf events from ebpf code!

Usage

  1. open the perf event from userspace
    • Note: this could be done in aya if perf_event_open was public, for now I'm using perf_event_open_sys
  2. put the event's file descriptor into a map PerfEventArray<i32>
  3. load the ebpf program
  4. from ebpf, read the file descriptor to get the perf event's value, and put it in another map PerfEventArray<MyValue> (you can create a struct to carry the info you need, for instance)
  5. from userspace, read the value from the PerfEventArray using a PerfEventArrayBuffer

See the integration test for more details.

What is it for?

This can be useful to read performance counters from the kernel space, a method which I have evaluated in my research work.


This change is Reviewable

@netlify
Copy link

netlify bot commented Jul 12, 2023

Deploy Preview for aya-rs-docs failed.

Built without sensitive environment variables

Name Link
🔨 Latest commit 0aeb379
🔍 Latest deploy log https://app.netlify.com/sites/aya-rs-docs/deploys/659976617952250008625870

@TheElectronWill TheElectronWill changed the title Implement read+write methods for PerfEventArray aya+ebpf: Implement read+write methods for PerfEventArray Jul 12, 2023
Comment on lines 96 to 103
// According to the Linux manual, `bpf_perf_event_read_value` is preferred over `bpf_perf_event_read`.
let ret = bpf_perf_event_read_value(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bpf_perf_event_read_value is available starting with kernel 4.15, for version prior you need to use bpf_perf_event_read.

Could you add a kernel version check (cached of course) to handle those ancient versions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! good idea

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it doesn't seem that easy, because the return type is different between bpf_perf_event_read and bpf_perf_event_read_value, and because the aya's functions to compare kernel versions are not available in bpf 🙁 . I think that I will rather state the minimum required version in the doc, and create a "legacy" version of the PerfEventArray if support for old kernels is required.

bpf/aya-bpf/src/maps/perf/perf_event_array.rs Show resolved Hide resolved
if ret == 0 {
Ok(buf.assume_init())
} else {
Err(ret)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong and this should grab the errno.

Suggested change
Err(ret)
Err(Error::last_os_error())

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual reads

Return 0 on success, or a negative error in case of failure.

And the patch (I had it in a bookmark, I should find the link to the current kernel source code) is quite clear: IIUC errno isn't involved here, it does things like

if (unlikely(size != sizeof(struct bpf_perf_event_value)))
		return -EINVAL;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's annoying, well I suppose you can use from_raw_os_error with the value directly then...

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you write an integration test?

@TheElectronWill
Copy link
Author

Could you write an integration test?

Yes, but how do I open the perf event?
Do I make the corresponding functions of aya public, or do I add a dependency?

@tamird
Copy link
Member

tamird commented Jul 26, 2023

Sorry, a dependency on what? I think you should make whatever you need public so that your test is representative of real usage.

@TheElectronWill
Copy link
Author

Sorry, a dependency on what? I think you should make whatever you need public so that your test is representative of real usage.

I've used the crate perf_event_open_sys for step 1 (see the PR's description) to test what I've implemented.

I'll make what I need public :)

@mergify mergify bot added aya This is about aya (userspace) aya-bpf This is about aya-bpf (kernel) labels Sep 14, 2023
@mergify
Copy link

mergify bot commented Sep 14, 2023

@TheElectronWill, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Sep 14, 2023
@mergify mergify bot added test A PR that improves test cases or CI and removed needs-rebase labels Oct 30, 2023
@TheElectronWill
Copy link
Author

TheElectronWill commented Oct 30, 2023

Okay, I've added integration tests!
I've made a user-friendly version of perf_event_open public, for the purpose of testing and to make the feature available to the users without an additional dependency.

I've failed to run the tests on my machine, with a rather weird error

``` warning: error: could not compile `integration-ebpf` (bin "bpf_probe_read") due to 2 previous errors

error: failed to run custom build command for integration-test v0.1.0 (/home/guillaume/Documents/Contrib/aya/test/integration-test)
process didn't exit successfully: /home/guillaume/Documents/Contrib/aya/target/debug/build/integration-test-fd7f5096efa9800e/build-script-build (exit status: 101)
--- stderr
thread 'main' panicked at test/integration-test/build.rs:258:9:
assertion left == right failed: cd "[...]/aya/test/integration-ebpf" && CARGO_CFG_BPF_TARGET_ARCH="x86_64" "cargo" "build" "-Z" "build-std=core" "--bins" "--message-format=json" "--release" "--target" "bpfel-unknown-none" "--target-dir" "[...]/aya/target/debug/build/integration-test-8b78b2389a639211/out/integration-ebpf" failed: ExitStatus(unix_wait_status(25856))
left: Some(101)
right: Some(0)
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Error: AYA_BUILD_INTEGRATION_BPF="true" "cargo" "build" "--message-format=json" "--package" "integration-test" "--tests" "--profile" "dev" failed: ExitStatus(unix_wait_status(25856))

Running mentioned `cd "..."` command manually reveals the underlying error:

= note: 16:13:28 [ERROR] fatal error: "Inline asm not supported by this streamer because we don't have an asm parser for this target\n"
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Running pass 'Function Pass Manager' on module 'bpf_probe_read-8e261d0427f4b668'.
1. Running pass 'BPF Assembly Printer' on function '@test_bpf_probe_read_user_str_bytes'

but I'm stuck here :S Any idea?
</details>

@tamird
Copy link
Member

tamird commented Oct 31, 2023

do the integration tests run for you on main? how are you running them? what versions of rustc and bpf-linker are you using?

@TheElectronWill
Copy link
Author

TheElectronWill commented Oct 31, 2023

Solution to my aforementioned problem: update bpf-linker by running cargo install bpf-linker.
I've updated the PR to make the new test pass :)

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @marysaka and @TheElectronWill)


aya/src/maps/perf/perf_event_array.rs line 209 at r1 (raw file):

    /// Returns [`MapError::OutOfBounds`] if `index` is out of bounds, [`MapError::SyscallError`]
    /// if `bpf_map_update_elem` fails.
    pub fn set(&mut self, index: u32, value: i32) -> Result<(), MapError> {

is value a file descriptor? can we use AsFd instead of a raw i32?


aya/src/programs/perf_event.rs line 55 at r2 (raw file):

}

/// Fields included in the event samples

period


aya/src/programs/perf_event.rs line 59 at r2 (raw file):

pub struct SampleType(u64);

/// "Wake up" overflow notification policy.

did you mean to put all these changes in the second commit? I think you should just squash the two, they don't seem to be cleanly separated.


aya/src/programs/perf_event.rs line 63 at r2 (raw file):

#[derive(Debug, Clone)]
pub enum WakeUpPolicy {
    /// Wake up after n events

missing period here and below.


aya/src/programs/perf_event.rs line 64 at r2 (raw file):

pub enum WakeUpPolicy {
    /// Wake up after n events
    WakeupEvents(u32),

these names stutter (WakeUpPolicy::WakeupFoo). can we do better?


bpf/aya-bpf/src/maps/perf/perf_event_array.rs line 22 at r1 (raw file):

/// The minimum kernel version required to read perf_event values using [PerfEventArray] is 4.15.
/// This concerns the functions [`read_current_cpu()`], [`read_at_index()`] and [`read()`].
///  

why this blank line?


bpf/aya-bpf/src/maps/perf/perf_event_array.rs line 76 at r1 (raw file):

    fn output<C: BpfContext>(&self, ctx: &C, data: &T, flags: u64) -> Result<(), i64> {
        unsafe {
            let ret = bpf_perf_event_output(

please reduce the scope of unsafe. let ret = unsafe { ...}; if ret == 0 { Ok(()) } else { ... }


bpf/aya-bpf/src/maps/perf/perf_event_array.rs line 101 at r1 (raw file):

    fn read(&self, flags: u64) -> Result<bpf_perf_event_value, i64> {
        let mut buf = MaybeUninit::<bpf_perf_event_value>::uninit();
        unsafe {

same here


bpf/aya-bpf/src/maps/perf/perf_event_array.rs line 102 at r1 (raw file):

        let mut buf = MaybeUninit::<bpf_perf_event_value>::uninit();
        unsafe {
            // According to the Linux manual, `bpf_perf_event_read_value` is preferred over `bpf_perf_event_read`.

citation?


test/integration-ebpf/src/perf_events.rs line 29 at r2 (raw file):

#[perf_event]
pub fn on_perf_event(ctx: PerfEventContext) -> i64 {
    match read_event(&ctx).map(|res| write_output(&ctx, res)) {

s/map/and_then/ - currently this throws away the inner error.


test/integration-ebpf/src/perf_events.rs line 30 at r2 (raw file):

pub fn on_perf_event(ctx: PerfEventContext) -> i64 {
    match read_event(&ctx).map(|res| write_output(&ctx, res)) {
        Ok(_) => 0,

s/_/()/ to avoid the appearance of throwing away something meaningful


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

    // sleep a little bit, then poll the values from the buffer
    std::thread::sleep(Duration::from_secs(2));

this is more than a little bit. why do we need to sleep so long? can we poll the fd instead?


test/integration-test/src/tests/perf_events.rs line 76 at r2 (raw file):

    // read the events and check that the returned data is correct
    let mut events_data: [BytesMut; BUF_PAGE_COUNT] = std::array::from_fn(|_| BytesMut::new());

why does this need more than one of these? i would imagine a single buffer that is reused.


test/integration-test/src/tests/perf_events.rs line 80 at r2 (raw file):

    for data_buf in events_data.iter_mut().take(event_stats.read) {
        // You must ensure that the definition of the struct (here `EventData`) is the same

these comments are rather strange. who is "you"? consider anchoring such a comment on the relevant types and making them symmetrical (i.e. the comment on the bpf should reference the userspace definition and vice versa)

@TheElectronWill
Copy link
Author

/// Returns [`MapError::OutOfBounds`] if `index` is out of bounds, [`MapError::SyscallError`]
/// if `bpf_map_update_elem` fails.
pub fn set(&mut self, index: u32, value: i32) -> Result<(), MapError> {

is value a file descriptor? can we use AsFd instead of a raw i32?

It could be any i32, but that would make no sense for a PerfEventArray because only an event file descriptor would work with read. I've used AsFd.


        let mut buf = MaybeUninit::<bpf_perf_event_value>::uninit();
        unsafe {
            // According to the Linux manual, `bpf_perf_event_read_value` is preferred over `bpf_perf_event_read`.

citation?

man bpf-helpers reads:

u64 bpf_perf_event_read(...)
Also, be aware that the newer helper bpf_perf_event_read_value() is recommended over bpf_perf_event_read() in general. The latter has some ABI quirks where error and counter value are used as a return code (which is wrong to do since ranges may overlap). This issue is fixed with bpf_perf_event_read_value(), which at the same time provides more features over the bpf_perf_event_read() interface.

long bpf_perf_event_read_value(...)
In general, bpf_perf_event_read_value() is recommended over
bpf_perf_event_read(), which has some ABI issues and provides fewer functionalities.


    // sleep a little bit, then poll the values from the buffer
    std::thread::sleep(Duration::from_secs(2));

this is more than a little bit. why do we need to sleep so long? can we poll the fd instead?

I've changed the test to wait for buf.readable() to become true. I've also added an async version.


    // read the events and check that the returned data is correct
    let mut events_data: [BytesMut; BUF_PAGE_COUNT] = std::array::from_fn(|_| BytesMut::new());

why does this need more than one of these? i would imagine a single buffer that is reused.

We could use only one, here I wanted to test reading potentially multiple events at once.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, please reply in reviewable if it isn't too inconvenient. The review is stateful, and so it's difficult for me to continue it when your replies are not anchored to the discussions to which they belong.

Reviewed 6 of 6 files at r3, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @marysaka and @TheElectronWill)


test/integration-ebpf/src/perf_events.rs line 12 at r3 (raw file):

};

/// Data sent by the bpf program to userspace.

nit: this doc comment on a non-pub item will never be seen in rustdoc.


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this is more than a little bit. why do we need to sleep so long? can we poll the fd instead?

if you search around, you'll see we have another test that uses raw epoll, do you think that would be preferable to this polling loop? that might still be async, but it doesn't use the async perf event array.


test/integration-test/src/tests/perf_events.rs line 12 at r3 (raw file):

use test_log::test;

/// Data sent by the bpf program to userspace.

ditto


test/integration-test/src/tests/perf_events.rs line 27 at r3 (raw file):

const WAIT_TIMEOUT: Duration = Duration::from_secs(1);

/// Opens an hardware perf_event for testing.

ditto


test/integration-test/src/tests/perf_events.rs line 28 at r3 (raw file):

/// Opens an hardware perf_event for testing.
// Beware: this returns an `OwnedFd`, which means that the file descriptor is closed on drop.

I think this is an odd comment that describes the semantics of OwnedFd, which has its own docs.


test/integration-test/src/tests/perf_events.rs line 96 at r3 (raw file):

    // read the events and check that the returned data is correct
    let mut events_data: [BytesMut; BUF_PAGE_COUNT] = std::array::from_fn(|_| BytesMut::new());
    let events_stats = buf

would you mind destructuring event_stats? we should probably assert on the number of dropped events, but it's easy to miss the existence of that field because of the point access.

same in the async version please


test/integration-test/src/tests/perf_events.rs line 102 at r3 (raw file):

    for data_buf in events_data.iter_mut().take(events_stats.read) {
        let ptr = data_buf.as_ptr() as *const EventData;
        let data: EventData = unsafe { ptr.read_unaligned() };

let EventData { cpu_id, tag } = unsafe { ptr.read_unaligned };?

same in the async version please

Copy link
Author

@TheElectronWill TheElectronWill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @marysaka and @tamird)


aya/src/programs/perf_event.rs line 64 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

these names stutter (WakeUpPolicy::WakeupFoo). can we do better?

Done.


test/integration-ebpf/src/perf_events.rs line 12 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: this doc comment on a non-pub item will never be seen in rustdoc.

I agree, however it shows up in IDE popups, which is quite handy when working on the project, isn't it?


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

if you search around, you'll see we have another test that uses raw epoll, do you think that would be preferable to this polling loop? that might still be async, but it doesn't use the async perf event array.

I think that it would be better for a real app. For this test... I don't know, because readable() has some internal logic and doesn't simply check the file descriptor. It may be good to test this logic, since the file descriptor readiness is tested in the async version.


test/integration-test/src/tests/perf_events.rs line 28 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think this is an odd comment that describes the semantics of OwnedFd, which has its own docs.

removed.


test/integration-test/src/tests/perf_events.rs line 96 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

would you mind destructuring event_stats? we should probably assert on the number of dropped events, but it's easy to miss the existence of that field because of the point access.

same in the async version please

Good idea! I do that.


test/integration-test/src/tests/perf_events.rs line 102 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

let EventData { cpu_id, tag } = unsafe { ptr.read_unaligned };?

same in the async version please

Done.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @marysaka and @TheElectronWill)


aya/src/programs/perf_event.rs line 64 at r2 (raw file):

Previously, TheElectronWill (Guillaume Raffin) wrote…

Done.

They still stutter. What am I missing?


test/integration-ebpf/src/perf_events.rs line 12 at r3 (raw file):

Previously, TheElectronWill (Guillaume Raffin) wrote…

I agree, however it shows up in IDE popups, which is quite handy when working on the project, isn't it?

Is that not the case with a non-doc comment? If so TIL and this is fine.


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, TheElectronWill (Guillaume Raffin) wrote…

I think that it would be better for a real app. For this test... I don't know, because readable() has some internal logic and doesn't simply check the file descriptor. It may be good to test this logic, since the file descriptor readiness is tested in the async version.

Okay. I think 200ms is rather long, do you know how long this takes in practice? I would expect an interval of 10ms or so.

Copy link
Author

@TheElectronWill TheElectronWill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @marysaka and @tamird)


aya/src/programs/perf_event.rs line 64 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

They still stutter. What am I missing?

in the latest revision it's WakeupPolicy


test/integration-ebpf/src/perf_events.rs line 12 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Is that not the case with a non-doc comment? If so TIL and this is fine.

a normal comment does not show up on hover


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Okay. I think 200ms is rather long, do you know how long this takes in practice? I would expect an interval of 10ms or so.

The program is attached to the software clock, I believe that the events won't arrive before the first tick of the clock. I can increase the frequency, though :)

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @marysaka and @TheElectronWill)


aya/src/programs/perf_event.rs line 64 at r2 (raw file):

Previously, TheElectronWill (Guillaume Raffin) wrote…

in the latest revision it's WakeupPolicy

The word Wakeup is in both the enum name and both its variants, which produces a stutter.


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, TheElectronWill (Guillaume Raffin) wrote…

The program is attached to the software clock, I believe that the events won't arrive before the first tick of the clock. I can increase the frequency, though :)

Ah, so this is computed from the frequency. Can we do the math here instead of hard coding?

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @marysaka from 3 discussions.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @marysaka and @TheElectronWill)

@TheElectronWill TheElectronWill force-pushed the bpf-perf-event-read branch 2 times, most recently from 3e7734a to 2dc52fb Compare November 2, 2023 16:04
Copy link
Author

@TheElectronWill TheElectronWill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 2 unresolved discussions (waiting on @marysaka and @tamird)


aya/src/programs/perf_event.rs line 64 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

The word Wakeup is in both the enum name and both its variants, which produces a stutter.

Oh, right! Sorry, I don't know why but I was focused on the case (WakeUp vs Wakeup)... Fixed (for real this time).


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Ah, so this is computed from the frequency. Can we do the math here instead of hard coding?

Yes, that's much better indeed.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @marysaka and @TheElectronWill)


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, TheElectronWill (Guillaume Raffin) wrote…

Yes, that's much better indeed.

I think I'm confused now, the sampling is now a period rather than a frequency?

Copy link
Author

@TheElectronWill TheElectronWill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 11 files reviewed, 1 unresolved discussion (waiting on @marysaka and @tamird)


test/integration-test/src/tests/perf_events.rs line 69 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think I'm confused now, the sampling is now a period rather than a frequency?

Uh oh, I've tried both ways and forgot to revert one change... Good catch.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I think you'll need to run cargo xtask api --bless

Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @marysaka)

Copy link

mergify bot commented Nov 2, 2023

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

@mergify mergify bot requested a review from alessandrod November 2, 2023 18:30
@mergify mergify bot added the api/needs-review Makes an API change that needs review label Nov 2, 2023
Copy link
Author

@TheElectronWill TheElectronWill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo xtask public-api --bless completed :check:

Reviewable status: 11 of 13 files reviewed, all discussions resolved (waiting on @marysaka)

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alessandrod

Reviewed 2 of 2 files at r8, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @alessandrod and @marysaka)

Copy link

mergify bot commented Nov 28, 2023

@TheElectronWill, this pull request is now in conflict and requires a rebase.

This allow to read _and_ write a PerfEventArray, from userspace _and_ kernel.
Copy link
Collaborator

@alessandrod alessandrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hunch is that we should have two separate types - I don't think it's ever desiderable to call open() and set() on the same map, and having both in the same type seems confusing. I'm not sure what the best names would be, maybe PerfEventByteArray and PerfEventArray? Not sure

pub fn output_at_index<C: BpfContext>(&self, ctx: &C, index: u32, data: &T, flags: u32) {
let flags = u64::from(flags) << 32 | u64::from(index);
unsafe {
fn output<C: BpfContext>(&self, ctx: &C, data: &T, flags: u64) -> Result<(), i64> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this change? I find it more confusing than the old API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both index and flags looked subtly wrong, because the flags parameter of the bpf function is actually used either as an index, or as a flag, never both at the same time (and not with a 32-bits shift). If I'm not mistaken, either:

  • you want to write at a specific index: u32 and you set flags = (index as u64) & BPF_F_INDEX_MASK
  • or you want to write using a specific flag: u64 (usually a constant defined in the bpf library, like BPF_F_CURRENT_CPU) and you set flags = flag

See man bpf_helpers:

The flags are used to indicate the index in map for
which the value must be put, masked with
BPF_F_INDEX_MASK. Alternatively, flags can be set
to BPF_F_CURRENT_CPU to indicate that the index of
the current CPU core should be used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both index and flags looked subtly wrong, because the flags parameter of the bpf function is actually used either as an index, or as a flag

I don't think so? The flags argument can be used to pass both an index and flags. The index just happens to be the low 32 bits. IOW, I might want to output to an index AND pass some flags

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are other flags than BPF_F_CURRENT_CPU for this call?
If I do flags = u64::from(BPF_F_CURRENT_CPU) << 32 | u64::from(index) the flags looks weird to me: which element is going to be written? In any case, BPF_F_CURRENT_CPU is not shifted in the bcc examples.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do flags = u64::from(BPF_F_CURRENT_CPU) << 32 | u64::from(index)

You get EINVAL, because you've set some invalid bits in the upper bits.

In any case, BPF_F_CURRENT_CPU is not shifted in the bcc examples

Neither do we

I wonder if there are other flags than BPF_F_CURRENT_CPU for this call?

It's a generic flags argument. More flags could be added in the future, and if that were to happen it shouldn't break our APIs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither do we

Didn't self.output_at_index(ctx, BPF_F_CURRENT_CPU as u32, data, flags) discard the higher 32-bits of BPF_F_CURRENT_CPU and then apply the shift?

I understand the need for a generic future-proof flag, I will add it back.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither do we

Didn't self.output_at_index(ctx, BPF_F_CURRENT_CPU as u32, data, flags) discard the higher 32-bits of BPF_F_CURRENT_CPU and then apply the shift?

        let flags = u64::from(flags) << 32 | u64::from(index);

Yes but the shift is applied to flags, not index (BPF_F_CURRENT_CPU).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's right. Yet, u64::from(BPF_F_CURRENT_CPU as u32) is not BPF_F_CURRENT_CPU

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is if you look at _MASK and consider that kernel APIs are stable so the mask can't change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*revelation moment* Okay, I understand now, sorry for the confusion. The API confused me a little bit.
May I offer to add the index+flags back, but with something like (maybe without the INDEX_MASK):

let flags_for_bpf = (u64::from(index) & BPF_F_INDEX_MASK) | flags;

so that it's more obvious to the reader and will continue to work even if new flags are added?

///
/// Returns [`MapError::OutOfBounds`] if `index` is out of bounds, [`MapError::SyscallError`]
/// if `bpf_map_update_elem` fails.
pub fn set<FD: AsFd>(&mut self, index: u32, value: &FD) -> Result<(), MapError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I set an index that has already been open()-ed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I have not tested this case. If the fd has already been opened, it could stay open? The doc does not give any clear hint about this case.

pub struct SampleType(u64);

/// "Wake up" overflow notification policy.
/// Overflows are generated only by sampling events.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline between the two lines. The format is:

[summary]
[newline]
[paragraphs separated by newlines]

Copy link

mergify bot commented Jan 29, 2024

@TheElectronWill, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Jan 29, 2024
@TheElectronWill
Copy link
Author

My hunch is that we should have two separate types - I don't think it's ever desiderable to call open() and set() on the same map, and having both in the same type seems confusing. I'm not sure what the best names would be, maybe PerfEventByteArray and PerfEventArray? Not sure

I agree with the idea. There's already a PerfEventByteArray at bpf/aya-bpf/src/maps/perf/perf_event_byte_array.rs, but I'm not sure what is its intended usage.

Copy link

mergify bot commented Feb 6, 2024

Hey @alessandrod, this pull request changes the Aya Public API and requires your review.

Copy link

mergify bot commented Feb 6, 2024

@TheElectronWill, this pull request is now in conflict and requires a rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api/needs-review Makes an API change that needs review aya This is about aya (userspace) aya-bpf This is about aya-bpf (kernel) needs-rebase test A PR that improves test cases or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants