Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(bpfassets): Refactor to reduce the API exposed by bpfassets #1413

Merged
merged 3 commits into from May 15, 2024

Conversation

dave-tucker
Copy link
Contributor

@dave-tucker dave-tucker commented May 8, 2024

Currently pkg/bpfassets contains many global variables that are accessed from many other packages.
This makes it hard for both the bpfAssets package and other packages that depend on it to be effectively unit tested.
To solve this, I've shrunk the API of be based on:

type Attacher interface {
	HardwareCountersEnabled() bool
	Detach()
	CollectProcesses() ([]ProcessBPFMetrics, error)
	CollectCPUFreq() (map[int32]uint64, error)
	GetEnabledBPFHWCounters() []string
	GetEnabledBPFSWCounters() []string
}

Where NewAttacher() will provide a new eBPF attacher and NewMockAttacher() is used for unit testing (dependency injection).

In un-picking access to globals it transpired that a HwCountersEnabled() was being accessed from many places in the codebase. It defaults to true. Once the eBPF probes have been loaded we mark it to false if we're unable to open perf events.

In order to keep this as consistent as possible, I've changed the program start ordering to load the probes first so the HwCountersEnabled bool can be propagated where it needs to go - and we can be sure it won't change after the fact.

Similarly, GetEnabledBPFHWCounters() and GetEnabledBPFSWCounter() were also changing after the probes were loaded, which is now also solved by loading eBPF earlier in the program startup.

The older BCC test has now been fixed to work with libbpf - and will hopefully run in CI.

The last commit in this series fixes test run control to allow you to:
make VERBOSE=1 test to run unit tests, bpf test + benchmarks, or to selectively run one of the 3 component parts.
e2e tests are excluded from that make target given they will fail without and e2e environment set up.

Requires: #1390
Updates: #1412

@dave-tucker dave-tucker marked this pull request as draft May 8, 2024 14:59
@dave-tucker dave-tucker force-pushed the self-contained-bpf branch 4 times, most recently from a877376 to a42a370 Compare May 9, 2024 09:04
@dave-tucker
Copy link
Contributor Author

dave-tucker commented May 9, 2024

The test failure is due to the code that looks up the libbpf artifact path. This has been an issue in my dev environment since day 1 😱
I'll open a separate PR to fix that.

@rootfs
Copy link
Contributor

rootfs commented May 13, 2024

@dave-tucker this is great, can we have this soon so we can include it in the next release on Thu? Thanks

These constants are available in the config package already and do not
need to be re-exported under a different name.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
@dave-tucker dave-tucker force-pushed the self-contained-bpf branch 4 times, most recently from a77b24f to d364a51 Compare May 14, 2024 13:07
@dave-tucker dave-tucker marked this pull request as ready for review May 14, 2024 13:10
@dave-tucker dave-tucker force-pushed the self-contained-bpf branch 5 times, most recently from 0bf2971 to c2e7d88 Compare May 14, 2024 17:26
Makefile Show resolved Hide resolved
pkg/bpfassets/attacher/attacher.go Outdated Show resolved Hide resolved
pkg/bpfassets/attacher/attacher.go Outdated Show resolved Hide resolved
pkg/bpfassets/attacher/attacher.go Outdated Show resolved Hide resolved
pkg/bpfassets/attacher/attacher.go Outdated Show resolved Hide resolved
pkg/bpfassets/attacher/attacher.go Show resolved Hide resolved
pkg/bpfassets/attacher/attacher.go Show resolved Hide resolved
pkg/bpfassets/attacher/attacher.go Outdated Show resolved Hide resolved
@@ -20,6 +20,17 @@ import (
"github.com/sustainable-computing-io/kepler/pkg/config"
)

var SoftIRQEvents = []string{config.IRQNetTXLabel, config.IRQNetRXLabel, config.IRQBlockLabel}

type Attacher interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'attacher' as a name was more useful when we were having both bcc and libbpf .
suggestion: change name to simply KeplerBPF since it is an interface to kepler.bpf.o , even package can be moved from pkg/bpfassets/attacher to pkg/bpf or something better. unless we want to have a choice of libbpf and cillium.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll do this in a follow up as the diff is already pretty big.

pkg/collector/metric_collector.go Outdated Show resolved Hide resolved
This reduces the API exposed by the pkg/bpfassets and
also makes it easier to test.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Remove the verbose test targets. You can enable this
with make VERBOSE=1 test.

Ensure e2e tests aren't run in the normal test target
as they will fail.

Skip bpfassets tests as they need sudo.

Compile bpfassets tests to a tmpdir and then run them.

Fix assertions in the bpfassets test to skip cpuFreq
given that this isn't populated, and to add more
assertions in the data that comes from `processes`.

Split test run into unit-test, bpf-test and bench
targets.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
@dave-tucker
Copy link
Contributor Author

Thanks for the review @vimalk78. I've addressed your comments. The only outstanding thing left to clarify is what the "correct" behaviour should be in the case that either of the probes loaded for tracking PageCacheHit cannot be loaded.

@vimalk78
Copy link
Contributor

in the case that either of the probes loaded for tracking PageCacheHit cannot be loaded.

since we return error in case kepler_sched_switch_trace fails to load, how about returning error in case kepler_write_page_trace or kepler_read_page_trace ?

@sthaha
Copy link
Collaborator

sthaha commented May 15, 2024

@dave-tucker Thanks for the PR. It looks good to me. I ran the version against the upstream/main using the dev docker-compose today and got "interesting" results. The variance between 2 versions of kepler running (kepler:latest and this PR) is quite a bit.

We may want to investigate further why this is the case without blocking this PR. It may very well be pointing to some other underlying issue in kepler / bpf or even the (docker compose) setup.

image image

I have created a PR for the "enhanced" dashboard - #1429

@rootfs rootfs merged commit e0ec4fd into sustainable-computing-io:main May 15, 2024
20 checks passed
@dave-tucker dave-tucker deleted the self-contained-bpf branch May 15, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants