Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSSION] [PLUGIN API] Notes from the Rust world #1672

Open
gnosek opened this issue Feb 4, 2024 · 6 comments
Open

[DISCUSSION] [PLUGIN API] Notes from the Rust world #1672

gnosek opened this issue Feb 4, 2024 · 6 comments
Labels
kind/feature New feature or request

Comments

@gnosek
Copy link
Contributor

gnosek commented Feb 4, 2024

Here are some notes I made while working on the Falco Plugin Rust SDK. Please note that they're not blockers for the Rust SDK itself (everything here has a workaround, better or worse). I hope that they can drive future API evolution and make it safer and easier to use. Please consider all suggestions with an implied "if it doesn't make life for other SDKs or the plugin framework too hard".

Strings without an explicit owner

There are a few strings that I found that do not have a specific struct as their owner, preventing usage of instance fields for the underlying storage. Basically, every API method that returns a pointer but does not take a pointer as an argument is potentially problematic.

The following lists the uglier cases I've found so far.

plugin_get_fields and plugin_get_init_schema

The issue here is that it's tempting to generate the resulting string automatically (writing JSON by hand is not exactly my definition of fun) and we do need to store the result somewhere.

It may be possible to generate these at compile time but it would require procedural macros which are fairly magical and opaque. My goal is to avoid proc macros so that the whole API can be discovered and autocompleted by the IDE and the compiler.

Proposed solution

Split allocation and initialization of the plugin into two phases and introduce a ss_plugin_t* parameter to all methods that do not take one now (it's less problematic for things like plugin_get_name since it's easier to hardcode their value at compile time, but we should do it for consistency anyway). So instead of:

  • plugin_get_name
  • plugin_get_foo
  • etc.
  • if anything is wrong, just return
  • plugin_init
  • use the plugin
  • plugin_destroy

the workflow would be:

  • plugin_alloc
  • plugin_get_foo
  • if anything is wrong, plugin_destroy before returning
  • plugin_init
  • use the plugin
  • plugin_destroy

plugin_get_last_error

The situation is somewhat different here as we do have an instance pointer, but it can be potentially accessed from multiple threads (given there are API methods that both take a ss_plugin_t* and allow multithreaded access).

Assuming that plugin_get_last_error is used for all error reporting, this implies that multiple threads can share the same ss_plugin_t* and its error message buffer, which requires a Mutex to prevent multiple threads from writing to the buffer concurrently.

More importantly, since we cannot return a reference with a lifetime attached, we cannot unlock the mutex whenever the caller is done with the error string. This means that the returned pointer must remain valid even after the mutex is unlocked. As a consequence, we may never reallocate the buffer, so we're forced to use a fixed-length array.

Now, to make sure a catastrophe won't happen with multiple threads reading and writing the error buffer, we need to zero-fill it before every write, so that it's always NUL-terminated, even while we're writing to it (note that non-Rust code in a different thread can easily read the error buffer via a pointer obtained whenever, ignoring the mutex completely).

Proposed solution

I feel it would be best to drop the whole concept of a plugin instance as a separate entity. If you want multiple plugin instances, just allocate multiple ss_plugin_t* instances. Then it's clear that every thread has its own error message buffer and can use it freely without locking.

ss_plugin_t* not safely accessible from ss_instance_t* methods

This again ties into thread safety and the Rust model of memory safety. A non-const pointer naturally maps to a &mut reference, but that is instant UB in multi-threaded Rust: multiple mutable references cannot even exist (unused) without causing UB and actually accessing an object through multiple mutable references from different threads is a recipe
for disaster (including the worst case where everything works fine--for now).

Passing shared references to the plugin to instance methods doesn't help all that much: all other methods would need to take shared references as well, with all the overhead that it brings (every mutable field hidden behind a mutex, resulting in abysmal developer experience and performance penalty). So in the Rust SDK, I chose not to expose the ss_plugin_t* parameter to ss_instance_t* methods--the instance can pull in all the data in wants from the main plugin e.g. by Arc<T>
(for read-only data) or Arc<Mutex<T>> (for mutable data).

Proposed solution

Again, kill the ss_instance_t* concept and just use multiple ss_plugin_t* instances.

Unclear thread-related lifetime rules

Is there a guarantee that whichever thread allocates a plugin or instance is responsible for freeing it later, or can allocations
and frees happen in different threads (for the same object)?

Can a particular object be accessed from multiple threads? Sequentially? Concurrently?

This has consequences for the trait bounds for the plugin implementations that would e.g. disallow thread-local storage or other non-thread-safe constructs.

Proposed solution

Clarify the rules. It feels okay to guarantee that a particular object will only ever be accessed by the thread that created it,
except for the unfortunate "instance accessing plugin" case (which the Rust SDK disallows).

If we dropped ss_instance_t* and accepted this rule, thread safety would become much simpler IMHO in the API: it would become entirely single-threaded for any particular object and any threading issues become internal to the plugin itself.

Unclear lifetime rules for strings (pointers) passed to plugins

For any pointer passed from the plugin API to the plugin, it would be nice to know how long it remains valid; basically: can I store the pointer or do I need to copy the pointee. The two obvious answers (different ones for different pointers) would be:

  • it's pointing to static storage and will remain valid for the lifetime of the plugin
  • it's pointing to an ephemeral variable (maybe stack-allocated) and will only remain valid during the call

Proposed solution

Document the lifetime rules for all pointers.

Inconsistent support for data types

There are a few different uses for the various PT_*-described types:

  • event fields (the original use case)
  • table keys
  • table values
  • values for extracted fields

The documentation suggests that table keys and values support different sets of types (involving the ss_plugin_field_t enum), but the method signatures themselves say that keys and values can use the same data types (basically: whatever is convertible to ss_plugin_state_data).

Values for extracted fields are set in ss_plugin_extract_field.res, which is a union nearly identical to ss_plugin_state_data*.

I realized that to support a data type we need two sets of conversion functions:

  • to/from a sequence of bytes (in events)
  • to/from a C data type (in tables and field extraction)

but the second conversion is ever so slightly different between table access and field extraction.

Proposed solution

Make ss_plugin_extract_field.res a pointer to ss_plugin_state_data.

Maybe we also want to make ss_plugin_state_data a tagged union (i.e. a struct that contains an enum describing which union variant is live and the union itself).

Note that we need something tricky for the v4/v6 combined fields as currently there's no way to extract a list of addresses due to their variable length. We can give up and use a pointer to a separately allocated struct but that introduces unnecessary allocations. Alternatively, we can use a fixed-size struct that would fit both address types, like:

struct v4_or_v6_addr
{
    uint32_t family; // AF_INET or AF_INET6
    union
    {
        uint32_t ipv4;
        uint8_t ipv6[16];
    };
}

Unclear endianness rules

I realize this is not an issue with the plugin API per se, but the endianness of values is not specified anywhere. x86-64 and aarch64 are both little-endian AFAIK but not all architectures are.

This creates its own version of hell when trying to deal with cross-endianness captures, so we might at least try to avoid issues
with plugins and specify e.g. that all numeric fields in events are little-endian.

Note: I honestly have no idea how IP addresses are stored in events: native byte order? network byte order?

Proposed solution

Sit down and think very hard about endianness across the whole stack.

Alternatives

Leave the API as is.

@gnosek gnosek added the kind/feature New feature or request label Feb 4, 2024
@gnosek
Copy link
Contributor Author

gnosek commented Feb 4, 2024

cc @jasondellaluce @Andreagit97

@poiana
Copy link
Contributor

poiana commented May 4, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Copy link
Member

uhh, I missed that sorry!
These are all valid points, i faced some of them while trying to implement a simple Rust SDK, not sure when/how we want to tackle them, but for sure this is something we need to keep in consideration in future plugin API reworks

@Andreagit97
Copy link
Member

/remove-lifecycle stale

@Andreagit97
Copy link
Member

I would also reconsider the multithread approach built by default inside the plugin API considering pros and cons:

  • code complexity
  • performance
  • real-world usages

@gnosek
Copy link
Contributor Author

gnosek commented May 6, 2024

@Andreagit97 I was confused about the source plugin <-> instance relationship (it's always 1:1 it seems), so at a glance it feels like a threading rule being tl;dr "every object is always accessed from the thread that created it" would pretty much describe the current state.

Vaguely related, we could use some clarification around tables:

  • can we even think of accessing tables from async plugins (or, in general, other threads)? it would be massively useful if so but it's a whole new can of worms
  • we probably need a rule saying ss_table_entry pointers can never be stored across an API function call (they're effectively iterators that get invalidated whenever you look at the underlying table hard enough)

I have Ideas ™️ about extending the table support with a few methods that would let us factor just about anything inside sinsp into plugins but that's way out of scope here ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants