Switch Allocator to act more like an owned arena #467

Open
Joeoc2001 opened this issue Dec 18, 2023 · 8 comments

Comments

@Joeoc2001

Currently, the Allocator of a message::Builder functions as a malloc/free-style API. This is fine when the message to be encoded resides in host memory that we fully control, but for some use cases this API is overly restrictive.

Specifically, for my use case, Cap'n Proto should be perfect for sharing data with a resident WebAssembly instance, as the data does not need to be copied and can simply be encoded directly into the WebAssembly instance's memory. However, doing so requires the blocks of memory in the instance to be allocated by a function exported by the module, which may be fallible, and which requires a mutable borrow over the host memory to be invoked, violating Rust's aliasing rules for the mutable pointer returned by the allocator.

An alternative, partially proposed in #78, would be to have the allocator and relevant sub-builders deal with slices instead of raw pointers. Further to that discussion, I would propose that each builder keep integer offsets into the segments, and have builders borrow the segments only for the period in which they are writing data. This would allow the allocation and deallocation methods to have mutable access to all of the segments, and moving the underlying buffers (as may happen if a WebAssembly instance opts to grow its memory on an alloc invocation) would not invalidate any existing pointers.
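As a rough sketch of what "offsets plus short-lived borrows" could look like (all names here are hypothetical, not the crate's actual API), a builder would hold a segment index and a byte offset, and the segment memory would be borrowed only for the duration of a single write:

```rust
// Hypothetical sketch: builders store offsets, not raw pointers.
struct Arena {
    // Owned segment buffers; these may reallocate and move on growth
    // without invalidating any Location.
    segments: Vec<Vec<u8>>,
}

#[derive(Clone, Copy)]
struct Location {
    segment: usize, // index into `segments`
    offset: usize,  // byte offset within that segment
}

impl Arena {
    // The segment is borrowed only for the duration of the write, so the
    // arena is free to move buffers between writes.
    fn write_u64(&mut self, loc: Location, value: u64) {
        self.segments[loc.segment][loc.offset..loc.offset + 8]
            .copy_from_slice(&value.to_le_bytes());
    }

    fn read_u64(&self, loc: Location) -> u64 {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&self.segments[loc.segment][loc.offset..loc.offset + 8]);
        u64::from_le_bytes(bytes)
    }
}
```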

Performance

Based on zero testing, I would guess that the borrowing from the owned allocator would be mostly inlined away, aside from the one extra level of indirection added by the offset+slice combo. I'd expect most CPUs to handle this fairly smoothly, since the prefetcher should notice that the underlying segment is contiguous, but benchmarks would be required.

Backwards compatibility

Since the Allocator already acts like it owns the data it allocates, a trait covering temporary borrows of slices only while they are written (as described above) would be strictly more general. A new trait could therefore be introduced, with an auto-implementation from existing Allocators to the new trait, meaning no existing code would break.
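To make the backwards-compatibility claim concrete, here is one way the auto-implementation could be shaped, via a wrapper type. Both traits and all names are hypothetical stand-ins, not the crate's actual API:

```rust
use std::slice;

/// Stand-in for the existing allocator shape: hands out raw pointers to
/// zeroed memory that it logically owns and promises not to move.
trait OldAllocator {
    fn allocate_segment(&mut self, min_words: u32) -> (*mut u8, u32);
}

/// The proposed, more general trait: callers get mutable access to a
/// segment only for the duration of a single closure call.
trait BorrowingAllocator {
    fn alloc(&mut self, min_words: u32) -> u32; // returns a segment id
    fn with_segment<R>(&mut self, id: u32, f: impl FnOnce(&mut [u8]) -> R) -> R;
}

/// Adapter giving every OldAllocator a BorrowingAllocator view, so
/// existing allocator implementations keep working unchanged.
struct Compat<A> {
    inner: A,
    segments: Vec<(*mut u8, usize)>, // (ptr, len in bytes)
}

impl<A: OldAllocator> BorrowingAllocator for Compat<A> {
    fn alloc(&mut self, min_words: u32) -> u32 {
        let (ptr, words) = self.inner.allocate_segment(min_words);
        self.segments.push((ptr, words as usize * 8));
        (self.segments.len() - 1) as u32
    }

    fn with_segment<R>(&mut self, id: u32, f: impl FnOnce(&mut [u8]) -> R) -> R {
        let (ptr, len) = self.segments[id as usize];
        // Sound only because OldAllocator promises the memory stays put.
        f(unsafe { slice::from_raw_parts_mut(ptr, len) })
    }
}

/// Trivial OldAllocator for demonstration: leaks zeroed heap memory.
struct LeakyAlloc;
impl OldAllocator for LeakyAlloc {
    fn allocate_segment(&mut self, min_words: u32) -> (*mut u8, u32) {
        let v = vec![0u8; min_words as usize * 8].leak();
        (v.as_mut_ptr(), min_words)
    }
}
```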

Are there any thoughts on this? For now, I intend to encode in a host buffer and then copy the segments into the instance memory, but this obviously isn't ideal when the underlying protocol is designed to be 0-copy.

@dwrensha
Member

I would propose that each builder keep integer offsets into the segments in the builder, and have them borrow the segments only for the period in which they are writing the data.

This would eliminate the main source of unsafe in capnproto-rust, so it does sound appealing. I expect that the performance cost would be significant, but there's no way to know for sure without actually doing some benchmarking. Note that we currently do dynamic dispatch on the Allocator trait -- I think that might mean that a lot of stuff won't get inlined.

A safety vs performance analysis here with benchmark numbers would probably be interesting to much of the broader Rust community.

@dwrensha
Member

It would be relatively straightforward to change the interface so that capnp::message::Allocator::allocate_segment returns an error rather than panicking on out-of-memory. (Note that downstream code would then need to insert ? in a bunch of places, so making the change comes with a cost.)
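A sketch of what that fallible interface could look like, with a toy implementation. The trait shape, error type, and `Budget` allocator are hypothetical stand-ins, not the crate's actual API:

```rust
// Hypothetical error type for out-of-memory.
#[derive(Debug, PartialEq)]
struct AllocError;

trait FallibleAllocator {
    /// Returns (segment id, allocated words) or an error, instead of
    /// panicking when allocation fails.
    fn allocate_segment(&mut self, min_words: u32) -> Result<(u32, u32), AllocError>;
}

/// Toy allocator with a fixed word budget.
struct Budget {
    words_left: u32,
}

impl FallibleAllocator for Budget {
    fn allocate_segment(&mut self, min_words: u32) -> Result<(u32, u32), AllocError> {
        if min_words > self.words_left {
            return Err(AllocError);
        }
        self.words_left -= min_words;
        Ok((0, min_words))
    }
}

/// Downstream code then propagates the failure with `?` -- the cost
/// mentioned above.
fn first_segment<A: FallibleAllocator>(a: &mut A) -> Result<u32, AllocError> {
    let (id, _words) = a.allocate_segment(1)?;
    Ok(id)
}
```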

@Joeoc2001 would such a change on its own help in your use cases at all?

@Joeoc2001
Author

I'm not sure that'd solve my use case specifically, unfortunately. The issue with WebAssembly allocators under this crate's current API design isn't just that the module-provided allocators can fail to allocate. Ultimately, the allocator is just arbitrary instructions provided by the loaded module, so the module (and host) may do anything, including moving or remapping the module memory within host memory. That makes the current pointer-based implementation UB in the presence of a malicious, buggy, or otherwise unexpected allocator implementation.

@ObsidianMinor

ultimately, the allocator is just arbitrary instructions provided by the loaded module, so the module (and host) may do anything, including moving or remapping the module memory within host memory, making the current pointer-based implementation UB for a malicious, buggy or unexpected allocator implementation

If the host can just drop the memory out from under your feet (even if you're holding a mutable reference to it) that doesn't sound like a stable place to perform complex write operations anyway. It might seriously be faster to copy segments directly into the wasm memory instead of having to do at least 3 redirects every time you want to write something somewhere.

In fact, how would this work given that the arena stores pointers to the start of each segment? Are we implying the arena doesn't actually store segments anymore? So now it's dynamic, which makes it actually 4 redirects.

  • You'd have to ask the capnp allocator where the segment currently is.
  • The capnp allocator offsets into the module memory to get the start of the segment memory.
  • Then you'd offset into the segment to reach the start of the builder.
  • Then you'd offset into the data you're building to write the value.

Now you could coalesce a lot of this I suppose and redefine the allocator trait to look something like

pub trait Allocator {
    fn alloc(&self, amount: u32) -> (SegmentId, u32);
    fn write_bit(&self, segment: SegmentId, slot: u64, value: bool);
    fn write_u8(&self, segment: SegmentId, slot: u64, value: u8);
    fn write_u16(&self, segment: SegmentId, slot: u64, value: u16);
    fn write_u32(&self, segment: SegmentId, slot: u64, value: u32);
    fn write_u64(&self, segment: SegmentId, slot: u64, value: u64);
}

But once again, every single time you write anywhere you'll hit a redirect. The compiler most likely won't be able to optimize writes in any way.

@Joeoc2001
Author

Joeoc2001 commented Feb 18, 2024

that doesn't sound like a stable place to perform complex write operations anyway

I think there's an important distinction between 'can' and 'will' here. The module's alloc method isn't known to the Rust compiler at compile time, since we load the WebAssembly module at run time, so any 0-copy encoder into a module's memory built on the current Cap'n Proto API will be unsound: a malicious module could do very weird things every time a new segment is allocated. However, any sane and reasonable module won't do any of this, so considering the situation unstable is more a comment on the source of the modules being run than on the use case as a whole. The common case should be amenable to optimisation.

So now it's dynamic, which makes it actually 4 redirects.

Potentially, but probably not for most WebAssembly host implementations, if they can guarantee that the instance memory is contiguous in host memory. Then it'd just be a double offset, where the Arena stores offsets into the instance memory giving the starts of segments, instead of raw pointers. Also, the segment locations could only change at the point of calling alloc, so all writes to a single segment before it fills will use the same static offset to map from instance to host memory.
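The double-offset scheme described above could be sketched like this (all names hypothetical; a `Vec<u8>` stands in for the instance's linear memory): the arena records each segment's start as an offset into the instance memory and resolves it against the current host buffer only at write time, so the stored offsets stay valid even if the memory grows and moves.

```rust
struct WasmArena {
    instance_memory: Vec<u8>,   // stand-in for the module's linear memory
    segment_starts: Vec<usize>, // byte offset of each segment's start
}

impl WasmArena {
    fn write_u64(&mut self, segment: usize, slot: usize, value: u64) {
        let base = self.segment_starts[segment]; // offset 1: segment start
        let at = base + slot * 8;                // offset 2: slot in segment
        self.instance_memory[at..at + 8].copy_from_slice(&value.to_le_bytes());
    }
}
```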

My ideal trait would look something like this:

pub trait Allocator {
    type Segment;
    fn alloc(&mut self, amount: u32) -> (Self::Segment, u32);
    fn write_bit(&mut self, segment: Self::Segment, slot: u64, value: bool);
    fn write_u8(&mut self, segment: Self::Segment, slot: u64, value: u8);
    fn write_u16(&mut self, segment: Self::Segment, slot: u64, value: u16);
    fn write_u32(&mut self, segment: Self::Segment, slot: u64, value: u32);
    fn write_u64(&mut self, segment: Self::Segment, slot: u64, value: u64);
}

Then the current Allocator implementations could implement Segment to be *const u8, adding no overhead to existing implementations. Or they could implement it to be usize, and have the Allocator have a Vec<Box<[u8]>> inside, allowing all unsafe code to be removed potentially at the cost of an extra offset calculation on each write (depending on what Rustc can optimise it down to).
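The all-safe variant could look something like this, a sketch assuming `Segment = usize` indexing into a `Vec<Box<[u8]>>` (the trait is trimmed to one write method for brevity; all names are hypothetical):

```rust
trait Allocator {
    type Segment: Copy;
    fn alloc(&mut self, words: u32) -> (Self::Segment, u32);
    fn write_u64(&mut self, segment: Self::Segment, slot: u64, value: u64);
}

struct SafeArena {
    segments: Vec<Box<[u8]>>,
}

impl Allocator for SafeArena {
    type Segment = usize;

    fn alloc(&mut self, words: u32) -> (usize, u32) {
        self.segments
            .push(vec![0u8; words as usize * 8].into_boxed_slice());
        (self.segments.len() - 1, words)
    }

    fn write_u64(&mut self, segment: usize, slot: u64, value: u64) {
        // The extra per-write offset calculation mentioned above:
        // slot number -> byte index within the segment.
        let at = slot as usize * 8;
        self.segments[segment][at..at + 8].copy_from_slice(&value.to_le_bytes());
    }
}
```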

@ObsidianMinor

ObsidianMinor commented Feb 18, 2024

My ideal trait would look something like this:

pub trait Allocator {
    type Segment;
    fn alloc(&mut self, amount: u32) -> (Self::Segment, u32);
    fn write_bit(&mut self, segment: Self::Segment, slot: u64, value: bool);
    fn write_u8(&mut self, segment: Self::Segment, slot: u64, value: u8);
    fn write_u16(&mut self, segment: Self::Segment, slot: u64, value: u16);
    fn write_u32(&mut self, segment: Self::Segment, slot: u64, value: u32);
    fn write_u64(&mut self, segment: Self::Segment, slot: u64, value: u64);
}

This would require every capnp builder type to be generic over the Allocator itself, which would inherently cause every downstream user type to be generic over the Allocator as well. It doesn't really lend itself to ease of use.
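A minimal illustration of that "infection": once the builder is generic over its allocator, every downstream type that stores a builder must carry the parameter too. All names here are hypothetical.

```rust
struct Builder<A> {
    allocator: A,
}

// A downstream user type is forced to become generic as well:
struct MyEncoder<A> {
    builder: Builder<A>,
}

impl<A> MyEncoder<A> {
    fn new(allocator: A) -> Self {
        MyEncoder { builder: Builder { allocator } }
    }
}
```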

@dwrensha
Member

This would require every capnp builder type to be generic over the Allocator itself,

That is something I've considered before, and similarly on the read side having every Reader type be generic over ReaderSegments. I expect that this would improve performance by eliminating a bunch of dynamic dispatch. It would also let us remove the unaligned and sync_reader feature flags:

unaligned = []

sync_reader = []

because that information could live in the traits instead.
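For instance, the alignment guarantee could become an associated const on the segment-source trait itself. This is a hypothetical shape, not the crate's actual ReaderSegments trait:

```rust
trait ReaderSegments {
    /// True if all segments are guaranteed 8-byte aligned; an unaligned
    /// source would set this to false, replacing the crate-wide
    /// `unaligned` cargo feature.
    const ALIGNED: bool;

    fn get_segment(&self, id: u32) -> Option<&[u8]>;
}

struct AlignedVecSegments {
    segments: Vec<Vec<u64>>, // u64 storage guarantees 8-byte alignment
}

impl ReaderSegments for AlignedVecSegments {
    const ALIGNED: bool = true;

    fn get_segment(&self, id: u32) -> Option<&[u8]> {
        self.segments.get(id as usize).map(|s| {
            // Reinterpret &[u64] as &[u8]; shown with a manual cast to
            // stay dependency-free (a crate like bytemuck would also work).
            unsafe { std::slice::from_raw_parts(s.as_ptr() as *const u8, s.len() * 8) }
        })
    }
}
```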

It doesn't really lend itself to ease of use.

Yep, that's the cost. My guess is that the cost outweighs the benefits. But still -- it could be interesting to investigate more deeply to get a better sense of just what it would look like.

@dwrensha
Member

dwrensha commented Feb 19, 2024

The bug described in #484 is maybe tangentially related to the above discussion. The bug would have been avoided if Allocator had a get_segment(u32) method. (Currently Arena assumes that segment memory does not move.)
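A sketch of that suggested shape (hypothetical names): the arena re-queries the allocator for a segment's current location on every access, instead of caching a pointer that goes stale when the backing memory moves. The toy implementation below packs all segments into one `Vec<u8>` that may reallocate on growth, exactly the kind of movement that a cached pointer would miss.

```rust
trait Allocator {
    fn allocate_segment(&mut self, min_words: u32) -> u32; // returns segment id
    fn get_segment(&mut self, id: u32) -> &mut [u8];       // current location
}

struct GrowableArena {
    buf: Vec<u8>,                  // may reallocate (and move) on growth
    segments: Vec<(usize, usize)>, // (start, len) in bytes
}

impl Allocator for GrowableArena {
    fn allocate_segment(&mut self, min_words: u32) -> u32 {
        let start = self.buf.len();
        self.buf.resize(start + min_words as usize * 8, 0); // may move buf
        self.segments.push((start, min_words as usize * 8));
        (self.segments.len() - 1) as u32
    }

    fn get_segment(&mut self, id: u32) -> &mut [u8] {
        // Resolved fresh on every call, so movement of `buf` is harmless.
        let (start, len) = self.segments[id as usize];
        &mut self.buf[start..start + len]
    }
}
```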

3 participants