Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: introduce kj::CopyOnWrite<T>, kj::AtomicCopyOnWrite<T> #1934

Open
jasnell opened this issue Feb 3, 2024 · 8 comments
Open

Proposal: introduce kj::CopyOnWrite<T>, kj::AtomicCopyOnWrite<T> #1934

jasnell opened this issue Feb 3, 2024 · 8 comments

Comments

@jasnell
Copy link
Collaborator

jasnell commented Feb 3, 2024

There are a range of cases where Copy-on-write semantics would be helpful and could help performance (e.g. avoiding copies on iterators, for instance, when iterating over a set of headers .. see api/http.c++ in the workerd codebase). It would be useful for kj to have a simple mechanism built-in.

// T is expected to be Refcounted, or will be wrapped a Refcounted
template <typename T, typename StaticCopier = decltype(nullptr)>
class CopyOnWrite {
public:
  CopyOnWrite(CopyOnWrite&& other);
  CopyOnWrite& operator=(CopyOnWrite&& other);
  KJ_DISALLOW_COPY(CopyOnWrite);

  CopyOnWrite addRef();

  // Returns true if there are multiple refs to the underlying inner. When true, calling
  // write() will end up copying the underlying inner.
  bool isShared() const;

  // Get a mutable reference to the underlying value, copying it first if the reference
  // count on inner is > 1. Uses should never hold the mutable T& reference directly.
  // By default, copy uses the copy assignment defined by T. If a StaticCopier is specified
  // in the template, that will be used instead. It must accept a single `const Own<T>&` as the
  // argument and return a new `Own<T>`. The returned kj::Own should be refcounted. If
  // it is not, it will be wrapped in a Refcounted wrapper.
  //
  // The copy is performed in-place, replacing the current inner with the new inner.
  // e.g. inner = kj::refcounted<T>(*inner) ...
  T& write();

  // Like write except an exception will be thrown if there are multiple references to inner
  // rather than copying it.
  T& writeWithoutCopy();

  // This class *intentionally* does not expose the raw pointer to T on inner.

  // Get a const reference to the underlying value.
  const T& read() const;
  operator const T&() const;

  // Release the underlying value, making the CopyOnWrite instance empty.
  kj::Own<T> release();

  // True only this and the other CopyOnWrite instance share the same underlying inner.
  bool operator==(const CopyOnWrite& other) const;
  bool operator!=(const CopyOnWrite& other) const;
  bool operator==(decltype(nullptr));

private:
  // Refcounted kj::Own
  kj::Own<T> inner;
};

// T is expected to be AtomicRefcounted. Otherwise the semantics are the same as
// CopyOnWrite
template <typename T, typename StaticCopier = decltype(nullptr)>
class AtomicCopyOnWrite {
public:
  AtomicCopyOnWrite(AtomicCopyOnWrite&& other);
  AtomicCopyOnWrite& operator=(AtomicCopyOnWrite&& other);
  KJ_DISALLOW_COPY(CopyOnWrite);

  AtomicCopyOnWrite addRef();

  T& write();
  T& writeWithoutCopy();

  // This class *intentionally* does not expose the raw pointer held on inner.

  const T& read() const;
  operator const T&() const;

  kj::Own<T> release();

  bool operator==(const AtomicCopyOnWrite& other) const;
  bool operator!=(const AtomicCopyOnWrite& other) const;
  bool operator==(decltype(nullptr));

private:
  // AtomicRefcounted kj::Own
  kj::Own<T> inner;
};

// Equivalent to kj::refcounted<...>, but returning a CopyOnWrite<T>
template <typename T, typename ... Params>
static CopyOnWrite<T> cow(Params... params);

// Equivalent to kj::atomicRefcounted<...>, but returning an AtomicCopyOnWrite<T>
template <typename T, typename ... Params>
static AtomicCopyOnWrite<T> atomicCow(Params...params);

Example use:

// If foo wasn't refcounted, then it would be wrapped in a refcounted wrapper.
struct Foo : public kj::Refcounted {
  kj::String bar;
  Foo(kj::String bar) : bar(kj::mv(bar)) {}
  Foo(const Foo& other) : bar(kj::str(other.bar)) {}
}

kj::CopyOnWrite<Foo> foo = kj::cow<Foo>(kj::str("hello"));
kj::CopyOnWrite<Foo> ref = foo.addRef();

foo.write().bar = kj::str("there");

void sayHello(const Foo& a, const Foo& b) {
  KJ_DBG(a.bar, b.bar);
}

sayHello(ref, foo);  // hello, there
@jasnell
Copy link
Collaborator Author

jasnell commented Feb 3, 2024

/cc @mikea ... this may have some interesting intersection with some of your other in-progress smart ptr proposals

@kentonv
Copy link
Member

kentonv commented Feb 5, 2024

I wonder if this could just be an inherent feature of Rc / Arc: just a method like getUniqueCopy() which first checks if the refcount is greater than 1, clones the object if so, and then returns a reference to the result. (In the case of Arc, this gets you a non-const reference, whereas normally dereferencing Arc always gives you const.)

In any case I have nothing against the idea in theory, but to merge this I'd want there to be at least one real-world use queued up immediately where the utility clearly makes the code better. It sounds like you've identified some already.

@jasnell
Copy link
Collaborator Author

jasnell commented Feb 5, 2024

Yep, currently in the workerd codebase we end up copying Headers for every iterator just in case they are modified. There's even a TODO comment in there indicating that a copy-on-write strategy would be much more efficient. I'm sure there are likely other cases but that's the immediate "real world use" queued up.

@jasnell
Copy link
Collaborator Author

jasnell commented Feb 5, 2024

Briefly discussed with @mikea ... I definitely think this can be worked into the kj::Rc and kj::Arc proposals ... I do think we need to consider whether the write modifies in place or creates a new ref... that is, one of the following two choices:

// Modify the Rc/Arc in place:

struct Foo { int x = 0 };

kj::Rc<Foo> foo = kj::refcounted<Foo>();
kj::Rc<Foo> ref = foo.addRef(); // ref gets sent off somewhere else

foo.write()->x = 1;  // copies only if refcount > 1 and replaces the inner Foo in-place, otherwise modify in place

// As opposed to...

kj::Rc<Foo> foo = kj::refcounted<Foo>();
kj::Rc<Foo> ref = foo.addRef();  // ref gets sent off somewhere else

kj::Rc<Foo> foo2 = foo.clone();  // aways copies, even if refcount is 1
foo2->x = 1;

The second option feels more aligned with kj-style but I tend to prefer the first option style-wise.

The one significant drawback to this approach, however, is that the copy-on-write semantics could be easily missed, which could introduce some subtle bugs. For instance, if someone forgot to use this additional API when given a kj::Rc<T> or kj::Arc<T>. Using a separate type makes it much more explicit.

// I want copy-on-write semantics for edits to Foo
kj::Rc<Foo> foo = ...
foo->x = 1; // oops should have called clone() or write() ...

// vs... 

kj::CopyOnWrite<Foo> foo = ...
foo.write()->x = 1;  // no choice but to call write ... no way to accidentally oops

@kentonv
Copy link
Member

kentonv commented Feb 5, 2024

Maybe Rc<const T> represents a copy-on-write T. You call .makeWritable() on it, which consumes the original reference and gives you an Rc<T>. This makes a copy if the refcount was more than 1, or just const-casts if it was 1.

@jasnell
Copy link
Collaborator Author

jasnell commented Feb 5, 2024

kj::Rc<const Foo> foo = kj::refcounted<const Foo>();
kj::Rc<const Foo> ref = foo.addRef(); // Send ref off to something else...

kj::Rc<Foo> mutFoo = foo.makeWritable();

// Tho it does feel a bit weird for kj::Rc<T> to also have this "makeWritable" 
// maybe instead....?

kj::Rc<Foo> mutFoo = kj::copyOnWrite(kj::mv(foo));

// Then to get another copy-on-write Rc... 

kj::Rc<const Foo> constFoo = mutFoo.makeConst();  // consumes mutFoo

@kentonv
Copy link
Member

kentonv commented Feb 6, 2024

it does feel a bit weird for kj::Rc to also have this "makeWritable"

You can use SFINAE to make sure it's only there if T is const.

Maybe copyIfShared is a better name than makeWritable or copyOnWrite?

@jasnell
Copy link
Collaborator Author

jasnell commented Feb 6, 2024

ok, I think this approach works. I'll work up a PR based on the kj::Rc/kj::Arc stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants