Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Derive macro for From<T> for Ipld and TryFrom<Ipld> for T #508

Open
QuinnWilton opened this issue Jan 18, 2024 · 2 comments
Open

Derive macro for From<T> for Ipld and TryFrom<Ipld> for T #508

QuinnWilton opened this issue Jan 18, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@QuinnWilton
Copy link
Contributor

QuinnWilton commented Jan 18, 2024

To support encoding and decoding our types using dag-json and dag-cbor, as well as for internal, general Ipld use, we manually implement conversions for our struct types to/from Ipld. This is something we should be able to clean up considerably with some derive macros.

The current approach

Right now, wiring up a type to be encoded to and from dag-json and dag-cbor involves three steps:

  1. Implementing From<T> for Ipld
  2. Implementing TryFrom<Ipld> for T
  3. Implementing DagJson for T
  4. Implementing DagCbor for T

For example, here's what those first two implementations look like for the Receipt<Ipld> type:

impl From<Receipt<Ipld>> for Ipld {
    fn from(receipt: Receipt<Ipld>) -> Self {
        From::from(&receipt)
    }
}

impl TryFrom<Ipld> for Receipt<Ipld> {
    type Error = WorkflowError<Unit>;

    fn try_from(ipld: Ipld) -> Result<Self, Self::Error> {
        let map = from_ipld::<BTreeMap<String, Ipld>>(ipld)?;

        let ran = map
            .get(RAN_KEY)
            .ok_or_else(|| WorkflowError::<Unit>::MissingField(RAN_KEY.to_string()))?
            .try_into()?;

        let out = map
            .get(OUT_KEY)
            .ok_or_else(|| WorkflowError::<Unit>::MissingField(OUT_KEY.to_string()))?;

        let meta = map
            .get(METADATA_KEY)
            .ok_or_else(|| WorkflowError::<Unit>::MissingField(METADATA_KEY.to_string()))?;

        let issuer = map
            .get(ISSUER_KEY)
            .and_then(|ipld| match ipld {
                Ipld::Null => None,
                ipld => Some(ipld),
            })
            .and_then(|ipld| from_ipld(ipld.to_owned()).ok())
            .map(Issuer::new);

        let prf = map
            .get(PROOF_KEY)
            .ok_or_else(|| WorkflowError::<Unit>::MissingField(PROOF_KEY.to_string()))?;

        Ok(Receipt {
            ran,
            out: InstructionResult::try_from(out)?,
            meta: meta.to_owned(),
            issuer,
            prf: UcanPrf::try_from(prf)?,
        })
    }
}

These implementations tend to be fairly rote and repetitive and primarily involve either constructing an Ipld::Map of the fields to be encoded, or it involves checking an Ipld::Map for all of the required keys, and then constructing the corresponding struct or enum.

Given these two definitions, we're able to implement our DagJson trait for a type, which will provide default implementations of methods for encoding the type to dag-json, or for decoding dag-json into our type:

pub trait DagJson
where
    Self: TryFrom<Ipld> + Clone,
    Ipld: From<Self>,
{
    ...
}

impl DagJson for Receipt<Ipld> {}

This works similarly for DagCbor as well.

Proposed Approach

We think there's a relatively straightforward approach to deriving most of this code using a few macros, and a slight shift in how we handle encoding/decoding. Roughly:

  1. Write a derive proc-macro, Ipld, for deriving the required From and TryFrom implementations.
  2. Remove the DagJson trait, and instead expose generic functions over any type that implements libipld-core::codec::Encode<DagJsonCodec> or libipldcore::codec::Decode<DagJsonCodec>.
  3. Use the existing DagCbor derive and expose a generic to_cid function over any type that implements libipld-core::codec::Encode<DagCborCodec> or libipldcore::codec::Decode<DagCborCodec>, replacing our internal type. This should be possible (e.g. https://github.com/Actyx/banyan/blob/master/banyan/src/index.rs#L477).
  4. Write a proc-macro, DagJson, for deriving the above Encode<DagJsonCodec> and Decode<DagJsonCodec> implementations.
  5. For 2 and 3, implement ref versions as well.

Ipld Derive Macro

This will be a fairly involved macro, but there is some prior art that we can crib a ton of code from.

In particular, libipld already provides a derive macro, DagCbor, for generating encode and decode implementations for a type under the dag-cbor encoding. It takes a similar approach to what we'll need to do, however, you'll note that it's directly handling the translation to and from dag-cbor, rather than going through Ipld first. This is unfortunately not something that we'll be able to easily do, as DagJsonCodec is implemented using Serde, for the Ipld type specifically. Due to this, I think we'll just want to walk through the enum variants or struct fields in much the same way, and instead deriving TryFrom<Ipld> and From<T> for Ipld, rather than targeting DagJson directly. These implementations can look essentially the same as what we've been writing manually.

The other main complication here is that we'll want to support honoring the common serde attributes, like serde(skip) or serde(rename = "fooBar"), during this translation, so that we're able to easily support common transformations, like encoding fields as camelCase.

Existing DagJson Trait

We're overconstraining implementers of our existing DagJson type, by requiring T: TryFrom<Ipld> and Ipld: From<T> on our encode + decode methods.

Instead, we can delete this trait entirely, and expose helpers that directly invoke Encode<DagJsonCodec>::encode and Decode<DagJsonCodec>::decode. Since these traits are already implemented for Ipld, we'll later be able to leverage this fact to support encoding and decoding types that can be converted to Ipld.

This keeps things a little more ergonomic by directly tapping into the existing libipld traits, and thus automatically supporting anything that implements them, rather than needing to manually enumerate the encodable and decodable types in our codebase.

DagJson Derive Macro

This would be a macro that generates Encode<DagJsonCodec> and Decode<DagJsonCodec> implementations for a type, such that T: TryFrom<Ipld> and Ipld: From<T>.

In the Encode case, it converts the value from T to Ipld, and then dispatches to the Encode implementation for Ipld.

In the Decode case, it decodes to Ipld, and then converts the resulting value to one of type T.

This macro essentially fulfills the same role as the old DagJson trait, which primarily existed to mark types as being encodable or decodable, but while doing so in a way that offers better interop with libipld codebase.

Testing Notes

Since the Ipld macro should generate nearly identical code to what we'd manually write today, all of our existing implementations should be lifted into conformance tests for the macro to provide a relatively risk-free migration path.

@QuinnWilton QuinnWilton added the enhancement New feature or request label Jan 18, 2024
@matheus23
Copy link

matheus23 commented Jan 19, 2024

I think it should be possible to use #[derive(serde::Serialize, serde::Deserialize)] on the type and go from there. You shouldn't need to write your own derive macro or trait implementations.

The serde Ipld stuff has some limitations, but we should try going that route as long as we don't hit them.
(Limitations include not being able to use type-based enum disambiguation.)

@zeeshanlakhani
Copy link
Contributor

zeeshanlakhani commented Jan 19, 2024

I think it should be possible to use #[derive(serde::Serialize, serde::Deserialize)] on the type and go from there. You shouldn't need to write your own derive macro or trait implementations.

The serde Ipld stuff has some limitations, but we should try going that route as long as we don't hit them. (Limitations include not being able to use type-based enum disambiguation.)

There's some reason why we don't want just Serialize/Deserialize. Just so you know, we already derive those. We use Ipld as the intermediate, to not have the Rust struct 1:1 with what's serialized/deserialized. Certain Ipld conversions remove expanded code in the struct on serialization and return that structure on deserialization for internal use. We can secondary structures that do this, but the thought was why not use Ipld directly? Otherwise, you get the serde internal format and wire format situation, which I've been down as well haha. Tradeoffs for sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants