Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webauthn-rs credential serialisation migration #368

Open
micolous opened this issue Oct 11, 2023 · 10 comments
Open

webauthn-rs credential serialisation migration #368

micolous opened this issue Oct 11, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@micolous
Copy link
Collaborator

I've been mulling this over and discussing with @Firstyear about what our credential serialisation story looks like, and how we could improve it.

This write-up is a draft, feedback is welcome. I intend to update this post as things go.

Note: in this write-up, "Credential" refers to both webauthn_rs_core::interface::Credential and webauthn_rs_core::interface::CredentialV3. "CredentialV4" will refer to the current webauthn_rs_core::interface::Credential type alone.

Present state

Credential types derive serde's Serialize and Deserialize traits with their default implementation. A library user can then use a serde serialisation framework (eg: serde_json, serde_cbor) to store a Credential.

CredentialV3 has a one-way migration to CredentialV4, and all APIs work can only work with CredentialV4.

Considerations and ideas

Storage efficiency

CredentialV4 migrated from storing identifiers as Vec<u8> to Base64UrlSafeData, which reduced the serialised size of credentials in JSON (to 37% of original), but increased it for other binary formats to about 135% of previous.

While the overhead is around 50 bytes per record, this works out to about 1 GiB for every 21.4M Credentials, which is more meaningful for larger identity providers, particularly after accounting for replication.

Non-self-describing formats

We currently require use of self-describing serialisation formats (#352).

We could implement Serialize and Deserialize explicitly to emit an opaque blob of bytes, that way it doesn't matter anymore – they just need to be able to handle a Vec<u8>. Something using JSON could wrap that in Base64, but that's at a layer we don't really care too much about.

This would change the "inner format" used for serialisation, but these shouldn't be an API contract.

Version control

We've migrated storage formats once before, going from V3 to V4. However, as CredentialV4 is closely coupled with other APIs, this is a one-way migration, and all credentials are immediately rewritten as CredentialV4.

This is a problem for a distributed system, where schema skew needs to be managed: all readers need to be able to read a new format before anything can write in the new format.

It should be possible to migrate in a more controlled manner, so when there's a CredentialV5, updated readers will can continue to write back in CredentialV4 format until everything can read CredentialV5. CredentialV5 credentials would be written as CredentialV5, and not rolled back to CredentialV4.

This version management could be largely done within the deserialisation layer, though it may affect the available functionality. Serialisation and deserialisation would also need to retain any unknown fields.

We'd also need to consider how far back in schema versions we want to support.

Ideas

I propose to make that inner format CBOR – mainly because we already need to ship a CBOR implementation for other parts of CTAP. We can use integer field IDs (rather than strings) with native binary types for bytes-like values to improve storage efficiency.

Version handling is that CredentialV5 can mostly continue to evolve, but the serialisation layer would need to know about what version of the structure is allowed.

It will need some sensible defaults wrapped around it.

@micolous micolous added the enhancement New feature or request label Oct 11, 2023
@smessmer
Copy link
Contributor

Why CBOR and not bincode for the inner format? Bincode is a bit more compact. What we're doing currently in our manual serialization logic is:

Serialize the whole credential into bincode, store the Vec<u8> into a struct that has a format version number, then serialize that struct with CBOR. It's a bit more complicated but it gives us a better compression ratio than using CBOR everywhere, and the outer CBOR allows us to use a stable format to read the version number from in case bincode changes how it works.

Btw, while credential is the most important data structure to think about, we also care about storage of PasskeyRegistration and PasskeyAuthentication.

@Firstyear
Copy link
Member

Why CBOR and not bincode for the inner format? Bincode is a bit more compact. What we're doing currently in our manual serialization logic is:

We don't want to add an extra dependency / moving part. We already have to support cbor, so we can use it. There are lots of things we can do because of this to make it quite small.

Serialize the whole credential into bincode, store the Vec<u8> into a struct that has a format version number, then serialize that struct with CBOR. It's a bit more complicated but it gives us a better compression ratio than using CBOR everywhere, and the outer CBOR allows us to use a stable format to read the version number from in case bincode changes how it works.

I understand that you want your problem solved in a certain way, but we have to consider our developer time also. Complex makes our lives harder as the maintainers of this library.

We're already putting up a lot of time to resolve the issues you are raising, and we have had lots of discussions about how to handle them to help you. But please also consider that we are trying to think about this for other library users, their experiences, and our own time to maintain and develop this.

Btw, while credential is the most important data structure to think about, we also care about storage of PasskeyRegistration and PasskeyAuthentication.

Yep, that's something to consider.

@smessmer
Copy link
Contributor

smessmer commented Oct 11, 2023

Why CBOR and not bincode for the inner format? Bincode is a bit more compact. What we're doing currently in our manual serialization logic is:

We don't want to add an extra dependency / moving part. We already have to support cbor, so we can use it. There are lots of things we can do because of this to make it quite small.

Makes sense. It's a bit unfortunate that this takes away the choice of serialization format from the developer so developers can't use bincode even if they are ok with adding the extra dependency. But I understand the reasoning. There might not be a way to leave that choice to developers while still handling versioning inside of the library.

I'll take a look at how large the serialization blobs are after the change and if they're too large, we can just keep doing what we're doing today, replicating the Credential struct and using our own serialization format. It's not great but it works. At least for Credentials. For PasskeyRegistration and PasskeyAuthentication, that currently doesn't work because we can't access the internals.

We're already putting up a lot of time to resolve the issues you are raising, and we have had lots of discussions about how to handle them to help you. But please also consider that we are trying to think about this for other library users, their experiences, and our own time to maintain and develop this.

I appreciate the time you're spending on helping me. My suggestions are just that, suggestions and ideas. It's up to you to decide what makes sense for the library. WhatsApp will likely be one of the projects with the largest number of users running through code from this library, but you want the library to be adopted and successful, and for that you have to focus on what most developers need. In terms of that, we're only a small number of developers and have requirements different from most. We're willing do use workarounds as long as the library leaves us a way to do that.

@Firstyear
Copy link
Member

Makes sense. It's a bit unfortunate that this takes away the choice of serialization format from the developer so developers can't use bincode even if they are ok with adding the extra dependency. But I understand the reasoning. There might not be a way to leave that choice to developers while still handling versioning inside of the library.

Yesterday @micolous and I met in person and spoke about it and since then I had some more thoughts on the topic, so I want to talk to them some more because there may be a way to do this without taking away the choice of serialisation from users. So give me a bit to talk them :)

I'll take a look at how large the serialization blobs are after the change and if they're too large, we can just keep doing what we're doing today, replicating the Credential struct and using our own serialization format. It's not great but it works. At least for Credentials. For PasskeyRegistration and PasskeyAuthentication, that currently doesn't work because we can't access the internals.

Yep. We do need to consider how to handle passkeyreg and auth to help here too.

I appreciate the time you're spending on helping me. My suggestions are just that, suggestions and ideas. It's up to you to decide what makes sense for the library. WhatsApp will likely be one of the projects with the largest number of users running through code from this library, but you want the library to be adopted and successful, and for that you have to focus on what most developers need. In terms of that, we're only a small number of developers and have requirements different from most. We're willing do use workarounds as long as the library leaves us a way to do that.

I expect that's true - Discord already uses this library, but I expect Whatsapp would dwarf that. And of course, we're immensely proud and humbled you would trust our work with your users security. We want to do the best for those people, so we would really like you to use this library!

And as you say, we have to consider many developers, not just your requirements. And so we have to really consider a few different things because especially at the scale you're working at, there are problems you will face that no one else will. Of the hundreds of thousands of downloads of this library, I suspect that only 3 projects actually will face the issues you have, and one of them is something I work on so it needs to be considered too.

We are also only a small number of developers too. It seems to be a theme in IT doesn't it :)

Anyway, after some more thought and reflection, I realised the issue is primarily about version migration between structs, not serialisation, so I think there is actually a way to do migrations without us having to take away the serialisation interfaces. I've sent it to @micolous in another medium to discuss a bit more, but I think it's workable.

@Firstyear
Copy link
Member

So update regarding the approach here.

Stepping back I realised the issue here wasn't base64urlsafe, serialisation formats, or anything else. The core of the problem is "safe migrations" between credential versions. I think we focused too much on the serialisation challenges and got lost in that, when there was a simpler option.

  • We improve versioning of the credentials allowing migrations, but without forcing "upgrade the world". This is important for distributed systems, and will affect Whatsapp, Discord and Kanidm. Likely we suspect this means we'll have CredentialV4, CredentialV5 etc. and they impl a common trait that the library requires to process these in safe default manners (especially re-field addition).
  • We will need to version PasskeyAuth/Reg as well I expect.
  • Set an API version at library startup that defines the min version of credentials the library accepts. This defines credential features and what version we emit.
  • We move to humanBinaryData in a new credential version that respects the "human readable" serde flag, and we remove base64urlsafe.
  • We leave serialisation exposed so that users can choose whatever format they wish.

We will say @smessmer that reading about bitcode, it's format isn't "stable" per the readme on https://crates.io/crates/bincode so you need to be careful here and do your own versioning on the outside of this.

@micolous Does this gel will what we just spoke about?

@yaleman
Copy link
Member

yaleman commented Oct 12, 2023

We leave serialisation exposed so that users can choose whatever format they wish.

That could even be feature-gated if we wanted to make it explicit that you're doing a weird thing? serde is a pretty common flag on libs.

@smessmer
Copy link
Contributor

That approach with versioning within the library while allowing developers to choose the serialization format sounds great. Thanks.

We will say @smessmer that reading about bitcode, it's format isn't "stable" per the readme on https://crates.io/crates/bincode so you need to be careful here and do your own versioning on the outside of this.

Yes, we have our own versioning and will keep doing that. Thanks for the callout.

@Firstyear
Copy link
Member

That approach with versioning within the library while allowing developers to choose the serialization format sounds great. Thanks.

We have been (slowly) progressing with this, our attentions were elsewhere for a small while though.

@jgiacomoni
Copy link

+1 for this discussion and being able to serialize to a binary format for storage and performance (Link)

For what it is worth, I began looking at bincode but will likely use postcard as it has a stable wire format. right now I'm using serde_cbor.

Thanks!

@Firstyear
Copy link
Member

With the merged changes to our binary code using the HumanBinary format now, this should be working for most persons now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants