Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add schemars to ZeroVec #4792

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ashu26jha
Copy link
Contributor

@ashu26jha ashu26jha commented Apr 10, 2024

This PR intends to add schemars support to ZeroVec. Implemented JsonSchema for following:

  1. ZeroVec
  2. VarZeroVec
  3. ZeroSlice
  4. ZeroVecError

Added a basic test, just to check the generation. The JSON generated from the test:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DataStruct",
  "type": "object",
  "required": [
    "chars",
    "nested_numbers",
    "nums",
    "strs"
  ],
  "properties": {
    "chars": {
      "$ref": "#/definitions/ZeroVec<Character>"
    },
    "nested_numbers": {
      "$ref": "#/definitions/VarZeroVec<ZeroSlice<uint32>>"
    },
    "nums": {
      "$ref": "#/definitions/ZeroVec<uint32>"
    },
    "strs": {
      "$ref": "#/definitions/VarZeroVec<String>"
    }
  },
  "definitions": {
    "VarZeroVec<String>": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "VarZeroVec<ZeroSlice<uint32>>": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/ZeroSlice<uint32>"
      }
    },
    "ZeroSlice<uint32>": {
      "type": "object",
      "required": [
        "data"
      ],
      "properties": {
        "data": {
          "type": "array",
          "items": {
            "type": "integer",
            "format": "uint8",
            "minimum": 0.0
          }
        },
        "error": {
          "$ref": "#/definitions/ZeroVecError"
        }
      }
    },
    "ZeroVec<Character>": {
      "type": "array",
      "items": {
        "type": "string",
        "maxLength": 1,
        "minLength": 1
      }
    },
    "ZeroVec<uint32>": {
      "type": "array",
      "items": {
        "type": "integer",
        "format": "uint32",
        "minimum": 0.0
      }
    },
    "ZeroVecError": {
      "description": "ZeroVecError is an enum representing errors that can occur during the decoding of slices of ULE",
      "type": "string",
      "enum": [
        "InvalidLength",
        "ParseError",
        "VarZeroVecFormatError"
      ]
    }
  }
}

Marking this part of #2118, as it may require more work.

Implement JsonSchema:

@ashu26jha ashu26jha requested review from Manishearth, sffc and a team as code owners April 10, 2024 16:03
Comment on lines +560 to +581
fn json_schema(gen: &mut SchemaGenerator) -> Schema {
// Instead of generating a subschema for T, we generate a schema representing the byte array
let byte_schema = gen.subschema_for::<Vec<u8>>();

SchemaObject {
instance_type: Some(InstanceType::Object.into()),
object: Some(Box::new({
let mut object_validation = schemars::schema::ObjectValidation::default();
object_validation
.properties
.insert("data".to_string(), byte_schema);
object_validation
.properties
.insert("error".to_string(), gen.subschema_for::<ZeroVecError>());
object_validation.required.insert("data".to_string());
object_validation
})),
..Default::default()
}
.into()
}
}
Copy link
Contributor Author

@ashu26jha ashu26jha Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ maintainers I am not certain if this is the right way to move forward

@@ -37,6 +38,7 @@ serde = { version = "1.0", default-features = false, features = ["alloc"], optio
# and all 0.7 versions, but not further.
yoke = { version = ">=0.6.0, <0.8.0", path = "../yoke", optional = true }
twox-hash = { version = "1.6", default-features = false, optional = true }
schemars = "0.8.16"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this must be an optional dependency

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and please add it to the top-level Cargo.toml (ideally wait until #4844 is merged)

@@ -5,6 +5,19 @@
use core::any;
use core::fmt;

use crate::__zerovec_internal_reexport::boxed::Box;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: just alloc::boxed::Box


fn json_schema(gen: &mut SchemaGenerator) -> Schema {
// Instead of generating a subschema for T, we generate a schema representing the byte array
let byte_schema = gen.subschema_for::<Vec<u8>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why? ZeroSlice is logically an array of items

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, ZeroVec and ZeroSlice should both become arrays of T.

Maybe they can resolve to the same JsonSchema impl.

Copy link
Contributor Author

@ashu26jha ashu26jha Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, expected interaction should be as array not bytes. Lets fallback to implementing as arrays of T.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this!

nested_numbers: VarZeroVec<'data, ZeroSlice<u32>>,
}

#[test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: use insta to help generate test cases and check-in the JSON Schema output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will look into it 👍


fn json_schema(gen: &mut SchemaGenerator) -> Schema {
// Instead of generating a subschema for T, we generate a schema representing the byte array
let byte_schema = gen.subschema_for::<Vec<u8>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, ZeroVec and ZeroSlice should both become arrays of T.

Maybe they can resolve to the same JsonSchema impl.

let schema = gen.into_root_schema_for::<DataStruct>();
let schema_json =
serde_json::to_string_pretty(&schema).expect("Failed to serialize schema");
println!("{}", schema_json);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please include the generated JSON.


#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(JsonSchema)]
pub struct DataStruct<'data> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: for in-file unit tests, make one test for each of these. For example,

#[test]
fn test_schema_zerovec_u32() {
    let gen = SchemaGenerator::default();
    let schema = gen.into_root_schema_for::<ZeroVec<u32>>();
    // ...
}

As suggested below, use insta for the more complex tests.

@sffc
Copy link
Member

sffc commented Apr 12, 2024

  "properties": {
    "chars": {
      "$ref": "#/definitions/ZeroVec<Character>"
    },
    "nested_numbers": {
      "$ref": "#/definitions/VarZeroVec<ZeroSlice<uint32>>"
    },
    "nums": {
      "$ref": "#/definitions/ZeroVec<uint32>"
    },
    "strs": {
      "$ref": "#/definitions/VarZeroVec<String>"
    }
  },

While technically correct, I think I would expect these to be inlined at the call site rather than being stuffed into the definitions section. I imagine the definitions section being useful when you have deeply nested structs or a struct that gets used multiple times, but these vecs should just be vecs.

Is this configurable in the JsonSchema generator, or is it a result of the JsonSchema trait impl?

@sffc
Copy link
Member

sffc commented Apr 12, 2024

In other words, I think my expectation for the JSON Schema of your sample struct would be

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DataStruct",
  "type": "object",
  "required": [
    "chars",
    "nested_numbers",
    "nums",
    "strs"
  ],
  "properties": {
    "chars": {
      "type": "array",
      "items": {
        "type": "string",
        "maxLength": 1,
        "minLength": 1
      }
    },
    "nested_numbers": {
      "type": "array",
      "items": {
        "type": "array",
        "items": {
          "type": "integer",
          "format": "uint32",
          "minimum": 0.0
        }
      }
    },
    "nums": {
      "type": "array",
      "items": {
        "type": "integer",
        "format": "uint32",
        "minimum": 0.0
      }
    },
    "strs": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}

@ashu26jha
Copy link
Contributor Author

ashu26jha commented Apr 12, 2024

While technically correct, I think I would expect these to be inlined at the call site rather than being stuffed into the definitions section. I imagine the definitions section being useful when you have deeply nested structs or a struct that gets used multiple times, but these vecs should just be vecs.
Is this configurable in the JsonSchema generator, or is it a result of the JsonSchema trait impl?

@sffc Found something in the docs and in thier codebase:

What does docs mentions: is_referenceable() returns false for simpler schemas, but for complex/recursive ones it returns true. But the schema I tested is probably the simpliest of our use cases, not sure why we are getting references. is_referenceable(), $ref

What does code mentions: Under SchemaSettings we do indeed have inline_subschemas by default this is false, we can set this to true. Few references still be generated for recursive type (We do want to keep this).

I could give setting inline_subschemas to true a shot and see how it pans out.

In other words, I think my expectation for the JSON Schema of your sample struct would be

Yes indeed, myself not a fan of very big schemas. Having schemas too big basically defies the purpose.

@ashu26jha
Copy link
Contributor Author

ashu26jha commented Apr 13, 2024

@sffc @Manishearth

New JSON, I was able to generate with setting inline_subschemas to true :

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DataStruct",
  "type": "object",
  "required": [
    "chars",
    "nested_numbers",
    "nums",
    "strs"
  ],
  "properties": {
    "chars": {
      "type": "array",
      "items": {
        "type": "string",
        "maxLength": 1,
        "minLength": 1
      }
    },
    "nested_numbers": {
      "type": "array",
      "items": {
        "type": "array",
        "items": {
          "type": "integer",
          "format": "uint32",
          "minimum": 0.0
        }
      }
    },
    "nums": {
      "$ref": "#/definitions/ZeroVec<uint32>"
    },
    "strs": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}

Which one should we go for?

@ashu26jha
Copy link
Contributor Author

Update the JSON, if all looks good, I think we are in position to write for ZeroMap & ZeroMap2d as well

@sffc sffc self-requested a review May 2, 2024 17:15
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the open feedbacks.

@@ -24,6 +24,7 @@ all-features = true

[dependencies]
zerofrom = { workspace = true }
serde_json = "1.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Use the { workspace = true } version of the dependency.

@@ -37,6 +38,7 @@ serde = { version = "1.0", default-features = false, features = ["alloc"], optio
# and all 0.7 versions, but not further.
yoke = { version = ">=0.6.0, <0.8.0", path = "../yoke", optional = true }
twox-hash = { version = "1.6", default-features = false, optional = true }
schemars = "0.8.16"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and please add it to the top-level Cargo.toml (ideally wait until #4844 is merged)

@@ -33,6 +46,27 @@ impl fmt::Display for ZeroVecError {
}
}
}
impl JsonSchema for ZeroVecError {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: I don't think you should need this impl. It should go away when you fix the ZeroSlice impl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants