Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add propertyDiscriminator keyword (aka discriminator) #1413

Open
xiaoxiangmoe opened this issue Jun 8, 2023 · 10 comments
Open

Proposal: Add propertyDiscriminator keyword (aka discriminator) #1413

xiaoxiangmoe opened this issue Jun 8, 2023 · 10 comments

Comments

@xiaoxiangmoe
Copy link

The pattern of using oneOf to describe a choice between two types has become ubiquitous.

{
  "oneOf": [
    { "$ref": "#/$defs/aaa" },
    { "$ref": "#/$defs/bbb" }
  ]
}

However, this pattern is also notorious for its many shortcomings. There is a better way to describe this kind of constraint that doesn't have all the problems of the oneOf pattern, but it's verbose, error prone, and not particularly intuitive.

{
  "allOf": [
    {
      "if": {
        "properties": {
          "foo": { "const": "aaa" }
        },
        "required": ["foo"]
      },
      "then": { "$ref": "#/$defs/foo-aaa" }
    },
    {
      "if": {
        "properties": {
          "foo": { "const": "bbb" }
        },
        "required": ["foo"]
      },
      "then": { "$ref": "#/$defs/foo-bbb" }
    }
  ]
}

But this is too complicated. We may simplify this.

{
  "propertyDiscriminator": {
    "foo": [
      [
        "aaa",
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        "bbb",
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

releated issue : #1082 (comment)

The difference between #1082 and this proposal is —— that issue only support string as discriminator.
This proposal is a little more complicated. But it would accept any json value here.

For example:

{
  "propertyDiscriminator": {
    "foo": [
      [
        2,
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        3,
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

cc @jdesrosiers @gregsdennis

@gregsdennis
Copy link
Member

gregsdennis commented Jun 8, 2023

The only hesitation I have for this is the use of a tuple construct as part of the definition. In order to understand the keyword, you have to "just know" that the first item in the tuple is a key to match on and the second is the schema (rather than the components having "names" that indicate what they are).

Secondly what are the performance implications if someone used a (very) complex object instead of a primitive for the key?

{
  "propertyDiscriminator": {
    "foo": [
      [
        { "prop1": "value", "prop2": 42, "array": [ 1, 2, "string" ] },
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        3,
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

@jdesrosiers
Copy link
Member

jdesrosiers commented Jun 8, 2023

I don't know if the name change was intentional, but this should be considered an alternative to propertyDependencies. We would choose one or the other, definitely not both, so it doesn't need a unique name. That said, I'm open to considering a different name.

Here's my relevant comments about this from the other issue where it was originally proposed.

One of my main design goals for this keyword has always been to keep it as simple as possible or people won't use it. This is a little more complex, but I don't think it's too much.

However, this isn't actually an O(1) process. The lookup from the Map is O(1), but the cost of building the Map is O(n). Implementations that have a compile step could produce the Map in the compile step and then evaluate instances using that Map in O(1), but most implementations don't have a compile step.

Overall, I think you've presented a reasonable alternative for this keyword. It makes it more versatile without adding too much complexity. It's technically a performance regression, but it can be optimized to O(1) in a lot of cases. In any case, we're talking about very small values for "n", so O(n) isn't much worse that O(1).

The biggest question to me is whether the added complexity is worth the extra capability. I believe that if it gets too verbose and complex, people won't use it. I just don't know where the line is. Maybe it will be fine and maybe not. I think it will be important to get a lot of opinions on this.

@gregsdennis
Copy link
Member

gregsdennis commented Jun 8, 2023

In any case, we're talking about very small values for "n", so O(n) isn't much worse that O(1).

I recently had a user of my schema generation lib ask me to support the { allOf: [ { if/then }... ] } pattern based on an enum property in their C# code. The enum they were using must have had ~50 different values, which is why they wanted generator support.

I think you're probably right for many (most?) cases, but cases like this still need to be considered.

@jdesrosiers
Copy link
Member

Yes, there will always be outliers. In #1410 someone was talking about having over 100 conditions they want to select on. I'm not worried about the performance difference mostly because it's possible to optimize to O(1). It will be necessary to build to lookup, but you only have to do that once and can validate as many instances as you like after than at O(1). Also, these comparisons should be fairly fast, so I don't expect that even 100 comparisons will present a significant performance concern.

@xiaoxiangmoe
Copy link
Author

I don't know if the name change was intentional

Both propertyDependencies and propertyDiscriminator is okay for me. I see PR #1143 is merged, so I choose a different name.

@xiaoxiangmoe
Copy link
Author

xiaoxiangmoe commented Jun 9, 2023

@gregsdennis

In order to understand the keyword, you have to "just know" that the first item in the tuple is a key to match on and the second is the schema (rather than the components having "names" that indicate what they are).

I think the pattern below is also okay for me.

{
  "propertyDiscriminator": {
    "foo": [
      {
        "propertyValue": 114,
        "subSchema": {
          "$ref": "#/$defs/foo-aaa"
        }
      },
      {
        "propertyValue": 514,
        "subSchema": {
          "$ref": "#/$defs/foo-bbb"
        }
      }
    ]
  }
}

But this will using a little more words than using 2-tuple [propertyValue,subSchema].

Secondly what are the performance implications if someone used a (very) complex object instead of a primitive for the key?

Maybe we should only support int, string, null and boolean in first version of this standard. And waiting for more use case feedback later.


@jdesrosiers
My real concern is that our new proposal needs to support both string discriminator and int discriminator.

There are a lot of practices on the Internet that use int enum as a discriminator. Especially in some systems written in C language, enum is an integer, and what is passed after serialization is also an integer.

If int enum discriminator cannot be supported, then the scope of application of propertyDependencies may be narrowed, causing everyone to fall back to using the more complicated if then else.

  • The coding style guide of some teams may educate everyone that string enum property can use propertyDependencies, and int enum property cannot use propertyDependencies.
  • The coding style guide of some teams may prohibit propertyDependencies for uniformity.

So I think the added complexity is worth for wider applicability.

@jdesrosiers
Copy link
Member

But this will using a little more words than using 2-tuple [propertyValue,subSchema].

That's my concern. IMO using an object rather than an array crosses the line of being too verbose.

Maybe we should only support int, string, null and boolean in first version of this standard.

You forgot "number", but otherwise I agree that limiting to scalar values would be reasonable. However, I'm not sure it's necessary. I don't see any performance implications to allowing it other than it's more expensive to compare structured values than it is compare scalar ones, but that's inherent in the choice to compare on structured values. I don't see it as a concern for this keyword specifically.

@ErosZy
Copy link

ErosZy commented Jun 13, 2023

has there been any significant progress? I really need this, if-then-else is so bad. @xiaoxiangmoe @jdesrosiers

@gregsdennis
Copy link
Member

@ErosZy this is a specification change proposal. Spec work takes time. There are a lot of other spec lifecycle management things that we (the JSON Schema team) need to work out before we can get back to making these kinds of changes.

My suggestion is to find out if the implementation you're using supports custom keywords, either via vocabularies or some other method. If so, then I suggest writing the functionality you need in order to meet your immediate needs. (We also welcome your feedback on anything that you come up with.)

As far as updating the spec goes, please have patience. We are considering this for the next version. Either this or propertyDependencies or possibly even something else (maybe what you come up with) is expected to be included.

@gregsdennis
Copy link
Member

Note: propertyDependencies is being extracted to a separate proposal document. Once that goes through, PRs against the keyword should be made against that document, not Core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
@jdesrosiers @gregsdennis @ErosZy @xiaoxiangmoe and others