Proposed Addition to JSON Schema: "Else-If" #1410

Open
chapmanjw opened this issue Jun 2, 2023 · 21 comments

@chapmanjw

chapmanjw commented Jun 2, 2023

When defining mutually exclusive if-statements (if, else-if, else-if, etc.), JSON Schema currently requires nesting with the else statements. For shallow cases, this works fairly well. For example:

{
  "if": { ... },
  "then": { ... },
  "else": {
    "if": { ... },
    "then": { ... },
    "else": {
      "if": { ... },
      "then": { ... },
      "else": { ... }
    }
  }
}

However, for highly complex use cases with hundreds of mutually exclusive conditions, this results in if-then-else statements nested more than 100 levels deep. This complicates non-validation uses of JSON Schema (e.g. clients mapping rules from our JSON Schemas to their own data models) and causes issues with commonly used open-source validator implementations (~100 is a common "safety" limit in some open-source libraries, intended to prevent infinite recursion).
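
To make the depth problem concrete, here is a rough Python sketch that folds a flat list of conditions into the nested form and counts the resulting levels. The helper names (`nest_conditions`, `else_depth`) and the `kind`/`fieldN` properties are made up for illustration and are not part of any library.

```python
def nest_conditions(pairs, else_schema=None):
    """Fold a flat list of (if, then) schema pairs into nested if/then/else."""
    nested = else_schema
    # Build from the innermost condition outwards.
    for if_schema, then_schema in reversed(pairs):
        node = {"if": if_schema, "then": then_schema}
        if nested is not None:
            node["else"] = nested
        nested = node
    return nested


def else_depth(schema):
    """Count how many if/then/else levels are chained through "else"."""
    depth = 0
    while isinstance(schema, dict) and "if" in schema:
        depth += 1
        schema = schema.get("else")
    return depth


# 150 hypothetical mutually exclusive conditions on a made-up "kind" property.
pairs = [
    ({"properties": {"kind": {"const": i}}}, {"required": [f"field{i}"]})
    for i in range(150)
]
schema = nest_conditions(pairs, else_schema=False)
print(else_depth(schema))  # 150
```

A validator that recurses once per `else` level would need at least that many nested calls to reach the last condition.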

Since conditional logic (if-then-else) is defined by standard JSON Schema vocabularies, it would be ideal if we could come up with a way to flatten else-if behaviors in the standard JSON Schema vocabularies (rather than using custom vocabularies that most open-source implementations would not be able to handle by default). Something like this:

{
  "if": { ... },
  "then": { ... },
  "elseIf": [
    {
      "if": { ... },
      "then": { ... }
    }, 
    {
      "if": { ... },
      "then": { ... }
    }
  ],
  "else": { ... }
}

Thanks for your consideration.

@jdesrosiers
Member

Thanks for the suggestion. There's definitely a gap here, but I think we'd probably need a different solution than the one suggested.

In case it isn't known, the workaround you can use today to flatten your conditionals is to put them in an allOf.

{
  "allOf": [
    {
      "if": { ... },
      "then": { ... }
    },
    {
      "if": { ... },
      "then": { ... }
    },
    ...
  ]
}

Each schema will pass if the if fails or the if passes and the then passes, so you can easily add as many conditionals as you want without nesting. Of course this isn't exactly the same as nesting with else. Every if will run regardless of what happens with other ifs. This can sometimes mean more complex ifs and it definitely means the evaluation can't short circuit when the first match is found.
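
The behavior described above can be sketched in a few lines of Python. This is only a model: plain functions stand in for the `if` and `then` subschemas, and the payment-style conditions are hypothetical.

```python
def validate_all_of(conditionals, instance):
    """Apply every (name, if, then) triple, as allOf applies every subschema."""
    evaluated = []
    valid = True
    for name, condition, consequence in conditionals:
        evaluated.append(name)  # every "if" is consulted, even after a match
        # An if/then subschema fails only when "if" passes and "then" fails.
        if condition(instance) and not consequence(instance):
            valid = False
    return valid, evaluated


# Hypothetical conditions on a made-up payment object.
conditionals = [
    ("card", lambda d: d.get("method") == "card", lambda d: "number" in d),
    ("bank", lambda d: d.get("method") == "bank", lambda d: "iban" in d),
]

print(validate_all_of(conditionals, {"method": "card", "number": "4111"}))
# (True, ['card', 'bank'])
print(validate_all_of(conditionals, {"method": "bank"}))
# (False, ['card', 'bank'])
```

Both conditions are visited for every instance, which is the lack of short-circuiting mentioned above.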

So, while this workaround allows you to express what you need to express without excessive nesting, it would be nice to have a more ideal solution that doesn't require nesting.

Back to the suggested elseIf keyword...

The if/then/else keywords are just keywords that are part of a generic schema. The elseIf keyword seems to require and expect the if and then to be present and wouldn't make sense if there were other keywords in those schemas. So, items in the elseIf would have to not be schemas, but rather a special construct specific to the elseIf keyword. This would be awkward because there is nothing like that in JSON Schema and because it looks like a schema, but isn't.

I'm not really sure how to reconcile these problems. I thought about it for a bit and here's the best I could come up with so far. I'll call this keyword "conditional". It's an array whose items are schemas. The elements are logically pairs of schemas. The first pair represents if and then. Any following pairs represent elseIf and then. If there's an odd number of schemas, the remaining schema represents else.

{
  "conditional": [
    { ... }, // if
    { ... }, // then
    { ... }, // elsif
    { ... }, // then
    { ... }, // elsif
    { ... }, // then
    { ... } // else
  ]
}

The biggest problem with this solution is that it's not very readable/maintainable. Two or three elements is fine, but once you start getting into the elseIfs, it can get hard to keep track. $comments can help with that, but I'd be reluctant to introduce a feature that's confusing enough that it practically requires the use of $comments to be maintainable.
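
For reference, one possible reading of the pairing rule for this hypothetical `conditional` keyword, sketched in Python. Plain functions stand in for the subschemas, and the treatment of the "nothing matched, no else" case (pass) is an assumption carried over from how if/then behaves without an else.

```python
def evaluate_conditional(items, instance):
    """items: [if, then, elseIf, then, ..., else?] as callables."""
    paired_length = len(items) - (len(items) % 2)
    for index in range(0, paired_length, 2):
        condition, consequence = items[index], items[index + 1]
        if condition(instance):
            # The first matching condition decides; later pairs are skipped.
            return consequence(instance)
    if len(items) % 2 == 1:
        return items[-1](instance)  # the unpaired trailing item acts as else
    return True  # assumption: no match and no else means no constraint


# Hypothetical stand-ins for the subschemas.
conditional = [
    lambda v: isinstance(v, int), lambda v: v >= 0,       # if, then
    lambda v: isinstance(v, str), lambda v: len(v) > 0,   # elseIf, then
    lambda v: v is None,                                  # else
]

print(evaluate_conditional(conditional, 5))     # True
print(evaluate_conditional(conditional, -1))    # False
print(evaluate_conditional(conditional, ""))    # False
print(evaluate_conditional(conditional, None))  # True
```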

@Exekyel

Exekyel commented Jun 5, 2023

Does anyOf do short-circuiting? If not, could we propose a firstOf that evaluates the conditional expressions in order until one evaluates true?

@gregsdennis
Member

gregsdennis commented Jun 5, 2023

anyOf can short-circuit if annotations aren't being collected. However, there's a problem with using it: because none of the subschemas have an else, they'll all pass if the if doesn't match.

{
  "anyOf": [
    { "if": false, "then": { "type": "object" } },
    { "if": false, "then": { "type": "string" } }
  ]
}

The instance 42 passes this schema because none of the ifs pass, so none of the thens are invoked.
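
This can be checked with a toy evaluator. The Python sketch below is not a real validator: it only understands boolean schemas, `type` (for two types), `if`/`then`/`else`, and `anyOf`, which is just enough for the schema above.

```python
JSON_TYPES = {"object": dict, "string": str}  # only the types used here


def validate(schema, instance):
    """Toy evaluator for a tiny subset of JSON Schema keywords."""
    if isinstance(schema, bool):
        return schema
    valid = True
    if "type" in schema:
        valid = valid and isinstance(instance, JSON_TYPES[schema["type"]])
    if "if" in schema:
        # A missing "then" or "else" behaves like the schema `true`.
        branch = "then" if validate(schema["if"], instance) else "else"
        valid = valid and validate(schema.get(branch, True), instance)
    if "anyOf" in schema:
        valid = valid and any(validate(sub, instance) for sub in schema["anyOf"])
    return valid


schema = {
    "anyOf": [
        {"if": False, "then": {"type": "object"}},
        {"if": False, "then": {"type": "string"}},
    ]
}

print(validate(schema, 42))  # True
```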

If you put an else: false on all of them, then I imagine you could probably do an anyOf, but that seems tedious.

{
  "anyOf": [
    { "if": false, "then": { "type": "object" }, "else": false },
    { "if": false, "then": { "type": "string" }, "else": false }
  ]
}

Here, once the placeholder false ifs are replaced with real conditions, only objects and strings are allowed.

@jdesrosiers
Member

If you put an else: false on all of them, then I imagine you could probably do an anyOf, but that seems tedious.

Unfortunately, that doesn't work. The anyOf schemas would fall through if the if fails (which is what we want), but it would also fall through if the if passes and the then fails (which is not what we want). We want it to only try the next schema if the if fails. If a then fails in any of the schemas, we need evaluation to stop and report failure, not try the next one. anyOf can't do that. Not even one that can guarantee short circuiting.
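
A small Python model of the difference, with plain functions standing in for the subschemas. The two conditions are hypothetical and deliberately overlap so that the fall-through is visible.

```python
def else_if_chain(pairs, instance):
    """Nested if/then/else: the first passing "if" decides the outcome."""
    for condition, consequence in pairs:
        if condition(instance):
            return consequence(instance)  # a failing "then" stops evaluation
    return True


def any_of_with_else_false(pairs, instance):
    """anyOf over {"if": c, "then": t, "else": false} subschemas."""
    # Each subschema passes only when both its "if" and its "then" pass.
    return any(c(instance) and t(instance) for c, t in pairs)


# Hypothetical overlapping conditions on an integer instance.
pairs = [
    (lambda n: n > 0, lambda n: n % 2 == 0),  # positive numbers must be even
    (lambda n: n > -10, lambda n: True),      # anything above -10 is fine
]

print(else_if_chain(pairs, 3))           # False: first "if" matched, its "then" failed
print(any_of_with_else_false(pairs, 3))  # True: fell through to the second subschema
```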

@Exekyel

Exekyel commented Jun 6, 2023

I am trying to think if there is some combination of nested anyOf, allOf, and literal booleans that

  1. Evaluates one condition per clause
  2. Does not continue to evaluate once a condition is true (e.g. value of clause is not used for fallthrough)

It might not end up extremely readable, but if it's already possible then maybe we can put some syntax sugar on top

Unfortunately I'll have to come back to this tomorrow

@chapmanjw
Author

The goal here is to have either 0 or 1 (first applicable) condition in a list of conditions apply, making each condition mutually exclusive, but not requiring any of them. Flattening to an anyOf (with else-false) does simplify them, but does not make them mutually exclusive. So far, the only means I have found to make a list of conditions mutually exclusive is to nest each next condition in the else statement of the previous one. :-(

@signontwodotoh

Seconding the usefulness of this feature. I agree that a shortcircuited version of allOf (which would mostly be used for conditionals or code generators) would be the most idiomatic representation of this.

@Exekyel

Exekyel commented Jun 6, 2023

After more thought, the only way I can imagine to make anyOf work is to repeat the previous conditions in later conditionals, so I'm not considering that anymore.

My proposal would be something like this

{
  "firstOf": [
    {"if": "PREDICATE_1", "then": "EXCLUSIVE_RESULT_1"},
    {"if": "PREDICATE_2", "then": "EXCLUSIVE_RESULT_2"}
  ]
}

In this example, if PREDICATE_1 is true, then the result of firstOf will be EXCLUSIVE_RESULT_1. If PREDICATE_1 is false and PREDICATE_2 is true, then the result of firstOf will be EXCLUSIVE_RESULT_2. If both PREDICATE_1 and PREDICATE_2 are false, then the result of firstOf will be false. In the firstOf grammar, none of the nested conditionals may contain an else because that is implicit (not sure if this is possible)

We should also define the base case:

{
  "firstOf": []
}

Although I personally have no preference if this returns false, true, or doesn't parse.

@chapmanjw
Author

The firstOf idea looks promising syntactically. How would it behave though? With oneOf, for example, one entire nested schema must be valid. If that worked the same way here, both the if and then statements of the first schema element would need to evaluate true in order to prevent moving to the second item in the list.

{
  "firstOf": [
    {"if": "PREDICATE_1", "then": "EXCLUSIVE_RESULT_1"},
    {"if": "PREDICATE_2", "then": "EXCLUSIVE_RESULT_2"}
  ]
}

With this example, if PREDICATE_1 evaluated true but EXCLUSIVE_RESULT_1 evaluated false, validators would move on to the second element in the list (PREDICATE_2) rather than producing a validation message that EXCLUSIVE_RESULT_1 did not evaluate true. In order for this to work, firstOf would need to have more limited vocabulary (meaning it could only contain if/then/else keywords on the first-level elements it contains) and would need different evaluation rules defined for validators. The downside there being that it behaves materially differently from anyOf, allOf, and oneOf in how the nested schemas are evaluated.

@gregsdennis
Member

I think I would suggest switch or select since those are the common words used in programming languages for what we're considering.

That said, this is a very niche application of an anyOf-like thing, and I'd prefer to have something more generic.

  • What would happen if someone used a firstOf with subschemas that aren't if/then constructs?
  • How does this keyword affect annotation collection?
  • [Probably more questions]

@jdesrosiers
Member

jdesrosiers commented Jun 6, 2023

the only way I can imagine to make anyOf work is to repeat the previous conditions in later conditionals

Agreed.

I agree that a shortcircuited version of allOf [...] would be the most idiomatic representation of this.

Short circuiting alone (whether anyOf or allOf) is not a solution to this problem. See, #1410 (comment)

My proposal would be something like this

This appears to be the same as the original proposal for elseIf except elseIf is now spelled firstOf. The problems with that proposal don't seem to be solved here.

@Exekyel

Exekyel commented Jun 6, 2023

@chapmanjw

The downside there being that it behaves materially differently from anyOf, allOf, and oneOf in how the nested schemas are evaluated.

Now I'm worried that I misunderstood the feature request. Isn't this statement also true about else-if? If the issue is naming, I think I prefer condition switch or select versus firstOf after seeing feedback

@gregsdennis

I would suggest switch or select

Agreed, I chose firstOf because I thought it was similar to anyOf, but it's not so these are good suggestions

What would happen if someone used a firstOf with subschemas that aren't if/then constructs?

Maybe it would evaluate them in order until one was true and return it directly, or maybe it's not a valid schema? I'm not sure

How does this keyword affect annotation collection?

Sorry, I don't know and can't answer this one

@jdesrosiers

This appears to be the same as the original proposal for elseIf except elseIf is now spelled firstOf. The problems with that proposal don't seem to be solved here.

One substantial difference is that it does not require if at the same level. Looking at your comment, I just see one core problem (apologies if I combined them erroneously):

This would be awkward because there is nothing like that in JSON Schema and because it looks like a schema, but isn't.

I think you really highlighted the core of the problem. We're trying to fit an imperative tool into a functional box. Lisp has a cond which is very similar to your proposed conditional, except it does support 2-tuples of if-then constructs:

(cond (test-expression1 then-expression1)
      (test-expression2 then-expression2)
      (t else-expression2))

(source)

On that note, would using an array of arrays of schemas be any better?

{
  "conditional": [
    [{ ... }, { ... }], // if, then
    [{ ... }, { ... }], // elsif, then
    [{ ... }, { ... }], // elsif, then
    [{ ... }] // else
  ]
}

If the firstOf idea dies brutally but still gets us closer to a solution, that's fine with me. Thank you for entertaining the thought and poking holes in it.

@gregsdennis
Member

Maybe it would evaluate them in order until one was true and return it directly...

We're trying to fit an imperative tool into a functional box.

This entire issue/discussion is one of the hesitations I had about introducing if/then/else in the first place. JSON Schema is not a programming language; it doesn't have a "program flow." At its core, JSON Schema is nothing more than a collection of constraints. However, these keywords tend to make it feel like it has flow, and I think that's what led to this proposal.

  • It's perfectly valid to have any of these keywords present on their own.
  • if doesn't actually do anything (besides any annotation behavior).
  • then is evaluated only if if is present and validates successfully (valid == true).
  • else is evaluated only if if is present and validates unsuccessfully (valid == false).

Yes, an implementation has to process if to know which of then or else to process, but it's not done like you'd typically think of your code running an if-statement. if is considered more of a dependency of the then and else constraints rather than the three being a single logical statement (as they are in programming languages).

else-if and the other suggestions here necessarily imply sequential processing, that "single logical statement," and I think that's what doesn't fit with the rest of JSON Schema.

@gregsdennis
Member

gregsdennis commented Jun 6, 2023

For allOf, anyOf, and all of the other multiple-schema applicator keywords, their children can be evaluated in any order, even in parallel.

We have no basis for a keyword that applies subschemas sequentially, especially one that bails out part-way through.

You may ask, "What about prefixItems? Doesn't that apply the subschemas in order?" Not necessarily. It applies subschemas to the same index in the instance, but they don't need to be evaluated in that order. As with the others, you can evaluate them reversed, at random, or in parallel, and you'll still get the same evaluation result.
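
The order-independence point can be illustrated with a short Python sketch. Plain functions stand in for subschemas; `first_match` models a hypothetical sequential keyword and is not an existing JSON Schema feature.

```python
from itertools import permutations


def all_of(checks, instance):
    """allOf-style conjunction: the result does not depend on order."""
    return all(check(instance) for check in checks)


def first_match(pairs, instance):
    """Hypothetical sequential keyword: the first passing condition decides."""
    for condition, consequence in pairs:
        if condition(instance):
            return consequence(instance)
    return True


checks = [lambda n: n > 0, lambda n: n % 2 == 0, lambda n: n < 100]
pairs = [
    (lambda n: n > 0, lambda n: n % 2 == 0),
    (lambda n: n < 100, lambda n: True),
]

# Evaluate the instance 3 under every possible ordering of the subschemas.
all_of_results = {all_of(order, 3) for order in permutations(checks)}
first_match_results = {first_match(order, 3) for order in permutations(pairs)}

print(sorted(all_of_results))       # [False]
print(sorted(first_match_results))  # [False, True]
```

Every ordering of the conjunction gives the same answer, while the sequential version gives different answers depending on which pair comes first.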

@chapmanjw
Author

Looking around to see what others have done in other schema formats. Here is an example of XSD-based modeling to express these concepts:

Following a similar pattern could bring us back to the example in the first post:

{
  "if": { ... },
  "then": { ... },
  "elseIf": [
    {
      "if": { ... },
      "then": { ... }
    }, 
    {
      "if": { ... },
      "then": { ... }
    }
  ],
  "else": { ... }
}

The schema for elseIf would need to explicitly be an array of objects containing only if and then properties. The applicator meta-schema could look like this:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://json-schema.org/draft/2020-12/meta/applicator",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/applicator": true
    },
    "$dynamicAnchor": "meta",
    "title": "Applicator vocabulary meta-schema",
    "type": ["object", "boolean"],
    "properties": {
        "prefixItems": { "$ref": "#/$defs/schemaArray" },
        "items": { "$dynamicRef": "#meta" },
        "contains": { "$dynamicRef": "#meta" },
        "additionalProperties": { "$dynamicRef": "#meta" },
        "properties": {
            "type": "object",
            "additionalProperties": { "$dynamicRef": "#meta" },
            "default": {}
        },
        "patternProperties": {
            "type": "object",
            "additionalProperties": { "$dynamicRef": "#meta" },
            "propertyNames": { "format": "regex" },
            "default": {}
        },
        "dependentSchemas": {
            "type": "object",
            "additionalProperties": { "$dynamicRef": "#meta" },
            "default": {}
        },
        "propertyNames": { "$dynamicRef": "#meta" },
        "if": { "$dynamicRef": "#meta" },
        "then": { "$dynamicRef": "#meta" },
        "elseIf": { "$ref": "#/$defs/elseIfArray" },
        "else": { "$dynamicRef": "#meta" },
        "allOf": { "$ref": "#/$defs/schemaArray" },
        "anyOf": { "$ref": "#/$defs/schemaArray" },
        "oneOf": { "$ref": "#/$defs/schemaArray" },
        "not": { "$dynamicRef": "#meta" }
    },
    "$defs": {
        "schemaArray": {
            "type": "array",
            "minItems": 1,
            "items": { "$dynamicRef": "#meta" }
        },
        "elseIfArray": {
            "type": "array",
            "minItems": 1,
            "items": { "$ref": "#/$defs/elseIfCondition" }
        },
        "elseIfCondition": {
            "type": "object",
            "properties": {
                "if": { "$dynamicRef": "#meta" },
                "then": { "$dynamicRef": "#meta" }
            },
            "required": [ "if", "then" ]
        }
    }
}

@Exekyel

Exekyel commented Jun 7, 2023

This is still subverting some core assumptions about JSON schema:

  1. The construct elseIfCondition looks like a schema but it isn't (there isn't anything else in JSON schema like it)
  2. The if and then construct properties that we are asking for are functionally very different from the existing if and then keywords
  3. The elseIfArray needs to be evaluated in order (which isn't done anywhere else in JSON schema)

Even conditional fixes 1 and 2 but not 3

@jdesrosiers
Member

We're trying to fit an imperative tool into a functional box.

if/then/else is a bit awkward in JSON Schema, but its design is based on the same boolean logic underpinnings that JSON Schema is based on. if/then is equivalent to boolean implication (A -> B), which is equivalent to !A || B, which can be expressed in JSON Schema as anyOf: [{ not: A }, B]. So, if/then in JSON Schema isn't an imperative tool, but in conversations like this we often conflate it with the imperative tool. That's why suggestions like adding an elseIf seem straightforward, but actually aren't.
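
The equivalence is small enough to check exhaustively. In this Python sketch, `a` and `b` are the validation results of the subschemas A and B for some instance; the function names are made up for illustration.

```python
from itertools import product


def if_then(a, b):
    """if/then semantics: fails only when "if" passes and "then" fails."""
    if a:
        return b
    return True


def any_of_not_a_or_b(a, b):
    """anyOf: [{ not: A }, B] expressed over the same two results."""
    return any([not a, b])


for a, b in product([False, True], repeat=2):
    print(a, b, if_then(a, b), any_of_not_a_or_b(a, b))
    assert if_then(a, b) == any_of_not_a_or_b(a, b)
```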

Lisp has a cond which is very similar to your proposed conditional, except it does support 2-tuples of if-then constructs

I thought about pairing them as tuples as well, but decided against it in the moment. I think there are times when it's best expressed as tuples and times when it's not. In a simple if/then(/else), the pairing is unnecessary and the extra syntax would be annoying, but when you start getting into the elseIfs, it can help with readability to have them grouped. Perhaps it would be best to allow both forms?
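
If both forms were allowed, an implementation could normalize them up front. A rough Python sketch, assuming the hypothetical `conditional` keyword and relying on the fact that a schema is always an object or a boolean, never an array:

```python
def normalize_conditional(items):
    """Return (pairs, else_schema) from either the flat or the tuple form."""
    if not items:
        return [], None
    if all(isinstance(item, list) for item in items):
        groups = [list(group) for group in items]  # tuple form
    else:
        # Flat form: chunk into twos; an odd item ends up alone at the end.
        groups = [items[i:i + 2] for i in range(0, len(items), 2)]
    # Only the last group may be a lone schema, and it acts as else.
    else_schema = groups.pop()[0] if len(groups[-1]) == 1 else None
    if any(len(group) != 2 for group in groups):
        raise ValueError("each group before the optional else must be [if, then]")
    return [tuple(group) for group in groups], else_schema


# Placeholder schemas; their content is irrelevant to the normalization.
flat = [{"type": "string"}, {"minLength": 1}, {"type": "number"}, {"minimum": 0}, False]
tupled = [[{"type": "string"}, {"minLength": 1}], [{"type": "number"}, {"minimum": 0}], [False]]

print(normalize_conditional(flat) == normalize_conditional(tupled))  # True
```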

We have no basis for a keyword that applies subschemas sequentially, especially one that bails out part-way through.

While it's true that there aren't currently any keywords that work that way, I don't see any reason why a keyword like that would be a problem. It doesn't introduce the need for any new capabilities to the JSON Schema architecture.

@jdesrosiers
Member

It might be relevant to point out the propertyDependencies keyword that's expected to be included in a future release of JSON Schema.

{
  "propertyDependencies": {
    "foo": {
      "a":  { ... },
      "b":  { ... },
      "c":  { ... },
      ...
    }
  }
}

This keyword defines a schema to apply if a property has a given value. In this case, if the value of property "foo" is "a" it applies one schema, if it's "b" it applies another schema, etc. This would allow O(1) selection of the mutually exclusive option, which is even better than O(n) you would have with an else-if chain. Although efficient and concise, this solution is limited to being able to match on constant string values of a single property. If you need something more expressive than that, this wouldn't solve your problem and you'd be back to nested if/then/else.
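
The selection step might look roughly like this in Python. This is a sketch of the behavior described above, not code from any implementation, and the `foo`/`x`/`y` names are placeholders.

```python
def select_property_dependencies(property_dependencies, instance):
    """Return the subschemas selected by string-valued properties."""
    selected = []
    if not isinstance(instance, dict):
        return selected
    for property_name, value_map in property_dependencies.items():
        value = instance.get(property_name)
        # Only string values can match; the match is a single dict lookup.
        if isinstance(value, str) and value in value_map:
            selected.append(value_map[value])
    return selected


property_dependencies = {
    "foo": {
        "a": {"required": ["x"]},
        "b": {"required": ["y"]},
    }
}

print(select_property_dependencies(property_dependencies, {"foo": "b"}))
# [{'required': ['y']}]
print(select_property_dependencies(property_dependencies, {"foo": "z"}))
# []
```

The lookup cost does not grow with the number of alternatives, which is the O(1) selection mentioned above.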

@Exekyel

Exekyel commented Jun 7, 2023

This

if A then W
else if B then X
else if C then Y
else Z

is equivalent to this

if A then W
if !A and B then X
if !A and !B and C then Y
if !A and !B and !C then Z

which is pure boolean logic and fully compatible with anyOf. We just don't want to write the second block by hand (it's strictly worse than just using nesting). To achieve it still requires either:

  • a sequential preprocessing step which appends the complement of any earlier predicates, or
  • the ability for then to "look back" at earlier predicates and determine their complement

p.s. if it simplifies implementation, this is also equivalent:

if !(false) and A then W
if !(false or A) and B then X
if !(false or A or B) and C then Y
if !(false or A or B or C) then Z

this way, some "union" exists and can be dropped into !() to make the complement
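
These equivalences can be checked by brute force. The Python sketch below treats A, B, C and W, X, Y, Z as the boolean results of the corresponding subschemas and combines the independent conditionals with a conjunction, as in the allOf workaround earlier in the thread. Function names are made up for illustration.

```python
from itertools import product


def chain(a, b, c, w, x, y, z):
    """The else-if chain: the first true condition selects the result."""
    if a:
        return w
    elif b:
        return x
    elif c:
        return y
    return z


def implies(p, q):
    return (not p) or q


def flattened(a, b, c, w, x, y, z):
    """Independent conditionals, each guarded by earlier complements."""
    return all([
        implies(a, w),
        implies(not a and b, x),
        implies(not a and not b and c, y),
        implies(not a and not b and not c, z),
    ])


def flattened_with_union(a, b, c, w, x, y, z):
    """Same idea, carrying a running union of the predicates seen so far."""
    seen = False
    results = []
    for condition, consequence in [(a, w), (b, x), (c, y)]:
        results.append(implies(not seen and condition, consequence))
        seen = seen or condition
    results.append(implies(not seen, z))
    return all(results)


assignments = list(product([False, True], repeat=7))
print(all(chain(*v) == flattened(*v) == flattened_with_union(*v) for v in assignments))
# True
```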


propertyDependencies seems to neatly solve the switch version already, since C implies !A and !B. Maybe that's good enough already for a lot of cases

@jdesrosiers
Member

The schema for elseIf would need to explicitly be an array of objects containing only if and then properties.

We've avoided structured keywords like this in the past. That's the main reason if/then/else are three separate keywords rather than one keyword with three components named if, then, and else. Personally, I don't think that avoidance is necessary and I don't have a problem with it, but that has historically been a minority opinion so I don't see it getting broad support.

Although a structured keyword would be a unique addition to JSON Schema, that's not my concern with this proposal. My concern is that because if/then keywords already exist as schema keywords with slightly different behavior, using the same names within a structured keyword would be too confusing (it looks like a schema, but it's not). If different (but still good) names could be found, I'd feel better about the proposal.

Interestingly, since names are the problem, if we convert the structured if/then into a tuple (no names needed), we pretty much end up with @Exekyel's variation of the conditional keyword I suggested 😄.

@gregsdennis
Member

gregsdennis commented Jun 7, 2023

if A then W
if !A and B then X
if !A and !B and C then Y
if !A and !B and !C then Z

I think using references might help this a little so that you're not copying constraints everywhere.

{
  "$defs": {
    "A": { ... },
    "B": { ... },
    "C": { ... }
  },
  "allOf": [
    {
      "if": { "$ref": "#/$defs/A" },
      "then": { ... } // W
    },
    {
      "if": {
        "allOf": [
          { "not": { "$ref": "#/$defs/A" } },
          { "$ref": "#/$defs/B" }
        ]
      },
      "then": { ... } // X
    },
    {
      "if": {
        "allOf": [
          { "not": { "$ref": "#/$defs/A" } },
          { "not": { "$ref": "#/$defs/B" } },
          { "$ref": "#/$defs/C" }
        ]
      },
      "then": { ... } // Y
    },
    {
      "if": {
        "allOf": [
          { "not": { "$ref": "#/$defs/A" } },
          { "not": { "$ref": "#/$defs/B" } },
          { "not": { "$ref": "#/$defs/C" } }
        ]
      },
      "then": { ... } // Z
    }
  ]
}

While this makes it a bit easier to read, unless your implementation is doing a really good job of caching results, you still have multiple evaluations for each definition as it still has to evaluate all of the options.
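
Writing that structure by hand gets tedious, so it could be generated. A rough Python sketch of such a preprocessing step follows; `flatten_chain` is a hypothetical helper, it assumes at least one condition, and it assumes the condition names contain no characters that need JSON Pointer escaping. Unlike the example above, it wraps even the first condition in an `allOf`, which is equivalent.

```python
import json


def flatten_chain(named_conditions, consequences, else_schema=None):
    """Turn an ordered else-if chain into an allOf of guarded if/then schemas.

    named_conditions: {"A": schema, ...} in chain order.
    consequences: the "then" schemas, in the same order.
    """
    names = list(named_conditions)
    all_of = []
    for index, then_schema in enumerate(consequences):
        # Negate every earlier condition, then require the current one.
        guard = [{"not": {"$ref": f"#/$defs/{name}"}} for name in names[:index]]
        guard.append({"$ref": f"#/$defs/{names[index]}"})
        all_of.append({"if": {"allOf": guard}, "then": then_schema})
    if else_schema is not None:
        # The else branch applies when every condition failed.
        guard = [{"not": {"$ref": f"#/$defs/{name}"}} for name in names]
        all_of.append({"if": {"allOf": guard}, "then": else_schema})
    return {"$defs": dict(named_conditions), "allOf": all_of}


# Placeholder conditions and consequences.
flattened = flatten_chain(
    {"A": {"type": "string"}, "B": {"type": "number"}},
    [{"minLength": 1}, {"minimum": 0}],
    else_schema=False,
)
print(json.dumps(flattened, indent=2))
```

This only removes the manual copying. Each condition is still referenced, and potentially evaluated, once per later branch.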

6 participants
@jdesrosiers @chapmanjw @gregsdennis @Exekyel @signontwodotoh and others