Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent validation error since 1.0.82 #824

Closed
eggilbert opened this issue Jun 13, 2023 · 2 comments
Closed

Inconsistent validation error since 1.0.82 #824

eggilbert opened this issue Jun 13, 2023 · 2 comments
Assignees

Comments

@eggilbert
Copy link

eggilbert commented Jun 13, 2023

Greetings -
Similar to the use case described in #285, we're looking to validate a schema against the specification meta schema to ensure it is syntactically valid. And then later on, different JSON payloads are validated against the first schema to ensure they conform.

Based on the comments of #313, I see there may be some issues validating with the 2019-09 spec that we're using. That said, we've noticed some strange behavior since version 1.0.82 that seems like a different problem.

I've made an example test case here with the results from different versions of the library. I think due to changes in how the anyOf keyword is handled, different error messages are returned across the library versions. But in 1.0.82 and later, we're getting an inconsistent error message back. The first time a schema is validated returns one error, and then if the same schema is validated again the error response is different which is not what was expected or happening previously.

The first call returns $.type: string found, array expected and subsequent calls return $.type: should be valid to any of the schemas array instead.

@Test
void validate() throws JsonProcessingException {
    final var v201909SpecSchema = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V201909)
        .getSchema(URI.create(JsonMetaSchema.getV201909().getUri()), new SchemaValidatorsConfig());

    final var invalidSchema = new ObjectMapper().readTree("""
        {
            "$schema": "https://json-schema.org/draft/2019-09/schema",
            "type": "cat"
        }
        """);

    // Validate same JSON schema against v2019-09 spec schema twice
    final var validationErrors1 = v201909SpecSchema.validate(invalidSchema);
    final var validationErrors2 = v201909SpecSchema.validate(invalidSchema);

    System.out.println(validationErrors1);
    System.out.println(validationErrors2);

    // Validation errors should be the same
    assertEquals(validationErrors1, validationErrors2);

    // Results
    //
    // 1.0.73
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    //
    // 1.0.74
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: string found, array expected]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: string found, array expected]
    //
    // 1.0.78
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    //
    // >= 1.0.82
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: string found, array expected]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    //
    // ?????
}
@fdutton
Copy link
Contributor

fdutton commented Jun 14, 2023

The validators get loaded into a TreeMap and sorted based on the type of validation performed. This is done because the standard requires that some validations be performed before others. For example, properties must be validated before unevaluatedProperties can be checked. I suspect this is the source of the non-determinism. I even have a comment in the code to investigate this.

TODO: This smells. We are performing a lexicographical ordering of paths of unknown depth.

@fdutton fdutton self-assigned this Jun 14, 2023
@justin-tay
Copy link
Contributor

I have a pull request that fixes this.

The issue in this particular case isn't the order of the validators, because even after ensuring a deterministic order, this still happens.

The primary cause is these lines

if (schema.hasTypeValidator()) {
TypeValidator typeValidator = schema.getTypeValidator();
//If schema has type validator and node type doesn't match with schemaType then ignore it
//For union type, it is a must to call TypeValidator
if (typeValidator.getSchemaType() != JsonType.UNION && !typeValidator.equalsToSchemaType(node)) {
allErrors.add(buildValidationMessage(at, typeValidator.getSchemaType().toString()));
continue;
}
}

The main reason is that schema.hasTypeValidator() and schema.getTypeValidator() depends on the typeValidator field that is only lazily set once getTypeValidator() is first called. To make it consistent, the schema.getTypeValidator() call should ensure that the validators are already loaded.

This means that on the first run, hasTypeValidator() returns false. The schema validation happens, and the type validator in the schema runs and generates the type validation error. As getValidators() is called, the typeValidator field is now set.

In the second run, hasTypeValidator() returns true and the anyOf validator generates the should be valid to any of the schemas array message instead.

Another issue is that actually the error message built should be that of the type validator, ie. the type validator should be called with validate to generate the actual message instead of using the message of the anyOf validator as the should be valid to any of the schemas array message doesn't actually make sense.

There are actually other issues for instance

Set<ValidationMessage> childNotRequiredErrors = allErrors.stream().filter(error -> !ValidatorTypeCode.REQUIRED.getValue().equals(error.getType())).collect(Collectors.toSet());

Where collect(Collectors.toSet()) is used which will collect to a HashSet where are LinkedHashSet should be used to enforce ordering.

@stevehu stevehu closed this as completed May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants