New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests from JSONTestSuite #140
Conversation
Note These results are affected by shared workloads on GitHub runners. Use the results only to detect possible regressions, but always rerun on more stable machine before making any conclusions! Benchmark results (pull-request, 02e873c)
Benchmark results (master, 7bc09ac)
|
c4a582d
to
db8feaf
Compare
db8feaf
to
02e873c
Compare
Quality Gate passedIssues Measures |
@Breus looks like that badge is going to get a little bit more impressive π |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem is what to do with n_* test cases?
I'd suggest to clearly claim in the docs that the masker is not a RFC 8259 compliant JSON parser, so the output for non-valid JSONs may be unpredictable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's address the invalid input handling (? vs throwing exception) in a different PR. Not too big of a deal
This PR adds all tests from
JSONTestSuite
. That suite has 3 types of tests:y_*
are valid JSONsi_*
are invalid JSONs that are allowed to be parsed, usually some large numbers and/or invalid UTF-8 sequencesn_*
are invalid JSONs, for this the normal parser should fail, however we can process all of them except 2 that have 100k nesting level (P.S. Don't try opening them in IDEA π)I've copied all of them to our repository and established following test cases
mustPassSuiteWithNoopMaskerShouldBeEquivalentToJackson
y_*
tests must pass. I'm usingValueMaskers.withRawValueFunction(value -> value)
as a more complicated noop masker to make sure that we always correctly track values, transform them to strings and back to bytes. The results of this are equivalent when parsed by JacksonmayPassSuiteWithNoopMaskerShouldNotFail
i_*
tests must not fail with and ideally should be equivalent to how Jackson parses it (if it can). There are 4 exceptions whenjava.lang.String
treats invalid UTF-8 characters differently than Jackson doesmustFailSuiteWithNoopMaskerShouldOnlyFailWithInvalidJsonException
n_*
tests must fail for a normal JSON parser, however we allow it not to fail, but if it does fail, then it must be withInvalidJsonException
. WithValueMaskers.withRawValueFunction
(or default masking options) there are only 2 test cases that are failing due to internalStackOverFlowError
.shouldMaskAllTestCasesPredictably
For all test files (including
n_*
) I've simply asserted the masker json we currently produce.mustPassSuiteWithNoopTextFunction
Verifies that all test files that contain correct UTF-8 sequences we properly decode it in
withTextValueFunction
. There are 26i_*
test files where we fail on converting invalid sequences.For the
withTextValueFunction
function the question is whether we should continue to fail on those 26 test cases or take the behavior ofjava.lang.String
of converting most invalid sequences intoοΏ½
and passing that to the user provided function?Fixes
This PR has couple of fixes for cases when we get out of bounds or throw internal exception instead of
InvalidJsonException
. All of the fixes only apply ton_*
andi_*
test cases, nothing critical was found during the testing πHard decisions time!
The main problem is what to do with
n_*
test cases?The main problem is that we're not actually parsing the values so we do not reject things like invalid numbers or invalid escapes (except for
withTextValueFunction
. We also do not fail on invalid json structures - for example in arrays we don't strictly need to see a comma and ignore anything unknown and mask values if we detect them. With that for most of the test cases we actually mask them in more or less predictable ways, for example[""
->["***"
(n_array_unclosed.json
)[ , ""]
->[ , "***"]
(n_array_missing_value.json
)However some test cases are leaking information or giving weird results
abc
->abc
(n_string_single_string_no_double_quotes.json
) when string doesn't have quotes then we simply leave it as is['single quote']
->['single quo"&&&"
(n_string_single_quote.json
) we ignore the value started with a single quote and accidentally think that we've found a boolean to maskIn
shouldMaskAllTestCasesPredictably
I've asserted the current status, so it would be useful to go over them and decide whether we should do anything special about each one of those.Closes #121