Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify how Schema Objects require full-document parsing (3.1.1) #3758

Merged
merged 3 commits into from May 14, 2024

Conversation

handrews
Copy link
Contributor

@handrews handrews commented Apr 26, 2024

Please review even though this is closed! It should have been a draft PR and was merged prematurely.

This is the next step in un-blocking OASComply. Unlike the previous OASComply-related PR, this PR does not apply to v3.0.4. What, if any, wording about document parsing needs to go in 3.0.4 is something we can sort out later.

The implied requirement might be surprising to some, but is an unambiguous result of normatively citing JSON Schema 2020-12. Resolution of references to $id values that do not match the document's location (e.g. are resolved as URIs rather than URLs, including by noticing them in already-parsed content) was affirmed multiple times by TSC decisions during the work on OAS 3.1:

Unfortunately, it seems like we never did get around to adding explicit wording highlighting this. Which does not change the fact that a.) it was decided and re-affirmed, and b.) normatively citing JSON Schema 2020-12 mandates this behavior as the only possible way to satisfy its requirements.

There might be further changes related to this in the "Resolving Relative References in URIs", but I'll deal with that when I fix #3545, which covers both that section and "Resolving Relative References in URLs".

I also expect we'll want to add further guidance for both implementors and description authors on the Learn site. I tried to keep the text here as minimal as possible while heading off the misunderstandings we've seen as folks implement 3.1.


Commit message:

JSON Schema draft 2020-12 includes numerous keywords that require parsing the entire document prior to deeming a reference unresolvable. This makes that more clear and outlines several approaches.

The practice of embedding OpenAPI fragments in other formats is deemed to have implementation-defined (non-interoperable) behavior, as the potential complications that might arise are not predictable.

@handrews handrews added clarification requests to clarify, but not change, part of the spec re-use: ref/id resolution how $ref, operationId, or anything else is resolved labels Apr 26, 2024
@handrews handrews added this to the v3.1.1 milestone Apr 26, 2024
@handrews handrews requested a review from a team April 26, 2024 17:55
@handrews handrews changed the title Clarify how Schema Objs require full-doc parsing (3.1.1) Clarify how Schema Objects require full-doc parsing (3.1.1) Apr 26, 2024
@handrews handrews changed the title Clarify how Schema Objects require full-doc parsing (3.1.1) Clarify how Schema Objects require full-document parsing (3.1.1) Apr 26, 2024
JSON Schema draft 2020-12 includes numerous keywords that require
parsing the entire document prior to deeming a reference unresolvable.
This makes that more clear and outlines several approaches.

The practice of embedding OpenAPI fragments in other formats is
deemed to have implementation-defined (non-interoperable) behavior,
as the potential complications that might arise are not predictable.
miqui
miqui previously approved these changes Apr 28, 2024
Copy link
Contributor

@miqui miqui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job ! @handrews

@miqui
Copy link
Contributor

miqui commented Apr 28, 2024

This PR IMHO is very clear. I am very much tempted to merge this before the TDC call. @lornajane , @ralfhandl thoughts?

@handrews
Copy link
Contributor Author

@miqui this one has some pretty deep impacts so I'd rather leave it open for longer.

ralfhandl
ralfhandl previously approved these changes Apr 29, 2024
Copy link
Contributor

@mikekistler mikekistler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍

@miqui
Copy link
Contributor

miqui commented May 2, 2024

Discussed during TDC 5/2/2024 - It was decided we need 2 more TSC approvals.

versions/3.1.1.md Outdated Show resolved Hide resolved
This goes into more detail and uses "undefined" instead of
"implementation-defined" as the behavior is likely to be
incorrect (rather than just a different interpretationof an
ambiguous requirement), and may result in security concerns
as well.
@handrews
Copy link
Contributor Author

handrews commented May 3, 2024

@ralfhandl I've pushed a new commit with the following commit message:


Rework the guidance around fragmentary parsing.

This goes into more detail and uses "undefined" instead of
"implementation-defined" as the behavior is likely to be
incorrect (rather than just a different interpretationof an
ambiguous requirement), and may result in security concerns
as well.


I don't know that it's a simplification, but I think it's more explicit and should be easier to follow. Please do continue to suggest simplified wording - it's something I struggle with sometimes.

Copy link
Contributor

@ralfhandl ralfhandl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good explanation!


Implementations that parse referenced fragments of OpenAPI content without regard for the content of the rest of the containing document will miss keywords that change the meaning and behavior of the reference target.
In particular, failing to take into account keywords that change the base URI introduces security risks by causing references to resolve to unintended URIs, with unpredictable results.
While some implementations support this sort of parsing due to the requirements of past versions of this specification, in version 3.1, the result of parsing fragments in isolation is _undefined_ and likely to contradict the requirements of this specification.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing some distinction, but this seems like the same behavior that's described as "implementation defined" in https://github.com/OAI/OpenAPI-Specification/pull/3732/files#diff-b92507e7acda65ae00a02236c555cefc68b6fca4661077b84c2fb9ab150e5e17R151

This section particularly addresses schema objects but the other PR seems to encompass schema objects too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@notEthan eh... (wobbles hand back and forth in a wishy-washy way). There's an overlap, but it's not quite the same. This is where we get into the "it's too hard to even explain" that I mentioned in the undefined/implementation-defined PR, but this ones not one of the worst so here's an explanation:

If you have a document that consists of an empty JSON object, {}, that document is syntactically valid as several different Objects. Parsing that document from different reference contexts is still full-document parsing, so it does not fall into this undefined behavior. But it is parsing using multiple conflicting contexts, so it falls into the implementation-defined behavior of the other PR.

It's a pretty fine hair to split, but it is deterministic. The whole-document case is implementation-defined because at least one implementor I've spoken with did not think it required a fix in the spec. I'm trying to invalidate as few existing implementations as possible, which is tricky to balance with the scenarios that result in outright wrong behavior.

Copy link
Contributor Author

@handrews handrews May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither of these changes get into the deeper question of whether documents are parsed as JSON/YAML once or each time, or whether, once parsed as JSON/YAML, the resulting structures are further parsed as OAS Objects once or each time. That would start getting into whether various parsing steps are cached, and what is done when a cached Object doesn't agree with a new parsing context. There are many ways this might or might not be detected, and many strategies one could pick to handle it - new each time, first wins, last wins, any conflict is an error, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an overlap, but it's not quite the same.

I'm happy to accept that, and appreciate the explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured it might help folks to see what "complicated to explain" looks like :-)

@ralfhandl
Copy link
Contributor

All reviewers seem to be happy with the latest state of the PR

@ralfhandl ralfhandl merged commit 05f49ef into OAI:v3.1.1-dev May 14, 2024
1 check passed
@RomanHotsiy
Copy link
Contributor

Looks good 👎

@handrews handrews deleted the docparse branch May 19, 2024 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification requests to clarify, but not change, part of the spec re-use: ref/id resolution how $ref, operationId, or anything else is resolved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants