
mediatype: schema+yaml #18

Open
ioggstream opened this issue Feb 7, 2022 · 8 comments

Comments

@ioggstream
Collaborator

ioggstream commented Feb 7, 2022

I expect

To register the application/schema+yaml media type to support interoperability.

Notes

YAML is widely used in the API community to write specification documents like OAS3+ files.
Since OAS3.0 builds on json-schema draft 4 and OAS3.1 on json-schema 2020, it is useful to
standardize that:

  • it is possible to serialize json-schema using a subset of YAML that can be represented in JSON;
  • support tools for people that currently use YAML syntax (e.g. in schema and API catalogues, in developer platforms, ...)

Benefits:

  • since YAML is already used for developing schemas, providing a media type with related interoperability and security considerations supports the regulation of already established practices;
  • API developers using YAML serialization usually need to convert specifications from/to JSON to use json-schema tools: we expect that registering this media type will improve yaml support in json-schema tools and avoid manual work.
This was referenced Feb 7, 2022
@jdesrosiers
Contributor

I think it's useful to dig down a little on how a JSON Schema media type for YAML might be beneficial for developers. In most cases it makes no difference what you use to develop your schemas because the tooling works against the parsed document anyway. It doesn't know or care what format it was in before it was parsed.

The only time media types are a concern is when the tooling fetches external schemas from the web or the file system. In this case, the tooling behaves like a kind of web client for JSON Schema. So, the first question is, is it useful to serve JSON Schema as YAML over HTTP? I would say no. It takes almost no effort to convert the YAML you develop in to JSON when you serve it. The JSON is going to be way more efficient and there's no need for the human readability of YAML at this point. There's no good reason to choose YAML as the transport format.

The next question is, is it useful to serve YAML from the file system? This is where it becomes pretty inconvenient for developers who want to develop schemas in YAML. If the JSON Schema client doesn't understand YAML, then the developer needs to transform the schemas into JSON on the file system and modify all the references because things like $ref: 'address.schema.yaml' would have to become "$ref": "address.schema.json" because the extension determines the media type on the file system.
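To make the cost of that rewrite concrete, here is a minimal sketch of what swapping the extensions in every `$ref` could look like once a schema has been parsed. The helper name `rewrite_refs` and the sample schema are hypothetical, and the sketch simplifies by treating every key named `$ref` as a reference, so a property that happens to be called `$ref` would also be rewritten.

```python
def rewrite_refs(node, old_ext=".yaml", new_ext=".json"):
    """Return a copy of a parsed schema with file extensions in "$ref" values swapped."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                # Split off any fragment so "#/definitions/x" is preserved.
                path, sep, fragment = value.partition("#")
                if path.endswith(old_ext):
                    path = path[: -len(old_ext)] + new_ext
                out[key] = path + sep + fragment
            else:
                out[key] = rewrite_refs(value, old_ext, new_ext)
        return out
    if isinstance(node, list):
        return [rewrite_refs(item, old_ext, new_ext) for item in node]
    return node


# Hypothetical schema, as it would look after parsing a YAML file.
schema = {
    "type": "object",
    "properties": {
        "home": {"$ref": "address.schema.yaml"},
        "work": {"$ref": "address.schema.yaml#/definitions/office"},
    },
}
converted = rewrite_refs(schema)
```

The function returns a new structure and leaves the input untouched, so the YAML sources can stay the canonical copy.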

However, instead of fetching from the file system, you can load all of your schemas into the JSON Schema client manually. This can relatively easily be done by walking a directory and loading all the schemas in that directory. This is what most tooling expects you to do anyway. So, although working with YAML served from the file system is a huge pain, there is an alternative approach that makes working with YAML, or any other format, relatively painless.
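The directory walk described above might be sketched as follows. This is an illustration, not any particular implementation's API: `load_schemas` and its `parsers` argument are hypothetical names, and only the standard-library JSON parser is wired in. A YAML parser (for example PyYAML's `yaml.safe_load`, which is not imported here) would be passed in by the caller under the `.yaml` key.

```python
import json
from pathlib import Path


def load_schemas(directory, parsers=None):
    """Parse every recognized schema file under `directory` and index it by "$id"."""
    # Map of file extension -> function turning text into a parsed document.
    # Assumption: callers add {".yaml": some_yaml_loader} if they want YAML.
    parsers = parsers or {".json": json.loads}
    registry = {}
    for path in sorted(Path(directory).rglob("*")):
        parser = parsers.get(path.suffix)
        if parser is None or not path.is_file():
            continue  # skip directories and unknown extensions
        schema = parser(path.read_text(encoding="utf-8"))
        # Fall back to the file URI when the schema declares no "$id".
        key = path.resolve().as_uri()
        if isinstance(schema, dict) and isinstance(schema.get("$id"), str):
            key = schema["$id"]
        registry[key] = schema
    return registry
```

Because the registry is keyed by `$id` rather than by file name, references between schemas do not depend on the extension of the file they were loaded from.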

So, in the end I don't see that the lack of a JSON Schema media type for YAML would limit developers in any way. I don't think there's a problem that needs solving here. I think YAML developers are sufficiently supported as it is. Let me know if I'm missing anything.

@ioggstream
Collaborator Author

ioggstream commented Feb 11, 2022

is it useful to serve JSON Schema as YAML over HTTP?

My experience is that API development on code hosting platforms, including reviews, relies on yaml.
AFAIR, I have never designed or reviewed an API or schema specification serialized in JSON.
The vast majority of $refs I have seen so far reference yaml files.

Since OAS3.1 is not widely deployed, people out there are still writing their schema files in yaml
using draft-4... but when OAS3.1 becomes mainstream, we'll eventually get to the point where
OAS3.1 documents will be using yaml, while schemas are constrained to json. That is only in theory, because in practice I think that implementers will actually support yaml: I just ran a couple of tests on e.g. Stoplight Studio.

you can load all of your schemas into the JSON Schema client manually

it's that manually word that I don't like :)

@stuartherbert

(... bringing over the discussion from issue 14 ...)

The registration guidelines in RFC 6838 lay out two tests / criteria:

  1. "Media types that make use of a named structured syntax SHOULD use the appropriate registered "+suffix" for that structured syntax when they are registered."
  2. "media types MUST NOT be given names incorporating suffixes for structured syntaxes they do not actually employ"

JSON Schema is written as YAML, when it is embedded in OpenAPI Spec v3.1 files (and in AsyncAPI spec files too). (Oh, and K8S is now using JSON Schema to describe manifest payloads as part of CRDs. They're also written in YAML.) On the face of it, that seems to be sufficient to be accepted for registration?

I personally don't know anyone extracting that schema and publishing it as a standalone resource in YAML format today, mostly because OpenAPI Spec v3.1 adoption is still at an early phase (in the communities where I work).

But it's only a matter of time.

I have done work for clients who required schema definitions to be published as standalone, addressable resources. Once folks are used to writing schemas in YAML (as part of OpenAPI Spec v3.1), I do expect them to ask for any extracted schema to also be published in YAML for ease of comparison / consistency.

I can't see anything in the RFC that makes tooling a factor in considering registration, one way or the other. Section 4.2.8 only has those two tests for whether or not a media type can include a particular structured syntax name suffix. I'm going to respectfully argue that it's out of scope to consider how schema resources in YAML are used.

If they exist (they do), and they're used (that requires evidence), then they "SHOULD" be registered.

HTH.

@jdesrosiers
Contributor

@ioggstream I'm not arguing that people don't use or want to use YAML to write JSON Schemas. It's a very popular way to write schemas. As you say, someone whose exposure to JSON Schema is limited to OpenAPI may not have ever seen a JSON Schema in JSON. All that is far from the point. The JSON in JSON Schema doesn't mean it's written in JSON. It means it validates data that is compatible with JSON. Neither the schema nor the instance is required to have been serialized as JSON at some point. You can develop the schema however you like. It can be JSON, YAML, TOML, or even a data structure in your programming language of choice. My point is that you don't need a media type to develop a JSON Schema. You only need a media type to transport a JSON Schema. So, this is not true ...

but when OAS3.1 becomes mainstream, we'll eventually get to the point where OAS3.1 documents will be using yaml, while schemas are constrained to json

OpenAPI's adoption of 2020-12 JSON Schema does not in any way restrict OpenAPI 3.1 developers to only using JSON. JSON Schema doesn't care how you develop your schemas. You are more than welcome to use YAML or anything else. Implementations work on the data structure that is parsed from the code. Once the text is parsed, it's all the same.
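As a rough illustration of "once the text is parsed, it's all the same", the sketch below runs one hand-rolled checker against a schema parsed from JSON text and against the same schema written as a native Python dict. The `check` function is hypothetical and covers only a tiny subset of keywords (`type`, `required`, `properties`); it is not a conforming JSON Schema implementation.

```python
import json

# Only a few JSON types are mapped here; this is a deliberate simplification.
_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}


def check(schema, instance):
    """Illustrative subset of validation: "type", "required", "properties"."""
    expected = schema.get("type")
    if expected in _TYPES and not isinstance(instance, _TYPES[expected]):
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not check(sub, instance[name]):
                return False
    return True


# The same schema, once parsed from JSON text and once written natively.
from_json = json.loads(
    '{"type": "object", "required": ["name"],'
    ' "properties": {"name": {"type": "string"}}}'
)
native = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}},
}
```

The checker never sees the original text, so a schema loaded from YAML, TOML, or any other parser that yields the same dict would behave identically.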

@stuartherbert

I'm going to respectfully argue that it's out of scope to consider how schema resources in YAML are used. If they exist (they do), and they're used (that requires evidence), then they "SHOULD" be registered.

I think the important question is: are there implementations out there that make use of this media type? I know that application/schema+json and application/yaml are widely used despite not being officially registered, but I've never heard of any implementation that makes use of application/schema+yaml. As you say, if they exist, we should register the media type. If not, maybe it's not time yet.

@ioggstream
Collaborator Author

The JSON in JSON Schema doesn't mean it's written in JSON. It means it validates data that is compatible with JSON.
You can develop the schema however you like. It can be JSON, YAML, TOML...

This is an interesting perspective, but I think it is not clear from the spec, which is based on media types.

Abstract:

JSON Schema defines the media type "application/schema+json", a JSON-based format for describing the structure of JSON data

Introduction:

This document proposes a new media type "application/schema+json" to identify a JSON Schema for describing JSON data.

This means that, by specification, json schema is linked to its serialization, and writing it in YAML, TOML, ... is non-standard.
Thus it can't be content-negotiated or published in a standard way.

are there implementations out there that make use of this media type

This spec is going to register application/yaml and +yaml too since there's a mess around yaml too (e.g. text/yaml, application/yaml, application/x-yaml, application/vnd.yaml, ...), so the goal is to provide a standard way to do it.

I've never heard of any implementation that makes use of application/schema+yaml.

Both openapi and schema are currently published with the yaml media type in API context, but this means that the only way to do it in a somewhat standard way is:

  • limit to json-schema draft-4
  • embed json-schema in OAS3 files

which is suboptimal.

@jdesrosiers
Contributor

I think you're reading too much into what those statements in the spec mean, but I can agree that it needs to be more clear. The media type is just a means to address schemas and serialize them for transport.

JSON Schema is programming language agnostic, and supports the full range of values described in the data model. Be aware, however, that some languages and JSON parsers may not be able to represent in memory the full range of values describable by JSON.

This is (awkwardly) saying that JSON Schema works with anything that is compatible with JSON. In fact, it even makes allowances for not-quite-JSON-compatible data:

It is possible to use JSON Schema with a superset of the JSON Schema data model, where an instance may be outside any of the six JSON data types.

As an expert in JSON Schema, I can assure you that the intention of the authors is to be very flexible in what form a schema takes. All that matters is that it's compatible with JSON, and even that is flexible. We define a media type for transporting a schema in JSON (application/schema+json), but you're not intentionally limited to only that media type. As long as the JSON Schema client knows how to parse it, it's fair game. There's absolutely no problem with using a YAML encoded JSON Schema in a YAML document such as OpenAPI. I'm going to stop there because I don't think there's any benefit of each of us continuing to repeat our arguments until we can agree on this point.

@ioggstream
Collaborator Author

JSON Schema is programming language agnostic, and supports the full range of values described in the data model

Maybe it just needs an editorial review to align the abstract/introduction with the rest of the text.

I'm going to stop there...

For now, let's focus on merging #19, but I think that this discussion, however it goes, just benefits JSON schema.
Before tackling this PR we need to address #15 and #16.

I hope to discuss these issues at IETF113, but there's no date yet.

@cjaccino

I believe having an application/schema+yaml or an application/json-schema+yaml is a benefit to the community. I don't care which, but I lean toward the second because schema+yaml unnecessarily presumes there will never be a native yaml schema format. application/schema+yaml is a bit of a semantic land grab.

An argument against having a schema to transport JSON Schema as YAML is an argument against a tool's ability to describe the payload. Not having the ability to convey this meaning does harm.

The discussion of schemas as files seems to miss the fact that they can be named anything. The fact that we may use .json and .yaml on the ends of our files does not necessarily influence the URI.

Given the URI /schemas/Person, a client should be able to request a schema in JSON, YAML, or another common format, which we know people would like to be able to do. The burden should be on the server to provide the best response it can, which may mean adapting to the client. Avoiding registration of the media type merely prevents us from being able to have the conversation.
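A minimal sketch of the server-side choice described here, assuming the server can render the same schema as either `application/schema+json` or the proposed (not registered at the time of this discussion) `application/schema+yaml`. The `negotiate` function is a hypothetical helper, not part of any framework, and it handles only a simplified form of the `Accept` header: media ranges with an optional `q` parameter.

```python
def negotiate(accept_header, available):
    """Pick the best media type from `available` for a given Accept header."""
    ranges = []
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))

    best, best_q = None, 0.0
    for candidate in available:  # order of `available` is the server's preference
        main_type = candidate.split("/")[0]
        # The most specific matching range wins: exact, then "type/*", then "*/*".
        for pattern in (candidate, main_type + "/*", "*/*"):
            qs = [q for r, q in ranges if r == pattern]
            if qs:
                if qs[0] > best_q:
                    best, best_q = candidate, qs[0]
                break
    return best


# What a handler for /schemas/Person might offer; the +yaml name is the proposal.
OFFERED = ["application/schema+json", "application/schema+yaml"]
chosen = negotiate("application/schema+yaml, application/schema+json;q=0.5", OFFERED)
```

When `negotiate` returns `None`, a server would typically answer 406 Not Acceptable or fall back to its default representation.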

Avoiding registration of the YAML media type for JSON Schema unnecessarily restricts our language. It leaves a semantic gap that we have correctly identified.

We should have a media type for YAML-borne JSON Schemas. At the same time, we can all acknowledge that JSON is the native format for JSON Schema.
