Add integer to the list of types supported by the schema object. #87

Open
wants to merge 2 commits into
base: main

@auspicacious commented Feb 28, 2024

While reading, I noticed that integer was not included in this list of possible types. Given that integer is used extensively in this chapter, I assume this was a simple oversight.

As an aside, though, I understand that the JSON Schema core specification draft draws a semantic difference between integer and the other types and that the OpenAPI specification explicitly adds integer as a type with a slightly different definition than the JSON Schema validation draft ("a JSON number without a fraction or exponent part" vs. "any number with a zero fractional part"). This is messy, but it doesn't seem worthwhile digging into these differences in this explainer.

Maybe the OAS should be more explicit about this definitional difference, though?

@handrews
Contributor

@auspicacious thanks for this!

> Maybe the OAS should be more explicit about this definitional difference, though?

As one of the main people involved in re-aligning OAS and JSON Schema, I'm confident that any difference between OAS 3.1 and JSON Schema draft 2020-12 is unintentional and perhaps an error. If you'd like to file an issue in the OAI/OpenAPI-Specification repo we can maybe clean up that language in 3.1.1 and 3.2.0.

As for your fix here, I think you are correct that "integer" should be included because it is talking about the type keyword (which allows "integer") and not the JSON data model types (which do not make that distinction). But perhaps we could note that distinction here? I'm not sure what wording would be best, but anything indicating that JSON does not distinguish between integers and other kinds of numbers, while JSON Schema's type keyword does, ought to do it.

@auspicacious
Author

@handrews I let the scope creep up a bit, but I feel like this way of writing it is probably clearer.

I have another question, which you can push over to an issue I'll try to create in the specification repo when I have a little more time, but is relevant here as well.

For context, I was the author of this issue regarding integer formats eight years ago. I'm doing a little brushing up on the possibility I might be getting back into this space and I'm looking at the format registry you mentioned when closing that issue in January.

The format registry seems to be defining two new types: one called stringnumber and one called numberstring. I'm missing any potential context on planned revisions to the standard, and I can't find any references to the registry or these types in the current standard, but is this intentional? The Schema Validation draft does not allow the creation of additional types, and it seems like doing so would cause unnecessary challenges for validation and code generation systems.

Contributor

@handrews left a comment

I'm fine with doing some extra cleanup here, but it's very easy to step into murky territory so I'd try to be minimal about the changes.

-## The Schema Object
+## The Schema Field
Contributor

This should stay "Schema Object" because while I think all fields that directly take a Schema Object are called schema, I'm not 100% sure, and there are other fields that involve references to schema objects (mapping in the Discriminator Object) as well. It's also not true in general that fields with the same name in different Objects have the same value, so it's best to talk about the Schema Object rather than the field.

Author

Okay, so there are a few different concepts being conflated here, I think.

As I understand it, this document is titled "Content of Message Bodies" and is providing an overview of how you can use OpenAPI to describe the content of message bodies. If someone is reading these documents in sequence, I think that it is also the first place that they will encounter the use of schema to describe data.

So there are two separate concepts that need to be conveyed: the names of the fields that, together, make up the definition of the content of a message, and, also, the concept of a schema object, which can be used in many places.

I wrote this deliberately to separate the field called schema used at this particular location in a content definition, and the idea of a schema object, which can be used in various different places. In other words, I wrote it this way for the same reason you are asking me not to write it this way, and I'm not clear on your mental model here.

This might be clearer if the "Media Type Object" section above were renamed to "Media Type Field" to help distinguish this more clearly and be more consistent with the "content field" section at the top.

If this is not an appropriate mental model, I think that some more work needs to be done to describe the vocabulary and mental model that is appropriate.


-The [Schema Object](https://spec.openapis.org/oas/v3.1.0#schema-object) defines a data type which can be a primitive (integer, string, ...), an array or an object depending on its `type` field.
+The schema field holds a [Schema Object](https://spec.openapis.org/oas/v3.1.0#schema-object).
Contributor

This can just be dropped if we keep "Schema Object" as the section header.


-`type` is a string and its possible values are: `number`, `string`, `boolean`, `array` and `object`. Depending on the selected type a number of other fields are available to further specify the data format.
+Schema objects describe the structure of data, and may be nested to describe complex arrays and objects. They are most often used to describe JSON data.
Contributor

I think it's good to remove the "Depending on the selected type..." language as that's not really how JSON Schema works (you can use all fields at all times, it's just that they don't all apply to every data type, and some combinations are nonsensical- but they are still technically valid).

I'd just leave it as "Schema objects describe the structure of data." The exact relationship between JSON Schema and non-JSON data varies a bit among JSON Schema and OpenAPI versions.
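
A minimal Python sketch of that point, assuming a hypothetical `check` helper that understands only `type`, `maxLength`, and `maximum`. It is not a real JSON Schema validator, and every name in it is made up for illustration. It shows a keyword being skipped for instance types it does not apply to, rather than being treated as an error:

```python
# Hypothetical mini-validator for illustration only. It knows three keywords
# and is NOT a JSON Schema implementation.

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check(schema, value):
    """Return True if `value` satisfies the few keywords this sketch knows."""
    if "type" in schema:
        wanted = schema["type"]
        if wanted == "string" and not isinstance(value, str):
            return False
        if wanted == "number" and not is_number(value):
            return False
    # maxLength constrains strings only; for any other value it is a no-op.
    if "maxLength" in schema and isinstance(value, str):
        if len(value) > schema["maxLength"]:
            return False
    # maximum constrains numbers only; for any other value it is a no-op.
    if "maximum" in schema and is_number(value):
        if value > schema["maximum"]:
            return False
    return True


# Odd-looking but still a valid schema: maxLength never applies to a number.
odd_but_valid = {"type": "number", "maxLength": 3, "maximum": 10}

print(check(odd_but_valid, 7))            # maxLength ignored, maximum passes
print(check(odd_but_valid, 12))           # fails `maximum`
print(check({"maxLength": 3}, "abcd"))    # fails `maxLength`
print(check({"maxLength": 3}, 12345))     # maxLength does not apply to numbers
```

Under these assumptions the four calls print `True`, `False`, `False`, and `True`, in that order.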

Author

Okay, so delete the second sentence, but why not explain that schema objects are nested? That's how they're usually used. I know there are a few fields that only apply to the root, but for this non-normative introduction I think that might be a bit much to start with.


-For example, for `string` types the length of the string can be limited with `minLength` and `maxLength`. Similarly, `integer` types, accept `minimum` and `maximum` values. No matter the type, if the amount of options for the data is limited to a certain set, it can be specified with the `enum` array. All these properties are listed in the [Schema Object](https://spec.openapis.org/oas/v3.1.0#schema-object) specification.
+Each schema object has a field called `type`, which defines the type of data expected. Six of the seven possible values for `type` correspond directly to JSON's types as defined in [RFC 8259](https://datatracker.ietf.org/doc/html/rfc8259): `string`, `number`, `boolean`, `null`, `object`, and `array`. However, OpenAPI defines an additional type called `integer`, which indicates a JSON number without a fraction or exponent part. In other words, if your schema object is of type `integer`, it tells your users to expect a JSON number that looks like `123`, `-123`, or `0`, but not `1.0` or `1.0e-2`.
Contributor

integer is defined by JSON Schema, not OpenAPI (we should ignore any inadvertent discrepancies between the two). Even in OAS 3.0, integer as a type comes from JSON Schema even if the wording suggests otherwise.

I'm also hesitant to get into the exact definition of integer as it changed in subtle ways between certain JSON Schema drafts. It's better to just let folks look at JSON Schema-related docs for that. For example, 1.0 is an integer in the JSON Schema drafts used for 3.1 and (I'm 99% sure) 3.0. But (I think?) not in 2.0. I'd just note that JSON Schema adds an integer type to its type keyword for convenience and leave it at that.

Author

I was going to save this for an issue, but, although I don't know the context that decision was made in, 1.0 really, really, really should not be considered an integer.

It will cause immediate problems for simple clients in, e.g., Python, if that is allowed:

>>> import json
>>> isinstance(json.loads('1'), int)
True
>>> isinstance(json.loads('1'), float)
False
>>> isinstance(json.loads('1.0'), int)
False
>>> isinstance(json.loads('1.0'), float)
True

and the OpenAPI definition is very clear that 1.0 is not an integer in the OpenAPI schema dialect, which is why I added the definition above.
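
For illustration, here is a small sketch of the two readings of `integer` discussed in this thread, written against Python's standard `json` module. The helper names `is_integer_lexical` and `is_integer_by_value` are hypothetical. The first one only approximates "without a fraction or exponent part" by relying on `json.loads` returning `int` for such literals and `float` otherwise, which is a property of Python's parser rather than of either specification:

```python
import json


def is_integer_lexical(value):
    """Approximates 'a JSON number without a fraction or exponent part'.

    Assumption: the value came from json.loads, which yields int only for
    literals written without a fraction or exponent.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_integer_by_value(value):
    """Approximates 'any number with a zero fractional part'."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and value.is_integer()


# Compare the two readings on a few JSON number literals.
for text in ["1", "1.0", "1.5", "1e2"]:
    parsed = json.loads(text)
    print(text, is_integer_lexical(parsed), is_integer_by_value(parsed))
```

The two helpers agree on `1` and `1.5` and disagree on `1.0` and `1e2`, which is exactly the gap a client has to decide how to handle.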


+Depending on the selected type, a number of other fields are available to further specify the data format. For example, for `string` types, the length of the string can be limited with `minLength` and `maxLength`. Similarly, `integer` types accept `minimum` and `maximum` values. No matter the type, if the amount of options for the data is limited to a certain set, it can be specified with the `enum` array. All these properties are listed in the [Schema Object](https://spec.openapis.org/oas/v3.1.0#schema-object) specification.
Contributor

I realize this isn't phrasing you chose, but all fields are always available. Certain fields apply only to certain types. Putting a maxLength field on an integer is a no-op, but it's not wrong. Also, minimum and maximum work with any number, not just integers. And be careful about type and enum - they always both apply, and it's easy to accidentally create an impossible-to-satisfy schema if you combine them carelessly.
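
The `type` and `enum` pitfall can be sketched in Python as follows. This is only an illustration under stated assumptions: `check`, `json_equal`, and `satisfiable_by_enum` are hypothetical helpers covering three `type` values and `enum`, not a real JSON Schema validator.

```python
# Hypothetical helpers for illustration; only `type` and `enum` are handled,
# and only for the three type names listed here.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def json_equal(a, b):
    # Python treats True == 1, but JSON booleans and numbers are distinct.
    if isinstance(a, bool) or isinstance(b, bool):
        return a is b
    return a == b


def check(schema, value):
    """Both keywords always apply: the value must pass `type` AND `enum`."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](value):
        return False
    if "enum" in schema:
        if not any(json_equal(value, member) for member in schema["enum"]):
            return False
    return True


def satisfiable_by_enum(schema):
    """True if at least one enum member also passes the `type` check."""
    return any(check(schema, member) for member in schema.get("enum", []))


conflicting = {"type": "integer", "enum": ["small", "large"]}
consistent = {"type": "string", "enum": ["small", "large"]}

print(satisfiable_by_enum(conflicting))  # no enum member is an integer
print(satisfiable_by_enum(consistent))   # every enum member is a string
```

In this sketch the `conflicting` schema can never be satisfied, because each `enum` member fails `type` and every integer fails `enum`.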

Author

I'm pretty sure that a reasonable validator/linter of JSON Schema documents, if one exists, would catch those issues, and my understanding is that this is a non-normative document intended to show how the tools should be used.

Rewriting the first sentence in this way is probably ambiguous enough for this situation:

> In addition to type, several more fields can be used to further constrain your data.

@handrews
Contributor

@auspicacious

> The format registry seems to be defining two new types: one called stringnumber and one called numberstring. I'm missing any potential context on planned revisions to the standard, and I can't find any references to the registry or these types in the current standard, but is this intentional?

🤦 Nope. Those should be string, number (the order is irrelevant) meaning that the format applies to both numbers and strings, but something is apparently formatted wrong for how that gets rendered. The registry is on the gh-pages branch in the OAI/OpenAPI-Specification repo. Something will need to get fixed there.

Author

@auspicacious left a comment

I'm thinking it would be best to go back to the original PR that just adds a single word to the existing document, and maybe revisit the rest of it at a later point after the underlying standards have reached a consensus. Ideally this overview document should be about one good way of doing things, not many almost good ways.
