Split Overview into the two specific use cases #1370

awwright · 2022-12-17T01:30:30Z

This is an alternative to #1244, and like that issue, this one should be incorporated prior to #1365. This PR makes a clearer distinction between validation uses and annotation uses that #1365 will draw on (it relies on classifying which use each keyword is being used for, since validation use is different from annotation use).

This PR moves vocabulary-related discussion deeper into the specification, and replaces Overview with a specific description of what a JSON Schema can do, especially in terms of its output.

Once its capabilities are described, the rest of the specification can describe how you write a schema that can do these things.

This replaces the Overview section, which is a little bit redundant with the above Abstract, then Introduction, sections.

CC @handrews: Could you provide your take on whether this touches Vocabularies too much?
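To make the validation/annotation split concrete, here is a minimal Python sketch that tags a few keywords by the use they serve. `KeywordUse`, `KEYWORD_USES`, and `keywords_by_use` are hypothetical names for illustration, and the table covers only a handful of keywords; it is not a listing taken from this PR or from the specification.

```python
from enum import Enum


class KeywordUse(Enum):
    """Hypothetical categories for how a keyword is used."""
    ASSERTION = "assertion"    # contributes to accepting or rejecting an instance
    ANNOTATION = "annotation"  # attaches information to an instance location
    APPLICATOR = "applicator"  # applies subschemas to parts of the instance


# Illustrative subset only; not a complete vocabulary listing.
KEYWORD_USES = {
    "type": KeywordUse.ASSERTION,
    "minimum": KeywordUse.ASSERTION,
    "required": KeywordUse.ASSERTION,
    "title": KeywordUse.ANNOTATION,
    "description": KeywordUse.ANNOTATION,
    "default": KeywordUse.ANNOTATION,
    "properties": KeywordUse.APPLICATOR,
}


def keywords_by_use(schema: dict, use: KeywordUse) -> list[str]:
    """Return the keywords of `schema` classified under `use`, in schema order."""
    return [keyword for keyword in schema if KEYWORD_USES.get(keyword) is use]


schema = {"title": "Age", "type": "integer", "minimum": 0}
print(keywords_by_use(schema, KeywordUse.ASSERTION))   # ['type', 'minimum']
print(keywords_by_use(schema, KeywordUse.ANNOTATION))  # ['title']
```

Unknown keywords simply fall into no category here; how a real implementation treats them is outside this sketch.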

handrews · 2023-01-20T20:19:29Z

@awwright I generally like the split. I'd overall be more comfortable working out high-level organization and terminology in a few issues. I find it hard to think about terminology changes in PRs like this because they have systemic implications. Manageable-sized PRs make it hard to think about the systemic impact, and PRs that change terminology everywhere to show consistency are too large to work with in general.

Some of this also comes from recent experiences of having wording I've written questioned; the questions were understandable, but we were only able to work out what the spec really meant because there was a solid record of discussion in issues that we could research in addition to the text.

I've also had too many experiences (both as a writer and reader) where people thought they agreed on terminology, but actually had different meanings in mind. Discussions in issues, where work can focus on the concepts and fit words to those concepts once they are thoroughly understood and agreed to, gives me much more confidence. It would also help us avoid some of the terminology inconsistencies that have accumulated because the terminology was developed piecemeal instead of worked out as a system.

There's a balance between working out conceptual pieces and working out the whole system that I'm not sure how to manage, but in both cases I find working from concrete wording in PRs difficult. It makes me nervous for reasons that have little to do with the PR and a lot to do with how things have ended up misunderstood in the past.

handrews · 2023-01-20T20:26:27Z

As far as vocabularies, I see the point of moving things down. I also feel like it's important to get the concept established early (although it doesn't need as much detail as is currently present in the overview). This is another thing where working at an outline level (or slightly more detailed) would help a lot more than looking at text changes and movements.


awwright · 2023-03-28T01:47:59Z

If this looks reasonable, I'd like to move this through next.

Then I'd like to focus mostly on resolving outstanding PRs to accommodate a conversion to Markdown, but I'd also like to squeeze in #1390; this would help organize the spec to accommodate writing #1365/Discussion #329.

gregsdennis

I'm still on the fence about needing to split validation and annotation use cases like this, although this does give a deeper dive into each. I think these sections may need to be children of a "base use cases" section to ease the reader into it a bit more.

Extracting the vocab bit is fine with me.

gregsdennis · 2023-03-29T21:34:16Z

jsonschema-core.xml

-                JSON Schemas are themselves JSON documents.
-                This, and related specifications, define keywords allowing authors to describe JSON
-                data in several ways.
+                A JSON Schema document describes a validator (also known as a "recognizer" or "acceptor") which classifies a provided JSON document as "accepted" or "rejected."


The "accept"/"reject" terminology is new. I see you use it later in the PR as well, but it's not used throughout the document.

It's new to this spec, but it is used widely outside JSON Schema and may help new readers understand what is going on. I'm going to suggest we should use accept/reject more often (it greatly simplifies the phrasing of many sentences), but that'll be an issue for later.

Can you remove that language from this PR and open an issue for that change, please?

I'm not opposed to it, but I think vernacular should be an agreed-upon change, not something that's just snuck in.

Well, my point is there's a certain segment who may see our language as new, and "accepts" is the existing term they're familiar with. I think we should use a variety of language to introduce and define the concepts, and then we can use our choice of term for the rest of the document. Is there a problem with this line of thinking?

I don't have a problem with introducing them, but this PR doesn't seem the place for it. I'd like to get the opinions of the other maintainers.

> I know what a finite state machine is, I still don't find the references you're adding helpful

> significantly fewer people have a real understanding of them or how a JSON Schema can be mapped into one

Ok, though my argument is that not every part of the intro has to be helpful to everyone; it has to be written so that the widest possible audience will understand what JSON Schema accomplishes for them.

The two biggest audiences, I think, will be application developers ("I want a DSL for checking JSON, instead of doing it in code") and formal grammars ("I know what ABNF and DTDs are, I want this for JSON").

I think you'll find that other similar technology uses technical terms much more heavily than I'm suggesting we do.

I looked at the introduction for ABNF, which I found far too technical for most people to understand. It describes itself in technical terms as a formal syntax, but doesn't really explain why you'd want to use it at all, or use it over other languages.

XML DTD also talks about formal grammars, validators, and uses the accepts/rejects terminology; but it too is somewhat technical and it's not immediately obvious to me who the target audience is.

So what I'm looking for is (1) should the formal grammar audience be accommodated in the introduction? (Since ABNF and DTDs both seem to be written exclusively for this audience, I would suggest this is important.)

And (2) if we should accommodate the formal grammar audience, is there a better way to write it so that it's more helpful for them, and less confusing to others?

> Ok, though my argument is that not every part of the intro has to be helpful to everyone; it has to be written so that the widest possible audience will understand what JSON Schema accomplishes for them.

This is a code review, where saying "I don't find it helpful" is to say "I believe you should not add this, it isn't helpful to a wider audience", not simply offering my own anecdote about my personal reading.

> XML DTD also talks about formal grammars, validators, and uses the accepts/rejects terminology;

Section 2.8 of a document is wildly different from being literally the first paragraph of the actual content of the document. I also don't see the "accepts/rejects" terminology in the section you linked. It uses "valid", as we already do.

> So what I'm looking for is (1) should the formal grammar audience be accommodated in the introduction?

You already have my own opinion, now three times: no, we should not.

> not simply offering my own anecdote about my personal reading

Ok, I ask because saying "I don't find it helpful" is suggestive of a personal opinion without projecting what others will think; saying "I don't believe this will be helpful" is a general observation of the sort I'm looking for.

I'm going to have to think about what else to say, if it's not immediately obvious that formal grammars are related here, as that's the formal study of what JSON Schema is fundamentally doing.

> I also don't see the "accepts/rejects" terminology in the section you linked. It uses "valid", as we already do.

XML does not use the term "validates" (in the third person singular) to refer to an outcome (and actually it doesn't use it in that form at all). It uses "validate" to describe a process, "accept"/"matches", and "reject" to describe outcomes of that process, and "valid" to describe documents that have been accepted by the process, but nothing like "validates successfully" as we do.

> Ok, I ask because saying "I don't find it helpful" is suggestive of a personal opinion without projecting what others will think; saying "I don't believe this will be helpful" is a general observation of the sort I'm looking for.

At the risk of quoting myself, the comment I left before that was quite clear on which I was intending, please don't ignore it:

> All in all I find the first few paragraphs here to be a step back

> I don't see them as adding understanding to someone reading the spec

> what's here in this whole PR does too much

I'm going to bow out of this PR as well, as I believe I've communicated that I'm -1 on the changes in their current form, and that there might be smaller changes I'm more positive on, but they're sufficiently far away from this PR in its current state that it's not a matter of rewording a small bit here and there. It bears repeating, I suppose, that that's just my vote, and others may disagree of course, though obviously I've landed on this PR after Greg, who sounds like he was expressing similar doubts.

gregsdennis · 2023-03-29T21:35:03Z

jsonschema-core.xml

+                A condition for accepting a document is called an "assertion".
+                Assertions impose constraints that instances must conform to.
+                Given a schema and an instance, the schema "accepts" an input whenever all the assertions are met,
+                and the schema "rejects" when any of the assertions fail.


"rejects" needs an object, i.e. what is being rejected?

The input JSON document, as was mentioned in 'the schema "accepts" an input whenever...'

Yes, but grammatically, you need to repeat the object.
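The accept/reject phrasing in the suggested text can be read as a conjunction over assertions. The Python sketch below models each assertion as a hypothetical predicate on the instance; `accepts`, `rejects`, and the example predicates are illustrative names, and the integer check is simplified (it ignores values such as `1.0`, which JSON Schema also treats as integers).

```python
from typing import Any, Callable, List

Assertion = Callable[[Any], bool]  # hypothetical model: one predicate per assertion


def _is_number(value: Any) -> bool:
    # JSON booleans are not numbers, even though Python's bool subclasses int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def accepts(assertions: List[Assertion], instance: Any) -> bool:
    """The schema accepts the instance when every assertion is met."""
    return all(assertion(instance) for assertion in assertions)


def rejects(assertions: List[Assertion], instance: Any) -> bool:
    """The schema rejects the instance when any assertion fails."""
    return any(not assertion(instance) for assertion in assertions)


# Assertions roughly corresponding to {"type": "integer", "minimum": 0}.
non_negative_integer = [
    lambda x: isinstance(x, int) and not isinstance(x, bool),  # simplified "type"
    lambda x: not _is_number(x) or x >= 0,  # "minimum" ignores non-numbers
]

print(accepts(non_negative_integer, 5))       # True
print(rejects(non_negative_integer, -1))      # True
print(rejects(non_negative_integer, "five"))  # True
print(accepts([], "anything"))                # True: no assertions, nothing to fail
```

By construction `rejects` is always the negation of `accepts`, and a schema with no assertions (such as `{}`) accepts every instance.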

gregsdennis · 2023-03-29T21:36:40Z

jsonschema-core.xml

-                JSON Schemas are themselves JSON documents.
-                This, and related specifications, define keywords allowing authors to describe JSON
-                data in several ways.
+                A JSON Schema document describes a validator (also known as a "recognizer" or "acceptor") which classifies a provided JSON document as "accepted" or "rejected."


Does the schema describe a validator? I would expect people think of the "validator" as the implementation, not the document.

Yeah, that makes sense... There's a sense in which these two uses are actually the same: the "validator implementation" is just a generic form of validator that is configurable. If I have a schema, it makes no difference whether the program is written or compiled to work only with that schema, or is generic and configured at runtime.

Is there a better name for "the program that tests an input against some specific schema"?

I don't think you understand my point. Colloquially, the "validator" is the implementation, not the schema. I think we need to stick with this.

Saying the schema itself is the validator will be confusing. A validator evaluates JSON against a schema. The schema is no more than configuration.

> Colloquially, the "validator" is the implementation, not the schema.

I believe I see the point you're making, but I'm adding, this is similar to how we discuss compilers and interpreters. You're pointing out a definition of "validator" that functions like an interpreter: there's a library that reads the schema (the source code), then uses this interpretation to validate JSON.

But you can also compile source code to a program, and run the program directly. In this paradigm, there is no interpreter (what is usually called the validator), but the compiled program is still a "validator" (a thing that performs validation). It just has no concept of a schema (any more than a compiled C program can parse C).

So with JSON Schema, the schema is not the validator (as such), but I think you can say it describes a validator.

I see where you're coming from, but never in my experience with this project have we used "validator" that way. It has always been used to mean the implementation.

At best, this reads weird.

@Julian If I compile a schema or curry away the schema argument, leaving an executable that only reads an instance, what terminology should we use for the compiler, and the program/function it outputs?

I ask because, in my opinion, the function that accepts the instance would be the "validator", not the compiler. And I argue this usage is entirely consistent with most "validator" libraries, which are more like interpreters (they both parse the schema and validate instances, in a single package).
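The interpreter/compiler distinction raised in this thread can be sketched in Python. `interpret` reads the schema on every call, while `compile_schema` reads it once and returns a function of the instance alone, which is the "curried away" form described in the question. Both names are hypothetical, they support only `type` (as a single string), `minimum`, `required`, and `properties`, and nothing here is meant to suggest how the specification or any real implementation is structured.

```python
from typing import Any, Callable, Dict, List


def _is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def _is_integer(x: Any) -> bool:
    return _is_number(x) and (isinstance(x, int) or x.is_integer())


TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "object": lambda x: isinstance(x, dict),
    "array": lambda x: isinstance(x, list),
    "string": lambda x: isinstance(x, str),
    "number": _is_number,
    "integer": _is_integer,
    "boolean": lambda x: isinstance(x, bool),
    "null": lambda x: x is None,
}


def interpret(schema: dict, instance: Any) -> bool:
    """Interpreter style: walk the schema each time an instance is checked."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    if "minimum" in schema and _is_number(instance) and instance < schema["minimum"]:
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not interpret(subschema, instance[name]):
                return False
    return True


def compile_schema(schema: dict) -> Callable[[Any], bool]:
    """Compiler style: walk the schema once, return a checker of the instance only."""
    checks: List[Callable[[Any], bool]] = []
    if "type" in schema:
        checks.append(TYPE_CHECKS[schema["type"]])
    if "minimum" in schema:
        minimum = schema["minimum"]
        checks.append(lambda x: not _is_number(x) or x >= minimum)
    required = list(schema.get("required", []))
    if required:
        checks.append(lambda x: not isinstance(x, dict) or all(n in x for n in required))
    for name, subschema in schema.get("properties", {}).items():
        sub = compile_schema(subschema)
        # Default arguments bind the current name/sub, not the loop's last values.
        checks.append(lambda x, name=name, sub=sub:
                      not isinstance(x, dict) or name not in x or sub(x[name]))
    return lambda instance: all(check(instance) for check in checks)


schema = {"type": "object", "required": ["age"],
          "properties": {"age": {"type": "integer", "minimum": 0}}}
validate = compile_schema(schema)  # the schema is not consulted again after this
for instance in [{"age": 30}, {"age": -1}, {}, []]:
    print(interpret(schema, instance), validate(instance))
```

The two functions agree on every instance; whether an intermediate compiled function exists is an implementation detail that changes neither the inputs (schema, instance) nor the outcome.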

I don't believe we need terminology for such a concept in the spec at all (and certainly not at this point in time). What we use today is fine, "implementation", which refers to the executable program capable of doing things with schemas.

> the function that accepts the instance would be the "validator"

This I agree with, but it doesn't follow from this that the schema is a validator. The schema is still just "configuration" (if you want to call it that). It still goes through a library/application, and you get an output. It's just that your example also produces an intermediate output: an executable function that represents a specific schema. The system takes the JSON Schema (most likely as JSON or YAML text) and an instance as input and outputs whether the instance is valid according to that schema. That "compile" step is an intermediate implementation detail that doesn't need to be covered in the spec.

The spec needs to concern itself with one thing:

inputs: a schema and an instance

output: validation results and/or annotations

Anything an implementation does to get from input to output is necessarily beyond the scope of the spec.
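The input/output framing stated here (schema and instance in; validation results and/or annotations out) can be sketched as a single Python function. `evaluate` and `Output` are hypothetical names; only `minimum`, `required`, `properties`, and the annotation keywords `title`, `description`, and `default` are handled, and JSON Pointer escaping is omitted. One rule the sketch does follow from the 2019-09 and later core specifications: a schema that fails produces no annotations, including those collected from its subschemas.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Keywords treated as annotations in this sketch (illustrative subset).
ANNOTATION_KEYWORDS = ("title", "description", "default")


@dataclass
class Output:
    valid: bool
    # instance location (JSON Pointer) -> keyword -> annotation value
    annotations: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def _is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def evaluate(schema: dict, instance: Any, location: str = "") -> Output:
    """Inputs: a schema and an instance. Output: a validity flag plus annotations."""
    valid = True
    annotations: Dict[str, Dict[str, Any]] = {}

    # Assertions (subset): "minimum" applies only to numbers, "required" only to objects.
    if "minimum" in schema and _is_number(instance) and instance < schema["minimum"]:
        valid = False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            valid = False
        # Applicator: evaluate each present property against its subschema.
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                # JSON Pointer escaping of "~" and "/" is omitted for brevity.
                child = evaluate(subschema, instance[name], f"{location}/{name}")
                valid = valid and child.valid
                annotations.update(child.annotations)

    # Annotation keywords attach their values to the current instance location.
    here = {k: schema[k] for k in ANNOTATION_KEYWORDS if k in schema}
    if here:
        annotations[location] = here

    # A failing schema produces no annotations, including those from subschemas.
    return Output(valid, annotations if valid else {})


schema = {
    "title": "Person",
    "required": ["age"],
    "properties": {"age": {"minimum": 0, "description": "Age in years"}},
}
print(evaluate(schema, {"age": 30}))  # valid, annotations at "" and "/age"
print(evaluate(schema, {"age": -1}))  # invalid, no annotations
```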

> but it doesn't follow from this that the schema is a validator

I see, this isn't what I intended to convey. By saying "the schema describes a validator" I think that would disconnect the schema (the description) from the validator (the actual process). Is a different word in order here, or some additional explanation ("the schema describes the behavior of a validator")?

I don't think it's necessary to say that at all.

A schema describes a set of constraints and annotations that can be applied to an instance. That's it. There's no need to bring in implementations of any form.

gregsdennis · 2023-03-29T21:36:52Z

jsonschema-core.xml

-                with a "$" character to emphasize their required nature.  This vocabulary
-                is essential to the functioning of the "application/schema+json" media
-                type, and is used to bootstrap the loading of other vocabularies.
+                A schema may also describe an "annotator," a way to read an instance and output a set of "annotations."


Is the schema describing an annotator? (same as "validator" above)

Yeah, similar situation: I have a schema, and I want to use it to compile a program that takes a JSON input and returns an output format. It's not otherwise configurable; maybe this is an HTTP service. What do I call that program?
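The "compiled annotator" described in this reply can be sketched the same way: bind the schema once and keep a function that maps an instance to its annotations. `annotate` and `make_annotator` are hypothetical names, `functools.partial` does the binding, and the sketch collects only `title`, `description`, and `default` through `properties`. It deliberately checks no assertions, so it is not a complete evaluator.

```python
from functools import partial
from typing import Any, Callable, Dict

ANNOTATION_KEYWORDS = ("title", "description", "default")
Annotations = Dict[str, Dict[str, Any]]  # instance location -> keyword -> value


def annotate(schema: dict, instance: Any, location: str = "") -> Annotations:
    """Collect annotations by instance location; no assertions are checked."""
    result: Annotations = {}
    here = {k: schema[k] for k in ANNOTATION_KEYWORDS if k in schema}
    if here:
        result[location] = here
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                result.update(annotate(subschema, instance[name], f"{location}/{name}"))
    return result


def make_annotator(schema: dict) -> Callable[[Any], Annotations]:
    """Bind the schema once, leaving a function of the instance alone."""
    return partial(annotate, schema)


annotator = make_annotator(
    {"title": "Config", "properties": {"port": {"description": "TCP port", "default": 8080}}}
)
print(annotator({"port": 80}))
print(annotator({}))
```

Whether such a program is called an "annotator" or something else, its interface is the same as in the validator case: the schema is fixed up front and only the instance varies.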

gregsdennis · 2023-03-29T21:38:57Z

jsonschema-core.xml

+                However, not all valid input is meaningful or true to a given application.
+                That is, if you process an arbitrary instance with nonsense data,
+                the resulting annotations may not necessarily be true, even though the input is valid.


The use of "true" here is odd. What does it mean for an input to be "true" to an application?

Yeah, I struggled a bit with how to phrase this. I'm trying to explain the phenomenon of "garbage in garbage out" and that the assertions don't have to be 100% completely defined.

I think dropping "true" and sticking with "meaningful" is the right way here.

awwright requested a review from handrews December 17, 2022 01:30

awwright force-pushed the two-broad-cases branch 2 times, most recently from 3c321dd to 8a76bea December 17, 2022 07:11

awwright mentioned this pull request Dec 26, 2022

Upgrade to rfc2xml v3 #1372 (merged)

awwright force-pushed the two-broad-cases branch from 8a76bea to 6496f47 February 27, 2023 01:30

Split Overview into Validation, Annotation, and Vocabularies

d9369ab

This better describes the specific uses that JSON Schema supports. The vocabulary-related prose is moved down into the relevant section.

awwright force-pushed the two-broad-cases branch from 6496f47 to d9369ab March 27, 2023 01:13

awwright requested review from gregsdennis and Relequestual and removed request for handrews March 28, 2023 00:29

awwright marked this pull request as ready for review March 28, 2023 01:49

gregsdennis requested changes Mar 29, 2023



