Jsonnet based single file schemas

Summary

This RFC proposes changing the way we define govuk-content-schemas to use a single file per format, authored in Jsonnet. This has been implemented in PR#634.

This proposal comprises two relatively distinct concerns: the properties used to define a schema and the use of Jsonnet to author them.

For reference, an example of how we currently define a schema is: world_location/publisher/details.json, world_location/publisher/edition_links.json and world_location/publisher/links.json.

In the proposed format this would be authored as: world_location.jsonnet.

The properties to define a schema

Problem

Currently the majority of schemas are defined by creating multiple files: details.json, edition_links.json and/or links.json. These files are JSON schemas which are merged with default schemas.

This is fine for the majority of schemas. However, when there are any unusual circumstances, the schemas need to be generated by files that have significant repetition with the base files: 1, 2.

The consequence of this is that we have different ways to generate schemas, which causes problems such as missing and/or invalid schemas.

Another problem is that it is very difficult to apply restrictions to any fields in the schema that aren’t details. For example, most schemas allow all document_type values, even though they only need to allow one or a few.

Approach

Because the data going into the Publishing API is consistent - restricted by the Publishing API database schema - the ways a schema can vary from the default are quite predictable. Schemas don’t actually need the ability to be completely custom, as they currently are.

It's also unnecessary to always be working in the context of a JSON schema, as the different types of schema we generate can be subtly different. For example, data going into Publishing API may have a forbidden title. In Content Store the title would always be present as a field, but might have a value of null.

Fields identified

Defining the following fields can cover all the scenarios that currently exist with schemas. All of these fields are optional.

document_type

A string or array that defines which document_types can use a schema. If left null, all document_types are allowed.
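As a rough illustration of how this field could be turned into a JSON Schema rule, here is a minimal Python sketch. The function name document_type_schema is hypothetical, and treating null as "any string" is an assumption; the implementation in the PR may well differ (for example by listing every known document type).

```python
from typing import Optional, Sequence, Union


def document_type_schema(document_type: Optional[Union[str, Sequence[str]]]) -> dict:
    """Build a JSON Schema fragment for the document_type property."""
    if document_type is None:
        # Assumption: no restriction configured, so any string is accepted.
        return {"type": "string"}
    # A single string is treated as a one-item list.
    if isinstance(document_type, str):
        allowed = [document_type]
    else:
        allowed = list(document_type)
    return {"type": "string", "enum": allowed}


if __name__ == "__main__":
    print(document_type_schema("world_location"))
    print(document_type_schema(["guide", "answer"]))
    print(document_type_schema(None))
```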

base_path

String of “optional” | “required” | “forbidden”. Default “required”.

routes

String of “optional” | “required” | “forbidden”. Default “required”.

redirects

String of “optional” | “required” | “forbidden”. Default “required”.

title

String of “optional” | “required” | “forbidden”. Default “required”.

description

String of “optional” | “required” | “forbidden”. Default “required”.

rendering_app

String of “optional” | “required” | “forbidden”. Default “required”.

details

String of “optional” | “required” | “forbidden”. Default “required”.
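To make the optional/required/forbidden statuses more concrete, the following Python sketch shows one way a single configuration could produce different schemas for data going into Publishing API and data in Content Store. It is an illustration under stated assumptions, not the code in the PR: the function build_schema, the target names "publisher" and "content_store", and the "#/definitions/<field>" references are all hypothetical, and the handling of optional fields in Content Store is a guess.

```python
FIELDS = ("base_path", "routes", "redirects", "title",
          "description", "rendering_app", "details")
STATUSES = ("optional", "required", "forbidden")


def build_schema(config: dict, target: str) -> dict:
    """Turn per-field statuses into a JSON Schema for one target."""
    properties, required = {}, []
    for field in FIELDS:
        status = config.get(field, "required")  # default from the field list
        if status not in STATUSES:
            raise ValueError(f"{field}: unknown status {status!r}")
        ref = {"$ref": "#/definitions/" + field}  # hypothetical definition name
        if target == "publisher":
            if status == "forbidden":
                # Left out entirely; additionalProperties rejects it.
                continue
            properties[field] = ref
            if status == "required":
                required.append(field)
        elif target == "content_store":
            # The field is always present, but a forbidden one is null.
            required.append(field)
            if status == "forbidden":
                properties[field] = {"type": "null"}
            elif status == "optional":
                # Assumption: an optional field may be stored as null.
                properties[field] = {"anyOf": [ref, {"type": "null"}]}
            else:
                properties[field] = ref
        else:
            raise ValueError(f"unknown target {target!r}")
    return {
        "type": "object",
        "additionalProperties": False,
        "required": required,
        "properties": properties,
    }


if __name__ == "__main__":
    config = {"title": "forbidden", "description": "optional"}
    print(build_schema(config, "publisher")["required"])
    print(build_schema(config, "content_store")["properties"]["title"])
```

The point of the sketch is that the configuration states intent once, and each type of schema interprets that intent in its own way.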

definitions

An object that defines the JSON schema rules of the schema. This would mostly be used for defining details, but could be used to refine other fields (eg a specific base_path format) and can be used for sub-definitions in details. These are JSON schema objects.

edition_links and links

An object that defines the allowed links. We don’t need as much information as we currently have in edition_links.json, as links can only be of type “guid_list”.

The suggested format is to provide an object that has a key of the link type and a string value of a description. To define required, maxItems or minItems, this string value can be replaced with an object:

links:
  example_1: "Description"
  example_2:
    description: "Another description"
    required: true

An example of this link definition in the final proposal is base_links.
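The short and long forms of a link definition could be expanded into a JSON Schema along the following lines. This is a sketch only: the function link_schema is hypothetical, and the inline array rule stands in for “guid_list”, with "#/definitions/guid" assumed as the name of the GUID definition.

```python
import json


def link_schema(links: dict) -> dict:
    """Expand link definitions into a JSON Schema for the links object."""
    properties, required = {}, []
    for link_type, definition in links.items():
        if isinstance(definition, str):
            # Short form: the value is just the description.
            definition = {"description": definition}
        rule = {
            "description": definition["description"],
            # Stand-in for "guid_list": a unique list of GUIDs.
            "type": "array",
            "items": {"$ref": "#/definitions/guid"},  # assumed definition name
            "uniqueItems": True,
        }
        for key in ("minItems", "maxItems"):
            if key in definition:
                rule[key] = definition[key]
        properties[link_type] = rule
        if definition.get("required", False):
            required.append(link_type)
    schema = {
        "type": "object",
        "additionalProperties": False,
        "properties": properties,
    }
    if required:
        schema["required"] = required
    return schema


if __name__ == "__main__":
    links = {
        "example_1": "Description",
        "example_2": {"description": "Another description", "required": True},
    }
    print(json.dumps(link_schema(links), indent=2))
```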

Usage of Jsonnet in schemas

I anticipate that the most eyebrow-raising aspect of this proposal is the use of Jsonnet to define the schema, as it is not a particularly well-known format, nor is it currently used in GOV.UK.

About Jsonnet

Jsonnet is a Google project (created as a 20% project). It is a superset of JSON that adds the following features, which are of interest for schemas:

simpler, less restrictive syntax than JSON
ability to import files and merge contents
ability to add comments

It is notably used to work with Kubernetes in the form of Ksonnet.

It can be used in Ruby projects through the ruby-jsonnet gem.

Why Jsonnet

The most appealing feature of Jsonnet is the ability to import. This is particularly useful with schemas, as there is frequently repetition. This is not a feature that is available (as far as I’m aware) in more common configuration languages. It gives us a solution to the challenge we currently have, where in most cases schemas share information but a few instances are exceptions.

The other appealing aspect is that, as a superset of JSON, Jsonnet accepts JSON as valid syntax. Examples of JSON schema can be pasted in and will work, with no need to convert them to a different syntax.

Examples of import usage:

We can import default links into each schema that wants to use them, rather than needing to find complicated ways to rule defaults out: example
We can build up base schemas from common parts: example
We can show the initial file that a schema extends, so there is a clear place to look for the configuration defaults, rather than having to find them out from the code: example
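The idea of extending a shared base can be sketched outside Jsonnet as well. The Python below is only an analogue of the pattern, not Jsonnet itself: DEFAULT_FORMAT, extend and the example values are all hypothetical. The recursive merge corresponds roughly to Jsonnet's `+:` field syntax; a plain `+` between Jsonnet objects replaces a nested field wholesale.

```python
import copy

# Hypothetical shared defaults that every format starts from.
DEFAULT_FORMAT = {
    "base_path": "required",
    "title": "required",
    "links": {"example_default": "A default link"},
}


def extend(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied; nested dicts are merged."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = extend(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A hypothetical format that only states how it differs from the defaults.
world_location = extend(DEFAULT_FORMAT, {
    "document_type": "world_location",
    "title": "optional",
    "links": {"example_1": "Description"},
})

if __name__ == "__main__":
    print(world_location)
```

A format that does not want the default links would simply not start from them, which is the case that is awkward to express when defaults are merged in implicitly.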

Conclusion

In a number of areas some tough compromises have had to be made. The two main ones have been mixing JSON Schema into a non-JSON Schema definition file, and the use of Jsonnet where YAML/JSON may have been a less controversial choice.

Despite these two factors, there are a number of other things this proposal helps with:

A format defined in a single file is easier to navigate and likely easier to understand
By not relying on the schema definition being a JSON Schema file we have the scope to add further non-schema information (such as expanded links)
It'll be significantly easier to add any new schema types that might be needed (eg one for data going into Content Store)
By building schemas from a configuration rather than merging base ones, we can create more accurate schemas with less unnecessary data (220,000 LOC removed)