This RFC proposes changing the way we define schemas in govuk-content-schemas to a single-file-per-format approach authored in Jsonnet. This has been implemented in PR #634.
This proposal comprises two relatively distinct concerns: the properties used to define a schema and the use of Jsonnet to author them.
For reference, an example of how we currently define a schema is: world_location/publisher/details.json, world_location/publisher/edition_links.json and world_location/publisher/links.json.
In the proposed format this would be authored as: world_location.jsonnet.
Currently the majority of schemas are defined by creating multiple files: details.json, edition_links.json and/or links.json. These files are JSON schemas which are merged with default schemas.
This is fine for the majority of schemas. However, when we have any unusual circumstances they need to be generated by files that have significant repetition with the base files: 1, 2.
The consequence is that we have different ways to generate schemas, which causes problems such as missing or invalid schemas.
Another problem is that it is very difficult to apply restrictions to any fields in the schema that aren’t details. For example, most schemas allow all document_type values, even though they only need to allow one or a few.
Because the data going into the Publishing API is consistent - restricted by the Publishing API database schema - the ways a schema can vary from the default are quite predictable. Schemas don’t actually need the ability to be completely custom, as is currently the case.
It's also unnecessary to always work in the context of a JSON schema, as the different types of schema we generate can be subtly different. For example, data going into Publishing API may have a forbidden title; in Content Store this would always be a field, but it might have a value of null.
Defining the following fields can cover all the scenarios that currently exist with schemas. All of these fields are optional.
A string or array that defines which document_types can use a schema. If left null, all document types are allowed.
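As a rough illustration of the document type restriction described above, the following Python sketch turns a string, array or null value into a JSON Schema fragment. The function name `document_type_schema` and the `all_document_types` argument are hypothetical and are not taken from the implementation in the PR.

```python
def document_type_schema(document_type, all_document_types):
    """Build a JSON Schema fragment restricting the document_type field.

    document_type may be None (allow everything), a single string,
    or a list of strings. This is an assumed interpretation of the
    field described in the RFC, not the actual implementation.
    """
    if document_type is None:
        allowed = list(all_document_types)
    elif isinstance(document_type, str):
        allowed = [document_type]
    else:
        allowed = list(document_type)
    return {"type": "string", "enum": allowed}


# Hypothetical usage: a schema that only accepts one document type.
fragment = document_type_schema(
    "world_location", ["world_location", "news_article"]
)
```

With a null value the fragment falls back to the full list, which matches the current behaviour where most schemas allow every document_type.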
String of “optional”|”required”|”forbidden”. Default “required”.
String of “optional”|”required”|”forbidden”. Default “required”.
String of “optional”|”required”|”forbidden”. Default “required”.
String of “optional”|”required”|”forbidden”. Default “required”.
String of “optional”|”required”|”forbidden”. Default “required”.
String of “optional”|”required”|”forbidden”. Default “required”.
String of “optional”|”required”|”forbidden”. Default “required”.
String of “optional”|”required”|”forbidden”. Default “required”.
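To make the three states concrete, here is a minimal Python sketch of how an “optional”, “required” or “forbidden” field might be translated into different schema types. The function name, the `target` values and the exact handling of each state are assumptions for illustration; the only behaviour taken from this RFC is that a field forbidden on the way into Publishing API is still present in Content Store with a value of null.

```python
VALID_STATUSES = {"optional", "required", "forbidden"}


def apply_field_status(name, status, field_schema, target="publisher"):
    """Return the 'properties' and 'required' entries for one field.

    target is a hypothetical switch between a schema for data going
    into Publishing API ("publisher") and one for Content Store
    ("content_store").
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")

    properties, required = {}, []
    if target == "publisher":
        # Assumption: forbidden fields are simply left out of the schema
        # and rejected by additionalProperties: false elsewhere.
        if status != "forbidden":
            properties[name] = field_schema
        if status == "required":
            required.append(name)
    else:
        # Assumption: Content Store always has the field, possibly null.
        required.append(name)
        if status == "required":
            properties[name] = field_schema
        elif status == "optional":
            properties[name] = {"anyOf": [field_schema, {"type": "null"}]}
        else:
            properties[name] = {"type": "null"}
    return {"properties": properties, "required": required}


# Hypothetical usage: a format that forbids a title.
publisher = apply_field_status("title", "forbidden", {"type": "string"})
content_store = apply_field_status(
    "title", "forbidden", {"type": "string"}, target="content_store"
)
```

The point of the sketch is that a single status value can drive several generated schemas, which is hard to do when the definition itself is a JSON schema file.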
An object that defines the JSON schema rules of the schema. This would mostly be used for defining details but could be used to refine other fields (eg a specific base_path format) and can be used for sub-definitions in details. These are JSON schema objects.
An object that defines the allowed links. We don’t need as much information as we currently have in edition_links.json, as links can only be of type “guid_list”.
The suggested format is to provide an object whose keys are link types and whose values are description strings. To define required, maxItems or minItems, the string value can be replaced with an object:
links:
  example_1: "Description"
  example_2:
    description: "Another description"
    required: true
An example of this link definition in the final proposal is base_links.
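The shorthand above could be expanded into a JSON schema along the following lines. This is a sketch only: the `expand_links` name is hypothetical, and the shape of the generated “guid_list” (an array of unique items referencing `#/definitions/guid`) is an assumption rather than something confirmed by this RFC.

```python
def expand_links(links):
    """Expand the shorthand link definition into a JSON Schema object.

    Each value is either a description string or an object that may
    carry description, required, minItems and maxItems.
    """
    properties, required = {}, []
    for link_type, value in links.items():
        options = {"description": value} if isinstance(value, str) else dict(value)

        # Assumed representation of a guid_list.
        prop = {
            "type": "array",
            "items": {"$ref": "#/definitions/guid"},
            "uniqueItems": True,
        }
        if "description" in options:
            prop["description"] = options["description"]
        for key in ("minItems", "maxItems"):
            if key in options:
                prop[key] = options[key]
        if options.get("required"):
            required.append(link_type)
        properties[link_type] = prop

    schema = {
        "type": "object",
        "additionalProperties": False,
        "properties": properties,
    }
    if required:
        schema["required"] = sorted(required)
    return schema


# The example definition from this section, written as a Python dict.
links_schema = expand_links({
    "example_1": "Description",
    "example_2": {"description": "Another description", "required": True},
})
```

Because every link is a guid_list, the author only has to supply what actually differs between links: the description and, occasionally, a constraint.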
I anticipate the most eyebrow-raising aspect of this proposal is the use of Jsonnet to define the schema, as it is not a particularly well-known format, nor is it currently used in GOV.UK.
Jsonnet is a Google project (created as a 20% project) and a superset of JSON. It offers the following additions to JSON that are interesting for schemas:
- simpler, less restrictive syntax than JSON
- ability to import files and merge contents
- ability to add comments
It is notably used to work with Kubernetes in the form of Ksonnet.
It can be used in Ruby projects through the ruby-jsonnet gem.
The most appealing feature of Jsonnet is the ability to import. This is particularly useful with schemas as there is frequently repetition. This is not a feature that is available (as far as I’m aware) in more common configuration languages. It gives us a solution to the challenge we currently have, where most schemas share information but a few are exceptions.
The other appealing aspect is that, because Jsonnet is a superset of JSON, examples of JSON schema can be pasted in and will work as they are, with no need to convert them to a different syntax.
- We can import default links into each schema that wants to use them, rather than needing to find complicated ways to rule defaults out: example
- We can build up base schemas from common parts: example
- We can show the initial file that a schema extends, so there is a clear place to look for the configuration defaults, rather than having to find them in the code: example
In a number of areas some tough compromises have had to be made. The two main ones have been mixing JSON Schema into a non-JSON Schema definition file and the use of Jsonnet where YAML/JSON may have been less controversial.
Despite these two factors there are a number of other things this proposal helps with:
- A format defined in a single file is easier to navigate and likely easier to understand
- By not relying on the schema definition being a JSON Schema file we have the scope to add further non schema information (such as expanded links)
- It'll be significantly easier to add any new schema types that might be needed (eg one for data going into Content Store)
- By building schemas based off a configuration rather than merging base ones we can create more accurate schemas with less unnecessary data (220,000 LOC removed)