This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

MSCWG v2 plans

Alex Ball edited this page Mar 28, 2019 · 1 revision

Introduction

Here are some thoughts on how the workings of the Catalog might be made more elegant. This is subject to further refinement, and use of future tense should not be interpreted as a firm intention.

Data model

Great pains are currently taken to make the internal data structures as close as possible to the JSON API input and output. Perhaps the code could be simplified if the internal structures were handled differently.

Internally, the database will have tables for

  • Metadata scheme
  • Metadata scheme version
  • Tool
  • Tool version
  • Mapping
  • Mapping version
  • Organization
  • Endorsement

These will be addressed both publicly and internally by their internal identifiers, e.g. msc:m1. Note that the current method of addressing versions (e.g. msc:m1#v3.2) is semantically pleasing but breaks if someone updates the version number; a more robust solution would be to use an opaque serial number (e.g. msc:m1#s1, with s standing for sub-record).
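
As a rough illustration of the serial-number idea, the sketch below parses and re-serializes identifiers of the form msc:m1 and msc:m1#s1. The names RecordId and parse_record_id are hypothetical, and the assumption that every table gets a one-letter prefix is extrapolated from the single m example above; none of this is settled design.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: "msc:" + one-letter table prefix + record number,
# optionally followed by "#s" + an opaque sub-record serial number.
_ID_PATTERN = re.compile(r"msc:([a-z])(\d+)(?:#s(\d+))?")


class RecordId(NamedTuple):
    """Hypothetical parsed form of a Catalog identifier."""

    prefix: str            # e.g. "m" for a metadata scheme
    number: int            # record number within its table
    serial: Optional[int]  # sub-record serial, or None for the main record

    def __str__(self) -> str:
        base = f"msc:{self.prefix}{self.number}"
        return base if self.serial is None else f"{base}#s{self.serial}"


def parse_record_id(text: str) -> RecordId:
    """Parse an identifier, rejecting anything that is not in the assumed form."""
    match = _ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"Not a recognised identifier: {text!r}")
    prefix, number, serial = match.groups()
    return RecordId(prefix, int(number), None if serial is None else int(serial))
```

Because the serial carries no meaning, editing a version number (say from 3.2 to 3.2.1) would leave msc:m1#s1 untouched. An old-style identifier such as msc:m1#v3.2 is rejected by this parser, so a migration step would be needed for existing links.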

The UNESCO keywords are a read-only vocabulary. These will be addressed publicly and internally by their URLs instead of their English strings, since several terms share the same English string but do not necessarily share a string in other languages.
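
To show why keying on the URL is safer than keying on the English string, here is a small sketch. The two entries, their example.org URLs and their labels are invented for illustration and are not taken from the UNESCO vocabulary; find_by_label and label_for are hypothetical helper names.

```python
# Invented entries: two distinct terms that happen to share an English label.
KEYWORDS = {
    "https://example.org/keyword/1": {"en": "Plants", "fr": "Plantes"},
    "https://example.org/keyword/2": {"en": "Plants", "fr": "Usines"},
}


def find_by_label(label, lang="en"):
    """Return every term URL whose label in the given language matches."""
    return sorted(
        url for url, labels in KEYWORDS.items() if labels.get(lang) == label
    )


def label_for(url, lang="en"):
    """Return the display label for a term URL in the given language."""
    return KEYWORDS[url][lang]
```

Looking up "Plants" in English returns both URLs, so the string alone cannot identify a term, whereas each URL resolves to exactly one label in every language.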

The following will be internally maintained (editable) vocabularies, used to generate authority lists:

  • Data type (URI and label)
  • Programming languages
  • Computing platforms
  • Metadata scheme specification languages (e.g. DTD, RDF Schema)

It will help data quality if fields use a strict authority list, rather than the current process of free text entry. There will be a mechanism to add new terms to the list if required.

Data types are awkward in that URIs would be the best way to address them, but they are not always present. If internal URIs are generated for those without, what happens if someone wants to add an official URI to a data type later? Perhaps we enforce that URIs are immutable and a new term would have to be created.
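
One way the immutable-URI option might look is sketched below. The Vocabulary class, its method names and the example.org base URI are all assumptions made for illustration. Labels can be edited, but a URI never changes once assigned, so adopting an official URI later means adding a new term.

```python
class Vocabulary:
    """Hypothetical editable authority list keyed by immutable URIs."""

    def __init__(self, base_uri):
        self._base_uri = base_uri  # assumed prefix for internally minted URIs
        self._labels = {}          # URI -> current label
        self._next_local = 1

    def add_term(self, label, uri=None):
        """Add a term; mint an internal URI if no official one is supplied."""
        if uri is None:
            # Skip over any internal URIs that are already taken.
            while f"{self._base_uri}{self._next_local}" in self._labels:
                self._next_local += 1
            uri = f"{self._base_uri}{self._next_local}"
        if uri in self._labels:
            raise ValueError(f"Term already exists: {uri}")
        self._labels[uri] = label
        return uri

    def relabel(self, uri, label):
        """Change a term's label; the URI itself is never altered."""
        if uri not in self._labels:
            raise KeyError(uri)
        self._labels[uri] = label

    def label_of(self, uri):
        """Return the label for a URI, raising KeyError for unlisted terms."""
        return self._labels[uri]

    def __contains__(self, uri):
        return uri in self._labels
```

A field validator could then accept a value only if it is in the vocabulary, which gives the strict authority list behaviour described above. How records that point at a superseded internal URI should be migrated is left open here.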

These relations are currently supported:

  • scheme – parent scheme – scheme
  • tool – supported scheme – scheme
  • mapping – input scheme – scheme
  • mapping – output scheme – scheme
  • endorsement – endorsed scheme – scheme
  • scheme, tool, mapping – maintainer – organization
  • scheme, tool, mapping – funder – organization
  • scheme – user – organization
  • endorsement – originator – organization

With the exception of endorsed scheme, these are currently one-way (stored at only one end). These will instead be moved out into their own table(s) internally. Publicly, they would be visible (and editable) at both ends of the relationship.
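
A minimal sketch of the separate relation table, using Python's built-in sqlite3 module, might look like this. The table and column names, the helper functions, and the msc:t1 identifier for a tool are assumptions; only msc:m1 and the relation names come from the text above.

```python
import sqlite3


def create_relation_table(conn):
    """Create a single table holding every relation as one row."""
    conn.execute(
        """
        CREATE TABLE relation (
            subject   TEXT NOT NULL,  -- record the relation starts from
            predicate TEXT NOT NULL,  -- e.g. 'supported scheme', 'maintainer'
            object    TEXT NOT NULL,  -- record the relation points to
            PRIMARY KEY (subject, predicate, object)
        )
        """
    )


def add_relation(conn, subject, predicate, obj):
    """Store a relation once; it becomes visible from both ends."""
    conn.execute(
        "INSERT INTO relation (subject, predicate, object) VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )


def relations_for(conn, record_id):
    """Return relations where the record is the subject and where it is the object."""
    forward = conn.execute(
        "SELECT predicate, object FROM relation "
        "WHERE subject = ? ORDER BY predicate, object",
        (record_id,),
    ).fetchall()
    inverse = conn.execute(
        "SELECT predicate, subject FROM relation "
        "WHERE object = ? ORDER BY predicate, subject",
        (record_id,),
    ).fetchall()
    return {"forward": forward, "inverse": inverse}
```

For example, after add_relation(conn, "msc:t1", "supported scheme", "msc:m1"), calling relations_for(conn, "msc:m1") lists the tool under "inverse" even though nothing was stored on the scheme record. Editing from either end would then amount to inserting or deleting the same single row.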
