diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 0380e7512..28db7f41e 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -7,4 +7,4 @@ Requests: https://docs.dasch.swiss/developers/dsp/contribution/#pull-request-gui
 
 ===REMOVE===
 
-resolves DSP-
+resolves DEV-
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 2d09d7f91..0da712bbd 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -4,6 +4,8 @@ on:
   push:
   pull_request:
     types: [opened]
+  schedule:
+    - cron: '0 8 * * *'
 
 jobs:
   test-integration:
diff --git a/docs/dsp-tools-create.md b/docs/dsp-tools-create.md
index fc3a7a783..2731002b2 100644
--- a/docs/dsp-tools-create.md
+++ b/docs/dsp-tools-create.md
@@ -2,24 +2,16 @@
 
 # JSON data model definition format
 
-## Introduction
-
-This document contains all the information you need to create a data model that can be used by DSP. According to
-Wikipedia, the [data model](https://en.wikipedia.org/wiki/Data_model) is "_an abstract model that organizes elements of
-data and standardizes how they relate to one another and to the properties of real-world entities._" Further it
-states: "_A data model explicitly determines the structure of data. Data models are typically specified by a data
-specialist, data librarian, or a digital humanities scholar in a data modeling notation_".
-
-In this section, we will describe one of the notations that is used by dsp-tools to create a data model in the DSP
-repository. The DSP repository is loosely based on [Linked Data](https://en.wikipedia.org/wiki/Linked_data) where also
-the term _ontology_ is used.
-
-In the first section you find a rough overview of the data model definition, all the necessary components with a
-definition and a short example of the definition.
+This document describes the structure of a data model (ontology) used by DSP.
According to Wikipedia,
+the [data model](https://en.wikipedia.org/wiki/Data_model) is "an abstract model that organizes elements of data and
+standardizes how they relate to one another and to the properties of real-world entities. [...] A data model explicitly
+determines the structure of data. Data models are typically specified by a data specialist, data librarian, or a digital
+humanities scholar in a data modeling notation". The following sections describe the notation for ontologies in the
+context of DSP.
 
 ## A short overview
 
-A complete data model definition looks like this:
+A complete data model definition for DSP looks like this:
 
 ```json
 {
@@ -82,7 +74,8 @@ The `$schema` object refers to the JSON schema for DSP data model definitions an
 
 `"project": {"key": "", ...}`
 
-The `project` object contains all resources and properties of the ontology. It requires all the following data fields:
+The `project` object contains all resources and properties of the ontology as well as some information about the
+project. It requires all the following data fields:
 
 - shortcode
 - shortname
@@ -97,7 +90,7 @@ The following fields are optional (if one or more of these fields are not used,
 - groups
 - users
 
-A simple example definition of the "project" object looks like this:
+A simple example definition of the `project` object looks like this:
 
 ```json
 {
@@ -129,175 +122,69 @@ A simple example definition of the "project" object looks like this:
 }
 ```
 
-## Simple key/value pairs
+## "project" object in detail
 
-At that point we will go through all of this step by step and take a more in depth view on the individual fields of the
-"project" object. The first four fields of the "project" object are "key"/"value" pairs. Therefore, they are quite
-simple.
+In the following section all fields of the `project` object are explained in detail.
 
 ### Shortcode
 
 `"shortcode": "<4-hex-characters>"`
 
-It's a hexadecimal string in the range between "0000" and "FFFF" that's used to uniquely identify the project. The
-shortcode has to be provided by the DaSCH.
+The shortcode has to be unique and is represented by a 4 digit hexadecimal string. The shortcode has to be provided by the DaSCH.
 
 ### Shortname
 
 `"shortname": ""`
 
-This is a short name (string) for the project. It's meant to be like a nickname. If the name of the project is e.g.
-"Albus Percival Wulfric Dumbledore", then the shortname for it could be "Albi". It should be in the form of a
-[xsd:NCNAME](https://www.w3.org/TR/xmlschema11-2/#NCName), that is a name without blanks and special characters like
-`:`, `;`, `&`, `%` etc., but `-` and `_` are allowed.
+The shortname has to be unique. It should be in the form of a [xsd:NCNAME](https://www.w3.org/TR/xmlschema11-2/#NCName). This means a
+string without blanks or special characters but `-` and `_` are allowed (although not as first character).
 
 ### Longname
 
 `"longname": ""`
 
-A longer string that provides the full name of the project. In our example, the longname would be "Albus Percival
-Wulfric Dumbledore".
+The longname is a string that provides the full name of the project.
 
 ### Descriptions
 
 `"descriptions": {"": "", ...}`
 
-The descriptions specify the content of the project in *exactly* one or more strings. These descriptions can be supplied
-in several languages (currently _"en"_, _"de"_, _"fr"_ and _"it"_ are supported). The descriptions have to be given as a
-JSON object with the language as "key", and the description as "value". See the example above inside the curly brackets
-after "descriptions"to see what that means.
-
-## Key/object pairs
-
-The following fields are **not** simple "key"/"value" pairs. They do have a key, the value however is another object and
-therefore has an internal structure. Due to the increased complexity of these objects, they are looked at in more
-detail.
+The description is represented as a collection of strings with language tags (currently "en", "de", "fr" and "it" are
+supported). It is the description of the project.
 
 ### Keywords
 
 `"keywords": ["", "", ...]`
 
-An array of keywords is used to roughly describe the project in single words. A project that deals e.g. with old
-monastery manuscripts could possess the keywords "monastery", "manuscripts", "medieval", (...). The array can be empty
-as well e.i. "
-keywords": [].
+Keywords are represented as an array of strings and are used to describe and/or tag the project.
 
 ### Lists
 
 `"lists": [,,...]`
 
-Often in order to characterize or classify a real world object, we use a sequential or hierarchical list of terms. For
-example a classification of disciplines in the Humanities might look like follows:
-
-- Performing arts
-    - Music
-        - Chamber music
-        - Church music
-        - Conducting
-            - Choirs
-            - Orchestras
-        - Music history
-        - Music theory
-        - Musicology
-        - Jazz
-        - Pop/Rock
-    - Dance
-        - Choreography
-    - Theatre
-        - Acting
-        - Directing
-        - Playwriting
-        - Scenography
-    - Movies/Television
-        - Animation
-        - Live action
-- Visual arts
-    - Fine arts
-        - Drawing
-        - Painting
-        - Photography
-    - Applied Arts
-        - Animation
-        - Architecture
-        - Decorative arts
-- History
-    - Ancient history
-    - Modern history
-- Languages and literature
-    - Linguistics
-        - Grammar
-        - Etymology
-        - Phonetics
-        - Semantics
-    - Literature
-        - Fiction
-        - Non-fiction
-        - Theory of literature
-- Philosophy
-    - Aesthetics
-    - Applied philosophy
-    - Epistemology
-        - Justification
-        - Reasoning
-    - Metaphysics
-        - Determinism and free will
-        - Ontology
-        - Philosophy of mind
-        - Teleology
-
-DSP allows to define such controlled vocabularies or thesauri. They can be arranged "flat" or in "hierarchies" (as the
-given example about the disciplines in Humanities is). The definition of these entities are called "lists" in the DSP.
-Thus, the list object is used to give the resources of the ontology a taxonomic quality. A taxonomy makes it possible to
-categorize a resource. The big advantage of a taxonomic structure as it is implemented by the DSP is that the user can
-sub-categorize the objects. This allows the user to formulate his search requests more or less specifically as desired.
-Thus, in the example above a search for "
-Vocal music" would result in all works that are characterized by a sub-element of "Vocal music". However, a search for "
-Masses"
-would return only works that have been characterized as such. The number of hierarchy levels is not limited, but for
-practical reasons it should not exceed 3-4 levels.
-
-Thus, a taxonomy is a hierarchical list of categories in a tree-like structure. The taxonomy must be complete. This
-means that the entire set of resources must be mappable to the sub-categorization of the taxonomy. To come back to the
-previous example: It must not occur that a musical work within our resource set cannot be mapped to a subcategory of our
-taxonomy about classical music. The taxonomic-hierarchical structure is mapped using JSON. This is because JSON
-inherently implements a tree structure as well. The root of the taxonomy tree is always the name of the taxonomy. The
-root always stands alone at the top of the tree. It is followed by any number of levels, on which any number of
-subcategories can be placed.
-
-Suppose you want to build a taxonomy of the classical musical genres as above. The root level would be the name of the
-taxonomy e.g. "classicalmusicgenres". The next level on the hierarchy would be the basic genres, in our example
-"Orchestral music", "Chamber music", "Solo instrumental", "Vocal Music" and "Opera". Each if these categories may have
-subcategories. In our example "Opera" would have the subcategories "Comic opera", "Serious Opera",
-"Opera Semiseria", "Opera Cornique", "Grand opera" and "Opera verismo".
Each of these could again have subcategories,
-and so forth.
-
-It is important to note that a flat taxonomy is also allowed. This means that a taxonomy from exactly two levels is
-allowed. We have a root level, with the name of the taxonomy, followed by a single level. Within this second level, any
-number of categories can coexist equally, but since they are on the same level, they are not hierarchically dependent on
-each other. For example, you could define a taxonomy "soccer clubs", which have the categories "FCB",
-"FCZ", (...) in the second level. FC Basel has no hierarchical connection to FC Zürich. Their taxonomic structure is
-therefore flat.
-
-A resource can be assigned to a taxonomic node within its properties. So a resource of type "musical work" with the
-title "La Traviata" would have the property/attribute "musical-genre" with the value "Grand opera". Within the DSP, each
-property or attribute has an assigned cardinality. Sometimes, a taxonomy allows that an object may belong to different
-categories at the same time (e.g. an image which depicts several categories at the same time). In these cases, a
-cardinality greater than 1 allows adding multiple attributes of the same time. See further below the description of the
-[cardinalities](#cardinalities).
-
-A node of the Taxonomy may have the following elements:
-
-- _name_: Name of the node. This should be unique within the given list. The name-element is optional but highly
-  recommended.
-- _labels_: Language dependent labels in the form `{ "": "