diff --git a/MANIFEST.in b/MANIFEST.in index 84e773f0a..4088c3b25 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,4 +1,6 @@ include README.md include knora/dsplib/utils/knora-schema.json include knora/dsplib/utils/knora-schema-lists.json +include knora/dsplib/utils/knora-schema-lists-only.json include knora/dsplib/utils/knora-data-schema.xsd +include knora/dsplib/utils/language-codes-3b2_csv.csv diff --git a/Makefile b/Makefile index 12189a966..45d5b7489 100644 --- a/Makefile +++ b/Makefile @@ -73,7 +73,7 @@ help: ## this help @awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' $(MAKEFILE_LIST) | sort .PHONY: run -run: ## create dist, inatall and run +run: ## create dist, install and run $(MAKE) clean $(MAKE) dist $(MAKE) install diff --git a/docs/dsp-tools-create.md b/docs/dsp-tools-create.md index e754224cd..22e904525 100644 --- a/docs/dsp-tools-create.md +++ b/docs/dsp-tools-create.md @@ -3,23 +3,25 @@ # JSON data model definition format ## Introduction -This document contains all the information you need to create a data model that can be used by DSP. According to -Wikipedia, the [data model](https://en.wikipedia.org/wiki/Data_model) is "_an abstract model that organizes elements -of data and standardizes how they relate to one another and to the properties of real-world entities._" Further it -states: "_A data model explicitly determines the structure of data. Data models are typically specified by a data -specialist, data librarian, or a digital humanities scholar in a data modeling notation_". -In this section, we will describe one of the notations that is used by dsp-tools to create a data model in the DSP -repository. The DSP repository is loosely based on [Linked Data](https://en.wikipedia.org/wiki/Linked_data) where also -the term _ontology_ is used. +This document contains all the information you need to create a data model that can be used by DSP. According to Wikipedia, +the [data model](https://en.wikipedia.org/wiki/Data_model) is "_an abstract model that organizes elements of data and standardizes +how they relate to one another and to the properties of real-world entities._" Further it states: "_A data model explicitly +determines the structure of data. Data models are typically specified by a data specialist, data librarian, or a digital +humanities scholar in a data modeling notation_". -In the first section you find a rough overview of the data model definition, all the necessary components with a -definition and a short example of the definition. +In this section, we will describe one of the notations that is used by dsp-tools to create a data model in the DSP repository. The +DSP repository is loosely based on [Linked Data](https://en.wikipedia.org/wiki/Linked_data) where also the term _ontology_ is +used. + +In the first section you find a rough overview of the data model definition, all the necessary components with a definition and a +short example of the definition. ## A short overview -In the following section, you find all the mentioned parts with a detailed explanation. Right at the beginning we look -at the basic fields that belong to an ontology definition. This serves as an overview for you to which you can return -at any time while you read the description. + +In the following section, you find all the mentioned parts with a detailed explanation. Right at the beginning we look at the +basic fields that belong to an ontology definition. This serves as an overview for you to which you can return at any time while +you read the description. A complete data model definition looks like this: @@ -33,29 +35,43 @@ A complete data model definition looks like this: "shortcode": "0123", "shortname": "BiZ", "longname": "Bildung in Zahlen", - "descriptions": {...}, - "keywords": [...], - "lists": [...], - "groups": [...], - "users": [...], - "ontologies": [...] + "descriptions": { + ... + }, + "keywords": [ + ... + ], + "lists": [ + ... + ], + "groups": [ + ... + ], + "users": [ + ... + ], + "ontologies": [ + ... + ] } } ``` -As you can see, only two umbrella terms define our ontology: the "prefixes" object and the "project" object. In the -following we take a deeper look into both of them since, as you can see in the example above, both objects have further -fine-grained definition levels. + +As you can see, only two umbrella terms define our ontology: the "prefixes" object and the "project" object. In the following we +take a deeper look into both of them since, as you can see in the example above, both objects have further fine-grained definition +levels. ### "Prefixes" object -`"prefixes": { "prefix": "", ...}` -The "prefixes" object contains - as you may already have guessed by the name - the `prefixes` of *external* ontologies -that are also used in the current project. All prefixes are composed of a keyword, followed by its iri. This is used as -a shortcut for later so that you don't always have to specify the full qualified iri but can use the much shorter -keyword instead. That means that e.g. instead of addressing a property called "familyname" via -`http://xmlns.com/foaf/0.1/familyName` you can simply use foaf:familyName. +`"prefixes": { "prefix": "", ...}` -As you can see in the example below, you can have more than one prefix too. In the example we have "foaf" as well as +The "prefixes" object contains - as you may already have guessed by the name - the `prefixes` of *external* ontologies that are +also used in the current project. All prefixes are composed of a keyword, followed by its iri. This is used as a shortcut for +later so that you don't always have to specify the full qualified iri but can use the much shorter keyword instead. That means +that e.g. instead of addressing a property called "familyname" via +`http://xmlns.com/foaf/0.1/familyName` you can simply use foaf:familyName. + +As you can see in the example below, you can have more than one prefix too. In the example we have "foaf" as well as "dcterms" as our prefixes. ```json @@ -68,13 +84,13 @@ As you can see in the example below, you can have more than one prefix too. In t ``` ### "Project" object -`"project": {"key": "", ...}` -Right after the "prefix" object the "project" object has to follow, which contains all resources and properties of the -ontology. The "project" object is the bread and butter of the ontology. All its important properties are specified therein. +`"project": {"key": "", ...}` + +Right after the "prefix" object the "project" object has to follow, which contains all resources and properties of the ontology. +The "project" object is the bread and butter of the ontology. All its important properties are specified therein. -As you saw in the complete ontology definition in the beginning, the project definitions requires all the following -data fields: +As you saw in the complete ontology definition in the beginning, the project definitions requires all the following data fields: - shortcode - shortname @@ -90,182 +106,195 @@ Whereas the following fields are optional (if one or more of these fields are no - users So, a simple example definition of the "project" object could look like this: - + ```json { "project": { - "shortcode": "0809", - "shortname": "test" , - "longname": "Test Example", - "descriptions": { - "en": "This is a simple example project", - "de": "Dies ist ein einfaches Beispielprojekt" - }, - "keywords": ["example", "simple"], - "lists": [...], - "groups": [...], - "users": [...], - "ontology": [...] + "shortcode": "0809", + "shortname": "test", + "longname": "Test Example", + "descriptions": { + "en": "This is a simple example project", + "de": "Dies ist ein einfaches Beispielprojekt" + }, + "keywords": [ + "example", + "simple" + ], + "lists": [ + ... + ], + "groups": [ + ... + ], + "users": [ + ... + ], + "ontology": [ + ... + ] } } ``` ## Simple key/value pairs -At that point we will go through all of this step by step and take a more in depth view on the individual fields of the -"project" object. The first four fields of the "project" object are "key"/"value" pairs. Therefore, they are quite -simple. + +At that point we will go through all of this step by step and take a more in depth view on the individual fields of the +"project" object. The first four fields of the "project" object are "key"/"value" pairs. Therefore, they are quite simple. ### Shortcode + `"shortcode": "<4-hex-characters>"` -It's a hexadecimal string in the range between "0000" and "FFFF" that's used to uniquely identify the project. The -shortcode has to be provided by the DaSCH. +It's a hexadecimal string in the range between "0000" and "FFFF" that's used to uniquely identify the project. The shortcode has +to be provided by the DaSCH. ### Shortname + `"shortname": ""` -This is a short name (string) for the project. It's meant to be like a nickname. If the name of the project is e.g. +This is a short name (string) for the project. It's meant to be like a nickname. If the name of the project is e.g. "Albus Percival Wulfric Dumbledore", then the shortname for it could be "Albi". It should be in the form of a [xsd:NCNAME](https://www.w3.org/TR/xmlschema11-2/#NCName), that is a name without blanks and special characters like `:`, `;`, `&`, `%` etc., but `-` and `_` are allowed. ### Longname -`"longname": ""` -A longer string that provides the full name of the project. In our example, the longname would be "Albus Percival -Wulfric Dumbledore". +`"longname": ""` + +A longer string that provides the full name of the project. In our example, the longname would be "Albus Percival Wulfric +Dumbledore". ### Descriptions -`"descriptions": {"": "", ...}` -The descriptions specify the content of the project in *exactly* one or more strings. These descriptions can be -supplied in several languages (currently _"en"_, _"de"_, _"fr"_ and _"it"_ are supported). The descriptions have to be -given as a JSON object with the language as "key", and the description as "value". See the example above inside the -curly brackets after "descriptions" to see what that means. +`"descriptions": {"": "", ...}` + +The descriptions specify the content of the project in *exactly* one or more strings. These descriptions can be supplied in +several languages (currently _"en"_, _"de"_, _"fr"_ and _"it"_ are supported). The descriptions have to be given as a JSON object +with the language as "key", and the description as "value". See the example above inside the curly brackets after "descriptions"to +see what that means. ## Key/object pairs -The following fields are **not** simple "key"/"value" pairs. They do have a key, the value however is another object -and therefore has an internal structure. Due to the increased complexity of these objects, they are looked at in more detail. + +The following fields are **not** simple "key"/"value" pairs. They do have a key, the value however is another object and therefore +has an internal structure. Due to the increased complexity of these objects, they are looked at in more detail. ### Keywords -`"keywords": ["", "", ...]` -An array of keywords is used to roughly describe the project in single words. A project that deals e.g. with old -monastery manuscripts could possess the keywords "monastery", "manuscripts", "medieval", (...). The array can be empty -as well e.i. "keywords": []. +`"keywords": ["", "", ...]` -### Lists -`"lists": [,,...]` +An array of keywords is used to roughly describe the project in single words. A project that deals e.g. with old monastery +manuscripts could possess the keywords "monastery", "manuscripts", "medieval", (...). The array can be empty as well e.i. " +keywords": []. -Often in order to characterize or classify a real world object, we use a sequential or hierarchical list of terms. For -example a classification of disciplines in the Humanities might look like follows: +### Lists + +`"lists": [,,...]` + +Often in order to characterize or classify a real world object, we use a sequential or hierarchical list of terms. For example a +classification of disciplines in the Humanities might look like follows: - Performing arts - - Music - - Chamber music - - Church music - - Conducting - - Choirs - - Orchestras - - Music history - - Music theory - - Musicology - - Jazz - - Pop/Rock - - Dance - - Choreography - - Theatre - - Acting - - Directing - - Playwriting - - Scenography - - Movies/Television - - Animation - - Live action + - Music + - Chamber music + - Church music + - Conducting + - Choirs + - Orchestras + - Music history + - Music theory + - Musicology + - Jazz + - Pop/Rock + - Dance + - Choreography + - Theatre + - Acting + - Directing + - Playwriting + - Scenography + - Movies/Television + - Animation + - Live action - Visual arts - - Fine arts - - Drawing - - Painting - - Photography - - Applied Arts - - Animation - - Architecture - - Decorative arts + - Fine arts + - Drawing + - Painting + - Photography + - Applied Arts + - Animation + - Architecture + - Decorative arts - History - - Ancient history - - Modern history + - Ancient history + - Modern history - Languages and literature - - Linguistics - - Grammar - - Etymology - - Phonetics - - Semantics - - Literature - - Fiction - - Non-fiction - - Theory of literature + - Linguistics + - Grammar + - Etymology + - Phonetics + - Semantics + - Literature + - Fiction + - Non-fiction + - Theory of literature - Philosophy - - Aesthetics - - Applied philosophy - - Epistemology - - Justification - - Reasoning - - Metaphysics - - Determinism and free will - - Ontology - - Philosophy of mind - - Teleology - -DSP allows to define such controlled vocabularies or thesauri. They can be arranged "flat" or in "hierarchies" (as the -given example about the disciplines in Humanities is). The definition of these entities are called "lists" in the DSP. Thus, the -list object is used to give the resources of the ontology a taxonomic quality. A taxonomy makes it possible to -categorize a resource. The big advantage of a taxonomic structure as it is implemented by the DSP -is that the user can sub-categorize the objects. This allows the user to formulate his search requests more or less -specifically as desired. Thus, in the example above a search for "Vocal music" would result in all works that are -characterized by a sub-element of "Vocal music". However, a search for "Masses" would return only works that -have been characterized as such. The number of hierarchy levels is not limited, but for practical reasons -it should not exceed 3-4 levels. - -Thus, a taxonomy is a hierarchical list of categories in a tree-like structure. The taxonomy must be complete. This means -that the entire set of resources must be mappable to the sub-categorization of the taxonomy. To come back to the previous -example: It must not occur that a musical work within our resource set cannot be mapped to a subcategory of our -taxonomy about classical music. The taxonomic-hierarchical structure is mapped using JSON. This is because JSON -inherently implements a tree structure as well. The root of the taxonomy tree is always the name of the taxonomy. The -root always stands alone at the top of the tree. It is followed by any number of levels, on which any number of -subcategories can be placed. - -Suppose you want to build a taxonomy of the classical musical genres as above. The root level would be the name of the -taxonomy e.g. "classicalmusicgenres". The next level on the hierarchy would be the basic genres, in our example + - Aesthetics + - Applied philosophy + - Epistemology + - Justification + - Reasoning + - Metaphysics + - Determinism and free will + - Ontology + - Philosophy of mind + - Teleology + +DSP allows to define such controlled vocabularies or thesauri. They can be arranged "flat" or in "hierarchies" (as the given +example about the disciplines in Humanities is). The definition of these entities are called "lists" in the DSP. Thus, the list +object is used to give the resources of the ontology a taxonomic quality. A taxonomy makes it possible to categorize a resource. +The big advantage of a taxonomic structure as it is implemented by the DSP is that the user can sub-categorize the objects. This +allows the user to formulate his search requests more or less specifically as desired. Thus, in the example above a search for " +Vocal music" would result in all works that are characterized by a sub-element of "Vocal music". However, a search for "Masses" +would return only works that have been characterized as such. The number of hierarchy levels is not limited, but for practical +reasons it should not exceed 3-4 levels. + +Thus, a taxonomy is a hierarchical list of categories in a tree-like structure. The taxonomy must be complete. This means that the +entire set of resources must be mappable to the sub-categorization of the taxonomy. To come back to the previous example: It must +not occur that a musical work within our resource set cannot be mapped to a subcategory of our taxonomy about classical music. The +taxonomic-hierarchical structure is mapped using JSON. This is because JSON inherently implements a tree structure as well. The +root of the taxonomy tree is always the name of the taxonomy. The root always stands alone at the top of the tree. It is followed +by any number of levels, on which any number of subcategories can be placed. + +Suppose you want to build a taxonomy of the classical musical genres as above. The root level would be the name of the taxonomy +e.g. "classicalmusicgenres". The next level on the hierarchy would be the basic genres, in our example "Orchestral music", "Chamber music", "Solo instrumental", "Vocal Music" and "Opera". Each if these categories may have subcategories. In our example "Opera" would have the subcategories "Comic opera", "Serious Opera", -"Opera Semiseria", "Opera Cornique", "Grand opera" and "Opera verismo". Each of these could again have -subcategories, and so forth. - -It is important to note that a flat taxonomy is also allowed. This means that a taxonomy from exactly two levels is -allowed. We have a root level, with the name of the taxonomy, followed by a single level. Within this second level, -any number of categories can coexist equally, but since they are on the same level, they are not hierarchically -dependent on each other. For example, you could define a taxonomy "soccer clubs", which have the categories "FCB", -"FCZ", (...) in the second level. FC Basel has no hierarchical connection to FC Zürich. Their taxonomic structure is -therefore flat. - -A resource can be assigned to a taxonomic node within its properties. So a resource of type "musical work" with the -title "La Traviata" would have the property/attribute "musical-genre" with the value "Grand opera". Within the DSP, -each property or attribute has an assigned cardinality. Sometimes, a taxonomy allows that an object may belong to -different categories at the same time (e.g. an image which depicts several categories at the same time). In these cases, -a cardinality greater than 1 allows adding multiple attributes of the same time. See further below the description of the +"Opera Semiseria", "Opera Cornique", "Grand opera" and "Opera verismo". Each of these could again have subcategories, and so +forth. + +It is important to note that a flat taxonomy is also allowed. This means that a taxonomy from exactly two levels is allowed. We +have a root level, with the name of the taxonomy, followed by a single level. Within this second level, any number of categories +can coexist equally, but since they are on the same level, they are not hierarchically dependent on each other. For example, you +could define a taxonomy "soccer clubs", which have the categories "FCB", +"FCZ", (...) in the second level. FC Basel has no hierarchical connection to FC Zürich. Their taxonomic structure is therefore +flat. + +A resource can be assigned to a taxonomic node within its properties. So a resource of type "musical work" with the title "La +Traviata" would have the property/attribute "musical-genre" with the value "Grand opera". Within the DSP, each property or +attribute has an assigned cardinality. Sometimes, a taxonomy allows that an object may belong to different categories at the same +time (e.g. an image which depicts several categories at the same time). In these cases, a cardinality greater than 1 allows adding +multiple attributes of the same time. See further below the description of the [cardinalities](#cardinalities). A node of the Taxonomy may have the following elements: -- _name_: Name of the node. This should be unique within the given list. The name-element is optional but highly - recommended. -- _labels_: Language dependent labels in the form `{ "": "