diff --git a/README.md b/README.md
index 78e5a05a4..c805dc805 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,13 @@
 [![PyPI version](https://badge.fury.io/py/dsp-tools.svg)](https://badge.fury.io/py/dsp-tools)

 # DSP-TOOLS - DaSCH Service Platform Tools
+
 dsp-tools is a command line tool that helps you interacting with the DaSCH Service Platform API.

 Go to [Full Documentation](https://docs.dasch.swiss/latest/DSP-TOOLS)
+
 ## Information for developers
+
 There is a `Makefile` for all the following tasks (and more). Type `make` to print the available targets.
 For a quick start, use:
@@ -22,7 +25,9 @@ make install-requirements
 make install
 ```
+
 ## Pipenv
+
 We use pipenv for our dependency management. There are two ways to get started:
 - `pipenv install --dev` installs all dependencies, while giving them the opportunity to update themselves
 - `pipenv install --ignore-pipfile` is used to get a deterministic build in production
@@ -54,7 +59,9 @@ For security reasons, the maintainer regularly executes
 without pipenv, you can freeze your requirements with `pip3 freeze > requirements.txt` and update `setup.py`
 manually.
+
 ### Pipenv setup in PyCharm
+
 - Go to Add Interpreter > Pipenv Environment
 - Base Interpreter: PyCarm auto-detects one of your system-wide installed Pythons as base interpreter.
 - Pipenv executable: auto-detected
@@ -63,7 +70,9 @@ manually.
 If you already initialized a pipenv-environment via command line, you can add its interpreter in PyCharm, but this will create the pipenv-environment again.
+
 ## Testing
+
 Please note that testing requires launching the complete DSP API stack which is based on docker images. Therefore, we recommend installing the [docker desktop client](https://www.docker.com/products).
 To run the complete test suite:
@@ -71,7 +80,9 @@ To run the complete test suite:
 make test
 ```
+
 ## Code style
+
 When contributing to the project please make sure you use the same code style rules as we do.
 We use [autopep8](https://pypi.org/project/autopep8/) and [mypy](https://pypi.org/project/mypy/). The configuration is defined in `pyproject.toml` in the root directory of the project.
@@ -89,7 +100,9 @@ In VSCode, both mypy and autopep8 can be set up as default linter and formatter
 For formatting Markdown files (*.md) we use the default styling configuration provided by PyCharm.
+
 ## Publishing
+
 Publishing is automated with GitHub Actions and should _not_ be done manually. Please follow the [Pull Request Guidelines](https://docs.dasch.swiss/latest/developers/dsp/contribution/#pull-request-guidelines). If done correctly, when merging a pull request into `main`, the `release-please` action will create or update a pull request for
@@ -99,7 +112,9 @@ create a release on GitHub, on PyPI and the docs.
 Please ensure you have only one pull request per feature.
+
 ## Publishing manually
+
 Publishing is automated with GitHub Actions and should _not_ be done manually. If you still need to do it, follow the steps below.
@@ -129,7 +144,9 @@ For local development:
 python3 setup.py develop
 ```
+
 ## Contributing to the documentation
+
 The documentation is a collection of [markdown](https://en.wikipedia.org/wiki/Markdown) files in the `docs` folder. After updates of the files, build and check the result with the following command:
diff --git a/docs/dsp-tools-create-ontologies.md b/docs/dsp-tools-create-ontologies.md
index ad3675db9..acca61cb9 100644
--- a/docs/dsp-tools-create-ontologies.md
+++ b/docs/dsp-tools-create-ontologies.md
@@ -24,7 +24,7 @@ resource or not. The cardinality definitions are explained [further below](#card
 Example of an `ontologies` object:
-```json
+```
 {
   "ontologies": [
     {
@@ -653,6 +653,7 @@ Example:
 ### Link-properties
+
 Link properties do not follow the pattern of the previous data types, because they do not connect to a final value but to an existing resource. Thus, the `object` denominates the resource class the link will point to.
@@ -752,7 +753,7 @@ directly as cardinalities in a resource. The example belows shows both possibili
 Example:
-```json
+```
 "properties": [
   {
     "name": "partOfBook",
@@ -845,7 +846,7 @@ they can be used directly as cardinalities in a resource. The example below show
 Example:
-```json
+```
 "properties": [
   {
     "name": "sequenceOfAudio",
@@ -1096,6 +1097,7 @@ it is necessary to reference entities that are defined elsewhere. The following
 ## DSP base resources / base properties to be used directly in the XML file
+
 There is a number of DSP base resources that must not be subclassed in a project ontology. They are directly available in the XML data file:
diff --git a/docs/dsp-tools-create.md b/docs/dsp-tools-create.md
index a82ebf8f3..656d102b9 100644
--- a/docs/dsp-tools-create.md
+++ b/docs/dsp-tools-create.md
@@ -20,7 +20,7 @@ This documentation is divided into two parts:
 A complete project definition looks like this:
-```json
+```
 {
   "prefixes": {
     "foaf": "http://xmlns.com/foaf/0.1/",
@@ -32,10 +32,12 @@ A complete project definition looks like this:
     "shortname": "BiZ",
     "longname": "Bildung in Zahlen",
     "descriptions": {
-      ...
+      "en": "This is a simple example project",
+      "de": "Dies ist ein einfaches Beispielprojekt"
     },
     "keywords": [
-      ...
+      "example",
+      "simple"
     ],
     "lists": [
       ...
@@ -113,38 +115,6 @@ The following fields are optional (if one or more of these fields are not used,
 - groups
 - users

-A simple example definition of the `project` object looks like this:
-
-```json
-{
-  "project": {
-    "shortcode": "0809",
-    "shortname": "test",
-    "longname": "Test Example",
-    "descriptions": {
-      "en": "This is a simple example project",
-      "de": "Dies ist ein einfaches Beispielprojekt"
-    },
-    "keywords": [
-      "example",
-      "simple"
-    ],
-    "lists": [
-      ...
-    ],
-    "groups": [
-      ...
-    ],
-    "users": [
-      ...
-    ],
-    "ontologies": [
-      ...
-    ]
-  }
-}
-```
-
 ## "project" object in detail
@@ -426,7 +396,7 @@ example, the list "colors" could be imported as follows:
         "en": "A list with categories"
       },
       "nodes": [
-        ...
+        "..."
       ]
     }
   ]
diff --git a/docs/dsp-tools-excel2json.md b/docs/dsp-tools-excel2json.md
index e019f77a7..3022429e8 100644
--- a/docs/dsp-tools-excel2json.md
+++ b/docs/dsp-tools-excel2json.md
@@ -41,8 +41,8 @@ dsp-tools excel2json data_model_files project.json
 This will create a file `project.json` with the lists, properties, and resources from the Excel files.
-Please note that the "header" of the resulting JSON file is empty and thus invalid. It is necessary to add the project
-shortcode, name, description, keywords, etc. by hand.
+**Please note that the "header" of the resulting JSON file is empty and thus invalid. It is necessary to add the project
+shortcode, name, description, keywords, etc. by hand.**
 Continue reading the following paragraphs to learn more about the expected structure of the Excel files.
@@ -182,7 +182,7 @@ The output of the above command, with the template files, is:
             "en": "red"
           }
         },
-        ...
+        "..."
       ]
     },
     {
@@ -203,7 +203,7 @@ The output of the above command, with the template files, is:
             "en": "artwork"
           }
         },
-        ...
+        "..."
       ]
     },
     {
@@ -224,7 +224,7 @@ The output of the above command, with the template files, is:
             "en": "Faculty of Science"
           }
         },
-        ...
+        "..."
       ]
     }
   ]
diff --git a/docs/dsp-tools-excel2xml.md b/docs/dsp-tools-excel2xml.md
index 9934b0948..3b779029d 100644
--- a/docs/dsp-tools-excel2xml.md
+++ b/docs/dsp-tools-excel2xml.md
@@ -50,15 +50,18 @@ These steps are now explained in-depth:
 ## 1. Read in your data source
+
 In the first paragraph of the sample script, insert your ontology name, project shortcode, and the path to your data source. If necessary, activate one of the lines that are commented out.
 ## 2. Create root element `<knora>`
+
 Then, the root element is created, which represents the `<knora>` tag of the XML document.
 ## 3. Append the permissions
+
 As first children of `<knora>`, some standard permissions are added. At the end, please carefully check the permissions of the finished XML file to ensure that they meet your requirements, and adapt them if necessary.
@@ -71,6 +74,7 @@ here](./dsp-tools-xmlupload.md#how-to-use-the-permissions-attribute-in-resources
 ## 4. Create list mappings
+
 Let's assume that your data source has a column containing list values named after the "label" of the JSON project list, instead of the "name" which is needed for the `dsp-tools xmlupload`. You need a way to get the names from the labels. If your data source uses the labels correctly, this is an easy task: The method `create_json_list_mapping()` creates a
@@ -139,10 +143,12 @@ used.
 ## 5. Iterate through the rows of your data source
+
 With the help of Pandas, you can then iterate through the rows of your Excel/CSV, and create resources and properties.
 ### 6. Create the `<resource>` tag
+
 There are four kind of resources that can be created:
 | super | tag | method |
@@ -156,6 +162,7 @@ There are four kind of resources that can be created:
 here](./dsp-tools-xmlupload.md#dsp-base-resources--base-properties-to-be-used-directly-in-the-xml-file).
 #### Resource ID
+
 Special care is needed when the ID of a resource is created. Every resource must have an ID that is unique in the file, and it must meet the constraints of xsd:ID. You can simply achieve this if you use the method `make_xsd_id_compatible()`.
@@ -164,6 +171,7 @@ ID in a dict, so that you can retrieve it later. The example script contains an
 ### 7. Append the properties
+
 For every property, there is a helper function that explains itself when you hover over it. So you don't need to worry any more how to construct a certain XML value for a certain property.
@@ -182,6 +190,7 @@ Here's how the Docstrings assist you:
 #### Fine-tuning with `PropertyElement`
+
 There are two possibilities how to create a property: The value can be passed as it is, or as `PropertyElement`. If it is passed as it is, the `permissions` are assumed to be `prop-default`, texts are assumed to be encoded as `utf8`, and the value won't have a comment:
@@ -214,6 +223,7 @@ make_text_prop(
 #### Supported boolean formats
+
 For `make_boolean_prop(cell)`, the following formats are supported:
 - true: True, "true", "True", "1", 1, "yes", "Yes"
@@ -222,7 +232,7 @@ For `make_boolean_prop(cell)`, the following formats are supported:
 N/A-like values will raise an Error. So if your cell is empty, this method will not count it as false, but will raise an Error. If you want N/A-like values to be counted as false, you may use a construct like this:
-```python
+```
 if excel2xml.check_notna(cell):
     # the cell contains usable content
     excel2xml.make_boolean_prop(":hasBoolean", cell)
@@ -232,6 +242,7 @@ else:
 ```
 #### Supported text values
+
 DSP's only restriction on text-properties is that the string must be longer than 0. It is, for example, possible to
 upload the following property:
 ```xml
@@ -243,22 +254,26 @@ upload the following property:
 `excel2xml` allows to create such a property, but text values that don't meet the requirements of [`excel2xml.check_notna()`](#check-if-a-cell-contains-a-usable-value) will trigger a warning, for example:
-```python
+```
 excel2xml.make_text_prop(":hasText", " ")  # OK, but triggers a warning
 excel2xml.make_text_prop(":hasText", "-")  # OK, but triggers a warning
 ```
 ### 8. Append the resource to root
+
 At the end of the for-loop, it is important not to forget to append the finished resource to the root.
 ## 9. Save the file
+
 At the very end, save the file under a name that you can choose yourself.
 ## Other helper methods
+
 ### Check if a cell contains a usable value
+
 The method `check_notna(cell)` checks a value if it is usable in the context of data archiving. A value is considered usable if it is
@@ -308,6 +323,7 @@ In contrast, `check_notna(cell)` will return the expected value for all cases in
 ### Calendar date parsing
+
 The method `find_date_in_string(string)` tries to find a calendar date in a string. If successful, it returns the DSP-formatted date string.
diff --git a/docs/dsp-tools-usage.md b/docs/dsp-tools-usage.md
index 277e4a9aa..a7fcdb4d8 100644
--- a/docs/dsp-tools-usage.md
+++ b/docs/dsp-tools-usage.md
@@ -223,6 +223,7 @@ described in the next paragraph.
 ## Use the module `excel2xml` to convert a data source to XML
+
 dsp-tools assists you in converting a data source in CSV/XLS(X) format to an XML file. Unlike the other features of dsp-tools, this doesn't work via command line, but via helper methods that you can import into your own Python script. Because every data source is different, there is no single algorithm to convert them to a DSP conform XML. Every user
diff --git a/docs/dsp-tools-xmlupload.md b/docs/dsp-tools-xmlupload.md
index e4ce289f2..164754d7c 100644
--- a/docs/dsp-tools-xmlupload.md
+++ b/docs/dsp-tools-xmlupload.md
@@ -11,7 +11,7 @@ The command to import an XML file on a DSP server is described [here](./dsp-tool
 The import file must start with the standard XML header:
-```xml
+```
 ```
@@ -170,7 +170,7 @@ and its properties. It is important to note that a resource doesn't inherit its
 property must have its own permissions. So, in the following example, the bitstreams don't inherit the permissions from their
 resource:
-```xml
+```
 postcards/images/EURUS015a.jpg
@@ -189,6 +189,7 @@ resource:
 ```
 To take `KnownUser` as example:
+
 - With `permissions="prop-default"`, a logged-in user who is not member of the project (`KnownUser`) has `V` rights on the image: Normal view.
 - With `permissions="prop-restricted"`, a logged-in user who is not member of the project (`KnownUser`) has `RV`
@@ -261,10 +262,26 @@ The `<bitstream>` element is used for bitstream data. It contains the path to
 a ZIP container, an audio file etc. It must only be used if the resource is a `StillImageRepresentation`, an `AudioRepresentation`, a `DocumentRepresentation` etc.
-Note:
+Notes:
+
+- There is only _one_ `<bitstream>` element allowed per representation.
+- The `<bitstream>` element must be the first element.
+- The path is relative to the working directory in which `dsp-tools xmlupload` is executed. It is recommended to
+  choose the project folder as working directory, `my_project` in the example below:

-- There is only _one_ `<bitstream>` element allowed per representation!
-- The `<bitstream>` element must be the first element!
+```
+my_project
+├── files
+│   ├── data_model.json
+│   └── data_file.xml (images/dog.jpg)
+└── images
+    ├── dog.jpg
+    └── cat.jpg
+```
+
+```
+my_project % dsp-tools xmlupload files/data_file.xml
+```

 Supported file extensions:
@@ -310,7 +327,7 @@ Attributes:
 - `comment`: a comment for this specific value (optional)
 Example of a public and a hidden boolean property:
-```xml
+```
 true
@@ -427,15 +444,12 @@ Example of a property with a public and a hidden decimal value:
 ### <geometry-prop>
 The `<geometry-prop>` element is used for a geometric definition of a 2-D region (e.g. a region on an image). It must
-contain at least one `<geometry>` element.
-
-Note:
-
-- Usually these are not created by an import and should be used with caution!
+contain at least one `<geometry>` element. A `<geometry-prop>` can only be used inside a [`<region>` tag](#region).
 Attributes:
-- `name`: name of the property as defined in the ontology (required)
+- `name`: the only allowed name is `hasGeometry`, because this property is a DSP base property that can only be used in
+  the [`<region>` tag](#region).
 #### <geometry>
@@ -443,60 +457,65 @@ Attributes:
 A geometry value is defined as a JSON object. It contains the following data:
 - `status`: "active" or "deleted"
-- `type`: "circle", "rectangle" or "polygon"
+- `type`: "circle", "rectangle" or "polygon" (only the rectangle can be displayed in DSP-APP. The others can be
+  looked at in another frontend, e.g. in TANGOH.)
 - `lineColor`: web-color
 - `lineWidth`: integer number (in pixels)
 - `points`: array of coordinate objects of the form `{"x": decimal, "y": decimal}`
 - `radius`: coordinate object of the form `{"x": decimal, "y": decimal}`
+- In the SALSAH data, there is also a key named `original_index` in the JSON format of all three shapes, but it doesn't
+  seem to have an influence on the shapes that TANGOH displays, so it can be omitted.

-Please note that all coordinates are normalized coordinates (relative to the image size) between 0.0 and 1.0!
-
-The following example defines a polygon:
-
-```json
-{
-    "status": "active",
-    "type": "polygon",
-    "lineColor": "#ff3333",
-    "lineWidth": 2,
-    "points": [{"x": 0.17252396166134185, "y": 0.1597222222222222},
-               {"x": 0.8242811501597445, "y": 0.14583333333333334},
-               {"x": 0.8242811501597445, "y": 0.8310185185185185},
-               {"x": 0.1757188498402556, "y": 0.8240740740740741},
-               {"x": 0.1757188498402556, "y": 0.1597222222222222},
-               {"x": 0.16932907348242812, "y": 0.16435185185185186}],
-    "original_index": 0
-}
-```
+Attributes:
-Example of a property with a public polygon and a hidden rectangle:
-```xml
-
+- `permissions`: Permission ID (optional, but if omitted, users who are lower than a `ProjectAdmin` have no permissions at all, not even view rights)
+- `comment`: a comment for this specific value (optional)
+
+Example:
+```
+
-        {
-            "status": "active", "type": "polygon", "lineColor": "#ff3333", "lineWidth": 2, "original_index": 0,
-            "points": [{"x": 0.1725239616613418, "y": 0.1597222222222222},
-                       {"x": 0.8242811501597445, "y": 0.1458333333333333},
-                       {"x": 0.8242811501597445, "y": 0.8310185185185185},
-                       {"x": 0.1757188498402556, "y": 0.8240740740740740},
-                       {"x": 0.1757188498402556, "y": 0.1597222222222222},
-                       {"x": 0.1693290734824281, "y": 0.1643518518518518}]
+        {
+            "status": "active",
+            "type": "rectangle",
+            "lineColor": "#ff1100",
+            "lineWidth": 5,
+            "points": [
+                {"x":0.1,"y":0.7},
+                {"x":0.3,"y":0.2}
+            ]
         }
-
+
         {
-            "status": "active", "type": "rectangle", "lineColor": "#ff3333", "lineWidth": 2, "original_index": 0,
-            "points": [{"x": 0.080985915492957750, "y": 0.16741071428571427},
-                       {"x": 0.739436619718309900, "y": 0.72991071428571430}]
+            "status": "active",
+            "type": "circle",
+            "lineColor": "#ff1100",
+            "lineWidth": 5,
+            "points": [{"x":0.5,"y":0.3}],
+            "radius": {"x":0.1,"y":0.1}
+        }
+
+
+        {
+            "status": "active",
+            "type": "polygon",
+            "lineColor": "#ff1100",
+            "lineWidth": 5,
+            "points": [{"x": 0.4, "y": 0.6},
+                       {"x": 0.5, "y": 0.9},
+                       {"x": 0.8, "y": 0.9},
+                       {"x": 0.7, "y": 0.6}]
         }
 ```
-Attributes:
+The underlying grid is a 0-1 normalized top left-anchored grid. The following coordinate system shows the three shapes
+that were defined above:
+![grid-for-geometry-prop](./assets/images/grid-for-geometry-prop.png)
+
-- `permissions`: Permission ID (optional, but if omitted, users who are lower than a `ProjectAdmin` have no permissions at all, not even view rights)
-- `comment`: a comment for this specific value (optional)
 ### <geoname-prop>
@@ -771,12 +790,15 @@ Example of a property with a public and a hidden URI:
 ## DSP base resources / base properties to be used directly in the XML file
+
 There is a number of base resources and base properties that must not be subclassed in a project ontology. They are directly available in the XML data file. Please have in mind that built-in names of the knora-base ontology must be used without prepended colon.
-See also [the related part of the ontology documentation](dsp-tools-create-ontologies.md#dsp-base-resources-base-properties-to-be-used-directly-in-the-xml-file).
+See also [the related part of the ontology documentation](dsp-tools-create-ontologies.md#dsp-base-resources--base-properties-to-be-used-directly-in-the-xml-file).
+
 ### `<annotation>`
+
 `<annotation>` is an annotation to another resource of any class. It must have the following predefined properties:
 - `hasComment` (1-n)
@@ -797,7 +819,9 @@ Example:
 Technical note: An `<annotation>` is in fact a `<resource restype="Annotation">`. But it is mandatory to use the shortcut, so that the XML file can be validated more precisely.
+
 ### `<region>`
+
 A `<region>` resource defines a region of interest (ROI) in an image. It must have the following predefined properties:
 - `hasColor` (1)
@@ -805,10 +829,7 @@ A `<region>` resource defines a region of interest (ROI) in an image. It must ha
 - `hasGeometry` (1)
 - `hasComment` (1-n)
-There are three types of Geometry shapes (rectangle, circle, polygon), but only the rectangle can be displayed in
-DSP-APP. The others can be used as well, but must be looked at in another fronted, e.g. in TANGOH.
-
-Example of a rectangle:
+Example:
 ```xml
@@ -837,42 +858,14 @@ Example of a rectangle:
 ```
+More details about the `<geometry-prop>` are documented [here](#geometry-prop).
-The circle and polygon are created with the following syntax:
-```json
-{
-    "status": "active",
-    "type": "circle",
-    "lineColor": "#ff1100",
-    "lineWidth": 5,
-    "points": [{"x":0.5,"y":0.3}],
-    "radius": {"x":0.1,"y":0.1}    // vector (0.1, 0.1)
-},
-{
-    "status": "active",
-    "type": "polygon",
-    "lineColor": "#ff1100",
-    "lineWidth": 5,
-    "points": [{"x": 0.4, "y": 0.6},
-               {"x": 0.5, "y": 0.9},
-               {"x": 0.8, "y": 0.9},
-               {"x": 0.7, "y": 0.6}]
-}
-```
-
-The underlying grid is a 0-1 normalized top left-anchored grid. The following coordinate system shows the three shapes
-that were defined above:
-![grid-for-geometry-prop](./assets/images/grid-for-geometry-prop.png)
-
-
-Technical notes:
-  - A `<region>` is in fact a `<resource restype="Region">`. But it is mandatory to use the
+Technical note: A `<region>` is in fact a `<resource restype="Region">`. But it is mandatory to use the
 shortcut, so that the XML file can be validated more precisely.
-  - In the SALSAH data, there is also a key named `original_index` in the JSON format of all three shapes, but it doesn't
-    seem to have an influence on the shapes that TANGOH displays, so it can be omitted.
 ### `<link>`
+
 `<link>` is a resource linking together several other resources of different classes. It must have the following predefined properties:
@@ -999,20 +992,6 @@ To do an incremental XML upload, one of the following procedures is recommended.
         #00ff00
-
-
-        {
-            "status":"active",
-            "lineColor":"#ff3333",
-            "lineWidth":2,
-            "points":[
-                {"x":0.08098591549295775,"y":0.16741071428571427},
-                {"x":0.7394366197183099,"y":0.7299107142857143}],
-            "type":"rectangle",
-            "original_index":0
-        }
-
-
         5416656
@@ -1058,20 +1037,6 @@ To do an incremental XML upload, one of the following procedures is recommended.
         #33ff77
-
-
-        {
-            "status":"active",
-            "lineColor":"#ff3333",
-            "lineWidth":2,
-            "points":[
-                {"x":0.08098591549295775,"y":0.16741071428571427},
-                {"x":0.7394366197183099,"y":0.7299107142857143}],
-            "type":"rectangle",
-            "original_index":0
-        }
-
-
         5416656
@@ -1117,20 +1082,6 @@ To do an incremental XML upload, one of the following procedures is recommended.
         #33ff77
-
-
-        {
-            "status":"active",
-            "lineColor":"#ff3333",
-            "lineWidth":2,
-            "points":[
-                {"x":0.08098591549295775,"y":0.16741071428571427},
-                {"x":0.7394366197183099,"y":0.7299107142857143}],
-            "type":"rectangle",
-            "original_index":0
-        }
-
-
         5416656
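The rewritten `<geometry-prop>` documentation in this diff describes a geometry as a JSON object whose `points` lie on a normalized, top-left-anchored 0-1 grid. The sketch below shows how a rectangle that is known in pixel coordinates could be turned into such a JSON string. It is only an illustration under stated assumptions. The helper `rectangle_geometry` and its parameters are hypothetical and are not part of dsp-tools or `excel2xml`. The sketch assumes you know the image's width and height in pixels, and its default `lineColor`/`lineWidth` values are copied from the rectangle example.

```python
import json


def rectangle_geometry(x1, y1, x2, y2, width, height, line_color="#ff1100", line_width=5):
    """Hypothetical helper: build a geometry JSON string for a rectangle.

    (x1, y1) and (x2, y2) are two opposite corners in pixels, measured from the
    top-left corner of an image that is `width` x `height` pixels large.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    for x in (x1, x2):
        if not 0 <= x <= width:
            raise ValueError(f"x coordinate {x} lies outside the image")
    for y in (y1, y2):
        if not 0 <= y <= height:
            raise ValueError(f"y coordinate {y} lies outside the image")
    geometry = {
        "status": "active",
        "type": "rectangle",
        "lineColor": line_color,
        "lineWidth": line_width,
        # normalize pixels to the 0-1 grid by dividing by the image size
        "points": [
            {"x": x1 / width, "y": y1 / height},
            {"x": x2 / width, "y": y2 / height},
        ],
    }
    return json.dumps(geometry)


# assumed example: a 1000 x 500 px image with corners at (100, 350) and (300, 100)
print(rectangle_geometry(100, 350, 300, 100, 1000, 500))
```

For a 1000 by 500 pixel image, the call above produces the points `{"x": 0.1, "y": 0.7}` and `{"x": 0.3, "y": 0.2}`, which are the coordinates of the rectangle in the example. Circles and polygons would need separate handling: a `radius` vector for circles, and more than two points for polygons.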