Commit

docs(excel2json): use rosetta as example data (DEV-1478) (#254)
jnussbaum committed Nov 16, 2022
1 parent 8e09f0d commit af192cb
Showing 15 changed files with 68 additions and 45 deletions.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added docs/assets/images/img-excel2xml-closeup.png
Binary file modified docs/assets/images/img-properties-example.png
Binary file modified docs/assets/images/img-resources-example-1.png
Binary file modified docs/assets/images/img-resources-example-2.png
8 changes: 4 additions & 4 deletions docs/dsp-tools-create.md
@@ -365,7 +365,7 @@ Example of a "lists" section:
#### Lists from Excel

Instead of being described in JSON, a list can be imported from one or several Excel files. In this case, the
`nodes` element of the root node consists of {"folder": "<path-to-folder-containing-the-excel-files>"}. In the above
`nodes` element of the root node consists of `{"folder": "<path-to-folder-containing-the-excel-files>"}`. In the above
example, the list "colors" could be imported as follows:

```json
@@ -403,9 +403,9 @@ example, the list "colors" could be imported as follows:
}
```
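
As a rough illustration of the folder reference described above, the following Python sketch builds the `nodes` element and lists the Excel files such a folder might contain. The helper names `folder_nodes_element` and `excel_files_in`, and the `*.xlsx` filter, are assumptions made for this example. They are not part of dsp-tools.

```python
import tempfile
from pathlib import Path


def folder_nodes_element(folder: str) -> dict[str, str]:
    """Build the `nodes` element that refers to a folder of Excel files."""
    return {"folder": folder}


def excel_files_in(folder: str) -> list[str]:
    """Return the sorted names of the XLSX files in `folder` (hypothetical helper)."""
    return sorted(p.name for p in Path(folder).glob("*.xlsx"))


# Demonstration with a throwaway folder and two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("en.xlsx", "de.xlsx"):
        (Path(tmp) / name).touch()
    demo_nodes = folder_nodes_element(tmp)
    demo_files = excel_files_in(demo_nodes["folder"])
    print(demo_nodes, demo_files)
```

The sketch only shows the shape of the reference; the parsing of the Excel files themselves is left to dsp-tools.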

To do so, it would be necessary to place the following two files into the folder "path-to-folder":
![Colors_en](./assets/images/img-list-english-colors.png)
![Farben_de](./assets/images/img-list-german-colors.png)
To do so, it would be necessary to place the following two files into the folder "path-to-folder":
![Colors_en](./assets/images/img-list-english-colors.png){ width=50% }
![Farben_de](./assets/images/img-list-german-colors.png){ width=50% }

The expected format of the Excel files is documented [here](./dsp-tools-excel2json.md#lists-section). The only difference to
the explanations there is that column A of the Excel worksheet is not interpreted as list name (root node), but as
38 changes: 21 additions & 17 deletions docs/dsp-tools-excel2json.md
@@ -1,11 +1,11 @@
[![PyPI version](https://badge.fury.io/py/dsp-tools.svg)](https://badge.fury.io/py/dsp-tools)

# `excel2json`: Create a data model (JSON project file) from Excel
# excel2json

With dsp-tools, a JSON project file can be created from Excel files. The command for this is documented
[here](./dsp-tools-usage.md#create-a-json-project-file-from-excel-files).

A JSON project consists of
To put it simple, a JSON project consists of

- 0-1 "lists" sections
- 1-n ontologies, each containing
@@ -57,15 +57,16 @@ this is documented [here](./dsp-tools-usage.md#create-the-resources-section-of-a
Only `XLSX` files are allowed. The `resources` section can be inserted into the ontology file and then be uploaded onto
a DSP server.

**An Excel file template can be found [here](assets/data_model_templates/onto_name (onto_label)/resources.xlsx). It is recommended to work from
the template.**
**An Excel file template can be found [here](assets/data_model_templates/rosetta (rosetta)/resources.xlsx) or also in the
[`data_model_files` folder of `0123-import-scripts`](https://github.com/dasch-swiss/0123-import-scripts/tree/main/data_model_files).
It is recommended to work from the template.**

The expected worksheets of the Excel file are:

- `classes`: a table with all resource classes intended to be used in the resulting JSON
- `class1`, `class2`,...: a table for each resource class named after its name

The worksheet called `classes` must have the following structure:
The worksheet called `classes` must have the following structure:
![img-resources-example-1.png](assets/images/img-resources-example-1.png)

The expected columns are:
@@ -77,8 +78,8 @@ The expected columns are:

The optional columns may be omitted in the Excel.

All other worksheets, one for each resource class, have the following structure:
![img-resources-example-2.png](assets/images/img-resources-example-2.png){ width=50% }
All other worksheets, one for each resource class, have the following structure:
![img-resources-example-2.png](assets/images/img-resources-example-2.png){ width=30% }

The expected columns are:

@@ -99,10 +100,11 @@ this is documented [here](./dsp-tools-usage.md#create-the-properties-section-of-
Only the first worksheet of the Excel file is considered and only XLSX files are allowed. The `properties` section can
be inserted into the ontology file and then be uploaded onto a DSP server.

**An Excel file template can be found [here](assets/data_model_templates/onto_name (onto_label)/properties.xlsx). It is recommended to work
from the template.**
**An Excel file template can be found [here](assets/data_model_templates/rosetta (rosetta)/properties.xlsx) or also in the
[`data_model_files` folder of `0123-import-scripts`](https://github.com/dasch-swiss/0123-import-scripts/tree/main/data_model_files).
It is recommended to work from the template.**

The Excel sheet must have the following structure:
The Excel sheet must have the following structure:
![img-properties-example.png](assets/images/img-properties-example.png)

The expected columns are:
@@ -127,7 +129,7 @@ For further information about properties, see [here](./dsp-tools-create-ontologi

## "lists" section

With dsp-tools, the "lists" section of a JSON project file can be created from one or several Excel files. The lists can
With dsp-tools, the `lists` section of a JSON project file can be created from one or several Excel files. The lists can
then be inserted into a JSON project file and uploaded to a DSP server. The command for this is documented
[here](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files).

@@ -138,9 +140,9 @@ in a directory called `listfolder`:
dsp-tools excel2lists listfolder lists.json
```

The Excel sheets must have the following structure:
![img-list-english-example.png](assets/images/img-list-english-example.png)
![img-list-german-example.png](assets/images/img-list-german-example.png)
The Excel sheets must have the following structure:
![img-list-english-example.png](assets/images/img-list-english-example.png){ width=60% }
![img-list-german-example.png](assets/images/img-list-german-example.png){ width=60% }

Some notes:

@@ -156,9 +158,11 @@ Some notes:
- After the creation of the list, a validation against the JSON schema for lists is performed. An error message is
printed out if the list is not valid.

**It is recommended to work from the following templates:
[en.xlsx](assets/data_model_templates/lists/en.xlsx): File with the English labels
[de.xlsx](assets/data_model_templates/lists/de.xlsx): File with the German labels**
**It is recommended to work from the following templates:**

- [en.xlsx](assets/data_model_templates/lists/en.xlsx): File with the English labels
- [de.xlsx](assets/data_model_templates/lists/de.xlsx): File with the German labels
- or alternatively from the [`data_model_files` folder of `0123-import-scripts`](https://github.com/dasch-swiss/0123-import-scripts/tree/main/data_model_files)

The output of the above command, with the template files, is:

28 changes: 23 additions & 5 deletions docs/dsp-tools-excel2xml.md
@@ -1,14 +1,32 @@
[![PyPI version](https://badge.fury.io/py/dsp-tools.svg)](https://badge.fury.io/py/dsp-tools)

# Module `excel2xml`: Convert a data source to XML
# excel2xml

This page is about the module `excel2xml` that can be imported into a custom Python script that transforms any tabular
data into an XML.
## Two use cases - two approaches

There is also a CLI command `dsp-tools excel2xml` that creates an XML file from an Excel/CSV file which is already
structured according to the DSP specifications. The CLI command is documented
There are two kinds of Excel files that can be transformed into an XML file:

| structure | provenance | tool | example screenshot |
|------------------|-------------|--------------------------|----------------------------------------------------------|
| custom structure | customer | module `excel2xml` | ![](./assets/images/img-excel2xml-raw-data-category.png) |
| DSP structure | DSP server | CLI command `excel2xml` | ![](./assets/images/img-excel2xml-closeup.png) |

The first use case is the most frequent: The DaSCH receives a data export from a research project. Every project uses
different software, so every project will deliver their data in a different structure. The screenshot is just a
simplified example. For this use case, it is necessary to write a Python script that transforms the data from an
undefined state X into a DSP-conforming XML file that can be uploaded with `dsp-tools xmlupload`. For this, you need to
import the module `excel2xml` into your Python script.

The second use case is less frequent: We migrate data DaSCH-internally from one server to another. In this case, the
data already has the correct structure, and can automatically be transformed to XML. This can be done with the CLI
command `dsp-tools excel2xml` which is documented
[here](./dsp-tools-usage.md#use-the-module-excel2xml-to-convert-a-data-source-to-xml).

**This page deals only with the first use case, the module `excel2xml`** .


## Module `excel2xml`: Convert a data source to XML

To demonstrate the usage of the `excel2xml` module, there is a GitHub repository named `0123-import-scripts`. It
contains:

2 changes: 1 addition & 1 deletion docs/dsp-tools-usage.md
@@ -204,7 +204,7 @@ Arguments:
- project_shortcode (mandatory): The four-digit hexadecimal shortcode of the project
- ontology_name (mandatory): the name of the ontology that the data belongs to

The Excel file must be structured as in this image:
The Excel file must be structured as in this image:
![img-excel2xml.png](assets/images/img-excel2xml.png)

Some notes:
7 changes: 4 additions & 3 deletions docs/dsp-tools-xmlupload.md
@@ -266,8 +266,9 @@ Notes:

- There is only _one_ `<bitstream>` element allowed per representation.
- The `<bitstream>` element must be the first element.
- The path is relative to the working directory where `dsp-tools xmlupload` is executed in. It is recommended to
choose the project folder as working directory, `my_project` in the example below:
- By default, the path is relative to the working directory where `dsp-tools xmlupload` is executed in. This behaviour
can be modified with the flag [`--imgdir`](./dsp-tools-usage.md#upload-data-to-a-dsp-server). If you keep the default,
it is recommended to choose the project folder as working directory, `my_project` in the example below:

```
my_project
@@ -512,7 +513,7 @@ Example:
```

The underlying grid is a 0-1 normalized top left-anchored grid. The following coordinate system shows the three shapes
that were defined above:
that were defined above:
![grid-for-geometry-prop](./assets/images/grid-for-geometry-prop.png)
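
To make the normalized grid concrete, here is a minimal Python sketch that maps a point on the 0-1 grid to pixel coordinates. The function name `to_pixels` and the 800 x 600 image size are assumptions chosen for illustration; dsp-tools itself is not claimed to expose such a helper.

```python
def to_pixels(x: float, y: float, width: int, height: int) -> tuple[float, float]:
    """Map a normalized point (origin top left, y pointing down) to pixel coordinates."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie between 0 and 1")
    return x * width, y * height


# Example on a hypothetical 800 x 600 image.
corner = to_pixels(0.0, 0.0, 800, 600)  # top left corner of the image
center = to_pixels(0.5, 0.5, 800, 600)  # middle of the image
print(corner, center)
```

Because the grid is anchored at the top left, larger `y` values lie further down the image, which matches the usual pixel convention.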


28 changes: 14 additions & 14 deletions docs/index.md
@@ -2,7 +2,7 @@

# DSP-TOOLS documentation

dsp-tools is a command line tool that helps you to interact with the DaSCH Service Platform server (DSP server).
dsp-tools is a command line tool that helps you to interact with a DaSCH Service Platform (DSP) server.

In order to archive your data on the DaSCH Service Platform, you need a data model (ontology) that describes your data.
The data model is defined in a JSON project definition file which has to be transmitted to the DSP server. If the DSP
Expand All @@ -22,21 +22,21 @@ dsp-tools helps you with the following tasks:
data import) and writes the mapping from internal IDs to IRIs into a local file.
- [`dsp-tools excel2json`](./dsp-tools-usage.md#create-a-json-project-file-from-excel-files) creates an entire JSON
project file from a folder with Excel files in it.
- [`dsp-tools excel2lists`](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files)
creates the "lists" section of a JSON project file from one or several Excel files. The resulting section can be
integrated into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2resources`](./dsp-tools-usage.md#create-the-resources-section-of-a-json-project-file-from-an-excel-file)
creates the "resources" section of a JSON project file from an Excel file. The resulting section can be integrated
into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2properties`](./dsp-tools-usage.md#create-the-properties-section-of-a-json-project-file-from-an-excel-file)
creates the "properties" section of a JSON project file from an Excel file. The resulting section can be integrated
into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools id2iri`](./dsp-tools-usage.md#replace-internal-ids-with-iris-in-xml-file)
takes an XML file for bulk data import and replaces referenced internal IDs with IRIs. The mapping has to be provided
with a JSON file.
- [`dsp-tools excel2lists`](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files)
creates the "lists" section of a JSON project file from one or several Excel files. The resulting section can be
integrated into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2resources`](./dsp-tools-usage.md#create-the-resources-section-of-a-json-project-file-from-an-excel-file)
creates the "resources" section of a JSON project file from an Excel file. The resulting section can be integrated
into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2properties`](./dsp-tools-usage.md#create-the-properties-section-of-a-json-project-file-from-an-excel-file)
creates the "properties" section of a JSON project file from an Excel file. The resulting section can be integrated
into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2xml`](./dsp-tools-usage.md#create-an-xml-file-from-excelcsv) transforms a data source to XML if it
is already structured according to the DSP specifications.
- [The module excel2xml](./dsp-tools-usage.md#use-the-module-excel2xml-to-convert-a-data-source-to-xml) provides helper
- [The module `excel2xml`](./dsp-tools-usage.md#use-the-module-excel2xml-to-convert-a-data-source-to-xml) provides helper
methods that can be used in a Python script to convert data from a tabular format into XML.
- [`dsp-tools id2iri`](./dsp-tools-usage.md#replace-internal-ids-with-iris-in-xml-file)
takes an XML file for bulk data import and replaces referenced internal IDs with IRIs. The mapping has to be provided
with a JSON file.
- [`dsp-tools start-api / stop-api / start-app`](./dsp-tools-usage.md#start-a-dsp-stack-on-your-local-machine-for-dasch-internal-use-only)
assist you in running a DSP software stack on your local machine.
2 changes: 1 addition & 1 deletion knora/dsplib/import_scripts
