feat(excel-to-resources): create resources from excel (DSP-1576) (#88)
* integrate script into dsp-tools

* add validation and schema

* add test

* update test

* add documentation

* Update dsp-tools-usage.md

* add resources module to bazel

* unify docstrings

* update documentation

* update documentation

* remove number from draft in schema reference

* change schema to http://json-schema.org/draft-07/schema
irinaschubert committed Sep 14, 2021
1 parent c689d7f commit 7b0302f
Showing 15 changed files with 371 additions and 61 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -2,5 +2,6 @@ include README.md
include knora/dsplib/utils/knora-schema.json
include knora/dsplib/utils/knora-schema-lists.json
include knora/dsplib/utils/knora-schema-lists-only.json
include knora/dsplib/utils/knora-schema-resources-only.json
include knora/dsplib/utils/knora-data-schema.xsd
include knora/dsplib/utils/language-codes-3b2_csv.csv
Binary file added docs/assets/images/img-resources-example.png
3 changes: 2 additions & 1 deletion docs/dsp-tools-create.md
@@ -431,7 +431,8 @@ The nodes section must contain the field:

- _folder_: Path to the folder where the Excel files are stored

Further details to this functionality can be read [here](dsp-tools-excel.md).
Further details on this functionality can be found
[here](./dsp-tools-excel.md#create-a-list-from-one-or-several-excel-files).

The lists element is optional. If there are no lists, this element has to be omitted.

61 changes: 41 additions & 20 deletions docs/dsp-tools-excel.md
@@ -1,29 +1,50 @@
[![PyPI version](https://badge.fury.io/py/dsp-tools.svg)](https://badge.fury.io/py/dsp-tools)

# Excel files for data modelling and data import
dsp-tools is able to process Excel files and output the appropriate JSON or XML file. The JSON/XML file can then
be used to create the ontology on the DSP server or import data to the DSP repository. dsp-tools can also be used to
create a list from an Excel file.

## Create the data model JSON from an Excel file
dsp-tools is able to process Excel files and output the appropriate JSON or XML file. The JSON/XML file can then be used to create
the ontology on the DSP server or import data to the DSP repository. dsp-tools can also be used to create a list from an Excel
file.

## Create the resources for a data model from an Excel file

With dsp-tools the `resources` section used in a data model (JSON) can be created from an Excel file. Only the first worksheet of
the Excel file is considered and only `XLSX` files are allowed. The `resources` section can be inserted into the ontology file and
then be uploaded onto a DSP server.

The Excel sheet must have the following format:
![img-resources-example.png](assets/images/img-resources-example.png)

The expected columns are:

- `name` : The name of the resource
- `super` : The base resource of the resource
- `en`, `de`, `fr`, `it` : The labels of the resource in different languages, at least one language has to be provided

For further information about resources, see [here](./dsp-tools-create.md#resources).
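
As a rough illustration of the column layout described above, the following sketch turns rows that have already been read from the first worksheet into resource entries. The helper `rows_to_resources`, the sample rows, and the output keys (`name`, `super`, `labels`, chosen to mirror the column names) are assumptions for illustration, not the actual dsp-tools implementation. Reading the XLSX file itself is left out.

```python
LANGUAGES = ("en", "de", "fr", "it")  # label columns listed above


def rows_to_resources(rows):
    """Build resource dicts from rows given as {column name: cell value}.

    Hypothetical helper, not part of dsp-tools.
    """
    resources = []
    for row in rows:
        labels = {}
        for lang in LANGUAGES:
            value = row.get(lang)
            # Skip empty cells so that only provided languages end up in the labels
            if value is not None and str(value).strip():
                labels[lang] = str(value).strip()
        if not labels:
            raise ValueError(f"Resource {row.get('name')!r} needs at least one label")
        resources.append({"name": row["name"], "super": row["super"], "labels": labels})
    return resources


example_rows = [
    {"name": "Owner", "super": "Resource", "en": "Owner", "de": "Besitzer", "fr": None, "it": None},
    {"name": "Title", "super": "Resource", "en": "Title", "de": None, "fr": "Titre", "it": None},
]
resources = rows_to_resources(example_rows)
```

Rejecting a row without any label reflects the rule that at least one language has to be provided.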

## Create the properties for a data model from an Excel file

[not yet implemented]

## Create a DSP-conform XML file from an Excel file

[not yet implemented]

## Create a list from one or several Excel files
With dsp-tools a JSON list can be created from one or several Excel files. The list can then be inserted into a JSON ontology
and uploaded to a DSP server. The expected format of the Excel files is described [here](./dsp-tools-create.md#lists-from-excel).
It is possible to create multilingual lists. In this case, a separate Excel file has to be created for each language. The data
has to be in the first worksheet of the Excel file(s). It is important that all the Excel lists have the same structure. So,
the translation(s) of a label in one Excel sheet has to be in the exact same cell (i.e. with the same cell index) in its own
Excel sheet.

Only Excel files with file extension `.xlsx` are considered. All Excel files have to be located in the same directory. When
calling the `excel` command, this folder is provided as an argument to the call. The language of the labels has to be provided in
the Excel file's file name after an underscore and before the file extension, e.g. `liste_de.xlsx` would be considered a list with
German (`de`) labels, `list_en.xlsx` a list with English (`en`) labels. The language has to be a valid ISO 639-1 or ISO
639-2 language code.

With dsp-tools a JSON list can be created from one or several Excel files. The list can then be inserted into a JSON ontology and
uploaded to a DSP server. The expected format of the Excel files is described [here](./dsp-tools-create.md#lists-from-excel). It
is possible to create multilingual lists. In this case, a separate Excel file has to be created for each language. The data has to
be in the first worksheet of the Excel file(s). It is important that all the Excel lists have the same structure. So, the
translation(s) of a label in one Excel sheet has to be in the exact same cell (i.e. with the same cell index) in its own Excel
sheet.

Only Excel files with file extension `.xlsx` are considered. All Excel files have to be located in the same directory. When
calling the `excel` command, this folder is provided as an argument to the call. The language of the labels has to be provided in
the Excel file's file name after an underscore and before the file extension, e.g. `liste_de.xlsx` would be considered a list with
German (`de`) labels, `list_en.xlsx` a list with English (`en`) labels. The language has to be a valid ISO 639-1 or ISO 639-2
language code.
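
The file naming convention can be sketched as follows. The helper `language_from_filename` is hypothetical and only extracts the part between the last underscore and the file extension; it does not check that the result is a valid ISO 639-1 or ISO 639-2 code.

```python
from pathlib import Path


def language_from_filename(filename):
    """Return the text between the last underscore and the extension (hypothetical helper)."""
    stem = Path(filename).stem  # "liste_de.xlsx" -> "liste_de"
    _, separator, language = stem.rpartition("_")
    if not separator or not language:
        raise ValueError(f"No language code found in {filename!r}")
    return language


codes = {name: language_from_filename(name) for name in ("liste_de.xlsx", "list_en.xlsx")}
```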

The following example shows how to create a JSON list from two Excel files which are in a directory called `lists`. The output is
written to the file `list.json`.
@@ -32,7 +53,7 @@ written to the file `list.json`.
dsp-tools excel lists list.json
```

The two Excel files `liste_de.xlsx` and `list_en.xlsx` are located in a folder called `lists`. `liste_de.xlsx` contains German
The two Excel files `liste_de.xlsx` and `list_en.xlsx` are located in a folder called `lists`. `liste_de.xlsx` contains German
labels for the list, `list_en.xlsx` contains the English labels.

```
@@ -41,8 +62,8 @@ lists
|__ list_en.xlsx
```

For each list node, the `label`s are read from the Excel files. The language code, provided in the file name, is then used for
the labels. As node `name`, a simplified version of the English label is taken if English is one of the available languages. If
For each list node, the `label`s are read from the Excel files. The language code, provided in the file name, is then used for the
labels. As node `name`, a simplified version of the English label is taken if English is one of the available languages. If
English is not available, one of the other languages is chosen (which one depends on the representation of the file order). If
there are two node names with the same name, an incrementing number is appended to the `name`.
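
The naming rule can be sketched like this. The documentation does not spell out how a label is "simplified" or how the number is appended, so both the `simplify` rule (lowercase, non-alphanumeric runs replaced by `-`) and the `-<n>` suffix are assumptions made for illustration only.

```python
import re


def simplify(label):
    """Lowercase the label and replace runs of non-alphanumerics with '-' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def unique_node_names(labels):
    """Derive one name per label, appending an incrementing number to repeated names."""
    seen = {}
    names = []
    for label in labels:
        base = simplify(label)
        count = seen.get(base, 0)
        seen[base] = count + 1
        # The first occurrence keeps the plain name, later ones get a counter
        names.append(base if count == 0 else f"{base}-{count}")
    return names


names = unique_node_names(["First Node", "Second node", "First node"])
```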

@@ -79,5 +100,5 @@ there are two node names with the same name, an incrementing number is appended
}, ...
```

After the creation of the list, a validation against the JSON schema for lists is performed. An error message is printed out if
After the creation of the list, a validation against the JSON schema for lists is performed. An error message is printed out if
the list is not valid. Furthermore, it is checked that no two nodes are the same.
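
A duplicate check of this kind could look roughly like the sketch below. It assumes that "the same" means two nodes sharing a `name`, and that child nodes are nested under a `nodes` key; both are assumptions about the list format, and the helper names are hypothetical.

```python
from collections import Counter


def iter_node_names(nodes):
    """Yield the name of every node, descending into nested 'nodes' lists."""
    for node in nodes:
        yield node["name"]
        yield from iter_node_names(node.get("nodes", []))


def duplicate_names(nodes):
    """Return the sorted names that occur more than once anywhere in the tree."""
    counts = Counter(iter_node_names(nodes))
    return sorted(name for name, occurrences in counts.items() if occurrences > 1)


example = [
    {"name": "a", "labels": {"en": "A"}, "nodes": [{"name": "b", "labels": {"en": "B"}}]},
    {"name": "b", "labels": {"en": "B"}},
]
dups = duplicate_names(example)
```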
64 changes: 42 additions & 22 deletions docs/dsp-tools-usage.md
@@ -1,7 +1,8 @@
[![PyPI version](https://badge.fury.io/py/dsp-tools.svg)](https://badge.fury.io/py/dsp-tools)

# Installation and usage
The following paragraphs give you an overview of how to install and use dsp-tools.

The following paragraphs give you an overview of how to install and use dsp-tools.

## Installation

@@ -29,20 +30,20 @@ The following options are available:
- `-u` | `--user` _username_: username used for authentication with the DSP API (default: root@example.com)
- `-p` | `--password` _password_: password used for authentication with the DSP API (default: test)
- `-V` | `--validate`: If set, only the validation of the JSON file is performed.
- `-l` | `--lists`: If set, only the lists are created using a [simplified schema](./dsp-tools-create.md#lists). Please note
that in this case the project must already exist.
- `-l` | `--lists`: If set, only the lists are created using a [simplified schema](./dsp-tools-create.md#lists). Please note that
in this case the project must already exist.
- `-v` | `--verbose`: If set, some information about the progress is printed to the console.
The command is used to read the definition of a data model (provided in a JSON file) and create it on the
DSP server. The following example shows how to load the ontology defined in `data_model_definition.json` onto the DSP

The command is used to read the definition of a data model (provided in a JSON file) and create it on the DSP server. The
following example shows how to load the ontology defined in `data_model_definition.json` onto the DSP
server `https://api.dsl.server.org` provided with the `-s` option. The username `root@example.com` and the password
`test` are used.
`test` are used.

```bash
dsp-tools create -s https://api.dsl.server.org -u root@example.com -p test data_model_definition.json
```

The description of the expected JSON format can be found [here](./dsp-tools-create.md).
The description of the expected JSON format can be found [here](./dsp-tools-create.md).

## Get a data model from a DSP server

@@ -59,10 +60,10 @@ The following options are available:
[IRI](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) of the project
- `-v` | `--verbose`: If set, some information about the progress is printed to the console.

The command is used to get the definition of a data model from a DSP server and write it into a JSON file. This JSON file
could then be used to upload the data model to another DSP server. The following example shows how to get the data model
from a DSP server `https://api.dsl.server.org` provided with the `-s` option. The username `root@example.com` and the
password `test` are used. The data model is saved into the output file `output_file.json`.
The command is used to get the definition of a data model from a DSP server and write it into a JSON file. This JSON file could
then be used to upload the data model to another DSP server. The following example shows how to get the data model from a DSP
server `https://api.dsl.server.org` provided with the `-s` option. The username `root@example.com` and the password `test` are
used. The data model is saved into the output file `output_file.json`.

```bash
dsp-tools get -s https://api.dsl.server.org -u root@example.com -p test output_file.json
@@ -83,16 +84,16 @@ The following options are available:
- `-S` | `--sipi` _SIPIserver_: URL of the SIPI IIIF server (default: http://0.0.0.0:1024)
- `-v` | `--verbose`: If set, more information about the uploaded resources is printed to the console.

The command is used to upload data defined in an XML file onto a DSP server. The following example shows how to upload
data from an XML file `xml_data_file.xml` onto the DSP server `https://api.dsl.server.org` provided with the `-s` option.
The username `root@example.com` and the password `test` are used. The interface for the SIPI IIIF server is provided
with the `-S` option (`https://iiif.dsl.server.org`).
The command is used to upload data defined in an XML file onto a DSP server. The following example shows how to upload data from
an XML file `xml_data_file.xml` onto the DSP server `https://api.dsl.server.org` provided with the `-s` option. The
username `root@example.com` and the password `test` are used. The interface for the SIPI IIIF server is provided with the `-S`
option (`https://iiif.dsl.server.org`).

```bash
dsp-tools xmlupload -s https://api.dsl.server.org -u root@example.com -p test -S https://iiif.dsl.server.org xml_data_file.xml
```

The description of the expected XML format can be found [here](./dsp-tools-xmlupload.md).
The description of the expected XML format can be found [here](./dsp-tools-xmlupload.md).

## Create a JSON list file from one or several Excel files

@@ -105,15 +106,34 @@ The following option is available:
- `-l` | `--listname` _listname_: name to be used for the list (filename before last occurrence of `_` is used if omitted)

The command is used to create a JSON list file from one or several Excel files. It is possible to create multilingual lists.
Therefore, an Excel file for each language has to be provided. The data has to be in the first worksheet of the Excel
file and all Excel files have to be in the same directory. When calling the `excel` command, this directory has to be provided
as an argument to the call.
Therefore, an Excel file for each language has to be provided. The data has to be in the first worksheet of the Excel file and all
Excel files have to be in the same directory. When calling the `excel` command, this directory has to be provided as an argument
to the call.

The following example shows how to create a JSON list from Excel files in a directory called `lists`.

```bash
dsp-tools excel lists list.json
```

The description of the expected Excel format can be found [here](./dsp-tools-create.md#lists-from-excel). More information
about the usage of this command can be found [here](./dsp-tools-excel.md#create-a-list-from-one-or-several-excel-files).
The description of the expected Excel format can be found [here](./dsp-tools-create.md#lists-from-excel). More information about
the usage of this command can be found [here](./dsp-tools-excel.md#create-a-list-from-one-or-several-excel-files).

## Create resources from an Excel file

```bash
dsp-tools excel2resources excel_file.xlsx output_file.json
```

The command is used to create the `resources` section of an ontology from an Excel file. The data has to be in the first
worksheet of the Excel file.

The following example shows how to create the resources section from an Excel file called `Resources.xlsx`.

```bash
dsp-tools excel2resources Resources.xlsx resources.json
```

More information about the usage of this command can be found
[here](./dsp-tools-excel.md#create-the-resources-for-a-data-model-from-an-excel-file).
31 changes: 17 additions & 14 deletions docs/index.md
@@ -4,22 +4,25 @@

dsp-tools is a command line tool that helps you interact with the DaSCH Service Platform server (DSP server).

In order to archive your data on the DaSCH Service Platform, you need a data model (ontology) that describes your data.
The data model is defined in a JSON file which has to be transmitted to the DSP server. If the DSP server is aware of
the data model for your project, conforming data can be uploaded into the DSP repository.
In order to archive your data on the DaSCH Service Platform, you need a data model (ontology) that describes your data. The data
model is defined in a JSON file which has to be transmitted to the DSP server. If the DSP server is aware of the data model for
your project, conforming data can be uploaded into the DSP repository.

Often, data is initially added in large quantities. Therefore, dsp-tools allows you to perform bulk imports of your data.
In order to do so, the data has to be described in an XML file. dsp-tools is able to read the XML file and upload all data
to the DSP server.
Often, data is initially added in large quantities. Therefore, dsp-tools allows you to perform bulk imports of your data. In order
to do so, the data has to be described in an XML file. dsp-tools is able to read the XML file and upload all data to the DSP
server.

dsp-tools helps you with the following tasks:

- [`dsp-tools create`](./dsp-tools-usage.md#create-a-data-model-on-a-dsp-server) creates the data model (ontology) on a
DSP server from a provided JSON file containing the data model.
- [`dsp-tools get`](./dsp-tools-usage.md#get-a-data-model-from-a-dsp-server) reads a data model from a DSP server and
writes it into a JSON file.
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from a provided XML file (bulk
data import).
- [`dsp-tools create`](./dsp-tools-usage.md#create-a-data-model-on-a-dsp-server) creates the data model (ontology) on a DSP server
from a provided JSON file containing the data model.
- [`dsp-tools get`](./dsp-tools-usage.md#get-a-data-model-from-a-dsp-server) reads a data model from a DSP server and writes it
into a JSON file.
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from a provided XML file (bulk data
import).
- [`dsp-tools excel`](./dsp-tools-usage.md#create-a-json-list-file-from-one-or-several-excel-files)
creates a JSON or XML file from one or several Excel files. The created data can then be uploaded to a DSP server with
`dsp-tools create`.
creates a JSON or XML file from one or several Excel files. The created data can either be integrated into an ontology or be
uploaded directly to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2resources`](./dsp-tools-usage.md#create-resources-from-an-excel-file)
creates the ontology's resource section from an Excel file. The resources can be integrated into an ontology and then be
uploaded to a DSP server with `dsp-tools create`.
12 changes: 12 additions & 0 deletions knora/dsp_tools.py
@@ -14,6 +14,7 @@
from dsplib.utils.onto_create_ontology import create_ontology
from dsplib.utils.onto_get import get_ontology
from dsplib.utils.excel_to_json_lists import list_excel2json, validate_list_with_schema
from dsplib.utils.excel_to_json_resources import resources_excel2json
from dsplib.utils.onto_validate import validate_ontology
from dsplib.utils.xml_upload import xml_upload

@@ -84,6 +85,14 @@ def program(args: list) -> None:
    parser_excel_lists.add_argument('excelfolder', help='Path to the folder containing the Excel file(s)', default='lists')
    parser_excel_lists.add_argument('outfile', help='Path to the output JSON file containing the list data', default='list.json')

    parser_excel_resources = subparsers.add_parser('excel2resources', help='Create a JSON file from an Excel file containing '
                                                                           'resources for a DSP ontology. ')
    parser_excel_resources.set_defaults(action='excel2resources')
    parser_excel_resources.add_argument('excelfile', help='Path to the Excel file containing the resources',
                                        default='resources.xlsx')
    parser_excel_resources.add_argument('outfile', help='Path to the output JSON file containing the resource data',
                                        default='resources.json')

    args = parser.parse_args(args)

    if not hasattr(args, 'action'):
@@ -133,6 +142,9 @@ def program(args: list) -> None:
        list_excel2json(listname=args.listname,
                        excelfolder=args.excelfolder,
                        outfile=args.outfile)
    elif args.action == 'excel2resources':
        resources_excel2json(excelfile=args.excelfile,
                             outfile=args.outfile)


def main():
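
A note on the `default=` values passed to `add_argument` for `excelfile` and `outfile` in this file: with `argparse`, a positional argument declared without `nargs` is always required, so its default is never used. If optional positionals were intended (an assumption; the commit does not say so), a minimal stand-alone sketch would add `nargs='?'`. The parser below is a hypothetical demo, not the dsp-tools parser.

```python
import argparse

# Hypothetical stand-alone parser, used only to demonstrate nargs='?'
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("excelfile", nargs="?", default="resources.xlsx",
                    help="Path to the Excel file containing the resources")
parser.add_argument("outfile", nargs="?", default="resources.json",
                    help="Path to the output JSON file containing the resource data")

# Without arguments the defaults apply; with arguments they are overridden
no_args = parser.parse_args([])
both_args = parser.parse_args(["Resources.xlsx", "out.json"])
```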
10 changes: 10 additions & 0 deletions knora/dsplib/utils/BUILD.bazel
@@ -14,6 +14,16 @@ py_library(
    ]
)

py_library(
    name = "excel_to_json_resources",
    visibility = ["//visibility:public"],
    srcs = ["excel_to_json_resources.py"],
    deps = [
        requirement("jsonschema"),
        requirement("openpyxl")
    ]
)

py_library(
    name = "expand_all_lists",
    visibility = ["//visibility:public"],
