feat(excel-lists): create multilanguage json lists from excel files (DSP-1580) (#75)

* add docstring to main

* integrate code from prep repo into dsp-tools

* update documentation

* reference folder directly in ontology

* Update BUILD.bazel

* add test data and setup and teardown methods for unit tests

* update tests, update bazel files, eliminate duplicated code

* update .gitignore

* reformat code
irinaschubert committed Aug 10, 2021
1 parent b01eb04 commit 06d071a
Showing 29 changed files with 1,228 additions and 1,218 deletions.
9 changes: 4 additions & 5 deletions .gitignore
@@ -37,10 +37,6 @@ MANIFEST
pip-log.txt
pip-delete-this-directory.txt





# Environments
.env
.venv
@@ -64,7 +60,10 @@ venv.bak/
.mypy_cache/
.idea
.vscode
/knora/lists.json

# created files
lists.json
out.json

# bazel
/bazel-*
22 changes: 13 additions & 9 deletions docs/dsp-tools-create.md
@@ -275,12 +275,18 @@ Here is an example on how to build a taxonomic structure in JSON:
{
"name": "my_list",
"labels": {"en": "Disciplines of the Humanities"},
"comments": {"en": "This ist is just a silly example", "fr": "un example un peu fou"},
"comments": {
"en": "This is just an example.",
"fr": "C'est un exemple."
},
"nodes": [
{
"name": "node_1_1",
"labels": {"en": "Performing arts"},
"comments": {"en": "Arts that are events", "de": "Künste mit performativem Character"},
"comments": {
"en": "Arts that are events",
"de": "Künste mit performativem Charakter"
},
"nodes": [
{
"name": "node_2_2",
@@ -340,17 +346,17 @@ Here is an example on how to build a taxonomic structure in JSON:
```
#### Lists from Excel

A list can also be imported from an Excel sheet. The Excel sheet must have the following format (currently only a single
language is supported):
A list can be directly imported from an Excel sheet. The Excel sheet must have the following format:

![img-list-example.png](assets/images/img-list-example.png)

In such a case, the Excel file can directly be referenced in the list definition by defining a special list node:
```json
{
"name": "fromexcel",
"name": "List-from-excel",
"labels": {
"en": "Fromexcel"
"en": "List from an Excel file",
"de": "Liste von einer Excel-Datei"
},
"nodes": {
"file": "excel-list.xlsx",
@@ -1066,9 +1072,7 @@ Example for a resource definition:
{
"name": "Schule",
"super": "Resource",
"labels": {
"de": "Schule"
},
"labels": {"de": "Schule"},
"cardinalities": [
{
"propname": ":schulcode",
74 changes: 70 additions & 4 deletions docs/dsp-tools-excel.md
@@ -11,7 +11,73 @@ create a list from an Excel file.
## Create a DSP-conform XML file from an Excel file
[not yet implemented]

## Create flat or hierarchical lists from an Excel file
Lists or controlled vocabularies are sets of fixed terms that are used to characterize objects. Hierarchical lists
correspond to classifications or taxonomies. With dsp-tools a list can be created from an Excel file. The expected
format of the Excel file is described [here](./dsp-tools-create.md#lists-from-excel).
## Create a list from one or several Excel files
With dsp-tools, a JSON list can be created from one or several Excel files. The list can then be inserted into a JSON ontology
and uploaded to a DSP server. The expected format of the Excel files is described [here](./dsp-tools-create.md#lists-from-excel).
Multilingual lists are supported: in this case, a separate Excel file has to be created for each language. The data has to be
in the first worksheet of each Excel file, and all Excel files must have the same structure: the translation of a label has to
be in exactly the same cell (i.e. at the same cell index) in each language's file.
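The same-cell requirement can be illustrated with a minimal Python sketch. It assumes that the first worksheet of each file has
already been read into a list of rows (for example with openpyxl's `iter_rows(values_only=True)`); `merge_labels` is a
hypothetical helper, not part of the dsp-tools API. It checks that all files fill exactly the same cells and then collects the
labels of each cell per language.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
Grid = List[List[Optional[str]]]  # rows of cell values, None for empty cells


def filled_cells(grid: Grid) -> Set[Cell]:
    """Return the (row, column) index of every non-empty cell."""
    return {(r, c) for r, row in enumerate(grid) for c, value in enumerate(row) if value not in (None, "")}


def merge_labels(grids: Dict[str, Grid]) -> Dict[Cell, Dict[str, str]]:
    """Combine per-language grids into {(row, column): {language: label}}.

    Hypothetical helper: raises ValueError if the files do not fill the same cells.
    """
    if not grids:
        return {}
    positions = {lang: filled_cells(grid) for lang, grid in grids.items()}
    reference = next(iter(positions.values()))
    for lang, cells in positions.items():
        if cells != reference:
            mismatch = sorted(cells ^ reference)
            raise ValueError(f"The '{lang}' file does not have the same structure: cells {mismatch} differ")
    merged: Dict[Cell, Dict[str, str]] = {}
    for lang, grid in grids.items():
        for r, c in sorted(positions[lang]):
            merged.setdefault((r, c), {})[lang] = grid[r][c]
    return merged
```

With a German grid `[["Sand", None], [None, "Feinsand"]]` and an English grid `[["sand", None], [None, "fine sand"]]`, cell
`(1, 1)` maps to `{"de": "Feinsand", "en": "fine sand"}`; an English file with a label in a different cell is rejected.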

Only Excel files with the file extension `.xlsx` are considered, and all of them have to be located in the same directory. This
directory is passed as an argument to the `excel` command. The language of the labels is taken from the file name: it has to
come after an underscore and before the file extension. For example, `liste_de.xlsx` is read as a list with German (`de`)
labels and `list_en.xlsx` as a list with English (`en`) labels. The language has to be a valid ISO 639-1 or ISO 639-2 language
code.
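The file-name convention can be sketched in Python as follows. `language_from_filename` and `files_by_language` are hypothetical
helpers, not dsp-tools functions. The sketch takes the part after the last underscore as the language, and its check for a two-
or three-letter lowercase code is only a rough stand-in (an assumption) for validating against the real ISO 639-1/639-2 tables.

```python
import re
from pathlib import Path
from typing import Dict, Iterable

# Assumption: a loose format check instead of a lookup in the ISO 639-1/639-2 code tables
LANG_PATTERN = re.compile(r"[a-z]{2,3}")


def language_from_filename(filename: str) -> str:
    """Return the language code between the last '_' and the file extension."""
    stem = Path(filename).stem  # "liste_de.xlsx" -> "liste_de"
    if "_" not in stem:
        raise ValueError(f"No language code in file name: {filename}")
    lang = stem.rsplit("_", 1)[1]  # "liste_de" -> "de"
    if not LANG_PATTERN.fullmatch(lang):
        raise ValueError(f"'{lang}' does not look like an ISO 639 code ({filename})")
    return lang


def files_by_language(filenames: Iterable[str]) -> Dict[str, str]:
    """Map language code -> file name, considering only .xlsx files."""
    result: Dict[str, str] = {}
    for name in filenames:
        if Path(name).suffix != ".xlsx":
            continue  # other file types in the folder are ignored
        result[language_from_filename(name)] = name
    return result
```

For the `lists` folder used below, the file names could be gathered with `[p.name for p in Path("lists").iterdir()]` and passed
to `files_by_language`, giving `{"de": "liste_de.xlsx", "en": "list_en.xlsx"}`.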

The following example shows how to create a JSON list from two Excel files which are in a directory called `lists`. The output is
written to the file `list.json`.

```bash
dsp-tools excel lists list.json
```

The two Excel files `liste_de.xlsx` and `list_en.xlsx` are located in a folder called `lists`. `liste_de.xlsx` contains German
labels for the list, `list_en.xlsx` contains the English labels.

```
lists
|__ liste_de.xlsx
|__ list_en.xlsx
```

For each list node, the `labels` are read from the Excel files, using the language code from each file name as the label
language. The node `name` is a simplified version of the English label if English is one of the available languages. If English
is not available, one of the other languages is used (which one depends on the order in which the files are read). If two nodes
would end up with the same name, an incrementing number is appended to the `name`.

```json
{
"name": "sand",
"labels": {
"de": "Sand",
"en": "sand"
},
"nodes": [
{
"name": "fine-sand",
"labels": {
"de": "Feinsand",
"en": "fine sand"
}
},
{
"name": "medium-sand",
"labels": {
"de": "Mittelsand",
"en": "medium sand"
}
},
{
"name": "coarse-sand",
"labels": {
"de": "Grobsand",
"en": "coarse sand"
}
}
]
}, ...
```
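A minimal Python sketch of the naming rule described above. The exact simplification and numbering used by dsp-tools are not
specified here, so two things are assumptions: the simplification (lowercase, runs of word characters joined by `-`) and the
duplicate suffix format (`sand`, `sand-1`, `sand-2`, ...). `simplify` and `node_name` are hypothetical names.

```python
import re
from typing import Dict, Set


def simplify(label: str) -> str:
    """Assumed simplification: lowercase and join word-character runs with '-'."""
    return "-".join(re.findall(r"\w+", label.lower()))


def node_name(labels: Dict[str, str], taken: Set[str]) -> str:
    """Derive a unique node name from a node's labels ({language: label})."""
    # Prefer English; otherwise fall back to the first language in insertion (file) order
    lang = "en" if "en" in labels else next(iter(labels))
    base = simplify(labels[lang]) or "node"
    name, counter = base, 1
    # Assumed duplicate format: append "-1", "-2", ... until the name is unused
    while name in taken:
        name = f"{base}-{counter}"
        counter += 1
    taken.add(name)
    return name
```

Sharing one `taken` set across all nodes makes the names unique within the whole list rather than only among siblings (again an
assumption about the intended scope).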

After the list has been created, it is validated against the JSON schema for lists, and an error message is printed if the list
is not valid. Furthermore, the list is checked to make sure that no two nodes are the same.
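The duplicate check can be sketched in Python as a recursive walk over the nested `nodes`. This sketch assumes that "the same"
means two nodes with the same `name`; `duplicate_names` is a hypothetical helper. Schema validation itself could be done with,
for example, the `jsonschema` package's `validate(instance, schema)`, but the dsp-tools list schema is not reproduced here.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List


def iter_names(node: Dict[str, Any]) -> Iterator[str]:
    """Yield the name of a node and, recursively, of all its child nodes."""
    yield node["name"]
    children = node.get("nodes", [])
    # A node may reference an Excel file ({"file": ...}) instead of listing children
    if isinstance(children, list):
        for child in children:
            yield from iter_names(child)


def duplicate_names(nodes: List[Dict[str, Any]]) -> List[str]:
    """Return the node names that occur more than once anywhere in the tree."""
    counts = Counter(name for node in nodes for name in iter_names(node))
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result means no two nodes share a name; otherwise the returned names can be reported in the error message.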
27 changes: 17 additions & 10 deletions docs/dsp-tools-usage.md
@@ -94,19 +94,26 @@ dsp-tools xmlupload -s https://api.dsl.server.org -u root@example.com -p test -S

The description of the expected XML format can be found [here](./dsp-tools-xmlupload.md).

## Convert an Excel file into a JSON file that is compatible with dsp-tools
## Create a JSON list file from one or several Excel files

```bash
dsp-tools excel [options] excel_file.xlsx output_file.json
dsp-tools excel [option] folder_with_excel_files output_file.json
```

The following options are available:
The following option is available:

- `-S` | `--sheet` _sheetname_: name of the Excel worksheet to use (default: Tabelle1)
- `-s` | `--shortcode` _shortcode_: shortcode of the project (required)
- `-l` | `--listname` _listname_: name to be used for the list and the list definition file (required)
- `-L` | `--label` _label_: label to be used for the list (required)
- `-x` | `--lang` _lang_: language used for the list labels and commentaries (default: en)
- `-v` | `--verbose`: If set, some information about the progress is printed to the console.
- `-l` | `--listname` _listname_: name to be used for the list (if omitted, the part of the file name before the last `_` is used)
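
The default for `-l`/`--listname` can be sketched in Python as follows. `default_listname` is a hypothetical helper; since the
text does not say which file name is used when several Excel files are given, the sketch simply takes one file name as input.

```python
from pathlib import Path
from typing import Optional


def default_listname(filename: str, listname: Optional[str] = None) -> str:
    """Return the -l/--listname value if given, else the file name before its last '_'."""
    if listname:
        return listname
    stem = Path(filename).stem  # drop the directory and the ".xlsx" extension
    return stem.rsplit("_", 1)[0]  # "liste_de" -> "liste"
```

For example, `lists/liste_de.xlsx` yields the list name `liste`, while an explicit `-l sand` always takes precedence.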

The command creates a JSON list file from one or several Excel files. Multilingual lists are supported; to create one, provide an
Excel file for each language. The data has to be in the first worksheet of each Excel file, and all Excel files have to be in the
same directory. This directory is passed as an argument to the `excel` command.

The following example shows how to create a JSON list from Excel files in a directory called `lists`.

```bash
dsp-tools excel lists list.json
```

The description of the expected Excel format can be found [here](./dsp-tools-create.md#lists-from-excel).
The description of the expected Excel format can be found [here](./dsp-tools-create.md#lists-from-excel). More information
about the usage of this command can be found [here](./dsp-tools-excel.md#create-a-list-from-one-or-several-excel-files).
6 changes: 3 additions & 3 deletions docs/index.md
@@ -20,6 +20,6 @@ dsp-tools helps you with the following tasks:
writes it into a JSON file.
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from a provided XML file (bulk
data import).
- [`dsp-tools excel`](./dsp-tools-usage.md#convert-an-excel-file-into-a-json-file-that-is-compatible-with-dsp-tools)
converts an Excel file into a JSON and/or XML file in order to use it with `dsp-tools create` or `dsp-tools xmlupload`
(not yet implemented) or converts a list from an Excel file into a JSON file which than can be used in an ontology.
- [`dsp-tools excel`](./dsp-tools-usage.md#create-a-json-list-file-from-one-or-several-excel-files)
creates a JSON or XML file from one or several Excel files. The created data can then be uploaded to a DSP server with
`dsp-tools create`.
