feat: improve excel command (DEV-955) (#228)
jnussbaum committed Sep 14, 2022
1 parent a0722d8 commit 21cc6bc
Showing 30 changed files with 528 additions and 473 deletions.
Binary file added docs/assets/images/img-list-english-colors.png
Binary file modified docs/assets/images/img-list-english-example.png
Binary file added docs/assets/images/img-list-german-colors.png
Binary file modified docs/assets/images/img-list-german-example.png
Binary file removed docs/assets/templates/description_en.xlsx
Binary file not shown.
Binary file added docs/assets/templates/lists/de.xlsx
Binary file not shown.
Binary file added docs/assets/templates/lists/en.xlsx
Binary file not shown.
342 changes: 220 additions & 122 deletions docs/dsp-tools-create.md

Large diffs are not rendered by default.

170 changes: 98 additions & 72 deletions docs/dsp-tools-excel.md
@@ -9,11 +9,12 @@ create a list from an Excel file.



## Create the resources for a data model from an Excel file
## JSON project file: "resources" section from Excel file

With dsp-tools, the `resources` section used in a data model (JSON) can be created from an Excel file. The command for
this is documented [here](./dsp-tools-usage.md#create-resources-from-an-excel-file). Only `XLSX` files are allowed.
The `resources` section can be inserted into the ontology file and then be uploaded onto a DSP server.
this is documented [here](./dsp-tools-usage.md#create-the-resources-section-of-a-json-project-file-from-an-excel-file).
Only `XLSX` files are allowed. The `resources` section can be inserted into the ontology file and then be uploaded onto
a DSP server.

**An Excel file template can be found [here](assets/templates/resources_template.xlsx). It is recommended to work from
the template.**
@@ -50,12 +51,12 @@ For further information about resources, see [here](./dsp-tools-create-ontologie



## Create the properties for a data model from an Excel file
## JSON project file: "properties" section from Excel file

With dsp-tools, the `properties` section used in a data model (JSON) can be created from an Excel file. The command for
this is documented [here](./dsp-tools-usage.md#create-properties-from-an-excel-file). Only the first worksheet of the
Excel file is considered and only XLSX files are allowed. The `properties` section can be inserted into the ontology
file and then be uploaded onto a DSP server.
this is documented [here](./dsp-tools-usage.md#create-the-properties-section-of-a-json-project-file-from-an-excel-file).
Only the first worksheet of the Excel file is considered and only XLSX files are allowed. The `properties` section can
be inserted into the ontology file and then be uploaded onto a DSP server.

**An Excel file template can be found [here](assets/templates/properties_template.xlsx). It is recommended to work
from the template.**
@@ -83,91 +84,116 @@ For further information about properties, see [here](./dsp-tools-create-ontologi



## Create a list from one or several Excel files
## JSON project file: "lists" section from Excel file(s)

With dsp-tools, a JSON list can be created from one or several Excel files. The command for this is documented
[here](./dsp-tools-usage.md#create-a-json-list-file-from-one-or-several-excel-files). The list can then be inserted
into a JSON ontology and uploaded to a DSP server. It is possible to create multilingual lists. In this case, a separate
Excel file has to be created for each language. The data must be in the first worksheet of each Excel file.
It is important that all the Excel lists have the same structure. So, the translation of a label in one Excel
sheet has to be in the exact same cell than the original was in the other Excel sheet (i.e. same cell index).
With dsp-tools, the "lists" section of a JSON project file can be created from one or several Excel files. The lists can
then be inserted into a JSON project file and uploaded to a DSP server. The command for this is documented
[here](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files).

**It is recommended to work from the following templates:
[description_en.xlsx](assets/templates/description_en.xlsx): The English list "description"
[Beschreibung_de.xlsx](assets/templates/Beschreibung_de.xlsx): Its German counterpart "Beschreibung"**
The following example shows how to create the "lists" section from the two Excel files `de.xlsx` and `en.xlsx` which are located
in a directory called `listfolder`:

```bash
dsp-tools excel2lists listfolder lists.json
```

The Excel sheets must have the following structure:
![img-list-english-example.png](assets/images/img-list-english-example.png)
![img-list-german-example.png](assets/images/img-list-german-example.png)

Only Excel files with file extension `.xlsx` are considered. All Excel files have to be located in the same directory.
When calling the `excel` command, this folder is provided as an argument to the call. The language of the labels has
to be provided in the Excel file's file name after an underline and before the file extension, e.g.
`Beschreibung_de.xlsx` would be considered a list with German (`de`) labels, `description_en.xlsx` a list with
English (`en`) labels. The language has to be one of {de, en, fr, it, rm}.

The following example shows how to create a JSON list from two Excel files which are in a directory called `listfolder`.
The output is written to the file `list.json`.

```bash
dsp-tools excel listfolder list.json
```
Some notes:

The two Excel files `Beschreibung_de.xlsx` and `description_en.xlsx` are located in a folder called `listfolder`.
- The data must be in the first worksheet of each Excel file.
- It is important that all Excel files have the same structure. So, the translation of a label in the second Excel
file has to be in the exact same cell as the one in the first Excel file.
- Only Excel files with file extension `.xlsx` are considered.
- The file name must consist of the language label, e.g. `de.xlsx` / `en.xlsx`.
- The language has to be one of {de, en, fr, it, rm}.
- As node name, a simplified version of the English label is taken. If English is not available, one of the other
languages is taken.
- If there are two nodes with the same name, an incrementing number is appended to the name.
- After the creation of the list, a validation against the JSON schema for lists is performed. An error message is
printed out if the list is not valid.
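
The naming rules in the notes above can be illustrated with a minimal Python sketch. The helper names `simplify_label` and `unique_node_names` are hypothetical, and both the exact simplification rule and the format of the appended number (`-2`, `-3`, ...) are assumptions for illustration, not necessarily what dsp-tools does internally:

```python
import re


def simplify_label(label: str) -> str:
    """Lowercase the label and join its ASCII alphanumeric runs with hyphens.

    Assumed rule, chosen so that "Faculty of Science" becomes "faculty-of-science".
    """
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def unique_node_names(labels: list[str]) -> list[str]:
    """Derive one node name per label; repeated names get an incrementing suffix."""
    seen: dict[str, int] = {}
    names = []
    for label in labels:
        base = simplify_label(label)
        count = seen.get(base, 0) + 1
        seen[base] = count
        # The first occurrence keeps the plain name; later ones get "-2", "-3", ...
        # (the suffix format is an assumption made for this sketch).
        names.append(base if count == 1 else f"{base}-{count}")
    return names


print(unique_node_names(["Faculty of Science", "red", "Red"]))
# prints ['faculty-of-science', 'red', 'red-2']
```

The sketch only handles ASCII letters and digits, which is sufficient for English labels; labels in other languages would need a transliteration step first.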

```
listfolder
|__ Beschreibung_de.xlsx
|__ description_en.xlsx
```
**It is recommended to work from the following templates:
[en.xlsx](assets/templates/lists/en.xlsx): File with the English labels
[de.xlsx](assets/templates/lists/de.xlsx): File with the German labels**

For each list node, the labels are read from the Excel files. The language code, provided in the file name, is then
used for the labels. As node name, a simplified version of the English label is taken if English is one of the
available languages. If English is not available, one of the other languages is chosen (which one depends on the
representation of the file order). If there are two node names with the same name, an incrementing number is appended to
the `name`.
The output of the above command, with the template files, is:

```JSON
{
"name": "description",
"labels": {
"de": "Beschreibung",
"en": "description"
},
"nodes": [
{
"name": "first-sublist",
"labels": {
"de": "erste Unterliste",
"en": "first sublist"
},
"nodes": [
"lists": [
{
"name": "colors",
"labels": {
"de": "Farben",
"en": "colors"
},
"comments": {
"de": "Farben",
"en": "colors"
},
"nodes": [
{
"name": "red",
"labels": {
"de": "rot",
"en": "red"
}
},
...
]
},
{
"name": "first-subnode",
"labels": {
"de": "erster Listenknoten",
"en": "first subnode"
},
"nodes": [
{
...
}
]
"name": "category",
"labels": {
"de": "Kategorie",
"en": "category"
},
"comments": {
"de": "Kategorie",
"en": "category"
},
"nodes": [
{
"name": "artwork",
"labels": {
"de": "Kunstwerk",
"en": "artwork"
}
},
...
]
},
...
]
}
]
{
"name": "faculties-of-the-university-of-basel",
"labels": {
"de": "Fakultäten der Universität Basel",
"en": "Faculties of the University of Basel"
},
"comments": {
"de": "Fakultäten der Universität Basel",
"en": "Faculties of the University of Basel"
},
"nodes": [
{
"name": "faculty-of-science",
"labels": {
"de": "Philosophisch-Naturwissenschaftliche Fakultät",
"en": "Faculty of Science"
}
},
...
]
}
]
}
```

After the creation of the list, a validation against the JSON schema for lists is performed. An error message is
printed out if the list is not valid. Furthermore, it is checked that no two nodes are the same.
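
As a rough illustration of the duplicate check mentioned above, the following Python sketch walks a "lists" section recursively and reports node names that occur more than once. The function names are hypothetical, and treating names as unique across the whole section (rather than per list) is an assumption:

```python
from collections import Counter


def collect_node_names(node: dict) -> list[str]:
    """Return the name of this node and of all nodes nested below it."""
    names = [node["name"]]
    for child in node.get("nodes", []):
        names.extend(collect_node_names(child))
    return names


def find_duplicate_names(lists_section: list[dict]) -> list[str]:
    """Return the sorted node names that occur more than once in the section."""
    counts: Counter = Counter()
    for single_list in lists_section:
        counts.update(collect_node_names(single_list))
    return sorted(name for name, n in counts.items() if n > 1)


# Hand-written example data in the shape of the "lists" section.
example = [
    {
        "name": "colors",
        "labels": {"de": "Farben", "en": "colors"},
        "nodes": [
            {"name": "red", "labels": {"de": "rot", "en": "red"}},
            {"name": "red", "labels": {"de": "rot", "en": "red"}},
        ],
    },
    {"name": "category", "labels": {"de": "Kategorie", "en": "category"}},
]

print(find_duplicate_names(example))
# prints ['red']
```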




## Create a DSP-conform XML file from an Excel/CSV file
## XML data file from Excel/CSV file

There are two use cases for a transformation from Excel/CSV to XML:

30 changes: 8 additions & 22 deletions docs/dsp-tools-usage.md
@@ -117,35 +117,21 @@ to use this file to replace internal IDs in an existing XML file to reference ex



## Create a JSON list file from one or several Excel files
## Create the "lists" section of a JSON project file from Excel files

```bash
dsp-tools excel [option] folder_with_excel_files output_file.json
dsp-tools excel2lists folder output.json
```

The following option is available:

- `-l` | `--listname` _listname_: name to be used for the list (filename before last occurrence of `_` is used if
omitted)

The command is used to create a JSON list file from one or several Excel files. It is possible to create multilingual
lists. Therefore, an Excel file for each language has to be provided. The data has to be in the first worksheet of the
Excel file and all Excel files have to be in the same directory. When calling the `excel` command, this directory has to
be provided as an argument to the call.

The following example shows how to create a JSON list from Excel files in a directory called `lists`.

```bash
dsp-tools excel lists list.json
```

The expected Excel format is [documented here](./dsp-tools-create.md#lists-from-excel). More information about the usage
of this command can be found [here](./dsp-tools-excel.md#create-a-list-from-one-or-several-excel-files).
Arguments:
- `folder` (optional, default: "lists"): folder with the Excel file(s)
- `output.json` (optional, default: "lists.json"): Output file
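
With Python's `argparse`, a positional argument is only optional if it is declared with `nargs="?"` (or `"*"`); without it, the argument is required and its `default` is never used. The following sketch shows one way the documented defaults could be wired up. The function `build_parser` is hypothetical, and the use of `nargs="?"` is an assumption for illustration, not taken from this commit:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an excel2lists subcommand whose positionals are optional."""
    parser = argparse.ArgumentParser(prog="dsp-tools")
    subparsers = parser.add_subparsers(dest="action")
    excel2lists = subparsers.add_parser("excel2lists")
    # nargs="?" makes the positional optional, so that its default can take effect.
    excel2lists.add_argument("excelfolder", nargs="?", default="lists")
    excel2lists.add_argument("outfile", nargs="?", default="lists.json")
    return parser


# No arguments given: both defaults apply.
args = build_parser().parse_args(["excel2lists"])
print(args.excelfolder, args.outfile)
# prints: lists lists.json

# Both arguments given: the defaults are overridden.
args = build_parser().parse_args(["excel2lists", "listfolder", "out.json"])
print(args.excelfolder, args.outfile)
# prints: listfolder out.json
```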

The expected Excel format is [documented here](./dsp-tools-excel.md#create-the-lists-section-of-a-json-project-file-from-excel-files).



## Create resources from an Excel file
## Create the "resources" section of a JSON project file from an Excel file

```bash
dsp-tools excel2resources excel_file.xlsx output_file.json
@@ -167,7 +153,7 @@ found [here](./dsp-tools-excel.md#create-the-resources-for-a-data-model-from-an-



## Create properties from an Excel file
## Create the "properties" section of a JSON project file from an Excel file

```bash
dsp-tools excel2properties excel_file.xlsx output_file.json
20 changes: 10 additions & 10 deletions docs/index.md
@@ -18,17 +18,17 @@ dsp-tools helps you with the following tasks:
on a DSP server from a JSON file.
- [`dsp-tools get`](./dsp-tools-usage.md#get-a-project-from-a-dsp-server) reads a project with its data model(s) from
a DSP server and writes it into a JSON file.
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from a provided XML file (bulk
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from an XML file (bulk
data import) and writes the mapping from internal IDs to IRIs into a local file.
- [`dsp-tools excel`](./dsp-tools-usage.md#create-a-json-list-file-from-one-or-several-excel-files)
creates a JSON or XML file from one or several Excel files. The created data can either be integrated into an ontology
or be uploaded directly to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2resources`](./dsp-tools-usage.md#create-resources-from-an-excel-file)
creates the ontology's resource section from an Excel file. The resulting section can be integrated into an ontology
and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2properties`](./dsp-tools-usage.md#create-properties-from-an-excel-file)
creates the ontology's properties section from an Excel file. The resulting section can be integrated into an ontology
and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2lists`](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files)
creates the "lists" section of a JSON project file from one or several Excel files. The resulting section can be
integrated into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2resources`](./dsp-tools-usage.md#create-the-resources-section-of-a-json-project-file-from-an-excel-file)
creates the "resources" section of a JSON project file from an Excel file. The resulting section can be integrated
into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2properties`](./dsp-tools-usage.md#create-the-properties-section-of-a-json-project-file-from-an-excel-file)
creates the "properties" section of a JSON project file from an Excel file. The resulting section can be integrated
into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools id2iri`](./dsp-tools-usage.md#replace-internal-ids-with-iris-in-xml-file)
takes an XML file for bulk data import and replaces referenced internal IDs with IRIs. The mapping has to be provided
with a JSON file.
29 changes: 12 additions & 17 deletions knora/dsp_tools.py
@@ -6,7 +6,7 @@
import sys
from importlib.metadata import version

from knora.dsplib.utils.excel_to_json_lists import list_excel2json, validate_list_with_schema
from knora.dsplib.utils.excel_to_json_lists import list_excel2json, validate_lists_section_with_schema
from knora.dsplib.utils.excel_to_json_properties import properties_excel2json
from knora.dsplib.utils.excel_to_json_resources import resources_excel2json
from knora.dsplib.utils.id_to_iri import id_to_iri
@@ -89,19 +89,15 @@ def program(user_args: list[str]) -> None:

# excel
parser_excel_lists = subparsers.add_parser(
'excel',
help='Create a JSON list from one or multiple Excel files. The JSON list can be integrated into a JSON '
'ontology. If the list should contain multiple languages, an Excel file has to be used for each language. '
'The filenames should contain the language as label, p.ex. liste_de.xlsx, list_en.xlsx. The language is '
'then taken from the filename. Only files with extension .xlsx are considered.'
'excel2lists',
help='Create the "lists" section of a JSON project file from one or multiple Excel files. If the list should '
'contain multiple languages, a separate file has to be used for each language. The file names must '
'consist of the language label, e.g. "de.xlsx", "en.xlsx". Only files with extension .xlsx are considered.'
)
parser_excel_lists.set_defaults(action='excel')
parser_excel_lists.add_argument('-l', '--listname', type=str,
help='Name of the list to be created (filename is taken if omitted)', default=None)
parser_excel_lists.add_argument('excelfolder', help='Path to the folder containing the Excel file(s)',
default='lists')
parser_excel_lists.add_argument('outfile', help='Path to the output JSON file containing the list data',
default='list.json')
parser_excel_lists.set_defaults(action='excel2lists')
parser_excel_lists.add_argument('excelfolder', help='Path to the folder containing the Excel file(s)')
parser_excel_lists.add_argument('outfile', help='Path to the output JSON file containing the "lists" section',
default='lists.json')

# excel2resources
parser_excel_resources = subparsers.add_parser('excel2resources', help='Create a JSON file from an Excel file '
@@ -151,7 +147,7 @@ def program(user_args: list[str]) -> None:
if args.action == 'create':
if args.lists_only:
if args.validate_only:
validate_list_with_schema(args.datamodelfile)
validate_lists_section_with_schema(path_to_json_project_file=args.datamodelfile)
else:
create_lists(input_file=args.datamodelfile,
server=args.server,
@@ -189,9 +185,8 @@ def program(user_args: list[str]) -> None:
sipi=args.sipi,
verbose=args.verbose,
incremental=args.incremental)
elif args.action == 'excel':
list_excel2json(listname=args.listname,
excelfolder=args.excelfolder,
elif args.action == 'excel2lists':
list_excel2json(excelfolder=args.excelfolder,
outfile=args.outfile)
elif args.action == 'excel2resources':
resources_excel2json(excelfile=args.excelfile,
