feat(excel-lists): create multilanguage json lists from excel files (DSP-1580) #75

Merged
merged 16 commits into from Aug 10, 2021
10 changes: 5 additions & 5 deletions docs/dsp-tools-create.md
@@ -275,7 +275,7 @@ Here is an example on how to build a taxonomic structure in JSON:
{
"name": "my_list",
"labels": {"en": "Disciplines of the Humanities"},
"comments": {"en": "This ist is just a silly example", "fr": "un example un peu fou"},
"comments": {"en": "This is just an example.", "fr": "C'est un exemple."},
"nodes": [
{
"name": "node_1_1",
@@ -340,17 +340,17 @@ Here is an example on how to build a taxonomic structure in JSON:
```
#### Lists from Excel

A list can also be imported from an Excel sheet. The Excel sheet must have the following format (currently only a single
language is supported):
A list can be directly imported from an Excel sheet. The Excel sheet must have the following format:

![img-list-example.png](assets/images/img-list-example.png)

In such a case, the Excel file can directly be referenced in the list definition by defining a special list node:
```json
{
"name": "fromexcel",
"name": "List-from-excel",
"labels": {
"en": "Fromexcel"
"en": "List from an Excel file",
"de": "Liste von einer Excel-Datei"
},
"nodes": {
"file": "excel-list.xlsx",
74 changes: 70 additions & 4 deletions docs/dsp-tools-excel.md
@@ -11,7 +11,73 @@ create a list from an Excel file.
## Create a DSP-conform XML file from an Excel file
[not yet implemented]

## Create flat or hierarchical lists from an Excel file
Lists or controlled vocabularies are sets of fixed terms that are used to characterize objects. Hierarchical lists
correspond to classifications or taxonomies. With dsp-tools a list can be created from an Excel file. The expected
format of the Excel file is described [here](./dsp-tools-create.md#lists-from-excel).
## Create a list from one or several Excel files
With dsp-tools a JSON list can be created from one or several Excel files. The list can then be inserted into a JSON ontology
and uploaded to a DSP server. The expected format of the Excel files is described [here](./dsp-tools-create.md#lists-from-excel).
It is possible to create multilingual lists. In this case, a separate Excel file has to be created for each language. The data
has to be in the first worksheet of each Excel file. It is important that all Excel files have the same structure: each
translation of a label has to be in exactly the same cell (i.e. have the same cell index) in its own Excel file as the
label has in the other files.
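
The "same cell index" requirement can be illustrated with a minimal sketch. It assumes each worksheet has already been read into a dictionary that maps a `(row, column)` position to a label; `merge_labels` and the `Cell` alias are hypothetical names used only for this illustration and are not part of dsp-tools.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (row, column) position of a label in the worksheet


def merge_labels(sheets: Dict[str, Dict[Cell, str]]) -> Dict[Cell, Dict[str, str]]:
    """Combine per-language sheets into one mapping: cell -> {language: label}."""
    positions = [set(cells) for cells in sheets.values()]
    # All languages must fill exactly the same cells, otherwise labels cannot be matched.
    if any(p != positions[0] for p in positions):
        raise ValueError("Excel lists do not have the same structure")
    merged: Dict[Cell, Dict[str, str]] = {}
    for lang, cells in sheets.items():
        for pos, label in cells.items():
            merged.setdefault(pos, {})[lang] = label
    return merged


english = {(1, 1): "sand", (2, 2): "fine sand"}
german = {(1, 1): "Sand", (2, 2): "Feinsand"}
print(merge_labels({"en": english, "de": german}))
```

If a translation sits in a different cell than its counterpart, the position sets differ and the sketch raises an error instead of pairing the wrong labels.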

Only Excel files with file extension `.xlsx` are considered. All Excel files have to be located in the same directory. When
calling the `excel` command, this directory is provided as an argument to the call. The language of the labels has to be provided in
the Excel file's name after an underscore and before the file extension, e.g. `liste_de.xlsx` would be considered a list with
German (`de`) labels, `list_en.xlsx` a list with English (`en`) labels. The language has to be a valid ISO 639-1 or ISO
639-2 language code.
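
The file name convention can be sketched as follows. This is only an illustration under stated assumptions: `split_list_filename` is a hypothetical helper, the extension check is case-sensitive, and the language code is not checked against ISO 639.

```python
from pathlib import Path
from typing import Optional, Tuple


def split_list_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Return (list name, language code) for names like 'liste_de.xlsx', else None."""
    path = Path(filename)
    if path.suffix != ".xlsx":
        return None  # only .xlsx files are considered
    # Split at the last underscore: the part after it is the language code.
    name, sep, lang = path.stem.rpartition("_")
    if not sep or not name or not lang:
        return None  # no underscore-separated language suffix
    return name, lang


print(split_list_filename("liste_de.xlsx"))
print(split_list_filename("list_en.xlsx"))
print(split_list_filename("notes.txt"))
```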

The following example shows how to create a JSON list from two Excel files which are in a directory called `lists`. The output is
written to the file `list.json`.

```bash
dsp-tools excel lists list.json
```

The two Excel files `liste_de.xlsx` and `list_en.xlsx` are located in a folder called `lists`. `liste_de.xlsx` contains German
labels for the list, `list_en.xlsx` contains the English labels.

```
lists
|__ liste_de.xlsx
|__ list_en.xlsx
```

For each list node, the `label`s are read from the Excel files. The language code, provided in the file name, is then used for
the labels. As node `name`, a simplified version of the English label is taken if English is one of the available languages. If
English is not available, one of the other languages is chosen (which one depends on the order in which the files are read). If
two nodes would get the same `name`, an incrementing number is appended to the `name`.

```JSON
{
"name": "sand",
"labels": {
"de": "Sand",
"en": "sand"
},
"nodes": [
{
"name": "fine-sand",
"labels": {
"de": "Feinsand",
"en": "fine sand"
}
},
{
"name": "medium-sand",
"labels": {
"de": "Mittelsand",
"en": "medium sand"
}
},
{
"name": "coarse-sand",
"labels": {
"de": "Grobsand",
"en": "coarse sand"
}
}
]
}, ...
```
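
The naming rule described above can be sketched roughly as follows. The exact simplification and numbering used by dsp-tools are not spelled out here, so both are assumptions: `simplify` lowercases the label and joins its ASCII alphanumeric runs with hyphens, and duplicates get `-1`, `-2`, ... appended. `simplify` and `node_names` are hypothetical names.

```python
import re
from typing import Dict, List


def simplify(label: str) -> str:
    """Lowercase the label and join its ASCII alphanumeric runs with hyphens (assumed rule)."""
    return "-".join(re.findall(r"[a-z0-9]+", label.lower()))


def node_names(labels_per_node: List[Dict[str, str]]) -> List[str]:
    """Derive one node name per label dictionary, making repeated names unique."""
    seen: Dict[str, int] = {}
    names = []
    for labels in labels_per_node:
        # Prefer English; otherwise fall back to the first language present.
        label = labels.get("en") or next(iter(labels.values()))
        base = simplify(label)
        count = seen.get(base, 0) + 1
        seen[base] = count
        # The first occurrence keeps the plain name, later ones get a number (assumed scheme).
        names.append(base if count == 1 else f"{base}-{count - 1}")
    return names


print(node_names([
    {"de": "Feinsand", "en": "fine sand"},
    {"de": "Feinsand", "en": "fine sand"},
    {"de": "Grobsand"},
]))
```

Note that this simplification drops non-ASCII letters, so a real implementation would likely need a more careful transliteration step.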

After the creation of the list, a validation against the XSD schema for lists is performed. An error message is printed out if
the list is not valid. Furthermore, it is checked that no two nodes are the same.
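
One way to read the uniqueness check is that no two nodes in the list may share a `name`. The following sketch walks a nested list structure like the one shown above and reports repeated names. It is an assumption about what "the same" means, not the dsp-tools implementation, and `iter_names` and `duplicate_names` are hypothetical helpers.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List


def iter_names(node: Dict[str, Any]) -> Iterator[str]:
    """Yield the name of a node and of all its descendants."""
    yield node["name"]
    children = node.get("nodes")
    # "nodes" may also be a file reference (a dict); only real child lists are walked.
    if isinstance(children, list):
        for child in children:
            yield from iter_names(child)


def duplicate_names(list_root: Dict[str, Any]) -> List[str]:
    """Return all node names that occur more than once, sorted alphabetically."""
    counts = Counter(iter_names(list_root))
    return sorted(name for name, n in counts.items() if n > 1)


sand = {
    "name": "sand",
    "nodes": [
        {"name": "fine-sand"},
        {"name": "medium-sand"},
        {"name": "fine-sand"},
    ],
}
print(duplicate_names(sand))
```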
27 changes: 17 additions & 10 deletions docs/dsp-tools-usage.md
@@ -94,19 +94,26 @@ dsp-tools xmlupload -s https://api.dsl.server.org -u root@example.com -p test -S

The description of the expected XML format can be found [here](./dsp-tools-xmlupload.md).

## Convert an Excel file into a JSON file that is compatible with dsp-tools
## Create a JSON list file from one or several Excel files

```bash
dsp-tools excel [options] excel_file.xlsx output_file.json
dsp-tools excel [option] folder_with_excel_files output_file.json
```

The following options are available:
The following option is available:

- `-S` | `--sheet` _sheetname_: name of the Excel worksheet to use (default: Tabelle1)
- `-s` | `--shortcode` _shortcode_: shortcode of the project (required)
- `-l` | `--listname` _listname_: name to be used for the list and the list definition file (required)
- `-L` | `--label` _label_: label to be used for the list (required)
- `-x` | `--lang` _lang_: language used for the list labels and commentaries (default: en)
- `-v` | `--verbose`: If set, some information about the progress is printed to the console.
- `-l` | `--listname` _listname_: name to be used for the list (filename before last occurrence of `_` is used if omitted)

The command is used to create a JSON list file from one or several Excel files. It is possible to create multilingual lists.
For this, an Excel file has to be provided for each language. The data has to be in the first worksheet of each Excel
file, and all Excel files have to be in the same directory. When calling the `excel` command, this directory has to be provided
as an argument to the call.

The following example shows how to create a JSON list from Excel files in a directory called `lists`.

```bash
dsp-tools excel lists list.json
```

The description of the expected Excel format can be found [here](./dsp-tools-create.md#lists-from-excel).
The description of the expected Excel format can be found [here](./dsp-tools-create.md#lists-from-excel). More information
about the usage of this command can be found [here](./dsp-tools-excel.md#create-a-list-from-one-or-several-excel-files).
6 changes: 3 additions & 3 deletions docs/index.md
@@ -20,6 +20,6 @@ dsp-tools helps you with the following tasks:
writes it into a JSON file.
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from a provided XML file (bulk
data import).
- [`dsp-tools excel`](./dsp-tools-usage.md#convert-an-excel-file-into-a-json-file-that-is-compatible-with-dsp-tools)
converts an Excel file into a JSON and/or XML file in order to use it with `dsp-tools create` or `dsp-tools xmlupload`
(not yet implemented) or converts a list from an Excel file into a JSON file which than can be used in an ontology.
- [`dsp-tools excel`](./dsp-tools-usage.md#create-a-json-list-file-from-one-or-several-excel-files)
creates a JSON or XML file from one or several Excel files. The created data can then be uploaded to a DSP server with
`dsp-tools create`.
119 changes: 62 additions & 57 deletions knora/dsp_tools.py
@@ -2,6 +2,7 @@
The code in this file handles the arguments passed by the user from the command line and calls the requested actions.
"""
import argparse
import datetime
import os
import sys

@@ -12,8 +13,8 @@
from dsplib.utils.onto_create_lists import create_lists
from dsplib.utils.onto_create_ontology import create_ontology
from dsplib.utils.onto_get import get_ontology
from dsplib.utils.onto_process_excel import list_excel2json
from dsplib.utils.onto_validate import validate_list, validate_ontology
from dsplib.utils.excel_to_json_lists import list_excel2json, validate_list_with_schema
from dsplib.utils.onto_validate import validate_ontology
from dsplib.utils.xml_upload import xml_upload


@@ -27,68 +28,72 @@ def program(args: list) -> None:
Returns:
None
"""
version = pkg_resources.require("dsp-tools")[0].version
version = pkg_resources.require('dsp-tools')[0].version
now = datetime.datetime.now()

# parse the arguments of the command line
parser = argparse.ArgumentParser(
description=f"dsp-tools (Version {version}) DaSCH Service Platform data modelling tools (© 2021 by DaSCH).")

subparsers = parser.add_subparsers(title="Subcommands", description='Valid subcommands are', help='sub-command help')

parser_create = subparsers.add_parser('create', help='Create ontologies, lists etc.')
parser_create.set_defaults(action="create")
parser_create.add_argument("-s", "--server", type=str, default="http://0.0.0.0:3333", help="URL of the DSP server")
parser_create.add_argument("-u", "--user", default="root@example.com", help="Username for DSP server")
parser_create.add_argument("-p", "--password", default="test", help="The password for login")
parser_create.add_argument("-V", "--validate", action='store_true',
help="Do only validation of JSON, no upload of the ontology")
parser_create.add_argument("-L", "--listfile", type=str, default="lists.json", help="Name of list node informationfile")
parser_create.add_argument("-l", "--lists", action='store_true', help="Only create the lists")
parser_create.add_argument("-v", "--verbose", action="store_true", help="Verbose feedback")
parser_create.add_argument("-d", "--dump", action="store_true", help="dump test files for DSP-API requests")
parser_create.add_argument("datamodelfile", help="path to data model file")

parser_get = subparsers.add_parser('get', help='Get project/ontology information from server')
parser_get.set_defaults(action="get")
parser_get.add_argument("-u", "--user", default="root@example.com", help="Username for DSP server")
parser_get.add_argument("-p", "--password", default="test", help="The password for login")
parser_get.add_argument("-s", "--server", type=str, default="http://0.0.0.0:3333", help="URL of the DSP server")
parser_get.add_argument("-P", "--project", type=str, help="Shortcode, shortname or iri of project", required=True)
parser_get.add_argument("-v", "--verbose", action="store_true", help="Verbose feedback")
parser_get.add_argument("datamodelfile", help="path to data model file", default="onto.json")

parser_upload = subparsers.add_parser('xmlupload', help='Upload data from XML file to server')
parser_upload.set_defaults(action="xmlupload")
parser_upload.add_argument("-s", "--server", type=str, default="http://0.0.0.0:3333", help="URL of the DSP server")
parser_upload.add_argument("-u", "--user", type=str, default="root@example.com", help="Username for DSP server")
parser_upload.add_argument("-p", "--password", type=str, default="test", help="The password for login")
parser_upload.add_argument("-V", "--validate", action='store_true', help="Do only validation of XML, no upload of the data")
parser_upload.add_argument("-i", "--imgdir", type=str, default=".", help="Path to folder containing the images")
parser_upload.add_argument("-S", "--sipi", type=str, default="http://0.0.0.0:1024", help="URL of SIPI server")
parser_upload.add_argument("-v", "--verbose", action="store_true", help="Verbose feedback")
parser_upload.add_argument("xmlfile", help="path to xml file containing the data", default="data.xml")

parser_excel_lists = subparsers.add_parser('excel', help='Create lists JSON from excel files')
parser_excel_lists.set_defaults(action="excel")
parser_excel_lists.add_argument("-S", "--sheet", type=str, help="Name of excel sheet to be used", default="Tabelle1")
parser_excel_lists.add_argument("-s", "--shortcode", type=str, help="Shortcode of project", default="4123")
parser_excel_lists.add_argument("-l", "--listname", type=str, help="Name of list to be created", default="my_list")
parser_excel_lists.add_argument("-L", "--label", type=str, help="Label of list to be created", default="MyList")
parser_excel_lists.add_argument("-x", "--lang", type=str, help="Language for label", default="en")
parser_excel_lists.add_argument("-v", "--verbose", action="store_true", help="Verbose feedback")
parser_excel_lists.add_argument("excelfile", help="Path to the excel file containing the list data", default="lists.xlsx")
parser_excel_lists.add_argument("outfile", help="Path to the output JSON file containing the list data", default="list.json")
description=f'dsp-tools (Version {version}) DaSCH Service Platform data modelling tools (© {now.year} by DaSCH).')

subparsers = parser.add_subparsers(title='Subcommands', description='Valid subcommands are', help='sub-command help')

parser_create = subparsers.add_parser('create', help='Upload an ontology and/or list(s) from a JSON file to the DaSCH '
'Service Platform')
parser_create.set_defaults(action='create')
parser_create.add_argument('-s', '--server', type=str, default='http://0.0.0.0:3333', help='URL of the DSP server')
parser_create.add_argument('-u', '--user', default='root@example.com', help='Username for DSP server')
parser_create.add_argument('-p', '--password', default='test', help='The password for login')
parser_create.add_argument('-V', '--validate', action='store_true',
help='Do only validation of JSON, no upload of the ontology')
parser_create.add_argument('-L', '--listfile', type=str, default='lists.json', help='Name of list node informationfile')
parser_create.add_argument('-l', '--lists', action='store_true', help='Upload only the list(s)')
parser_create.add_argument('-v', '--verbose', action='store_true', help='Verbose feedback')
parser_create.add_argument('-d', '--dump', action='store_true', help='dump test files for DSP-API requests')
parser_create.add_argument('datamodelfile', help='path to data model file')

parser_get = subparsers.add_parser('get', help='Get the ontology (data model) of a project from the DaSCH Service Platform.')
parser_get.set_defaults(action='get')
parser_get.add_argument('-u', '--user', default='root@example.com', help='Username for DSP server')
parser_get.add_argument('-p', '--password', default='test', help='The password for login')
parser_get.add_argument('-s', '--server', type=str, default='http://0.0.0.0:3333', help='URL of the DSP server')
parser_get.add_argument('-P', '--project', type=str, help='Shortcode, shortname or iri of project', required=True)
parser_get.add_argument('-v', '--verbose', action='store_true', help='Verbose feedback')
parser_get.add_argument('datamodelfile', help='Path to the file the ontology should be written to', default='onto.json')

parser_upload = subparsers.add_parser('xmlupload', help='Upload data from an XML file to the DaSCH Service Platform.')
parser_upload.set_defaults(action='xmlupload')
parser_upload.add_argument('-s', '--server', type=str, default='http://0.0.0.0:3333', help='URL of the DSP server')
parser_upload.add_argument('-u', '--user', type=str, default='root@example.com', help='Username for DSP server')
parser_upload.add_argument('-p', '--password', type=str, default='test', help='The password for login')
parser_upload.add_argument('-V', '--validate', action='store_true', help='Do only validation of XML, no upload of the data')
parser_upload.add_argument('-i', '--imgdir', type=str, default='.', help='Path to folder containing the images')
parser_upload.add_argument('-S', '--sipi', type=str, default='http://0.0.0.0:1024', help='URL of SIPI server')
parser_upload.add_argument('-v', '--verbose', action='store_true', help='Verbose feedback')
parser_upload.add_argument('xmlfile', help='path to xml file containing the data', default='data.xml')

parser_excel_lists = subparsers.add_parser('excel', help='Create a JSON list from one or multiple Excel files. The JSON '
'list can be integrated into a JSON ontology. If the list should '
'contain multiple languages, an Excel file has to be used for '
'each language. The filenames should contain the language as '
'label, p.ex. liste_de.xlsx, list_en.xlsx. The language is then '
'taken from the filename. Only files with extension .xlsx are '
'considered.')
parser_excel_lists.set_defaults(action='excel')
parser_excel_lists.add_argument('-l', '--listname', type=str,
help='Name of the list to be created (filename is taken if omitted)', default=None)
parser_excel_lists.add_argument('excelfolder', help='Path to the folder containing the Excel file(s)', default='lists')
parser_excel_lists.add_argument('outfile', help='Path to the output JSON file containing the list data', default='list.json')

args = parser.parse_args(args)

if not hasattr(args, 'action'):
parser.print_help(sys.stderr)
exit(0)

if args.action == "create":
if args.action == 'create':
if args.lists:
if args.validate:
validate_list(args.datamodelfile)
validate_list_with_schema(args.datamodelfile)
else:
create_lists(input_file=args.datamodelfile, lists_file=args.listfile, server=args.server, user=args.user,
password=args.password, verbose=args.verbose, dump=args.dump)
@@ -98,18 +103,18 @@ def program(args: list) -> None:
else:
create_ontology(input_file=args.datamodelfile, lists_file=args.listfile, server=args.server, user=args.user,
password=args.password, verbose=args.verbose, dump=args.dump if args.dump else False)
elif args.action == "get":
elif args.action == 'get':
get_ontology(projident=args.project, outfile=args.datamodelfile, server=args.server, user=args.user,
password=args.password, verbose=args.verbose)
elif args.action == "xmlupload":
elif args.action == 'xmlupload':
xml_upload(input_file=args.xmlfile, server=args.server, user=args.user, password=args.password, imgdir=args.imgdir,
sipi=args.sipi, verbose=args.verbose, validate_only=args.validate)
elif args.action == "excel":
list_excel2json(excelpath=args.excelfile, sheetname=args.sheet, shortcode=args.shortcode, listname=args.listname,
label=args.label, lang=args.lang, outfile=args.outfile, verbose=args.verbose)
elif args.action == 'excel':
list_excel2json(listname=args.listname, excelfolder=args.excelfolder, outfile=args.outfile)


def main():
"""Main entry point of the program as referenced in setup.py"""
program(sys.argv[1:])


8 changes: 3 additions & 5 deletions knora/dsplib/models/permission.py
@@ -1,10 +1,8 @@
from enum import Enum, unique
from typing import List, Set, Dict, Tuple, Optional, Any, Union, Type
from pystrict import strict
import re
from enum import Enum, unique
from typing import List, Dict, Optional, Union

from dsplib.models.group import Group
from dsplib.models.helpers import BaseError
from pystrict import strict


@unique