feat(excel-to-properties): create properties from Excel (DSP-1577) (#89)
* Squashed commit of the following:

commit e24a895
Author: irinaschubert <irina.schubert@dasch.swiss>
Date:   Thu Sep 9 15:49:19 2021 +0200

    add resources module to bazel

commit 0aefee9
Author: irinaschubert <irina.schubert@dasch.swiss>
Date:   Thu Sep 9 15:44:29 2021 +0200

    Update dsp-tools-usage.md

commit 3ddecdc
Author: irinaschubert <irina.schubert@dasch.swiss>
Date:   Thu Sep 9 15:41:26 2021 +0200

    add documentation

commit 914d3a6
Author: irinaschubert <irina.schubert@dasch.swiss>
Date:   Thu Sep 9 15:31:36 2021 +0200

    update test

commit 75a04ae
Author: irinaschubert <irina.schubert@dasch.swiss>
Date:   Thu Sep 9 15:11:07 2021 +0200

    add test

commit 1fb39b9
Author: irinaschubert <irina.schubert@dasch.swiss>
Date:   Thu Sep 9 14:40:17 2021 +0200

    add validation and schema

commit 6275c13
Author: irinaschubert <irina.schubert@dasch.swiss>
Date:   Thu Sep 9 12:51:20 2021 +0200

    integrate script into dsp-tools

* integrate standalone script from dsp-tools-prep into dsp-tools

* add documentation

* update documentation
irinaschubert committed Sep 16, 2021
1 parent 7b0302f commit 9f48e9a
Showing 13 changed files with 348 additions and 5 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -3,5 +3,6 @@ include knora/dsplib/utils/knora-schema.json
include knora/dsplib/utils/knora-schema-lists.json
include knora/dsplib/utils/knora-schema-lists-only.json
include knora/dsplib/utils/knora-schema-resources-only.json
include knora/dsplib/utils/knora-schema-properties-only.json
include knora/dsplib/utils/knora-data-schema.xsd
include knora/dsplib/utils/language-codes-3b2_csv.csv
Binary file added docs/assets/images/img-properties-example.png
18 changes: 17 additions & 1 deletion docs/dsp-tools-excel.md
@@ -25,7 +25,23 @@ For further information about resources, see [here](./dsp-tools-create.md#resour

## Create the properties for a data model from an Excel file

[not yet implemented]
With dsp-tools, the `properties` section of a data model (JSON) can be created from an Excel file. Only the first worksheet of
the Excel file is read, and only XLSX files are supported. The resulting `properties` section can be inserted into the ontology
file and then uploaded to a DSP server.

The Excel sheet must have the following format:
![img-properties-example.png](assets/images/img-properties-example.png)

The expected columns are:

- `name` : The name of the property
- `super` : The base property of the property
- `object` : The resource the property refers to if it is a link property (property derived from `hasLinkTo`)
- `en`, `de`, `fr`, `it` : The labels of the property in different languages; at least one label must be provided
- `gui_element` : The GUI element for the property
- `hlist` : For list values, the corresponding list

For further information about properties, see [here](./dsp-tools-create.md#properties).
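`properties_excel2json` reads rows from row 2 onward (`min_row=2`) and maps cells purely by position, so the header text in row 1 is never checked and a misplaced column silently ends up in the wrong field. A minimal sketch of a header check, assuming the header cells spell the column names exactly as listed above and in that order; `check_header` and `EXPECTED_COLUMNS` are hypothetical names, not part of dsp-tools.

```python
# Hypothetical helper, not part of dsp-tools. The column order is taken from
# the list above and from the unpacking in row_to_prop().
EXPECTED_COLUMNS = ("name", "super", "object", "en", "de", "fr", "it",
                    "gui_element", "hlist")


def check_header(header_row):
    """Return a list of mismatches between header_row and EXPECTED_COLUMNS."""
    # Normalise cells: treat empty cells (None) as "" and ignore surrounding spaces
    cells = ["" if cell is None else str(cell).strip() for cell in header_row]
    problems = []
    for index, expected in enumerate(EXPECTED_COLUMNS):
        actual = cells[index] if index < len(cells) else ""
        if actual != expected:
            problems.append(f"column {index + 1}: expected '{expected}', found '{actual}'")
    return problems
```

With openpyxl, the header tuple of the first worksheet can be obtained with `next(sheet.iter_rows(max_row=1, values_only=True))` and passed to `check_header`; an empty result means the columns line up with what the converter expects.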

## Create a DSP-conform XML file from an Excel file

25 changes: 23 additions & 2 deletions docs/dsp-tools-usage.md
@@ -125,15 +125,36 @@ the usage of this command can be found [here](./dsp-tools-excel.md#create-a-list
dsp-tools excel2resources excel_file.xlsx output_file.json
```

The command is used to create the resource section of an ontology from an Excel file. Therefore, an Excel file has to be provided
The command is used to create the resources section of an ontology from an Excel file. Therefore, an Excel file has to be provided
with the data in the first worksheet of the Excel file.

The following example shows how to create the resources section from an Excel file called `Resources.xlsx`.
The following example shows how to create the resources section from an Excel file called `Resources.xlsx`. The output is written
to a file called `resources.json`.

```bash
dsp-tools excel2resources Resources.xlsx resources.json
```

More information about the usage of this command can be
found [here](./dsp-tools-excel.md#create-the-resources-for-a-data-model-from-an-excel-file).

## Create properties from an Excel file

```bash
dsp-tools excel2properties excel_file.xlsx output_file.json
```

The command is used to create the properties section of an ontology from an Excel file. The data must be in the first
worksheet of the Excel file.

The following example shows how to create the properties section from an Excel file called `Properties.xlsx`. The output is
written to a file called `properties.json`.

```bash
dsp-tools excel2properties Properties.xlsx properties.json
```

More information about the usage of this command can be found
[here](./dsp-tools-excel.md#create-the-properties-for-a-data-model-from-an-excel-file).
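`properties_excel2json` writes the prefix `"properties":` followed by the JSON list, so `properties.json` is a fragment meant to be pasted into an ontology rather than a standalone JSON document, and `json.load` on it fails. A minimal sketch for loading it programmatically, assuming the file contains exactly what that function writes. `load_properties_fragment` and `insert_properties` are hypothetical helpers, and the ontology layout assumed in `insert_properties` (`project` → `ontologies` → `properties`) is an assumption about the dsp-tools project file, not something this commit defines.

```python
import json


def load_properties_fragment(text):
    """Parse the '"properties": [...]' fragment written by excel2properties."""
    # Wrapping the fragment in braces turns it into a one-key JSON object
    return json.loads("{" + text + "}")["properties"]


def insert_properties(ontology_doc, properties, index=0):
    """Hypothetical helper: set the properties of one ontology in a project file.

    Assumes the layout {"project": {"ontologies": [{..., "properties": [...]}]}}.
    """
    ontology_doc["project"]["ontologies"][index]["properties"] = properties
    return ontology_doc
```

In practice the fragment would be read with `pathlib.Path("properties.json").read_text(encoding="utf-8")` and passed to `load_properties_fragment`. Wrapping in braces keeps the file format unchanged; the alternative of writing a full JSON object would make the output directly loadable but no longer pasteable as-is.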
6 changes: 5 additions & 1 deletion docs/index.md
@@ -24,5 +24,9 @@ dsp-tools helps you with the following tasks:
creates a JSON or XML file from one or several Excel files. The created data can either be integrated into an ontology or be
uploaded directly to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2resources`](./dsp-tools-usage.md#create-resources-from-an-excel-file)
creates the ontology's resource section from an Excel file. The resources can be integrated into an ontology and then be
creates the ontology's resource section from an Excel file. The resulting section can be integrated into an ontology and then be
uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2properties`](./dsp-tools-usage.md#create-properties-from-an-excel-file)
creates the ontology's properties section from an Excel file. The resulting section can be integrated into an ontology and then
be uploaded to a DSP server with `dsp-tools create`.

12 changes: 12 additions & 0 deletions knora/dsp_tools.py
@@ -15,6 +15,7 @@
from dsplib.utils.onto_get import get_ontology
from dsplib.utils.excel_to_json_lists import list_excel2json, validate_list_with_schema
from dsplib.utils.excel_to_json_resources import resources_excel2json
from dsplib.utils.excel_to_json_properties import properties_excel2json
from dsplib.utils.onto_validate import validate_ontology
from dsplib.utils.xml_upload import xml_upload

@@ -93,6 +94,14 @@ def program(args: list) -> None:
    parser_excel_resources.add_argument('outfile', help='Path to the output JSON file containing the resource data',
                                        default='resources.json')

    parser_excel_properties = subparsers.add_parser('excel2properties', help='Create a JSON file from an Excel file containing '
                                                                             'properties for a DSP ontology. ')
    parser_excel_properties.set_defaults(action='excel2properties')
    parser_excel_properties.add_argument('excelfile', help='Path to the Excel file containing the properties',
                                         default='properties.xlsx')
    parser_excel_properties.add_argument('outfile', help='Path to the output JSON file containing the properties data',
                                         default='properties.json')

args = parser.parse_args(args)

if not hasattr(args, 'action'):
@@ -145,6 +154,9 @@ def program(args: list) -> None:
    elif args.action == 'excel2resources':
        resources_excel2json(excelfile=args.excelfile,
                             outfile=args.outfile)
    elif args.action == 'excel2properties':
        properties_excel2json(excelfile=args.excelfile,
                              outfile=args.outfile)


def main():
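The CLI wiring in `dsp_tools.py` registers `excel2properties` as an argparse subcommand, tags it with `set_defaults(action=...)`, and later dispatches on `args.action`. The sketch below reproduces that pattern in isolation, with a handler table in place of the `if/elif` chain; `build_parser`, `dispatch` and the `handlers` mapping are hypothetical names, not dsp-tools API.

```python
import argparse


def build_parser():
    """Minimal sketch of a subcommand parser in the style of dsp_tools.program()."""
    parser = argparse.ArgumentParser(prog="dsp-tools")
    subparsers = parser.add_subparsers(title="Subcommands")
    sub = subparsers.add_parser("excel2properties",
                                help="Create a JSON file from an Excel file containing properties")
    # set_defaults() tags the namespace so the caller can dispatch on args.action
    sub.set_defaults(action="excel2properties")
    sub.add_argument("excelfile", help="Path to the Excel file containing the properties")
    sub.add_argument("outfile", help="Path to the output JSON file")
    return parser


def dispatch(argv, handlers):
    """Parse argv and call the handler registered for args.action."""
    args = build_parser().parse_args(argv)
    if not hasattr(args, "action"):
        # No subcommand given: nothing to dispatch
        return None
    return handlers[args.action](excelfile=args.excelfile, outfile=args.outfile)
```

argparse treats positional arguments as required, so a `default=` on a positional (as in `add_argument('excelfile', ..., default='properties.xlsx')`) only takes effect when `nargs='?'` or `nargs='*'` is also given; the sketch therefore omits defaults.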
10 changes: 10 additions & 0 deletions knora/dsplib/utils/BUILD.bazel
@@ -24,6 +24,16 @@ py_library(
    ]
)

py_library(
    name = "excel_to_json_properties",
    visibility = ["//visibility:public"],
    srcs = ["excel_to_json_properties.py"],
    deps = [
        requirement("jsonschema"),
        requirement("openpyxl")
    ]
)

py_library(
    name = "expand_all_lists",
    visibility = ["//visibility:public"],
93 changes: 93 additions & 0 deletions knora/dsplib/utils/excel_to_json_properties.py
@@ -0,0 +1,93 @@
import json
import os

import jsonschema
from openpyxl import load_workbook


def validate_properties_with_schema(properties_list: list) -> bool:
    """
    This function checks if the properties are valid according to the schema.
    Args:
        properties_list: the properties (a list of dicts) to be validated
    Returns:
        True if the data passed validation, False otherwise
    """
    current_dir = os.path.dirname(os.path.realpath(__file__))
    with open(os.path.join(current_dir, 'knora-schema-properties-only.json'), encoding='utf-8') as schema:
        properties_schema = json.load(schema)

    try:
        jsonschema.validate(instance=properties_list, schema=properties_schema)
    except jsonschema.exceptions.ValidationError as err:
        print(err)
        return False
    print('Properties data passed schema validation.')
    return True


def properties_excel2json(excelfile: str, outfile: str) -> list:
    """
    Converts properties described in an Excel file into a properties section which can be integrated into a DSP ontology
    Args:
        excelfile: path to the Excel file containing the properties
        outfile: path to the output JSON file containing the properties section for the ontology
    Returns:
        the list of properties (returned even if validation fails, in which case no file is written)
    """
    # load file: only the first worksheet is read, row 1 is treated as the header
    wb = load_workbook(filename=excelfile, read_only=True)
    sheet = wb.worksheets[0]
    props = [row_to_prop(row) for row in sheet.iter_rows(min_row=2, values_only=True, max_col=9)]
    # read-only workbooks keep the file open until they are closed explicitly
    wb.close()

    prefix = '"properties":'

    if validate_properties_with_schema(props):
        # write the properties to the JSON file only if they passed validation
        with open(file=outfile, mode='w+', encoding='utf-8') as file:
            file.write(prefix)
            json.dump(props, file, indent=4)
        print('Properties file was created successfully and written to file:', outfile)
    else:
        print('Properties data is not valid according to schema.')

    return props


def row_to_prop(row):
    """
    Parses the row of an Excel sheet and makes a property from it
    Args:
        row: the row of an Excel sheet
    Returns:
        prop (JSON): the property in JSON format
    """
    name, super_, object_, en, de, fr, it, gui_element, hlist = row
    labels = {}
    if en:
        labels['en'] = en
    if de:
        labels['de'] = de
    if fr:
        labels['fr'] = fr
    if it:
        labels['it'] = it
    if not labels:
        raise Exception(f"No label given in any of the four languages: {name}")
    prop = {
        'name': name,
        'super': [super_],
        'object': object_,
        'labels': labels,
        'gui_element': gui_element
    }
    if hlist:
        prop['gui_attributes'] = {'hlist': hlist}
    return prop
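`row_to_prop` unpacks exactly nine cells and raises as soon as a row has no label, so a blank row that openpyxl may still yield at the end of the sheet (for example, one that only carries formatting) would stop the whole conversion. The sketch below is a hypothetical variant, `rows_to_props`, not part of this commit: it pads short rows, skips rows whose `name` cell is empty, and raises `ValueError` instead of a bare `Exception`. For complete, non-blank rows it builds the same dictionaries as `row_to_prop`.

```python
LANGUAGES = ("en", "de", "fr", "it")


def rows_to_props(rows):
    """Hypothetical variant of row_to_prop that tolerates blank and short rows."""
    props = []
    for row in rows:
        # Pad short rows with None so the 9-way unpacking below never fails
        row = tuple(row) + (None,) * (9 - len(row))
        name, super_, object_, en, de, fr, it, gui_element, hlist = row[:9]
        if name is None:
            continue  # blank row: no property name, nothing to convert
        # Keep only the languages whose cell is non-empty, as row_to_prop does
        labels = {lang: text for lang, text in zip(LANGUAGES, (en, de, fr, it)) if text}
        if not labels:
            raise ValueError(f"No label given in any of the four languages: {name}")
        prop = {"name": name, "super": [super_], "object": object_,
                "labels": labels, "gui_element": gui_element}
        if hlist:
            prop["gui_attributes"] = {"hlist": hlist}
        props.append(prop)
    return props
```

It can be fed the same iterator as in `properties_excel2json`, i.e. `sheet.iter_rows(min_row=2, values_only=True, max_col=9)`.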
