fix(excel2resources, excel2properties): cover all cases (DEV-1040) #201

Merged
merged 28 commits into from Jun 23, 2022
Changes from 7 commits
Commits
78f2a3f
fix bugs
jnussbaum Jun 15, 2022
fee1074
PR-title: allow commas and numbers in scope
jnussbaum Jun 15, 2022
ae8cd88
fix DeprecationWarning: The metaschema specified by $schema was not f…
jnussbaum Jun 16, 2022
f9eac36
allow multiple superclasses for properties
jnussbaum Jun 16, 2022
397cdd7
remove old entry from MANIFEST.in (forgotten in an older PR)
jnussbaum Jun 16, 2022
93eb58f
- cover more cases in the test data
jnussbaum Jun 16, 2022
5d8b6da
replace the too generic "Exception" by "ValueError"
jnussbaum Jun 16, 2022
a1fe43a
improve checks if row is non-empty
jnussbaum Jun 20, 2022
e0249db
unittests: check contents of outfile, not only if it exists
jnussbaum Jun 20, 2022
606a921
add Romansh to docs, testdata, and templates
jnussbaum Jun 20, 2022
44b951a
move pandas from dev-packages to packages
jnussbaum Jun 20, 2022
6bfeb1b
include äöü
jnussbaum Jun 20, 2022
66fb7fc
Merge branch 'main' into wip/dev-1040-improve-excel2resources-excel2p…
jnussbaum Jun 20, 2022
12985de
improve docs
jnussbaum Jun 21, 2022
6bb6917
refactor tests
jnussbaum Jun 21, 2022
a2553cb
remove superfluous e2e tests
jnussbaum Jun 21, 2022
2579d1c
improve test data
jnussbaum Jun 21, 2022
82c1708
use pandas instead of openpyxl
jnussbaum Jun 21, 2022
b8a71c7
move pandas from dev-packages to packages (also in requirements.txt a…
jnussbaum Jun 21, 2022
4fc05ab
reduce number of code smells
jnussbaum Jun 21, 2022
9364036
apply reviewer's feedback
jnussbaum Jun 21, 2022
2512729
harmonize single quotes/double quotes
jnussbaum Jun 21, 2022
998909f
break too long lines
jnussbaum Jun 21, 2022
5443d61
prevent empty labels or comments
jnussbaum Jun 22, 2022
8e4ca89
shorten testdata files
jnussbaum Jun 22, 2022
cb3933c
add a unit test
jnussbaum Jun 23, 2022
aa25c88
reduce number of code smells
jnussbaum Jun 23, 2022
8b77cce
- restore test_excel_to_json_resources() and test_excel_to_json_prop…
jnussbaum Jun 23, 2022
@@ -12,7 +12,7 @@ jobs:
# check PR title
- uses: deepakputhraya/action-pr-title@master
with:
regex: '([a-z])+(\(([a-z\-_ ])+\))?!?: [a-z]([a-zA-Z-\.\d \(\)\[\]#_,])+$' # Regex the title should match.
regex: '([a-z])+(\(([0-9a-z\-_, ])+\))?!?: [a-z]([a-zA-Z-\.\d \(\)\[\]#_,])+$' # Regex the title should match.
allowed_prefixes: "fix,refactor,feat,docs,chore,style,test" # title should start with the given prefix
disallowed_prefixes: "feature,hotfix" # title should not start with the given prefix
prefix_case_sensitive: true # title prefix are case insensitive
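The effect of the regex change above can be checked against this PR's own title. The following is a minimal sketch, assuming an unanchored search similar to what the title-check action performs; the helper name `title_matches` is hypothetical and not part of the workflow.

```python
import re

# Both patterns are copied from the workflow diff above.
OLD_PATTERN = r'([a-z])+(\(([a-z\-_ ])+\))?!?: [a-z]([a-zA-Z-\.\d \(\)\[\]#_,])+$'
NEW_PATTERN = r'([a-z])+(\(([0-9a-z\-_, ])+\))?!?: [a-z]([a-zA-Z-\.\d \(\)\[\]#_,])+$'


def title_matches(pattern: str, title: str) -> bool:
    """Hypothetical helper: True if the pattern is found in the title."""
    return re.search(pattern, title) is not None


title = "fix(excel2resources, excel2properties): cover all cases (DEV-1040)"

# The old scope class rejects the digit "2" and the comma inside the parentheses.
print(title_matches(OLD_PATTERN, title))  # False
# The new scope class accepts digits and commas.
print(title_matches(NEW_PATTERN, title))  # True
```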
1 change: 0 additions & 1 deletion MANIFEST.in
@@ -4,4 +4,3 @@ include knora/dsplib/schemas/lists-only.json
include knora/dsplib/schemas/resources-only.json
include knora/dsplib/schemas/properties-only.json
include knora/dsplib/schemas/data.xsd
include knora/dsplib/utils/language-codes-3b2_csv.csv
Binary file modified docs/assets/images/img-properties-example.png
Binary file modified docs/assets/templates/properties_template.xlsx
Binary file not shown.
23 changes: 12 additions & 11 deletions docs/dsp-tools-excel.md
@@ -27,18 +27,18 @@ The worksheet called `classes` has the following structure:

The expected columns are:

- `name` : The name of the resource
- `en`, `de`, `fr`, `it` : The labels of the resource in different languages, at least one language has to be provided
- `name`: The name of the resource
- `en`, `de`, `fr`, `it`: The labels of the resource in different languages, at least one language has to be provided
- `comment_en`, `comment_de`, `comment_fr`, `comment_it`: optional comments in the respective language
- `super` : The base class of the resource
- `super`: The base class of the resource

All other worksheets, one for each resource class, have the following structure:
![img-resources-example-2.png](assets/images/img-resources-example-2.png){ width=50% }

The expected columns are:

- `Property` : The name of the property
- `Cardinality` : The cardinality, one of: `1`, `0-1`, `1-n`, `0-n`
- `Property`: The name of the property
- `Cardinality`: The cardinality, one of: `1`, `0-1`, `1-n`, `0-n`

The GUI order is given by the order in which the properties are listed in the Excel sheet.

@@ -58,15 +58,16 @@ The Excel sheet must have the following structure:

The expected columns are:

- `name` : The name of the property
- `super` : The base property of the property
- `object` : If the property is derived from `hasValue`, the type of the property must be further specified by the
- `name`: The name of the property
- `super`: The base property/ies of the property, separated by commas
- `object`: If the property is derived from `hasValue`, the type of the property must be further specified by the
object it takes, e.g. `TextValue`, `ListValue`, or `IntValue`. If the property is derived from `hasLinkTo`,
the `object` specifies the resource class that this property refers to.
- `en`, `de`, `fr`, `it` : The labels of the property in different languages, at least one language has to be provided
- `en`, `de`, `fr`, `it`: The labels of the property in different languages, at least one language has to be provided
- `comment_en`, `comment_de`, `comment_fr`, `comment_it`: optional comments in the respective language
- `gui_element` : The GUI element for the property
- `hlist` : In case of list values: the name of the list
- `gui_element`: The GUI element for the property
- `gui_attributes` (optional): The gui_attributes in the form "attr: value, attr: value". In case of ListValues, the
name of the list can be given as "hlist: listname" (according to the pattern), or simply as "listname".

For further information about properties, see [here](./dsp-tools-create-ontologies.md#properties).
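The column rules above (at least one label language, comma-separated `super`) can be sketched as follows. This is an illustration only: `LANGUAGES`, `collect_labels`, and `split_super` are hypothetical names, and treating whitespace-only cells as empty is an assumption that goes beyond the commits shown in this diff.

```python
from typing import Optional

LANGUAGES = ("en", "de", "fr", "it")


def collect_labels(name: str, cells: dict[str, Optional[str]]) -> dict[str, str]:
    """Keep the non-empty label cells and require at least one language."""
    labels = {}
    for lang in LANGUAGES:
        # Assumption: a cell holding only whitespace counts as empty.
        text = (cells.get(lang) or "").strip()
        if text:
            labels[lang] = text
    if not labels:
        raise ValueError(f"No label given in any of the four languages: {name}")
    return labels


def split_super(cell: str) -> list[str]:
    """Split the comma-separated `super` cell into a list of base properties."""
    return [elem.strip() for elem in cell.split(",")]


print(collect_labels("hasAnthroponym", {"en": "anthroponym", "fr": " Anthroponyme "}))
print(split_super(" hasValue, dcterms:creator "))
```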

2 changes: 1 addition & 1 deletion knora/dsplib/schemas/lists-only.json
@@ -1,5 +1,5 @@
{
"$schema": "https://json-schema.org/draft-07/schema",
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://dasch.swiss/schema/lists.json",
"title": "JSON schema for DSP lists",
"description": "JSON schema for the lists section used in DSP ontologies",
2 changes: 1 addition & 1 deletion knora/dsplib/schemas/ontology.json
@@ -1,5 +1,5 @@
{
"$schema": "https://json-schema.org/draft-07/schema",
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://dasch.swiss/schema/ontology.json",
"title": "JSON schema for DSP ontologies",
"description": "JSON schema for DSP ontologies",
3 changes: 2 additions & 1 deletion knora/dsplib/schemas/properties-only.json
@@ -1,5 +1,5 @@
{
"$schema": "https://json-schema.org/draft-07/schema",
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://dasch.swiss/schema/properties-only.json",
"title": "JSON schema for properties used in DSP ontologies",
"description": "JSON schema for the properties section used in DSP ontologies",
@@ -82,6 +82,7 @@
"ListValue",
"Region",
"Resource",
"Representation",
"Annotation"
]
},
2 changes: 1 addition & 1 deletion knora/dsplib/schemas/resources-only.json
@@ -1,5 +1,5 @@
{
"$schema": "https://json-schema.org/draft-07/schema",
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://dasch.swiss/schema/resources-only.json",
"title": "JSON schema for resources used in DSP ontologies",
"description": "JSON schema for the resources section used in DSP ontologies",
52 changes: 35 additions & 17 deletions knora/dsplib/utils/excel_to_json_properties.py
@@ -1,5 +1,6 @@
import json
import os
import re
from typing import Any

import jsonschema
@@ -44,7 +45,8 @@ def properties_excel2json(excelfile: str, outfile: str) -> list[dict[str, Any]]:
# load file
wb = load_workbook(filename=excelfile, read_only=True)
sheet = wb.worksheets[0]
props = [row_to_prop(row) for row in sheet.iter_rows(min_row=2, values_only=True, max_col=13)]
props = [row_to_prop(row) for row in sheet.iter_rows(min_row=2, values_only=True, max_col=13)
if any(row) and any([re.search(r'[A-Za-z]+', elem) for elem in row if isinstance(elem, str)])]

prefix = '"properties":'

@@ -70,35 +72,51 @@ def row_to_prop(row: tuple[str, str, str, str, str, str, str, str, str, str, str
Returns:
prop (JSON): the property in JSON format
"""
name, super_, object_, en, de, fr, it, comment_en, comment_de, comment_fr, comment_it, gui_element, hlist = row
name, super_, object_, en, de, fr, it, comment_en, comment_de, comment_fr, comment_it, gui_element, gui_attributes = row
labels = {}
if en:
labels['en'] = en
labels['en'] = en.strip()
if de:
labels['de'] = de
labels['de'] = de.strip()
if fr:
labels['fr'] = fr
labels['fr'] = fr.strip()
if it:
labels['it'] = it
labels['it'] = it.strip()
if not labels:
raise Exception(f"No label given in any of the four languages: {name}")
raise ValueError(f"No label given in any of the four languages: {name}")
comments = {}
if comment_en:
comments['en'] = comment_en
comments['en'] = comment_en.strip()
if comment_de:
comments['de'] = comment_de
comments['de'] = comment_de.strip()
if comment_fr:
comments['fr'] = comment_fr
comments['fr'] = comment_fr.strip()
if comment_it:
comments['it'] = comment_it
comments['it'] = comment_it.strip()
prop = {
'name': name,
'super': [super_],
'object': object_,
'name': name.strip(),
'super': [elem.strip() for elem in super_.split(',')],
'object': object_.strip(),
'labels': labels,
'comments': comments,
'gui_element': gui_element
'gui_element': gui_element.strip()
}
if hlist:
prop['gui_attributes'] = {'hlist': hlist}
if gui_attributes:
attr_list = [x.strip() for x in gui_attributes.split(',')]
attr_dict = dict()
for elem in attr_list:
if ':' in elem:
attr, val = [x.strip() for x in elem.split(':', maxsplit=1)]
if re.search(r'\d+\.\d+', val):
val = float(val)
elif re.search(r'\d+', val):
val = int(val)
attr_dict.update({attr: val})
elif object_.strip() == 'ListValue':
attr_dict.update({'hlist': elem})
else:
raise ValueError(f'gui_attribute must be of the form "attr: value", except for ListValues, where the '
f'simple name of the list is allowed. But the property "{name}", which is not a list, '
f'has a gui_attribute that does not contain a colon.')
prop['gui_attributes'] = attr_dict
return prop
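The `gui_attributes` handling in the diff above can be sketched as a standalone function. Note one deliberate deviation, which is an assumption of this sketch and not the PR's code: it uses `re.fullmatch` instead of `re.search` for the numeric checks, so that a value such as `list2` stays a string rather than being passed to `int()`. The function name `parse_gui_attributes` and the attribute names in the usage lines are hypothetical.

```python
import re
from typing import Union


def parse_gui_attributes(cell: str, object_: str) -> dict[str, Union[str, int, float]]:
    """Parse a cell of the form "attr: value, attr: value" into a dict.

    For ListValue properties, a bare list name is accepted as shorthand
    for "hlist: listname".
    """
    attributes: dict[str, Union[str, int, float]] = {}
    for elem in (x.strip() for x in cell.split(",")):
        if ":" in elem:
            attr, val = (x.strip() for x in elem.split(":", maxsplit=1))
            # fullmatch (an assumption of this sketch) converts only values
            # that are numeric as a whole, e.g. "32" or "0.5", not "list2".
            if re.fullmatch(r"\d+\.\d+", val):
                attributes[attr] = float(val)
            elif re.fullmatch(r"\d+", val):
                attributes[attr] = int(val)
            else:
                attributes[attr] = val
        elif object_.strip() == "ListValue":
            attributes["hlist"] = elem
        else:
            raise ValueError(
                f'gui_attribute "{elem}" must be of the form "attr: value" '
                f"unless the property is a ListValue"
            )
    return attributes


print(parse_gui_attributes("size: 32, maxlength: 128", "TextValue"))
print(parse_gui_attributes("mylist", "ListValue"))
```

With this variant, a ListValue cell such as `hlist: list2` yields `{'hlist': 'list2'}`, while a non-list property with a cell lacking a colon still raises `ValueError`.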
32 changes: 17 additions & 15 deletions knora/dsplib/utils/excel_to_json_resources.py
@@ -1,5 +1,6 @@
import json
import os
import re
from typing import Any

import jsonschema
@@ -16,7 +17,6 @@ def validate_resources_with_schema(json_file: str) -> bool:

Returns:
True if the data passed validation, False otherwise

"""
current_dir = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(current_dir, '../schemas/resources-only.json')) as schema:
@@ -48,7 +48,8 @@ def resources_excel2json(excelfile: str, outfile: str) -> None:

# get overview
sheet = wb['classes']
resource_list = [c for c in sheet.iter_rows(min_row=2, values_only=True)]
resource_list = [c for c in sheet.iter_rows(min_row=2, values_only=True)
if any(c) and any([re.search(r'[A-Za-z]+', elem) for elem in c if isinstance(elem, str)])]

prefix = '"resources":'
resources = [_extract_row(res, wb) for res in resource_list]
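The row filter that this PR adds to both converters can be sketched as a small predicate. This is a simplified equivalent under the assumption that rows arrive as tuples of cell values; the name `is_nonempty_row` is hypothetical. Since a string containing a letter is always truthy, the separate `any(row)` check from the diff is implied by the regex check.

```python
import re
from typing import Any, Sequence


def is_nonempty_row(row: Sequence[Any]) -> bool:
    """True if at least one cell is a string containing an ASCII letter."""
    return any(isinstance(elem, str) and re.search(r"[A-Za-z]", elem) for elem in row)


rows = [
    ("hasName", "hasValue", None),  # regular row
    (None, None, None),             # empty row
    ("", "   ", None),              # only empty or whitespace strings
    (1, 2.5, None),                 # only numbers
]
print([is_nonempty_row(row) for row in rows])  # [True, False, False, False]
```

One consequence of the `[A-Za-z]` class is that a row whose only text consists of non-ASCII letters (for example `äöü`) is treated as empty.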
@@ -66,42 +67,43 @@ def resources_excel2json(excelfile: str, outfile: str) -> None:
def _extract_row(row: tuple[str, str, str, str, str, str, str, str, str, str], wb: Workbook) -> dict[str, Any]:
"""build a property dict from a row of the excel file"""
# get name
name = row[0]
name = row[0].strip()
# get labels
labels = {}
if row[1]:
labels['en'] = row[1]
labels['en'] = row[1].strip()
if row[2]:
labels['de'] = row[2]
labels['de'] = row[2].strip()
if row[3]:
labels['fr'] = row[3]
labels['fr'] = row[3].strip()
if row[4]:
labels['it'] = row[4]
labels['it'] = row[4].strip()
# get comments
comments = {}
if row[5]:
comments['en'] = row[5]
comments['en'] = row[5].strip()
if row[6]:
comments['de'] = row[6]
comments['de'] = row[6].strip()
if row[7]:
comments['fr'] = row[7]
comments['fr'] = row[7].strip()
if row[8]:
comments['it'] = row[8]
comments['it'] = row[8].strip()
# get super
sup = row[9]
sup = row[9].strip()

# load details for this resource
sh = wb[name]
property_list = [c for c in sh.iter_rows(min_row=2, values_only=True)]
property_list = [c for c in sh.iter_rows(min_row=2, values_only=True)
if any(c) and any([re.search(r'[A-Za-z]+', elem) for elem in c if isinstance(elem, str)])]

cards = []
# for each of the detail sheets
for i, prop in enumerate(property_list):
# get name and cardinality.
# GUI-order is equal to order in the sheet.
property_ = {
"propname": ":" + prop[0],
"cardinality": str(prop[1]),
"propname": ":" + prop[0].strip(),
"cardinality": str(prop[1]).lower().strip(),
"gui_order": i + 1
}
cards.append(property_)
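The cardinality normalisation above (`str(...).lower().strip()`) can be illustrated in isolation. In this sketch, the check against the four documented cardinalities is an added assumption and not part of the diff; `ALLOWED_CARDINALITIES` and `normalize_cardinality` are hypothetical names.

```python
from typing import Any

# The four cardinalities listed in docs/dsp-tools-excel.md.
ALLOWED_CARDINALITIES = {"1", "0-1", "1-n", "0-n"}


def normalize_cardinality(cell: Any) -> str:
    """Convert a cell value to a lower-case, stripped cardinality string."""
    value = str(cell).lower().strip()
    # Assumption of this sketch: reject anything outside the documented set.
    if value not in ALLOWED_CARDINALITIES:
        raise ValueError(f"Invalid cardinality: {cell!r}")
    return value


print(normalize_cardinality(1))        # 1
print(normalize_cardinality(" 0-N "))  # 0-n
```

Converting with `str()` first matters because a cell containing just `1` may be read as a number rather than as text.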
63 changes: 63 additions & 0 deletions test/unittests/test_excel_to_properties.py
@@ -0,0 +1,63 @@
"""unit tests for excel to properties"""
import os
import unittest

from openpyxl import Workbook

from knora.dsplib.utils import excel_to_json_properties as e2j


class TestExcelToProperties(unittest.TestCase):

def setUp(self) -> None:
"""Is executed before each test"""
os.makedirs('testdata/tmp', exist_ok=True)

def test_excel2json(self) -> None:
excelfile = "testdata/Properties.xlsx"
outfile = "testdata/tmp/_out_properties.json"
e2j.properties_excel2json(excelfile, outfile)
self.assertTrue(os.path.exists(outfile))

def test_row_to_prop(self) -> None:
wb = Workbook()
ws = wb.create_sheet("Tabelle1")
row = (
"hasAnthroponym ",
" hasValue, dcterms:creator ",
" TextValue ",
"anthroponym",
"",
"Anthroponyme",
"",
" A strange chance put me in possession of this journal. ",
"",
"",
"",
" Richtext ",
""
)
for i, c in enumerate(row):
ws.cell(row=2, column=i+1, value=c)
properties_dict = e2j.row_to_prop(row)
expected_dict = {
"name": "hasAnthroponym",
"super": [
"hasValue",
"dcterms:creator"
],
"object": "TextValue",
"labels": {
"en": "anthroponym",
"fr": "Anthroponyme",
},
"comments": {
"en": "A strange chance put me in possession of this journal."
},
"gui_element": "Richtext"
}
self.assertDictEqual(properties_dict, expected_dict)


if __name__ == '__main__':
unittest.main()
Binary file modified testdata/Properties.xlsx
Binary file not shown.
Binary file modified testdata/Resources.xlsx
Binary file not shown.