feat(xmlupload): enable migration of resource creation date (DEV-1402) #238

Merged
merged 24 commits into from Oct 18, 2022
Changes from 16 commits
6 changes: 3 additions & 3 deletions Makefile
@@ -51,7 +51,7 @@ install: ## install from source (runs setup.py)

.PHONY: test
test: dsp-stack ## run all tests
pytest test/
-pytest test/
$(MAKE) stack-down

.PHONY: test-no-stack
@@ -60,7 +60,7 @@ test-no-stack: ## run tests without starting the stack (if a dsp-stack is alread

.PHONY: test-end-to-end
test-end-to-end: dsp-stack ## run e2e tests
pytest test/e2e/
-pytest test/e2e/
$(MAKE) stack-down

.PHONY: test-end-to-end-ci
@@ -77,7 +77,7 @@ test-unittests: ## run unit tests

.PHONY: clean
clean: ## clean local project directories
@rm -rf dist/ build/ site/ dsp_tools.egg-info/
@rm -rf dist/ build/ site/ dsp_tools.egg-info/ id2iri_*_mapping_*.json stashed_*_properties_*.txt

.PHONY: help
help: ## show this help
Binary file modified docs/assets/images/img-excel2xml.png
5 changes: 4 additions & 1 deletion docs/dsp-tools-excel.md
@@ -220,4 +220,7 @@ Some notes:

- The special tags `<annotation>`, `<link>`, and `<region>` are represented as resources of restype `Annotation`,
`LinkObj`, and `Region`.
- The columns "ark" and "iri" are only used for DaSCH-internal data migration.
- The columns "ark", "iri", and "creation_date" are only used for DaSCH-internal data migration.
- If `file` is provided without `file permissions`, the file permissions are deduced from the resource permissions
(`res-default` --> `prop-default` and `res-restricted` --> `prop-restricted`). If they cannot be deduced, a
`BaseError` is raised.
20 changes: 12 additions & 8 deletions docs/dsp-tools-xmlupload.md
@@ -201,14 +201,18 @@ To take `KnownUser` as example:

A `<resource>` element contains all necessary information to create a resource. It has the following attributes:

- `label`: a human-readable, preferably meaningful short name of the resource (required)
- `restype`: the resource type as defined within the ontology (required)
- `id`: a unique, arbitrary string providing a unique ID to the resource in order to be referencable by other resources;
the ID is only used during the import process and later replaced by the IRI used internally by DSP (required)
- `permissions`: a reference to a permission set; the permissions will be applied to the created resource (optional)
- `iri`: a custom IRI used when migrating existing resources (optional)
- `ark`: a version 0 ARK used when migrating existing resources from salsah.org to DSP (optional), it is not possible to
use `iri` and `ark` in the same resource. When `ark` is used, it overrides `iri`.
- `label` (required): a human-readable, preferably meaningful short name of the resource
- `restype` (required): the resource type as defined within the ontology
- `id` (required): a unique, arbitrary string providing a unique ID to the resource in order to be referenceable by other
resources; the ID is only used during the import process and later replaced by the IRI used internally by DSP
- `permissions` (optional): a reference to a permission set; the permissions will be applied to the created resource.
If omitted, users with lower rights than a `ProjectAdmin` have no permissions at all, not even view rights.
- `iri` (optional): a custom IRI, used when migrating existing resources
- `ark` (optional): a version 0 ARK, used when migrating existing resources from salsah.org to DSP. It is not possible
to use `iri` and `ark` in the same resource. When `ark` is used, it overrides `iri`.
- `creation_date` (optional): the creation date of the resource, used when migrating existing resources from salsah.org
to DSP. It must be formatted according to the constraints of [xsd:dateTimeStamp](https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp),
which means that the timezone is required, e.g.: `2005-10-23T13:45:12.502951+02:00`
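
For illustration, a timestamp in this format can be produced with Python's standard library. This is only a sketch: the helper name `to_creation_date` is hypothetical and not part of dsp-tools, and it assumes the date is available as a `datetime` object.

```python
from datetime import datetime, timedelta, timezone


def to_creation_date(dt: datetime) -> str:
    """Format a timezone-aware datetime for the `creation_date` attribute (hypothetical helper)."""
    # A naive datetime has no UTC offset, so the resulting string would lack the required timezone.
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("creation_date requires a timezone")
    return dt.isoformat()


cest = timezone(timedelta(hours=2))
stamp = to_creation_date(datetime(2005, 10, 23, 13, 45, 12, 502951, tzinfo=cest))
print(stamp)  # 2005-10-23T13:45:12.502951+02:00
```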

A complete `<resource>` element may look as follows:

2 changes: 1 addition & 1 deletion docs/index.md
@@ -20,7 +20,7 @@ dsp-tools helps you with the following tasks:
a DSP server and writes it into a JSON file.
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from an XML file (bulk
data import) and writes the mapping from internal IDs to IRIs into a local file.
- [`dsp-tools excel`](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files)
- [`dsp-tools excel2lists`](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files)
creates the "lists" section of a JSON project file from one or several Excel files. The resulting section can be
integrated into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.
- [`dsp-tools excel2resources`](./dsp-tools-usage.md#create-the-resources-section-of-a-json-project-file-from-an-excel-file)
3 changes: 1 addition & 2 deletions knora/dsplib/models/helpers.py
@@ -2,7 +2,6 @@
import sys
from dataclasses import dataclass
from enum import Enum, unique
from traceback import format_exc
from typing import NewType, Optional, Any, Tuple, Union, Pattern

from pystrict import strict
@@ -63,7 +62,7 @@ def __str__(self) -> str:
Convert to string
:return: stringified error message
"""
return self._message + "\n\n" + format_exc()

I don't have a clue what format_exc() was good for in the past, but I know that it causes problems. It was the source of this strange behaviour that I already had in the past: when a BaseError occurs while testing, Python prints an infinitely long stacktrace full of riddles, and then crashes.

I found out that I can just remove format_exc()

return self._message

@property
def message(self) -> str:
12 changes: 12 additions & 0 deletions knora/dsplib/models/resource.py
@@ -71,6 +71,7 @@ class ResourceInstance(Model):
_iri: Optional[str]
_ark: Optional[str]
_version_ark: Optional[str]
_creation_date: Optional[str]
_label: Optional[str]
_permissions: Optional[Permissions]
_user_permission: Optional[PermissionValue]
@@ -82,6 +83,7 @@ def __init__(self,
iri: Optional[str] = None,
ark: Optional[str] = None,
version_ark: Optional[str] = None,
creation_date: Optional[str] = None,
label: Optional[str] = None,
permissions: Optional[Permissions] = None,
user_permission: Optional[PermissionValue] = None,
@@ -93,6 +95,7 @@ def __init__(self,
self._iri = iri
self._ark = ark
self._version_ark = version_ark
self._creation_date = creation_date
self._label = label
self._permissions = permissions
self._user_permission = user_permission
@@ -181,6 +184,10 @@ def iri(self) -> str:
def ark(self) -> str:
return self._ark

@property
def creation_date(self) -> str:
return self._creation_date

@property
def vark(self) -> str:
return self._version_ark
@@ -286,6 +293,11 @@ def toJsonLdObj(self, action: Actions) -> Any:
tmp[property_name] = value.toJsonLdObj(action)

tmp['@context'] = self.context
if self._creation_date:
tmp['knora-api:creationDate'] = {
'@type': 'xsd:dateTimeStamp',
'@value': self._creation_date
}
return tmp
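
The effect of the added lines can be sketched in isolation. The function name `with_creation_date` and the sample payload are hypothetical; only the key `knora-api:creationDate` and the `xsd:dateTimeStamp` type are taken from the diff above.

```python
import json
from typing import Any, Optional


def with_creation_date(payload: dict[str, Any], creation_date: Optional[str]) -> dict[str, Any]:
    """Return a copy of the JSON-LD payload, with the creation date attached if one is given."""
    result = dict(payload)
    if creation_date:
        # Typed literal, mirroring the structure built in toJsonLdObj()
        result["knora-api:creationDate"] = {
            "@type": "xsd:dateTimeStamp",
            "@value": creation_date,
        }
    return result


migrated = with_creation_date({"rdfs:label": "Example"}, "2005-10-23T13:45:12.502951+02:00")
fresh = with_creation_date({"rdfs:label": "Example"}, None)
print(json.dumps(migrated, indent=2))
```

When no creation date is given, the key is omitted entirely rather than sent as an empty value.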

def create(self) -> 'ResourceInstance':
9 changes: 8 additions & 1 deletion knora/dsplib/models/xmlresource.py
@@ -2,10 +2,10 @@

from lxml import etree

from knora.dsplib.models.xmlbitstream import XMLBitstream
from knora.dsplib.models.helpers import BaseError
from knora.dsplib.models.permission import Permissions
from knora.dsplib.models.value import KnoraStandoffXml
from knora.dsplib.models.xmlbitstream import XMLBitstream
from knora.dsplib.models.xmlproperty import XMLProperty


@@ -18,6 +18,7 @@ class XMLResource:
_label: str
_restype: str
_permissions: Optional[str]
_creation_date: Optional[str]
_bitstream: Optional[XMLBitstream]
_properties: list[XMLProperty]

@@ -35,6 +36,7 @@ def __init__(self, node: etree.Element, default_ontology: str) -> None:
self._id = node.attrib['id']
self._iri = node.attrib.get('iri')
self._ark = node.attrib.get('ark')
self._creation_date = node.attrib.get('creation_date')
self._label = node.attrib['label']
# get the resource type which is in format namespace:resourcetype, p.ex. rosetta:Image
tmp_res_type = node.attrib['restype'].split(':')
@@ -74,6 +76,11 @@ def ark(self) -> Optional[str]:
"""The custom ARK of the resource"""
return self._ark

@property
def creation_date(self) -> Optional[str]:
"""The creation date of the resource"""
return self._creation_date
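
A minimal sketch of how required and optional attributes are read from a `<resource>` node. It uses the standard library's `xml.etree.ElementTree` instead of `lxml` so that it runs without extra dependencies; the sample element is made up for illustration.

```python
import xml.etree.ElementTree as ET

node = ET.fromstring(
    '<resource label="Example" restype=":Thing" id="res_1" '
    'creation_date="2005-10-23T13:45:12.502951+02:00"/>'
)

# Required attributes: indexing raises a KeyError if the attribute is missing
label = node.attrib["label"]

# Optional attributes: .get() returns None if the attribute is missing
creation_date = node.attrib.get("creation_date")
ark = node.attrib.get("ark")

print(label, creation_date, ark)
```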

@property
def label(self) -> str:
"""The label of the resource"""
12 changes: 11 additions & 1 deletion knora/dsplib/schemas/data.xsd
@@ -410,9 +410,10 @@
<xs:attribute name="label" type="xs:string" use="required"/>
<xs:attribute name="restype" type="xs:string" use="required"/>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="iri" type="xs:string" use="optional"/>
<xs:attribute name="permissions" type="xs:NCName" use="optional"/>
<xs:attribute name="iri" type="xs:string" use="optional"/>
<xs:attribute name="ark" type="xs:string" use="optional"/>
<xs:attribute name="creation_date" type="xs:dateTime" use="optional"/>
</xs:complexType>

<!-- annotation tag -->
@@ -424,6 +425,9 @@
<xs:attribute name="label" type="xs:string" use="required"/>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="permissions" type="xs:NCName" use="optional"/>
<xs:attribute name="iri" type="xs:string" use="optional"/>
<xs:attribute name="ark" type="xs:string" use="optional"/>
<xs:attribute name="creation_date" type="xs:dateTime" use="optional"/>
</xs:complexType>

<!-- region tag -->
@@ -437,6 +441,9 @@
<xs:attribute name="label" type="xs:string" use="required"/>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="permissions" type="xs:NCName" use="optional"/>
<xs:attribute name="iri" type="xs:string" use="optional"/>
<xs:attribute name="ark" type="xs:string" use="optional"/>
<xs:attribute name="creation_date" type="xs:dateTime" use="optional"/>
</xs:complexType>

<!-- link tag -->
@@ -448,6 +455,9 @@
<xs:attribute name="label" type="xs:string" use="required"/>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="permissions" type="xs:NCName" use="optional"/>
<xs:attribute name="iri" type="xs:string" use="optional"/>
<xs:attribute name="ark" type="xs:string" use="optional"/>
<xs:attribute name="creation_date" type="xs:dateTime" use="optional"/>
</xs:complexType>

<!-- data type for knora shortcode -->
22 changes: 22 additions & 0 deletions knora/dsplib/utils/validation.py
@@ -0,0 +1,22 @@
import regex

from knora.dsplib.models.helpers import BaseError


def validate_resource_creation_date(creation_date: str, err_msg: str) -> None:
"""
Checks if creation_date is a valid https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp.

Args:
creation_date: the attribute "creation_date" from the <resource> tag in the XML
err_msg: the message of the BaseError that is raised if validation fails

Returns:
None if validation passes. Raises a BaseError if validation fails.
"""
_regex = r"-?([1-9][0-9]{3,}|0[0-9]{3})" \
r"-(0[1-9]|1[0-2])" \
r"-(0[1-9]|[12][0-9]|3[01])" \
r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|(24:00:00(\.0+)?))" \
r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))"
if not regex.fullmatch(_regex, creation_date):
raise BaseError(err_msg)
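
The behaviour of this pattern can be tried out with a self-contained sketch. It assumes that the standard library's `re` module is an acceptable stand-in for the third-party `regex` package (the pattern uses no `regex`-only features), and the function name `is_date_time_stamp` is hypothetical. It uses `fullmatch`, so leading or trailing characters are rejected as well.

```python
import re

# Same pattern as in validate_resource_creation_date(), compiled with the stdlib `re` module
DATE_TIME_STAMP = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"
    r"-(0[1-9]|1[0-2])"
    r"-(0[1-9]|[12][0-9]|3[01])"
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|(24:00:00(\.0+)?))"
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))"
)


def is_date_time_stamp(value: str) -> bool:
    """Return True if the whole string is a timestamp with a timezone."""
    return DATE_TIME_STAMP.fullmatch(value) is not None


for candidate in (
    "2005-10-23T13:45:12.502951+02:00",  # offset given
    "2005-10-23T13:45:12Z",              # UTC designator
    "2005-10-23T13:45:12",               # no timezone
):
    print(candidate, is_date_time_stamp(candidate))
```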
99 changes: 72 additions & 27 deletions knora/dsplib/utils/xml_upload.py
@@ -24,6 +24,7 @@
from knora.dsplib.models.xmlproperty import XMLProperty
from knora.dsplib.models.xmlresource import XMLResource
from knora.dsplib.utils.shared import try_network_action, validate_xml_against_schema
from knora.dsplib.utils.validation import validate_resource_creation_date


def _remove_circular_references(resources: list[XMLResource], verbose: bool) -> \
@@ -221,6 +222,68 @@ def _parse_xml_file(input_file: str) -> etree.ElementTree:
return tree


def _perform_deep_inspection(
resources: list[XMLResource],
resclass_name_2_type: dict[str, type],
verbose: bool = False
) -> None:
"""
Inspects a list of resources more thoroughly than an XSD validation can do. Mainly, it checks if the resource types
and properties in the XML are consistent with the ontology.

Args:
resources: a list of parsed XMLResources
resclass_name_2_type: information about the resource classes that exist on the DSP server for the current ontology
verbose: verbose switch

Returns:
None if everything went well. Raises a BaseError if there is a problem.
"""
if verbose:
print("Perform a deep inspection of your XML file...")
knora_properties = resclass_name_2_type[resources[0].restype].knora_properties

for resource in resources:

# check that the resource type is consistent with the ontology
if resource.restype not in resclass_name_2_type:
raise BaseError(
f"=========================\n"
f"ERROR: Resource '{resource.label}' (ID: {resource.id}) has an invalid resource type "
f"'{resource.restype}'. Is your syntax correct? Remember the rules:\n"
f" - DSP-API internals: <resource restype=\"restype\"> "
f"(will be interpreted as 'knora-api:restype')\n"
f" - current ontology: <resource restype=\":restype\"> "
f"('restype' must be defined in the 'resources' section of your ontology)\n"
f" - other ontology: <resource restype=\"other:restype\"> "
f"(not yet implemented: 'other' must be defined in the same JSON project file as your ontology)"
)

# validate attribute 'creation_date' of <resource>
if resource.creation_date:
validate_resource_creation_date(resource.creation_date,
f"The resource '{resource.label}' (ID: {resource.id}) has an invalid "
f"creation date. Did you perhaps forget the timezone?")

# check that the property types are consistent with the ontology
resource_properties = resclass_name_2_type[resource.restype].properties.keys()
for propname in [prop.name for prop in resource.properties]:
if propname not in knora_properties and propname not in resource_properties:
raise BaseError(
f"=========================\n"
f"ERROR: Resource '{resource.label}' (ID: {resource.id}) has an invalid property '{propname}'. "
f"Is your syntax correct? Remember the rules:\n"
f" - DSP-API internals: <text-prop name=\"propname\"> "
f"(will be interpreted as 'knora-api:propname')\n"
f" - current ontology: <text-prop name=\":propname\"> "
f"('propname' must be defined in the 'properties' section of your ontology)\n"
f" - other ontology: <text-prop name=\"other:propname\"> "
f"(not yet implemented: 'other' must be defined in the same JSON project file as your ontology)"
)

print("Deep inspection of your XML file successfully finished.")
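
The core of this consistency check can be sketched with plain data structures. Everything here is a simplification for illustration: resources are dicts, the ontology is a mapping from resource type to its allowed property names, and `check_against_ontology`, `ValueError`, and the sample names are hypothetical stand-ins for the real classes and `BaseError`.

```python
def check_against_ontology(
    resources: list[dict],
    ontology: dict[str, set[str]],
    builtin_props: set[str],
) -> None:
    """Raise a ValueError for the first resource type or property that the ontology does not know."""
    for res in resources:
        allowed = ontology.get(res["restype"])
        if allowed is None:
            raise ValueError(f"Resource '{res['id']}' has an invalid resource type '{res['restype']}'")
        for propname in res["props"]:
            # A property is valid if it is either built in or defined for this resource type
            if propname not in builtin_props and propname not in allowed:
                raise ValueError(f"Resource '{res['id']}' has an invalid property '{propname}'")


ontology = {":Thing": {":hasText"}}
builtin = {"knora-api:hasComment"}
check_against_ontology(
    [{"id": "res_1", "restype": ":Thing", "props": [":hasText", "knora-api:hasComment"]}],
    ontology,
    builtin,
)
print("consistent")
```

In this sketch the resource type is checked before its properties are looked up, so an unknown type leads to a descriptive error instead of a `KeyError`.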


def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: str, sipi: str, verbose: bool,
incremental: bool) -> bool:
"""
@@ -256,11 +319,11 @@ def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: s
action=lambda: ProjectContext(con=con))
sipi_server = Sipi(sipi, con.get_token())

# parse the XML file
tree = _parse_xml_file(input_file)
root = tree.getroot()
default_ontology = root.attrib['default-ontology']
shortcode = root.attrib['shortcode']

resources: list[XMLResource] = []
permissions: dict[str, XmlPermission] = {}
for child in root:
@@ -271,35 +334,16 @@ def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: s
resources.append(XMLResource(child, default_ontology))

# get the project information and project ontology from the server
res_inst_factory = ResourceInstanceFactory(con, shortcode)
res_inst_factory = try_network_action("", lambda: ResourceInstanceFactory(con, shortcode))
permissions_lookup: dict[str, Permissions] = {s: perm.get_permission_instance() for s, perm in permissions.items()}
resclass_name_2_type: dict[str, type] = {s: res_inst_factory.get_resclass_type(s) for s in res_inst_factory.get_resclass_names()}

# check if the data in the XML is consistent with the ontology
if verbose:
print("Check if the resource types and properties in your XML are consistent with the ontology...")
knora_properties = resclass_name_2_type[resources[0].restype].knora_properties
for resource in resources:
if resource.restype not in resclass_name_2_type:
print(f"=========================\n"
f"ERROR: Resource '{resource.label}' (ID: {resource.id}) has an invalid resource type "
f"'{resource.restype}'. Is your syntax correct? Remember the rules:\n"
f" - DSP-API internals: <resource restype=\"restype\"> (will be interpreted as 'knora-api:restype')\n"
f" - current ontology: <resource restype=\":restype\"> ('restype' must be defined in the 'resources' section of your ontology)\n"
f" - other ontology: <resource restype=\"other:restype\"> (not yet implemented: 'other' must be defined in the same JSON project file than your ontology)")
exit(1)
resource_properties = resclass_name_2_type[resource.restype].properties.keys()
for propname in [prop.name for prop in resource.properties]:
if propname not in knora_properties and propname not in resource_properties:
print(f"=========================\n"
f"ERROR: Resource '{resource.label}' (ID: {resource.id}) has an invalid property '{propname}'. "
f"Is your syntax correct? Remember the rules:\n"
f" - DSP-API internals: <text-prop name=\"propname\"> (will be interpreted as 'knora-api:propname')\n"
f" - current ontology: <text-prop name=\":propname\"> ('propname' must be defined in the 'properties' section of your ontology)\n"
f" - other ontology: <text-prop name=\"other:propname\"> (not yet implemented: 'other' must be defined in the same JSON project file than your ontology)")
exit(1)

print("The resource types and properties in your XML are consistent with the ontology.")
_perform_deep_inspection(
resources=resources,
resclass_name_2_type=resclass_name_2_type,
verbose=verbose
)

# temporarily remove circular references, but only if not an incremental upload
if not incremental:
@@ -308,9 +352,9 @@ def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: s
stashed_xml_texts = dict()
stashed_resptr_props = dict()

# upload all resources
id2iri_mapping: dict[str, str] = {}
failed_uploads: list[str] = []

try:
id2iri_mapping, failed_uploads = _upload_resources(resources, imgdir, sipi_server, permissions_lookup,
resclass_name_2_type, id2iri_mapping, con, failed_uploads)
@@ -423,6 +467,7 @@ def _upload_resources(
label=resource.label,
iri=resource_iri,
permissions=permissions_lookup.get(resource.permissions),
creation_date=resource.creation_date,
bitstream=resource_bitstream,
values=properties
),