diff --git a/Makefile b/Makefile index b1b51f7fb..0a9ca9cb1 100644 --- a/Makefile +++ b/Makefile @@ -53,6 +53,7 @@ install-requirements: ## install requirements .PHONY: install install: ## install from source (runs setup.py) + python3 -m pip install --upgrade pip pip3 install . .PHONY: test diff --git a/docs/dsp-tools-usage.md b/docs/dsp-tools-usage.md index c727d8469..6cdadea3b 100644 --- a/docs/dsp-tools-usage.md +++ b/docs/dsp-tools-usage.md @@ -82,6 +82,7 @@ The following options are available: - `-p` | `--password` _password_: password used for authentication with the DSP API (default: test) - `-i` | `--imgdir` _dirpath_: path to the directory where the bitstream objects are stored (default: .) - `-S` | `--sipi` _SIPIserver_: URL of the SIPI IIIF server (default: http://0.0.0.0:1024) +- `-I` | `--incremental`: If set, IRIs instead of internal IDs are expected as references to already existing resources on DSP - `-v` | `--verbose`: If set, more information about the uploaded resources is printed to the console. The command is used to upload data defined in an XML file onto a DSP server. The following example shows how to upload @@ -96,6 +97,13 @@ dsp-tools xmlupload -s https://api.dsl.server.org -u root@example.com -p test -S The description of the expected XML format can be found [here](./dsp-tools-xmlupload.md). +An internal ID is used in the `<resptr>` tag of an XML file used for `xmlupload` to reference resources inside the same +XML file. Once data is uploaded to DSP it cannot be referenced by this internal ID anymore. Instead, the resource's IRI +has to be used. The mapping of internal IDs to their respective IRIs is written to a file +called `id2iri_[xmlfilename]_mapping_[timestamp].json` after a successful `xmlupload`. +See [`dsp-tools id2iri`](./dsp-tools-usage.md#replace-internal-ids-with-iris-in-xml-file) for more information about how +to use this file to replace internal IDs in an existing XML file with the IRIs of existing resources. + ## Create a JSON list file from one or several Excel files ```bash @@ -161,3 +169,23 @@ dsp-tools excel2properties Properties.xlsx properties.json More information about the usage of this command can be found [here](./dsp-tools-excel.md#create-the-properties-for-a-data-model-from-an-excel-file) . + +## Replace internal IDs with IRIs in XML file + +```bash +dsp-tools id2iri xml_file.xml mapping_file.json --outfile xml_out_file.xml +``` + +When uploading data with `dsp-tools xmlupload`, an internal ID is used in the `<resptr>` tag of the XML file to reference +resources inside the same XML file. Once data is uploaded to DSP it cannot be referenced by this internal ID anymore. +Instead, the resource's IRI has to be used. + +With `dsp-tools id2iri`, internal IDs can be replaced with their corresponding IRIs within a provided XML file. The output is +written to a new XML file called `[xmlfilename]_replaced_[timestamp].xml` (the file path and name can be overwritten with +option `--outfile`). If all internal IDs were replaced, the newly created XML file can be used +with `dsp-tools xmlupload --incremental xml_file_replaced_20211026-120247.xml` to upload the data. + +Note that internal IDs and IRIs cannot be mixed within the same XML file. The input XML file has to be provided as well as the JSON file which +contains the mapping from internal IDs to IRIs. This JSON file is generated after each successful `xmlupload`. + +In order to upload data incrementally, the procedure described [here](dsp-tools-xmlupload.md#incremental-xml-upload) is recommended.
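For orientation, a possible end-to-end incremental workflow could look as follows. Server, credentials, and file names are illustrative; the mapping file name follows the `id2iri_[xmlfilename]_mapping_[timestamp].json` pattern written by `xmlupload` in this change:

```bash
# 1. Initial upload: resources reference each other by internal IDs in <resptr>;
#    a mapping file from internal IDs to IRIs is written after the upload
dsp-tools xmlupload -s https://api.dsl.server.org -u root@example.com -p test first_batch.xml

# 2. Replace the internal IDs in a second file with the IRIs from the mapping
#    (mapping file name and timestamp are made up for this example)
dsp-tools id2iri second_batch.xml id2iri_first_batch_mapping_20211026-120247.json --outfile second_batch_replaced.xml

# 3. Upload the second file; its <resptr> values are now IRIs, so --incremental is required
dsp-tools xmlupload -s https://api.dsl.server.org -u root@example.com -p test --incremental second_batch_replaced.xml
```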
diff --git a/docs/dsp-tools-xmlupload.md b/docs/dsp-tools-xmlupload.md index 3114679a9..d9b95aec9 100644 --- a/docs/dsp-tools-xmlupload.md +++ b/docs/dsp-tools-xmlupload.md @@ -3,7 +3,9 @@ # DSP XML file format for importing data With dsp-tools data can be imported into a DSP repository (on a DSP server) from an XML file. The import file is a -standard XML file as described on this page. +standard XML file as described on this page. After a successful upload of the data, an output file is written (called +`id2iri_[xmlfilename]_mapping_[timestamp].json`) with the mapping of internal IDs used inside the XML file and their corresponding IRIs which +uniquely identify them inside DSP. This file should be kept if data is later added with the `--incremental` [option](#incremental-xml-upload). The import file must start with the standard XML header: @@ -578,7 +580,9 @@ Attributes: #### `<resptr>` -The `<resptr>` element contains the internal ID of another resource. +The `<resptr>` element contains either the internal ID of another resource inside the XML file or the IRI of an already +existing resource on DSP. Inside the same XML file, a mixture of the two is not possible. If referencing existing +resources, `xmlupload --incremental` has to be used. Attributes: @@ -587,8 +591,8 @@ Attributes: Example: -If there is a resource defined as `...`, -it can be referenced as: +If there is a resource defined as `...`, it can +be referenced as: ```xml @@ -712,6 +716,24 @@ Example: ``` +## Incremental XML Upload + +After a successful upload of the data, an output file is written (called `id2iri_[xmlfilename]_mapping_[timestamp].json`) with the +mapping of internal IDs used inside the XML file and their corresponding IRIs which uniquely identify them inside DSP. This +file should be kept if data is later added with the `--incremental` option. + +To do an incremental XML upload, one of the following procedures is recommended. + +- Incremental XML upload with the use of internal IDs: + +1. Initial XML upload with internal IDs. +2. The file `id2iri_[xmlfilename]_mapping_[timestamp].json` is created. +3. Create new XML file(s) with resources referencing other resources by their internal IDs in `<resptr>` tags (using the same IDs as in the initial XML upload). +4. Run `dsp-tools id2iri new_data.xml id2iri_[xmlfilename]_mapping_[timestamp].json` to replace the internal IDs in `new_data.xml` with IRIs. Only internal IDs inside `<resptr>` tags are replaced; the result is written to a new XML file, e.g. `new_data_replaced_[timestamp].xml` (or the path given with `--outfile`). +5. Run `dsp-tools xmlupload --incremental new_data_replaced_[timestamp].xml` to upload the data to DSP. + +- Incremental XML upload with the use of IRIs: Use IRIs in the XML file to reference existing data on the DSP server. + ## Complete example ```xml diff --git a/docs/index.md b/docs/index.md index f995b2d5e..d4aacc24d 100644 --- a/docs/index.md +++ b/docs/index.md @@ -19,7 +19,7 @@ dsp-tools helps you with the following tasks: - [`dsp-tools get`](./dsp-tools-usage.md#get-a-data-model-from-a-dsp-server) reads a data model from a DSP server and writes it into a JSON file. - [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from a provided XML file (bulk - data import). + data import) and writes the mapping from internal IDs to IRIs into a local file. - [`dsp-tools excel`](./dsp-tools-usage.md#create-a-json-list-file-from-one-or-several-excel-files) creates a JSON or XML file from one or several Excel files. The created data can either be integrated into an ontology or be uploaded directly to a DSP server with `dsp-tools create`. 
@@ -29,4 +29,7 @@ dsp-tools helps you with the following tasks: - [`dsp-tools excel2properties`](./dsp-tools-usage.md#create-properties-from-an-excel-file) creates the ontology's properties section from an Excel file. The resulting section can be integrated into an ontology and then be uploaded to a DSP server with `dsp-tools create`. +- [`dsp-tools id2iri`](./dsp-tools-usage.md#replace-internal-ids-with-iris-in-xml-file) + takes an XML file for bulk data import and replaces referenced internal IDs with IRIs. The mapping has to be provided + with a JSON file. diff --git a/knora/dsp_tools.py b/knora/dsp_tools.py index e7ccd6522..8038c2886 100644 --- a/knora/dsp_tools.py +++ b/knora/dsp_tools.py @@ -9,6 +9,7 @@ from knora.dsplib.utils.excel_to_json_lists import list_excel2json, validate_list_with_schema from knora.dsplib.utils.excel_to_json_properties import properties_excel2json from knora.dsplib.utils.excel_to_json_resources import resources_excel2json +from knora.dsplib.utils.id_to_iri import id_to_iri from knora.dsplib.utils.onto_create_lists import create_lists from knora.dsplib.utils.onto_create_ontology import create_ontology from knora.dsplib.utils.onto_get import get_ontology @@ -76,6 +77,7 @@ def program(user_args: list[str]) -> None: parser_upload.add_argument('-i', '--imgdir', type=str, default='.', help='Path to folder containing the images') parser_upload.add_argument('-S', '--sipi', type=str, default='http://0.0.0.0:1024', help='URL of SIPI server') parser_upload.add_argument('-v', '--verbose', action='store_true', help='Verbose feedback') + parser_upload.add_argument('-I', '--incremental', action='store_true', help='Incremental XML upload') parser_upload.add_argument('xmlfile', help='path to xml file containing the data', default='data.xml') parser_excel_lists = subparsers.add_parser('excel', @@ -113,6 +115,14 @@ def program(user_args: list[str]) -> None: parser_excel_properties.add_argument('outfile', help='Path to the output JSON file containing the properties data', default='properties.json') + parser_id2iri = subparsers.add_parser('id2iri', + help='Replace internal IDs in an XML with their corresponding IRIs from a provided JSON file.') + parser_id2iri.set_defaults(action='id2iri') + parser_id2iri.add_argument('xmlfile', help='Path to the XML file containing the data to be replaced') + parser_id2iri.add_argument('jsonfile', help='Path to the JSON file containing the mapping of internal IDs and their respective IRIs') + parser_id2iri.add_argument('--outfile', default=None, help='Path to the XML output file containing the replaced IDs (optional)') + parser_id2iri.add_argument('-v', '--verbose', action='store_true', help='Verbose feedback') + args = parser.parse_args(user_args) if not hasattr(args, 'action'): @@ -160,7 +170,8 @@ def program(user_args: list[str]) -> None: imgdir=args.imgdir, sipi=args.sipi, verbose=args.verbose, - validate_only=args.validate) + validate_only=args.validate, + incremental=args.incremental) elif args.action == 'excel': list_excel2json(listname=args.listname, excelfolder=args.excelfolder, @@ -171,6 +182,11 @@ def program(user_args: list[str]) -> None: elif args.action == 'excel2properties': properties_excel2json(excelfile=args.excelfile, outfile=args.outfile) + elif args.action == 'id2iri': + id_to_iri(xml_file=args.xmlfile, + json_file=args.jsonfile, + out_file=args.outfile, + verbose=args.verbose) def main() -> None: diff --git a/knora/dsplib/utils/BUILD.bazel b/knora/dsplib/utils/BUILD.bazel index 3cf5f706c..816045d76 100644 --- 
a/knora/dsplib/utils/BUILD.bazel +++ b/knora/dsplib/utils/BUILD.bazel @@ -124,3 +124,12 @@ py_library( imports = [".", ".."], ) +py_library( + name = "id_to_iri", + visibility = ["//visibility:public"], + srcs = ["id_to_iri.py"], + deps = [ + requirement("lxml") + ] +) + diff --git a/knora/dsplib/utils/id_to_iri.py b/knora/dsplib/utils/id_to_iri.py new file mode 100644 index 000000000..88e0b6230 --- /dev/null +++ b/knora/dsplib/utils/id_to_iri.py @@ -0,0 +1,80 @@ +""" +This module handles the replacement of internal IDs with their corresponding IRIs from DSP. +""" +import json +import os +from datetime import datetime +from pathlib import Path + +from lxml import etree + + +def id_to_iri(xml_file: str, json_file: str, out_file: str, verbose: bool) -> None: + """ + This function replaces all occurrences of internal IDs with their respective IRIs inside an XML file. It gets the + mapping from the JSON file provided as parameter for this function. + + Args: + xml_file : the XML file with the data to be replaced + json_file : the JSON file with the mapping (dict) of internal IDs to IRIs + out_file: path to the output XML file with replaced IDs (optional), default: "[xml_file_name]_replaced_[timestamp].xml" + verbose: verbose feedback if set to True + + Returns: + None + """ + + # check that provided files exist + if not os.path.isfile(xml_file): + print(f"File {xml_file} could not be found.") + exit(1) + + if not os.path.isfile(json_file): + print(f"File {json_file} could not be found.") + exit(1) + + # load JSON from provided json file to dict + with open(json_file, encoding="utf-8", mode='r') as file: + mapping = json.load(file) + + # parse XML from provided xml file + tree = etree.parse(xml_file) + + # iterate through all XML elements and remove namespace declarations + for elem in tree.getiterator(): + # skip comments and processing instructions as they do not have namespaces + if not ( + isinstance(elem, etree._Comment) + or isinstance(elem, etree._ProcessingInstruction) + ): + # remove namespace declarations + elem.tag = etree.QName(elem).localname + + resource_elements = tree.xpath("/knora/resource/resptr-prop/resptr") + for resptr_prop in resource_elements: + value_before = resptr_prop.text + value_after = mapping.get(resptr_prop.text) + if value_after: + resptr_prop.text = value_after + if verbose: + print(f"Replaced internal ID '{value_before}' with IRI '{value_after}'") + + else: # if value couldn't be found in mapping file + if value_before.startswith("http://rdfh.ch/"): + if verbose: + print(f"Skipping '{value_before}'") + else: + print(f"WARNING Could not find internal ID '{value_before}' in mapping file {json_file}. " + f"Skipping...") + + # write xml with replaced IDs to file with timestamp + if not out_file: + timestamp_now = datetime.now() + timestamp_str = timestamp_now.strftime("%Y%m%d-%H%M%S") + + file_name = Path(xml_file).stem + out_file = file_name + "_replaced_" + timestamp_str + ".xml" + + et = etree.ElementTree(tree.getroot()) + et.write(out_file, pretty_print=True) + print(f"XML with replaced IDs was written to file {out_file}.") diff --git a/knora/dsplib/utils/xml_upload.py b/knora/dsplib/utils/xml_upload.py index 257471958..090e66788 100644 --- a/knora/dsplib/utils/xml_upload.py +++ b/knora/dsplib/utils/xml_upload.py @@ -1,13 +1,17 @@ """ This module handles the import of XML data into the DSP platform. 
""" +import json import os +from datetime import datetime +from pathlib import Path from typing import Dict, List, Optional, Union from lxml import etree from knora.dsplib.models.connection import Connection from knora.dsplib.models.group import Group +from knora.dsplib.models.helpers import BaseError from knora.dsplib.models.permission import Permissions from knora.dsplib.models.project import Project from knora.dsplib.models.resource import ResourceInstanceFactory, ResourceInstance @@ -306,10 +310,10 @@ def get_propvals(self, resiri_lookup: Dict[str, str], permissions_lookup: Dict[s if iri is not None: v = iri else: - v = value.value # if we do not find the unique_id, we assume it's a valid knora IRI + v = value.value # if we do not find the id, we assume it's a valid knora IRI elif prop.valtype == 'text': if isinstance(value.value, KnoraStandoffXml): - iri_refs = value.value.findall() # The IRI's must be embedded as "...IRI:unique_id:IRI..." + iri_refs = value.value.findall() for iri_ref in iri_refs: res_id = iri_ref.split(':')[1] iri = resiri_lookup.get(res_id) @@ -435,7 +439,7 @@ def print(self): a.print() -def do_sort_order(resources: List[KnoraResource]) -> List[KnoraResource]: +def do_sort_order(resources: List[KnoraResource], verbose) -> List[KnoraResource]: """ Sorts a list of resources. @@ -444,6 +448,7 @@ def do_sort_order(resources: List[KnoraResource]) -> List[KnoraResource]: Args: resources: List of resources before sorting + verbose: verbose output if True Returns: sorted list of resources @@ -475,7 +480,7 @@ def do_sort_order(resources: List[KnoraResource]) -> List[KnoraResource]: notok_resources.append(resource) resources = notok_resources if not len(notok_resources) < notok_len: - print('Cannot resolve resptr dependencies. Giving up....') + print('Cannot resolve resptr dependencies. Giving up...') print(len(notok_resources)) for r in notok_resources: print('Resource {} has unresolvable resptrs to: '.format(r.id), end=' ') @@ -487,7 +492,8 @@ def do_sort_order(resources: List[KnoraResource]) -> List[KnoraResource]: notok_len = len(notok_resources) notok_resources = [] cnt += 1 - print('{}. Ordering pass Finished!'.format(cnt)) + if verbose: + print('{}. Ordering pass Finished!'.format(cnt)) # print('Remaining: {}'.format(len(resources))) return ok_resources @@ -515,7 +521,7 @@ def validate_xml_against_schema(input_file: str, schema_file: str) -> bool: def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: str, sipi: str, verbose: bool, - validate_only: bool) -> bool: + validate_only: bool, incremental: bool) -> None: """ This function reads an XML file and imports the data described in it onto the DSP server. 
@@ -528,6 +534,7 @@ def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: s sipi : the sipi instance to be used verbose : verbose option for the command, if used more output is given to the user validate_only : validation option to validate the XML data without the actual import of the data + incremental: if set, IRIs instead of internal IDs are expected as resource pointers Returns: None @@ -538,11 +545,11 @@ def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: s schema_file = os.path.join(current_dir, '../schemas/data.xsd') if validate_xml_against_schema(input_file, schema_file): - print("The input data file is syntactically correct and passed validation!") + print("The input data file is syntactically correct and passed validation.") if validate_only: exit(0) else: - print("The input data file did not pass validation!") + print("ERROR The input data file did not pass validation.") exit(1) # Connect to the DaSCH Service Platform API and get the project context @@ -583,8 +590,9 @@ def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: s elif child.tag == "resource": resources.append(KnoraResource(child, default_ontology)) - # sort the resources (resources which do not link to others come first) - resources = do_sort_order(resources) + # sort the resources (resources which do not link to others come first) but only if not an incremental upload + if not incremental: + resources = do_sort_order(resources, verbose) sipi = Sipi(sipi, con.get_token()) @@ -603,21 +611,42 @@ def xml_upload(input_file: str, server: str, user: str, password: str, imgdir: s res_iri_lookup: Dict[str, str] = {} + failed_uploads = [] for resource in resources: - if verbose: - resource.print() - if resource.bitstream: - img = sipi.upload_bitstream(os.path.join(imgdir, resource.bitstream)) - bitstream = img['uploadedFiles'][0]['internalFilename'] - else: - bitstream = None - - # create the resource on the server - instance: ResourceInstance = res_classes[resource.restype](con=con, label=resource.label, - permissions=permissions_lookup.get( - resource.permissions), - bitstream=bitstream, - values=resource.get_propvals(res_iri_lookup, - permissions_lookup)).create() - res_iri_lookup[resource.id] = instance.iri - print("Created resource:", instance.label, "(", resource.id, ") with IRI", instance.iri) + bitstream = None + try: + if verbose: + resource.print() + if resource.bitstream: + img = sipi.upload_bitstream(os.path.join(imgdir, resource.bitstream)) + bitstream = img['uploadedFiles'][0]['internalFilename'] + + # create the resource on the server + instance = res_classes[resource.restype](con=con, label=resource.label, + permissions=permissions_lookup.get(resource.permissions), + bitstream=bitstream, + values=resource.get_propvals(res_iri_lookup, + permissions_lookup)).create() + res_iri_lookup[resource.id] = instance.iri + print(f"Created resource '{instance.label}' ({resource.id}) with IRI '{instance.iri}'") + + except BaseError as err: + failed_uploads.append(resource.id) + print(f"ERROR while trying to upload '{resource.label}' ({resource.id}). The error message was: {err.message}") + + except Exception as exception: + failed_uploads.append(resource.id) + print(f"ERROR while trying to upload '{resource.label}' ({resource.id}). 
The error message was: {exception}") + + # write mapping of internal IDs to IRIs to file with timestamp + timestamp_now = datetime.now() + timestamp_str = timestamp_now.strftime("%Y%m%d-%H%M%S") + + xml_file_name = Path(input_file).stem + res_iri_lookup_file = "id2iri_" + xml_file_name + "_mapping_" + timestamp_str + ".json" + with open(res_iri_lookup_file, "w") as outfile: + print(f"============\nThe mapping of internal IDs to IRIs was written to {res_iri_lookup_file}") + outfile.write(json.dumps(res_iri_lookup)) + + if failed_uploads: + print(f"Could not upload the following resources: {failed_uploads}") diff --git a/requirements.txt b/requirements.txt index 6f242c8b0..0d205454b 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,14 +1,61 @@ +attrs==21.2.0 +bleach==4.1.0 +certifi==2021.10.8 +charset-normalizer==2.0.7 +click==8.0.3 +colorama==0.4.4 +decorator==5.1.0 +docutils==0.18 +et-xmlfile==1.1.0 +future==0.18.2 +ghp-import==2.0.2 +idna==3.3 +importlib-metadata==4.8.1 +isodate==0.6.0 +Jinja2==3.0.2 +joblib==1.1.0 +jsonschema==4.2.1 +keyring==23.2.1 +livereload==2.6.3 +lunr==0.5.8 +lxml==4.6.4 +Markdown==3.3.4 +MarkupSafe==2.0.1 +mergedeep==1.3.4 +mkdocs==1.2.3 +mkdocs-autorefs==0.3.0 +mkdocs-include-markdown-plugin==3.2.3 +mkdocs-material==7.2.3 +mkdocs-material-extensions==1.0.3 +mkdocstrings==0.16.2 +nltk==3.6.5 +openpyxl==3.0.9 +packaging==21.2 +pkginfo==1.7.1 +Pygments==2.10.0 +pymdown-extensions==9.0 +pyparsing==2.4.7 +pyrsistent==0.18.0 +pystrict==1.1 +python-dateutil==2.8.2 +pytkdocs==0.12.0 +PyYAML==6.0 +pyyaml_env_tag==0.1 +rdflib==6.0.2 +readme-renderer==30.0 +regex==2021.11.2 +requests==2.26.0 +requests-toolbelt==0.9.1 +rfc3986==1.5.0 +rfc3987==1.3.8 setuptools +six==1.16.0 +tornado==6.1 +tqdm==4.62.3 +twine==3.5.0 +urllib3==1.26.7 +validators==0.18.2 +watchdog==2.1.6 +webencodings==0.5.1 wheel -tqdm -twine -mkdocs==1.1.2 -mkdocs-material -rdflib -lxml -validators -requests -jsonschema -rfc3987 -openpyxl -pystrict +zipp==3.6.0 diff --git a/setup.cfg b/setup.cfg index b88034e41..08aedd7e6 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,2 +1,2 @@ [metadata] -description-file = README.md +description_file = README.md diff --git a/test/e2e/BUILD.bazel b/test/e2e/BUILD.bazel index 1c7ce0706..e8fe3dabd 100644 --- a/test/e2e/BUILD.bazel +++ b/test/e2e/BUILD.bazel @@ -118,6 +118,7 @@ py_test( "//knora/dsplib/utils:excel_to_json_lists", "//knora/dsplib/utils:excel_to_json_resources", "//knora/dsplib/utils:excel_to_json_properties", + "//knora/dsplib/utils:id_to_iri" ], data = [ "//testdata:testdata", diff --git a/test/e2e/test_tools.py b/test/e2e/test_tools.py index 1697b1e16..e22b1ba67 100644 --- a/test/e2e/test_tools.py +++ b/test/e2e/test_tools.py @@ -7,6 +7,7 @@ from knora.dsplib.utils.excel_to_json_lists import list_excel2json from knora.dsplib.utils.excel_to_json_properties import properties_excel2json from knora.dsplib.utils.excel_to_json_resources import resources_excel2json +from knora.dsplib.utils.id_to_iri import id_to_iri from knora.dsplib.utils.onto_create_ontology import create_ontology from knora.dsplib.utils.onto_get import get_ontology from knora.dsplib.utils.onto_validate import validate_ontology @@ -118,7 +119,14 @@ def test_xml_upload(self) -> None: imgdir='testdata/bitstreams', sipi='http://0.0.0.0:1024', verbose=False, - validate_only=False) + validate_only=False, + incremental=False) + + def test_id_to_iri(self) -> None: + id_to_iri(xml_file='testdata/test-id2iri-data.xml', + json_file='testdata/test-id2iri-mapping.json', + 
out_file='_test-id2iri-replaced.xml', + verbose=False) if __name__ == '__main__': diff --git a/test/unittests/BUILD.bazel b/test/unittests/BUILD.bazel index a7980a926..b697da955 100644 --- a/test/unittests/BUILD.bazel +++ b/test/unittests/BUILD.bazel @@ -1,3 +1,5 @@ +package(default_visibility = ["//visibility:public"]) + # make the python rules available load("@rules_python//python:defs.bzl", "py_binary", "py_library", "py_test") @@ -6,10 +8,7 @@ load("@knora_py_deps//:requirements.bzl", "requirement") py_test( name = "test_langstring", - srcs = ["test_langstring.py"], - deps = [ - "//knora/dsplib/models:langstring", - ] + srcs = ["test_langstring.py"] ) py_test( @@ -21,3 +20,11 @@ py_test( "//knora/dsplib/models:helpers" ] ) + +py_test( + name = "test_id_to_iri", + srcs = ["test_id_to_iri.py"], + data = [ + "//testdata:testdata" + ] +) diff --git a/test/unittests/test_id_to_iri.py b/test/unittests/test_id_to_iri.py new file mode 100644 index 000000000..e4be8a938 --- /dev/null +++ b/test/unittests/test_id_to_iri.py @@ -0,0 +1,58 @@ +"""Unit tests for id to iri mapping""" + +import unittest + +from lxml import etree + +from knora.dsplib.utils.id_to_iri import id_to_iri + + +class TestIdToIri(unittest.TestCase): + out_file = '_test-id2iri-replaced.xml' + + def test_invalid_xml_file_name(self): + with self.assertRaises(SystemExit) as cm: + id_to_iri(xml_file='test.xml', + json_file='testdata/test-id2iri-mapping.json', + out_file=self.out_file, + verbose=True) + + self.assertEqual(cm.exception.code, 1) + + def test_invalid_json_file_name(self): + with self.assertRaises(SystemExit) as cm: + id_to_iri(xml_file='testdata/test-id2iri-data.xml', + json_file='test.json', + out_file=self.out_file, + verbose=True) + + self.assertEqual(cm.exception.code, 1) + + def test_replace_id_with_iri(self): + id_to_iri(xml_file='testdata/test-id2iri-data.xml', + json_file='testdata/test-id2iri-mapping.json', + out_file=self.out_file, + verbose=True) + + tree = etree.parse(self.out_file) + + for elem in tree.getiterator(): + # skip comments and processing instructions as they do not have namespaces + if not ( + isinstance(elem, etree._Comment) + or isinstance(elem, etree._ProcessingInstruction) + ): + # remove namespace declarations + elem.tag = etree.QName(elem).localname + + resource_elements = tree.xpath("/knora/resource/resptr-prop/resptr") + result = [] + for resptr_prop in resource_elements: + result.append(resptr_prop.text) + + self.assertEqual(result, + ["http://rdfh.ch/082E/ylRvrg7tQI6aVpcTJbVrwg", "http://rdfh.ch/082E/JK63OpYWTDWNYVOYFN7FdQ"]) + + +if __name__ == '__main__': + unittest.main() diff --git a/testdata/BUILD.bazel b/testdata/BUILD.bazel index 51adcc7d6..a7db91d28 100644 --- a/testdata/BUILD.bazel +++ b/testdata/BUILD.bazel @@ -14,7 +14,11 @@ filegroup( "lists/description_en.xlsx", "lists/Beschreibung_de.xlsx", "test-data.xml", - "test-onto.json" + "test-onto.json", + "test-id2iri-data.xml", + "test-id2iri-mapping.json", + "test-id2iri-replaced.xml", + "tmp/_test-id2iri-replaced.xml" ], ) diff --git a/testdata/test-id2iri-data.xml b/testdata/test-id2iri-data.xml new file mode 100644 index 000000000..b06044b36 --- /dev/null +++ b/testdata/test-id2iri-data.xml @@ -0,0 +1,64 @@ + + + + + + + V + V + CR + CR + + + V + V + CR + CR + + + V + V + CR + CR + + + V + V + CR + CR + + + + + Images/Danby-deluge.jpg + + The Deluge + + + person_2 + + + GREGORIAN:CE:1877:CE:1879 + + + Oil paint on canvas. 
+ + + The Deluge + + + institution_2 + + + 6695209 + + + https://www.tate.org.uk/art/artworks/danby-the-deluge-t01337 + + + + diff --git a/testdata/test-id2iri-mapping.json b/testdata/test-id2iri-mapping.json new file mode 100644 index 000000000..a317b56fc --- /dev/null +++ b/testdata/test-id2iri-mapping.json @@ -0,0 +1,4 @@ +{ + "person_2": "http://rdfh.ch/082E/ylRvrg7tQI6aVpcTJbVrwg", + "institution_2": "http://rdfh.ch/082E/JK63OpYWTDWNYVOYFN7FdQ" +} diff --git a/testdata/test-id2iri-replaced.xml b/testdata/test-id2iri-replaced.xml new file mode 100644 index 000000000..bf791dfdf --- /dev/null +++ b/testdata/test-id2iri-replaced.xml @@ -0,0 +1,58 @@ + + + + + V + V + CR + CR + + + V + V + CR + CR + + + V + V + CR + CR + + + V + V + CR + CR + + + + + Images/Danby-deluge.jpg + + The Deluge + + + http://rdfh.ch/082E/ylRvrg7tQI6aVpcTJbVrwg + + + GREGORIAN:CE:1877:CE:1879 + + + Oil paint on canvas. + + + The Deluge + + + http://rdfh.ch/082E/JK63OpYWTDWNYVOYFN7FdQ + + + 6695209 + + + https://www.tate.org.uk/art/artworks/danby-the-deluge-t01337 + + + + diff --git a/testdata/tmp/_test-id2iri-replaced.xml b/testdata/tmp/_test-id2iri-replaced.xml new file mode 100644 index 000000000..bf791dfdf --- /dev/null +++ b/testdata/tmp/_test-id2iri-replaced.xml @@ -0,0 +1,58 @@ + + + + + V + V + CR + CR + + + V + V + CR + CR + + + V + V + CR + CR + + + V + V + CR + CR + + + + + Images/Danby-deluge.jpg + + The Deluge + + + http://rdfh.ch/082E/ylRvrg7tQI6aVpcTJbVrwg + + + GREGORIAN:CE:1877:CE:1879 + + + Oil paint on canvas. + + + The Deluge + + + http://rdfh.ch/082E/JK63OpYWTDWNYVOYFN7FdQ + + + 6695209 + + + https://www.tate.org.uk/art/artworks/danby-the-deluge-t01337 + + + +
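As a quick smoke test of the new `id2iri` command, the test data added above can be run through it from the repository root. This assumes dsp-tools from this branch is installed; the output file name below is chosen for the example and is not a default:

```bash
dsp-tools id2iri testdata/test-id2iri-data.xml testdata/test-id2iri-mapping.json --outfile _test-id2iri-replaced.xml --verbose
# The two <resptr> values person_2 and institution_2 should now read
# http://rdfh.ch/082E/ylRvrg7tQI6aVpcTJbVrwg and http://rdfh.ch/082E/JK63OpYWTDWNYVOYFN7FdQ,
# matching testdata/test-id2iri-replaced.xml.
```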