fix!: fix valueHasUri bad values and missing types (DEV-1036) #2079

Merged
merged 8 commits into from Jun 21, 2022
Changes from 5 commits
3 changes: 2 additions & 1 deletion .gitignore
@@ -21,6 +21,7 @@ sipi/test
/docs/architecture/.structurizr
/docs/architecture/workspace.json


**/target/
*.aux
*.bbl
@@ -39,11 +40,11 @@ sipi/test

dependencies.txt
/client-test-data.zip
/db_ls_test_server_dump.trig
/sipi/images/082E/*
/sipi/images/originals/082E/*
sipi/images/1111/*
sipi/images/originals/1111/*
/dependencies.bzl

.idea/
.metals/
29 changes: 29 additions & 0 deletions test_data/upgrade/pr2079.trig
@@ -0,0 +1,29 @@
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix knora-base: <http://www.knora.org/ontology/knora-base#> .
@prefix knora-admin: <http://www.knora.org/ontology/knora-admin#> .
@prefix anything: <http://www.knora.org/ontology/0001/anything#> .

# value with missing valueHasUri datatype
<http://rdfh.ch/0103/fN89IUgvSSyMxJ7XWssP9w/values/Rl2rfjDlRBWeuRr-EgIgCw>
rdf:type knora-base:UriValue ;
knora-base:attachedToUser <http://rdfh.ch/users/user-unil-valentina-ponzetto> ;
knora-base:hasPermissions "CR knora-admin:ProjectAdmin|D knora-admin:Creator|M knora-admin:ProjectMember|V knora-admin:KnownUser,knora-admin:UnknownUser" ;
knora-base:isDeleted false ;
knora-base:valueCreationDate "2018-01-12T19:21:37.991+00:00"^^xsd:dateTime ;
knora-base:valueHasOrder 1 ;
knora-base:valueHasString "https://viaf.org/viaf/2571146" ;
knora-base:valueHasUUID "PYQ5mqH8TYOKCytMozecWw" ;
knora-base:valueHasUri "https://viaf.org/viaf/2571146" .

# bad valueHasUri value and missing datatype
<http://rdfh.ch/0103/5LE8P57nROClWUxEPJhiug/values/fEbt5NzaSe6GnCqKoF4Nhg/standoff/2>
rdf:type knora-base:StandoffUriTag ;
knora-base:standoffTagHasEnd 50 ;
knora-base:standoffTagHasStart 20 ;
knora-base:standoffTagHasStartIndex 2 ;
knora-base:standoffTagHasStartParent <http://rdfh.ch/0103/5LE8P57nROClWUxEPJhiug/values/fEbt5NzaSe6GnCqKoF4Nhg/standoff/1> ;
knora-base:standoffTagHasUUID "dd2e785b-041e-4bcb-a77d-570efe8a068c" ;
knora-base:valueHasUri <http://www.maison-george-sand.fr/> .
Collaborator

  1. Did you check whether the standoff handling actually requires, or can handle, typed values? The value at line 29 is a correct IRI in angle brackets. You could check this by, for example, changing existing test data and then running the tests (if this case is covered).

  2. Also, did you check if the existing test data needs to be changed?

Collaborator Author

I don't know how or where to check, but in the LS data there are only 2 instances of data written as in the code snippet above. Every other valueHasUri used in standoff looks like this:

knora-base:valueHasUri          "https://lumieres.unil.ch/fiches/bio/2666/"^^xsd:anyURI .

In our test data I found only one example, which is a typed string:
https://github.com/dasch-swiss/dsp-api/blob/main/test_data/all_data/anything-data.ttl#L1244
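
One way to answer the question of how many statements are affected is to scan for valueHasUri objects that are not xsd:anyURI literals. The following is a minimal sketch under stated assumptions: `Node`, `Iri`, `Literal`, `Triple` and `offending` are simplified, hypothetical stand-ins invented for illustration, not the dsp-api `RdfModel` classes, and an untyped literal is modelled as having datatype xsd:string.

```scala
object ValueHasUriAudit {
  // Simplified stand-ins for RDF nodes (assumption: not the dsp-api types).
  sealed trait Node
  final case class Iri(iri: String) extends Node
  final case class Literal(value: String, datatype: String) extends Node

  final case class Triple(subj: String, pred: String, obj: Node)

  val ValueHasUri = "http://www.knora.org/ontology/knora-base#valueHasUri"
  val XsdAnyUri   = "http://www.w3.org/2001/XMLSchema#anyURI"
  val XsdString   = "http://www.w3.org/2001/XMLSchema#string"

  /** Returns the valueHasUri triples whose object is not an xsd:anyURI literal. */
  def offending(triples: Seq[Triple]): Seq[Triple] =
    triples.filter { t =>
      t.pred == ValueHasUri && (t.obj match {
        case Literal(_, XsdAnyUri) => false // already correctly typed
        case _                     => true  // IRI node or literal with another datatype
      })
    }
}
```

Running `offending` over a loaded data set would list both shapes covered by pr2079.trig (the untyped string and the IRI in angle brackets) and skip values already typed as xsd:anyURI.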

@@ -54,6 +54,11 @@ object RepositoryUpdatePlan {
versionNumber = 20,
plugin = new UpgradePluginPR2018(featureFactoryConfig, log),
prBasedVersionString = Some("PR 2018")
),
PluginForKnoraBaseVersion(
versionNumber = 20,
plugin = new UpgradePluginPR2079(featureFactoryConfig, log),
prBasedVersionString = Some("PR 2079")
)
)

@@ -0,0 +1,67 @@
/*
* Copyright © 2021 - 2022 Swiss National Data and Service Center for the Humanities and/or DaSCH Service Platform contributors.
* SPDX-License-Identifier: Apache-2.0
*/

package org.knora.webapi.store.triplestore.upgrade.plugins

import com.typesafe.scalalogging.Logger
import org.knora.webapi.feature.FeatureFactoryConfig
import org.knora.webapi.messages.OntologyConstants
import org.knora.webapi.messages.util.rdf._
import org.knora.webapi.store.triplestore.upgrade.UpgradePlugin

/**
* Transforms a repository for Knora PR 2079.
* Adds the missing datatype ^^<http://www.w3.org/2001/XMLSchema#anyURI> to valueHasUri objects and converts IRI objects to literals of that type.
*/
class UpgradePluginPR2079(featureFactoryConfig: FeatureFactoryConfig, log: Logger) extends UpgradePlugin {
private val nodeFactory: RdfNodeFactory = RdfFeatureFactory.getRdfNodeFactory(featureFactoryConfig)

override def transform(model: RdfModel): Unit = {
val statementsToRemove: collection.mutable.Set[Statement] = collection.mutable.Set.empty
val statementsToAdd: collection.mutable.Set[Statement] = collection.mutable.Set.empty

for (statement: Statement <- model) {
if (statement.pred.iri == OntologyConstants.KnoraBase.ValueHasUri) {
statement.obj match {
case literal: DatatypeLiteral =>
if (literal.datatype != OntologyConstants.Xsd.Uri) {
statementsToRemove += statement

Collaborator

Suggested change
val newObjectValue = nodeFactory.makeDatatypeLiteral(literal.value, OntologyConstants.Xsd.Uri)

Collaborator Author

@mpro7 mpro7 Jun 20, 2022

Thanks for the suggestion, but it would need to take a parameter, because the other case uses a differently named value to create the DatatypeLiteral.

statementsToAdd += nodeFactory.makeStatement(
subj = statement.subj,
pred = statement.pred,
obj = nodeFactory.makeDatatypeLiteral(literal.value, OntologyConstants.Xsd.Uri),
context = statement.context
)

log.info(
s"Transformed valueHasUri: $literal to ${nodeFactory.makeDatatypeLiteral(literal.value, OntologyConstants.Xsd.Uri)}."
)
}

case node: IriNode =>
statementsToRemove += statement

statementsToAdd += nodeFactory.makeStatement(
subj = statement.subj,
pred = statement.pred,
obj = nodeFactory.makeDatatypeLiteral(node.iri, OntologyConstants.Xsd.Uri),
context = statement.context
)

log.info(
s"Transformed valueHasUri: $node to ${nodeFactory.makeDatatypeLiteral(node.iri, OntologyConstants.Xsd.Uri)}."
)

case _ => ()
}
}
}

model.removeStatements(statementsToRemove.toSet)
model.addStatements(statementsToAdd.toSet)
}
}
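
The review thread above asks whether the repeated `makeDatatypeLiteral` call could be factored out, and the reply notes that a shared helper would need the lexical value as a parameter. A minimal sketch of that idea follows, using simplified, hypothetical stand-in types (`Node`, `Iri`, `Literal`, `Triple`) and a hypothetical helper `retyped`; none of these are the dsp-api classes, and the immutable-set style is a swapped-in alternative to the plugin's mutable sets, not the plugin's actual code.

```scala
object ValueHasUriFix {
  // Simplified stand-ins for RDF nodes and statements (assumption: not the dsp-api types).
  sealed trait Node
  final case class Iri(iri: String) extends Node
  final case class Literal(value: String, datatype: String) extends Node
  final case class Triple(subj: String, pred: String, obj: Node)

  val ValueHasUri = "http://www.knora.org/ontology/knora-base#valueHasUri"
  val XsdAnyUri   = "http://www.w3.org/2001/XMLSchema#anyURI"

  // Hypothetical shared helper: the caller passes the lexical value,
  // so the literal branch and the IRI branch can both use it.
  private def retyped(t: Triple, lexicalValue: String): Triple =
    t.copy(obj = Literal(lexicalValue, XsdAnyUri))

  /** Collects the replacements first, then applies them, so nothing is mutated while iterating. */
  def transform(triples: Set[Triple]): Set[Triple] = {
    val replacements: Map[Triple, Triple] = triples.collect {
      case t @ Triple(_, ValueHasUri, Literal(v, dt)) if dt != XsdAnyUri => t -> retyped(t, v)
      case t @ Triple(_, ValueHasUri, Iri(iri))                          => t -> retyped(t, iri)
    }.toMap

    (triples -- replacements.keySet) ++ replacements.values
  }
}
```

As in the plugin, statements with other predicates and values already typed as xsd:anyURI pass through unchanged; only the object of an affected statement is replaced.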
@@ -0,0 +1,84 @@
/*
* Copyright © 2021 - 2022 Swiss National Data and Service Center for the Humanities and/or DaSCH Service Platform contributors.
* SPDX-License-Identifier: Apache-2.0
*/

package org.knora.webapi.store.triplestore.upgrade.plugins

import com.typesafe.scalalogging.LazyLogging
import dsp.errors.AssertionException
import org.knora.webapi.messages.OntologyConstants
import org.knora.webapi.messages.util.rdf._

class UpgradePluginPR2079Spec extends UpgradePluginSpec with LazyLogging {
private val nodeFactory: RdfNodeFactory = RdfFeatureFactory.getRdfNodeFactory(defaultFeatureFactoryConfig)

"Upgrade plugin PR2079" should {
"fix the missing datatype of valueHasUri" in {
// Parse the input file.
val model: RdfModel = trigFileToModel("../test_data/upgrade/pr2079.trig")

// Use the plugin to transform the input.
val plugin = new UpgradePluginPR2079(defaultFeatureFactoryConfig, log)
plugin.transform(model)

// Check that the datatype was fixed.
val subj = nodeFactory.makeIriNode("http://rdfh.ch/0103/fN89IUgvSSyMxJ7XWssP9w/values/Rl2rfjDlRBWeuRr-EgIgCw")
val pred = nodeFactory.makeIriNode(OntologyConstants.KnoraBase.ValueHasUri)

model
.find(
subj = Some(subj),
pred = Some(pred),
obj = None
)
.toSet
.headOption match {
case Some(statement: Statement) =>
statement.obj match {
case datatypeLiteral: DatatypeLiteral =>
assert(datatypeLiteral.datatype == OntologyConstants.Xsd.Uri)

case other =>
throw AssertionException(s"Unexpected object for $pred: $other")
}

case None => throw AssertionException(s"No statement found with subject $subj and predicate $pred")
}
}

"fix a valueHasUri entered as a node without a datatype" in {
// Parse the input file.
val model: RdfModel = trigFileToModel("../test_data/upgrade/pr2079.trig")

// Use the plugin to transform the input.
val plugin = new UpgradePluginPR2079(defaultFeatureFactoryConfig, log)
plugin.transform(model)

// Check that the value and datatype were fixed.
val subj =
nodeFactory.makeIriNode("http://rdfh.ch/0103/5LE8P57nROClWUxEPJhiug/values/fEbt5NzaSe6GnCqKoF4Nhg/standoff/2")
val pred = nodeFactory.makeIriNode(OntologyConstants.KnoraBase.ValueHasUri)

model
.find(
subj = Some(subj),
pred = Some(pred),
obj = None
)
.toSet
.headOption match {
case Some(statement: Statement) =>
statement.obj match {
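case datatypeLiteral: DatatypeLiteral =>
// The IRI from the test data should survive as the literal's lexical value.
assert(datatypeLiteral.value == "http://www.maison-george-sand.fr/")
assert(datatypeLiteral.datatype == OntologyConstants.Xsd.Uri)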

case other =>
throw AssertionException(s"Unexpected object for $pred: $other")
}

case None => throw AssertionException(s"No statement found with subject $subj and predicate $pred")
}
}
}
}