Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rdf-api): Add a general-purpose SHACL validation utility (DSP-930) #1762

Merged
merged 4 commits into from Dec 1, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
15 changes: 13 additions & 2 deletions docs/05-internals/design/principles/rdf-api.md
Expand Up @@ -49,10 +49,12 @@ The API is in the package `org.knora.webapi.messages.util.rdf`. It includes:
- `JsonLDUtil`, which provides specialised functionality for working
with RDF in JSON-LD format, and for converting between RDF models
and JSON-LD documents. `RdfFormatUtil` uses `JsonLDUtil` when appropriate.

- `ShaclValidator`, which validates RDF models using SHACL shapes.

To work with RDF models, start with `RdfFeatureFactory`, which returns instances
of `RdfNodeFactory`, `RdfModelFactory`, and `RdfFormatUtil`, using feature toggle
configuration. `JsonLDUtil` does not need a feature factory.
of `RdfNodeFactory`, `RdfModelFactory`, `RdfFormatUtil`, and `ShaclValidator`,
using feature toggle configuration. `JsonLDUtil` does not need a feature factory.

To iterate efficiently over the statements in an `RdfModel`, use its `iterator` method.
An `RdfModel` cannot be modified while you are iterating over it.
Expand Down Expand Up @@ -85,6 +87,15 @@ an `RdfModel`. To do this, use the `RdfModel.asRepository` method, which
returns an `RdfRepository` that can run `SELECT` queries.


## SHACL validation

On startup, graphs of SHACL shapes are loaded from Turtle files in a directory specified
by `app.shacl.shapes-dir` in `application.conf`, and in subdirectories of
that directory. To validate the default graph of an `RdfModel` using a graph of
SHACL shapes, call `ShaclValidator.validate`, specifying the relative path of the
Turtle file containing the graph of shapes.


## Implementations

- The Jena-based implementation, in package `org.knora.webapi.messages.util.rdf.jenaimpl`.
Expand Down
13 changes: 13 additions & 0 deletions test_data/shacl/test/person.ttl
@@ -0,0 +1,13 @@
@prefix ex: <http://example.com/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:PersonShape a sh:NodeShape ;
sh:targetClass foaf:Person ;
sh:property ex:PersonShapeProperty .

ex:PersonShapeProperty sh:path foaf:age ;
sh:datatype xsd:int ;
sh:maxCount 1 ;
sh:minCount 1 .
4 changes: 3 additions & 1 deletion third_party/dependencies.bzl
Expand Up @@ -84,7 +84,9 @@ def dependencies():
"org.xmlunit:xmlunit-core:2.1.1",

# other
"org.eclipse.rdf4j:rdf4j-runtime:3.0.0",
"org.eclipse.rdf4j:rdf4j-runtime:3.4.4",
"org.eclipse.rdf4j:rdf4j-client:3.4.4",
"org.eclipse.rdf4j:rdf4j-shacl:3.4.4",
"org.rogach:scallop_2.12:3.2.0",
"com.google.gwt:gwt-servlet:2.8.0",
"net.sf.saxon:Saxon-HE:9.9.0-2",
Expand Down
1 change: 1 addition & 0 deletions webapi/BUILD.bazel
Expand Up @@ -212,6 +212,7 @@ scala_library(
"//webapi/src/main/scala/org/knora/webapi/app",
"//webapi/src/main/scala/org/knora/webapi/core",
"//webapi/src/main/scala/org/knora/webapi/exceptions",
"//webapi/src/main/scala/org/knora/webapi/feature",
"//webapi/src/main/scala/org/knora/webapi/instrumentation",
"//webapi/src/main/scala/org/knora/webapi/messages",
"//webapi/src/main/scala/org/knora/webapi/routing",
Expand Down
Expand Up @@ -20,6 +20,7 @@
package org.knora.webapi

import akka.actor.ActorSystem
import akka.event.LoggingAdapter
import akka.http.scaladsl.Http
import akka.http.scaladsl.client.RequestBuilding
import akka.http.scaladsl.model.{HttpRequest, HttpResponse, StatusCodes}
Expand All @@ -33,6 +34,7 @@ import org.scalatest.wordspec.AnyWordSpecLike
import org.scalatest.{BeforeAndAfterAll, Suite}
import spray.json.{JsObject, _}
import org.knora.webapi.messages.StringFormatter
import org.knora.webapi.messages.util.rdf.RdfFeatureFactory

import scala.concurrent.duration.{Duration, _}
import scala.concurrent.{Await, ExecutionContext}
Expand Down Expand Up @@ -63,8 +65,9 @@ class ITKnoraFakeSpec(_system: ActorSystem) extends Core with KnoraFakeCore with

/* Needs to be initialized before any responders */
StringFormatter.initForTest()
RdfFeatureFactory.init(settings)

val log = akka.event.Logging(system, this.getClass)
val log: LoggingAdapter = akka.event.Logging(system, this.getClass)

protected val baseApiUrl: String = settings.internalKnoraApiBaseUrl
protected val baseInternalSipiUrl: String = settings.internalSipiBaseUrl
Expand Down
Expand Up @@ -36,7 +36,7 @@ import org.knora.webapi.exceptions.AssertionException
import org.knora.webapi.messages.StringFormatter
import org.knora.webapi.messages.app.appmessages.{AppStart, AppStop, SetAllowReloadOverHTTPState}
import org.knora.webapi.messages.store.triplestoremessages.{RdfDataObject, TriplestoreJsonProtocol}
import org.knora.webapi.messages.util.rdf.{JsonLDDocument, JsonLDUtil}
import org.knora.webapi.messages.util.rdf.{JsonLDDocument, JsonLDUtil, RdfFeatureFactory}
import org.knora.webapi.settings._
import org.knora.webapi.util.StartupUtils
import org.scalatest.matchers.should.Matchers
Expand Down Expand Up @@ -75,6 +75,7 @@ class ITKnoraLiveSpec(_system: ActorSystem) extends Core with StartupUtils with

/* Needs to be initialized before any responders */
StringFormatter.initForTest()
RdfFeatureFactory.init(settings)

val log: LoggingAdapter = akka.event.Logging(system, this.getClass)

Expand Down
64 changes: 35 additions & 29 deletions webapi/src/main/resources/application.conf
Expand Up @@ -264,35 +264,41 @@ akka-http-cors {
}

app {
feature-toggles {
new-list-admin-routes {
description = "Replace the old list admin routes with new ones."

available-versions = [ 1 ]
default-version = 1
enabled-by-default = no
override-allowed = yes
expiration-date = "2021-12-01T00:00:00Z"

developer-emails = [
"Sepideh Alassi <sepideh.alassi@dasch.swiss>"
"Benjamin Geer <benjamin.geer@dasch.swiss>"
]
}

jena-rdf-library {
description = "Use the Jena API for RDF processing. If turned off, use the RDF4J API."

available-versions = [ 1 ]
default-version = 1
enabled-by-default = no
override-allowed = yes

developer-emails = [
"Benjamin Geer <benjamin.geer@dasch.swiss>"
]
}
}
feature-toggles {
new-list-admin-routes {
description = "Replace the old list admin routes with new ones."

available-versions = [ 1 ]
default-version = 1
enabled-by-default = no
override-allowed = yes
expiration-date = "2021-12-01T00:00:00Z"

developer-emails = [
"Sepideh Alassi <sepideh.alassi@dasch.swiss>"
"Benjamin Geer <benjamin.geer@dasch.swiss>"
]
}

jena-rdf-library {
description = "Use the Jena API for RDF processing. If turned off, use the RDF4J API."

available-versions = [ 1 ]
default-version = 1
enabled-by-default = no
override-allowed = yes

developer-emails = [
"Benjamin Geer <benjamin.geer@dasch.swiss>"
]
}
}

shacl {
# The directory that SHACL shapes are loaded from.
shapes-dir = "shacl"
shapes-dir = ${?KNORA_WEBAPI_SHACLE_SHAPES_DIR}
}

print-extended-config = false // If true, an extended list of configuration parameters will be printed out at startup.
print-extended-config = ${?KNORA_WEBAPI_PRINT_EXTENDED_CONFIG}
Expand Down
6 changes: 4 additions & 2 deletions webapi/src/main/scala/org/knora/webapi/app/LiveCore.scala
Expand Up @@ -27,8 +27,8 @@ import org.knora.webapi.settings.{KnoraDispatchers, KnoraSettings, KnoraSettings
import scala.concurrent.ExecutionContext
import scala.language.postfixOps
import scala.languageFeature.postfixOps

import org.knora.webapi.core.Core
import org.knora.webapi.messages.util.rdf.RdfFeatureFactory


/**
Expand Down Expand Up @@ -57,8 +57,10 @@ trait LiveCore extends Core {
implicit val executionContext: ExecutionContext = system.dispatchers.lookup(KnoraDispatchers.KnoraActorDispatcher)


// Initialise StringFormatter with the system settings. This must happen before any responders are constructed.
// Initialise StringFormatter and RdfFeatureFactory with the system settings.
// This must happen before any responders are constructed.
StringFormatter.init(settings)
RdfFeatureFactory.init(settings)

/**
* The main application supervisor actor which is at the top of the actor
Expand Down
Expand Up @@ -37,6 +37,7 @@ scala_library(
"@maven//:org_eclipse_rdf4j_rdf4j_repository_sail",
"@maven//:org_eclipse_rdf4j_rdf4j_sail_api",
"@maven//:org_eclipse_rdf4j_rdf4j_sail_memory",
"@maven//:org_eclipse_rdf4j_rdf4j_shacl",
"@maven//:org_jodd_jodd",
"@maven//:org_scala_lang_modules_scala_xml_2_12",
"@maven//:org_scala_lang_scala_library",
Expand Down
Expand Up @@ -125,6 +125,16 @@ object OntologyConstants {
val DateTimeStamp: IRI = XsdPrefixExpansion + "dateTimeStamp"
}

object Shacl {
val ShaclPrefixExpansion: IRI = "http://www.w3.org/ns/shacl#"

val Conforms: IRI = ShaclPrefixExpansion + "conforms"
val Result: IRI = ShaclPrefixExpansion + "result"
val SourceConstraintComponent: IRI = ShaclPrefixExpansion + "sourceConstraintComponent"
val DatatypeConstraintComponent: IRI = ShaclPrefixExpansion + "DatatypeConstraintComponent"
val MaxCountConstraintComponent: IRI = ShaclPrefixExpansion + "MaxCountConstraintComponent"
}

/**
* http://schema.org
*/
Expand Down
Expand Up @@ -318,18 +318,6 @@ object StringFormatter {
}
}

/**
* Initialises the singleton instance of [[StringFormatter]] for use in a client program that will connect to Knora.
*/
def initForClient(knoraHostAndPort: String): Unit = {
this.synchronized {
generalInstance match {
case Some(_) => ()
case None => generalInstance = Some(new StringFormatter(maybeKnoraHostAndPort = Some(knoraHostAndPort)))
}
}
}

/**
* Initialises the singleton instance of [[StringFormatter]] for a test.
*/
Expand Down
@@ -0,0 +1,91 @@
/*
* Copyright © 2015-2019 the contributors (see Contributors.md).
*
* This file is part of Knora.
*
* Knora is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published
* by the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Knora is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Affero General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public
* License along with Knora. If not, see <http://www.gnu.org/licenses/>.
*/

package org.knora.webapi.messages.util.rdf

import java.io.File
import java.nio.file.Path

import org.knora.webapi.exceptions.AssertionException

/**
* An abstract base class for classes that validate RDF models using SHACL shapes.
*
* @param baseDir the base directory that SHACL graphs are loaded from.
* @param rdfFormatUtil an [[RdfFormatUtil]].
* @tparam ShaclGraphT an implementation-specific representation of a graph of SHACL shapes.
*/
abstract class AbstractShaclValidator[ShaclGraphT](baseDir: File, private val rdfFormatUtil: RdfFormatUtil) extends ShaclValidator {

/**
* A map of relative paths to objects representing graphs of SHACL shapes.
*/
private val shaclGraphs: Map[Path, ShaclGraphT] = if (baseDir.exists) {
loadShaclGraphs(baseDir, baseDir)
} else {
Map.empty
}

def validate(rdfModel: RdfModel, shaclPath: Path): ShaclValidationResult = {
validateWithShaclGraph(
rdfModel = rdfModel,
shaclGraph = shaclGraphs.getOrElse(shaclPath, throw AssertionException(s"SHACL graph $shaclPath not found"))
)
}

/**
* Recursively loads graphs of SHACL shapes from a directory and its subdirectories.
*
* @param baseDir the base directory that SHACL graphs are loaded from.
* @param dir the base directory or a subdirectory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the Shacl shapes are actually loaded from dir directory and its subdirectories, and baseDir is only used to get its path to make the relativePath. Can you please change the comments for baseDir and dir in scaladocs?

I also wonder, if only the path of the baseDir is needed here, why not just using its path as the parameter of this function, I mean

 private def  loadShaclGraphs(baseDirPath: Path, dir: File) : Map[Path, ShaclGraphT] = {...}

two directory parameters are a bit confusing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the Shacl shapes are actually loaded from dir directory and its subdirectories, and baseDir is only used to get its path to make the relativePath.... I also wonder, if only the path of the baseDir is needed here, why not just using its path as the parameter of this function

Actually, the SHACL shapes are loaded from baseDir and its subdirectories. File and Path are basically the same thing, but File is the old java.io API which I'm more used to, and Path is the newer java.nio API. The consensus in the Java community seems to be that seems that Path should be used instead of File for everything nowadays (https://stackoverflow.com/a/26658436). I guess we should do a PR sometime to use java.nio instead of java.io everywhere. For now, I've refactored AbstractShaclValidator to use java.nio to walk the directory tree, which seems clearer.

* @return a map of file paths (relative to the base directory) to graphs of SHACL shapes.
*/
private def loadShaclGraphs(baseDir: File, dir: File): Map[Path, ShaclGraphT] = {
// Parse the files representing SHACL graphs in this directory.
val modelsInDir: Map[Path, ShaclGraphT] = dir.listFiles.collect {
case file: File if file.isFile && file.getName.endsWith(".ttl") =>
// Map each SHACL file's relative path to a ShapesT containing the SHACL graph.
val relativePath: Path = baseDir.toPath.relativize(file.toPath)
val shaclModel: RdfModel = rdfFormatUtil.fileToRdfModel(file = file, rdfFormat = Turtle)
relativePath -> rdfModelToShaclGraph(shaclModel)
}.toMap

// Recurse in subdirectories.
modelsInDir ++ dir.listFiles.filter(_.isDirectory).flatMap {
subDir => loadShaclGraphs(baseDir = baseDir, dir = subDir)
}
}

/**
* Validates the default graph of an [[RdfModel]] using a graph of SHACL shapes.
*
* @param rdfModel the [[RdfModel]] to be validated.
* @param shaclGraph a graph of SHACL shapes.
* @return the validation result.
*/
protected def validateWithShaclGraph(rdfModel: RdfModel, shaclGraph: ShaclGraphT): ShaclValidationResult

/**
* Converts the default graph of an [[RdfModel]] to a [[ShaclGraphT]].
*
* @param rdfModel an [[RdfModel]] whose default graph contains SHACL shapes.
* @return a [[ShaclGraphT]] representing the SHACL shapes.
*/
protected def rdfModelToShaclGraph(rdfModel: RdfModel): ShaclGraphT
}