feat(api-v2): Add an RDF processing façade (DSP-1020) (#1754)
Benjamin Geer committed Nov 17, 2020
1 parent f31c075 commit 9170419
Showing 165 changed files with 9,390 additions and 3,462 deletions.
1 change: 1 addition & 0 deletions docs/05-internals/design/principles/index.md
@@ -23,6 +23,7 @@ License along with Knora. If not, see <http://www.gnu.org/licenses/>.
- [Futures with Akka](futures-with-akka.md)
- [HTTP Module](http-module.md)
- [Store Module](store-module.md)
- [RDF Processing API](rdf-api.md)
- [Triplestore Updates](triplestore-updates.md)
- [Consistency Checking](consistency-checking.md)
- [Authentication](authentication.md)
111 changes: 111 additions & 0 deletions docs/05-internals/design/principles/rdf-api.md
@@ -0,0 +1,111 @@
<!---
Copyright © 2015-2019 the contributors (see Contributors.md).
This file is part of Knora.
Knora is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Knora is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public
License along with Knora. If not, see <http://www.gnu.org/licenses/>.
-->

# RDF Processing API

Knora provides an API for parsing and formatting RDF data and
for working with RDF graphs. This allows Knora developers to use a single,
idiomatic Scala API as a façade for a Java RDF library.
By using a feature toggle, you can choose either
[Jena](https://jena.apache.org/tutorials/rdf_api.html)
or
[RDF4J](https://rdf4j.org/documentation/programming/)
as the underlying implementation.


## Overview

The API is in the package `org.knora.webapi.messages.util.rdf`. It includes:

- `RdfModel`, which represents a set of RDF graphs (a default graph and/or one or more named graphs).
A model can be constructed from scratch, modified, and searched.

- `RdfNode` and its subclasses, which represent RDF nodes (IRIs, blank nodes, and literals).

- `Statement`, which represents a triple or quad.

- `RdfNodeFactory`, which creates nodes and statements.

- `RdfModelFactory`, which creates empty RDF models.

- `RdfFormatUtil`, which parses and formats RDF models.

- `JsonLDUtil`, which provides specialised functionality for working
with RDF in JSON-LD format, and for converting between RDF models
and JSON-LD documents. `RdfFormatUtil` uses `JsonLDUtil` when appropriate.

To work with RDF models, start with `RdfFeatureFactory`, which returns instances
of `RdfNodeFactory`, `RdfModelFactory`, and `RdfFormatUtil` according to the
feature toggle configuration.

`JsonLDUtil` does not need a feature factory.
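
As a rough illustration of the façade idea described above, the following self-contained
sketch shows calling code that depends only on Scala traits, while a factory picks the
backing implementation from a toggle. Every name in it (`RdfFacadeSketch`, `InMemoryModel`,
`getRdfModelFactory(jenaToggleOn)`, and the simplified `RdfModel` methods) is hypothetical
and does not reproduce the real signatures in `org.knora.webapi.messages.util.rdf`. The two
"implementations" are in-memory stand-ins, not wrappers around Jena or RDF4J.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the façade types. The real API
// in org.knora.webapi.messages.util.rdf may look different.
object RdfFacadeSketch {
  sealed trait RdfNode
  final case class IriNode(iri: String) extends RdfNode
  final case class StringLiteral(value: String) extends RdfNode

  // A triple; setting the optional context would make it a quad.
  final case class Statement(subj: IriNode,
                             pred: IriNode,
                             obj: RdfNode,
                             context: Option[String] = None)

  trait RdfModel {
    def addStatement(statement: Statement): Unit
    def contains(statement: Statement): Boolean
    def size: Int
  }

  trait RdfModelFactory {
    def makeEmptyModel: RdfModel
  }

  // In-memory stand-in. A real implementation would delegate to Jena or RDF4J.
  private final class InMemoryModel extends RdfModel {
    private val statements = mutable.LinkedHashSet.empty[Statement]

    override def addStatement(statement: Statement): Unit = {
      statements += statement
      ()
    }

    override def contains(statement: Statement): Boolean = statements.contains(statement)

    override def size: Int = statements.size
  }

  object JenaLikeModelFactory extends RdfModelFactory {
    override def makeEmptyModel: RdfModel = new InMemoryModel
  }

  object Rdf4jLikeModelFactory extends RdfModelFactory {
    override def makeEmptyModel: RdfModel = new InMemoryModel
  }

  // Hypothetical counterpart of a feature factory: the toggle state selects
  // the implementation, and callers see only the traits.
  def getRdfModelFactory(jenaToggleOn: Boolean): RdfModelFactory =
    if (jenaToggleOn) JenaLikeModelFactory else Rdf4jLikeModelFactory

  // Calling code is identical whichever implementation is selected.
  def buildExampleModel(jenaToggleOn: Boolean): RdfModel = {
    val model: RdfModel = getRdfModelFactory(jenaToggleOn).makeEmptyModel

    model.addStatement(
      Statement(
        subj = IriNode("http://example.org/book/1"),
        pred = IriNode("http://purl.org/dc/terms/title"),
        obj = StringLiteral("An example title")
      )
    )

    model
  }
}
```

The point of the design is that `buildExampleModel` never mentions a concrete library
type, so switching the toggle does not require changes in calling code.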


## Implementations

- The Jena-based implementation, in package `org.knora.webapi.messages.util.rdf.jenaimpl`.

- The RDF4J-based implementation, in package `org.knora.webapi.messages.util.rdf.rdf4jimpl`.


## Feature toggle

For an overview of feature toggles, see [Feature Toggles](feature-toggles.md).

The RDF API uses the feature toggle `jena-rdf-library`:

- `on`: use the Jena implementation.

- `off` (the default): use the RDF4J implementation.


The default setting is used on startup, e.g. to read ontologies from the
repository. After startup, the per-request setting is used.
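
The fallback from a per-request setting to the default can be sketched as follows.
This is a minimal illustration, not Knora's implementation: `ToggleBaseConfig`,
`isEnabled`, and the `requestOverride` parameter are hypothetical names, and the sketch
assumes that an override that is not permitted is simply ignored, which may not match
the real behaviour.

```scala
// Hypothetical sketch of resolving a feature toggle's state.
object ToggleResolutionSketch {
  // Mirrors two of the settings a toggle can declare in its configuration.
  final case class ToggleBaseConfig(name: String,
                                    enabledByDefault: Boolean,
                                    overrideAllowed: Boolean)

  // requestOverride is None at startup (no request exists yet) and
  // Some(state) when a request explicitly asks for a state.
  def isEnabled(base: ToggleBaseConfig, requestOverride: Option[Boolean]): Boolean =
    requestOverride match {
      case Some(state) if base.overrideAllowed => state
      case _                                   => base.enabledByDefault // assumption: ignore disallowed overrides
    }

  val jenaRdfLibrary: ToggleBaseConfig =
    ToggleBaseConfig(name = "jena-rdf-library", enabledByDefault = false, overrideAllowed = true)
}
```

With the values above, startup code (no override) would get the RDF4J implementation,
while a request that turns the toggle on would get the Jena implementation.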


## What still uses RDF4J directly

Before this API was added, Knora mainly used the RDF4J API directly, and still does
in some places:

- Code that uses RDF4J's streaming API to process large amounts of data, especially to
avoid constructing a large string in TriG format:

- `ProjectsResponderADM.projectDataGetRequestADM`

- `HttpTriplestoreConnector.turtleToTrig`

- `RepositoryUpdater`

- The repository update plugin tests, which use SPARQL.

- `TEIHeader`: uses XSLT that depends on the exact format of the RDF/XML generated by RDF4J.
The XSLT would need to be improved to handle `rdf:Description`.

- `GravsearchParser`: uses RDF4J's SPARQL parser. This is probably
not worth changing.


## TODO

- SHACL validation.

- SPARQL querying.

- A streaming parsing/formatting API for processing large graphs.
6 changes: 1 addition & 5 deletions webapi/BUILD.bazel
@@ -2,7 +2,6 @@ package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_binary", "scala_library", "scala_repl", "scala_test")


# alias added for convenience. To call, use: bazel run //webapi:GenerateContributorsFile
alias(
name = "GenerateContributorsFile",
@@ -156,6 +155,7 @@ scala_library(
"//webapi/src/main/scala/org/knora/webapi/http/handler",
"//webapi/src/main/scala/org/knora/webapi/instrumentation",
"//webapi/src/main/scala/org/knora/webapi/messages",
"//webapi/src/main/scala/org/knora/webapi/feature",
"//webapi/src/main/scala/org/knora/webapi/responders",
"//webapi/src/main/scala/org/knora/webapi/routing",
"//webapi/src/main/scala/org/knora/webapi/settings",
@@ -167,17 +167,14 @@ scala_library(
# Test Libs
"@maven//:com_typesafe_akka_akka_testkit_2_12",
"@maven//:com_typesafe_akka_akka_http_testkit_2_12",
"@maven//:com_jsuereth_scala_arm_2_12",
"@maven//:com_typesafe_akka_akka_actor_2_12",
"@maven//:com_typesafe_akka_akka_http_2_12",
"@maven//:com_typesafe_akka_akka_http_core_2_12",
"@maven//:com_typesafe_akka_akka_http_spray_json_2_12",
"@maven//:com_typesafe_akka_akka_stream_2_12",
"@maven//:com_typesafe_config",
"@maven//:io_spray_spray_json_2_12",
"@maven//:org_eclipse_rdf4j_rdf4j_client",
"@maven//:org_scalactic_scalactic_2_12",
"@maven//:org_scalatest_scalatest_2_12",
"@maven//:org_scalatest_scalatest_core_2_12",
"@maven//:org_scalatest_scalatest_wordspec_2_12",
"@maven//:org_scalatest_scalatest_matchers_core_2_12",
@@ -236,7 +233,6 @@ scala_library(
"@maven//:org_scala_lang_scala_library",
"@maven//:org_scala_lang_scala_reflect",
"@maven//:org_scalactic_scalactic_2_12",
"@maven//:org_scalatest_scalatest_2_12",
"@maven//:org_scalatest_scalatest_core_2_12",
"@maven//:org_scalatest_scalatest_wordspec_2_12",
"@maven//:org_scalatest_scalatest_matchers_core_2_12",
4 changes: 2 additions & 2 deletions webapi/src/it/scala/org/knora/webapi/ITKnoraLiveSpec.scala
@@ -36,8 +36,8 @@ import org.knora.webapi.exceptions.AssertionException
import org.knora.webapi.messages.StringFormatter
import org.knora.webapi.messages.app.appmessages.{AppStart, AppStop, SetAllowReloadOverHTTPState}
import org.knora.webapi.messages.store.triplestoremessages.{RdfDataObject, TriplestoreJsonProtocol}
-import org.knora.webapi.messages.util.{JsonLDDocument, JsonLDUtil}
-import org.knora.webapi.settings.{KnoraDispatchers, KnoraSettings, KnoraSettingsImpl, _}
+import org.knora.webapi.messages.util.rdf.{JsonLDDocument, JsonLDUtil}
+import org.knora.webapi.settings._
import org.knora.webapi.util.StartupUtils
import org.scalatest.matchers.should.Matchers
import org.scalatest.wordspec.AnyWordSpecLike
@@ -31,6 +31,7 @@ import org.knora.webapi.exceptions.AssertionException
import org.knora.webapi.messages.IriConversions._
import org.knora.webapi.messages.store.triplestoremessages.TriplestoreJsonProtocol
import org.knora.webapi.messages.util._
+import org.knora.webapi.messages.util.rdf.{JsonLDArray, JsonLDKeywords, JsonLDDocument, JsonLDObject, JsonLDValue}
import org.knora.webapi.messages.v2.routing.authenticationmessages._
import org.knora.webapi.messages.{OntologyConstants, SmartIri, StringFormatter}
import org.knora.webapi.sharedtestdata.SharedTestDataADM
@@ -157,11 +158,11 @@ class KnoraSipiIntegrationV2ITSpec extends ITKnoraLiveSpec(KnoraSipiIntegrationV
private def getValueFromResource(resource: JsonLDDocument,
propertyIriInResult: SmartIri,
expectedValueIri: IRI): JsonLDObject = {
-val resourceIri: IRI = resource.requireStringWithValidation(JsonLDConstants.ID, stringFormatter.validateAndEscapeIri)
+val resourceIri: IRI = resource.requireStringWithValidation(JsonLDKeywords.ID, stringFormatter.validateAndEscapeIri)
val propertyValues: JsonLDArray = getValuesFromResource(resource = resource, propertyIriInResult = propertyIriInResult)

val matchingValues: Seq[JsonLDObject] = propertyValues.value.collect {
-case jsonLDObject: JsonLDObject if jsonLDObject.requireStringWithValidation(JsonLDConstants.ID, stringFormatter.validateAndEscapeIri) == expectedValueIri => jsonLDObject
+case jsonLDObject: JsonLDObject if jsonLDObject.requireStringWithValidation(JsonLDKeywords.ID, stringFormatter.validateAndEscapeIri) == expectedValueIri => jsonLDObject
}

if (matchingValues.isEmpty) {
@@ -620,7 +621,7 @@ class KnoraSipiIntegrationV2ITSpec extends ITKnoraLiveSpec(KnoraSipiIntegrationV

val request = Post(s"$baseApiUrl/v2/resources", HttpEntity(RdfMediaTypes.`application/ld+json`, jsonLdEntity)) ~> addCredentials(BasicHttpCredentials(anythingUserEmail, password))
val responseJsonDoc: JsonLDDocument = getResponseJsonLD(request)
-val resourceIri: IRI = responseJsonDoc.body.requireStringWithValidation(JsonLDConstants.ID, stringFormatter.validateAndEscapeIri)
+val resourceIri: IRI = responseJsonDoc.body.requireStringWithValidation(JsonLDKeywords.ID, stringFormatter.validateAndEscapeIri)
csvResourceIri.set(responseJsonDoc.body.requireIDAsKnoraDataIri.toString)

// Get the resource from Knora.
@@ -768,7 +769,7 @@ class KnoraSipiIntegrationV2ITSpec extends ITKnoraLiveSpec(KnoraSipiIntegrationV

val request = Post(s"$baseApiUrl/v2/resources", HttpEntity(RdfMediaTypes.`application/ld+json`, jsonLdEntity)) ~> addCredentials(BasicHttpCredentials(anythingUserEmail, password))
val responseJsonDoc: JsonLDDocument = getResponseJsonLD(request)
-val resourceIri: IRI = responseJsonDoc.body.requireStringWithValidation(JsonLDConstants.ID, stringFormatter.validateAndEscapeIri)
+val resourceIri: IRI = responseJsonDoc.body.requireStringWithValidation(JsonLDKeywords.ID, stringFormatter.validateAndEscapeIri)
xmlResourceIri.set(responseJsonDoc.body.requireIDAsKnoraDataIri.toString)

// Get the resource from Knora.
13 changes: 13 additions & 0 deletions webapi/src/main/resources/application.conf
@@ -279,6 +279,19 @@ app {
"Benjamin Geer <benjamin.geer@dasch.swiss>"
]
}

jena-rdf-library {
description = "Use the Jena API for RDF processing. If turned off, use the RDF4J API."

available-versions = [ 1 ]
default-version = 1
enabled-by-default = no
override-allowed = yes

developer-emails = [
"Benjamin Geer <benjamin.geer@dasch.swiss>"
]
}
}

print-extended-config = false // If true, an extended list of configuration parameters will be printed out at startup.
8 changes: 8 additions & 0 deletions webapi/src/main/scala/org/knora/webapi/RdfMediaTypes.scala
@@ -48,13 +48,21 @@ object RdfMediaTypes {
fileExtensions = List("rdf")
)

val `application/trig`: MediaType.WithFixedCharset = MediaType.customWithFixedCharset(
mainType = "application",
subType = "trig",
charset = HttpCharsets.`UTF-8`,
fileExtensions = List("trig")
)

/**
* A map of MIME types (strings) to supported RDF media types.
*/
val registry: Map[String, MediaType.NonBinary] = Set(
`application/json`,
`application/ld+json`,
`text/turtle`,
`application/trig`,
`application/rdf+xml`
).map {
mediaType => mediaType.toString -> mediaType
@@ -32,6 +32,7 @@ import com.typesafe.scalalogging.LazyLogging
import kamon.Kamon
import org.knora.webapi.core.LiveActorMaker
import org.knora.webapi.exceptions.{InconsistentTriplestoreDataException, SipiException, UnexpectedMessageException, UnsupportedValueException}
import org.knora.webapi.feature.{FeatureFactoryConfig, KnoraSettingsFeatureFactoryConfig}
import org.knora.webapi.http.handler
import org.knora.webapi.http.version.ServerVersion
import org.knora.webapi.messages.admin.responder.KnoraRequestADM
@@ -105,6 +106,11 @@ class ApplicationActor extends Actor with Stash with LazyLogging with AroundDire
*/
implicit val knoraSettings: KnoraSettingsImpl = KnoraSettings(system)

/**
* The default feature factory configuration, which is used during startup.
*/
val defaultFeatureFactoryConfig: FeatureFactoryConfig = new KnoraSettingsFeatureFactoryConfig(knoraSettings)

/**
* Provides the actor materializer (akka-http)
*/
@@ -353,7 +359,10 @@ class ApplicationActor extends Actor with Stash with LazyLogging with AroundDire

/* load ontologies request */
case LoadOntologies() =>
-responderManager ! LoadOntologiesRequestV2(KnoraSystemInstances.Users.SystemUser)
+responderManager ! LoadOntologiesRequestV2(
+featureFactoryConfig = defaultFeatureFactoryConfig,
+requestingUser = KnoraSystemInstances.Users.SystemUser
+)

/* load ontologies response */
case SuccessResponseV2(_) =>
3 changes: 3 additions & 0 deletions webapi/src/main/scala/org/knora/webapi/app/BUILD.bazel
@@ -10,6 +10,7 @@ scala_library(
"//webapi/src/main/scala/org/knora/webapi",
"//webapi/src/main/scala/org/knora/webapi/core",
"//webapi/src/main/scala/org/knora/webapi/exceptions",
"//webapi/src/main/scala/org/knora/webapi/feature",
"//webapi/src/main/scala/org/knora/webapi/http/handler",
"//webapi/src/main/scala/org/knora/webapi/http/version",
"//webapi/src/main/scala/org/knora/webapi/instrumentation",
@@ -54,6 +55,8 @@ scala_binary(
"@maven//:ch_qos_logback_logback_core",
"@maven//:com_typesafe_akka_akka_slf4j_2_12",
"@maven//:org_slf4j_log4j_over_slf4j",
"@maven//:org_glassfish_jakarta_json",
"@maven//:org_scala_lang_modules_scala_java8_compat_2_12",
],
)

@@ -467,6 +467,14 @@ case class TestConfigurationException(message: String) extends ApplicationConfig
*/
case class FeatureToggleException(message: String, cause: Option[Throwable] = None) extends ApplicationConfigurationException(message)

/**
* Indicates that RDF processing failed.
*
* @param message a description of the error.
* @param cause the original exception representing the cause of the error, if any.
*/
case class RdfProcessingException(message: String, cause: Option[Throwable] = None) extends InternalServerException(message)

/**
* Helper functions for error handling.
*/
@@ -482,7 +490,7 @@ object ExceptionUtil {
SerializationUtils.serialize(e)
true
} catch {
-case serEx: SerializationException => false
+case _: SerializationException => false
}
}

3 changes: 0 additions & 3 deletions webapi/src/main/scala/org/knora/webapi/feature/BUILD.bazel
@@ -9,12 +9,9 @@ scala_library(
deps = [
"//webapi/src/main/scala/org/knora/webapi",
"//webapi/src/main/scala/org/knora/webapi/exceptions",
"//webapi/src/main/scala/org/knora/webapi/messages",
"//webapi/src/main/scala/org/knora/webapi/settings",
"@maven//:com_typesafe_akka_akka_actor_2_12",
"@maven//:com_typesafe_akka_akka_http_2_12",
"@maven//:com_typesafe_akka_akka_http_core_2_12",
"@maven//:com_typesafe_scala_logging_scala_logging_2_12",
"@maven//:org_apache_jena_apache_jena_libs",
],
)
@@ -23,12 +23,12 @@ import akka.http.scaladsl.model.{HttpHeader, HttpResponse}
import akka.http.scaladsl.model.headers.RawHeader
import akka.http.scaladsl.server.RequestContext
import org.knora.webapi.exceptions.{BadRequestException, FeatureToggleException}
-import org.knora.webapi.messages.StringFormatter
import org.knora.webapi.settings.KnoraSettings.FeatureToggleBaseConfig
import org.knora.webapi.settings.KnoraSettingsImpl

import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}
+import scala.util.control.Exception._

/**
* A tagging trait for module-specific factories that produce implementations of features.
@@ -362,8 +362,7 @@ object RequestContextFeatureFactoryConfig {
* @param parent the parent [[FeatureFactoryConfig]].
*/
class RequestContextFeatureFactoryConfig(requestContext: RequestContext,
-parent: FeatureFactoryConfig)(implicit stringFormatter: StringFormatter) extends OverridingFeatureFactoryConfig(parent) {
-
+parent: FeatureFactoryConfig) extends OverridingFeatureFactoryConfig(parent) {
import FeatureToggle._
import RequestContextFeatureFactoryConfig._

@@ -392,7 +391,7 @@ class RequestContextFeatureFactoryConfig(requestContext: RequestContext,
}

val maybeVersion: Option[Int] = featureNameAndVersion.drop(1).headOption.map {
-versionStr => stringFormatter.validateInt(versionStr, throw BadRequestException(s"Invalid version number '$versionStr' in feature toggle $featureName"))
+versionStr: String => allCatch.opt(versionStr.toInt).getOrElse(throw BadRequestException(s"Invalid version number '$versionStr' in feature toggle $featureName"))
}

featureName -> FeatureToggle(
@@ -26,7 +26,7 @@ import com.typesafe.scalalogging.LazyLogging
import org.knora.webapi.exceptions.{InternalServerException, RequestRejectedException}
import org.knora.webapi.http.status.{ApiStatusCodesV1, ApiStatusCodesV2}
import org.knora.webapi.messages.OntologyConstants
-import org.knora.webapi.messages.util.{JsonLDDocument, JsonLDObject, JsonLDString}
+import org.knora.webapi.messages.util.rdf.{JsonLDDocument, JsonLDObject, JsonLDString}
import org.knora.webapi.settings.KnoraSettingsImpl
import spray.json.{JsNumber, JsObject, JsString, JsValue}
