feat: Add an RDF processing façade (2nd iteration) (DSP-1083) (#1759)
Benjamin Geer committed Nov 24, 2020
1 parent e4e16e0 commit 346873d
Showing 89 changed files with 2,366 additions and 1,462 deletions.
60 changes: 30 additions & 30 deletions docs/05-internals/design/principles/rdf-api.md
@@ -52,9 +52,37 @@ The API is in the package `org.knora.webapi.messages.util.rdf`. It includes:

To work with RDF models, start with `RdfFeatureFactory`, which returns instances
of `RdfNodeFactory`, `RdfModelFactory`, and `RdfFormatUtil`, using feature toggle
configuration.
configuration. `JsonLDUtil` does not need a feature factory.

`JsonLDUtil` does not need a feature factory.
To iterate efficiently over the statements in an `RdfModel`, use its `iterator` method.
An `RdfModel` cannot be modified while you are iterating over it.
If you are iterating to look for statements to modify, you can
collect a `Set` of statements to remove and a `Set` of statements
to add, and perform these update operations after you have finished
the iteration.
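
The collect-then-update pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the Knora API: `Statement`, `SimpleModel`, and `RenamePredicate` are hypothetical stand-ins invented for this example, and the real `RdfModel` method names may differ.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDF statement (NOT the Knora type).
final case class Statement(subj: String, pred: String, obj: String)

// Hypothetical stand-in for a mutable RDF model (NOT the Knora RdfModel).
final class SimpleModel {
  private val statements = mutable.LinkedHashSet.empty[Statement]

  def add(s: Statement): Unit = statements += s
  def addAll(ss: Set[Statement]): Unit = statements ++= ss
  def removeAll(ss: Set[Statement]): Unit = statements --= ss
  def iterator: Iterator[Statement] = statements.iterator
  def contains(s: Statement): Boolean = statements.contains(s)
  def size: Int = statements.size
}

object RenamePredicate {
  // Replaces every statement that uses oldPred with one that uses newPred.
  def rename(model: SimpleModel, oldPred: String, newPred: String): Unit = {
    val toRemove = mutable.Set.empty[Statement]
    val toAdd = mutable.Set.empty[Statement]

    // First pass: only read from the model while iterating.
    for (s <- model.iterator if s.pred == oldPred) {
      toRemove += s
      toAdd += s.copy(pred = newPred)
    }

    // Second pass: apply the collected updates after the iteration has finished.
    model.removeAll(toRemove.toSet)
    model.addAll(toAdd.toSet)
  }
}
```

Keeping the reads and the writes in separate passes means the iterator never sees a model that is changing underneath it.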

## RDF stream processing

To read or write a large amount of RDF data without generating a large string
object, you can use the stream processing methods in `RdfFormatUtil`.

To parse an `InputStream` to an `RdfModel`, use `inputStreamToRdfModel`.
To format an `RdfModel` to an `OutputStream`, use `rdfModelToOutputStream`.

To parse RDF data from an `InputStream` and process it one statement at a time,
you can write a class that implements the `RdfStreamProcessor` trait, and
use it with the `RdfFormatUtil.parseWithStreamProcessor` method.
Your `RdfStreamProcessor` can also send one statement at a time to a
formatting stream processor, which knows how to write RDF to an `OutputStream`
in a particular format. Use `RdfFormatUtil.makeFormattingStreamProcessor` to
construct one of these.
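
The idea of chaining one stream processor to another can be sketched as follows. This is an illustration under stated assumptions only: `Triple`, `StatementProcessor`, `LineWriter`, `PredicateFilter`, and `StreamDemo` are hypothetical names invented here, and the simplified output format is not real N-Triples. The actual `RdfStreamProcessor` trait and the formatting stream processor in `RdfFormatUtil` may have different methods.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for an RDF statement (NOT the Knora type).
final case class Triple(subj: String, pred: String, obj: String)

// Hypothetical processor interface, loosely modelled on the description above.
trait StatementProcessor {
  def start(): Unit
  def processStatement(t: Triple): Unit
  def finish(): Unit
}

// Plays the role of a formatting stream processor: writes one line per statement.
final class LineWriter(out: OutputStream) extends StatementProcessor {
  def start(): Unit = ()

  def processStatement(t: Triple): Unit = {
    val line = s"<${t.subj}> <${t.pred}> <${t.obj}> .\n"
    out.write(line.getBytes(StandardCharsets.UTF_8))
  }

  def finish(): Unit = out.flush()
}

// Forwards only the statements with a given predicate to a downstream processor.
final class PredicateFilter(pred: String, downstream: StatementProcessor) extends StatementProcessor {
  def start(): Unit = downstream.start()

  def processStatement(t: Triple): Unit =
    if (t.pred == pred) downstream.processStatement(t)

  def finish(): Unit = downstream.finish()
}

object StreamDemo {
  // Plays the role of the parser: feeds statements to the processor one at a time.
  def run(input: Iterator[Triple], processor: StatementProcessor): Unit = {
    processor.start()
    input.foreach(processor.processStatement)
    processor.finish()
  }
}
```

Because each statement is written as soon as it is processed, neither the whole input nor the whole output has to be held in memory as a string.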


## SPARQL queries

In tests, it can be useful to run SPARQL queries to check the content of
an `RdfModel`. To do this, use the `RdfModel.asRepository` method, which
returns an `RdfRepository` that can run `SELECT` queries.


## Implementations
@@ -74,38 +102,10 @@ The RDF API uses the feature toggle `jena-rdf-library`:

- `off` (the default): use the RDF4J implementation.


The default setting is used on startup, e.g. to read ontologies from the
repository. After startup, the per-request setting is used.


## What still uses RDF4J directly

Before this API was added, Knora mainly used the RDF4J API directly, and still does
in some places:

- Code that uses RDF4J's streaming API to process large amounts of data, especially to
avoid constructing a large string in TriG format:

- `ProjectsResponderADM.projectDataGetRequestADM`

- `HttpTriplestoreConnector.turtleToTrig`

- `RepositoryUpdater`

- The repository update plugin tests, which use SPARQL.

- `TEIHeader`: uses XSLT that depends on the exact format of the RDF/XML generated by RDF4J.
The XSLT would need to be improved to handle `rdf:Description`.

- `GravsearchParser`: uses RDF4J's SPARQL parser. This is probably
not worth changing.


## TODO

- SHACL validation.

- SPARQL querying.

- A streaming parsing/formatting API for processing large graphs.
33 changes: 14 additions & 19 deletions docs/05-internals/development/updating-repositories.md
@@ -51,18 +51,18 @@ it to `org.knora.webapi.store.triplestore.upgrade.RepositoryUpdater`.

3. Download the entire repository from the triplestore into a TriG file.

4. Read the TriG file into an RDF4J `Model`.
4. Read the TriG file into an `RdfModel`.

5. Update the `Model` by running the necessary transformations, and replacing the
5. Update the `RdfModel` by running the necessary transformations, and replacing the
built-in Knora ontologies with the current ones.

6. Save the `Model` to a new TriG file.
6. Save the `RdfModel` to a new TriG file.

7. Empty the repository in the triplestore.

8. Upload the transformed repository file to the triplestore.

To update the `Model`, `RepositoryUpdater` runs a sequence of upgrade plugins, each of which
To update the `RdfModel`, `RepositoryUpdater` runs a sequence of upgrade plugins, each of which
is a class in `org.knora.webapi.store.triplestore.upgrade.plugins` and is registered
in `RepositoryUpdatePlan`.

@@ -94,32 +94,27 @@ with existing data, the following must happen:
in the string constant `org.knora.webapi.KnoraBaseVersion`.

- A plugin must be added in the package `org.knora.webapi.store.triplestore.upgrade.plugins`,
and registered in `RepositoryUpdatePlan`, to transform
existing repositories so that they are compatible with the code changes
introduced in the pull request.
to transform existing repositories so that they are compatible with the code changes
introduced in the pull request. Each new plugin must be registered
by adding it to the sequence returned by `RepositoryUpdatePlan.makePluginsForVersions`.

The order of version numbers must correspond to the order in which the pull requests
are merged.
The order of version numbers (and the plugins) must correspond to the order in which the
pull requests are merged.

An upgrade plugin is a Scala class that extends `UpgradePlugin`. The name of the plugin
class should refer to the pull request that made the transformation necessary,
using the format `UpgradePluginPRNNNN`, where `NNNN` is the number of the pull request.

A plugin's `transform` method takes an RDF4J `Model` (a mutable object representing
the repository) and modifies it as needed. For details on how to do this, see
[The RDF Model API](https://rdf4j.eclipse.org/documentation/programming/model/)
in the RDF4J documentation.
A plugin's `transform` method takes an `RdfModel` (a mutable object representing
the repository) and modifies it as needed.

Before transforming the data, a plugin can check whether a required manual transformation
has been carried out. If the requirement is not met, the plugin can throw
`InconsistentTriplestoreDataException` to abort the upgrade process.

The plugin must then be appended to the sequence `pluginsForVersions` in
`RepositoryUpdatePlan`.
`InconsistentRepositoryDataException` to abort the upgrade process.
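
A plugin that checks a precondition before transforming the data might look roughly like the sketch below. All names here are hypothetical and chosen for illustration: `Stmt`, `SketchPlugin`, `RequireMarkerPlugin`, and `RepositoryCheckFailed` stand in for the repository model, `UpgradePlugin`, a concrete plugin, and `InconsistentRepositoryDataException` respectively, and the predicate names are made up.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDF statement (NOT the Knora type).
final case class Stmt(subj: String, pred: String, obj: String)

// Hypothetical stand-in for InconsistentRepositoryDataException.
final class RepositoryCheckFailed(message: String) extends Exception(message)

// Hypothetical stand-in for the UpgradePlugin trait.
trait SketchPlugin {
  def transform(model: mutable.Set[Stmt]): Unit
}

final class RequireMarkerPlugin(markerPred: String) extends SketchPlugin {
  def transform(model: mutable.Set[Stmt]): Unit = {
    // Abort the upgrade if the required manual transformation has not been done.
    if (!model.exists(_.pred == markerPred)) {
      throw new RepositoryCheckFailed(
        s"Manual step not done: no statement with predicate $markerPred"
      )
    }

    // Collect the statements to replace first, then update the model.
    val outdated: Set[Stmt] = model.filter(_.pred == "oldProp").toSet
    model --= outdated
    model ++= outdated.map(_.copy(pred = "newProp"))
  }
}
```

Throwing before any statement is changed leaves the model untouched when the requirement is not met.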

## Testing Update Plugins

Each plugin should have a unit test that extends `UpgradePluginSpec`. A typical
test loads a TriG file containing test data into a `Model`, runs the plugin,
makes an RDF4J `SailRepository` containing the transformed `Model`, and uses
test loads a TriG file containing test data into an `RdfModel`, runs the plugin,
makes an `RdfRepository` containing the transformed `RdfModel`, and uses
SPARQL to check the result.
73 changes: 30 additions & 43 deletions test_data/test_route/texts/beol/header.xsl
@@ -1,4 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
An example stylesheet that transforms an RDF/XML representation of a beol:letter into
a TEI/XML header. This stylesheet assumes that the input consists only of
<rdf:Description> elements.
-->
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
@@ -15,25 +22,6 @@
<xsl:sequence select="replace($input, '\(DE-588\)', 'http://d-nb.info/gnd/')"/>
</xsl:function>

<!-- Given a link value IRI and the document root node, returns the IRI of the target resource. -->
<xsl:function name="knora-api:getTargetResourceIri" as="xs:anyURI">
<xsl:param name="linkValueIri" as="xs:anyURI"/>
<xsl:param name="documentRoot" as="item()"/>

<xsl:choose>
<xsl:when test="boolean($documentRoot//knora-api:LinkValue[@rdf:about=$linkValueIri]//beol:person)">
<!-- The target resource is nested in the LinkValue. -->
<xsl:value-of
select="$documentRoot//knora-api:LinkValue[@rdf:about=$linkValueIri]//beol:person/@rdf:about"/>
</xsl:when>
<xsl:otherwise>
<!-- The target resource is not nested in the LinkValue. -->
<xsl:value-of
select="$documentRoot//knora-api:LinkValue[@rdf:about=$linkValueIri]//knora-api:linkValueHasTarget/@rdf:resource"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>

<!-- https://www.safaribooksonline.com/library/view/xslt-cookbook/0596003722/ch03s03.html?orpq -->
<xsl:function name="knora-api:last-day-of-month" as="xs:string">
<xsl:param name="month"/>
@@ -137,10 +125,9 @@

</xsl:function>

<xsl:template match="rdf:RDF">
<xsl:variable name="resourceIri" select="beol:letter/@rdf:about"/>
<xsl:variable name="label" select="beol:letter/rdfs1:label/text()"/>

<xsl:template match="//rdf:RDF">
<xsl:variable name="resourceIri" select="//rdf:Description[./rdf:type/@rdf:resource='http://0.0.0.0:3333/ontology/0801/beol/v2#letter']/@rdf:about"/>
<xsl:variable name="label" select="//rdf:Description[@rdf:about=$resourceIri]/rdfs1:label/text()"/>

<teiHeader>
<fileDesc>
@@ -164,31 +151,31 @@
<xsl:attribute name="ref">
<xsl:value-of select="$resourceIri"/>
</xsl:attribute>
<xsl:apply-templates/>
<xsl:apply-templates select="//rdf:Description/beol:hasAuthorValue"/>
<xsl:apply-templates select="//rdf:Description/beol:hasRecipientValue"/>
</correspDesc>
</profileDesc>
</teiHeader>
</xsl:template>

<xsl:template match="beol:letter/beol:hasAuthorValue">
<xsl:template match="//rdf:Description/beol:hasAuthorValue">
<xsl:variable name="authorValueIri" select="@rdf:resource"/>
<xsl:variable name="authorIri" select="knora-api:getTargetResourceIri($authorValueIri, /.)"/>
<xsl:variable name="authorIri" select="//rdf:Description[@rdf:about=$authorValueIri]//knora-api:linkValueHasTarget/@rdf:resource"/>

<xsl:variable name="authorIAFValue"
select="//beol:person[@rdf:about=$authorIri]//beol:hasIAFIdentifier/@rdf:resource"/>
select="//rdf:Description[@rdf:about=$authorIri]//beol:hasIAFIdentifier/@rdf:resource"/>
<xsl:variable name="authorFamilyNameValue"
select="//beol:person[@rdf:about=$authorIri]//beol:hasFamilyName/@rdf:resource"/>
select="//rdf:Description[@rdf:about=$authorIri]//beol:hasFamilyName/@rdf:resource"/>
<xsl:variable name="authorGivenNameValue"
select="//beol:person[@rdf:about=$authorIri]//beol:hasGivenName/@rdf:resource"/>
select="//rdf:Description[@rdf:about=$authorIri]//beol:hasGivenName/@rdf:resource"/>

<correspAction type="sent">

<xsl:variable name="authorIAFText"
select="//knora-api:TextValue[@rdf:about=$authorIAFValue]/knora-api:valueAsString/text()"/>
select="//rdf:Description[@rdf:about=$authorIAFValue]/knora-api:valueAsString/text()"/>
<xsl:variable name="authorFamilyNameText"
select="//knora-api:TextValue[@rdf:about=$authorFamilyNameValue]/knora-api:valueAsString/text()"/>
select="//rdf:Description[@rdf:about=$authorFamilyNameValue]/knora-api:valueAsString/text()"/>
<xsl:variable name="authorGivenNameText"
select="//knora-api:TextValue[@rdf:about=$authorGivenNameValue]/knora-api:valueAsString/text()"/>
select="//rdf:Description[@rdf:about=$authorGivenNameValue]/knora-api:valueAsString/text()"/>

<persName>
<xsl:attribute name="ref">
@@ -203,32 +190,32 @@
<xsl:variable name="dateValue" select="//beol:creationDate/@rdf:resource"/>

<xsl:variable name="dateObj"
select="//knora-api:DateValue[@rdf:about=$dateValue]"/>
select="//rdf:Description[@rdf:about=$dateValue]"/>

<xsl:copy-of select="knora-api:dateformat($dateObj)"/>

</correspAction>
</xsl:template>

<xsl:template match="beol:letter/beol:hasRecipientValue">
<xsl:template match="//rdf:Description/beol:hasRecipientValue">
<xsl:variable name="recipientValueIri" select="@rdf:resource"/>
<xsl:variable name="recipientIri" select="knora-api:getTargetResourceIri($recipientValueIri, /.)"/>
<xsl:variable name="recipientIri" select="//rdf:Description[@rdf:about=$recipientValueIri]//knora-api:linkValueHasTarget/@rdf:resource"/>

<xsl:variable name="recipientIAFValue"
select="//beol:person[@rdf:about=$recipientIri]//beol:hasIAFIdentifier/@rdf:resource"/>
select="//rdf:Description[@rdf:about=$recipientIri]//beol:hasIAFIdentifier/@rdf:resource"/>
<xsl:variable name="recipientFamilyNameValue"
select="//beol:person[@rdf:about=$recipientIri]//beol:hasFamilyName/@rdf:resource"/>
select="//rdf:Description[@rdf:about=$recipientIri]//beol:hasFamilyName/@rdf:resource"/>
<xsl:variable name="recipientGivenNameValue"
select="//beol:person[@rdf:about=$recipientIri]//beol:hasGivenName/@rdf:resource"/>
select="//rdf:Description[@rdf:about=$recipientIri]//beol:hasGivenName/@rdf:resource"/>

<correspAction type="received">

<xsl:variable name="recipientIAFText"
select="//knora-api:TextValue[@rdf:about=$recipientIAFValue]/knora-api:valueAsString/text()"/>
select="//rdf:Description[@rdf:about=$recipientIAFValue]/knora-api:valueAsString/text()"/>
<xsl:variable name="recipientFamilyNameText"
select="//knora-api:TextValue[@rdf:about=$recipientFamilyNameValue]/knora-api:valueAsString/text()"/>
select="//rdf:Description[@rdf:about=$recipientFamilyNameValue]/knora-api:valueAsString/text()"/>
<xsl:variable name="recipientGivenNameText"
select="//knora-api:TextValue[@rdf:about=$recipientGivenNameValue]/knora-api:valueAsString/text()"/>
select="//rdf:Description[@rdf:about=$recipientGivenNameValue]/knora-api:valueAsString/text()"/>

<persName>
<xsl:attribute name="ref">
@@ -244,7 +231,7 @@
</xsl:template>

<!-- ignore text if there is no template for the element containing it -->
<xsl:template match="text()"></xsl:template>
<xsl:template match="text()"/>


</xsl:transform>
@@ -31,7 +31,7 @@ import ch.megard.akka.http.cors.scaladsl.settings.CorsSettings
import com.typesafe.scalalogging.LazyLogging
import kamon.Kamon
import org.knora.webapi.core.LiveActorMaker
import org.knora.webapi.exceptions.{InconsistentTriplestoreDataException, SipiException, UnexpectedMessageException, UnsupportedValueException}
import org.knora.webapi.exceptions.{InconsistentRepositoryDataException, SipiException, UnexpectedMessageException, UnsupportedValueException}
import org.knora.webapi.feature.{FeatureFactoryConfig, KnoraSettingsFeatureFactoryConfig}
import org.knora.webapi.http.handler
import org.knora.webapi.http.version.ServerVersion
@@ -143,7 +143,7 @@ class ApplicationActor extends Actor with Stash with LazyLogging with AroundDire
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: IllegalArgumentException => Stop
case e: InconsistentTriplestoreDataException =>
case e: InconsistentRepositoryDataException =>
logger.info(s"Received an 'InconsistentRepositoryDataException', will shut down now. Cause: {}", e.message)
Stop
case e: SipiException =>
@@ -552,6 +552,11 @@ class ApplicationActor extends Actor with Stash with LazyLogging with AroundDire
msg += s"DSP-API Server started: http://${knoraSettings.internalKnoraApiHost}:${knoraSettings.internalKnoraApiPort}\n"
msg += "------------------------------------------------\n"

defaultFeatureFactoryConfig.makeToggleSettingsString match {
case Some(toggleSettingsString) => msg += s"Default feature toggle settings: $toggleSettingsString\n"
case None => ()
}

if (allowReloadOverHTTPState | knoraSettings.allowReloadOverHTTP) {
msg += "WARNING: Resetting DB over HTTP is turned ON.\n"
msg += "------------------------------------------------\n"
@@ -324,15 +324,15 @@ object TriplestoreResponseException {
}

/**
* Indicates that the triplestore returned inconsistent data.
* Indicates an inconsistency in repository data.
*
* @param message a description of the error.
*/
case class InconsistentTriplestoreDataException(message: String, cause: Option[Throwable] = None) extends TriplestoreException(message, cause)
case class InconsistentRepositoryDataException(message: String, cause: Option[Throwable] = None) extends InternalServerException(message, cause)

object InconsistentTriplestoreDataException {
def apply(message: String, e: Throwable, log: LoggingAdapter): InconsistentTriplestoreDataException =
InconsistentTriplestoreDataException(message, Some(ExceptionUtil.logAndWrapIfNotSerializable(e, log)))
object InconsistentRepositoryDataException {
def apply(message: String, e: Throwable, log: LoggingAdapter): InconsistentRepositoryDataException =
InconsistentRepositoryDataException(message, Some(ExceptionUtil.logAndWrapIfNotSerializable(e, log)))
}

/**
@@ -227,9 +227,9 @@ abstract class FeatureFactoryConfig(protected val maybeParent: Option[FeatureFac
protected[feature] def getLocalConfig(featureName: String): Option[FeatureToggle]

/**
* Returns an [[HttpHeader]] giving the state of all feature toggles.
* Returns a string giving the state of all feature toggles.
*/
def makeHttpResponseHeader: Option[HttpHeader] = {
def makeToggleSettingsString: Option[String] = {
// Convert each toggle to its string representation.
val enabledToggles: Set[String] = getAllBaseConfigs.map {
baseConfig: FeatureToggleBaseConfig =>
@@ -246,13 +246,22 @@ abstract class FeatureFactoryConfig(protected val maybeParent: Option[FeatureFac
// Are any toggles enabled?
if (enabledToggles.nonEmpty) {
// Yes. Return the toggle settings as a string.
Some(RawHeader(FeatureToggle.RESPONSE_HEADER, enabledToggles.mkString(",")))
Some(enabledToggles.mkString(","))
} else {
// No. Don't return a string.
None
}
}

/**
* Returns an [[HttpHeader]] giving the state of all feature toggles.
*/
def makeHttpResponseHeader: Option[HttpHeader] = {
makeToggleSettingsString.map {
settingsStr: String => RawHeader(FeatureToggle.RESPONSE_HEADER, settingsStr)
}
}

/**
* Adds an [[HttpHeader]] to an [[HttpResponse]] indicating which feature toggles are enabled.
*/
3 changes: 3 additions & 0 deletions webapi/src/main/scala/org/knora/webapi/messages/BUILD.bazel
@@ -34,6 +34,9 @@ scala_library(
"@maven//:org_apache_commons_commons_text",
"@maven//:org_apache_jena_apache_jena_libs",
"@maven//:org_eclipse_rdf4j_rdf4j_client",
"@maven//:org_eclipse_rdf4j_rdf4j_repository_sail",
"@maven//:org_eclipse_rdf4j_rdf4j_sail_api",
"@maven//:org_eclipse_rdf4j_rdf4j_sail_memory",
"@maven//:org_jodd_jodd",
"@maven//:org_scala_lang_modules_scala_xml_2_12",
"@maven//:org_scala_lang_scala_library",