refactor(triplestore): remove embedded-jena-tdb related code (#2043)
* remove embedded-jena-tdb related code

* remove TriplestoreTypes

* remove triplestore param from sparql queries in admin

* remove triplestore param from sparql queries in v1

* remove triplestore param from sparql queries in v2

* remove more triplestore params

* minor improvements

* fix: cache calls twirl template instead of resolving it, when trying to build cache from triplestore

* fix: more wrong twirl template calls

* docs: remove Jena TDB from docs

Co-authored-by: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
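
To illustrate the main call-site change in this commit: the Twirl template functions no longer take a `triplestore` argument. The following is a minimal sketch under stated assumptions, not the project's generated code. `TemplateCallSketch`, the query text, and the example IRI are hypothetical stand-ins; only the function name `isClassUsedInData` and its `classIri` parameter are taken from the diff.

```scala
object TemplateCallSketch {

  // Hypothetical stand-in for the Twirl-generated function `isClassUsedInData`.
  // The query text is illustrative only and is NOT the project's actual template.
  // Note that there is no `triplestore` parameter any more.
  def isClassUsedInData(classIri: String): String =
    s"""SELECT ?s
       |WHERE { ?s a <$classIri> . }
       |LIMIT 1""".stripMargin

  def main(args: Array[String]): Unit = {
    // Before this commit, call sites also passed
    // `triplestore = settings.triplestoreType`; now only the IRI is needed.
    val sparql = isClassUsedInData(classIri = "http://example.org/ontology#Thing")
    println(sparql)
  }
}
```

Because only one triplestore is supported, a template has a single rendering for a given set of arguments, which is why the extra parameter could be dropped from every call site.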
mpro7 and BalduinLandolt committed Apr 25, 2022
1 parent bdc4f39 commit a5ea62e
Showing 180 changed files with 622 additions and 1,766 deletions.
9 changes: 2 additions & 7 deletions docs/05-internals/design/principles/design-overview.md
@@ -279,19 +279,14 @@ SPARQL queries are generated from templates, using the
[Twirl](https://github.com/playframework/twirl) template engine. For
example, if we're querying a resource, the template will contain a
placeholder for the resource's IRI. The templates can be found under
`src/main/twirl/queries/sparql`. In many cases, different SPARQL must
be generated for different triplestores; the Twirl template function
then takes the name of the triplestore as a parameter, and may delegate
to triplestore-specific templates.
`src/main/twirl/queries/sparql`.

Responders are not expected to know which triplestore is being used or how it
is accessed. To perform a SPARQL SELECT query, a responder sends a `SparqlSelectRequest`
To perform a SPARQL SELECT query, a responder sends a `SparqlSelectRequest`
message to the `storeManager` actor, like this:

```scala
for {
isEntityUsedSparql <- Future(queries.sparql.v2.txt.isEntityUsed(
triplestore = settings.triplestoreType,
entityIri = entityIri,
ignoreKnoraConstraints = ignoreKnoraConstraints,
ignoreRdfSubjectAndObject = ignoreRdfSubjectAndObject
169 changes: 9 additions & 160 deletions docs/05-internals/design/principles/store-module.md
@@ -7,6 +7,9 @@

## Overview

**GraphDB and embedded Jena TDB triplestores support is deprecated** since
[v20.1.1](https://github.com/dasch-swiss/dsp-api/releases/tag/v20.1.1) of DSP-API.

The store module houses the different types of data stores supported by
Knora. At the moment, only triplestores and IIIF servers (Sipi) are supported.
The triplestore support is implemented in the
@@ -20,170 +23,16 @@ which is started when Knora starts. The `StoreManager` then starts the
`TriplestoreManager` and `IIIFManager`, which each in turn starts their
correct actor implementation.

## HTTP-based Triplestores

HTTP-based triplestore support is implemented in the
`org.knora.webapi.triplestore.http` package.

An HTTP-based triplestore is one that is accessed remotely over the HTTP
protocol. `HttpTriplestoreConnector` supports the open source triplestore
- [Apache Jena Fuseki](https://jena.apache.org).

### Apache Jena Fuseki

## Embedded Triplestores

Embedded triplestores are implemented in the
`org.knora.webapi.triplestore.embedded` package.

An embedded triplestore is one that runs in the same JVM as the Knora
API server.

### Apache Jena TDB

The support for embedded Jena TDB is currently dropped. The
documentation and the code will remain in the repository. You can use it
at your own risk.

The support for the embedded Jena-TDB triplestore is implemented in
`org.knora.webapi.triplestore.embedded.JenaTDBActor`.

The relevant Jena libraries that are used are the following:

- Jena API - The library used to work programmatically with RDF data
- Jena TDB - Their implementation of a triple store

#### Concurrency

Jena provides concurrency on different levels.

On the Jena TDB level there is the `Dataset` object, representing the
triple store. On every access, a transaction (read or write) can be
started.

On the Jena API level there is a `Model` object, which is equivalent to
an RDF `Graph`. Here we can lock the model, so that MRSW (Multiple
Reader Single Writer) access is allowed.

- <https://jena.apache.org/documentation/tdb/tdb_transactions.html>
- <https://jena.apache.org/documentation/notes/concurrency-howto.html>
## Triplestores

#### Implementation
Currently, the only supported triplestore is [Apache Jena Fuseki](https://jena.apache.org), an HTTP-based triplestore.

We employ transactions on the `Dataset` level. This means that every
thread that accesses the triplestore, starts a read or write enabled
transaction.
HTTP-based triplestore support is implemented in the `org.knora.webapi.triplestore.http` package.

The transaction mechanism in TDB is based on write-ahead-logging. All
changes made inside a write-transaction are written to journals, then
propagated to the main database at a suitable moment. This design allows
for read-transactions to proceed without locking or other overhead over
the base database.

Transactional TDB supports one active write transaction, and multiple
read transactions at the same time. Read-transactions started before a
write-transaction commits see the database in a state without any
changes visible. Any transaction starting after a write-transaction
commits sees the database with the changes visible, whether fully
propagates back to the database or not. There can be active read
transactions seeing the state of the database before the updates, and
read transactions seeing the state of the database after the updates
running at the same time.

#### Configuration

In `application.conf` set to use the embedded triplestore:

```
triplestore {
dbtype = "embedded-jena-tdb"
embedded-jena-tdb {
persisted = true // "false" -> memory, "true" -> disk
loadExistingData = false // "false" -> use data if exists, "false" -> create a fresh store
storage-path = "_TMP" // ignored if "memory"
}
reload-on-start = false // ignored if "memory" as it will always reload
rdf-data = [
{
path = "knora-ontologies/knora-base.ttl"
name = "http://www.knora.org/ontology/knora-base"
}
{
path = "knora-ontologies/salsah-gui.ttl"
name = "http://www.knora.org/ontology/salsah-gui"
}
{
path = "test_data/ontologies/incunabula-onto.ttl"
name = "http://www.knora.org/ontology/0803/incunabula"
}
{
path = "test_data/demo_data/incunabula-demo-data.ttl"
name = "http://www.knora.org/data/incunabula"
}
{
path = "test_data/ontologies/images-onto.ttl"
name = "http://www.knora.org/ontology/0804/dokubib"
}
{
path = "test_data/demo_data/images-demo-data.ttl"
name = "http://www.knora.org/data/dokubib"
}
]
}
```

Here the storage is set to `persistent`, meaning that a Jena TDB store
will be created under the defined `tdb-storage-path`. The
`reload-on-start` flag, if set to `true` would reload the triplestore
with the data referenced in `rdf-data`.

#### TDB Disk Persisted Store

Make sure to set `reload-on-start` to `true` if run for the first time.
This will create a TDB store and load the data.

If only *read access* is performed, then Knora can be run once with
reloading enabled. After that, reloading can be turned off, and the
persisted TDB store can be reused, as any data found under the
`tdb-storage-path` will be reused.

If the TDB storage files get corrupted, then just delete the folder and
reload the data anew.

#### Actor Messages

- `ResetTripleStoreContent(rdfDataObjects: List[RdfDataObject])`
- `ResetTripleStoreContentACK()`

The embedded Jena TDB can receive reset messages, and will ACK when
reloading of the data is finished. `RdfDataObject` is a simple case
class, containing the path and name (the same as `rdf-data` in the
config file)

As an example, to use it inside a test you could write something like:

```scala
val rdfDataObjects = List (
RdfDataObject(path = "knora-ontologies/knora-base.ttl",
name = "http://www.knora.org/ontology/knora-base"),
RdfDataObject(path = "knora-ontologies/salsah-gui.ttl",
name = "http://www.knora.org/ontology/salsah-gui"),
RdfDataObject(path = "test_data/ontologies/incunabula-onto.ttl",
name = "http://www.knora.org/ontology/0803/incunabula"),
RdfDataObject(path = "test_data/all_data/incunabula-data.ttl",
name = "http://www.knora.org/data/incunabula")
)
An HTTP-based triplestore is one that is accessed remotely over the HTTP
protocol. `HttpTriplestoreConnector` supports the open source triplestore [Apache Jena Fuseki](https://jena.apache.org).

"Reload data " in {
storeManager ! ResetTripleStoreContent(rdfDataObjects)
expectMsg(300.seconds, ResetTripleStoreContentACK())
}
```

## IIIF Servers

Currently, only support for SIPI is implemented in
`org.knora.webapi.store.iiifSipiConnector`.
Currently, only support for SIPI is implemented in `org.knora.webapi.store.iiifSipiConnector`.
2 changes: 1 addition & 1 deletion docs/05-internals/development/overview.md
@@ -24,7 +24,7 @@ installation of Knora. The different parts are:
A number of triplestore implementations are available, including [free
software](http://www.gnu.org/philosophy/free-sw.en.html) as well as
proprietary options. DSP-API is designed to work with any
standards-compliant triplestore. It is primarily tested with [Apache Jena](https://jena.apache.org/).
standards-compliant triplestore. It is primarily tested with [Apache Jena Fuseki](https://jena.apache.org/).

## Sipi

12 changes: 0 additions & 12 deletions webapi/src/main/resources/application.conf
@@ -510,7 +510,6 @@ app {
triplestore {
dbtype = "fuseki"
dbtype = ${?KNORA_WEBAPI_TRIPLESTORE_DBTYPE}
// dbtype = "embedded-jena-tdb"
// dbtype = "fake-triplestore"

use-https = false
@@ -540,17 +539,6 @@ app {
password = ${?KNORA_WEBAPI_TRIPLESTORE_FUSEKI_PASSWORD}
}

embedded-jena-tdb {
persisted = true // "false" -> memory, "true" -> disk
loadExistingData = false // "false" -> use data if exists, "false" -> create a fresh store
storage-path = "_TMP" // ignored if "memory"
}

fake-jena-tdb {
fake-persisted-storage = true
fake-triplestore-data-dir = "src/main/resources/query-log"
}

reload-on-start = false // ignored if "memory" as it will always reload

# This data is automatically loaded during resetting of the triple store content initiated
24 changes: 13 additions & 11 deletions webapi/src/main/scala/org/knora/webapi/responders/Responder.scala
@@ -6,23 +6,27 @@
package org.knora.webapi
package responders

import exceptions.{BadRequestException, DuplicateValueException, UnexpectedMessageException}
import messages.store.triplestoremessages.SparqlSelectRequest
import messages.util.ResponderData
import messages.util.rdf.SparqlSelectResult
import messages.{SmartIri, StringFormatter}
import settings.{KnoraDispatchers, KnoraSettings, KnoraSettingsImpl}
import akka.actor.{ActorRef, ActorSystem}
import akka.actor.ActorRef
import akka.actor.ActorSystem
import akka.event.LoggingAdapter
import akka.http.scaladsl.util.FastFuture
import akka.pattern._
import akka.util.Timeout
import com.typesafe.scalalogging.{LazyLogging, Logger}
import com.typesafe.scalalogging.LazyLogging
import com.typesafe.scalalogging.Logger
import org.knora.webapi.store.cacheservice.settings.CacheServiceSettings

import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.ExecutionContext
import scala.concurrent.Future
import scala.language.postfixOps

import exceptions.{BadRequestException, DuplicateValueException, UnexpectedMessageException}
import messages.store.triplestoremessages.SparqlSelectRequest
import messages.util.ResponderData
import messages.util.rdf.SparqlSelectResult
import messages.{SmartIri, StringFormatter}
import settings.{KnoraDispatchers, KnoraSettings, KnoraSettingsImpl}

/**
* Responder helper methods.
*/
@@ -118,7 +122,6 @@ abstract class Responder(responderData: ResponderData) extends LazyLogging {
isEntityUsedSparql <- Future(
org.knora.webapi.messages.twirl.queries.sparql.v2.txt
.isEntityUsed(
triplestore = settings.triplestoreType,
entityIri = entityIri,
ignoreKnoraConstraints = ignoreKnoraConstraints,
ignoreRdfSubjectAndObject = ignoreRdfSubjectAndObject
@@ -145,7 +148,6 @@ abstract class Responder(responderData: ResponderData) extends LazyLogging {
isClassUsedInDataSparql <- Future(
org.knora.webapi.messages.twirl.queries.sparql.v2.txt
.isClassUsedInData(
triplestore = settings.triplestoreType,
classIri = classIri
)
.toString()
@@ -11,17 +11,22 @@ import org.knora.webapi._
import org.knora.webapi.exceptions._
import org.knora.webapi.feature.FeatureFactoryConfig
import org.knora.webapi.messages.IriConversions._
import org.knora.webapi.messages.OntologyConstants
import org.knora.webapi.messages.SmartIri
import org.knora.webapi.messages.admin.responder.groupsmessages._
import org.knora.webapi.messages.admin.responder.projectsmessages.{ProjectADM, ProjectGetADM, ProjectIdentifierADM}
import org.knora.webapi.messages.admin.responder.projectsmessages.ProjectADM
import org.knora.webapi.messages.admin.responder.projectsmessages.ProjectGetADM
import org.knora.webapi.messages.admin.responder.projectsmessages.ProjectIdentifierADM
import org.knora.webapi.messages.admin.responder.usersmessages._
import org.knora.webapi.messages.admin.responder.valueObjects.GroupStatus
import org.knora.webapi.messages.store.triplestoremessages._
import org.knora.webapi.messages.util.KnoraSystemInstances
import org.knora.webapi.messages.util.ResponderData
import org.knora.webapi.messages.util.rdf.SparqlSelectResult
import org.knora.webapi.messages.util.{KnoraSystemInstances, ResponderData}
import org.knora.webapi.messages.v1.responder.projectmessages._
import org.knora.webapi.messages.{OntologyConstants, SmartIri}
import org.knora.webapi.responders.IriLocker
import org.knora.webapi.responders.Responder
import org.knora.webapi.responders.Responder.handleUnexpectedMessage
import org.knora.webapi.responders.{IriLocker, Responder}

import java.util.UUID
import scala.concurrent.Future
@@ -88,7 +93,6 @@ class GroupsResponderADM(responderData: ResponderData) extends Responder(respond
sparqlQuery <- Future(
org.knora.webapi.messages.twirl.queries.sparql.admin.txt
.getGroups(
triplestore = settings.triplestoreType,
maybeIri = None
)
.toString()
@@ -210,7 +214,6 @@ class GroupsResponderADM(responderData: ResponderData) extends Responder(respond
sparqlQuery <- Future(
org.knora.webapi.messages.twirl.queries.sparql.admin.txt
.getGroups(
triplestore = settings.triplestoreType,
maybeIri = Some(groupIri)
)
.toString()
@@ -326,7 +329,6 @@ class GroupsResponderADM(responderData: ResponderData) extends Responder(respond
sparqlQueryString <- Future(
org.knora.webapi.messages.twirl.queries.sparql.v1.txt
.getGroupMembersByIri(
triplestore = settings.triplestoreType,
groupIri
)
.toString()
@@ -461,7 +463,6 @@ class GroupsResponderADM(responderData: ResponderData) extends Responder(respond
createNewGroupSparqlString = org.knora.webapi.messages.twirl.queries.sparql.admin.txt
.createNewGroup(
adminNamedGraphIri = OntologyConstants.NamedGraphs.AdminNamedGraph,
triplestore = settings.triplestoreType,
groupIri,
groupClassIri = OntologyConstants.KnoraAdmin.UserGroup,
name = createRequest.name.value,
@@ -726,7 +727,6 @@ class GroupsResponderADM(responderData: ResponderData) extends Responder(respond
org.knora.webapi.messages.twirl.queries.sparql.admin.txt
.updateGroup(
adminNamedGraphIri = "http://www.knora.org/data/admin",
triplestore = settings.triplestoreType,
groupIri,
maybeName = groupUpdatePayload.name.map(_.value),
maybeDescriptions = groupUpdatePayload.descriptions.map(_.value),