feat(triplestores): Support Apache Jena Fuseki #1375

Merged 73 commits from wip/1374-fuseki into develop on Apr 2, 2020

Commits
9e844a2
feat(triplestores): Initial config for attempt to use Fuseki with Tom…
Jul 15, 2019
6c29d4d
feat: Fix things for Fuseki (ongoing).
Jul 16, 2019
3b3aa92
feat: Fix things for Fuseki (ongoing).
Jul 16, 2019
50330af
feat: Fix things for Fuseki (ongoing).
Jul 16, 2019
912b377
feat: Fix things for Fuseki (ongoing).
Jul 17, 2019
b6ab504
feat: Fix and optimise things for Fuseki.
Jul 17, 2019
b4a2a29
feat: Fix and optimise more things for Fuseki.
Jul 18, 2019
5ab21c2
feat: Fix more things for Fuseki.
Jul 18, 2019
f5e6f48
fix: Fix fix.
Jul 18, 2019
9e0b0f9
Fix: fix fix.
Jul 18, 2019
3a2149a
fix(knora-ontologies): Remove extra rdfs:comment.
Jul 18, 2019
54b8d48
style(fuseki): Add comments.
Jul 18, 2019
7292e3e
Merge branch 'develop' into wip/1374-fuseki
Jul 19, 2019
9b89a34
Merge branch 'develop' into wip/1374-fuseki
Jul 19, 2019
5f8a8d9
fix(fuseki): Get value permissions correctly with Fuseki.
Jul 19, 2019
8cf5d20
Merge branch 'develop' into wip/1374-fuseki
Jul 19, 2019
37fb633
Merge branch 'develop' into wip/1374-fuseki
Aug 13, 2019
54340d0
Merge branch 'develop' into wip/1374-fuseki
Aug 13, 2019
3c439a8
Merge branch 'develop' into wip/1374-fuseki
Aug 19, 2019
804543a
Merge branch 'develop' into wip/1374-fuseki
Aug 26, 2019
db8db5d
Merge branch 'develop' into wip/1374-fuseki
Aug 27, 2019
c228098
Merge branch 'develop' into wip/1374-fuseki
Aug 29, 2019
b717dee
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
e9f213d
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
b9da9c7
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
5d658a5
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
69262dd
Merge branch 'develop' into wip/1374-fuseki
Sep 9, 2019
bc2d74a
Merge branch 'develop' into wip/1374-fuseki
Sep 10, 2019
053ea52
Merge branch 'develop' into wip/1374-fuseki
Sep 12, 2019
99c072d
Merge branch 'develop' into wip/1374-fuseki
Sep 26, 2019
0f1a44c
Merge branch 'develop' into wip/1374-fuseki
Oct 8, 2019
4236ee3
Merge branch 'develop' into wip/1374-fuseki
Oct 18, 2019
bf4253f
Merge branch 'develop' into wip/1374-fuseki
Oct 21, 2019
366017e
Merge branch 'develop' into wip/1374-fuseki
Oct 22, 2019
bce43dc
Merge branch 'develop' into wip/1374-fuseki
Oct 23, 2019
7541121
Merge branch 'develop' into wip/1374-fuseki
Nov 4, 2019
5e9c550
Merge branch 'develop' into wip/1374-fuseki
Nov 5, 2019
283f60d
Merge branch 'develop' into wip/1374-fuseki
Nov 8, 2019
8a400d9
Merge branch 'develop' into wip/1374-fuseki
Nov 14, 2019
f7efe5e
Merge branch 'develop' into wip/1374-fuseki
Nov 15, 2019
666ffd8
Merge branch 'develop' into wip/1374-fuseki
Nov 15, 2019
3a7e799
Merge branch 'develop' into wip/1374-fuseki
Nov 19, 2019
7f21b78
Merge branch 'develop' into wip/1374-fuseki
Nov 19, 2019
cf8eb70
Merge branch 'develop' into wip/1374-fuseki
Nov 27, 2019
fdfeae9
Merge branch 'develop' into wip/1374-fuseki
Nov 28, 2019
692196a
Merge branch 'develop' into wip/1374-fuseki
Nov 28, 2019
1ece065
Merge branch 'develop' into wip/1374-fuseki
Dec 2, 2019
664cd36
Merge branch 'develop' into wip/1374-fuseki
Dec 3, 2019
323ee1a
test: Fix test script.
Dec 3, 2019
0f7539e
Merge branch 'develop' into wip/1374-fuseki
Dec 17, 2019
27560dc
test: Fix test.
Dec 17, 2019
4b33566
Merge branch 'develop' into wip/1374-fuseki
Dec 17, 2019
627f504
Merge branch 'develop' into wip/1374-fuseki
Jan 2, 2020
5fbb8ed
Merge branch 'develop' into wip/1374-fuseki
Feb 4, 2020
dd32e52
Merge branch 'develop' into wip/1374-fuseki
Feb 5, 2020
90ec7df
Merge branch 'develop' into wip/1374-fuseki
Feb 5, 2020
7225d87
Merge branch 'develop' into wip/1374-fuseki
Mar 9, 2020
4642954
fix(gravsearch): Check for unbound variables in GROUP_CONCAT.
Mar 9, 2020
bcec680
fix(gravsearch): Fix variable name conflict when expanding statement …
Mar 10, 2020
54b4ff3
Merge branch 'develop' into wip/1374-fuseki
Mar 10, 2020
864fe13
feat(gravsearch): Support Lucene in Fuseki (ongoing).
Mar 10, 2020
d0230d6
Merge branch 'develop' into wip/1374-fuseki
Mar 11, 2020
6c9c922
feature(gravsearch): Support Lucene in Fuseki (ongoing).
Mar 13, 2020
5496dab
feat(gravsearch): Support full-text search with Fuseki.
Mar 16, 2020
a9daf8d
fix(api-v2): Fix search by label.
Mar 17, 2020
0f9a390
fix(api-v2): Add missing changes from last commit.
Mar 17, 2020
981ef43
fix(fuseki): Support virtual property knora-base:targetHasOriginalXMLID.
Mar 18, 2020
97fbbe0
test(triplestore): Fix IRI in named graph query in test.
Mar 18, 2020
c60e328
fix(fuseki): Fix erase resource.
Mar 18, 2020
b377bdf
feat(upgrade): Support upgrading a Fuseki repository.
Mar 19, 2020
18f8417
docs: Document how inference is used/implemented.
Mar 23, 2020
44feb70
docs: Update API and design docs.
Mar 25, 2020
4fd2c20
test (fuseki): update fuseki config for integration tests
subotic Apr 2, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -58,4 +58,5 @@ knora-graphdb-free
knora-graphdb-se
knora-sipi
knora-upgrade
triplestores/fuseki-tomcat/system
dump.rdb
42 changes: 21 additions & 21 deletions docs/src/paradox/03-apis/api-v2/query-language.md
@@ -192,12 +192,12 @@ a matching dependent resource, only its IRI is returned.

## Inference

Gravsearch queries are understood to imply a subset of
[RDFS reasoning](https://www.w3.org/TR/rdf11-mt/). Depending on the
triplestore being used, this may be implemented using the triplestore's
own reasoner or by query expansion in Knora.

Specifically, if a statement pattern specifies a property, the pattern will
also match subproperties of that property, and if a statement specifies that
a subject has a particular `rdf:type`, the statement will also match subjects
belonging to subclasses of that type.
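
For instance, a pattern such as the following (a sketch using example ontology
entities that also appear later in this document) matches more than its
explicit form suggests:

```
# Also matches instances of any subclass of beol:letter.
?letter a beol:letter .

# Also matches statements whose predicate is a subproperty of beol:hasText.
?letter beol:hasText ?text .
```
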
@@ -352,31 +352,28 @@ text markup (see @ref:[Matching Standoff Dates](#matching-standoff-dates)).

#### Searching for Matching Words

The function `knora-api:matchText` searches for matching words anywhere in a
text value, and is implemented using a full-text search index if available.
The first argument must represent a text value (a `knora-api:TextValue` in
the complex schema, or an `xsd:string` in the simple schema). The second
argument is a string literal containing the words to be matched, separated by spaces.
The function supports the
@ref:[Lucene Query Parser syntax](../../08-lucene/index.md).
Note that Lucene's default operator is a logical OR when submitting several search terms.

This function can only be used as the top-level expression in a `FILTER`.

For example, to search for titles that contain the words 'Zeitglöcklein' and
'Lebens':

```
?book incunabula:title ?title .
FILTER knora-api:matchText(?title, "Zeitglöcklein Lebens")
```
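
Since Lucene's default operator is OR, a search that should require both words
can use an explicit operator (a sketch relying on the Lucene Query Parser
syntax linked above):

```
?book incunabula:title ?title .
FILTER knora-api:matchText(?title, "Zeitglöcklein AND Lebens")
```
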
Note: the `knora-api:match` function has been deprecated, and will no longer work in
a future release of Knora. Please change your Gravsearch queries to use `knora-api:matchText`
instead. Attention: the first argument is different.

#### Filtering Text by Language

Expand Down Expand Up @@ -426,11 +423,11 @@ tags in the text. You can match the tags you're interested in using

#### Matching Text in a Standoff Tag

The function `knora-api:matchTextInStandoff` searches for standoff tags containing certain terms.
The implementation is optimised using the full-text search index if available. The
function takes three arguments:

1. A variable representing a text value.
2. A variable representing a standoff tag.
3. A string literal containing space-separated search terms.

@@ -448,16 +445,19 @@ CONSTRUCT {
} WHERE {
?letter a beol:letter .
?letter beol:hasText ?text .
?text knora-api:textValueHasStandoff ?standoffParagraphTag .
?standoffParagraphTag a standoff:StandoffParagraphTag .
FILTER knora-api:matchTextInStandoff(?text, ?standoffParagraphTag, "Grund Richtigkeit")
}
```

Here we are looking for letters containing the words "Grund" and "Richtigkeit"
within a single paragraph.

Note: the `knora-api:matchInStandoff` function has been deprecated, and will no longer
work in a future release of Knora. Please change your Gravsearch queries to use
`knora-api:matchTextInStandoff` instead. Attention: the first argument is different.

#### Matching Standoff Links

If you are only interested in specifying that a resource has some text
@@ -547,6 +547,9 @@ This is useful only if the project does not contain a large amount of data;
otherwise, you should use @ref:[Gravsearch](query-language.md) to search
using more specific criteria.

The specified class and property are used without inference; they will not
match subclasses or subproperties.

The HTTP header `X-Knora-Accept-Project` must be submitted; its value is
a Knora project IRI. In the request URL, the values of `resourceClass` and `orderByProperty`
are URL-encoded IRIs in the @ref:[complex schema](introduction.md#api-schema).
60 changes: 47 additions & 13 deletions docs/src/paradox/05-internals/design/api-v2/gravsearch.md
@@ -188,15 +188,16 @@ The resulting SELECT clause of the prequery looks as follows:
```sparql
SELECT DISTINCT
?page
(GROUP_CONCAT(DISTINCT(IF(BOUND(?book), STR(?book), "")); SEPARATOR='') AS ?book__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?seqnum), STR(?seqnum), "")); SEPARATOR='') AS ?seqnum__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?book__LinkValue), STR(?book__LinkValue), "")); SEPARATOR='') AS ?book__LinkValue__Concat)
WHERE {...}
GROUP BY ?page
ORDER BY ASC(?page)
LIMIT 25
```

`?page` represents the main resource. When accessing the prequery's result rows, `?page` contains the IRI of the main resource.
The prequery's results are grouped by the main resource so that there is exactly one result row per matching main resource.
`?page` is also used as a sort criterion although none has been defined in the input query.
This is necessary to make paging work: results always have to be returned in the same order (the prequery is always deterministic).
@@ -205,17 +206,23 @@ Like this, results can be fetched page by page using LIMIT and OFFSET.
Grouping by main resource requires other results to be aggregated using the function `GROUP_CONCAT`.
`?book` is used as an argument of the aggregation function.
The aggregation's result is accessible in the prequery's result rows as `?book__Concat`.
The variable `?book` is bound to an IRI.
Since more than one IRI could be bound to a variable representing a dependent resource, the results have to be aggregated.
`GROUP_CONCAT` takes two arguments: a collection of strings (IRIs in our use case) and a separator
(we use the non-printing Unicode character `INFORMATION SEPARATOR ONE`).
When accessing `?book__Concat` in the prequery's results containing the IRIs of dependent resources, the string has to be split with the separator used in the aggregation function.
The result is a collection of IRIs representing dependent resources.
The same logic applies to value objects.

Each `GROUP_CONCAT` checks whether the concatenated variable is bound in each result in the group; if a variable
is unbound, we concatenate an empty string. This is necessary because, in Apache Jena (and perhaps other
triplestores), "If `GROUP_CONCAT` has an unbound value in the list of values to concat, the overall result is 'error'"
(see [this Jena issue](https://issues.apache.org/jira/browse/JENA-1856)).
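
A minimal standalone illustration of the guard (the IRIs and variable names
here are hypothetical, not part of Knora's generated queries):

```sparql
SELECT ?s (GROUP_CONCAT(DISTINCT(IF(BOUND(?v), STR(?v), "")); SEPARATOR="|") AS ?vConcat)
WHERE {
  ?s a <http://example.org/Thing> .
  # ?v is unbound for subjects without a value; without the IF(BOUND(...)) guard,
  # Jena would report an error for the whole aggregate.
  OPTIONAL { ?s <http://example.org/hasValue> ?v . }
}
GROUP BY ?s
```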

### Main Query

The purpose of the main query is to get all requested information about the main resource, dependent resources, and value objects.
The IRIs of those resources and value objects were returned by the prequery.
Since the prequery only returns resources and value objects matching the input query's criteria,
the main query can specifically ask for more detailed information on these resources and values without having to reconsider these criteria.

@@ -225,8 +232,8 @@ The classes involved in generating prequeries can be found in `org.knora.webapi.

The main query is a SPARQL CONSTRUCT query. Its generation is handled by the method `GravsearchMainQueryGenerator.createMainQuery`.
It takes three arguments: `mainResourceIris: Set[IriRef], dependentResourceIris: Set[IriRef], valueObjectIris: Set[IRI]`.
From the given IRIs, statements are generated that ask for complete information on *exactly* these resources and values.
For any given resource IRI, only the values present in `valueObjectIris` are to be queried.
This is achieved by using SPARQL's `VALUES` expression for the main resource and dependent resources as well as for values.

#### Processing the Main Query's results
Expand All @@ -237,7 +244,7 @@ The method `getMainQueryResultsWithFullGraphPattern` takes the main query's resu
A main resource and its dependent resources and values are only returned if the user has view permissions on all the resources and value objects present in the main query.
Otherwise the method suppresses the main resource.
To do the permission checking, the results of the main query are passed to `ConstructResponseUtilV2` which transforms a `SparqlConstructResponse` (a set of RDF triples)
into a structure organized by main resource IRIs. In this structure, dependent resources and values are nested and can be accessed via their main resource.
`SparqlConstructResponse` suppresses all resources and values the user has insufficient permissions on.
For each main resource, a check is performed for the presence of all resources and values after permission checking.

@@ -247,3 +254,30 @@ All the resources and values not present in the input query's CONSTRUCT clause a
The main resources that have been filtered out due to insufficient permissions are represented by the placeholder `ForbiddenResource`.
This placeholder stands for a main resource that cannot be returned; nevertheless, it informs the client that such a resource exists.
This is necessary for a consistent behaviour when doing paging.

## Inference

Gravsearch queries support a subset of RDFS reasoning
(see @ref:[Inference](../../../03-apis/api-v2/query-language.md#inference) in the API documentation
on Gravsearch). This is implemented as follows:

When the non-triplestore-specific version of a SPARQL query is generated, statements that do not need
inference are marked with the virtual named graph `<http://www.knora.org/explicit>`.

When the triplestore-specific version of the query is generated:

- If the triplestore is GraphDB, `SparqlTransformer.transformKnoraExplicitToGraphDBExplicit` changes statements
with the virtual graph `<http://www.knora.org/explicit>` so that they are marked with the GraphDB-specific graph
`<http://www.ontotext.com/explicit>`, and leaves other statements unchanged.

- If Knora is not using the triplestore's inference (e.g. with Fuseki),
`SparqlTransformer.expandStatementForNoInference` removes `<http://www.knora.org/explicit>`, and expands unmarked
statements using `rdfs:subClassOf*` and `rdfs:subPropertyOf*`, as sketched below.
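
A sketch of what this expansion produces for a single statement pattern (the
class and variable names are only illustrative):

```sparql
# Input statement pattern (inference implied):
?letter a beol:letter .

# Expanded form generated for a triplestore without inference:
?letterType rdfs:subClassOf* beol:letter .
?letter a ?letterType .
```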

Gravsearch also provides some virtual properties, which take advantage of forward-chaining inference
as an optimisation if the triplestore provides it. For example, the virtual property
`knora-api:standoffTagHasStartAncestor` is equivalent to `knora-base:standoffTagHasStartParent*`, but
with GraphDB it is implemented using a custom inference rule (in `KnoraRules.pie`) and is therefore more
efficient. If Knora is not using the triplestore's inference,
`SparqlTransformer.transformStatementInWhereForNoInference` replaces `knora-api:standoffTagHasStartAncestor`
with `knora-base:standoffTagHasStartParent*`.
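
As a sketch (the variable names are illustrative), the rewrite amounts to
replacing the virtual property with the equivalent property path:

```sparql
# With a triplestore that provides the custom inference rule (GraphDB):
?standoffTag knora-api:standoffTagHasStartAncestor ?ancestorTag .

# Rewritten for a triplestore without that inference (e.g. Fuseki):
?standoffTag knora-base:standoffTagHasStartParent* ?ancestorTag .
```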
93 changes: 93 additions & 0 deletions docs/src/paradox/05-internals/design/api-v2/query-design.md
@@ -21,6 +21,99 @@ License along with Knora. If not, see <http://www.gnu.org/licenses/>.

@@toc

## Inference

Knora does not require the triplestore to perform inference, but may be able
to take advantage of inference if the triplestore provides it.

In particular, Knora's SPARQL queries currently need to do the following:

- Given a base property, find triples using a subproperty as predicate, and
return the subproperty used in each case.
- Given a base class, find triples using an instance of a subclass as subject or
object, and return the subclass used in each case.

Without inference, this can be done using property path syntax.

```sparql
CONSTRUCT {
    ?resource a ?resourceClass .
    ?resource ?resourceValueProperty ?valueObject .
} WHERE {
    ?resource a ?resourceClass .
    ?resourceClass rdfs:subClassOf* knora-base:Resource .
    ?resource ?resourceValueProperty ?valueObject .
    ?resourceValueProperty rdfs:subPropertyOf* knora-base:hasValue .
}
```

This query:

- Checks that the queried resource belongs to a subclass of `knora-base:Resource`.

- Returns the class that the resource explicitly belongs to.

- Finds the Knora values attached to the resource, and returns each value along with
the property that explicitly attaches it to the resource.

In some triplestores, it can be more efficient to use RDFS inference than to use property path syntax,
depending on how inference is implemented. For example, Ontotext GraphDB does inference when
data is inserted, and stores inferred triples in the repository
([forward chaining with full materialisation](http://graphdb.ontotext.com/documentation/standard/reasoning.html)).
Moreover, it provides a way of choosing whether to return explicit or inferred triples.
This allows the query above to be optimised as follows, querying inferred triples but returning
explicit triples:

```sparql
CONSTRUCT {
    ?resource a ?resourceClass .
    ?resource ?resourceValueProperty ?valueObject .
} WHERE {
    ?resource a knora-base:Resource . # inferred triple

    GRAPH <http://www.ontotext.com/explicit> {
        ?resource a ?resourceClass . # explicit triple
    }

    ?resource knora-base:hasValue ?valueObject . # inferred triple

    GRAPH <http://www.ontotext.com/explicit> {
        ?resource ?resourceValueProperty ?valueObject . # explicit triple
    }
}
```

By querying inferred triples that are already stored in the repository, the optimised query avoids property path
syntax and is therefore more efficient, while still only returning explicit triples in the query result.

Other triplestores use a backward-chaining inference strategy, meaning that inference is performed during
the execution of a SPARQL query, by expanding the query itself. The expanded query is likely to look like
the first example, using property path syntax, and therefore it is not likely to be more efficient. Moreover,
other triplestores may not provide a way to return explicit rather than inferred triples. To support such
a triplestore, Knora uses property path syntax rather than inference.
See @ref:[the Gravsearch design documentation](gravsearch.md#inference) for information on how this is done
for Gravsearch queries.

The support for Apache Jena Fuseki currently works in this way. However, Fuseki supports both forward-chaining
and backward-chaining rule engines, although it does not seem to have anything like
GraphDB's `<http://www.ontotext.com/explicit>`. It would be worth exploring whether Knora's query result
processing could be changed so that it could use forward-chaining inference as an optimisation, even if
nothing like `<http://www.ontotext.com/explicit>` is available. For example, the example query could be written like
this:

```sparql
CONSTRUCT {
    ?resource a ?resourceClass .
    ?resource ?resourceValueProperty ?valueObject .
} WHERE {
    ?resource a knora-base:Resource .
    ?resource a ?resourceClass .
    ?resource knora-base:hasValue ?valueObject .
    ?resource ?resourceValueProperty ?valueObject .
}
```

This would return inferred triples as well as explicit ones: a triple for each base class of the explicit
`?resourceClass`, and a triple for each base property of the explicit `?resourceValueProperty`. But since Knora knows
the class and property inheritance hierarchies, it could ignore the additional triples.

## Querying Past Value Versions

Value versions are a linked list, starting with the current version. Each value points to
2 changes: 0 additions & 2 deletions knora-ontologies/knora-base.ttl
@@ -457,8 +457,6 @@
"a lien vers"@fr ,
"ha Link verso"@it ;

rdfs:comment "Represents a direct connection between two resources"@en ;

:isEditable true ;

:objectClassConstraint :LinkValue ;
33 changes: 33 additions & 0 deletions triplestores/fuseki-tomcat/config.ttl
@@ -0,0 +1,33 @@
# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0

## Fuseki Server configuration file.

@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
# Example::
# Server-wide query timeout.
#
# Timeout - server-wide default: milliseconds.
# Format 1: "1000" -- 1 second timeout
# Format 2: "10000,60000" -- 10s timeout to first result,
# then 60s timeout for the rest of query.
#
# See javadoc for ARQ.queryTimeout for details.
# This can also be set on a per dataset basis in the dataset assembler.
#
# ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "30000" ] ;



# Add any custom classes you want to load.
# Must have a "public static void init()" method.
# ja:loadClass "your.code.Class" ;
ja:loadClass "org.apache.jena.query.text.TextQuery";

# End triples.
.