feat(triplestores): Support Apache Jena Fuseki #1375

Merged 73 commits from wip/1374-fuseki into develop on Apr 2, 2020

Commits
9e844a2
feat(triplestores): Initial config for attempt to use Fuseki with Tom…
Jul 15, 2019
6c29d4d
feat: Fix things for Fuseki (ongoing).
Jul 16, 2019
3b3aa92
feat: Fix things for Fuseki (ongoing).
Jul 16, 2019
50330af
feat: Fix things for Fuseki (ongoing).
Jul 16, 2019
912b377
feat: Fix things for Fuseki (ongoing).
Jul 17, 2019
b6ab504
feat: Fix and optimise things for Fuseki.
Jul 17, 2019
b4a2a29
feat: Fix and optimise more things for Fuseki.
Jul 18, 2019
5ab21c2
feat: Fix more things for Fuseki.
Jul 18, 2019
f5e6f48
fix: Fix fix.
Jul 18, 2019
9e0b0f9
Fix: fix fix.
Jul 18, 2019
3a2149a
fix(knora-ontologies): Remove extra rdfs:comment.
Jul 18, 2019
54b8d48
style(fuseki): Add comments.
Jul 18, 2019
7292e3e
Merge branch 'develop' into wip/1374-fuseki
Jul 19, 2019
9b89a34
Merge branch 'develop' into wip/1374-fuseki
Jul 19, 2019
5f8a8d9
fix(fuseki): Get value permissions correctly with Fuseki.
Jul 19, 2019
8cf5d20
Merge branch 'develop' into wip/1374-fuseki
Jul 19, 2019
37fb633
Merge branch 'develop' into wip/1374-fuseki
Aug 13, 2019
54340d0
Merge branch 'develop' into wip/1374-fuseki
Aug 13, 2019
3c439a8
Merge branch 'develop' into wip/1374-fuseki
Aug 19, 2019
804543a
Merge branch 'develop' into wip/1374-fuseki
Aug 26, 2019
db8db5d
Merge branch 'develop' into wip/1374-fuseki
Aug 27, 2019
c228098
Merge branch 'develop' into wip/1374-fuseki
Aug 29, 2019
b717dee
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
e9f213d
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
b9da9c7
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
5d658a5
Merge branch 'develop' into wip/1374-fuseki
Aug 30, 2019
69262dd
Merge branch 'develop' into wip/1374-fuseki
Sep 9, 2019
bc2d74a
Merge branch 'develop' into wip/1374-fuseki
Sep 10, 2019
053ea52
Merge branch 'develop' into wip/1374-fuseki
Sep 12, 2019
99c072d
Merge branch 'develop' into wip/1374-fuseki
Sep 26, 2019
0f1a44c
Merge branch 'develop' into wip/1374-fuseki
Oct 8, 2019
4236ee3
Merge branch 'develop' into wip/1374-fuseki
Oct 18, 2019
bf4253f
Merge branch 'develop' into wip/1374-fuseki
Oct 21, 2019
366017e
Merge branch 'develop' into wip/1374-fuseki
Oct 22, 2019
bce43dc
Merge branch 'develop' into wip/1374-fuseki
Oct 23, 2019
7541121
Merge branch 'develop' into wip/1374-fuseki
Nov 4, 2019
5e9c550
Merge branch 'develop' into wip/1374-fuseki
Nov 5, 2019
283f60d
Merge branch 'develop' into wip/1374-fuseki
Nov 8, 2019
8a400d9
Merge branch 'develop' into wip/1374-fuseki
Nov 14, 2019
f7efe5e
Merge branch 'develop' into wip/1374-fuseki
Nov 15, 2019
666ffd8
Merge branch 'develop' into wip/1374-fuseki
Nov 15, 2019
3a7e799
Merge branch 'develop' into wip/1374-fuseki
Nov 19, 2019
7f21b78
Merge branch 'develop' into wip/1374-fuseki
Nov 19, 2019
cf8eb70
Merge branch 'develop' into wip/1374-fuseki
Nov 27, 2019
fdfeae9
Merge branch 'develop' into wip/1374-fuseki
Nov 28, 2019
692196a
Merge branch 'develop' into wip/1374-fuseki
Nov 28, 2019
1ece065
Merge branch 'develop' into wip/1374-fuseki
Dec 2, 2019
664cd36
Merge branch 'develop' into wip/1374-fuseki
Dec 3, 2019
323ee1a
test: Fix test script.
Dec 3, 2019
0f7539e
Merge branch 'develop' into wip/1374-fuseki
Dec 17, 2019
27560dc
test: Fix test.
Dec 17, 2019
4b33566
Merge branch 'develop' into wip/1374-fuseki
Dec 17, 2019
627f504
Merge branch 'develop' into wip/1374-fuseki
Jan 2, 2020
5fbb8ed
Merge branch 'develop' into wip/1374-fuseki
Feb 4, 2020
dd32e52
Merge branch 'develop' into wip/1374-fuseki
Feb 5, 2020
90ec7df
Merge branch 'develop' into wip/1374-fuseki
Feb 5, 2020
7225d87
Merge branch 'develop' into wip/1374-fuseki
Mar 9, 2020
4642954
fix(gravsearch): Check for unbound variables in GROUP_CONCAT.
Mar 9, 2020
bcec680
fix(gravsearch): Fix variable name conflict when expanding statement …
Mar 10, 2020
54b4ff3
Merge branch 'develop' into wip/1374-fuseki
Mar 10, 2020
864fe13
feat(gravsearch): Support Lucene in Fuseki (ongoing).
Mar 10, 2020
d0230d6
Merge branch 'develop' into wip/1374-fuseki
Mar 11, 2020
6c9c922
feature(gravsearch): Support Lucene in Fuseki (ongoing).
Mar 13, 2020
5496dab
feat(gravsearch): Support full-text search with Fuseki.
Mar 16, 2020
a9daf8d
fix(api-v2): Fix search by label.
Mar 17, 2020
0f9a390
fix(api-v2): Add missing changes from last commit.
Mar 17, 2020
981ef43
fix(fuseki): Support virtual property knora-base:targetHasOriginalXMLID.
Mar 18, 2020
97fbbe0
test(triplestore): Fix IRI in named graph query in test.
Mar 18, 2020
c60e328
fix(fuseki): Fix erase resource.
Mar 18, 2020
b377bdf
feat(upgrade): Support upgrading a Fuseki repository.
Mar 19, 2020
18f8417
docs: Document how inference is used/implemented.
Mar 23, 2020
44feb70
docs: Update API and design docs.
Mar 25, 2020
4fd2c20
test (fuseki): update fuseki config for integration tests
subotic Apr 2, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -58,4 +58,5 @@ knora-graphdb-free
knora-graphdb-se
knora-sipi
knora-upgrade
triplestores/fuseki-tomcat/system
dump.rdb
42 changes: 21 additions & 21 deletions docs/src/paradox/03-apis/api-v2/query-language.md
@@ -192,12 +192,12 @@ a matching dependent resource, only its IRI is returned.

## Inference

Gravsearch queries are understood to imply a subset of
[RDFS reasoning](https://www.w3.org/TR/rdf11-mt/). Depending on the
triplestore being used, this may be implemented using the triplestore's
own reasoner or by query expansion in Knora.

Specifically, if a statement pattern specifies a property, the pattern will
also match subproperties of that property, and if a statement specifies that
a subject has a particular `rdf:type`, the statement will also match subjects
belonging to subclasses of that type.
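
For instance, a pattern such as the following (a sketch using example ontology
entities that also appear later in this document) matches more than its
explicit form suggests:

```
# Also matches instances of any subclass of beol:letter.
?letter a beol:letter .

# Also matches statements whose predicate is a subproperty of beol:hasText.
?letter beol:hasText ?text .
```
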
@@ -352,31 +352,28 @@ text markup (see @ref:[Matching Standoff Dates](#matching-standoff-dates)).

#### Searching for Matching Words

The function `knora-api:matchText` searches for matching words anywhere in a
text value, and is implemented using a full-text search index if available.
The first argument must represent a text value (a `knora-api:TextValue` in
the complex schema, or an `xsd:string` in the simple schema). The second
argument is a string literal containing the words to be matched, separated by spaces.
The function supports the
@ref:[Lucene Query Parser syntax](../../08-lucene/index.md).
Note that Lucene's default operator is a logical OR when submitting several search terms.

This function can only be used as the top-level expression in a `FILTER`.

For example, to search for titles that contain the words 'Zeitglöcklein' and
'Lebens':

```
?book incunabula:title ?title .
FILTER knora-api:matchText(?title, "Zeitglöcklein Lebens")
```
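
Since Lucene's default operator is OR, a search that should require both words
can use an explicit operator (a sketch relying on the Lucene Query Parser
syntax linked above):

```
?book incunabula:title ?title .
FILTER knora-api:matchText(?title, "Zeitglöcklein AND Lebens")
```
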
Note: the `knora-api:match` function has been deprecated, and will no longer work in
a future release of Knora. Please change your Gravsearch queries to use `knora-api:matchText`
instead. Attention: the first argument is different.

#### Filtering Text by Language

Expand Down Expand Up @@ -426,11 +423,11 @@ tags in the text. You can match the tags you're interested in using

#### Matching Text in a Standoff Tag

The function `knora-api:matchTextInStandoff` searches for standoff tags containing certain terms.
The implementation is optimised using the full-text search index if available. The
function takes three arguments:

1. A variable representing a text value.
2. A variable representing a standoff tag.
3. A string literal containing space-separated search terms.

@@ -448,16 +445,19 @@ CONSTRUCT {
} WHERE {
?letter a beol:letter .
?letter beol:hasText ?text .
?text knora-api:textValueHasStandoff ?standoffParagraphTag .
?standoffParagraphTag a standoff:StandoffParagraphTag .
FILTER knora-api:matchTextInStandoff(?text, ?standoffParagraphTag, "Grund Richtigkeit")
}
```

Here we are looking for letters containing the words "Grund" and "Richtigkeit"
within a single paragraph.

Note: the `knora-api:matchInStandoff` function has been deprecated, and will no longer
work in a future release of Knora. Please change your Gravsearch queries to use
`knora-api:matchTextInStandoff` instead. Attention: the first argument is different.

#### Matching Standoff Links

If you are only interested in specifying that a resource has some text
@@ -547,6 +547,9 @@ This is useful only if the project does not contain a large amount of data;
otherwise, you should use @ref:[Gravsearch](query-language.md) to search
using more specific criteria.

The specified class and property are used without inference; they will not
match subclasses or subproperties.

The HTTP header `X-Knora-Accept-Project` must be submitted; its value is
a Knora project IRI. In the request URL, the values of `resourceClass` and `orderByProperty`
are URL-encoded IRIs in the @ref:[complex schema](introduction.md#api-schema).
60 changes: 47 additions & 13 deletions docs/src/paradox/05-internals/design/api-v2/gravsearch.md
@@ -188,15 +188,16 @@ The resulting SELECT clause of the prequery looks as follows:
```sparql
SELECT DISTINCT
?page
(GROUP_CONCAT(DISTINCT(IF(BOUND(?book), STR(?book), "")); SEPARATOR='') AS ?book__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?seqnum), STR(?seqnum), "")); SEPARATOR='') AS ?seqnum__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?book__LinkValue), STR(?book__LinkValue), "")); SEPARATOR='') AS ?book__LinkValue__Concat)
WHERE {...}
GROUP BY ?page
ORDER BY ASC(?page)
LIMIT 25
```

`?page` represents the main resource. When accessing the prequery's result rows, `?page` contains the IRI of the main resource.
The prequery's results are grouped by the main resource so that there is exactly one result row per matching main resource.
`?page` is also used as a sort criterion although none has been defined in the input query.
This is necessary to make paging work: results always have to be returned in the same order (the prequery is always deterministic).
@@ -205,17 +206,23 @@ Like this, results can be fetched page by page using LIMIT and OFFSET.
Grouping by main resource requires other results to be aggregated using the function `GROUP_CONCAT`.
`?book` is used as an argument of the aggregation function.
The aggregation's result is accessible in the prequery's result rows as `?book__Concat`.
The variable `?book` is bound to an IRI.
Since more than one IRI could be bound to a variable representing a dependent resource, the results have to be aggregated.
`GROUP_CONCAT` takes two arguments: a collection of strings (IRIs in our use case) and a separator
(we use the non-printing Unicode character `INFORMATION SEPARATOR ONE`).
When accessing `?book__Concat` in the prequery's results containing the IRIs of dependent resources, the string has to be split with the separator used in the aggregation function.
The result is a collection of IRIs representing dependent resources.
The same logic applies to value objects.

Each `GROUP_CONCAT` checks whether the concatenated variable is bound in each result in the group; if a variable
is unbound, we concatenate an empty string. This is necessary because, in Apache Jena (and perhaps other
triplestores), "If `GROUP_CONCAT` has an unbound value in the list of values to concat, the overall result is 'error'"
(see [this Jena issue](https://issues.apache.org/jira/browse/JENA-1856)).
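
A minimal standalone illustration of the guard (the IRIs and variable names
here are hypothetical, not part of Knora's generated queries):

```sparql
SELECT ?s (GROUP_CONCAT(DISTINCT(IF(BOUND(?v), STR(?v), "")); SEPARATOR="|") AS ?vConcat)
WHERE {
  ?s a <http://example.org/Thing> .
  # ?v is unbound for subjects without a value; without the IF(BOUND(...)) guard,
  # Jena would report an error for the whole aggregate.
  OPTIONAL { ?s <http://example.org/hasValue> ?v . }
}
GROUP BY ?s
```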

### Main Query

The purpose of the main query is to get all requested information about the main resource, dependent resources, and value objects.
The IRIs of those resources and value objects were returned by the prequery.
Since the prequery only returns resources and value objects matching the input query's criteria,
the main query can specifically ask for more detailed information on these resources and values without having to reconsider these criteria.

@@ -225,8 +232,8 @@ The classes involved in generating prequeries can be found in `org.knora.webapi.

The main query is a SPARQL CONSTRUCT query. Its generation is handled by the method `GravsearchMainQueryGenerator.createMainQuery`.
It takes three arguments: `mainResourceIris: Set[IriRef], dependentResourceIris: Set[IriRef], valueObjectIris: Set[IRI]`.
From the given IRIs, statements are generated that ask for complete information on *exactly* these resources and values.
For any given resource IRI, only the values present in `valueObjectIris` are to be queried.
This is achieved by using SPARQL's `VALUES` expression for the main resource and dependent resources as well as for values.

#### Processing the Main Query's results
Expand All @@ -237,7 +244,7 @@ The method `getMainQueryResultsWithFullGraphPattern` takes the main query's resu
A main resource and its dependent resources and values are only returned if the user has view permissions on all the resources and value objects present in the main query.
Otherwise the method suppresses the main resource.
To do the permission checking, the results of the main query are passed to `ConstructResponseUtilV2` which transforms a `SparqlConstructResponse` (a set of RDF triples)
into a structure organized by main resource IRIs. In this structure, dependent resources and values are nested and can be accessed via their main resource.
`SparqlConstructResponse` suppresses all resources and values the user has insufficient permissions on.
For each main resource, a check is performed for the presence of all resources and values after permission checking.

@@ -247,3 +254,30 @@ All the resources and values not present in the input query's CONSTRUCT clause a
The main resources that have been filtered out due to insufficient permissions are represented by the placeholder `ForbiddenResource`.
This placeholder stands for a main resource that cannot be returned; nevertheless, it informs the client that such a resource exists.
This is necessary for a consistent behaviour when doing paging.

## Inference

Gravsearch queries support a subset of RDFS reasoning
(see @ref:[Inference](../../../03-apis/api-v2/query-language.md#inference) in the API documentation
on Gravsearch). This is implemented as follows:

When the non-triplestore-specific version of a SPARQL query is generated, statements that do not need
inference are marked with the virtual named graph `<http://www.knora.org/explicit>`.

When the triplestore-specific version of the query is generated:

- If the triplestore is GraphDB, `SparqlTransformer.transformKnoraExplicitToGraphDBExplicit` changes statements
with the virtual graph `<http://www.knora.org/explicit>` so that they are marked with the GraphDB-specific graph
`<http://www.ontotext.com/explicit>`, and leaves other statements unchanged.

- If Knora is not using the triplestore's inference (e.g. with Fuseki),
`SparqlTransformer.expandStatementForNoInference` removes `<http://www.knora.org/explicit>`, and expands unmarked
statements using `rdfs:subClassOf*` and `rdfs:subPropertyOf*`, as sketched below.
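
A sketch of what this expansion produces for a single statement pattern (the
class and variable names are only illustrative):

```sparql
# Input statement pattern (inference implied):
?letter a beol:letter .

# Expanded form generated for a triplestore without inference:
?letterType rdfs:subClassOf* beol:letter .
?letter a ?letterType .
```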

Gravsearch also provides some virtual properties, which take advantage of forward-chaining inference
as an optimisation if the triplestore provides it. For example, the virtual property
`knora-api:standoffTagHasStartAncestor` is equivalent to `knora-base:standoffTagHasStartParent*`, but
with GraphDB it is implemented using a custom inference rule (in `KnoraRules.pie`) and is therefore more
efficient. If Knora is not using the triplestore's inference,
`SparqlTransformer.transformStatementInWhereForNoInference` replaces `knora-api:standoffTagHasStartAncestor`
with `knora-base:standoffTagHasStartParent*`.
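
As a sketch (the variable names are illustrative), the rewrite amounts to
replacing the virtual property with the equivalent property path:

```sparql
# With a triplestore that provides the custom inference rule (GraphDB):
?standoffTag knora-api:standoffTagHasStartAncestor ?ancestorTag .

# Rewritten for a triplestore without that inference (e.g. Fuseki):
?standoffTag knora-base:standoffTagHasStartParent* ?ancestorTag .
```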
93 changes: 93 additions & 0 deletions docs/src/paradox/05-internals/design/api-v2/query-design.md
@@ -21,6 +21,99 @@ License along with Knora. If not, see <http://www.gnu.org/licenses/>.

@@toc

## Inference

Knora does not require the triplestore to perform inference, but may be able
to take advantage of inference if the triplestore provides it.

In particular, Knora's SPARQL queries currently need to do the following:

- Given a base property, find triples using a subproperty as predicate, and
return the subproperty used in each case.
- Given a base class, find triples using an instance of a subclass as subject or
object, and return the subclass used in each case.

Without inference, this can be done using property path syntax.

```sparql
CONSTRUCT {
    ?resource a ?resourceClass .
    ?resource ?resourceValueProperty ?valueObject .
} WHERE {
    ?resource a ?resourceClass .
    ?resourceClass rdfs:subClassOf* knora-base:Resource .
    ?resource ?resourceValueProperty ?valueObject .
    ?resourceValueProperty rdfs:subPropertyOf* knora-base:hasValue .
}
```

This query:

- Checks that the queried resource belongs to a subclass of `knora-base:Resource`.

- Returns the class that the resource explicitly belongs to.

- Finds the Knora values attached to the resource, and returns each value along with
the property that explicitly attaches it to the resource.

In some triplestores, it can be more efficient to use RDFS inference than to use property path syntax,
depending on how inference is implemented. For example, Ontotext GraphDB does inference when
data is inserted, and stores inferred triples in the repository
([forward chaining with full materialisation](http://graphdb.ontotext.com/documentation/standard/reasoning.html)).
Moreover, it provides a way of choosing whether to return explicit or inferred triples.
This allows the query above to be optimised as follows, querying inferred triples but returning
explicit triples:

```sparql
CONSTRUCT {
    ?resource a ?resourceClass .
    ?resource ?resourceValueProperty ?valueObject .
} WHERE {
    ?resource a knora-base:Resource . # inferred triple

    GRAPH <http://www.ontotext.com/explicit> {
        ?resource a ?resourceClass . # explicit triple
    }

    ?resource knora-base:hasValue ?valueObject . # inferred triple

    GRAPH <http://www.ontotext.com/explicit> {
        ?resource ?resourceValueProperty ?valueObject . # explicit triple
    }
}
```

By querying inferred triples that are already stored in the repository, the optimised query avoids property path
syntax and is therefore more efficient, while still only returning explicit triples in the query result.

Other triplestores use a backward-chaining inference strategy, meaning that inference is performed during
the execution of a SPARQL query, by expanding the query itself. The expanded query is likely to look like
the first example, using property path syntax, and therefore it is not likely to be more efficient. Moreover,
other triplestores may not provide a way to return explicit rather than inferred triples. To support such
a triplestore, Knora uses property path syntax rather than inference.
See @ref:[the Gravsearch design documentation](gravsearch.md#inference) for information on how this is done
for Gravsearch queries.

The support for Apache Jena Fuseki currently works in this way. However, Fuseki supports both forward-chaining
and backward-chaining rule engines, although it does not seem to have anything like
GraphDB's `<http://www.ontotext.com/explicit>`. It would be worth exploring whether Knora's query result
processing could be changed so that it could use forward-chaining inference as an optimisation, even if
nothing like `<http://www.ontotext.com/explicit>` is available. For example, the example query could be written like
this:

```sparql
CONSTRUCT {
    ?resource a ?resourceClass .
    ?resource ?resourceValueProperty ?valueObject .
} WHERE {
    ?resource a knora-base:Resource .
    ?resource a ?resourceClass .
    ?resource knora-base:hasValue ?valueObject .
    ?resource ?resourceValueProperty ?valueObject .
}
```

This would return inferred triples as well as explicit ones: a triple for each base class of the explicit
`?resourceClass`, and a triple for each base property of the explicit `?resourceValueProperty`. But since Knora knows
the class and property inheritance hierarchies, it could ignore the additional triples.

## Querying Past Value Versions

Value versions are a linked list, starting with the current version. Each value points to
2 changes: 0 additions & 2 deletions knora-ontologies/knora-base.ttl
@@ -457,8 +457,6 @@
"a lien vers"@fr ,
"ha Link verso"@it ;

rdfs:comment "Represents a direct connection between two resources"@en ;

:isEditable true ;

:objectClassConstraint :LinkValue ;
33 changes: 33 additions & 0 deletions triplestores/fuseki-tomcat/config.ttl
@@ -0,0 +1,33 @@
# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0

## Fuseki Server configuration file.

@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
# Example::
# Server-wide query timeout.
#
# Timeout - server-wide default: milliseconds.
# Format 1: "1000" -- 1 second timeout
# Format 2: "10000,60000" -- 10s timeout to first result,
# then 60s timeout for the rest of query.
#
# See javadoc for ARQ.queryTimeout for details.
# This can also be set on a per dataset basis in the dataset assembler.
#
# ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "30000" ] ;



# Add any custom classes you want to load.
# Must have a "public static void init()" method.
# ja:loadClass "your.code.Class" ;
ja:loadClass "org.apache.jena.query.text.TextQuery";

# End triples.
.