Gravsearch optimisations (#1679)
* feat(gravsearch): Start implementing optimisation.

* fix the compiling problem

* fix the pattern matching problem

* feature (gravsearch) add 'inferredfromProperty' flags to types

* refactor (gravsearch) remove the unnecessary statement to check `rdf:type knora-base:Resource` from `AbstractPrequeryGenerator`

* fix (gravsearch) correct the flag

* feature (gravsearch) NonPropertyTypeInfo should contain the classIris instead of type iri

* refactor (gravsearch)  rule names and comments

* feature (gravsearch) refine the detected types

* test (gravsearch) unit test for removing a type from `IntermediateTypeInspectionResult`

* refactor (gravsearch) make the type refiner recursive

* test (gravsearch) test for the `refineDeterminedTypes`

* refactor (gravsearch) correct refining the inferred types, and inferring type from filter rule

* feature (gravsearch) use the isResourceType flag

* fix (gravsearch) correct the failing tests

* refactor (gravsearch) get rid of unused method

* feature (gravsearch) turn the `entitiesInferredFromProperty` to a map, and remove a type from it if necessary

* test (gravsearch) tests for checking if the collection `entitiesInferredFromProperty` is populated correctly

* refactor (gravsearch) get rid of `GravsearchTypeIris`

* refactor (gravsearch) use the `isResourceType` flag

* fix (gravsearch) missing import

* feature (gravsearch) store the actual resource class of the object property.

* feature (gravsearch) remove statements whose subject is a resource and its type can be inferred from a property

* fix (gravsearch) do not remove entity type if the property it should be inferred from is in an Optional

* refactor (gravsearch) codacy refactor

* fix (gravsearch) wrong condition

* fix (gravsearch) unit tests for queries with optional should have the rdf:type

* fix (test) correct the wrong test

* feature (gravsearch) sanitize inconsistent types resulting from storing multiple distinct resource types

* fix (gravsearch) type annotation must be added to objects of optional patterns that are in entitiesInferredFromProperties

* refactor (gravsearch) run the code through code formatter

* feature (gravsearch) move lucene statements to beginning

* refactor (gravsearch) merge two optimize functions

* refactor (gravsearch) applied PR review points

* refactor (gravsearch) comment about optimization

* refactor (gravsearch) add explanation to methods

* test (gravsearch) unit tests for transformation and type inspection of a query with an optional block

* feature (gravsearch) replace inconsistent types with a common base class

* refactor (gravsearch) remove unnecessary refinement.

* fix (gravsearch) fix the failing test

* fix (gravsearch) remove the valueHasString statement

* fix (gravsearch) leave the valueHasString statement in handleMatchTextInStandoffFunction

* fix (gravsearch) fix the bug in finding common base classes

* fix (gravsearch) merge types in entitiesInferredFromProperties instead of replacing them

* fix (gravsearch) only remove a type from entitiesInferredFromProperties, not the entire set of types.

* refactor (gravsearch) change comment

* refactor (gravsearch) rename `findCommonBaseClass` to `findCommonBaseResourceClass`

* doc (gravsearch) updated the Gravsearch design document.

* docs (gravsearch) edit documentation

* refactor(gravsearch): Clean up a few things.

* fix(gravsearch): Don't run optimisation in AnnotationRemovingWhereTransformer.

- Clean up doc a bit.

Co-authored-by: Benjamin Geer <benjaminlewis.geer@unibas.ch>
SepidehAlassi and Benjamin Geer committed Aug 13, 2020
1 parent 1c88651 commit fa48e61
Showing 151 changed files with 9,640 additions and 8,972 deletions.
48 changes: 39 additions & 9 deletions docs/05-internals/design/api-v2/gravsearch.md
@@ -60,12 +60,27 @@ There are two type inspectors in the pipeline:
as from ontology information that it requests from `OntologyResponderV2`.

Each type inspector takes as input, and returns as output, an `IntermediateTypeInspectionResult`, which
associates each `TypeableEntity` with zero or more types. Initially, each `TypeableEntity` has no types.
Each type inspector adds whatever types it finds for each entity.

At the end of the pipeline, each entity should have exactly one type. Therefore, to keep only the
most specific type for each entity, the method `refineDeterminedTypes` refines the determined types
by removing any type that is a base class of another determined type. However, inconsistent types can
still be determined for an entity, e.g. when multiple resource class types are determined and none is
a base class of the others. For example, from the following statement

```
{ ?document a beol:manuscript . } UNION { ?document a beol:letter .}
```

two inconsistent types can be inferred for `?document`: `beol:letter` and `beol:manuscript`.
In these cases, the sanitizer `sanitizeInconsistentResourceTypes` replaces the inconsistent resource types
with their common base resource class (in the above example, `beol:writtenSource`).

Lastly, an error is returned if

- An entity's type could not be determined. The client must add a type annotation to make the query work.
- Inconsistent types could not be sanitized (an entity appears to have more than one type). The client must correct the query.
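In the first case, the client can make the query work by adding an explicit type annotation. As a sketch (the variable and the annotated class are illustrative):

```
?thing a knora-api:Resource .
```

With such a statement present, the type inspectors can associate `?thing` with a type even if no other pattern in the query determines it.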

If there are no errors, `GravsearchTypeInspectionRunner` converts the pipeline's output to a
`GravsearchTypeInspectionResult`, in which each entity is associated with exactly one type.
@@ -87,7 +102,9 @@ about those classes and properties, as well as about the classes that are subjec
Next, the inspector runs inference rules (which extend `InferenceRule`) on each `TypeableEntity`. Each rule
takes as input a `TypeableEntity`, the usage index, the ontology information, and the `IntermediateTypeInspectionResult`,
and returns a new `IntermediateTypeInspectionResult`. For example, `TypeOfObjectFromPropertyRule` infers an entity's type
if the entity is used as the object of a statement and the predicate's `knora-api:objectType` is known. For each `TypeableEntity`,
if a type is inferred from a property, the entity and the inferred type are added to
`IntermediateTypeInspectionResult.entitiesInferredFromProperty`.
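As a sketch of how this rule works (the property shown here is illustrative, not taken from a real ontology): given the statement

```
?letter beol:creationDate ?date .
```

if the ontology declares that the `knora-api:objectType` of `beol:creationDate` is `knora-api:Date`, `TypeOfObjectFromPropertyRule` can infer that `?date` has type `knora-api:Date`, and `?date` would then be recorded in `entitiesInferredFromProperty` together with that type.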

The inference rules are run repeatedly, because the output of one rule may allow another rule to infer additional
information. There are two pipelines of rules: a pipeline for the first iteration of type inference, and a
@@ -111,7 +128,12 @@ In `SearchResponderV2`, two queries are generated from a given Gravsearch query:
The Gravsearch query is passed to `QueryTraverser` along with a query transformer. Query transformers are classes
that implement traits supported by `QueryTraverser`:

- `WhereTransformer`: instructions for converting statements in the WHERE clause of a SPARQL query (to generate the prequery's WHERE clause).

To improve query performance, this trait defines the method `optimiseQueryPatterns`, whose implementation can call
private methods to optimise the generated SPARQL. For example, before the statements in the WHERE clause are transformed,
the order of query patterns is optimised by moving `LuceneQueryPattern`s to the beginning and `isDeleted` statement patterns to the end of the WHERE clause.
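Schematically (a sketch; `<lucene pattern>` stands for a generated `LuceneQueryPattern`, and the other patterns are illustrative), the optimisation reorders

```
?book incunabula:title ?title .
?book knora-api:isDeleted false .
<lucene pattern on ?title>
```

into

```
<lucene pattern on ?title>
?book incunabula:title ?title .
?book knora-api:isDeleted false .
```

so that the triplestore evaluates the most selective pattern first and the `isDeleted` check last.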

- `ConstructToSelectTransformer` (extends `WhereTransformer`): instructions for turning a Construct query into a Select query (converts a Gravsearch query into a prequery).
- `SelectToSelectTransformer` (extends `WhereTransformer`): instructions for turning a triplestore-independent Select query into a triplestore-dependent Select query (implementation of inference).
- `ConstructToConstructTransformer` (extends `WhereTransformer`): instructions for turning a triplestore-independent Construct query into a triplestore-dependent Construct query (implementation of inference).
@@ -130,13 +152,19 @@ The classes involved in generating prequeries can be found in `org.knora.webapi.

If the client submits a count query, the prequery returns the overall number of hits, but not the results themselves.

In a first step, before the WHERE clause is transformed, query patterns are further optimised by removing
the `rdf:type` statement for each entity whose type can be inferred from a property, since an explicit
`rdf:type` statement is then redundant (unless the property from which the entity's type is inferred
is wrapped in an `OPTIONAL` block). This optimisation has to happen in advance, because
otherwise `transformStatementInWhere` would expand the redundant `rdf:type` statements.
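As an illustration (the ontology entities are examples, not taken verbatim from the codebase): in

```
?letter a beol:letter .
?letter beol:hasAuthor ?person .
```

the explicit type statement for `?letter` can be dropped, assuming type inspection has already inferred the type of `?letter` from `beol:hasAuthor`. By contrast, if the property appears only inside an optional block, as in

```
OPTIONAL { ?letter beol:hasAuthor ?person . }
```

the type statement must be kept, because the pattern that would justify removing it might not match.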

Next, the Gravsearch query's WHERE clause is transformed and the prequery (SELECT and WHERE clause) is generated from this result.
The transformation of the Gravsearch query's WHERE clause relies on the implementation of the abstract class `AbstractPrequeryGenerator`.

`AbstractPrequeryGenerator` contains members whose state is changed during the iteration over the statements of the input query.
They can then be used to create the converted query.

- `mainResourceVariable: Option[QueryVariable]`: SPARQL variable representing the main resource of the input query. Present in the prequery's SELECT clause.
- `dependentResourceVariables: mutable.Set[QueryVariable]`: a set of SPARQL variables representing dependent resources in the input query. Used in an aggregation function in the prequery's SELECT clause (see below).
- `dependentResourceVariablesGroupConcat: Set[QueryVariable]`: a set of SPARQL variables representing an aggregation of dependent resources. Present in the prequery's SELECT clause.
- `valueObjectVariables: mutable.Set[QueryVariable]`: a set of SPARQL variables representing value objects. Used in an aggregation function in the prequery's SELECT clause (see below).
@@ -281,7 +309,9 @@ When the triplestore-specific version of the query is generated:

- If the triplestore is GraphDB, `SparqlTransformer.transformKnoraExplicitToGraphDBExplicit` changes statements
with the virtual graph `<http://www.knora.org/explicit>` so that they are marked with the GraphDB-specific graph
`<http://www.ontotext.com/explicit>`, and leaves other statements unchanged.
`SparqlTransformer.transformKnoraExplicitToGraphDBExplicit` also adds the `valueHasString` statements which GraphDB needs
for text searches.

- If Knora is not using the triplestore's inference (e.g. with Fuseki),
`SparqlTransformer.expandStatementForNoInference` removes `<http://www.knora.org/explicit>`, and expands unmarked
4 changes: 2 additions & 2 deletions webapi/src/main/scala/org/knora/webapi/LanguageCodes.scala
@@ -20,8 +20,8 @@
package org.knora.webapi

/**
* Constants for language codes.
*/
object LanguageCodes {
val DE: String = "de"
val EN: String = "en"
94 changes: 47 additions & 47 deletions webapi/src/main/scala/org/knora/webapi/OntologySchema.scala
@@ -20,101 +20,101 @@
package org.knora.webapi

/**
* Indicates the schema that a Knora ontology or ontology entity conforms to.
*/
sealed trait OntologySchema

/**
* The schema of Knora ontologies and entities that are used in the triplestore.
*/
case object InternalSchema extends OntologySchema

/**
* The schema of Knora ontologies and entities that are used in API v2.
*/
sealed trait ApiV2Schema extends OntologySchema

/**
* The simple schema for representing Knora ontologies and entities. This schema represents values as literals
* when possible.
*/
case object ApiV2Simple extends ApiV2Schema

/**
* The default (or complex) schema for representing Knora ontologies and entities. This
* schema always represents values as objects.
*/
case object ApiV2Complex extends ApiV2Schema

/**
* A trait representing options that can be submitted to configure an ontology schema.
*/
sealed trait SchemaOption

/**
* A trait representing options that affect the rendering of markup when text values are returned.
*/
sealed trait MarkupRendering extends SchemaOption

/**
* Indicates that markup should be rendered as XML when text values are returned.
*/
case object MarkupAsXml extends MarkupRendering

/**
* Indicates that markup should not be returned with text values, because it will be requested
* separately as standoff.
*/
case object MarkupAsStandoff extends MarkupRendering

/**
* Indicates that no markup should be returned with text values. Used only internally.
*/
case object NoMarkup extends MarkupRendering

/**
* Utility functions for working with schema options.
*/
object SchemaOptions {
/**
* A set of schema options for querying all standoff markup along with text values.
*/
val ForStandoffWithTextValues: Set[SchemaOption] = Set(MarkupAsXml)

/**
* A set of schema options for querying standoff markup separately from text values.
*/
val ForStandoffSeparateFromTextValues: Set[SchemaOption] = Set(MarkupAsStandoff)

/**
* Determines whether standoff should be queried when a text value is queried.
*
* @param targetSchema the target API schema.
* @param schemaOptions the schema options submitted with the request.
* @return `true` if standoff should be queried.
*/
def queryStandoffWithTextValues(targetSchema: ApiV2Schema, schemaOptions: Set[SchemaOption]): Boolean = {
targetSchema == ApiV2Complex && !schemaOptions.contains(MarkupAsStandoff)
}

/**
* Determines whether markup should be rendered as XML.
*
* @param targetSchema the target API schema.
* @param schemaOptions the schema options submitted with the request.
* @return `true` if markup should be rendered as XML.
*/
def renderMarkupAsXml(targetSchema: ApiV2Schema, schemaOptions: Set[SchemaOption]): Boolean = {
targetSchema == ApiV2Complex && !schemaOptions.contains(MarkupAsStandoff)
}

/**
* Determines whether markup should be rendered as standoff, separately from text values.
*
* @param targetSchema the target API schema.
* @param schemaOptions the schema options submitted with the request.
* @return `true` if markup should be rendered as standoff.
*/
def renderMarkupAsStandoff(targetSchema: ApiV2Schema, schemaOptions: Set[SchemaOption]): Boolean = {
targetSchema == ApiV2Complex && schemaOptions.contains(MarkupAsStandoff)
}
32 changes: 16 additions & 16 deletions webapi/src/main/scala/org/knora/webapi/RdfMediaTypes.scala
@@ -22,9 +22,9 @@ package org.knora.webapi
import akka.http.scaladsl.model.{ContentType, HttpCharsets, MediaType, MediaTypes}

/**
* Represents media types supported by the Knora API server for representing RDF data, and provides
* convenience methods for transforming media types.
*/
object RdfMediaTypes {
val `application/json`: MediaType.WithFixedCharset = MediaTypes.`application/json`

@@ -49,8 +49,8 @@ object RdfMediaTypes {
)

/**
* A map of MIME types (strings) to supported RDF media types.
*/
val registry: Map[String, MediaType.NonBinary] = Set(
`application/json`,
`application/ld+json`,
@@ -61,11 +61,11 @@ }.toMap
}.toMap

/**
* Ensures that a media type specifies the UTF-8 charset if necessary.
*
* @param mediaType a non-binary media type.
* @return the same media type, specifying the UTF-8 charset if necessary.
*/
def toUTF8ContentType(mediaType: MediaType.NonBinary): ContentType.NonBinary = {
mediaType match {
case withFixedCharset: MediaType.WithFixedCharset => withFixedCharset.toContentType
@@ -74,12 +74,12 @@ }
}

/**
* Converts less specific media types to more specific ones if necessary (e.g. specifying
* JSON-LD instead of JSON).
*
* @param mediaType a non-binary media type.
* @return the most specific similar media type that the Knora API server supports.
*/
def toMostSpecificMediaType(mediaType: MediaType.NonBinary): MediaType.NonBinary = {
mediaType match {
case `application/json` => `application/ld+json`
@@ -20,24 +20,24 @@
package org.knora.webapi.annotation

/**
* Creates the ApiMayChange annotation.
*
* Marks APIs that are meant to evolve towards becoming stable APIs, but are not stable APIs yet.
*
* <p>Evolving interfaces MAY change from one patch release to another (i.e. 2.4.10 to 2.4.11)
* without up-front notice. A best-effort approach is taken to not cause more breakage than really
* necessary, and usual deprecation techniques are utilised while evolving these APIs, however there
* is NO strong guarantee regarding the source or binary compatibility of APIs marked using this
* annotation.
*
* <p>It MAY also change when promoting the API to stable, for example such changes may include
* removal of deprecated methods that were introduced during the evolution and final refactoring
* that were deferred because they would have introduced too many breaking changes during the
* evolution phase.
*
* <p>Promoting the API to stable MAY happen in a patch release.
*
* <p>It is encouraged to document in ScalaDoc how exactly this API is expected to evolve.
*
*/
class ApiMayChange() extends scala.annotation.StaticAnnotation
@@ -20,9 +20,9 @@
package org.knora.webapi.annotation

/**
* Creates the ProjectUnique annotation.
*
* Marks values which need to be unique on the level of the PROJECT.
*
*/
class ProjectUnique() extends scala.annotation.StaticAnnotation
@@ -20,9 +20,9 @@
package org.knora.webapi.annotation

/**
* Creates the ServerUnique annotation.
*
* Marks values which need to be unique on the level of the SERVER.
*
*/
class ServerUnique() extends scala.annotation.StaticAnnotation
