Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
fix(gravsearch): Prevent duplicate results (#1626)
  • Loading branch information
loicjaouen committed Apr 16, 2020
1 parent e80bf52 commit 9313b88
Show file tree
Hide file tree
Showing 28 changed files with 1,032 additions and 841 deletions.
32 changes: 20 additions & 12 deletions docs/src/paradox/05-internals/design/api-v2/gravsearch.md
Expand Up @@ -174,16 +174,10 @@ PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
}
```

The prequery's SELECT clause is built using the member variables defined in `AbstractPrequeryGenerator`.
State of member variables after transformation of the input query into the prequery:

- `mainResourceVariable`: `QueryVariable(page)`
- `dependentResourceVariables`: `Set(QueryVariable(book))`
- `dependentResourceVariablesGroupConcat`: `Set(QueryVariable(book__Concat))`
- `valueObjectVariables`: `Set(QueryVariable(book__LinkValue), QueryVariable(seqnum))`: `?book` represents the dependent resource and `?book__LinkValue` the link value connecting `?page` and `?book`.
- `valueObjectVariablesGroupConcat`: `Set(QueryVariable(seqnum__Concat), QueryVariable(book__LinkValue__Concat))`

The resulting SELECT clause of the prequery looks as follows:
The prequery's SELECT clause is built by
`NonTriplestoreSpecificGravsearchToPrequeryTransformer.getSelectColumns`,
based on the variables used in the input query's `CONSTRUCT` clause.
The resulting SELECT clause looks as follows:

```sparql
SELECT DISTINCT
Expand Down Expand Up @@ -219,6 +213,11 @@ is unbound, we concatenate an empty string. This is necessary because, in Apache
triplestores), "If `GROUP_CONCAT` has an unbound value in the list of values to concat, the overall result is 'error'"
(see [this Jena issue](https://issues.apache.org/jira/browse/JENA-1856)).

If the input query contains a `UNION`, and a variable is bound in one branch
of the `UNION` and not in another branch, it is possible that the prequery
will return more than one row per main resource. To deal with this situation,
`SearchResponderV2` merges rows that contain the same main resource IRI.

### Main Query

The purpose of the main query is to get all requested information about the main resource, dependent resources, and value objects.
Expand All @@ -233,8 +232,17 @@ The classes involved in generating the main query can be found in

The main query is a SPARQL CONSTRUCT query. Its generation is handled by the
method `GravsearchMainQueryGenerator.createMainQuery`.
It takes three arguments: `mainResourceIris: Set[IriRef], dependentResourceIris:
Set[IriRef], valueObjectIris: Set[IRI]`. From the given Iris, statements are
It takes three arguments:
`mainResourceIris: Set[IriRef], dependentResourceIris: Set[IriRef], valueObjectIris: Set[IRI]`.

These sets are constructed based on information about variables representing
dependent resources and value objects in the prequery, which is provided by
`NonTriplestoreSpecificGravsearchToPrequeryTransformer`:

- `dependentResourceVariablesGroupConcat`: `Set(QueryVariable(book__Concat))`
- `valueObjectVariablesGroupConcat`: `Set(QueryVariable(seqnum__Concat), QueryVariable(book__LinkValue__Concat))`

From the given Iris, statements are
generated that ask for complete information on *exactly* these resources and
values. For any given resource Iri, only the values present in
`valueObjectIris` are to be queried. This is achieved by using SPARQL's
Expand Down
Expand Up @@ -824,6 +824,17 @@ case class ReadResourcesSequenceV2(resources: Seq[ReadResourceV2],
)
}

private def getOntologiesFromResource(resource: ReadResourceV2): Set[SmartIri] = {
val propertyIriOntologies: Set[SmartIri] = resource.values.keySet.map(_.getOntologyFromEntity)

val valueOntologies: Set[SmartIri] = resource.values.values.flatten.collect {
case readLinkValueV2: ReadLinkValueV2 =>
readLinkValueV2.valueContent.nestedResource.map(nested => getOntologiesFromResource(nested))
}.flatten.flatten.toSet

propertyIriOntologies ++ valueOntologies + resource.resourceClassIri.getOntologyFromEntity
}

// #generateJsonLD
private def generateJsonLD(targetSchema: ApiV2Schema, settings: SettingsImpl, schemaOptions: Set[SchemaOption]): JsonLDDocument = {
// #generateJsonLD
Expand All @@ -843,14 +854,7 @@ case class ReadResourcesSequenceV2(resources: Seq[ReadResourceV2],
// Make JSON-LD prefixes for the project-specific ontologies used in the response.

val projectSpecificOntologiesUsed: Set[SmartIri] = resources.flatMap {
resource =>
val resourceOntology = resource.resourceClassIri.getOntologyFromEntity

val propertyOntologies = resource.values.keySet.map {
property => property.getOntologyFromEntity
}

propertyOntologies + resourceOntology
resource => getOntologiesFromResource(resource)
}.toSet.filter(!_.isKnoraBuiltInDefinitionIri)

// Make the knora-api prefix for the target schema.
Expand Down
Expand Up @@ -744,7 +744,7 @@ class ValuesResponderV1(responderData: ResponderData) extends Responder(responde
// If we're updating a link, findResourceWithValueResult will contain the IRI of the property that points to the
// knora-base:LinkValue, but we'll need the IRI of the corresponding link property.
val propertyIri = changeValueRequest.value match {
case linkUpdateV1: LinkUpdateV1 => stringFormatter.linkValuePropertyIri2LinkPropertyIri(findResourceWithValueResult.propertyIri)
case linkUpdateV1: LinkUpdateV1 => stringFormatter.linkValuePropertyIriToLinkPropertyIri(findResourceWithValueResult.propertyIri)
case _ => findResourceWithValueResult.propertyIri
}

Expand Down Expand Up @@ -1075,7 +1075,7 @@ class ValuesResponderV1(responderData: ResponderData) extends Responder(responde
case (p, o) => p == OntologyConstants.KnoraBase.HasPermissions
}.map(_._2).getOrElse(throw InconsistentTriplestoreDataException(s"Value ${deleteValueRequest.valueIri} has no permissions"))

val linkPropertyIri = stringFormatter.linkValuePropertyIri2LinkPropertyIri(findResourceWithValueResult.propertyIri)
val linkPropertyIri = stringFormatter.linkValuePropertyIriToLinkPropertyIri(findResourceWithValueResult.propertyIri)

for {
// Get project info
Expand Down
Expand Up @@ -1210,6 +1210,7 @@ class ResourcesResponderV2(responderData: ResponderData) extends ResponderWithSt
apiResponse: ReadResourcesSequenceV2 <- ConstructResponseUtilV2.createApiResponse(
mainResourcesAndValueRdfData = mainResourcesAndValueRdfData,
orderByResourceIri = resourceIrisDistinct,
pageSizeBeforeFiltering = resourceIris.size, // doesn't matter because we're not doing paging
mappings = mappingsAsMap,
queryStandoff = queryStandoff,
versionDate = versionDate,
Expand Down Expand Up @@ -1264,6 +1265,7 @@ class ResourcesResponderV2(responderData: ResponderData) extends ResponderWithSt
apiResponse: ReadResourcesSequenceV2 <- ConstructResponseUtilV2.createApiResponse(
mainResourcesAndValueRdfData = mainResourcesAndValueRdfData,
orderByResourceIri = resourceIrisDistinct,
pageSizeBeforeFiltering = resourceIris.size, // doesn't matter because we're not doing paging
mappings = Map.empty[IRI, MappingAndXSLTransformation],
queryStandoff = false,
versionDate = None,
Expand Down

0 comments on commit 9313b88

Please sign in to comment.