Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gravsearch): Prevent duplicate results #1626

Merged
merged 47 commits into from Apr 16, 2020
Merged

Conversation

loicjaouen
Copy link
Collaborator

@loicjaouen loicjaouen commented Mar 10, 2020

fixes #1588

a simple query in the available anything data set:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
CONSTRUCT {
    ?thing knora-api:isMainResource true .
    ?thing anything:hasInteger ?int .
} WHERE {
    ?thing a knora-api:Resource .
    ?thing a anything:Thing .
    {
        ?thing anything:hasRichtext ?richtext .
        ?richtext knora-api:valueAsString ?richtextLitteral
        FILTER knora-api:match(?richtextLitteral, "test")

		?thing anything:hasInteger ?int .
		?int knora-api:intValueAsInt 1
    }
    UNION
    {
        ?thing anything:hasText ?text .
        ?text knora-api:valueAsString ?textLitteral
        FILTER knora-api:match(?textLitteral, "test")

		?thing anything:hasInteger ?int .
		?int knora-api:intValueAsInt 1
    }
}
order by (?int)
  • gives a match count of one with the route: /v2/searchextended/count
  • gives a result set of two with the route: /v2/searchextended

the in and outs of this have been discussed in the above mentionned #1588

if ever it helps, here are the generated sparql requests, for the count:

SELECT DISTINCT (COUNT(DISTINCT ?thing) AS ?count)
WHERE {
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
    GRAPH <http://www.ontotext.com/explicit> {
        ?thing <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
    }
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/0001/anything#Thing> .
    {
        ?thing <http://www.knora.org/ontology/0001/anything#hasRichtext> ?richtext .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtext <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?richtext <http://www.knora.org/ontology/knora-base#valueHasString> ?richtextLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtextLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    } UNION {
        ?thing <http://www.knora.org/ontology/0001/anything#hasText> ?text .
        GRAPH <http://www.ontotext.com/explicit> {
            ?text <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?text <http://www.knora.org/ontology/knora-base#valueHasString> ?textLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?textLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    }
}
LIMIT 1

for the search:

SELECT DISTINCT ?thing (GROUP_CONCAT(DISTINCT(?text); SEPARATOR='') AS ?text__Concat) (GROUP_CONCAT(DISTINCT(?int); SEPARATOR='') AS ?int__Concat) (GROUP_CONCAT(DISTINCT(?richtext); SEPARATOR='') AS ?richtext__Concat)
WHERE {
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
    GRAPH <http://www.ontotext.com/explicit> {
        ?thing <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
    }
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/0001/anything#Thing> .
    {
        ?thing <http://www.knora.org/ontology/0001/anything#hasRichtext> ?richtext .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtext <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?richtext <http://www.knora.org/ontology/knora-base#valueHasString> ?richtextLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> ?int__valueHasInteger .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtextLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    } UNION {
        ?thing <http://www.knora.org/ontology/0001/anything#hasText> ?text .
        GRAPH <http://www.ontotext.com/explicit> {
            ?text <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?text <http://www.knora.org/ontology/knora-base#valueHasString> ?textLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?textLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    }
}
GROUP BY ?thing ?int__valueHasInteger
ORDER BY ASC(?int__valueHasInteger) ASC(?thing)
LIMIT 25

@loicjaouen loicjaouen added API/V2 bug something isn't working Gravsearch labels Mar 10, 2020
@loicjaouen loicjaouen added this to the Backlog milestone Mar 10, 2020
@loicjaouen loicjaouen linked an issue Mar 10, 2020 that may be closed by this pull request
@loicjaouen
Copy link
Collaborator Author

so for now, this is only the test case demonstrating the bug of the issue #1588

Sorry @benjamingeer, I assign this to you as you are also the assignee on #1588

@benjamingeer
Copy link

@loicjaouen What would you like to change in the generated SPARQL SELECT?

We need the GROUP_CONCAT for each variable whose values you want to receive in the API response.

@tobiasschweizer
Copy link
Contributor

I can confirm that omitting MainQueryResultProcessor.getRequestedValuesFromResultsWithFullGraphPattern before returning the Gravsearch results leads to the presence of knora-api:hasIncomingLinkValue.

This is the simplified JSON for the first resource:

{
  "@graph": [
    {
      "@id": "http://rdfh.ch/0803/2e0252679a35",
      "@type": "knora-api:Region",
      "knora-api:isRegionOfValue": {
        "@id": "http://rdfh.ch/0803/2e0252679a35/values/498d07ec-dddb-4051-bfd1-62626f282fc5",
        "@type": "knora-api:LinkValue",
        "knora-api:linkValueHasTarget": {
          "@id": "http://rdfh.ch/0803/3f89a693a501",
          "@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#page",
          "http://0.0.0.0:3333/ontology/0803/incunabula/v2#partOfValue": {
            "@id": "http://rdfh.ch/0803/3f89a693a501/values/133ea4ce-b9a5-4c66-a3ab-38be07ccd201",
            "@type": "knora-api:LinkValue",
            "knora-api:linkValueHasTarget": {
              "@id": "http://rdfh.ch/0803/ff17e5ef9601",
              "@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#book",
              "http://0.0.0.0:3333/ontology/0803/incunabula/v2#title": {
                "@id": "http://rdfh.ch/0803/ff17e5ef9601/values/d9a522845006",
                "@type": "knora-api:TextValue",
                "knora-api:valueAsString": "Zeitglöcklein des Lebens und Leidens Christi",
              },
              "knora-api:hasIncomingLinkValue": [
                {
                  "@id": "http://rdfh.ch/0803/6240ce899801/values/e29da7bf-bfa5-4842-a4fc-e455c72dbb0a",
                  "@type": "knora-api:LinkValue",
                  "knora-api:linkValueHasSource": {
                    "@id": "http://rdfh.ch/0803/6240ce899801",
                    "@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#page",
                    "http://0.0.0.0:3333/ontology/0803/incunabula/v2#partOfValue": {
                      "@id": "http://rdfh.ch/0803/6240ce899801/values/e29da7bf-bfa5-4842-a4fc-e455c72dbb0a",
                      "@type": "knora-api:LinkValue",
                      "knora-api:linkValueHasTargetIri": {
                        "@id": "http://rdfh.ch/0803/ff17e5ef9601"
                      }
                    },
                    "rdfs:label": "a3v"
                  }
                },
                {
                  "@id": "http://rdfh.ch/0803/c568b7239a01/values/46951e64-8f06-47fb-a07a-c8b19c146447",
                  "@type": "knora-api:LinkValue",
                  "knora-api:linkValueHasSource": {
                    "@id": "http://rdfh.ch/0803/c568b7239a01",
                    "@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#page",
                    "http://0.0.0.0:3333/ontology/0803/incunabula/v2#partOfValue": {
                      "@id": "http://rdfh.ch/0803/c568b7239a01/values/46951e64-8f06-47fb-a07a-c8b19c146447",
                      "@type": "knora-api:LinkValue",
                      "knora-api:linkValueHasTargetIri": {
                        "@id": "http://rdfh.ch/0803/ff17e5ef9601"
                      }
                    },
                    "rdfs:label": "a7r"
                  }
                }
              ],
              "rdfs:label": "Zeitglöcklein des Lebens und Leidens Christi"
            }
          },
          "rdfs:label": "d8r"
        }
      },
      "rdfs:label": "DEMOX"
    }
  ],
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "knora-api": "http://api.knora.org/ontology/knora-api/v2#"
  }
}

@benjamingeer
Copy link

After talking with @tobiasschweizer, we decided to handle further refactoring in a separate PR (see issue #1632).

Benjamin Geer added 7 commits March 20, 2020 14:28
# Conflicts:
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/MainQueryResultProcessor.scala
- Simplify Gravsearch paging.
- Add design docs.
# Conflicts:
#	docs/src/paradox/05-internals/design/api-v2/gravsearch.md
# Conflicts:
#	docs/src/paradox/05-internals/design/api-v2/gravsearch.md
#	webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/MainQueryResultProcessor.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/QueryTraverser.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/gravsearch/prequery/AbstractPrequeryGenerator.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/gravsearch/prequery/NonTriplestoreSpecificGravsearchToPrequeryGenerator.scala
Benjamin Geer added 3 commits April 8, 2020 17:46
# Conflicts:
#	docs/src/paradox/05-internals/design/api-v2/gravsearch.md
#	webapi/src/main/scala/org/knora/webapi/messages/v2/responder/resourcemessages/ResourceMessagesV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/ResourcesResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/StandoffResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/util/ConstructResponseUtilV2.scala
#	webapi/src/test/scala/org/knora/webapi/util/ConstructResponseUtilV2Spec.scala
@tobiasschweizer
Copy link
Contributor

@benjamingeer I will review this PR today in the afternoon.

@tobiasschweizer
Copy link
Contributor

tobiasschweizer commented Apr 9, 2020

Doing this review will also help me understand our own paper :-)

Copy link
Contributor

@tobiasschweizer tobiasschweizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a first look, please see my comments. However, I would appreciate if I could talk to you on video chat some time next week to make sure I understand the changes introduced by this PR.

The prequery's SELECT clause is built by
`NonTriplestoreSpecificGravsearchToPrequeryTransformer.getSelectColumns`,
based on the variables used in the input query's `CONSTRUCT` clause.
The resulting SELECT clause looks as follows:

```sparql
SELECT DISTINCT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unaware of the construct (GROUP_CONCAT(DISTINCT(IF(BOUND(?book), STR(?book), "")); SEPARATOR='') AS ?book__Concat) (82f8a55). I presume this means that if the variable is bound, an IRI is returned, otherwise an empty string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is explained in the paragraph starting on line 211:

Each GROUP_CONCAT checks whether the concatenated variable is bound in each result in the group; if a variable is unbound, we concatenate an empty string. This is necessary because, in Apache Jena (and perhaps other triplestores), "If GROUP_CONCAT has an unbound value in the list of values to concat, the overall result is 'error'" (see this Jena issue).

@@ -281,7 +282,8 @@ class SearchResponderV2(responderData: ResponderData) extends ResponderWithStand

// Create a Select prequery

nonTriplestoreSpecificConstructToSelectTransformer: NonTriplestoreSpecificGravsearchToCountPrequeryGenerator = new NonTriplestoreSpecificGravsearchToCountPrequeryGenerator(
nonTriplestoreSpecificConstructToSelectTransformer: NonTriplestoreSpecificGravsearchToCountPrequeryTransformer = new NonTriplestoreSpecificGravsearchToCountPrequeryTransformer(
constructClause = inputQuery.constructClause,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you need to know about the Construct clause because you only want to return the IRIs of the resources and values that the client asks for.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

* the search criteria. This query will be used to get resource IRIs for a single page of results. These IRIs will be included in a CONSTRUCT
* query to get the actual results for the page.
*
* @param typeInspectionResult the result of type inspection of the original query.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add all args to the doc string?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8d4fdfb.


/**
* Transforms a preprocessed CONSTRUCT query into a SELECT query that returns only the IRIs and sort order of the main resources that matched
* the search criteria. This query will be used to get resource IRIs for a single page of results. These IRIs will be included in a CONSTRUCT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment should include additional information such as:

Transforms a preprocessed CONSTRUCT query into a SELECT query that returns only the IRIs and sort order of the main resources that matched the search criteria and are requested by client in the input query's WHERE clause.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8d4fdfb.

@@ -0,0 +1,345 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has NonTriplestoreSpecificGravsearchToPrequeryGenerator.scala been renamed / moved to NonTriplestoreSpecificGravsearchToPrequeryTransformer? If yes, I don't see a diff (it looks like the files are unrelated) and it's hard for me to see what has changed exactly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will attach the diff below.

@benjamingeer
Copy link

Here's the diff you asked for:

prequery.diff.zip

@benjamingeer
Copy link

I would appreciate if I could talk to you on video chat some time next week to make sure I understand the changes introduced by this PR.

No problem. I'll take a couple of days of holiday next week. How about we talk on Friday 16 April?

@tobiasschweizer
Copy link
Contributor

I would appreciate if I could talk to you on video chat some time next week to make sure I understand the changes introduced by this PR.

No problem. I'll take a couple of days of holiday next week. How about we talk on Friday 16 April?

Works for me. I would prefer to have an online meeting before 12. Would 10 o'clock work for you?

@benjamingeer
Copy link

Would 10 o'clock work for you?

Perfect!

@tobiasschweizer
Copy link
Contributor

tobiasschweizer commented Apr 9, 2020

Would 10 o'clock work for you?

Perfect!

The appointment you created is on a Thursday, but that's also ok for me.

Copy link
Contributor

@tobiasschweizer tobiasschweizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional comments and adaptions.

@benjamingeer
Copy link

Thanks for the review!

@benjamingeer benjamingeer merged commit 9313b88 into develop Apr 16, 2020
@benjamingeer benjamingeer deleted the wip/1588_duplicates branch April 16, 2020 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API/V2 bug something isn't working Gravsearch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicated resources in Gravsearch result
3 participants