Query optimisation. #1240

Open
pulquero wants to merge 1 commit into base: skosmos-2

pulquero commented Nov 9, 2021

No description provided.

sonarcloud bot commented Nov 9, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No coverage information
No duplication information

osma (Member) commented Nov 10, 2021

Thanks for the PR, @pulquero!

Can you explain what you did here in a bit more detail?

Which operation was slow, and how much does the optimisation help?

Are there any side effects that you are aware of?

pulquero (Author) commented

For my data, the difference was milliseconds versus minutes.

I've added my observations as comments in the original query below:

SELECT ?object ?label (GROUP_CONCAT(STR(?dir);separator=' ') as ?direct)
WHERE {
  <$uri> a skos:Concept .
  OPTIONAL {
    # ?object may not be bound here, but it looks like we only care about the case where ?object is bound. Why is this inside an OPTIONAL?
    <$uri> $propertyClause* ?object .
    OPTIONAL {
      ?object $propertyClause ?dir .
    }
  }
  OPTIONAL {
    # Only has an effect if ?object is bound; otherwise it has no correlation with the non-optional part.
    ?object skos:prefLabel ?label .
    FILTER (langMatches(lang(?label), "$lang"))
  }
  $otherlang
}
GROUP BY ?object ?label

osma (Member) commented Nov 12, 2021

Thanks for the details. It's still not entirely clear to me which operation was slow from the user's perspective. The function in question (generateTransitivePropertyQuery) is a rather low-level one. It is used, indirectly, at least to generate the SPARQL query for breadcrumb paths in the web UI, but also for some of the REST API methods. It would be good to know, for example, which direction is relevant here: transitive broaders (as in the breadcrumbs) or transitive narrowers?

Also, what does your data look like? Is the hierarchy unusually big or complicated, given that the query ends up taking minutes? This hasn't been a big performance issue for us in the past, which is why I'm asking.

Also, which triple store are you using? We mostly use Fuseki, but are you perhaps using GraphDB, as in your other PR?

pulquero (Author) commented

I'm using GraphDB, and my vocabulary consists of a million skos:Concept instances. It sits in the default graph alongside other vocabularies and is cross-referenced. I think the average tree depth is about 3. I believe

<$uri> a skos:Concept .
OPTIONAL {
  ?object skos:prefLabel ?label . # only has an effect if ?object is bound, else it has no correlation with the non-optional part.
  FILTER (langMatches(lang(?label), "$lang"))
}

results in a Cartesian product, with ?object skos:prefLabel ?label matching everything in the entire default graph.
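
To make the point above concrete, here is a minimal sketch of one way the label lookup could be tied to the pattern that binds ?object, so that it never runs as an uncorrelated pattern. This is an illustration only and not necessarily the change made in this PR, whose diff is not shown in this thread. The placeholders ($uri, $propertyClause, $lang, $otherlang) are taken from the query quoted earlier; the PREFIX line is added here so the fragment reads on its own, and leaving $otherlang in its original position is an assumption, since its expansion is not shown.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?object ?label (GROUP_CONCAT(STR(?dir);separator=' ') as ?direct)
WHERE {
  <$uri> a skos:Concept .
  OPTIONAL {
    # ?object is bound here, by the transitive property path.
    <$uri> $propertyClause* ?object .
    OPTIONAL {
      ?object $propertyClause ?dir .
    }
    # Sketch: the label lookup is nested inside the same OPTIONAL, so it is
    # only evaluated for ?object values produced by the path above.
    OPTIONAL {
      ?object skos:prefLabel ?label .
      FILTER (langMatches(lang(?label), "$lang"))
    }
  }
  # Assumption: left where the original query has it; its expansion is not shown.
  $otherlang
}
GROUP BY ?object ?label
```

When ?object is bound in every solution, this form should return the same rows as the original. The two differ only when the outer OPTIONAL fails to match: the original then joins against every prefLabel in the requested language, whereas the nested form returns a single row with ?object unbound. Whether a given triple store actually plans the nested form faster would need to be measured.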

Projects
Status: Proposed items for this sprint

2 participants