Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gravsearch): Optimise Gravsearch queries using topological sort (DSP-1327) #1813

Merged
merged 42 commits into from Mar 2, 2021

Conversation

benjamingeer
Copy link

@benjamingeer benjamingeer commented Feb 4, 2021

resolves DSP-1327
resolves DSP-1361

@benjamingeer benjamingeer added enhancement improve existing code or new feature Gravsearch labels Feb 4, 2021
@SepidehAlassi SepidehAlassi marked this pull request as draft February 8, 2021 10:24
@benjamingeer benjamingeer marked this pull request as ready for review February 22, 2021 08:27
@benjamingeer
Copy link
Author

benjamingeer commented Feb 23, 2021

@SepidehAlassi In NonTriplestoreSpecificGravsearchToPrequeryTransformerSpec, in the test reorder query patterns in where clause with union, the input UNION looks like this:

{
    ?thing anything:hasRichtext ?richtext .
    FILTER knora-api:matchText(?richtext, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 1 .
}
UNION
{
    ?thing anything:hasText ?text .
    FILTER knora-api:matchText(?text, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 3 .
}

After topological sort, it looks like this:

{
    ?thing anything:hasRichtext ?richtext .
    ?int knora-api:intValueAsInt 1 .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?richtext, "test"))
} UNION {
    ?thing anything:hasText ?text .
    ?int knora-api:intValueAsInt 3 .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?text, "test"))
}

I'm wondering if there's a way to make ?int knora-api:intValueAsInt 1 . come before ?thing anything:hasRichtext ?richtext .

Copy link
Contributor

@SepidehAlassi SepidehAlassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for all your work on this!

@benjamingeer
Copy link
Author

Thanks for doing most of it and for reviewing!

Copy link
Contributor

@SepidehAlassi SepidehAlassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered we have not added any documentation about topological sorting and its feature toggle. Let's add that before merging this.

@SepidehAlassi
Copy link
Contributor

SepidehAlassi commented Feb 26, 2021

BTW,

I'm wondering if there's a way to make ?int knora-api:intValueAsInt 1 . come before ?thing anything:hasRichtext ?richtext .

well, topological sorting works with the degree of nodes; here both ?thing and ?int are of 2nd degree, so both permutations (?thing, ?int, ?richtext, 1) and (?thing, ?int, 1, ?richtext) are valid orders. We can add a logic to consider the degree of nodes with respect to the number of outgoing edges instead of the number of total edges. In that case, ?int would be of 1st degree and ?thing of 2nd degree.
An easy approach to do this without changing the logic of findAllTopologicalOrders would be the following:
after calculating all permutations of topological sorting and obtaining results of findOrdersNotEndingWithObjectOfRdfType, instead of choosing any of ordersNotEndingWithObjectOfRdfType, we choose the one ordered with the number of outgoing edges.
That means; considering the example above, ordersNotEndingWithObjectOfRdfType would contain o1=(?thing, ?int, ?richtext, 1) and o2=(?thing, ?int, 1, ?richtext). Putting the root element, ?thing aside (because it is not an object of any statement), we take each node in these orders, find the statement it is the object of, calculate the number of outgoing edges of the node representing the subject of this statement. That would give for o1 -> (2, 2, 1) and for o2 -> (2,1,2). Then we choose the permutation which ends with the lowest number, which would be o1.

@SepidehAlassi
Copy link
Contributor

@benjamingeer So I adjusted the code to get all permutations of the topological order according to Kahn's algorithm. For the following query

{
    ?thing anything:hasRichtext ?richtext .
    FILTER knora-api:matchText(?richtext, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 1 .
}
UNION
{
    ?thing anything:hasText ?text .
    FILTER knora-api:matchText(?text, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 3 .
}

After the topological sort, now it looks like this:

{
   ?int knora-api:intValueAsInt 1 .
    ?thing anything:hasRichtext ?richtext .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?richtext, "test"))
} UNION {
    ?int knora-api:intValueAsInt 3 .
    ?thing anything:hasText ?text .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?text, "test"))
}

As you can see with the corrected version of the code, ?int knora-api:intValueAsInt 1 . comes before ?thing anything:hasRichtext ?richtext .

For the first block of the union, there are two valid topological sorting permutations: O1=(?thing, ?richtext, ?int, 1) and O2=(?thing, ?int, ?richtext, 1). As can be seen above, O2 is chosen to sort the statements. However, I believe maybe we should define logic to prefer O1 instead to make the query even faster. What do you think?

@benjamingeer
Copy link
Author

So I adjusted the code to get all permutations of the topological order according to Kahn's algorithm.

Excellent!

However, I believe maybe we should define logic to prefer O1 instead to make the query even faster. What do you think?

Sounds good to me, how do we do that?

@SepidehAlassi
Copy link
Contributor

Sounds good to me, how do we do that?

I wrote down the idea in DSP-1327. This can be implemented in a separate PR.

@SepidehAlassi
Copy link
Contributor

@benjamingeer I added the documentation, can you please proof-read it and add any missing points?

@benjamingeer
Copy link
Author

This looks good to me, I proofread the docs. If you approve the PR we can merge it.

@SepidehAlassi SepidehAlassi self-requested a review March 2, 2021 09:34
Copy link
Contributor

@SepidehAlassi SepidehAlassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamingeer thank you.

@benjamingeer benjamingeer merged commit efbecee into main Mar 2, 2021
@benjamingeer benjamingeer deleted the wip/DSP-1327-gravsearch branch March 2, 2021 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement improve existing code or new feature Gravsearch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants