Remove the FILTER EXISTS from organization_page-production.sparql #2208

Open
Daniel-Mietchen opened this issue Dec 27, 2022 · 6 comments · May be fixed by #2209
Labels
P108-employer Wikidata property P1104-number-of-pages Wikidata property P1416-affiliation Wikidata property SPARQL the way Scholia queries Wikidata

Comments

@Daniel-Mietchen
Member

What query is this about?

The query in organization_page-production.sparql uses a clause

FILTER EXISTS { ?researcher wdt:P108 | wdt:P463 | (wdt:P1416 / wdt:P361*) target: . }

wherein the FILTER EXISTS part is not really necessary but slows things down considerably.

What change do you propose, and why?

Just remove it:

?researcher wdt:P108 | wdt:P463 | (wdt:P1416 / wdt:P361*) target: .

Any other considerations?

This can be tested with any organization profile, e.g. https://scholia.toolforge.org/organization/Q3530296#page-production .

@Daniel-Mietchen Daniel-Mietchen added SPARQL the way Scholia queries Wikidata P108-employer Wikidata property P1416-affiliation Wikidata property P1104-number-of-pages Wikidata property labels Dec 27, 2022
@Daniel-Mietchen Daniel-Mietchen added this to To do in Organizations via automation Dec 27, 2022
@Daniel-Mietchen Daniel-Mietchen linked a pull request Dec 28, 2022 that will close this issue
10 tasks
@egonw
Collaborator

egonw commented Dec 29, 2022

Duplicate of #2176

@egonw egonw marked this as a duplicate of #2176 Dec 29, 2022
@fnielsen
Collaborator

fnielsen commented Jan 2, 2023

The problem with the new query is that there might be multiple paths to the target for a researcher, so there might be double count.
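A minimal sketch of how the double count could be checked, assuming the Wikidata Query Service's predefined `wd:` and `wdt:` prefixes and using the organization from the example URL above (`wd:Q3530296`) in place of `target:`. The variable name `?paths` is illustrative only. Because the `|` alternatives are evaluated like a union, a researcher who matches more than one branch (e.g. both P108 and P463) yields more than one solution row:

```sparql
# Sketch only: lists researchers that match the bare triple pattern
# more than once, i.e. those that would be counted multiple times.
SELECT ?researcher (COUNT(*) AS ?paths) WHERE {
  ?researcher wdt:P108 | wdt:P463 | (wdt:P1416 / wdt:P361*) wd:Q3530296 .
}
GROUP BY ?researcher
HAVING (COUNT(*) > 1)
```

If this returns any rows, those researchers reach the target by more than one path.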

@egonw
Collaborator

egonw commented Jan 2, 2023

> The problem with the new query is that there might be multiple paths to the target for a researcher, so there might be double count.

Can that be solved with a DISTINCT?

@Daniel-Mietchen
Member Author

Daniel-Mietchen commented Jan 8, 2023

I took a look at this:

  • query for folks that have P108 and P463 statements to the same institution: https://w.wiki/6Cam
  • picking one of them, I ran the query for
    • the current live version (i.e. with FILTER EXISTS; https://w.wiki/6Cax ) ==> times out
    • the current version of this PR (i.e. without FILTER EXISTS; https://w.wiki/6Cay ) ==> first image below
    • a modified version (i.e. without FILTER EXISTS and with DISTINCT ?researcher_of_paper; https://w.wiki/6Cb8 ) ==> second image below

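[image: results without FILTER EXISTS]
[image: results without FILTER EXISTS and with DISTINCT ?researcher_of_paper]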

A close inspection of these images does indeed reveal a change in the numbers (most visible for 2002 and 2014).

Perhaps the researchers should be identified in a subquery that uses DISTINCT.
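A rough sketch of what such a subquery might look like. This is not the query from the repository or from any of the short links in this thread; it assumes the Wikidata Query Service's predefined prefixes, uses `wd:Q3530296` from the example URL in place of `target:`, and makes up the outer part (P50 author, P1104 number of pages, P577 publication date) and the variable names purely for illustration:

```sparql
# Sketch only: deduplicate the researchers first, then join to their works.
SELECT ?year (SUM(?pages) AS ?number_of_pages) WHERE {
  {
    # Each researcher appears once here, however many paths lead to the target.
    SELECT DISTINCT ?researcher WHERE {
      ?researcher wdt:P108 | wdt:P463 | (wdt:P1416 / wdt:P361*) wd:Q3530296 .
    }
  }
  # Hypothetical outer pattern; the real query may differ.
  ?work wdt:P50 ?researcher ;
        wdt:P1104 ?pages ;
        wdt:P577 ?date .
  BIND(YEAR(?date) AS ?year)
}
GROUP BY ?year
ORDER BY ?year
```

In this simplified form a work is still counted once per affiliated author (and once per publication date statement), so it only addresses the duplication caused by multiple affiliation paths.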

@Daniel-Mietchen
Member Author

Here is a version with a dedicated subquery for the researchers: https://w.wiki/6Cc2 .

@Daniel-Mietchen
Member Author

I now see that this is essentially what Finn had proposed in #2176 .


3 participants