
Issue 280, querying Wikidata, OLD QUERIES

Peter edited this page Jul 12, 2017 · 1 revision

(appendix of Issue 280, querying Wikidata)

... to get back the information, we need to "figure out the SPARQL for query.wikidata.org that would extract these mappings", as @danbri suggested.

Querying and exporting

Test results at query.wikidata.org

Simplest test

The "wanted universe" is given by a simple query like the one below. It perhaps works fine for a local Wikidata user (on the query.wikidata.org server, without timeout restrictions):

SELECT * WHERE {
  ?x ?eqv ?s . 
  FILTER (?eqv = wdt:P1709 || ?eqv = wdt:P1628 || ?eqv = wdt:P2235 || ?eqv = wdt:P2888) .
  FILTER (?s = schema:Person)
}

Instead of FILTER (?s = schema:Person), we need a kind of prefixed wildcard (imagine schema:*). Using a regex, for example FILTER( REGEX(STR(?s), "schema.org") ), produces the error "Query deadline is expired", even with a LIMIT 1 clause.

A workaround is to use "less generic" querying, with one explicit triple pattern per property. It works fine!

SELECT * WHERE {
  { ?p wdt:P2235 ?s. }
  UNION { ?p wdt:P2236 ?s. }
  UNION { ?p wdt:P1628 ?s. }
  UNION { ?p wdt:P1709 ?s. }
  UNION { ?p wdt:P2888 ?s. }
  FILTER( REGEX(STR(?s), "schema.org") )
}

Append FILTER( REGEX(STR(?p), "Q") ) (or "P") to the query above, where ?p is the subject variable, to list only Wikidata entities or only Wikidata properties.
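The same entity/property split can also be done after export, on the returned URIs. The following is only a minimal Python sketch, not part of the original queries: `wikidata_kind` is a hypothetical helper name, and it assumes the subject column holds full URIs under the `http://www.wikidata.org/entity/` prefix. It anchors the match on the local identifier instead of searching for a "Q" or "P" anywhere in the string.

```python
import re

# Wikidata concept URIs share this prefix; the local name starts with
# Q (item) or P (property), followed by digits.
WD_ENTITY = "http://www.wikidata.org/entity/"
_LOCAL_ID = re.compile(r"^([QP])[1-9][0-9]*$")


def wikidata_kind(uri):
    """Return 'item', 'property', or None (hypothetical helper for this sketch)."""
    if not uri.startswith(WD_ENTITY):
        return None
    match = _LOCAL_ID.match(uri[len(WD_ENTITY):])
    if match is None:
        return None
    return "item" if match.group(1) == "Q" else "property"
```

Rows whose subject is neither an item nor a property yield None and can be dropped or inspected separately.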

First SPARQL query convention

The @thadguidry solution returns the relationship information (equivclass, equivprop, sub, super or exact) as separate columns. It is the "standard query" for exporting results to other algorithms, databases or spreadsheets.

SELECT ?pLabel ?p ?equivclass ?equivprop ?sub ?super ?exact  
WHERE {
  { ?p wdt:P2235 ?super. }
  UNION
  { ?p wdt:P2236 ?sub. }
  UNION
  { ?p wdt:P1628 ?equivprop. }
  UNION
  { ?p wdt:P1709 ?equivclass. }
  UNION
  { ?p wdt:P2888 ?exact. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  FILTER(
    (REGEX(STR(?equivprop), "schema.org")) 
    || (REGEX(STR(?sub), "schema.org")) 
    || (REGEX(STR(?super), "schema.org")) 
    || (REGEX(STR(?equivclass), "schema.org"))
    || (REGEX(STR(?exact), "schema.org"))
  )
}

It generates a sparse matrix (many null cells), but that can be managed with SQL. It was used as the standard SPARQL query for this task until June 2017.
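To illustrate what "managing" the sparse matrix involves, here is a minimal sketch in Python rather than SQL. It assumes each exported row is a dict keyed by the query's column names (without the leading `?`) and that null cells are either missing or None. The function name `to_long_rows` is made up for this example.

```python
# Relationship columns produced by the query above, in SELECT order.
RELATION_COLUMNS = ("equivclass", "equivprop", "sub", "super", "exact")


def to_long_rows(sparse_rows):
    """Collapse sparse rows into (p, relation, target) tuples, skipping null cells."""
    long_rows = []
    for row in sparse_rows:
        for column in RELATION_COLUMNS:
            target = row.get(column)
            if target:  # empty or missing cell: nothing to emit
                long_rows.append((row["p"], column, target))
    return long_rows
```

Each output tuple carries the relationship name as a value instead of as a column position, so the null cells disappear.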

Current standard SPARQL query

As suggested by @VladimirAlexiev, there is a good way to return corrName directly as a column in the SPARQL query:

SELECT ?wd ?wdLabel ?corrName ?schema
{
  values (?corr ?corrName) 
    { (wdt:P2235 "superProp") (wdt:P2236 "subProp") (wdt:P1628 "equivProp") 
      (wdt:P1709 "equivClass") (wdt:P2888 "exactMatch")
    }
  ?wd ?corr ?schema
  filter(regex(str(?schema), "schema.org"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} order by ?corrName ?schema
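For export to other tools, the result of this query can be read from the standard SPARQL JSON results format, where each row is a "binding" object mapping variable names to `{"type": ..., "value": ...}` entries. The Python sketch below is an assumption about how one might consume that output, not part of the original workflow. `group_by_corr_name` is a hypothetical helper, and it expects an already-parsed JSON document.

```python
from collections import defaultdict


def group_by_corr_name(result_json):
    """Map corrName -> list of (wd URI, wd label, schema.org URI) tuples."""
    grouped = defaultdict(list)
    for binding in result_json["results"]["bindings"]:
        # Each bound variable is an object whose "value" key holds the term as a string.
        wd = binding["wd"]["value"]
        # The label may be unbound in some rows; fall back to an empty string.
        label = binding.get("wdLabel", {}).get("value", "")
        corr = binding["corrName"]["value"]
        schema = binding["schema"]["value"]
        grouped[corr].append((wd, label, schema))
    return dict(grouped)
```

Because the query already orders by ?corrName and ?schema, each list keeps the rows in that order.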