Use Case: Does DBpedia use PROV?

timrdf edited this page Jan 16, 2013 · 23 revisions

What is first

  • Getting started
  • We use SADI (the SADI Semantic Web Services framework) as part of the solution.

What we will cover

This page describes how to use DataFAQs to determine which PROV-O terms, if any, are asserted in DBpedia's SPARQL endpoint. It shows how to reuse existing Linked Data and SADI (the SADI Semantic Web Services framework) services to answer the question. The short answer: DBpedia uses only http://www.w3.org/ns/prov#wasDerivedFrom, and it uses it 11,547,302 times.

Let's get to it!

First, we need to find DBpedia's SPARQL endpoint, which we can do using Linked Data. Dereference DBpedia's URI, requesting RDF/XML (unfortunately, CKAN does not return Turtle):

$ curl -H "Accept: application/rdf+xml" -L http://datahub.io/dataset/dbpedia

  <dcat:Dataset rdf:about="http://datahub.io/dataset/dbpedia">
...
        <dcat:distribution>
            <dcat:Distribution>
                <dcat:accessURL rdf:resource="http://dbpedia.org/sparql"></dcat:accessURL>
                <dct:format>
                    <dct:IMT>
                        <rdf:value>api/sparql</rdf:value>
                        <rdfs:label>api/sparql</rdfs:label>
                    </dct:IMT>
                </dct:format>
            </dcat:Distribution>
        </dcat:distribution>
...
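
The endpoint can also be pulled out of that RDF/XML programmatically. The following is a minimal sketch in Python, assuming the response has the shape shown above. The `SAMPLE` document is a trimmed stand-in rather than real CKAN output, and `find_access_urls` is a hypothetical helper name, not part of DataFAQs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"

# Trimmed stand-in for the CKAN response (assumption: the real output
# carries more properties and namespace declarations than this).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#">
  <dcat:Dataset rdf:about="http://datahub.io/dataset/dbpedia">
    <dcat:distribution>
      <dcat:Distribution>
        <dcat:accessURL rdf:resource="http://dbpedia.org/sparql"></dcat:accessURL>
      </dcat:Distribution>
    </dcat:distribution>
  </dcat:Dataset>
</rdf:RDF>"""


def find_access_urls(rdf_xml):
    """Return the rdf:resource value of every dcat:accessURL element."""
    root = ET.fromstring(rdf_xml)
    urls = []
    for element in root.iter("{%s}accessURL" % DCAT):
        resource = element.get("{%s}resource" % RDF)
        if resource:
            urls.append(resource)
    return urls


print(find_access_urls(SAMPLE))
```

This walks the XML tree rather than parsing RDF properly, so it only suits a quick check; an RDF library would be the more robust choice for anything beyond that.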

Because CKAN doesn't always provide well-structured RDF, we can POST DBpedia's URI (http://datahub.io/dataset/dbpedia) to a SADI service that returns a cleaner RDF description: http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/ckan/lift-ckan (source code). Dereferencing the URI describes the SPARQL endpoint well enough, but it does not describe other details very well.

(Note: the SADI services deployed under http://aquarius.tw.rpi.edu/projects/datafaqs/services currently fail with an HTTP 500, but they work fine if deployed on your local machine by running python lift-ckan.py from the same directory.)

$ cat dbpedia.ttl 
@prefix dcat:     <http://www.w3.org/ns/dcat#> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .

<http://datahub.io/dataset/dbpedia> a dcat:Dataset, datafaqs:CKANDataset .

$ curl -H "Content-Type: text/turtle" -d @dbpedia.ttl http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/ckan/lift-ckan

<http://datahub.io/dataset/dbpedia> a datafaqs:CKANDataset;
...
    void:sparqlEndpoint <http://dbpedia.org/sparql>;
...
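
The same POST can be issued from Python instead of curl. This is a rough sketch using only the standard library; `build_sadi_request` is a hypothetical helper name, and the final call is left commented out because it needs network access and a working deployment of the service.

```python
import urllib.request

SERVICE = "http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/ckan/lift-ckan"

# The same minimal description of the dataset that dbpedia.ttl holds.
TURTLE = """@prefix dcat:     <http://www.w3.org/ns/dcat#> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .

<http://datahub.io/dataset/dbpedia> a dcat:Dataset, datafaqs:CKANDataset .
"""


def build_sadi_request(service_url, turtle):
    """Build (but do not send) a POST request carrying a Turtle body."""
    return urllib.request.Request(
        service_url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


request = build_sadi_request(SERVICE, TURTLE)
print(request.get_method(), request.full_url)

# To actually invoke the service:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

For a locally deployed service, pass its local URL to `build_sadi_request` in place of `SERVICE`.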

Next, we POST the RDF description of DBpedia (which includes the reference to the SPARQL endpoint) to the SADI service http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/faqt/vocabulary/uses/prov (source code). We can POST either the RDF that we got from dereferencing DBpedia's URI or the RDF that we got from the lifting SADI service; both yield the same result.

(Note: the SADI services deployed under http://aquarius.tw.rpi.edu/projects/datafaqs/services currently fail with an HTTP 500, but they work fine if deployed on your local machine by running python prov.py from the same directory.)

$ grep "sparql" dbpedia-deref.ttl 
...
dcat:accessURL <http://dbpedia.org/sparql>
...

$ curl -H "Content-Type: text/turtle" -d @dbpedia-deref.ttl http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/faqt/vocabulary/uses/prov
...
prov:wasDerivedFrom a rdf:Property;
    sio:count 11547302 .
...

$ grep "sparql" dbpedia-lifted.ttl 
...
    void:sparqlEndpoint <http://dbpedia.org/sparql>;
...

$ curl -H "Content-Type: text/turtle" -d @dbpedia-lifted.ttl http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/faqt/vocabulary/uses/prov
...
prov:wasDerivedFrom a rdf:Property;
    sio:count 11547302 .
...
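
The reported count can be cross-checked by querying the endpoint directly. The sketch below only builds the request URL. It assumes the endpoint accepts the standard SPARQL protocol `query` parameter and supports SPARQL 1.1 aggregates; it is not necessarily how the SADI service computes its answer, and `count_query` and `query_url` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"
PROV_NS = "http://www.w3.org/ns/prov#"


def count_query(predicate_uri):
    """SPARQL query counting the triples that use the given predicate."""
    return "SELECT (COUNT(*) AS ?count) WHERE { ?s <%s> ?o }" % predicate_uri


def query_url(endpoint, query):
    """URL for a SPARQL protocol GET request against the endpoint."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, count_query(PROV_NS + "wasDerivedFrom"))
print(url)
```

Fetching the printed URL with curl or a browser should return the current count, which may differ from the 11,547,302 reported here as the dataset changes.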

What is next

  • The series of calls shown on this page can be specified by writing an RDF configuration (which uses PROV-O); see FAqT Brick.