FAqT Services

Tim L edited this page Apr 7, 2014 · 72 revisions

What is first

What we will cover

This page describes different FAqT Services.

Let's get to it

lift-ckan

The lift-ckan service accepts a CKAN dataset URI and returns a VoID description.

$ python lift-ckan.py 
lift-ckan running on port 9225. Invoke it with:
curl -H "Content-Type: text/turtle" -d @my.ttl http://localhost:9225/lift-ckan

The SADI Java implementation demands RDF/XML, so we have to dumb down our syntax to call it:

$ cat manual/elviajero.ttl
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix prov:     <http://www.w3.org/ns/prov#> .

<http://datahub.io/dataset/elviajero> a datafaqs:CKANDataset .

$ rapper -i turtle -o rdfxml manual/elviajero.ttl > manual/elviajero.ttl.rdf
rapper: Parsing URI file:///home/lebot/prizms/lodcloud/data/source/us/lod-tag/version/2014-Apr-06/manual/elviajero.ttl with parser turtle
rapper: Serializing with serializer rdfxml
rapper: Parsing returned 1 triple

$ curl -H "Content-Type: application/rdf+xml" -d @manual/elviajero.ttl.rdf http://lodcloud.tw.rpi.edu/sadi-services/lift-ckan
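
The convert-then-POST steps above can also be scripted. The following is a minimal sketch in Python using only the standard library. The helper names `type_triple_rdfxml` and `sadi_request` are hypothetical, and the sketch handles only the single rdf:type triple from the example, not arbitrary Turtle (rapper remains the general tool). It builds the request without sending it.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def type_triple_rdfxml(subject: str, rdf_type: str) -> bytes:
    """Serialize the single triple `<subject> a <rdf_type>` as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": subject})
    ET.SubElement(desc, f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": rdf_type})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def sadi_request(service_url: str, rdfxml: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST carrying RDF/XML to a SADI service."""
    return urllib.request.Request(
        service_url,
        data=rdfxml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


body = type_triple_rdfxml(
    "http://datahub.io/dataset/elviajero",
    "http://purl.org/twc/vocab/datafaqs#CKANDataset",
)
request = sadi_request("http://lodcloud.tw.rpi.edu/sadi-services/lift-ckan", body)
# To actually call the service: urllib.request.urlopen(request).read()
```

Setting the Content-Type explicitly matters because a POST without it is typically sent as form data, which an RDF service may not parse as intended.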

Uses:

wikipedia-2-dbpedia

Returns the DBpedia URI for a Wikipedia page URL.

Following SADI in Java take 2:

Need to get from http://en.wikipedia.org/wiki/.us to http://dbpedia.org/resource/.us

We need to know what type http://en.wikipedia.org/wiki/.us is, and we need to decorate it with foaf:isPrimaryTopicOf (or a similar property) pointing to the DBpedia URI.

Dereferencing http://dbpedia.org/resource/.us gives:

<http://en.wikipedia.org/wiki/.us>
    foaf:primaryTopic <http://dbpedia.org/resource/.us> .

So, http://en.wikipedia.org/wiki/.us is a foaf:Document (by foaf:primaryTopic's domain), but we can get more specific by using http://www.geonames.org/ontology#WikipediaArticle.

When http://en.wikipedia.org/wiki/.us comes out of the SADI service, we want it to be decorated with a foaf:primaryTopic pointing to its corresponding DBpedia URI, so the output class needs a minimum cardinality of 1 on foaf:primaryTopic.
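
The mapping described above amounts to swapping one URI prefix for another and emitting a foaf:primaryTopic triple. Here is a minimal Python sketch of that idea, not the Java service itself. The function names are hypothetical, and it assumes the page title carries over unchanged, which holds for the .us example but may not for titles that DBpedia encodes differently.

```python
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(page_url: str) -> str:
    """Map an English Wikipedia page URL to a DBpedia resource URI (prefix swap)."""
    if not page_url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError(f"not an English Wikipedia page URL: {page_url}")
    return DBPEDIA_PREFIX + page_url[len(WIKIPEDIA_PREFIX):]


def primary_topic_turtle(page_url: str) -> str:
    """Describe the page as a gn:WikipediaArticle with a foaf:primaryTopic."""
    topic = wikipedia_to_dbpedia(page_url)
    return (
        f"<{page_url}>\n"
        "    a <http://www.geonames.org/ontology#WikipediaArticle> ;\n"
        f"    <http://xmlns.com/foaf/0.1/primaryTopic> <{topic}> .\n"
    )


print(primary_topic_turtle("http://en.wikipedia.org/wiki/.us"))
```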

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service -DserviceClass=edu.rpi.tw.data.naming.WikipediaPrimaryTopic -DinputClass=http://www.geonames.org/ontology#WikipediaArticle -DoutputClass=http://purl.org/twc/vocab/data-carver#DocumentWithPrimaryTopic -DcontactEmail=lebot@rpi.edu -DserviceName=wikipedia-2-dbpedia

Deployed at http://datafaqstest.tw.rpi.edu/sadi-services/wikipedia-2-dbpedia

search-lov-v1

Linked Open Vocabularies offers an API that is documented at http://lov.okfn.org/dataset/lov/apidoc/. For example, http://lov.okfn.org/dataset/lov/api/v1/search?q=Movie returns

{
   "count":31,
   "offset":0,
   "limit":15,
   "search_query":"Movie",
   "search_type":null,
   "search_vocSpace":null,
   "search_voc":null,
   ...
   "params":{
      "maxScore_nbMainLabel":1,
      "maxScore_nbSecLabel":1,
      "maxLovNbOcc":23,
      "maxLovNbVoc":2,
      "maxLodNbOcc":101061,
      "weightScoreNbMainLabel":0.5,
      "weightScoreNbSecLabel":0.0,
      "weightRatioSearchWordsInLabels":1.0,
      "weightLovNbOcc":0.0,
      "weightLovNbVoc":0.0,
      "weightLodNbOcc":0.7
   },
   "results":[
      {
         "uri":"http://schema.org/Movie",
         "uriPrefixed":"schema:Movie",
         "vocabulary":"http://schema.org/",
         "vocabularyLOVLink":"http://lov.okfn.org/dataset/lov/details/vocabulary_schema.html",
         "vocabularyPrefix":"schema",
         "types":[
            {
               "uri":"http://www.w3.org/2000/01/rdf-schema#Class",
               "uriPrefixed":"rdfs:Class"
            },
            {
               "uri":"http://www.w3.org/2002/07/owl#Thing",
               "uriPrefixed":"owl:Thing"
            }
         ],
         "vocSpaces":[
            {
               "uri":"http://lov.okfn.org/dataset/lov/lov#SCHEMA",
               "label":"Schema",
               "lovLink":"http://lov.okfn.org/dataset/lov/details/vocabularySpace_Schema.html"
            },
            {
               "uri":"http://lov.okfn.org/dataset/lov/lov#WEB",
               "label":"Data & Systems",
               "lovLink":"http://lov.okfn.org/dataset/lov/details/vocabularySpace_Data+%26+Systems.html"
            },
            {
               "uri":"http://lov.okfn.org/dataset/lov/lov#LOV",
               "label":"All",
               "lovLink":"http://lov.okfn.org/dataset/lov/details/vocabularySpace_All.html"
            }
         ],
         "matches":[
            {
               "property":"http://www.w3.org/2000/01/rdf-schema#label",
               "propertyPrefixed":"rdfs:label",
               "value":"Movie",
               "valueShort":"<b>Movie</b>"
            },
            {
               "property":"http://www.w3.org/2000/01/rdf-schema#comment",
               "propertyPrefixed":"rdfs:comment",
               "value":"A movie.",
               "valueShort":"A <b>movie</b>."
            }
         ],
         "score":0.6818182,
         "score_nbMainLabel":1,
         "score_nbSecLabel":1,
         "bestRatioSearchWordsInLabels":1.0,
         "lovNbOcc":1,
         "lovNbVoc":1,
         "lodNbOcc":0,
         "uricontainsSearchWords":true
      },


      {
         "uri":"http://www.ontotext.com/proton/protonext#Movie",
         "uriPrefixed":"pext:Movie",
         "vocabulary":"http://www.ontotext.com/proton/protonext",
         "vocabularyLOVLink":"http://lov.okfn.org/dataset/lov/details/vocabulary_pext.html",
         "vocabularyPrefix":"pext",
         "types":[
            {
               "uri":"http://www.w3.org/2002/07/owl#Class",
               "uriPrefixed":"owl:Class"
            },
            {
               "uri":"http://www.w3.org/2000/01/rdf-schema#Class",
               "uriPrefixed":"rdfs:Class"
            },
            {
               "uri":"http://www.w3.org/2002/07/owl#Thing",
               "uriPrefixed":"owl:Thing"
            }
         ],
         "vocSpaces":[
            {
               "uri":"http://lov.okfn.org/dataset/lov/lov#PROTON",
               "label":"PROTON",
               "lovLink":"http://lov.okfn.org/dataset/lov/details/vocabularySpace_PROTON.html"
            },
            {
               "uri":"http://lov.okfn.org/dataset/lov/lov#UPMETA",
               "label":"Upper & Meta",
               "lovLink":"http://lov.okfn.org/dataset/lov/details/vocabularySpace_Upper+%26+Meta.html"
            },
            {
               "uri":"http://lov.okfn.org/dataset/lov/lov#LOV",
               "label":"All",
               "lovLink":"http://lov.okfn.org/dataset/lov/details/vocabularySpace_All.html"
            }
         ],
         "matches":[
            {
               "property":"http://www.w3.org/2000/01/rdf-schema#label",
               "propertyPrefixed":"rdfs:label",
               "value":"Movie <span style='color:#093;'>@en</span>",
               "valueShort":"<b>Movie</b> <span style='color:#093;'>@en</span>"
            },
            {
               "property":"http://www.w3.org/2000/01/rdf-schema#comment",
               "propertyPrefixed":"rdfs:comment",
               "value":"A film, also called a movie or motion picture, is a series of still or moving images. Wikipedia. <span style='color:#093;'>@en</span>",
               "valueShort":"...film, also called a <b>movie</b> or motion picture, ... <span style='color:#093;'>@en</span>"
            }
         ],
         "score":0.6818182,
         "score_nbMainLabel":1,
         "score_nbSecLabel":1,
         "bestRatioSearchWordsInLabels":1.0,
         "lovNbOcc":0,
         "lovNbVoc":0,
         "lodNbOcc":0,
         "uricontainsSearchWords":true
      },
     ...
   ]
}
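
To make the response shape concrete, the sketch below builds the search URL and pulls the prefixed term names and scores out of a response. It is a Python illustration under stated assumptions, not the Java service: the function names are hypothetical, and the sample is a trimmed copy of the response above rather than live API output.

```python
import json
from urllib.parse import urlencode

LOV_SEARCH = "http://lov.okfn.org/dataset/lov/api/v1/search"


def search_url(query: str) -> str:
    """Build the v1 search URL for a flat string query."""
    return f"{LOV_SEARCH}?{urlencode({'q': query})}"


def ranked_terms(response_text: str) -> list[tuple[str, float]]:
    """Return (uriPrefixed, score) pairs, highest score first."""
    results = json.loads(response_text).get("results", [])
    pairs = [(r["uriPrefixed"], r["score"]) for r in results]
    # sorted() is stable, so equally scored terms keep the API's order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Trimmed sample shaped like the response above.
sample = json.dumps({
    "count": 31,
    "results": [
        {"uri": "http://schema.org/Movie",
         "uriPrefixed": "schema:Movie", "score": 0.6818182},
        {"uri": "http://www.ontotext.com/proton/protonext#Movie",
         "uriPrefixed": "pext:Movie", "score": 0.6818182},
    ],
})
print(search_url("Movie"))
print(ranked_terms(sample))
```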

First, make the stub:

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service -DserviceClass=edu.rpi.tw.data.lov.SearchLOVv1 -DinputClass=http://www.w3.org/ns/prov#Entity -DoutputClass=http://www.w3.org/ns/prov#Entity -DcontactEmail=lebot@rpi.edu -DserviceName=search-lov-v1 -DserviceDescription="Returns a list of Classes and Properties that match a flat string search"

In the generated .java, change the input class from http://www.w3.org/ns/prov#Entity to http://provenanceweb.org/ns/pml#Query, and the output class from http://www.w3.org/ns/prov#Entity to http://www.w3.org/ns/prov#Answer.

We'll need to use Gson to deserialize the JSON response into a Java object.

Call the service with:

curl -H "Content-Type: application/rdf+xml" -d @movie.ttl.rdf http://localhost:8080/sadi-services/search-lov-v1

datasets-by-ckan-sparql-endpoint

For example, Buil-Aranda et al. used http://datahub.io/api/2/search/resource?format=api/sparql&all_fields=1&limit=10000 to find all SPARQL endpoints listed on datahub.io. The response looks like:

{
   "count":492,
   "results":[
      {
         "id":"bb268209-3d26-49c1-ab58-d46453192bb1",
         "resource_group_id":"00cf0fb4-2e22-9538-2f0e-df7043f92f4d",
         "url":"http://sgd.bio2rdf.org/sparql",
         "format":"api/sparql",
         "description":"SPARQL endpoint",
         "hash":"",
         "name":null,
         "resource_type":null,
         "mimetype":null,
         "mimetype_inner":null,
         "size":null,
         "created":null,
         "last_modified":null,
         "cache_url":null,
         "cache_last_updated":null,
         "webstore_url":null,
         "webstore_last_updated":null,
         "position":0,
         "package_id":"089742e2-8df6-4009-ad69-65172968a5bb",
         "tracking_summary":{
            "total":5,
            "recent":3
         }
      },
      {
         "id":"8623a25b-bc4b-4d39-81b1-9fd5247315c1",
         "resource_group_id":"01122cb7-d173-48c1-ac7a-3913903dd7b0",
         "url":"http://opendatacommunities.org/sparql",
         "format":"api/sparql",
         "description":"SPARQL endpoint deref-vocab ",
         "hash":"",
         "name":null,
         "resource_type":null,
         "mimetype":null,
         "mimetype_inner":null,
         "size":null,
         "created":null,
         "last_modified":null,
         "cache_url":null,
         "cache_last_updated":null,
         "webstore_url":null,
         "webstore_last_updated":null,
         "position":3,
         "package_id":"b0137f1f-2990-4075-8a74-31c8ded47191",
         "tracking_summary":{
            "total":21,
            "recent":3
         }
      },
      ...
   ]
}
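
Listing the endpoints from a response of this shape is a matter of reading each resource's url field. A minimal Python sketch follows. The function name is hypothetical, and the sample is trimmed from the two resources shown above rather than fetched from datahub.io.

```python
import json


def sparql_endpoints(response_text: str) -> list[str]:
    """Return the distinct URLs of resources whose format is api/sparql."""
    seen = []
    for resource in json.loads(response_text).get("results", []):
        url = resource.get("url")
        if resource.get("format") == "api/sparql" and url and url not in seen:
            seen.append(url)
    return seen


# Trimmed sample shaped like the response above.
sample = json.dumps({
    "count": 492,
    "results": [
        {"url": "http://sgd.bio2rdf.org/sparql", "format": "api/sparql",
         "package_id": "089742e2-8df6-4009-ad69-65172968a5bb"},
        {"url": "http://opendatacommunities.org/sparql", "format": "api/sparql",
         "package_id": "b0137f1f-2990-4075-8a74-31c8ded47191"},
    ],
})
print(sparql_endpoints(sample))
```

Keeping package_id alongside each URL would let the service relate an endpoint back to its CKAN dataset; the sketch drops it for brevity.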

Create the SADI service:

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service -DserviceClass=edu.rpi.tw.data.quality.sadi.faqt.discovery.endpoints.CKANSPARQLEndpoints -DinputClass=http://www.w3.org/ns/prov#Entity -DoutputClass=http://www.w3.org/ns/prov#Entity -DcontactEmail=lebot@rpi.edu -DserviceName=ckan-list-sparql-endpoints -DserviceDescription="Returns a list of the SPARQL endpoints listed in the given CKAN instance."

sameas-org

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service -DserviceClass=edu.rpi.tw.data.linking.sameas.SameAsDotOrg -DinputClass=http://www.w3.org/2000/01/rdf-schema#Resource -DoutputClass=http://www.w3.org/2000/01/rdf-schema#Resource -DcontactEmail=lebot@rpi.edu -DserviceName=sameas-org -DserviceDescription="Returns owl:sameAs for the given URIs, based on sameas.org."

For fetching RDF from a URL with Apache Jena, see http://stackoverflow.com/questions/18252458/use-apache-jena-to-get-rdf-from-url

Preliminaries at https://github.com/timrdf/DataFAQs/wiki/sameas.org-store-datafaqs
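
The shape of the service's output can be sketched without committing to the sameas.org API, which this page does not describe. The Python below is a hypothetical illustration: given a URI and a list of equivalents obtained by whatever means, it emits one owl:sameAs triple per equivalent as N-Triples. The example.org URI is a placeholder, not real sameas.org output.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_ntriples(uri: str, equivalents: list[str]) -> str:
    """Emit one owl:sameAs line per equivalent, skipping self-links and repeats."""
    lines = []
    for other in dict.fromkeys(equivalents):  # de-duplicate, keep order
        if other != uri:
            lines.append(f"<{uri}> <{OWL_SAME_AS}> <{other}> .")
    return "\n".join(lines)


print(same_as_ntriples(
    "http://dbpedia.org/resource/.us",
    ["http://dbpedia.org/resource/.us", "http://example.org/id/us"],
))
```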

point-in-country

Service class: edu.rpi.tw.data.geo.PointInCountry

git2prov

http://git2prov.org (Tom's page) will return a PROV description of the commits in a Git repository. Give it https: URLs, not git@ URLs.
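
As a sketch of how a request URL for a repository might be assembled, the Python below encodes the Git URL into the query string and rejects anything that is not an https: URL. The function name is hypothetical, and the giturl and serialization parameter names are taken from the result: prefix in the Turtle example further down this page, not from git2prov documentation.

```python
from urllib.parse import urlencode

GIT2PROV = "http://git2prov.org/git2prov"


def git2prov_url(git_url: str, serialization: str = "PROV-O") -> str:
    """Build a git2prov request URL for an https: Git repository URL."""
    if not git_url.startswith("https://"):
        raise ValueError("git2prov expects an https: URL, not a git@ URL")
    query = urlencode({"giturl": git_url, "serialization": serialization})
    return f"{GIT2PROV}?{query}"


print(git2prov_url("https://github.com/tetherless-world/opendap.git"))
```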

Invoke the SADI service wrapper with:

curl -sH "Content-Type: application/rdf+xml" -d @opendap.ttl.rdf http://opendap.tw.rpi.edu/sadi-services/git2prov

Enhancements to the git2prov service, [to be] provided by the SADI wrapper:

(When talking about a GitHub repo...)

  1. Use the GitHub URL for the general file.

@prefix result: <http://git2prov.org/git2prov?giturl=https%3A%2F%2Fgithub.com%2Ftetherless-world%2Fopendap.git&serialization=PROV-O#> .

#
# The following (version-less) file path is actually 
# https://github.com/tetherless-world/opendap/blob/master/.gitignore
# raw:
# https://raw.github.com/tetherless-world/opendap/master/.gitignore
#
result:file--gitignore 
    a prov:Entity ;
    rdfs:label ".gitignore"@en .

#
# The following file version is actually 
# https://github.com/tetherless-world/opendap/blob/27d235ea566148d0980eda1b230c83090b8b9bd9/.gitignore
# raw:
# https://raw.github.com/tetherless-world/opendap/27d235ea566148d0980eda1b230c83090b8b9bd9/.gitignore
#
result:file--gitignore_commit-27d235ea566148d0980eda1b230c83090b8b9bd9
    a prov:Entity ;
    prov:qualifiedAttribution [
        a prov:Attribution, "authorship"@en ;
        prov:agent result:user-Tim-Lebo
    ] ;
    prov:qualifiedGeneration [
        a prov:Generation ;
        prov:activity result:commit-27d235ea566148d0980eda1b230c83090b8b9bd9 ;
        prov:atTime "2013-12-21T19:09:41.000Z"^^xsd:dateTime
    ] ;
    prov:specializationOf result:file--gitignore ;
    prov:wasAttributedTo result:user-Tim-Lebo ;
    prov:wasGeneratedBy result:commit-27d235ea566148d0980eda1b230c83090b8b9bd9 .

#
# The following commit is actually
# https://github.com/tetherless-world/opendap/commit/758c0b46960303157f1f7acd3df8ebde6166697b
#
result:commit-758c0b46960303157f1f7acd3df8ebde6166697b
    a prov:Activity ;
    rdfs:label "git2prov applied to opendap git repo"@en ;
    prov:endedAtTime "2013-12-21T22:31:06.000Z"^^xsd:dateTime ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent result:user-Tim-Lebo ;
        prov:hadRole "author, committer"@en
    ] ;
    prov:startedAtTime "2013-12-21T22:31:06.000Z"^^xsd:dateTime ;
    prov:wasAssociatedWith result:user-Tim-Lebo .

  1. Use Nepomuk's nfo:fileName instead of rdfs:label for the file name.

See http://www.semanticdesktop.org/ontologies/nfo/#fileName

  1. Model the Git repo as a prov:Collection of the files.

<https://github.com/tetherless-world/opendap.git>
    a prov:Collection, doap:GitRepository;
    prov:hadMember result:file--gitignore,
                   result:file-lodspeakr-settings-inc-php;
.

(and, for that matter, where's the versioned repository?)

<https://github.com/tetherless-world/opendap/tree/758c0b46960303157f1f7acd3df8ebde6166697b>
   a prov:Collection, doap:GitRepository;
   prov:specializationOf <https://github.com/tetherless-world/opendap.git>;
   prov:hadMember <https://github.com/tetherless-world/opendap/blob/758c0b46960303157f1f7acd3df8ebde6166697b/.gitignore>;
.

  1. Model the file containment relation.

http://www.semanticdesktop.org/ontologies/nfo/#belongsToContainer
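
The GitHub URLs named in the Turtle comments under the first enhancement follow a regular pattern, so the wrapper could derive them from the clone URL, a ref or commit hash, and a file path. A minimal Python sketch with hypothetical helper names follows. The raw.github.com host is copied from the comments on this page and may differ from what GitHub serves today.

```python
REPO = "https://github.com/tetherless-world/opendap.git"


def github_base(git_url: str) -> str:
    """Strip a trailing .git from an https GitHub clone URL."""
    return git_url[:-len(".git")] if git_url.endswith(".git") else git_url


def blob_url(git_url: str, ref: str, path: str) -> str:
    """Page for a file at a branch name or commit hash."""
    return f"{github_base(git_url)}/blob/{ref}/{path}"


def raw_url(git_url: str, ref: str, path: str) -> str:
    """Raw content of a file at a branch name or commit hash."""
    base = github_base(git_url).replace(
        "https://github.com/", "https://raw.github.com/", 1)
    return f"{base}/{ref}/{path}"


def commit_url(git_url: str, sha: str) -> str:
    """Page for a single commit."""
    return f"{github_base(git_url)}/commit/{sha}"


def tree_url(git_url: str, sha: str) -> str:
    """Repository snapshot at a commit (the versioned repository)."""
    return f"{github_base(git_url)}/tree/{sha}"


sha = "27d235ea566148d0980eda1b230c83090b8b9bd9"
print(blob_url(REPO, "master", ".gitignore"))
print(blob_url(REPO, sha, ".gitignore"))
print(raw_url(REPO, sha, ".gitignore"))
print(commit_url(REPO, sha))
```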
