SADI Semantic Web Services framework

Tim L edited this page Jul 5, 2014 · 134 revisions

What's first

What we'll cover

This page provides developer notes for those who want to write a SADI service. Although the initial focus was on the Python implementation, there are some notes about using Java that will evolve. We originally used Python because we found it much easier to develop, maintain, and deploy. However, we recently decided to switch to Java for implementing SADI services because the Python stack has been too brittle in our uses (e.g., it can't parse non-ASCII RDF, it fails to execute SPARQL-as-string in SuRF, and it "randomly" hits AttributeError:n3 on larger POSTed inputs).

Let's get to it

The SADI Semantic Web-Services framework is web services done right, and we're excited to incorporate it as a fundamental design element for DataFAQs. This page provides some information about how to set up your own SADI service, and thus a DataFAQs FAqT service.

For a walkthrough on how to talk to an existing SADI service, see the page on csv2rdf4lod's wiki.

sadi.py

The following technologies are stacked together to create your sadi.py service:

  • your sadi.py service
  • sadi.py
  • SuRF
  • rdflib
  • python

Jim McCusker contributed a python implementation to the SADI code base, adding a third language to the two that already exist (Java and Perl).

The following command will add Jim McCusker's sadi.py into your Python installation (but make sure you reference the latest egg, listed here):

sudo easy_install http://sadi.googlecode.com/files/sadi-0.1.5-py2.6.egg

If you want to build sadi.py yourself, use:

svn checkout http://sadi.googlecode.com/svn/trunk/python/sadi.python sadi.python
cd sadi.python
python setup.py bdist_egg
sudo easy_install dist/sadi-0.1.4-py2.6.egg

Bug Jim on http://code.google.com/p/sadi/issues/list with "sadi.py" problems.

To check which version of sadi.py you have installed:

bash-3.2$ easy_install -n sadi
Searching for sadi
Best match: sadi 0.1.2
Processing sadi-0.1.2-py2.6.egg
sadi 0.1.2 is already the active version in easy-install.pth

Using /Library/Python/2.6/site-packages/sadi-0.1.2-py2.6.egg
Processing dependencies for sadi
Finished processing dependencies for sadi
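
On newer Pythons (3.8+), where easy_install may not be available, the same check can be sketched with the standard library. This is an alternative we are assuming here, not something sadi.py ships; installed_version is a hypothetical helper name, and "sadi" is the distribution name taken from the egg above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None when sadi.py is not installed in this interpreter.
    print(installed_version("sadi"))
```

Returning None instead of raising keeps the check usable in setup scripts that want to decide whether to install.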

sadi.py services accept Turtle when the HTTP request header "Content-Type" is text/turtle (preferred) or application/x-turtle (also works).
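
A minimal sketch of a client that sets that header, using only the Python 3 standard library. build_turtle_post is a hypothetical helper, the example triple is made up, and the URL is the local service used later on this page. The request is only built here; the commented line shows how it would be sent to a running service.

```python
import urllib.request

PREFERRED_TURTLE = "text/turtle"


def build_turtle_post(service_url, turtle_text):
    """Build (but do not send) a POST request carrying Turtle to a SADI service."""
    body = turtle_text.encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        headers={"Content-Type": PREFERRED_TURTLE},
        method="POST",
    )


if __name__ == "__main__":
    request = build_turtle_post(
        "http://localhost:9090/ContextualInverseFunctional",
        "<http://example.org/s> a <http://example.org/C> .\n",
    )
    print(request.get_method(), request.get_header("Content-type"))
    # To actually call a running service:
    # print(urllib.request.urlopen(request).read().decode("utf-8"))
```

Because the body is sent as bytes exactly as given, newlines in the Turtle survive the trip.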

SuRF

SuRF is an object mapping library that lets you work with RDF data as if they were Python objects. SuRF has a Google Code project and list, but is primarily documented at http://packages.python.org/SuRF/. SuRF provides a handful of vocabulary namespaces by default.

One essential plugin for SuRF is the one that implements the SPARQL Protocol. If you start using it with code like:

        self.logd = Store(  reader          =   "sparql_protocol",
                            writer          =   "sparql_protocol",
                            endpoint        =   "http://logd.tw.rpi.edu:8890/sparql")

you might get the error:

<class 'surf.plugin.manager.PluginNotFoundException'>: The <sparql_protocol> READER plugin was not found

To resolve it, run sudo easy_install -U surf.sparql_protocol (according to their docs).

Cosmin May 2012: You can submit all your questions to the SuRF mailing list: https://groups.google.com/group/surfrdf Less severe issues are likely to be solved fairly quickly.

Access rdflib's graph from a SuRF store with store.reader.graph

rdflib

sadi.py and SuRF build on top of rdflib, an RDF API for Python.

sadi.py's Turtle parse issue is fixable by POSTing with curl's --data-binary option instead of -d.
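
A likely reason, as far as we can tell: when curl's -d reads a body from @file it drops carriage returns and newlines, and Turtle is newline-sensitive wherever a # comment appears. The sketch below only simulates that stripping (as_curl_d_would_send is a hypothetical helper, and the Turtle is a made-up example); it does not call curl or sadi.py.

```python
def as_curl_d_would_send(file_text):
    """Approximate `curl -d @file`, which drops CR and LF from the file body."""
    return file_text.replace("\r", "").replace("\n", "")


TURTLE = (
    "@prefix ex: <http://example.org/> .\n"
    "# a comment line\n"
    "ex:a ex:p ex:b .\n"
)

if __name__ == "__main__":
    flattened = as_curl_d_would_send(TURTLE)
    print(flattened)
    # With the newlines gone, everything after '#' is one long comment,
    # so the ex:a triple never reaches the parser.
```

--data-binary sends the file untouched, which avoids this particular failure.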

Python

See Python notes

Example SADI Service: Identified Resource -> SameAs Resource

Running services/sadi/contextual-inverse-functional/contextual-inverse-functional.rpy:

python contextual-inverse-functional.rpy

will launch the service at http://localhost:9090/ContextualInverseFunctional.

Calling the service with one of its examples:

curl -LO https://raw.github.com/timrdf/DataFAQs/master/services/sadi/contextual-inverse-functional/sample-inputs/myPa.ttl
curl -H "Content-Type: text/turtle" --data-binary @myPa.ttl http://localhost:9090/ContextualInverseFunctional

will return owl:sameAs triples for instances found by querying LOGD's SPARQL endpoint. (A few commented lines in the service show an alternative that draws from the Turtle file http://homepages.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl instead.)

@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/id/myPA#myPA> 
  a <http://purl.org/twc/ontology/cif.owl#SameResource>;
  owl:sameAs 
<http://dbpedia.org/resource/Pennsylvania>,
<http://logd.tw.rpi.edu/id/us/state/Pennsylvania>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1146/value-of/state_abbrv/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1148/value-of/state_abbrv/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1149/value-of/state_abbrv/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1292/value-of/lstate09/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1292/value-of/mstate09/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1330/District-size-order/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1356/typed/state_abbreviation/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1536/typed/state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1930/typed/state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1930/value-of/candidate_state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/353/typed/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/fossil_fuel_consumption/2001_2007_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/fossil_fuel_consumption/2008_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/fossil_fuel_consumption/2009_2010_preliminary/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2003_2004_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2005_2007_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2008_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2009_2010_preliminary/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/epa-gov/dataset/crn-stations/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/ncdc-noaa-gov/dataset/us-climate-reference-network/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/table1-anrf-zt/typed/state/PA>,
<http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/table2-anrf/typed/state/PA>,
<http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/table3-anrf/typed/state/PA>,
<http://logd.tw.rpi.edu/source/nitrd-gov/dataset/nsf_awards/typed/state/PA>,
<http://sws.geonames.org/6254927/>,
<http://www.rdfabout.com/rdf/usgov/geo/us/PA> .

Sample FAqT deployment describes how to deploy this service using twistd.

Other example python SADI services

Jim started a collection of services for LOBD in its google code svn.

http://sadiframework.org/registry/ allows others to submit SADI service URIs, whose descriptions are available from a SPARQL endpoint in the graph named <http://sadiframework.org/registry/>. A wrapper to that endpoint is available here. SADI services registered at http://sadiframework.org/registry/ are listed at http://sadiframework.org/registry/services. See also the Resources tab.
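
A rough sketch of how a query against that named graph could be turned into a SPARQL Protocol GET URL. The endpoint address is not given on this page, so http://example.org/sparql below is a placeholder, and registry_query_url is a hypothetical helper; only the graph name comes from the text above.

```python
from urllib.parse import urlencode

REGISTRY_GRAPH = "http://sadiframework.org/registry/"


def registry_query_url(endpoint, limit=10):
    """Build a SPARQL Protocol GET URL that lists subjects in the registry graph."""
    query = (
        "SELECT DISTINCT ?service WHERE { "
        "GRAPH <" + REGISTRY_GRAPH + "> { ?service ?p ?o } "
        "} LIMIT " + str(int(limit))
    )
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # http://example.org/sparql is a placeholder, not the registry's real endpoint.
    print(registry_query_url("http://example.org/sparql"))
```

The query is deliberately generic (any subject in the graph); narrowing it to service descriptions depends on the vocabulary the registry uses, which we do not assume here.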

Debugging in the python interpreter

bash-3.2$ python
Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from surf import *
>>> store = Store(reader="rdflib", writer="rdflib", rdflib_store="IOMemory")
>>> session = Session(store)
>>> store.reader.graph.parse(open('arrayexpress-e-afmx-1.ttl'),format='n3')
<Graph identifier=_5ed9652e-1b37-4793-8b4a-f61edb081bb6 (<class 'rdflib.graph.Graph'>)>
>>> query='''prefix void:     <http://rdfs.org/ns/void#>
... prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#>
... prefix dcterms:  <http://purl.org/dc/terms/>
... select distinct ?group
... where { 
...    <http://thedatahub.org/en/dataset/arrayexpress-e-afmx-1> 
...       a datafaqs:CKANDataset; 
...       dcterms:isPartOf ?group .
...    ?group a datafaqs:CKANGroup .
... }
... '''
>>> results = store.execute_sparql(query)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/SuRF-1.1.4_r352-py2.7.egg/surf/store.py", line 200, in execute_sparql
    return self.reader.execute_sparql(sparql_query, format = format)
  File "/Library/Python/2.7/site-packages/surf.rdflib-1.0.0_r338-py2.7.egg/surf_rdflib/reader.py", line 87, in execute_sparql
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 360, in decode
TypeError: expected string or buffer

SADI using sadi.py (take 2)

Jim updated sadi.py to eliminate the SuRF dependency. Instead, it uses just rdflib 4. He still points to https://code.google.com/p/sadi/wiki/BuildingServicesInPython for its documentation, and suggests using virtualenv.

Flask http://flask.pocoo.org/docs/

virtualenv --no-site-packages MY_NEW_SADI_ENV_DIR
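
On Python 3, a comparable isolated environment can be sketched with the standard library's venv module instead of the virtualenv command. This is our assumption about a reasonable modern equivalent, not something Jim's documentation prescribes; create_isolated_env is a hypothetical helper and the directory name just mirrors the command above.

```python
import os
import tempfile
import venv


def create_isolated_env(env_dir):
    """Create a virtual environment that cannot see the system site-packages."""
    builder = venv.EnvBuilder(system_site_packages=False, with_pip=False)
    builder.create(env_dir)
    # venv records its settings in pyvenv.cfg at the top of the environment.
    return os.path.join(env_dir, "pyvenv.cfg")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        config = create_isolated_env(os.path.join(scratch, "MY_NEW_SADI_ENV_DIR"))
        print(open(config).read())
```

with_pip=False keeps the sketch fast and offline; set it to True if you want pip available inside the environment.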

SADI using Java

(jump ahead to Using Java take 2)

The SADI folks offer a tutorial for setting up a SADI service implemented in Java. It is the best place to start.

The following steps use the skeleton they provide to recreate a VisKO service that converts postscript to pdf. The steps avoid using Eclipse because the maven run configuration that the SADI folks offer isn't appearing.

Step 1: Grab the skeleton and uncompress it.

Step 2: cd sadi-services and create templated Java by running the following maven command. I reuse class names for service names to keep things straightforward. serviceClass is the Java class that will be created (It creates the corresponding directory structure, too). inputClass becomes a @InputClass Java annotation; outputClass, @OutputClass; contactEmail, @ContactEmail, and serviceName, @Name.

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service       \
  -DserviceClass=edu.rpi.tw.test.data.document.PostscriptToPDF \
  -DinputClass=https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#Postscript \
  -DoutputClass=https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#PDF       \
  -DcontactEmail=lebot@school.edu \
  -DserviceName=PostscriptToPDF

All options:

sadi-generator:generate-service
  A goal that generates the skeleton of a SADI service.
  This goal has the following parameters:
    serviceName
      The name of the service, which will also be used in the path to the
      service servlet. This parameter is required.
    serviceClass
      The fully-qualified name of the Java class that will implement the
      service. This parameter is required.
    serviceURL
      The URL of the service. This parameter is optional and not normally
      required, except in certain baroque network configurations.
    serviceRDF
      A URL or local path to a service description in RDF. This parameter
      is optional, but can be used instead of specifying all of the other
      parameters separately.
    serviceDescription
      The service description. This parameter is optional.
    serviceProvider
      The service provider. This parameter is optional.
    contactEmail
      A contact email address for the service. This parameter is required.
    authoritative
      Whether or not the service is authoritative. This parameter is
      optional, defaulting to false.
    async
      Whether or not the service is asynchronous.  This parameter is
      optional, defaulting to false.
    inputClass
      The URI of the service input class. This parameter is required and
      the URI must resolve to an OWL class definition.
    outputClass
      The URI of the service output class. This parameter is required and
      the URI must resolve to an OWL class definition.
    parameterClass
      The URI of the service parameter class. This parameter is optional,
      but if specified the URI must resolve to an OWL class definition.

Step 3: Add implementation to the java file that maven just created. At this point, you'll need to know the Jena API. Hopefully I'll add Sesame support soon.

vi src/main/java/edu/rpi/tw/test/data/document/PostscriptToPDF.java
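   @Override
   public void processInput(Resource input, Resource output)
   {
      /* your code goes here
       * (add properties to output node based on properties of input node...)
       */
      // Create the new resource in the output model and attach it to the
      // output node so that the statement ends up in the service's response.
      // Vocab.alternateOf is assumed to be defined in the generated Vocab class.
      Resource newPDF = output.getModel().createResource("http://example.org/newly-created-PDF-from-given-PS.pdf");
      output.addProperty(Vocab.alternateOf, newPDF);
   }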

Step 4: Make sure any RDF vocabulary you created is resolvable to an RDFS/OWL description.

Step 5: Compile and deploy the service

mvn org.mortbay.jetty:jetty-maven-plugin:run

Step 6: See service listed at http://localhost:8080/sadi-services/

Step 7: Invoke the service

curl -d @sample-input-for-PostscriptToPDF.ttl.rdf http://localhost:8080/sadi-services/PostscriptToPDF

sample-input-for-PostscriptToPDF.ttl.rdf:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF 
 xmlns:nie="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#" 
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
 xmlns:vsr="https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#">
  <rdf:Description rdf:about="http://www.adobetutorialz.com/content_images/AdobeTechnologies/PostScript/manylines.ps">
    <rdf:type rdf:resource="https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#Postscript"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.adobetutorialz.com/content_images/AdobeTechnologies/PostScript/manylines.ps">
    <nie:mimeType>application/postscript</nie:mimeType>
  </rdf:Description>
</rdf:RDF>

The RDF/XML above was generated from the Turtle file below using rapper:

rapper -g -o rdfxml sample-input-for-PostscriptToPDF.ttl > sample-input-for-PostscriptToPDF.ttl.rdf

sample-input-for-PostscriptToPDF.ttl:

@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix vsr: <https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#> .

<http://www.adobetutorialz.com/content_images/AdobeTechnologies/PostScript/manylines.ps> 
   a vsr:Postscript;
   nie:mimeType "application/postscript";
.

The test-service target is worth looking into...

$ mvn ca.wilkinsonlab.sadi:sadi-tester:test-service \
  -DserviceURL=http://localhost:8080/sadi-services/hello \
  -Dinput=http://sadiframework.org/test/hello-input.rdf \
  -Dexpected=http://sadiframework.org/test/hello-output.rdf

When using Eclipse Indigo, change the launch configuration type in doc/generate sadi service.launch to: <launchConfiguration type="org.eclipse.m2e.Maven2LaunchConfigurationType"> This is needed because the zip that the SADI people offer targets an older Eclipse.

Thanks to Nick del Rio for his create war.launch. Plop it into your doc/eclipse/ directory (next to generate sadi service.launch from the sadi-service-skeleton-0.1.1-e3.7.zip), refresh within eclipse, and it'll be available from Run -> Run Configurations... -> Maven Build.

After dropping target/sadi-services.war into a tomcat webapps/ directory, http://localhost:8080/sadi-services/ will list the services.

SADI using Java take 2

To reimplement lift-ckan.py, run the mvn "generate-service" goal from projects/DataFAQs/github/DataFAQs/src/java/sadi-services as shown below. You can edit create-new-sadi-java-file and just source it. Unfortunately, the classes that you specify MUST resolve to an OWL description. So, if the class that you want to use doesn't, just use rdfs:Resource for both and change it in the Java afterwards.

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service \
  -DserviceName=lift-ckan \
  -DserviceClass=edu.rpi.tw.data.quality.sadi.ckan.LiftCKAN \
  -DinputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
  -DoutputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
  -DcontactEmail=lebot@rpi.edu

webapp/WEB-INF/web.xml maps the requested URL to the Java class name using the following two snippets. For example, this enables requests to http://localhost:8080/sadi-services/faqt/sparql-service-description/named-graphs to invoke the processInput(Resource input, Resource output) method on the Java class edu.rpi.tw.data.quality.sadi.faqt.sparql_service_description.NamedGraphs.

  <servlet-mapping>
    <servlet-name>named-graphs</servlet-name>
    <url-pattern>/faqt/sparql-service-description/named-graphs</url-pattern>
  </servlet-mapping>
  ...
  <servlet>
    <servlet-name>named-graphs</servlet-name>
    <servlet-class>edu.rpi.tw.data.quality.sadi.faqt.sparql_service_description.NamedGraphs</servlet-class>
  </servlet>
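
To double-check which class a URL ends up at, the two snippets can be joined on servlet-name. The sketch below does that with Python's standard XML parser over a namespace-free fragment that mirrors the snippets above; class_for_url is a hypothetical helper, it only handles exact url-pattern matches (no wildcards), and a real web.xml that declares an XML namespace would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

WEB_XML = """<web-app>
  <servlet>
    <servlet-name>named-graphs</servlet-name>
    <servlet-class>edu.rpi.tw.data.quality.sadi.faqt.sparql_service_description.NamedGraphs</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>named-graphs</servlet-name>
    <url-pattern>/faqt/sparql-service-description/named-graphs</url-pattern>
  </servlet-mapping>
</web-app>"""


def class_for_url(web_xml_text, url_path):
    """Return the servlet class mapped to url_path, or None if nothing matches."""
    root = ET.fromstring(web_xml_text)
    # servlet-name -> servlet-class, from the <servlet> elements
    classes = {
        servlet.findtext("servlet-name"): servlet.findtext("servlet-class")
        for servlet in root.iter("servlet")
    }
    # find the <servlet-mapping> whose url-pattern equals the requested path
    for mapping in root.iter("servlet-mapping"):
        if mapping.findtext("url-pattern") == url_path:
            return classes.get(mapping.findtext("servlet-name"))
    return None


if __name__ == "__main__":
    print(class_for_url(WEB_XML, "/faqt/sparql-service-description/named-graphs"))
```

The url_path is relative to the web application, so the /sadi-services prefix from the full URL is left off.
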

Two more generate-service invocations:

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service \
  -DserviceClass=edu.rpi.tw.data.quality.sadi.faqt.lodcloud.basic.TaggedWithLOD \
  -DinputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
  -DoutputClass=http://purl.org/twc/vocab/datafaqs#Evaluated \
  -DcontactEmail=lebot@rpi.edu \
  -DserviceName=tagged-with-lod

mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service \
  -DserviceClass=edu.rpi.tw.data.quality.sadi.faqt.lodcloud.minimal.TaggedWithTopic \
  -DinputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
  -DoutputClass=http://purl.org/twc/vocab/datafaqs#Evaluated \
  -DcontactEmail=lebot@rpi.edu \
  -DserviceName=tagged-with-topic

Project-level workflow to develop a Java SADI service:

Issues

SADI people don't like using issues to track unresolved "questions". So I need to keep track of the emails here.

Open:

Not critical:

Resolved:

Comparing SADI to other web services

What's next?

If you're planning to just use existing evaluation services:

  • Skip ahead to see how to set up a FAqT Brick and get some results asap.

If you're trying to write an evaluation service:

  • FAqT Service will describe how to steal our template to create an evaluation service that others can call.
  • Sample FAqT deployment will describe how to deploy the SADI service that you just developed.
  • GSON is a very nice Java library to map JSON into Java objects (and vice versa).