Kendra Signpost

Overview

A common and overarching problem in the field of Semantic Web development is the mapping of ontologies and vocabularies. Many ontologies, schemas, and vocabularies exist for classification, data exchange, and machine learning purposes, but the transformations from one ontology to another are often lossy or inexact. The Kendra Signpost system, developed as part of P2P-Next, was used to manage large data sets of media metadata. It defines an architecture for importing data in various formats and providing mappings between them.

The mappings defined in Kendra Signpost allow data from various sources and formats to be integrated for reasoning and inference, so that entities from one domain can be used in queries against another. For example, media metadata fields describing a video on YouTube may be mapped to equivalent or similar fields in MPEG-7/MPEG-21 metadata, or in metadata from another video-related social networking service (SNS) such as Vimeo or Blip.tv. This feature is relevant to the complementary Kendra Match component, because mappings between two networks or profiles can be extended to other networks, thereby improving the quality and richness of user and content recommendations.
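As a rough illustration of the idea (not the actual Kendra Signpost implementation), the sketch below renames fields from two sources onto a shared vocabulary so that a single query can span both. The source field names, the Dublin Core style target terms, the sample records, and the helper names `to_common` and `find_by_creator` are all assumptions made for this example.

```python
# Hypothetical mappings from source-specific field names to shared terms.
# None of these field names are taken from Kendra Signpost itself.
YOUTUBE_MAPPING = {
    "title": "dc:title",
    "uploader": "dc:creator",
    "published": "dc:date",
}
VIMEO_MAPPING = {
    "name": "dc:title",
    "user_name": "dc:creator",
    "upload_date": "dc:date",
}


def to_common(record, mapping):
    """Rename mapped fields to shared terms; unmapped fields are dropped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def find_by_creator(records, creator):
    """Query records from any source once they use the shared vocabulary."""
    return [r for r in records if r.get("dc:creator") == creator]


# Invented sample records standing in for metadata from two services.
youtube_video = {"title": "Live set", "uploader": "example_user", "view_count": 1200}
vimeo_video = {"name": "Studio session", "user_name": "example_user"}

catalogue = [
    to_common(youtube_video, YOUTUBE_MAPPING),
    to_common(vimeo_video, VIMEO_MAPPING),
]

# "view_count" has no mapping, so it is lost: the transformation is lossy.
matches = find_by_creator(catalogue, "example_user")
print([m["dc:title"] for m in matches])  # ['Live set', 'Studio session']
```

Dropping unmapped fields keeps the sketch short, but it also shows why such transformations can be lossy: anything without a counterpart in the target vocabulary disappears unless the mapping is extended.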

The Kendra Signpost prototype runs on a public-facing web server (Apache with mod_php), a relational database for storage of content and configuration (MySQL), an RDF triple store for storage of faceted metadata (OpenLink Virtuoso), and a search and retrieval system for indexing the metadata (Apache Solr). On top of this stack, Kendra Signpost comprises several modules that are relevant to SARACEN, implemented as Drupal Features:

  • Kendra Signpost Mapper is a generalised tool for mapping metadata for content owners that includes mapping to the P2P-Next Rich Metadata schema. Kendra Signpost is generalised in order to drive adoption: content owners can use the Kendra Signpost mapping tools to link their content to other popular metadata mappings, thereby increasing content visibility and discovery. The server-side mapping and inference technology included in the other components of Kendra Signpost is useful when combining data from multiple sources, formats, and vocabularies, such as user profile data from Facebook, Twitter, Google, and OpenSocial.

  • Kendra Signpost Inference Engine is a Python tool that acts as a proxy between the portal hosting the uploaded catalogues, the RDF triple store containing the metadata extracted from catalogues, and the client-side tools that can run queries against the inferred relations generated by the mapping tool.

  • Kendra Signpost Search is a public-facing query builder that allows end users to quickly build queries to find content that fits their exact criteria, based on the faceted metadata stored within the system. Queries are built through a flexible interface and expressed in JSON-LD format, so complex queries can be assembled with minimal knowledge of the underlying data structures or vocabularies.

Kendra Signpost is free software developed by the Kendra Foundation in collaboration with P2P-Next and co-funded by the European Union under the Seventh Framework Programme.

Introduction

This is the full Drupal install for the Kendra Signpost Trial website, including Drupal core, required modules, and Kendra Signpost specific modules and configuration.

System requirements

To run a full Kendra Signpost stack, the following components are required:

  • LAMP (Linux/Apache/MySQL/PHP) server for running the Kendra Signpost site
  • Python for running the inference proxy layer
  • Solr for powering the search services
  • Virtuoso for storage of RDF data
  • A Git client for checking out the code from GitHub

Installation

These are the steps to get your own Kendra Signpost server up and running:

  1. Download the Kendra Signpost Trial code from the GitHub repository. To make a working copy (in order to push changes back to GitHub), use git clone git@github.com:kendrainitiative/kendra_signpost_trial.git. To just download the code to run tests, use wget http://github.com/kendrainitiative/kendra_signpost_trial/tarball/master.
  2. Create an empty MySQL database.
  3. In the sites/default folder, prepare for installation by copying default.settings.php to settings.php and making it writable by the webserver, and create a folder sites/default/files and make this writable by the webserver.
  4. Run the Drupal installer, using install.php and selecting the Kendra Signpost Trial install profile. You will need to enter database connection settings and create an admin account.
  5. Configure the connection settings for Virtuoso.

Apache and Python configuration

Configure Apache to execute the Python scripts when they are requested. The .htaccess file contains rules that rewrite Solr requests so that they go through the proxy.

Installation instructions for Solr and Kendra Signpost

The instructions below offer a step-by-step procedure for installing Apache Solr for use with the Kendra Signpost Trial.

Install Java

apt-get install openjdk-6-jdk

Download and install Apache Solr. Later steps expect it under /usr/local/src.

cd /usr/local/src
wget http://apache.dataphone.se//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
tar xzvf apache-solr-1.4.1.tgz
cd apache-solr-1.4.1/example/solr/conf/

Back up the default configuration files

mv schema.xml schema.bak
mv solrconfig.xml solrconfig.bak

Install the Drupal Solr module

cd /var/www/html/
drush dl apachesolr
cd /var/www/html/sites/all/modules/apachesolr

Get the Solr PHP client library

svn checkout -r22 http://solr-php-client.googlecode.com/svn/trunk/ SolrPhpClient

To prepare Solr for use with Drupal, copy over the schema.xml and solrconfig.xml files that come with the Drupal Solr module.

cp schema.xml solrconfig.xml /usr/local/src/apache-solr-1.4.1/example/solr/conf/

Enable the Solr module

drush en apachesolr   # The following modules will be enabled: apachesolr, search

Confirm the Drupal Solr configuration

Visit http://dev.kendra.org.uk/admin/settings/apachesolr in a browser.

Start Solr

cd /usr/local/src/apache-solr-1.4.1/example
java -jar start.jar

Create an init script to start Solr at boot time

cp /etc/init.d/skeleton /etc/init.d/solr
vim /etc/init.d/solr      # see attached configuration script
chmod 700 /etc/init.d/solr
rcconf    # look for the new solr service, select to start at boot time

Testing

The Kendra Signpost Trial includes a content type for uploading CSV data. An example file for testing is included with the source code in the examples folder.
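To give a feel for what turning a CSV catalogue into RDF involves, the sketch below converts each row into N-Triples lines. It is not the import code used by kendra_uploads: the URI bases, the column names, the sample data, and the function names are invented for the example.

```python
import csv
import io
from urllib.parse import quote

# Assumed URI schemes for subjects and predicates (not Kendra's real ones).
ITEM_BASE = "http://example.org/catalogue/item/"
FIELD_BASE = "http://example.org/catalogue/field/"


def escape_literal(value):
    """Escape characters that N-Triples string literals do not allow raw."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text):
    """Yield one N-Triples line per non-empty cell, one subject per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{ITEM_BASE}{row_number}>"
        for column, value in row.items():
            # Skip surplus cells (column is None) and empty or missing values.
            if column is None or not value:
                continue
            field = quote(column.strip().lower().replace(" ", "_"), safe="")
            yield f'{subject} <{FIELD_BASE}{field}> "{escape_literal(value)}" .'


# Invented sample data; a real catalogue would be read from the uploaded file.
sample = "Title,Artist\nFirst track,Some Artist\nSecond track,\n"
lines = list(csv_to_ntriples(sample))
for line in lines:
    print(line)
```

The sample yields three triples: two for the first row and one for the second, whose empty Artist cell is skipped.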

After installation the front page prompts you to do the following:

  1. Configure the endpoint. This is already done if you are using the default Kendra RDF repository.
  2. Upload a catalogue. Click the link and upload thirdear.csv from the examples folder.
  3. Run cron. RDF imports are queued to run on cron for better performance.
  4. Browse to the catalogue page to see the result of a basic SPARQL query against the data.
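Step 4 depends on a SPARQL query being sent to the triple store. The sketch below shows one way such a request could be prepared by hand, following the standard SPARQL protocol (a GET request with a `query` parameter). The endpoint URL, the query text, and the function name `build_sparql_request` are assumptions; check where your Virtuoso installation actually exposes its SPARQL endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint location; adjust to match your Virtuoso configuration.
SPARQL_ENDPOINT = "http://localhost:8890/sparql"

# A deliberately basic query: list the first ten triples in the store.
BASIC_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Prepare (but do not send) a GET request carrying a SPARQL query."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for JSON results through standard content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(SPARQL_ENDPOINT, BASIC_QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send the query. The sketch stops short of that so it can be read and run without a live triple store.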

Development

Changes to Drupal features should be exported to code using the Features module and committed back to GitHub. (t.b.c. - insert commands for doing so).

  • kendra_rdf = includes the commands to connect to the repository and send queries
  • kendra_uploads = includes the feature (CCK + Views) and some nodeapi code to process the uploaded files

Other modules/features should be added (for example for the search/query builder/mapping tool). These should be placed in separate modules and then activated by the install profile by adding them to _kendra_signpost_features() as found in kendra_signpost.profile.

License Terms

Code: Kendra Signpost is licensed under the GPL, as per the Drupal licensing terms. Please see LICENSE.txt for more information.

Example metadata: Real World is pleased to be part of the Kendra Trials and has supplied metadata as part of these trials. The Real World Catalogue metadata is copyright Real World Records Ltd and the non exclusive license for its use during these trials is limited to experimental use within the Kendra environment with all other rights reserved to Real World Records Ltd. By accessing this data you accept those terms.