Expose a (JENA) SPARQL Extension #116

Open
Aklakan opened this issue Nov 8, 2017 · 3 comments

Comments


Aklakan commented Nov 8, 2017

While thinking about certain data integration tasks, I had the idea to do everything using (Jena) SPARQL extensions, as is done in this project. Note that Jena's extension system lets Maven dependencies register their own SPARQL extensions simply by being included as a dependency; no further code is necessary. This works by specifying a start-up class in the file src/main/resources/META-INF/services/org.apache.jena.system.JenaSubsystemLifecycle.

In principle, there could be a LIMES integration where a pseudo SERVICE URL is used to invoke LIMES.
The body of the LIMES service would contain the configuration, i.e. the two concepts to interlink, the properties on which to base the metric expression, the metric expression itself, and the threshold.
In principle, it could look something like this:

SELECT ?x ?y {
    SERVICE <http://limes> {
        SERVICE <http://dbpedia.org/sparql> { ?x a :Airport ; rdfs:label ?xl }
        SERVICE <http://linkedgeodata.org/sparql> { ?y a :Aerodrome ; rdfs:label ?yl }
        FILTER(limes:trigrams(?xl, ?yl) > 0.9)
    }
}
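
As a rough illustration of what a filter such as limes:trigrams(?xl, ?yl) > 0.9 would have to compute, here is a minimal Java sketch of a trigram similarity. It assumes a Dice coefficient over sets of lower-cased character trigrams; the class and method names are hypothetical, and the definition LIMES actually uses may differ (for example in padding or in counting duplicates).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of LIMES or Jena.
final class TrigramSketch {

    // Collects the distinct character trigrams of a lower-cased string.
    static Set<String> trigrams(String s) {
        String norm = s.toLowerCase(Locale.ROOT);
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 3 <= norm.length(); i++) {
            result.add(norm.substring(i, i + 3));
        }
        return result;
    }

    // Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), a value in [0, 1].
    static double similarity(String a, String b) {
        Set<String> ta = trigrams(a);
        Set<String> tb = trigrams(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // assumption: two strings without any trigram count as equal
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return 2.0 * common.size() / (ta.size() + tb.size());
    }
}
```

A pair of labels would then pass the filter when TrigramSketch.similarity(xl, yl) > 0.9.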

What do you think about this?


Aklakan commented May 6, 2018

I made progress on this issue, and I have the first link spec running via SPARQL.

As interlinking conceptually just creates a Cartesian product between the entities (or records) of two sources, it can be represented in SPARQL as a JOIN and a FILTER on the condition.
So in principle, all interlinking done by LIMES could be expressed purely in SPARQL plus some function extensions to compute the metrics. However, LIMES promises to speed this process up through clever indexing.
Ideally, the example below could be run with and without the SERVICE <plugin://limes> { ... } wrapper: the former should deliver better performance by using LIMES, while the latter naively constructs the Cartesian product and is thus in accordance with the formal approach to interlinking. To make that happen, all metric and conversion functions would have to be registered in Jena's function library.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geom: <http://geovocab.org/geometry#>
PREFIX geos: <http://www.opengis.net/ont/geosparql#>
PREFIX lgdo: <http://linkedgeodata.org/ontology/>
PREFIX plugin: <plugin://>

SELECT * {
  SERVICE <plugin://limes> { ## Ideally the sparql query would still run without this SERVICE keyword
    SERVICE <http://linkedgeodata.org/sparql> {
      ?x a lgdo:RelayBox ; geom:geometry/geos:asWKT ?xl .
    }

    SERVICE <http://linkedgeodata.org/sparql> {
      ?y a lgdo:RelayBox ; geom:geometry/geos:asWKT ?yl .
    }

    FILTER(plugin:geo_hausdorff(?xl, ?yl) < 0.0001)
  }
}
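
For comparison, the naive evaluation (without the LIMES wrapper) amounts to a nested loop over both result sets with the metric applied to every pair. The Java sketch below illustrates this under strong assumptions: geometries are given as already-parsed arrays of planar (x, y) vertices rather than WKT strings, the distance is Euclidean, and the Hausdorff distance is taken over vertex sets only. All names are hypothetical; this is not how plugin:geo_hausdorff is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of naive linking: Cartesian product plus a filter.
final class NaiveLinkSketch {

    // Directed Hausdorff distance: the largest distance from a point of a
    // to its nearest point in b.
    static double directed(double[][] a, double[][] b) {
        double max = 0.0;
        for (double[] p : a) {
            double min = Double.POSITIVE_INFINITY;
            for (double[] q : b) {
                min = Math.min(min, Math.hypot(p[0] - q[0], p[1] - q[1]));
            }
            max = Math.max(max, min);
        }
        return max;
    }

    // Symmetric Hausdorff distance between two vertex sets.
    static double hausdorff(double[][] a, double[][] b) {
        return Math.max(directed(a, b), directed(b, a));
    }

    // Returns index pairs (i, j) with hausdorff(xs[i], ys[j]) < threshold.
    // Cost is |xs| * |ys| metric evaluations, which is what indexing avoids.
    static List<int[]> link(List<double[][]> xs, List<double[][]> ys, double threshold) {
        List<int[]> links = new ArrayList<>();
        for (int i = 0; i < xs.size(); i++) {
            for (int j = 0; j < ys.size(); j++) {
                if (hausdorff(xs.get(i), ys.get(j)) < threshold) {
                    links.add(new int[] {i, j});
                }
            }
        }
        return links;
    }
}
```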

So I added two components:

  • A property function that performs the execution of a link spec with LIMES using ?s plugin:limes (?o ?confidence "<LIMES>...</LIMES>"). This yields bindings for all pairs ?s and ?o together with their confidence score (?confidence is optional; the ?o and spec arguments are mandatory).

  • A syntactic transformer that rewrites the content of a SERVICE <plugin://limes> { ... } element to the aforementioned property function and thus builds the (XML) LIMES spec.
    By convention, the first variable encountered in subject position becomes what I call the concept variable, i.e. the attribute of the <VAR> element. The property paths that LIMES expects are computed from the variables mentioned in the metric expression by running Bellman-Ford (via JGraphT, through my Jena-JGraphT binding project) on the respective BGPs.

  • Original Limes Example

  • LIMES integration via the SERVICE element

  • LIMES integration via rewrite of SERVICE element to a property function
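
The property-path derivation mentioned for the syntactic transformer can be sketched roughly as follows. This is a plain-Java approximation with hypothetical names, not the actual JGraphT-based code: each triple pattern of a BGP is treated as a directed edge from subject to object with weight 1, Bellman-Ford relaxation finds the shortest chain of predicates from the concept variable to a variable used in the metric expression, and the predicates along that chain are joined with "/". Inverse steps and alternative paths are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derive a property path from a BGP via Bellman-Ford.
final class PathSketch {

    // One triple pattern, e.g. ("?x", "geom:geometry", "?g").
    static final class Edge {
        final String s, p, o;
        Edge(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Returns the predicates from source to target joined by "/",
    // or null if target is not reachable from source.
    static String propertyPath(List<Edge> bgp, String source, String target) {
        Set<String> vertices = new HashSet<>();
        vertices.add(source);
        for (Edge e : bgp) { vertices.add(e.s); vertices.add(e.o); }

        Map<String, Integer> dist = new HashMap<>();
        Map<String, Edge> incoming = new HashMap<>();
        dist.put(source, 0);

        // Relax all edges at most |V| - 1 times (unit weights).
        for (int round = 1; round < vertices.size(); round++) {
            boolean changed = false;
            for (Edge e : bgp) {
                Integer ds = dist.get(e.s);
                if (ds == null) continue; // subject not reached yet
                Integer dObj = dist.get(e.o);
                if (dObj == null || ds + 1 < dObj) {
                    dist.put(e.o, ds + 1);
                    incoming.put(e.o, e);
                    changed = true;
                }
            }
            if (!changed) break;
        }
        if (!dist.containsKey(target)) return null;

        // Walk back from target to source, collecting predicates.
        Deque<String> steps = new ArrayDeque<>();
        for (String v = target; !v.equals(source); v = incoming.get(v).s) {
            steps.addFirst(incoming.get(v).p);
        }
        return String.join("/", steps);
    }
}
```

For the patterns ?x geom:geometry ?g and ?g geos:asWKT ?xl, with ?x as the concept variable and ?xl as the metric variable, this yields geom:geometry/geos:asWKT.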

The namespace and expression handling of LIMES appears quite awkward, i.e. needlessly complex, to me. For example, function chains could be readily converted to Jena expressions; e.g. property AS lowercase->someOtherFunc RENAME x could be represented in plain SPARQL as `BIND(plugin:someOtherFunc(plugin:lowercase(?o)) AS ?x)`, which allows direct reuse of Jena's ARQ machinery.
That said, I have not yet worked out a generic procedure for converting all LIMES transformations to SPARQL syntax.
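
For the simple case named above, the rewrite is mechanical. The following Java sketch (hypothetical class and method names) covers only specs of the exact shape "property AS f1->f2 RENAME x", with argument-free function names and the plugin: prefix assumed; it is not a general translation of LIMES preprocessing syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn a simple LIMES-style function chain into a SPARQL BIND.
final class ChainToBindSketch {

    // Groups: 1 = property, 2 = function chain, 3 = target variable name.
    private static final Pattern SPEC =
        Pattern.compile("^\\s*(\\S+)\\s+AS\\s+(\\S+)\\s+RENAME\\s+(\\w+)\\s*$");

    // inputVar is the variable (without '?') already bound to the property's value;
    // the property itself would go into the triple pattern and is ignored here.
    static String toBind(String spec, String inputVar) {
        Matcher m = SPEC.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported spec: " + spec);
        }
        String expr = "?" + inputVar;
        // Apply functions left to right, so the first one ends up innermost.
        for (String fn : m.group(2).split("->")) {
            expr = "plugin:" + fn + "(" + expr + ")";
        }
        return "BIND(" + expr + " AS ?" + m.group(3) + ")";
    }
}
```

Calling ChainToBindSketch.toBind("property AS lowercase->someOtherFunc RENAME x", "o") produces the BIND expression quoted above.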

@kvndrsslr

I am interested in this, thanks @Aklakan for the suggestion (and sorry for the years late response haha).
As I am about to rewrite large portions of LIMES for v2, this will be on my wish list.
Also thinking about integration with dcat-suite for dataset management and sparql-integrate to extend the input options!

@kvndrsslr kvndrsslr self-assigned this Jul 2, 2020
@kvndrsslr kvndrsslr added this to the Release 2.0 milestone Jul 2, 2020
@kvndrsslr kvndrsslr changed the title from "SPARQL Integration?" to "Expose a (JENA) SPARQL Extension" Jul 2, 2020

Aklakan commented Mar 29, 2022

Just wondering whether there have been any updates in that direction? In my group we are currently working on linking tasks, and maybe it would be worthwhile to pursue this topic again, especially considering that by now I have written several Jena SPARQL extensions, e.g. for the rdf-processing-toolkit, in order to represent rather sophisticated data integration tasks as sequences of SPARQL statements. Linking is still on my wishlist. I need to check how much of my old code that integrated LIMES as a SPARQL SERVICE clause is still compatible with the current design of LIMES, but maybe after all these years I could finally provide a PR.
