harsh9t/sparql-to-gremlin

This is a continuous effort towards enabling automatic support for executing SPARQL queries over graph systems via the Gremlin query language. This is achieved by converting SPARQL queries to Gremlin pattern-matching traversals.

We would like to acknowledge Daniel Kuppitz, who laid the early foundation for the work that follows. Many thanks for getting us started, three cheers :)

This work is a sub-task of a bigger goal: LITMUS, an open, extensible framework for benchmarking diverse data management solutions. Proposal: [https://arxiv.org/pdf/1608.02800.pdf] | First working prototype: [https://hub.docker.com/r/litmusbenchmarksuite/litmus/]

## The proposed extensions are listed as follows:

  1. enable support for Union queries [Done]

  2. enable support for Order-By queries [Done]

  3. enable support for Group-By queries [Done]

  4. enable support for LIMIT-OFFSET modifiers [Done]

  5. adding support for ASK queries [Pending, Postponed temporarily]

  6. enable support (translation) for BSBM dataset [http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/] (executing SPARQL queries over BSBM dataset [property-graph] ) [Done]

  7. enable support (translation) for Northwind dataset (SPARQL queries over Northwind dataset [property-graph] ) [Done]

  8. enable support for dataset-independent query translation [work in progress] (This allows a user to execute SPARQL queries over almost any dataset without being bothered about the internal mappings and configuration settings)

  9. enable support (translation) of QALD datasets (SPARQL queries over DBpedia) [work in progress]

Note: The SPARQL-to-Gremlin work is currently in progress.

SPARQL-Gremlin


SPARQL-Gremlin is a compiler used to transform SPARQL queries into Gremlin traversals. It is based on the Apache Jena SPARQL processor ARQ, which provides access to a syntax tree of a SPARQL query.

The current version of SPARQL-Gremlin only uses a subset of the features provided by Apache Jena. The examples below show each implemented feature.

Quick Start

Console Application

The project contains a console application that can be used to compile SPARQL queries and evaluate the resulting Gremlin traversals. For usage examples, run ${PROJECT_HOME}/bin/sparql-gremlin.sh.

Gremlin Shell Plugin

To use SPARQL-Gremlin as a Gremlin shell plugin, run the following commands (be sure sparql-gremlin-xyz.jar is in the classpath):

gremlin> :install com.datastax sparql-gremlin 0.1
==>Loaded: [com.datastax, sparql-gremlin, 0.1]
gremlin> :plugin use datastax.sparql
==>datastax.sparql activated

Once the plugin is installed and activated, establish a remote connection to execute SPARQL queries:

gremlin> :remote connect datastax.sparql graph
==>SPARQL[graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]]
gremlin> :> SELECT ?name ?age WHERE { ?person v:name ?name . ?person v:age ?age }
==>[name:marko, age:29]
==>[name:vadas, age:27]
==>[name:josh, age:32]
==>[name:peter, age:35]

Prefixes

SPARQL-Gremlin supports the following prefixes to traverse the graph:

| Prefix | Purpose |
|---|---|
| `e:<label>` | out-edge traversal |
| `p:<name>` | property traversal |
| `v:<name>` | property-value traversal |

Note that element IDs and labels are treated like normal properties, hence they can be accessed using the same pattern:

SELECT ?name ?id ?label WHERE { ?element v:name ?name . ?element v:id ?id . ?element v:label ?label }
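
To make the prefix table more concrete, the following is a minimal sketch of how a single triple pattern could be rendered as one clause of a Gremlin `match()` step. The class `TriplePatternSketch` and its method are hypothetical and are not taken from the SPARQL-Gremlin sources; only the Gremlin step names in the generated text (`out`, `properties`, `values`, `has`, `hasId`, `hasLabel`, `id`, `label`) are real TinkerPop steps. The exact traversal the compiler emits may differ.

```java
/** Hypothetical helper, not part of SPARQL-Gremlin: renders one triple pattern as text. */
class TriplePatternSketch {

    /**
     * @param subject   variable name without '?', e.g. "person"
     * @param predicate prefixed name, e.g. "v:name", "e:created", "p:location"
     * @param object    a variable such as "?name" or a quoted constant such as "\"person\""
     */
    static String toMatchClause(String subject, String predicate, String object) {
        int colon = predicate.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Predicate has no prefix: " + predicate);
        }
        String prefix = predicate.substring(0, colon);
        String name = predicate.substring(colon + 1);
        boolean isVariable = object.startsWith("?");
        String start = "__.as(\"" + subject + "\")";
        // A variable object is bound with as(); a constant object becomes a has-style filter.
        String bind = isVariable ? ".as(\"" + object.substring(1) + "\")" : "";

        switch (prefix) {
            case "e": // out-edge traversal
                requireVariable(isVariable, predicate);
                return start + ".out(\"" + name + "\")" + bind;
            case "p": // property traversal
                requireVariable(isVariable, predicate);
                return start + ".properties(\"" + name + "\")" + bind;
            case "v": // property-value traversal; id and label are treated like properties
                if (name.equals("id")) {
                    return isVariable ? start + ".id()" + bind : start + ".hasId(" + object + ")";
                }
                if (name.equals("label")) {
                    return isVariable ? start + ".label()" + bind : start + ".hasLabel(" + object + ")";
                }
                return isVariable
                        ? start + ".values(\"" + name + "\")" + bind
                        : start + ".has(\"" + name + "\", " + object + ")";
            default:
                throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
    }

    // This sketch only supports variable objects for the e: and p: prefixes.
    private static void requireVariable(boolean isVariable, String predicate) {
        if (!isVariable) {
            throw new IllegalArgumentException("Expected a variable object for " + predicate);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMatchClause("person", "v:label", "\"person\""));
        System.out.println(toMatchClause("person", "v:name", "?name"));
        System.out.println(toMatchClause("person", "e:created", "?project"));
    }
}
```

Under these assumptions, `?person v:name ?name` is rendered as `__.as("person").values("name").as("name")`, and `?person v:label "person"` as `__.as("person").hasLabel("person")`.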

Examples

Select All

Select all vertices in the graph.
SELECT * WHERE {
}

Match Constant Values

Select all vertices with the label person.
SELECT * WHERE {
  ?person v:label "person"
}

Select Specific Elements

Select the values of the properties name and age for each person vertex.
SELECT ?name ?age
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
  ?person v:age ?age
}

Pattern Matching

Select only those persons who created a project.
SELECT ?name ?age
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
  ?person v:age ?age .
  ?person e:created ?project
}

Filtering

Select only those persons who are older than 30.
SELECT ?name ?age
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
  ?person v:age ?age .
  ?person e:created ?project .
    FILTER (?age > 30)
}
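
A FILTER comparison such as the one above has a natural counterpart in TinkerPop's `P` predicates (`P.gt`, `P.lt`, `P.eq` and so on, which are real TinkerPop methods). The sketch below shows one plausible mapping from a SPARQL comparison to the text of a Gremlin clause. The class `FilterSketch` and its method are hypothetical illustrations, not the project's own code, and the clause the compiler actually generates may be shaped differently.

```java
import java.util.Map;

/** Hypothetical helper, not part of SPARQL-Gremlin: renders a simple FILTER comparison. */
class FilterSketch {

    // SPARQL comparison operator -> name of the matching TinkerPop P factory method.
    private static final Map<String, String> PREDICATES = Map.of(
            "=", "eq",
            "!=", "neq",
            "<", "lt",
            "<=", "lte",
            ">", "gt",
            ">=", "gte");

    /**
     * @param variable SPARQL variable, with or without the leading '?', e.g. "?age"
     * @param operator SPARQL comparison operator, e.g. ">"
     * @param literal  literal as it should appear in Gremlin, e.g. "30" or "\"java\""
     */
    static String toWhereClause(String variable, String operator, String literal) {
        String predicate = PREDICATES.get(operator);
        if (predicate == null) {
            throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
        String name = variable.startsWith("?") ? variable.substring(1) : variable;
        return "__.as(\"" + name + "\").is(P." + predicate + "(" + literal + "))";
    }

    public static void main(String[] args) {
        // FILTER (?age > 30)
        System.out.println(toWhereClause("?age", ">", "30"));
    }
}
```

With this mapping, `FILTER (?age > 30)` is rendered as `__.as("age").is(P.gt(30))`, which can sit alongside the other clauses of a `match()` step.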

Deduplication

Select the distinct names of the projects created by persons older than 30.
SELECT DISTINCT ?name
WHERE {
  ?person v:label "person" .
  ?person v:age ?age .
  ?person e:created ?project .
  ?project v:name ?name .
    FILTER (?age > 30)
}

Multiple Filters

Select the distinct names of all Java projects created by persons older than 30.
SELECT DISTINCT ?name
WHERE {
  ?person v:label "person" .
  ?person v:age ?age .
  ?person e:created ?project .
  ?project v:name ?name .
  ?project v:lang ?lang .
    FILTER (?age > 30 && ?lang = "java")
}

Pattern Filter

A different way to select all persons who created a project.
SELECT ?name
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
    FILTER EXISTS { ?person e:created ?project }
}
Select all persons who did not create a project.
SELECT ?name
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
    FILTER NOT EXISTS { ?person e:created ?project }
}

Meta-Property Access

SELECT ?name ?startTime
WHERE {
  ?person v:name "daniel" .
  ?person p:location ?location .
  ?location v:value ?name .
  ?location v:startTime ?startTime
}

Pattern Matching Union Queries

Select all persons who have developed software in Java, using UNION.
SELECT * WHERE {
  {?person e:created ?software .}
  UNION
  {?software v:lang "java" .}
}

Pattern Matching using Query modifier - Order By

Select all vertices with the label person and order by their age.
SELECT * WHERE {
  ?person v:label "person" .
  ?person v:age ?age .
} ORDER BY (?age)
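
ORDER BY is often combined with the LIMIT and OFFSET modifiers listed among the supported features above. Gremlin's `range(low, high)` step takes an inclusive lower bound and an exclusive upper bound, with `-1` meaning "no upper bound", so one plausible mapping is `OFFSET o LIMIT l` to `range(o, o + l)`. The class `SliceSketch` below is a hypothetical illustration of that arithmetic, not the project's implementation.

```java
/** Hypothetical helper, not part of SPARQL-Gremlin: maps OFFSET/LIMIT to a range() step. */
class SliceSketch {

    /**
     * @param offset SPARQL OFFSET, or a negative value if the query has none
     * @param limit  SPARQL LIMIT, or a negative value if the query has none
     */
    static String toRangeStep(long offset, long limit) {
        long low = Math.max(offset, 0);              // a missing OFFSET starts at 0
        long high = limit < 0 ? -1 : low + limit;    // a missing LIMIT leaves the range open
        return "range(" + low + ", " + high + ")";
    }

    public static void main(String[] args) {
        System.out.println(toRangeStep(10, 5));   // OFFSET 10 LIMIT 5
        System.out.println(toRangeStep(-1, 5));   // LIMIT 5 only
        System.out.println(toRangeStep(10, -1));  // OFFSET 10 only
    }
}
```

For example, `OFFSET 10 LIMIT 5` becomes `range(10, 15)`, and a query with only `OFFSET 10` becomes `range(10, -1)`.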

Pattern Matching using Query modifier - Group By

TBA

About

Gremlinator: An effort towards converting SPARQL queries to Gremlin Graph Pattern Matching Traversals
