feat(api-v2): Add an RDF processing façade (DSP-1020) #1754

Merged
benjamingeer merged 36 commits into main from wip/DSP-1020-rdf-api on Nov 17, 2020

@benjamingeer benjamingeer commented Nov 6, 2020

This PR adds an RDF processing façade in webapi/src/main/scala/org/knora/webapi/messages/util/rdf, with two different implementations (Jena and RDF4J).

The API

  • RdfModel, which represents a set of RDF graphs (a default graph and/or one or more named graphs)
  • RdfNode and its subclasses, which represent RDF nodes (IRIs, blank nodes, and literals)
  • Statement, which represents a triple or quad
  • RdfNodeFactory, which creates nodes and statements
  • RdfModelFactory, which creates empty RDF models
  • RdfFormatUtil, which parses and formats RDF
  • RdfFeatureFactory, which returns instances of RdfNodeFactory, RdfModelFactory, and RdfFormatUtil, using feature toggle configuration.
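
As a rough illustration of how these pieces fit together, here is a minimal, self-contained sketch. The trait names and the method names `makeIriNode`, `makeDatatypeLiteral` and `makeStatement` appear in this PR; everything else (the simplified signatures, `makeEmptyModel`, `addStatement`, `contains`, `size`, and the in-memory implementation) is an assumption made for illustration and is not the actual Knora code.

```scala
object RdfFacadeSketch {
  // Simplified node hierarchy (the real façade also covers blank nodes).
  sealed trait RdfNode
  final case class IriNode(iri: String) extends RdfNode
  final case class DatatypeLiteral(value: String, datatype: String) extends RdfNode

  // A triple, or a quad if `context` (a named graph IRI) is defined.
  final case class Statement(subj: IriNode, pred: IriNode, obj: RdfNode, context: Option[String] = None)

  // Library-independent interfaces that client code programs against.
  trait RdfNodeFactory {
    def makeIriNode(iri: String): IriNode
    def makeDatatypeLiteral(value: String, datatype: String): DatatypeLiteral
    def makeStatement(subj: IriNode, pred: IriNode, obj: RdfNode): Statement
  }

  trait RdfModel {
    def addStatement(statement: Statement): Unit
    def contains(statement: Statement): Boolean
    def size: Int
  }

  trait RdfModelFactory {
    def makeEmptyModel: RdfModel
  }

  // Hypothetical in-memory implementation, standing in for jenaimpl or rdf4jimpl.
  object InMemoryNodeFactory extends RdfNodeFactory {
    def makeIriNode(iri: String): IriNode = IriNode(iri)
    def makeDatatypeLiteral(value: String, datatype: String): DatatypeLiteral =
      DatatypeLiteral(value, datatype)
    def makeStatement(subj: IriNode, pred: IriNode, obj: RdfNode): Statement =
      Statement(subj, pred, obj)
  }

  final class InMemoryModel extends RdfModel {
    private val statements = scala.collection.mutable.Set.empty[Statement]
    def addStatement(statement: Statement): Unit = statements += statement
    def contains(statement: Statement): Boolean = statements.contains(statement)
    def size: Int = statements.size
  }

  object InMemoryModelFactory extends RdfModelFactory {
    def makeEmptyModel: RdfModel = new InMemoryModel
  }

  // Client code sees only the traits, so the underlying library can be swapped.
  def buildExample(nodeFactory: RdfNodeFactory, modelFactory: RdfModelFactory): RdfModel = {
    val model = modelFactory.makeEmptyModel
    model.addStatement(
      nodeFactory.makeStatement(
        subj = nodeFactory.makeIriNode("http://example.org/6"),
        pred = nodeFactory.makeIriNode("http://www.w3.org/2000/01/rdf-schema#label"),
        obj = nodeFactory.makeDatatypeLiteral(
          value = "Lucky's Discount X-Wing Repair",
          datatype = "http://www.w3.org/2001/XMLSchema#string"
        )
      )
    )
    model
  }
}
```

Because `buildExample` takes the factories as parameters, the same calling code would work unchanged with a Jena-backed or RDF4J-backed pair of factories.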

The implementations

  • The Jena-based implementation, in package org.knora.webapi.messages.util.rdf.jenaimpl
  • The RDF4J-based implementation, in package org.knora.webapi.messages.util.rdf.rdf4jimpl

The feature toggle

jena-rdf-library:

  • on means use the Jena implementation
  • off (the default) means use the RDF4J implementation, which was previously the main one used in Knora
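
The toggle-based selection could be sketched roughly as follows. Only the toggle name `jena-rdf-library`, its default (off, meaning RDF4J) and the names `RdfFeatureFactory` and `RdfFormatUtil` come from this PR; the `FeatureFactoryConfig` case class, its `isOn` method, the `getRdfFormatUtil` method and the `libraryName` member are hypothetical stand-ins, not Knora's real API.

```scala
object FeatureToggleSketch {
  // Stand-in for the façade trait; `libraryName` exists only to make the choice visible.
  trait RdfFormatUtil {
    def libraryName: String
  }

  object JenaFormatUtil extends RdfFormatUtil {
    val libraryName: String = "Jena"
  }

  object Rdf4jFormatUtil extends RdfFormatUtil {
    val libraryName: String = "RDF4J"
  }

  // Hypothetical toggle configuration: toggle name -> on/off.
  // A toggle that is not mentioned counts as off.
  final case class FeatureFactoryConfig(toggles: Map[String, Boolean]) {
    def isOn(toggleName: String): Boolean = toggles.getOrElse(toggleName, false)
  }

  object RdfFeatureFactory {
    private val JenaToggle = "jena-rdf-library"

    // on -> Jena implementation; off (the default) -> RDF4J implementation.
    def getRdfFormatUtil(config: FeatureFactoryConfig): RdfFormatUtil =
      if (config.isOn(JenaToggle)) JenaFormatUtil else Rdf4jFormatUtil
  }
}
```

In this sketch, code that needs RDF formatting asks the factory for an implementation and passes along whatever configuration applies to the current request, which is one way to read the task of threading `featureFactoryConfig` through everything that depends on CONSTRUCT requests.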

Tasks

  • Add traits for the façade.
  • Wrap model building and querying functionality (RdfModel, RdfModelFactory, RdfNodeFactory).
  • Add a feature toggle and feature factory (RdfFeatureFactory).
  • Wrap formatting and parsing (RdfFormatUtil).
  • Use the façade:
    • JsonLDUtil
    • RouteUtilV2
    • KnoraRequestV2
    • KnoraResponseV2
    • HttpTriplestoreConnector:
      • SparqlConstructRequest
      • SparqlExtendedConstructRequest
  • Add featureFactoryConfig to everything that depends on CONSTRUCT requests (most of the changes in this PR)
  • Add abstract test classes, with subclasses that test using Jena and RDF4J:
    • RDF4JModelSpec
    • RDF4JFormatUtilSpec
    • JsonLDUtilSpec
    • KnoraResponseV2Spec
  • Update tests:
    • E2ESpec and subclasses.
    • R2RSpec and subclasses.
    • MetadataMessagesV2Spec
    • MetadataRouteV2E2ESpec
  • Add docs.
  • Clean up Bazel dependencies.
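
The "abstract test classes, with subclasses" task follows a common pattern: the test logic is written once against the façade, and each subclass only supplies the implementation under test. A minimal sketch of that pattern, without a test framework, might look like this. All names here (`SetBackedModel`, `ListBackedModel`, `RdfModelSpec`, `run`) are hypothetical, and the two toy implementations merely stand in for the Jena and RDF4J ones.

```scala
object AbstractSpecSketch {
  // Hypothetical, drastically simplified model interface.
  trait RdfModel {
    def add(triple: (String, String, String)): Unit
    def size: Int
  }

  // Two stand-in implementations with different internals.
  final class SetBackedModel extends RdfModel {
    private var triples = Set.empty[(String, String, String)]
    def add(triple: (String, String, String)): Unit = triples = triples + triple
    def size: Int = triples.size
  }

  final class ListBackedModel extends RdfModel {
    private var triples = List.empty[(String, String, String)]
    def add(triple: (String, String, String)): Unit =
      if (!triples.contains(triple)) triples = triple :: triples
    def size: Int = triples.size
  }

  // The shared test logic lives in the abstract class.
  abstract class RdfModelSpec {
    // Each subclass decides which implementation is tested.
    protected def makeEmptyModel(): RdfModel

    // Returns true if the implementation treats the model as a set of statements.
    def run(): Boolean = {
      val model = makeEmptyModel()
      val triple = (
        "http://example.org/6",
        "http://www.w3.org/2000/01/rdf-schema#label",
        "Lucky's Discount X-Wing Repair"
      )
      model.add(triple)
      model.add(triple) // adding a duplicate must not change the size
      model.size == 1
    }
  }

  final class SetBackedModelSpec extends RdfModelSpec {
    protected def makeEmptyModel(): RdfModel = new SetBackedModel
  }

  final class ListBackedModelSpec extends RdfModelSpec {
    protected def makeEmptyModel(): RdfModel = new ListBackedModel
  }
}
```

The benefit of this layout is that any behavioural difference between the two libraries shows up as one subclass failing a test that the other passes.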

What still uses RDF4J directly

  • Things that use RDF4J's streaming API to process large amounts of data, especially to avoid constructing a large string in TriG format:
    • ProjectsResponderADM.projectDataGetRequestADM
    • HttpTriplestoreConnector.turtleToTrig
    • RepositoryUpdater
  • The repository update plugin tests, which use SPARQL
  • TEIHeader: uses XSLT that depends on the exact format of RDF/XML generated by RDF4J. The XSLT would need to be improved to handle rdf:Description.
  • GravsearchParser: uses RDF4J's SPARQL parser, which is not worth changing
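
To illustrate why the streaming cases in the first item above are treated separately, here is a small sketch of the general idea: each statement is written out as soon as it is produced, so the full serialization never has to exist as one string in memory. This does not use RDF4J's streaming API; `Quad` and `streamQuads` are hypothetical names, objects are assumed to be IRIs, and the output is a simple one-quad-per-line format rather than TriG.

```scala
import java.io.Writer

object StreamingSketch {
  // Simplified quad in which every term is an IRI (an assumption of this sketch).
  final case class Quad(subj: String, pred: String, obj: String, graph: String)

  // Writes each quad as soon as it is read from the iterator, and returns
  // the number of quads written. Only one quad is held in memory at a time.
  def streamQuads(quads: Iterator[Quad], out: Writer): Long = {
    var count = 0L
    quads.foreach { q =>
      out.write(s"<${q.subj}> <${q.pred}> <${q.obj}> <${q.graph}> .\n")
      count += 1
    }
    out.flush()
    count
  }
}
```

In practice `out` would wrap a file or an HTTP response stream; a `java.io.StringWriter` is only suitable for small inputs such as tests.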

TODO in a later PR

  • SHACL validation
  • SPARQL querying
  • A streaming parsing/formatting API for processing very large graphs

@benjamingeer benjamingeer marked this pull request as draft November 6, 2020 16:42
@benjamingeer benjamingeer self-assigned this Nov 6, 2020
@benjamingeer benjamingeer changed the title from "feat: Add an RDF processing façade (DSP-1020)" to "feat(api-v2): Add an RDF processing façade (DSP-1020)" Nov 6, 2020
@benjamingeer benjamingeer added API/V2 refactor clean up code labels Nov 6, 2020
@benjamingeer benjamingeer added API/Admin API/V1 enhancement improve existing code or new feature labels Nov 13, 2020
@benjamingeer benjamingeer removed the enhancement improve existing code or new feature label Nov 13, 2020
@benjamingeer benjamingeer marked this pull request as ready for review November 13, 2020 16:09
graphContents match {
    case jsonLDArray: JsonLDArray =>
        // Add each of the array's elements to the model.
        for (elem <- jsonLDArray.value) {
            elem match {

Please add a comment here, like: "Is the element a JSON-LD object? Yes: add it to the model. No: invalid graph."

*/
object JsonLDConstants {
object JsonLDKeywords {

I am so glad you renamed this.


case jsonLDArray: JsonLDArray =>
    // It has more than one @type.
    // More than one.
    for (elem <- jsonLDArray.value) {
        elem match {

Please add a comment here, like: "Is each element of @type a string? Yes: add the type to the model. No: throw an exception."

val thisModelNamedGraphIris: Set[jena.graph.Node] = datasetGraph.listGraphNodes.asScala.toSet
val thatModelNamedGraphIris: Set[jena.graph.Node] = thatDatasetGraph.listGraphNodes.asScala.toSet

// The two models are isomorphic if:

Nice! Without this explanation, I wouldn't have got this part.

val graph1LabelStatement = nodeFactory.makeStatement(
    subj = nodeFactory.makeIriNode("http://example.org/6"),
    pred = labelPred,
    obj = nodeFactory.makeDatatypeLiteral(value = "Lucky's Discount X-Wing Repair", datatype = OntologyConstants.Xsd.String),

:-D

val graph2LabelStatement = nodeFactory.makeStatement(
    subj = nodeFactory.makeIriNode("http://example.org/7"),
    pred = labelPred,
    obj = nodeFactory.makeDatatypeLiteral(value = "Mos Eisley Used Droids", datatype = OntologyConstants.Xsd.String),

"Mos Eisley used droids that were not allowed in the Cantina"


// Compare that with the model generated by the JsonLDDocument.
jsonLDOutputModel should ===(jsonLDExpectedModel)
}

"correctly convert an RDF model to JSON-LD if it contains a circular reference" in {

Excellent!

@SepidehAlassi
Contributor

@benjamingeer This looks great, thanks for all your work.

@benjamingeer
Author

Many thanks for reviewing this!

@benjamingeer benjamingeer merged commit 9170419 into main Nov 17, 2020
@benjamingeer benjamingeer deleted the wip/DSP-1020-rdf-api branch November 17, 2020 12:30