feat(triplestores): Support Apache Jena Fuseki #1375

Merged 73 commits on Apr 2, 2020

@benjamingeer benjamingeer commented Jul 15, 2019

This PR updates and fixes Knora's support for the Apache Jena Fuseki triplestore. Since one of the problems we had with Fuseki seemed to be related to connection handling in the Jetty servlet engine, this PR supports running Fuseki in Apache Tomcat.

  • Add configuration for running Fuseki in Tomcat.
  • When Knora starts, check the status of the Fuseki server by requesting and parsing its status response.
  • Don't update the Lucene index on Fuseki, because Fuseki does this automatically.
  • Support Fuseki in SPARQL templates:
    • Update SPARQL templates for Fuseki to have the same functionality as the ones for GraphDB.
    • Add missing prefix definitions to SPARQL templates.
    • Fix unbound variable in updateUser.scala.txt.
    • Correct resource sorting in getResourcesInProjectPrequery*.scala.txt so it's deterministic when a resource has more than one value for the sort property, by using only the value with the lowest sort order.
    • Optimise getResourcePropertiesAndValues*.scala.txt by querying all statements whose subject is a nested resource, to simplify the graph pattern.
    • Optimise getIncomingReferencesStandard.scala.txt by using a FILTER to simplify the graph pattern.
    • Optimise searchResourceByLabelStandard.scala.txt to do the Lucene index search first in each inner scope.
  • Support Fuseki in Gravsearch:
    • Implement query expansion in SPARQL generated for Fuseki, using property path syntax, for statements that need inference.
    • Don't expand generated statement patterns that already use property path syntax.
    • Support Lucene full-text index searches:
      • Add Gravsearch functions knora-api:matchText and knora-api:matchTextInStandoff, to replace knora-api:match and knora-api:matchInStandoff, with different parameters so both GraphDB and Fuseki can be supported. The existing functions will be deprecated.
    • Support the virtual property knora-api:standoffTagHasStartAncestor using property path syntax.
    • Optimise the SPARQL generated for Fuseki by moving patterns that check knora-base:isDeleted to the end of each block.
  • Support Fuseki in the repository update framework.
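
The Gravsearch query expansion described above can be sketched roughly as follows. This is a minimal Python illustration, not Knora's actual Scala implementation: the `Triple` type, the function names, and the generated variable names (`?subClass0`, `?subProp1`) are hypothetical, and predicates are assumed to be prefixed names or variables, so that any path operator character marks a property path.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subj: str
    pred: str
    obj: str


# Assumption: predicates are prefixed names (e.g. knora-api:hasValue) or
# variables, so these characters can only appear as property path operators.
PATH_OPERATORS = ("*", "+", "/", "|", "^")


def is_property_path(pred: str) -> bool:
    """True if the predicate already uses property path syntax."""
    return any(op in pred for op in PATH_OPERATORS)


def expand(triple: Triple, counter: int) -> List[Triple]:
    """Expand one statement pattern so it matches without inference."""
    # Leave variables and existing property paths untouched.
    if triple.pred.startswith("?") or is_property_path(triple.pred):
        return [triple]
    # rdf:type: also match instances of any subclass of the given class.
    if triple.pred == "rdf:type":
        cls_var = f"?subClass{counter}"
        return [
            Triple(triple.subj, "rdf:type", cls_var),
            Triple(cls_var, "rdfs:subClassOf*", triple.obj),
        ]
    # Any other property: also match its subproperties.
    prop_var = f"?subProp{counter}"
    return [
        Triple(prop_var, "rdfs:subPropertyOf*", triple.pred),
        Triple(triple.subj, prop_var, triple.obj),
    ]


def expand_all(triples: List[Triple]) -> List[Triple]:
    expanded: List[Triple] = []
    for index, triple in enumerate(triples):
        expanded.extend(expand(triple, index))
    return expanded


def to_sparql(triples: List[Triple]) -> str:
    return "\n".join(f"{t.subj} {t.pred} {t.obj} ." for t in triples)


if __name__ == "__main__":
    patterns = [
        Triple("?book", "rdf:type", "knora-api:Resource"),
        Triple("?book", "knora-api:hasValue", "?value"),
        # Already a property path, so it is passed through unchanged.
        Triple("?tag", "knora-base:standoffTagHasStartParent*", "?ancestor"),
    ]
    print(to_sparql(expand_all(patterns)))
```

The third pattern illustrates how a virtual property such as knora-api:standoffTagHasStartAncestor could plausibly map to a property path; the exact path used in the generated SPARQL is an assumption here.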

Postponed:

  • Support running Fuseki with Tomcat in Docker (moved to DSP-30)
  • Configure GitHub CI to run tests with Fuseki as well as with GraphDB (moved to DSP-31)
  • Update documentation (moved to DSP-106)

Needs #1379.
Resolves #1374.
Resolves DSP-27
Resolves DSP-29

@benjamingeer benjamingeer self-assigned this Jul 15, 2019
@benjamingeer benjamingeer mentioned this pull request Jul 15, 2019
@benjamingeer benjamingeer changed the title Support Apache Jena Fuseki feat(triplestores): Support Apache Jena Fuseki Jul 15, 2019
@tobiasschweizer
Contributor

Do you want me to help implement SPARQL templates for v2 and also for Gravsearch?

@benjamingeer
Author

benjamingeer commented Jul 18, 2019

Do you want me to help implement SPARQL templates for v2 and also for Gravsearch?

I've done most of it already, but yes, thanks, I could use some help. Lucene searches aren't working yet with Fuseki. Could you run SearchRouteV2R2RSpec with Fuseki and try to figure it out? Maybe that could be part of #1377.

Here's how to test it:

  1. Download Apache Jena Fuseki 3.14.0 from here.
  2. Download Apache Tomcat 9 from here.
  3. Unzip them both.
  4. Copy fuseki.war into apache-tomcat-9.0.31/webapps/.
  5. Add this to apache-tomcat-9.0.31/conf/tomcat-users.xml to create a user with permission to use Tomcat's webapp manager GUI:
<role rolename="manager-gui"/>
<user username="admin" password="root" roles="manager-gui"/>
  6. On this branch:
$ sudo cp -R knora-api/triplestores/fuseki-tomcat /etc/fuseki
$ sudo chown -R $USER /etc/fuseki
  7. Start Tomcat:
$ cd apache-tomcat-9.0.31/bin
$ ./catalina.sh start
  8. Check that it's running by visiting http://localhost:8080/. You should see:

[Screenshot, 2019-07-18 13:43:59]

  9. Click on Manager App, and enter the username and password you created above ("admin", "root"). Check that the Fuseki webapp is running:

[Screenshot, 2019-07-18 13:45:12]

  10. Visit http://localhost:8080/fuseki/index.html and check that the configured datasets are there:

[Screenshot, 2019-07-18 13:46:33]

  11. Load the test data into knora-test:
$ cd knora-api/webapi/scripts
$ ./fuseki-load-test-data.sh
  12. Change Knora's application.conf to use Fuseki in Tomcat:
        // dbtype = "graphdb-se"
        dbtype = "fuseki"

        fuseki {
            port = 8080
            port = ${?KNORA_WEBAPI_TRIPLESTORE_FUSEKI_PORT}
            repository-name = "knora-test"
            repository-name = ${?KNORA_WEBAPI_TRIPLESTORE_FUSEKI_REPOSITORY_NAME}
            tomcat = true
            tomcat-context = "fuseki"
        }
  13. Start Knora in SBT:
knora-api(wip/1374-fuseki)> webapi / reStart
  14. Try loading a resource through the API: http://0.0.0.0:3333/v2/resources/http%3A%2F%2Frdfh.ch%2F0001%2Fa-thing

  15. Stop Knora: webapi / reStop

Now you should be able to run tests with either GraphDB or Fuseki in SBT like this:

  • To use Fuseki: webapi / Fuseki / testOnly *SearchRouteV2R2RSpec
  • To use GraphDB: webapi / GDBSE / testOnly *SearchRouteV2R2RSpec
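
The manual check that the configured datasets are there can also be scripted. The following is a minimal Python sketch, not Knora's own startup check: it assumes that Fuseki's admin endpoint `/$/server` returns JSON with a `datasets` array whose entries carry `ds.name` and `ds.state`, and the function names are hypothetical. The host, port, and Tomcat context mirror the application.conf settings above.

```python
import json


def fuseki_base_url(host: str, port: int, tomcat: bool, tomcat_context: str) -> str:
    """Build the Fuseki base URL; under Tomcat the webapp context is part of the path."""
    base = f"http://{host}:{port}"
    return f"{base}/{tomcat_context}" if tomcat else base


def dataset_is_active(server_status_json: str, repository_name: str) -> bool:
    """Return True if the status response lists the repository as an active dataset.

    Assumption: the response looks like
    {"datasets": [{"ds.name": "/knora-test", "ds.state": true, ...}], ...}
    """
    status = json.loads(server_status_json)
    for dataset in status.get("datasets", []):
        if dataset.get("ds.name") == f"/{repository_name}":
            return bool(dataset.get("ds.state", False))
    return False


if __name__ == "__main__":
    from urllib.request import urlopen

    base_url = fuseki_base_url("localhost", 8080, tomcat=True, tomcat_context="fuseki")
    # Assumed admin endpoint; adjust if your Fuseki version differs.
    with urlopen(f"{base_url}/$/server") as response:
        body = response.read().decode("utf-8")
    print("knora-test active:", dataset_is_active(body, "knora-test"))
```

Keeping the parsing separate from the HTTP request makes it possible to test the check against a saved response without a running server.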

Benjamin Geer added 7 commits July 18, 2019 14:20
# Conflicts:
#	webapi/src/main/scala/org/knora/webapi/store/triplestore/http/HttpTriplestoreConnector.scala
# Conflicts:
#	webapi/src/main/scala/org/knora/webapi/responders/admin/ProjectsResponderADM.scala
#	webapi/src/main/scala/org/knora/webapi/store/triplestore/http/HttpTriplestoreConnector.scala
@daschbot
Collaborator

This pull request has been mentioned on Discuss DaSCH. There might be relevant details there:

https://discuss.dasch.swiss/t/support-for-apache-jena-fuseki/76/1

@benjamingeer
Author

@tobiasschweizer Are you still planning to work on this?

@tobiasschweizer
Contributor

I think #1379 could be related to this PR

@benjamingeer
Author

@tobiasschweizer OK, do you think it would make sense to merge #1379 first, and then see what else needs to be done for Fuseki Lucene support in this PR?

@tobiasschweizer
Contributor

OK, do you think it would make sense to merge #1379 first, and then see what else needs to be done for Fuseki Lucene support in this PR?

Yes, that would be ok. But first we need to decide what the API gets as described in https://discuss.dasch.swiss/t/use-lucene-syntax-in-fulltext-search/81. If this changes, then the results of the fulltext search will change.

@tobiasschweizer
Contributor

@tobiasschweizer @loicjaouen Ugh, I would also have to change knora-api:match, which is surely already being used.

you mean you would have to change the function params in Gravsearch?

@benjamingeer
Author

you mean you would have to change the function params in Gravsearch?

Yes. The first parameter should be the text value, not the literal valueHasString. This is a design mistake. If we had thought about supporting Fuseki, we would have used the text value as the first parameter.

I can do a workaround, but it will be a bit of a hack, because Fuseki needs the text value, not the literal, to do the Lucene search. The better solution would be to change the parameter in Gravsearch. Do you think it's too late to do that?
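
To illustrate why the two triplestores need different arguments, here is a rough Python sketch of the SPARQL patterns involved. It is an assumption-laden illustration, not the code in this PR: the helper names are hypothetical, the GraphDB predicate is assumed to be the Lucene full-text predicate in the `http://www.ontotext.com/owlim/lucene#` namespace, and the Fuseki pattern follows the jena-text `text:query` form, in which the subject is the indexed entity (the text value) and the indexed property is given as an argument.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and single quotes for a single-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def graphdb_fulltext_pattern(literal_var: str, search_terms: str) -> str:
    """GraphDB (assumed form): the search is attached to the string literal."""
    predicate = "<http://www.ontotext.com/owlim/lucene#fullTextSearchIndex>"
    return f"{literal_var} {predicate} '{escape_literal(search_terms)}' ."


def fuseki_fulltext_pattern(text_value_var: str, search_terms: str) -> str:
    """Fuseki (jena-text): the search is attached to the text value itself."""
    predicate = "<http://jena.apache.org/text#query>"
    terms = escape_literal(search_terms)
    return f"{text_value_var} {predicate} (knora-base:valueHasString '{terms}') ."


if __name__ == "__main__":
    # The variable names are illustrative only.
    print(graphdb_fulltext_pattern("?valueHasString", "Zeitglöcklein"))
    print(fuseki_fulltext_pattern("?textValue", "Zeitglöcklein"))
```

A function that receives the text value as its first parameter can produce either pattern, since the literal can always be reached from the value, whereas a function that receives only the literal cannot easily get back to the value that Fuseki needs.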

@benjamingeer
Author

@tobiasschweizer Another option: I could deprecate the existing functions, have them throw an exception if you try to use them with Fuseki, and add new functions with different arguments.

@tobiasschweizer
Contributor

Probably this is something we should discuss regarding release management.

@benjamingeer
Author

I talked on the phone with @tobiasschweizer; I'm going to take the deprecation approach.

@benjamingeer
Author

@subotic All the tests in webapi / Fuseki / test now pass. How do I run the integration tests with Fuseki? SBT doesn't like webapi / Fuseki / it:test.

@benjamingeer
Author

Note for future development: Jena's reasoner can do forward chaining as well as backward chaining, so we might be able to use it to optimise queries as we do in GraphDB:

https://jena.apache.org/documentation/inference/#RULEforward
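
As a rough illustration of the difference: forward chaining materialises inferred statements ahead of time, so queries need no rdfs:subPropertyOf* expansion, whereas backward chaining derives them at query time. The Python sketch below shows the forward-chaining idea for subproperties only. It does not use Jena's rule engine or its API, and the function names and sample hierarchy are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def super_properties(prop: str, sub_property_of: Dict[str, Set[str]]) -> Set[str]:
    """Collect all direct and indirect super-properties of prop."""
    seen: Set[str] = set()
    stack = [prop]
    while stack:
        for parent in sub_property_of.get(stack.pop(), set()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def forward_chain(data: Set[Triple], sub_property_of: Dict[str, Set[str]]) -> Set[Triple]:
    """Materialise (s, superProp, o) for every (s, p, o) in the data."""
    inferred = set(data)
    for subj, pred, obj in data:
        for parent in super_properties(pred, sub_property_of):
            inferred.add((subj, parent, obj))
    return inferred


if __name__ == "__main__":
    # Illustrative hierarchy; the ex: names are made up.
    hierarchy = {
        "ex:hasTitle": {"knora-base:hasValue"},
        "knora-base:hasValue": {"knora-base:resourceProperty"},
    }
    data = {("ex:book", "ex:hasTitle", "ex:title1")}
    for triple in sorted(forward_chain(data, hierarchy)):
        print(triple)
```

With the inferred statements stored, a query for knora-base:hasValue matches directly, at the cost of more stored triples and extra work on every update.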

@subotic
Collaborator

subotic commented Mar 28, 2020

DSP-31

@subotic
Collaborator

subotic commented Apr 2, 2020

Looks good. I managed to get Fuseki set up as per your documentation. Started the tests a few minutes ago.

@benjamingeer
Author

@subotic thanks, and don't forget this: #1375 (comment)

@subotic
Collaborator

subotic left a comment

cool, great work! LGTM.

@benjamingeer
Author

@subotic Thanks, but did you figure out how to run the Knora-Sipi integration tests with Fuseki?

@subotic
Collaborator

subotic commented Apr 2, 2020

@subotic thanks, and don't forget this: #1375 (comment)

The command to start integration tests with the Fuseki config is: sbt webapi / FusekiIt / test

I had to make a small change. All integration tests also pass!

@benjamingeer
Author

@subotic OK, great, thank you for the review!

@benjamingeer benjamingeer merged commit 82f8a55 into develop Apr 2, 2020
@benjamingeer benjamingeer deleted the wip/1374-fuseki branch April 2, 2020 15:54
Successfully merging this pull request may close these issues.

Try Apache Jena Fuseki again
4 participants