How to handle 'KnoraRules.pie' changes #1506

subotic · 2019-11-07T14:12:53Z

Currently, when using the published Docker images, KnoraRules.pie is baked into the dhlabbasel/knora-graphdb-se Docker image.

If an existing deployment is upgraded to newer Docker images and the new version of dhlabbasel/knora-graphdb-se contains a changed KnoraRules.pie which requires changes to the data, will GraphDB start up? I seem to remember that it would complain and that we had to re-run the data loading script.

So, what can happen is that we are left with a deployment which will require a very manual process to get working again. Basically:

export data before upgrade
run upgrade script on data
upgrade graphdb image and start a new container
import data

This highly manual and error-prone workflow is not something I'm willing to do.

My suggestions:

freeze KnoraRules.pie forever
get rid of KnoraRules.pie

Or are there any other possibilities that I'm missing?


benjamingeer · 2019-11-07T16:49:05Z

KnoraRules.pie contains our consistency-checking rules and a few custom inference rules. From time to time we add things to it to support new features in Knora, so I don't think it's possible to freeze it forever. Getting rid of it would mean getting rid of consistency checking in GraphDB, which is a useful feature that users have been very glad to have.

It's unlikely that changes to KnoraRules.pie would require changes to data. But if it happened, wouldn't the situation be basically the same as any change in Knora requiring data to be updated? I don't understand what you mean by "upgrade graphdb image and start a new container".

I haven't changed KnoraRules.pie in a long time, but I think you have to restart GraphDB, and perhaps also delete the repository and create it again, to get GraphDB to reload KnoraRules.pie. But we could do this automatically in the upgrade script, so it would be done with every upgrade.

subotic · 2019-11-07T18:12:28Z

I don't understand what you mean by "upgrade graphdb image and start a new container".

To upgrade the GraphDB container, one would shut it down and delete it, download the new image and spin up a new container using the data from the previous version. When the new container starts it also starts GraphDB, which then has the new version of KnoraRules.pie. So basically, the old data is started with GraphDB using a new version of KnoraRules.pie. I don't think that GraphDB will start. Thus the upgrade program will not work, because there is no running GraphDB. Maybe I'm wrong. That was my question.

What could work though, if we must have KnoraRules.pie, is that the rules are loaded dynamically by knora-api. Also, at some point in time (sooner rather than later) there should not be any direct access to GraphDB for data loading operations, but only through knora-api. See #1485.

Sooner or later we will need to part ways with KnoraRules.pie because we are going to move to a different triplestore (https://rya.apache.org was mentioned). Also, if we provide support for Fuseki again, then we will also need a solution for consistency checking there. It would probably be better to start looking at SHACL.

benjamingeer · 2019-11-07T18:50:02Z

To upgrade the GraphDB container, one would shut it down and delete it, download the new image and spin up a new container using the data from the previous version. When the new container starts it also starts GraphDB, which then has the new version of KnoraRules.pie.

Why not automatically update KnoraRules.pie as part of the data update? In other words:

Export the data into a TriG file for the update.
Shut down the container and delete it.
Run the update script, generating a new TriG file.
Start a new container with the new KnoraRules.pie.
Import the new TriG file into GraphDB.
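The five steps above could be chained in one script so that nobody has to run them by hand. A minimal sketch in Python, assuming each step can be wrapped in a function: the step functions here are hypothetical placeholders (this thread does not name the actual export, update, or import commands), so the sketch only shows the ordering and the stop-on-first-failure behaviour.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_upgrade(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that raises.

    Returns the names of the steps that completed, so that a failed
    run can be diagnosed before anything is retried.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"step '{name}' failed: {exc}")
            break
        completed.append(name)
    return completed


def placeholder(message: str) -> Callable[[], None]:
    """Hypothetical stand-in for a real command (docker, the update script, ...)."""
    def action() -> None:
        print(message)
    return action


# Same order as the list above; every action is a placeholder.
UPGRADE_STEPS: List[Step] = [
    ("export", placeholder("export the data into a TriG file")),
    ("stop", placeholder("shut down the container and delete it")),
    ("update", placeholder("run the update script, generating a new TriG file")),
    ("start", placeholder("start a new container with the new KnoraRules.pie")),
    ("import", placeholder("import the new TriG file into GraphDB")),
]
```

Calling run_upgrade(UPGRADE_STEPS) runs all five placeholders; if any real step raised, the later ones (for example the import) would not run.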

What could work though, if we must have KnoraRules.pie, is that the rules are loaded dynamically by knora-api.

That wouldn't work, because consistency rules have to be defined before any data is loaded into the repository. Otherwise, there could be data that is not checked.

Also, if we provide support for Fuseki again, then we will also need a solution for consistency checking there. It would probably be better to start looking at SHACL.

But I suppose the situation will be the same: the SHACL rules will have to be defined when the repository is created.

subotic · 2019-11-07T21:10:08Z

Export the data into a TriG file for the update.

Shut down the container and delete it.

Run the update script, generating a new TriG file.

Start a new container with the new KnoraRules.pie.

Import the new TriG file into GraphDB.

Yes, this is what I was describing in my first post and what I would like to avoid :-) There are too many manual steps involved. Ideally, I would like the upgrade process to be completely automatic. As the size of the data grows, this will become more and more painful. If the upgrade is painful and time-consuming, then nobody will want to do it, including me. My goal is actually to go towards continuous deployment, i.e., each commit and not yearly releases ;-)

I will have to think of something to be able to automate it.

Just a thought: as our user interface gets developed, I hope that nobody will have the need to manually edit their data. If they still want to do it for some reason, then maybe an external tool that they could run over their data would be more helpful than loading it into GraphDB to see if it is formally correct.

Besides the usefulness for the users who edit the data outside of Knora, are there any features of knora-api that depend on KnoraRules.pie and wouldn't work without it?

benjamingeer · 2019-11-08T07:54:47Z

as our user interface gets developed, I hope that nobody will have the need to manually edit their data

Consistency checking in the triplestore isn't only useful for people who edit data outside of Knora. It also protects us from database corruption caused by bugs in Knora. It would be a nightmare if a bug in Knora caused data corruption that was not noticed until after a lot more data was added, making it impossible to fix the problem by reverting to an earlier backup.

If we were using a relational database, I would implement consistency checks in the database, using mechanisms like these:

Oracle Integrity Constraints: https://docs.oracle.com/cd/B12037_01/appdev.101/b10795/adfns_co.htm
PostgreSQL Constraints: https://www.postgresql.org/docs/9.4/ddl-constraints.html
MySQL Check Constraints: https://dev.mysql.com/doc/refman/8.0/en/create-table-check-constraints.html

are there any features of knora-api that depend on KnoraRules.pie and wouldn't work without it?

Ensuring data integrity is a feature. :)

Most of our SPARQL relies heavily on GraphDB inference to optimise queries. KnoraRules.pie provides a custom combination of RDFS inference and the inference rule for owl:TransitiveProperty.
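To illustrate what the owl:TransitiveProperty rule contributes: from (a, b) and (b, c) it derives (a, c), and repeats until nothing new appears. A minimal Python sketch of that idea, not of how GraphDB implements it; the resource names in the usage note are made up.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Return `pairs` plus every (a, c) implied by (a, b) and (b, c)."""
    closure = set(pairs)
    while True:
        # One inference round: join every pair with every pair that continues it.
        inferred = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if inferred <= closure:
            # Nothing new was derived, so the closure is complete.
            return closure
        closure |= inferred
```

For example, transitive_closure({("page", "chapter"), ("chapter", "book")}) also contains ("page", "book"), which a query can then match directly instead of walking the chain itself.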

If we didn't use KnoraRules.pie, we wouldn't be able to have this custom mixture of RDFS and OWL inference rules. We would have to use one of GraphDB's standard .pie files for RDFS or OWL inference. These standard rules files could also need to be updated with new versions of GraphDB. So we would still have the same problem.

loicjaouen · 2019-11-08T09:33:20Z

User comment: as previously stated elsewhere, here at Lausanne we did a couple of knora-api version jumps and saw existing data fail the consistency checks of a newer pie file.

Our upgrade process is:

run Ben's script, get the resulting TriG file
re-init the triplestore with the script (with the line loading the data commented out) that reloads the current pie file
upload the TriG file

For now, when data editing is required by a knora-api update, it needs to be done offline, through TriG files and re-import, so updating the pie file is not a problem.

subotic · 2019-11-10T11:03:14Z

Ok, so we need KnoraRules.pie. Good, that’s settled.

It maybe doesn’t need to be 'hardcoded' in the repository? knora-api could load and unload it (the therein specified rules) as necessary, to allow a more automated upgrade procedure.

benjamingeer · 2019-11-10T11:21:59Z

It maybe doesn’t need to be 'hardcoded' in the repository? knora-api could load and unload it (the therein specified rules) as necessary, to allow a more automated upgrade procedure.

I don’t understand what you mean by “as necessary”. The rules are always necessary. They are necessary when the repository is created, before any data is loaded, to ensure that all data is checked.

Similarly, in relational databases, integrity constraints are specified as part of the CREATE TABLE command.
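As a small illustration of that analogy, here is a sketch using Python's built-in sqlite3 module. SQLite stands in for Oracle, PostgreSQL, or MySQL only because it needs no server; the table and column names are invented for the example. The constraints are declared in CREATE TABLE, before any row exists, so every later insert is checked.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database whose constraints exist before any data."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """
        CREATE TABLE resource (
            id INTEGER PRIMARY KEY,
            label TEXT NOT NULL CHECK (length(label) > 0),
            project_id INTEGER NOT NULL REFERENCES project(id)
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, label: str, project_id: int) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO resource (label, project_id) VALUES (?, ?)",
            (label, project_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With one project row (id 1) in place, try_insert(conn, "page", 1) is accepted, while an empty label or a project_id that does not exist is rejected by the database itself, whatever the calling code intended.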

subotic · 2019-11-10T15:54:32Z

I don’t understand what you mean by “as necessary”.

I'm only talking about administration tasks, for example dropping a graph. This, of course, requires correct knowledge of what depends on what. If I want to drop the only data graph or the ontology and data graph of a project, then this should be fairly safe to do with the rules turned off.

As a contrast, currently dropping a graph involves exporting everything but the graphs that we want to delete, emptying the repository, and then reloading the export.

We simply need a way to perform certain administrative tasks in a reasonable timeframe without the need to sit at the computer and perform a number of manual steps where each step can be botched and prolong the whole process. I'm speaking from experience. Everything that could go wrong, I managed to get wrong. It is simply too boring to sit for 20 minutes and wait for GraphDB. I then simply forget what I was doing or skip a step and can then start from the beginning. This is really not fun and we need a much better solution.

If GraphDB does not allow us to do this, then we should think of alternatives. Currently, we have a handful of projects and already somewhat big problems. I'm afraid that when we really begin to pump projects into Knora, the problems are only going to get worse.

Also, just so that we are clear on this: I don't have a problem with consistency checking. My problem is that the implementation in GraphDB is very slow, at least for the cases mentioned.

benjamingeer · 2019-11-10T15:58:28Z

As far as I know, GraphDB is the only triplestore that has a production-ready consistency checking implementation. The current implementations of SHACL were all still experimental the last time I checked. And who knows how fast they are.

Dropping a graph could introduce an inconsistency, because there can be links between resources in different graphs.

It sounds to me like you're doing tasks manually that should instead be automated. If all the steps were automated by a single script, there would be no need for you to sit for 20 minutes waiting. You could start the task at the end of the day, and the next morning it would be done.

benjamingeer · 2019-11-10T16:01:34Z

If I want to drop the only data graph or the ontology and data graph of a project, then this should be fairly safe to do with the rules turned off.

Only if you can guarantee that there are no links between resources in different projects. And if, as you say, anything that can go wrong will go wrong, doesn't it seem safer to let the triplestore check this for you?
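To make that guarantee concrete: a graph is only safe to drop if no statement outside it points at a resource defined inside it. A rough Python sketch of that check over plain (subject, predicate, object, graph) tuples, assuming the data has already been exported in some form; the IRIs in the usage note are invented, and this is an illustration of the condition rather than a replacement for the triplestore's own checks.

```python
from typing import Iterable, List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


def links_into_graph(quads: Iterable[Quad], graph: str) -> List[Quad]:
    """Return statements outside `graph` whose object is a subject inside it.

    A non-empty result means that dropping `graph` would leave dangling links.
    """
    all_quads = list(quads)
    subjects_in_graph = {s for (s, _p, _o, g) in all_quads if g == graph}
    return [q for q in all_quads if q[3] != graph and q[2] in subjects_in_graph]
```

For example, if graph "data/A" contains the resource "res:a1" and graph "data/B" contains the statement ("res:b1", "ex:linksTo", "res:a1"), then links_into_graph(quads, "data/A") returns that statement from "data/B", and dropping "data/A" would break it.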

benjamingeer · 2019-11-10T16:05:39Z

To put it another way, this is the same contradiction I pointed out here:

You don't trust yourself to do a manual process without making mistakes.
You trust yourself to turn off the database's consistency checks, because you know what you're doing.

I think best practice is not to trust any human or computer program not to corrupt the database. The database should always protect itself from inconsistencies. That's why DBMS systems generally have declarative integrity constraints.

loicjaouen · 2019-11-11T09:35:15Z

there is no contradiction but a matter of setting-up procedures.

test it first on a staging server, with consistency on, and it takes ages but you know that it works
then because you know it is safe, you do it again on prod, without consistency checking, so the downtime is acceptable

benjamingeer · 2019-11-11T09:37:52Z

In that case, why would knora-api need to deal with turning consistency checking on and off? It doesn't even know whether it's running on production. Only the sysadmin knows this.

benjamingeer · 2019-11-11T09:39:42Z

The idea that "you know it is safe" assumes that you will never make a mistake, e.g. typing a command in the wrong terminal, using the wrong file, etc.

loicjaouen · 2019-11-11T09:56:34Z

In my humble opinion:

why would knora-api need to deal with turning consistency checking on and off?

that would come in handy for the sysadmin and reduce the error-prone handling of different commands and terminals

The idea that "you know it is safe" assumes that you will never make a mistake

even with the consistency checker enabled on prod, I run the upgrade script on staging first (as long as I can afford the space) and do a backup before running on prod. We can add more safety layers, but I think each one has its own reason for being: KnoraRules checks the live operations, and upgrades so far do not run on the live system, so they can be checked separately.

benjamingeer · 2019-11-11T10:19:54Z

OK, then, let's try adding a Knora route that turns consistency checking on and off. @subotic do you know how to do this? Would you like to implement it?

subotic · 2019-11-12T06:25:10Z

yes, it is easy, though I'm not sure if we need a route for this. Let me think a bit about this. I wanted to implement it in a different way.

benjamingeer · 2019-11-12T07:16:01Z

If you disable KnoraRules.pie completely, what happens to the triples that were inferred previously? Are they all deleted?

subotic · 2019-11-12T21:15:54Z

I don't think so. But reinferring can be started with the following statement:

INSERT DATA { [] <http://www.ontotext.com/owlim/system#reinfer> [] }

subotic · 2019-11-12T21:18:47Z

Turn off:

PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
    _:b sys:defaultRuleset "none"
}

Turn on:

PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
    _:b sys:defaultRuleset "KnoraRules"
}
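For an automated upgrade, these updates could be sent over HTTP rather than typed into the Workbench. A sketch in Python using only the standard library, assuming GraphDB's RDF4J-style update endpoint at /repositories/<id>/statements; the base URL and the repository name in the usage note are assumptions, and the ruleset name is assumed to contain no quote characters. The function only builds the request, so it can be inspected before anything is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SYS_PREFIX = "PREFIX sys: <http://www.ontotext.com/owlim/system#>"


def ruleset_update(ruleset: str) -> str:
    """Build the SPARQL update that switches the default ruleset."""
    return f'{SYS_PREFIX}\nINSERT DATA {{ _:b sys:defaultRuleset "{ruleset}" }}'


def build_request(base_url: str, repository: str, update: str) -> Request:
    """Wrap a SPARQL update in a form-encoded POST to the statements endpoint."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        url=f"{base_url}/repositories/{repository}/statements",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

With an assumed local instance, build_request("http://localhost:7200", "knora-test", ruleset_update("none")) produces the "turn off" request, and passing "KnoraRules" instead produces the "turn on" one. Sending it is then a call to urllib.request.urlopen(req), plus whatever authentication the deployment requires.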

subotic added the enhancement (improve existing code or new feature) label Nov 7, 2019

subotic assigned subotic, benjamingeer and loicjaouen Nov 7, 2019

benjamingeer mentioned this issue Nov 10, 2019

Consistency checking inside 'upgrade' tool #1511

Open

subotic added this to the Backlog milestone Feb 7, 2020

benjamingeer closed this as completed Aug 24, 2020
