Cassalog can get into inconsistent state and prevent upgrades from completing #3

Open
jsanda opened this issue Sep 28, 2016 · 4 comments

Comments

@jsanda
Contributor

jsanda commented Sep 28, 2016

When Cassalog applies a schema change, it executes whatever CQL statements correspond to the change, and then it updates the change log table, cassalog. If Cassalog fails to update the change log table, maybe due to a request timeout, Cassalog will abort. Depending on the schema change involved, Cassalog can wind up in an inconsistent state. Let's say we have:

schemaChange {
  version '1.0'
  author 'jsanda'
  cql """
CREATE TABLE my_table (
    id text PRIMARY KEY,
    value text
) WITH compaction = { 'class': 'LeveledCompactionStrategy' }
"""
}

Cassalog fails with a timeout exception while trying to update the change log table, or better yet, the JVM in which Cassalog is running gets shut down abruptly. The next time Cassalog runs, it will attempt to apply the above schema change again since it was not recorded in the change log. This in turn results in an error from Cassandra since the table already exists.

In this particular example we can sort of work around the problem by using CREATE TABLE IF NOT EXISTS my_table ..., but that won't work in general. Initially I thought maybe the solution would be to use atomic batches, but they cannot be used with DDL statements. We need to figure something else out to prevent Cassalog from getting into an inconsistent state.

@lucasponce

Perhaps writing to the cassalog table before/after the updateCQL operation might help to detect when an update was not fully performed.
Also, adding some startCassalog/endCassalog operations to the Groovy scripts might help to detect that (I think).

@jsanda
Contributor Author

jsanda commented Oct 5, 2016

Perhaps writing to the cassalog table before/after the updateCQL operation might help

I was thinking along the same lines, but there still needs to be a way to verify whether or not DDL statements (e.g., CREATE TABLE) were successfully executed. I spent some time thinking about this today, and verification can be done by querying the system tables. In Cassandra 3, we would query the system_schema.tables table for the existence of a newly created table. For adding/removing a column we would check the system_schema.columns table.
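A minimal sketch of those two checks, in Python rather than Cassalog's Groovy. The helper names are hypothetical, and `session` is assumed to expose `execute(query, params)` returning a result with `.one()`, in the style of the DataStax Python driver's `Session`.

```python
# Hypothetical helpers, not part of Cassalog.
# `session` is assumed to offer execute(query, params) -> result with .one().

TABLE_QUERY = (
    "SELECT table_name FROM system_schema.tables "
    "WHERE keyspace_name = %s AND table_name = %s"
)
COLUMN_QUERY = (
    "SELECT column_name FROM system_schema.columns "
    "WHERE keyspace_name = %s AND table_name = %s AND column_name = %s"
)


def table_exists(session, keyspace, table):
    """Return True if system_schema.tables lists the table (Cassandra 3)."""
    return session.execute(TABLE_QUERY, (keyspace, table)).one() is not None


def column_exists(session, keyspace, table, column):
    """Return True if system_schema.columns lists the column (Cassandra 3)."""
    row = session.execute(COLUMN_QUERY, (keyspace, table, column)).one()
    return row is not None
```

For a dropped column the verification would be `not column_exists(...)`.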

Let's consider the following. A CREATE TABLE statement succeeds, but recording the change in the cassalog table fails. Cassalog will abort execution. When it runs again, there is no way of knowing whether or not the change was attempted.

To fix this, we first record the change in the cassalog table before executing it. We also need a flag in the cassalog table that gets set only after the change is successfully applied. Let's say that the CREATE TABLE query fails. When Cassalog runs again, we find that the flag has not been set, which tells us the change was probably attempted, but we do not have enough info to determine whether or not it succeeded. We can query system_schema.tables for the new table. If we find it, then we set the flag in the cassalog table. If we do not find the table, then we apply the change again and repeat the steps.
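Roughly, the flow could look like the following Python simulation. Everything here is a stand-in I made up for illustration: `InMemoryCluster` plays the role of Cassandra plus the cassalog table, and the `applied` field is the flag described above.

```python
class InMemoryCluster:
    """Stand-in for Cassandra: known tables plus a change log."""

    def __init__(self):
        self.tables = set()
        self.changelog = {}  # version -> {"applied": bool}

    def create_table(self, table):
        # Mimics Cassandra rejecting CREATE TABLE for an existing table.
        if table in self.tables:
            raise RuntimeError("table already exists: " + table)
        self.tables.add(table)


def apply_create_table(cluster, version, table, crash_after_ddl=False):
    entry = cluster.changelog.get(version)
    if entry is not None and entry["applied"]:
        return "already applied"
    if entry is not None and table in cluster.tables:
        # Flag unset but the table is there: the earlier run got as far as
        # the DDL, so only the flag needs to be set.
        entry["applied"] = True
        return "recovered"
    # Record the change first, with the flag unset, then execute it.
    cluster.changelog[version] = {"applied": False}
    cluster.create_table(table)
    if crash_after_ddl:
        raise RuntimeError("simulated crash before the flag was set")
    cluster.changelog[version]["applied"] = True
    return "applied"
```

A run that crashes after the DDL leaves the flag unset; the next run finds the table, sets the flag, and moves on instead of failing on the duplicate CREATE TABLE.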

I don't know if this will handle all DDL scenarios, but I think it will cover the basic ones. For dropping a column, we would query system_schema.columns to verify that the column is not there.

DML statements, e.g., inserting data, are a bit different. If the query is idempotent, then we can simply execute it again. If the query is not idempotent, then maybe the schema change needs to include a user-supplied check to verify it was applied. That check would be performed in the event of an error where the update flag is not set.
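As a sketch of that decision in Python: the `idempotent` and `verify` fields are hypothetical names for what a schema change would have to declare, and `execute` is whatever runs the CQL.

```python
def resume_dml_change(change, execute):
    """Handle a DML change whose update flag was never set.

    `change` is a dict with 'cql', an optional 'idempotent' bool, and an
    optional user-supplied 'verify' callable returning True if the change
    is already in place. All three keys are illustrative assumptions.
    """
    if change.get("idempotent"):
        # Safe to run again regardless of what happened the first time.
        execute(change["cql"])
        return "re-executed"
    verify = change.get("verify")
    if verify is None:
        raise ValueError("non-idempotent change needs a user-supplied check")
    if verify():
        return "verified"
    execute(change["cql"])
    return "re-executed"
```

A counter update is the kind of statement that would need the check, since running it twice changes the result.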

@lucasponce

In the same scenario, in alerting we have introduced a checker that flags whether the schema is OK, also by querying the system tables.
Trying to do this under the hood, hidden from the user, could bring a lot of complexity.
But a checker() function in Cassalog might help to identify whether a query was applied correctly.
My previous comment was simpler.
Taking as an assumption that we know when an update() call has finished correctly, I was thinking of setting STARTED/FINISHED flags in the cassalog table around it.
In that case, a preliminary check that finds a STARTED flag should stop the process and warn the user that Cassandra is in an unstable state.
Trying to automate this from the update() call means that we need to detect the type of operation and create the specific checker function. That's interesting, but perhaps it is better to delegate that to the user.
WDYT?
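A small Python sketch of the STARTED/FINISHED idea, with a plain dict standing in for the cassalog table. The function and exception names are hypothetical.

```python
class UnfinishedChangeError(Exception):
    """Raised when a previous run left a change in the STARTED state."""


def preflight_check(changelog):
    # changelog maps version -> "STARTED" or "FINISHED".
    unfinished = sorted(v for v, status in changelog.items() if status == "STARTED")
    if unfinished:
        raise UnfinishedChangeError(
            "changes started but not finished: " + ", ".join(unfinished)
        )


def run_change(changelog, version, update):
    preflight_check(changelog)       # stop early and warn the user
    changelog[version] = "STARTED"   # written before the update runs
    update()
    changelog[version] = "FINISHED"  # written only after it returns
```

This only detects the problem and leaves the repair to the user; it does not try to work out whether the interrupted change actually took effect.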

@jsanda
Contributor Author

jsanda commented Oct 5, 2016

Trying to automate this from the update() call means that we need to detect the type of operation and create the specific checker function.

Right now Cassalog only supports raw CQL changes like:

schemaChange {
    version '1.0'
    author 'jsanda'
    cql """
CREATE TABLE foo (
    id uuid PRIMARY KEY,
    value text
)
"""
}

You are right. For raw CQL changes it would be difficult to determine the type of schema change, and I would not want to do that. For raw schema changes, I would prefer to allow the user to specify the verification/checker function.

We can also introduce some typed schema change functions, something like:

createTable {
    version '1.0'
    author 'jsanda'
    tableName 'foo'
    columns([
        [name: 'id', type: 'uuid', primaryKey: true],
        [name: 'value', type: 'text']
    ])
}

For something like this, Cassalog knows that it is a CREATE TABLE and knows the name of the table, so it could do the verification check on its own.
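To illustrate, here is a Python sketch of a typed change that produces both its CQL and its own verification query. The `CreateTable` class and its `keyspace` field are assumptions for the example; the proposed DSL above does not specify a keyspace.

```python
from dataclasses import dataclass


@dataclass
class CreateTable:
    """Hypothetical typed schema change; not an existing Cassalog type."""

    keyspace: str
    table_name: str
    columns: list  # dicts with 'name', 'type', optional 'primaryKey'

    def to_cql(self):
        cols = []
        for col in self.columns:
            suffix = " PRIMARY KEY" if col.get("primaryKey") else ""
            cols.append(f"{col['name']} {col['type']}{suffix}")
        return f"CREATE TABLE {self.keyspace}.{self.table_name} ({', '.join(cols)})"

    def verification_query(self):
        # Query and parameters for checking system_schema.tables (Cassandra 3).
        query = (
            "SELECT table_name FROM system_schema.tables "
            "WHERE keyspace_name = %s AND table_name = %s"
        )
        return query, (self.keyspace, self.table_name)
```

Because the change type and table name are known up front, no CQL parsing is needed to decide what to verify.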
