Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sqitch revert takes very long (minutes) on individual changes with a high number of dependencies on Snowflake #571

Open
dlawrences opened this issue Apr 4, 2021 · 3 comments

Comments

@dlawrences
Copy link

Hi team

I am in the process of finalising a CI/CD implementation of Sqitch over Snowflake and I have observed that the process of reverting changes with a high number of dependencies (direct and indirect) takes very long (2-3 minutes per change if not more).

I've experienced this with, for example, the following dependency chain:

  • stored procedure being reverted depends on 4 objects, out of which another stored procedure, three dimensions and one fact table
  • the fact table depends on three dimensions as well
  • the other stored procedure depends on 4 other objects as well

It seems Sqitch will issue the following individual SQL statement for any specific dependency in the dependency chain:

WITH tag AS (
            SELECT tag, committed_at, project,
                   ROW_NUMBER() OVER (partition by project ORDER BY committed_at) AS rnk
              FROM tags
        )
        SELECT c.change_id, c.project, c.change, t.tag AS asof_tag
          FROM dependencies d
          JOIN changes  c ON c.change_id = d.change_id
          LEFT JOIN tag t ON t.project   = c.project AND t.committed_at >= c.committed_at
         WHERE d.dependency_id = ?
           AND (t.rnk IS NULL OR t.rnk = 1)

Now, given SnowSQL requests actually take that long, this is turning quickly into a "snowball" effect. I've seen the statement above being executed 50-100 times per one change being reverted. Is there anything that can be done to batch up that query for multiple dependencies at once?

Thanks

@theory
Copy link
Collaborator

theory commented Apr 6, 2021

Yikes! That's no fun. Looks like that query gets called for each change to be reverted, to make sure no other changes depend on it:

for my $change (@_) {
$seen{ $change->id } = 1;
my @requiring = grep {
!$seen{ $_->{change_id} }
} $self->changes_requiring_change($change) or next;
# XXX Include change_id in the output?
push @msg => __nx(
'Change "{change}" required by currently deployed change: {changes}',
'Change "{change}" required by currently deployed changes: {changes}',
scalar @requiring,
change => $change->format_name_with_tags,
changes => join ' ', map {
($_->{project} eq $proj ? '' : "$_->{project}:" )
. $_->{change}
. ($_->{asof_tag} // '')
} @requiring
);
}

The query is defined by the changes_requiring_change() method. I suspect I wanted to write it in such a way to run as a single query, but it would have been difficult to do in such a way as to exclude later changes from the list of changes an earlier change deploys on. Since Sqitch was originally written on Postgres and SQLite, this repetition was efficient enough that I didn't bother to figure out how to do it.

If someone could rewrite the query to return the proper results for a list of changes, so it could be called just once per revert, I think that'd make Snowflake reverts a lot less time-consuming. Better yet rewrite the default query, which uses a correlated subquery, which wasn't supported by Snowflake at the time I added Snowflake support to Sqitch, but maybe does now? That query is:

sub changes_requiring_change {
my ( $self, $change ) = @_;
return @{ $self->dbh->selectall_arrayref(q{
SELECT c.change_id, c.project, c.change, (
SELECT tag
FROM changes c2
JOIN tags ON c2.change_id = tags.change_id
WHERE c2.project = c.project
AND c2.committed_at >= c.committed_at
ORDER BY c2.committed_at
LIMIT 1
) AS asof_tag
FROM dependencies d
JOIN changes c ON c.change_id = d.change_id
WHERE d.dependency_id = ?
}, { Slice => {} }, $change->id) };
}

@dlawrences
Copy link
Author

dlawrences commented Apr 7, 2021

I've just executed the correlated subquery on a Snowflake account and it still does not work, but how about the following?

SELECT C.CHANGE_ID
     , C.PROJECT
     , C.CHANGE
     , T.TAG AS ASOF_TAG
FROM SQITCH.DEPENDENCIES D
INNER JOIN SQITCH.CHANGES C
           ON C.CHANGE_ID = D.CHANGE_ID
LEFT JOIN (
          SELECT CHANGES.PROJECT
               , CHANGES.COMMITTED_AT
               , TAG
               , ROW_NUMBER() OVER (PARTITION BY CHANGES.PROJECT ORDER BY CHANGES.COMMITTED_AT) AS RNK
          FROM SQITCH.CHANGES
          JOIN SQITCH.TAGS
               ON CHANGES.CHANGE_ID = TAGS.CHANGE_ID
          ) T
          ON C.PROJECT = T.PROJECT
              AND T.COMMITTED_AT >= C.COMMITTED_AT
WHERE (T.RNK IS NULL OR T.RNK = 1)

@theory
Copy link
Collaborator

theory commented Apr 24, 2021

That still would be executed multiple times, though, right? Ideally, we could design a single query to take an array of change IDs, and return their dependencies — but only if the dependent IDs would not be reverted first. For example, say we have this plan:

widgets
magnets [widgets]
maglevs [magnets, widgets]
engineers

Say they are all deployed, and we want to revert to widgets. That means we would revert changes in this order:

  1. engineers
  2. maglevs
  3. magnets

Before we do, we need to make sure that none have dependencies that would be broken by reverting. If we just query naively, we would see that maglevs depends on magnets, so we can't revert magnets. Except of course we can, because we know that we will be reverting maglevs before reverting magnets, so there would be no conflict.

So the trick is to have a way to get back the list of dependencies only if they would not be removed by an earlier reversion, all before we do any reverting. One query to do that is surely possible, but I punted on figuring it out when I wrote Sqitch, as it wasn't really an issue until Snowflake, where queries are so expensive. Perhaps now is the time to figure it out. Do you have suggestions, @dlawrences?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants