
"NO TRANSACTION" migrations should mark database as dirty if not successful #728

Open
tumelowill opened this issue Mar 21, 2024 · 3 comments

Comments

@tumelowill

tumelowill commented Mar 21, 2024

Migrations that do not run inside a transaction cannot be rolled back when they fail partway through. Goose should mark the database as dirty when such a migration fails, and should refuse to apply further migrations to a dirty database.

Example:

-- 1_create_posts_table.sql
-- +goose Up
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255),
  content TEXT,
  author_id INTEGER
);
-- 2_update_posts.sql
-- +goose NO TRANSACTION
-- +goose Up
INSERT INTO posts (title, content, author_id) VALUES ('Hello World', 'This is my first post!', 1);
CREATE INDEX CONCURRENTLY idx_created_at ON posts (created_at); -- Fails: the column created_at does not exist on the posts table

If our deployment policy is to run migrations whenever the database schema is outdated, each attempt now adds another row to the posts table, because we have no way of knowing that the database is dirty.

What we would like to see after attempting to migrate multiple times:

postgres=# select * from public.posts;
 id |    title    |        content         | author_id 
----+-------------+------------------------+-----------
  1 | Hello World | This is my first post! |         1
(1 row)

postgres=# select * from public.goose_db_version;
 id | version_id | is_applied |           tstamp           
----+------------+------------+----------------------------
  1 |          0 | t          | 2024-03-21 14:33:00.782058
  2 |          1 | f          | 2024-03-21 15:08:48.144995
(2 rows)
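With a failure row like the one above, a deploy script could gate migrations on a pre-flight query. This is a sketch of the desired behavior only — goose does not record failed NO TRANSACTION migrations in goose_db_version today:

```sql
-- Hypothetical dirty check, assuming a failed migration were recorded
-- with is_applied = false (goose does not do this currently).
SELECT version_id
FROM public.goose_db_version
WHERE is_applied = false
ORDER BY id DESC
LIMIT 1;
-- Any row returned means the database is dirty: abort the deploy and
-- require manual repair before applying further migrations.
```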

What we actually see after attempting to migrate multiple times:

postgres=# select * from public.posts;
 id |    title    |        content         | author_id 
----+-------------+------------------------+-----------
  1 | Hello World | This is my first post! |         1
  2 | Hello World | This is my first post! |         1
  3 | Hello World | This is my first post! |         1
  4 | Hello World | This is my first post! |         1
  5 | Hello World | This is my first post! |         1
(5 rows)

postgres=# select * from public.goose_db_version;
 id | version_id | is_applied |           tstamp           
----+------------+------------+----------------------------
  1 |          0 | t          | 2024-03-21 14:33:00.782058
(1 row)
@mfridman
Collaborator

That's correct, this is working as intended. However, I can see this may not be suitable for some applications.

I've used several tools that leave the database in a "dirty" state, and when you're applying migrations across dozens, or even hundreds, of systems, recovering becomes very challenging.

When we do have to use +goose NO TRANSACTION, we typically isolate those changes in a single file and narrow them down to only the statements that truly must run outside a transaction. In practice, for most databases, the number of such statements is very low. So low that I'd rather not add additional complexity and overhead to handle these edge cases.
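Applied to the example from this issue, that isolation pattern might look like the following pair of files (the filenames and the created_at column migration are illustrative, not from the issue):

```sql
-- 3_add_created_at.sql: ordinary transactional DDL stays in its own
-- migration, so a failure here is rolled back cleanly.
-- +goose Up
ALTER TABLE posts ADD COLUMN created_at TIMESTAMP DEFAULT now();

-- 4_index_created_at.sql: only the one statement that cannot run inside
-- a transaction lives in the NO TRANSACTION file.
-- +goose NO TRANSACTION
-- +goose Up
CREATE INDEX CONCURRENTLY idx_created_at ON posts (created_at);
```

This way, if the index build fails, nothing else was bundled with it and the blast radius is a single statement.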

@tumelowill
Author

tumelowill commented Mar 26, 2024

Interesting, thanks. I'm curious how you think we should handle a failed long-running migration that batch-inserts data from one large table into another. Using Goose's API, I don't think we have a way of knowing whether this migration was previously attempted, so we cannot stop ourselves from duplicating work that was already done. Of course, we can write our own tooling to guard against this, but it would be nice to have it from Goose directly.
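One way to make such a batch copy safe to retry without any external bookkeeping is to have the migration checkpoint itself on the target table. A sketch, assuming a hypothetical posts_archive target with id as its primary key:

```sql
-- Self-checkpointing batch copy: each run resumes from the highest id
-- already copied, so a failed earlier attempt is never redone.
-- Run repeatedly until zero rows are inserted.
INSERT INTO posts_archive (id, title, content, author_id)
SELECT id, title, content, author_id
FROM posts
WHERE id > (SELECT coalesce(max(id), 0) FROM posts_archive)
ORDER BY id
LIMIT 10000;
```

The target table itself records the progress, so a rerun after a mid-migration failure picks up where the last attempt stopped.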

@mfridman
Collaborator

mfridman commented Apr 10, 2024

I don't have a good answer for this at the moment.

The times I've had to run migrations that lasted a few hours, we treated those as exceptions, i.e., apply all the migrations up to that point and then run the slow migration in isolation (monitoring it more closely).

When possible, we wrote those migrations to be idempotent, so that if they were interrupted they could be run again without causing any issues. But obviously, that's not always possible.
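For Postgres, the idempotency mentioned above often comes down to making each statement a no-op on re-run. A sketch, again using a hypothetical posts_archive table with a primary key on id:

```sql
-- +goose NO TRANSACTION
-- +goose Up
-- ON CONFLICT makes the copy a no-op for rows already inserted, so a
-- second attempt after a mid-migration failure does not duplicate them.
INSERT INTO posts_archive (id, title, content, author_id)
SELECT id, title, content, author_id
FROM posts
ON CONFLICT (id) DO NOTHING;

-- IF NOT EXISTS (PostgreSQL 9.5+) makes the index build safe to retry.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_created_at ON posts (created_at);
```

Neither guard helps with a statement that is interrupted halfway (e.g. an invalid index left behind by a failed CONCURRENTLY build still needs a manual DROP), but it covers the common rerun case.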

In other places, I've resorted to doing "data migrations" in out-of-band jobs without goose, e.g., marking a feature as "not ready" until some data migration or backfill is complete. The downside is that you must write your own logic to track this.

I'd like to keep this issue open for now and do a bit more thinking. Fwiw, the is_applied column in goose is a legacy artifact that tracked down migrations, but we decided to remove that (initial #121 (comment)).
