
"NO TRANSACTION" migrations should mark database as dirty if not successful #728

Open
tumelowill opened this issue Mar 21, 2024 · 3 comments

Comments

@tumelowill

tumelowill commented Mar 21, 2024

Migrations that do not run inside a transaction cannot be rolled back when they fail partway through. Goose should mark the database as dirty when such a migration fails, and should refuse to apply further migrations to a dirty database.

Example:

-- 1_create_posts_table.sql
-- +goose Up
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255),
  content TEXT,
  author_id INTEGER
);
-- 2_update_posts.sql
-- +goose NO TRANSACTION
-- +goose Up
INSERT INTO posts (title, content, author_id) VALUES ('Hello World', 'This is my first post!', 1);
CREATE INDEX CONCURRENTLY idx_created_at ON posts (created_at); -- Fails: the column created_at does not exist on the posts table

If our deployment policy is to run migrations whenever the database schema is outdated, each attempt now adds another row to the posts table, because we have no way of knowing that the database is dirty.

What we would like to see after attempting to migrate multiple times:

postgres=# select * from public.posts;
 id |    title    |        content         | author_id 
----+-------------+------------------------+-----------
  1 | Hello World | This is my first post! |         1
(1 row)

postgres=# select * from public.goose_db_version;
 id | version_id | is_applied |           tstamp           
----+------------+------------+----------------------------
  1 |          0 | t          | 2024-03-21 14:33:00.782058
  2 |          1 | f          | 2024-03-21 15:08:48.144995
(2 rows)
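With a failure row like the one above, a deploy script could gate migrations on a pre-flight query. This is a sketch of the desired behavior only — goose does not record failed NO TRANSACTION migrations in goose_db_version today:

```sql
-- Hypothetical dirty check, assuming a failed migration were recorded
-- with is_applied = false (goose does not do this currently).
SELECT version_id
FROM public.goose_db_version
WHERE is_applied = false
ORDER BY id DESC
LIMIT 1;
-- Any row returned means the database is dirty: abort the deploy and
-- require manual repair before applying further migrations.
```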

What we actually see after attempting to migrate multiple times:

postgres=# select * from public.posts;
 id |    title    |        content         | author_id 
----+-------------+------------------------+-----------
  1 | Hello World | This is my first post! |         1
  2 | Hello World | This is my first post! |         1
  3 | Hello World | This is my first post! |         1
  4 | Hello World | This is my first post! |         1
  5 | Hello World | This is my first post! |         1
(5 rows)

postgres=# select * from public.goose_db_version;
 id | version_id | is_applied |           tstamp           
----+------------+------------+----------------------------
  1 |          0 | t          | 2024-03-21 14:33:00.782058
(1 row)
@mfridman
Collaborator

That's correct, this is working as intended. However, I can see this may not be suitable for some applications.

I've used several tools that leave the database in a "dirty" state, and when you're applying migrations across dozens, or even hundreds, of systems, recovering becomes very challenging.

When we do have to use +goose NO TRANSACTION, we typically isolate those changes in a single file and narrow them down to only the statements that truly must run outside a transaction. In practice, for most databases, the number of such statements is very low. So low that I'd rather not add additional complexity and overhead to handle these edge cases.
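Applied to the example from this issue, that isolation pattern might look like the following pair of files (the filenames and the created_at column migration are illustrative, not from the issue):

```sql
-- 3_add_created_at.sql: ordinary transactional DDL stays in its own
-- migration, so a failure here is rolled back cleanly.
-- +goose Up
ALTER TABLE posts ADD COLUMN created_at TIMESTAMP DEFAULT now();

-- 4_index_created_at.sql: only the one statement that cannot run inside
-- a transaction lives in the NO TRANSACTION file.
-- +goose NO TRANSACTION
-- +goose Up
CREATE INDEX CONCURRENTLY idx_created_at ON posts (created_at);
```

This way, if the index build fails, nothing else was bundled with it and the blast radius is a single statement.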

@tumelowill
Author

tumelowill commented Mar 26, 2024

Interesting, thanks. I'm curious how you think we should handle a failed long-running migration that batch-inserts data from one large table into another. Using Goose's API, I don't think we have a way of knowing whether this migration was previously attempted, so we cannot stop ourselves from duplicating work that was already done. Of course, we can write our own tooling to guard against this, but it would be nice to have it from Goose directly.
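One way to make such a batch copy safe to retry without any external bookkeeping is to have the migration checkpoint itself on the target table. A sketch, assuming a hypothetical posts_archive target with id as its primary key:

```sql
-- Self-checkpointing batch copy: each run resumes from the highest id
-- already copied, so a failed earlier attempt is never redone.
-- Run repeatedly until zero rows are inserted.
INSERT INTO posts_archive (id, title, content, author_id)
SELECT id, title, content, author_id
FROM posts
WHERE id > (SELECT coalesce(max(id), 0) FROM posts_archive)
ORDER BY id
LIMIT 10000;
```

The target table itself records the progress, so a rerun after a mid-migration failure picks up where the last attempt stopped.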

@mfridman
Collaborator

mfridman commented Apr 10, 2024

I don't have a good answer for this at the moment.

The times I've had to run migrations that lasted a few hours, we treated those as exceptions, i.e., apply all the migrations up to that point and then run the slow migration in isolation (monitoring it more closely).

When possible, we wrote those migrations to be idempotent, so that if they were interrupted they could be run again without causing any issues. But obviously, that's not always possible.
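For Postgres, the idempotency mentioned above often comes down to making each statement a no-op on re-run. A sketch, again using a hypothetical posts_archive table with a primary key on id:

```sql
-- +goose NO TRANSACTION
-- +goose Up
-- ON CONFLICT makes the copy a no-op for rows already inserted, so a
-- second attempt after a mid-migration failure does not duplicate them.
INSERT INTO posts_archive (id, title, content, author_id)
SELECT id, title, content, author_id
FROM posts
ON CONFLICT (id) DO NOTHING;

-- IF NOT EXISTS (PostgreSQL 9.5+) makes the index build safe to retry.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_created_at ON posts (created_at);
```

Neither guard helps with a statement that is interrupted halfway (e.g. an invalid index left behind by a failed CONCURRENTLY build still needs a manual DROP), but it covers the common rerun case.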

In other places, I've resorted to doing "data migrations" in out-of-band jobs without goose, e.g., marking a feature as "not ready" until some data migration or backfill is complete. The downside is that you must write your own logic to track this.

I'd like to keep this issue open for now and do a bit more thinking. Fwiw, the is_applied column in goose is a legacy artifact that tracked down migrations, but we decided to remove that (initial #121 (comment)).
