Clarify the reasoning for Define Model Class Migrations #284

pirj · 2021-04-09T10:03:44Z

https://rails.rubystyle.guide/#define-model-class-migrations claims that

or cause serious data corruption

Can you think of a use case when an old migration could cause a serious data corruption?
Yes, it will most probably break db:migrate on a clean database.

There's yet another reason not to use real models in migrations.
If you refer to a real model in an older migration (e.g. for update_all), Rails will cache column information.
And subsequent migrations that would alter that table's schema won't affect the column cache.
This will lead to breakages due to an outdated (fixed for the moment when the model was first referred to in migrations) table schema. Specifically, if you add a column in a later migration and update_all that new column, this will result in an error.
Also, if you run rake db:migrate db:seed in one command, this will also affect your seeds, and it will also fail with:

ActiveModel::UnknownAttributeError: unknown attribute 'some_attr' for ModelName

The text was updated successfully, but these errors were encountered:

andyw8 · 2021-04-09T22:36:36Z

For reference, this was originally added in #205 and discussed in #143

pirj · 2021-07-19T08:36:16Z

There are two more things that recently hit me:

We suggest defining the class on the top level:

class MigrationProduct < ActiveRecord::Base
  self.table_name = :products
end

class ModifyDefaultStatusForProducts < ActiveRecord::Migration

while the MigrationProduct can be defined by multiple migrations.
If it happens that self.table_name = is different, the last one wins.
I don't have a good example, maybe something like:

class MigrationUser < ActiveRecord::Base
  self.table_name = 'accounts' # not 'users'
end

The suggestion would be to define migration model classes inside migration classes:

class ModifyDefaultStatusForProducts < ActiveRecord::Migration
  class MigrationProduct < ActiveRecord::Base
    self.table_name = :products
  end
  ...

pirj · 2021-07-19T09:08:49Z

Migration models are loaded as needed in development, while loaded all at once in production with eager_load. (this needs to be confirmed with the Rails code, if the migrator loads them or it loads the whole db/migrate and instantiates migration classes as needed.

Still, the issue was as following:

# migration 1
class AddCategoryToFoos < ActiveRecord::Migration[6.0]
  def change
    add_column :foos, :category, :string
  end
end

# migration 2
class BackfillCategoryToFoos < ActiveRecord::Migration[6.0]
  class MigrationFoo < ActiveRecord::Base
    self.table_name = 'foos'
  end
  def change
    MigrationFoo.find_each { |foo| foo.update!(category: 'primary') } # <= The real problem is here
  end
end

# migration 3
class RestrainCategoryToFoosToNotNull < ActiveRecord::Migration[6.0]
  def change
    change_column_null :foos, :category, false # BOOM, `PG::NotNullViolation: ERROR: column "category" contains null values`
  end
end

The problem is with foo.update!(category: 'primary').
ActiveRecord::Dirty has no awareness of the recently added category column, and save! decided not to do an UPDATE.

I don't have a good explanation as to why this happened. My theory is that if those two last migrations were executed in the same run, Rails loaded both, and immediately cached model's schema for MigrationFoo.
This was also happening on local.

Might be related to:

config.active_record.use_schema_cache_dump enables users to get schema cache information from db/schema_cache.yml (generated by rails db:schema:cache:dump), instead of having to send a query to the database to get this information. Defaults to true.

But I couldn't find db/schema_cache.yml locally.

One suggestion would be to dynamically create model classes instead:

def change
  MigrationFoo = Class.new(ActiveRecord::Base) { self.table_name = 'foos' }
  
  MigrationFoo.find_each { |foo| foo.update!(category: 'primary') }
end

but I can't recommend it without testing it. And I have some doubts if schema cache will be updated for a new model pointing to the same table, or an existing cached definition (that e.g. lacks category column info) will be used.

pirj · 2021-09-16T11:05:28Z

Another approach is to use:

def change
  MigrationFoo.reset_column_information
  
  MigrationFoo.find_each { |foo| foo.update!(category: 'primary') }
end

See https://stackoverflow.com/a/9728033/202914 for explanation.
Otherwise, an error is raised:

unknown attribute 'category' for MyCoolMigration::MigrationFoo.

Apparently, migrations are loaded all at once, or somehow column information is shared between MyCoolMigration::MigrationFoo and MyPreviousMigration::MigrationFoo.

I'm on the fence if it's worth adding to the guide, as it's for this relatively rare case when a project tries to keep migrations from early days in a state that allows running rails db:create db:migrate rather than rails db:create db:schema:load.

pirj · 2021-10-03T21:09:13Z

reset_column_information is what Rails suggests using:

Sometimes you'll want to add a column in a migration and populate it immediately after. In that case, you'll need to make a call to reset_column_information in order to ensure that the model has the latest column data from after the new column was added.

pirj · 2021-12-17T10:44:51Z

One case when it's not possible to properly use migration model in migration when you use it to schedule an Active Job job:

class MigrationUser < ActiveRecord::Base
  self.table_name = :users
end

def up
  MigrationUser.find_each { |user| SyncWithThirdParty.perform_later(user) }
end

as this will end up with a failure to deserialize it, as the global id would use a weird model name.

As a workaround, global id key should be overridden in the migration model.

khamusa · 2022-04-04T22:06:37Z

An example of data corruption (logical):

You write a migration that adds a new nullable column, and you write code to make sure that certain types of records get backfilled with some type of value. For this, you use ModelName.where(...).update_all(...)
If you have a gem such as act_as_paranoid or any code that added a default scope to your model, migrating data this way will prevent records not matching your default scope from being updated.

In the case of act_as_paranoid, that could not be a big issue since it will likely exclude soft-deleted records, which you may not care that much (I do, though). If you try, however, to update your newly added column to be non-nullable, that could break your production deployment because of the presence of soft-deleted records (which would still be null as they were excluded by the default scope), while it's very likely you won't notice this in development (unless you're constantly soft-deleting records locally, which most developers won't do).

One could argue this is a problem with the usage of default scopes, and I'd also agree. Doesn't change the fact that avoiding direct references to your models in migrations prevents this kind of error.

Thought it was worth sharing.

matteeyah · 2023-11-23T02:07:03Z

I agree with not using regular models in migrations - that's just asking for trouble, but I strongly disagree with defining model classes inside of migration - that is too cumbersome and goes against the first principle of the rails doctrine (Optimize for programmer happiness), since it's a fact that it causes headaches for engineers.

I propose just using raw SQL to change data in a migration like this. I know some people will disagree (maybe even barf a little) when reading this. However, this produces simpler and cleaner code, is much easier for developers to use, but is controversial and not widespread. It's very common in Basecamp (https://discuss.rubyonrails.org/t/patterns-for-data-only-rails-migrations/75320/10) and plays nicely with what the migrations were designed to be. I'll quote the linked reply from DHH here for ease of readability

We mix data and sql statements in migrations all the time at Basecamp. This is what migrations were designed to do! They’re intended to be moments in time. Not to be replayable forever. The stable unit is schema.rb (or structure.sql).

pirj mentioned this issue Feb 6, 2022

Replace references to ActiveRecord::Base with ApplicationRecord #310

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify the reasoning for Define Model Class Migrations #284

Clarify the reasoning for Define Model Class Migrations #284

pirj commented Apr 9, 2021

andyw8 commented Apr 9, 2021 •

edited

pirj commented Jul 19, 2021

pirj commented Jul 19, 2021

pirj commented Sep 16, 2021

pirj commented Oct 3, 2021

pirj commented Dec 17, 2021

khamusa commented Apr 4, 2022 •

edited

matteeyah commented Nov 23, 2023

Clarify the reasoning for Define Model Class Migrations #284

Clarify the reasoning for Define Model Class Migrations #284

Comments

pirj commented Apr 9, 2021

andyw8 commented Apr 9, 2021 • edited

pirj commented Jul 19, 2021

pirj commented Jul 19, 2021

pirj commented Sep 16, 2021

pirj commented Oct 3, 2021

pirj commented Dec 17, 2021

khamusa commented Apr 4, 2022 • edited

matteeyah commented Nov 23, 2023

andyw8 commented Apr 9, 2021 •

edited

khamusa commented Apr 4, 2022 •

edited