Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deploy new applications with db:schema:load #118

Open
Odaeus opened this issue Apr 22, 2015 · 29 comments
Open

Deploy new applications with db:schema:load #118

Odaeus opened this issue Apr 22, 2015 · 29 comments

Comments

@Odaeus
Copy link

Odaeus commented Apr 22, 2015

Hello,

I see that someone requested this feature for a different reason and it was denied. I don't understand the reason for dismissal, as the source of truth for the database schema is the /db/schema.rb or /db/structure.sql file. Migrations are only designed to manage changes to the database and it seems a waste to run the entire migration history when deploying a new application. Also it should be possible to delete old migrations if an application has a great many.

There is no need (and it is error prone) to deploy a new instance of an app by replaying the entire migration history. It is much simpler and faster to just load into the database a description of the current schema. - Rails guide

Any chance you would reconsider this position?

@chewi
Copy link

chewi commented Apr 28, 2015

Just ran into this myself. Didn't notice before because I hadn't deployed any old applications on new servers lately. I think it's debatable whether it should attempt rake db:structure:load automatically. While it should just complain that everything already exists on an existing database, I'd be satisfied and feel safer with a manual option. Maybe cap deploy:new_database or cap deploy NEWDB=1 or something. Prepping it manually beforehand is a bit fiddly because you have to do cap deploy:updating and then SSH to the server to run rake db:structure:load yourself. Depending on your setup, this may require several additional steps in between to get the right environment.

P.S. @Odaeus, hello old friend. 😉

@Odaeus
Copy link
Author

Odaeus commented Apr 28, 2015

Hello James!

I ended up adding the following to my deploy.rb:

namespace :deploy do
  namespace :db do
    desc "Load the database schema if needed"
    task load: [:set_rails_env] do
      on primary :db do
        if not test(%Q[[ -e "#{shared_path.join(".schema_loaded")}" ]])
          within release_path do
            with rails_env: fetch(:rails_env) do
              execute :rake, "db:schema:load"
              execute :touch, shared_path.join(".schema_loaded")
            end
          end
        end
      end
    end
  end

  before "deploy:migrate", "deploy:db:load"
end

I agree it should be a manual task with the option of making it automatic, given that it could cause issues with a production database. My example above uses a hidden file as a guard to make sure it only runs once.

@will-in-wi
Copy link
Contributor

Just saw this: https://stackoverflow.com/questions/35821777/rails-how-to-set-up-dbschemaload-for-initial-deploy-with-capistrano

Which made me look for this issue.

My thoughts:
Rails points out in their docs that the schema file is the canonical way of setting up a new database. I see two common Capistrano cases:

  1. New app, new deployment. Capistrano just uses migrations and it works fine.
  2. Old app, new deployment. A clean chain of migrations may not exist all the way back to initial creation. Further, it may be error-prone to set up a database this way.

For any case where the database has no schema version information, it is probably more correct to run schema (or structure) load instead of running migrations.

I'm not sure how we might go about finding out whether the database has had the schema loaded yet, but it would ideal to query this and decide based on that which method to run.

If someone wants to flesh out conceptually how this might be done, I can possibly take a crack at a PR.

@mattbrictson
Copy link
Member

Typically (in my experience) the user that is running the Rails app is a non-privileged user that does not have permission to create databases. The initial step of creating the database is thus done outside of the Capistrano deploy, which probably explains why this not entirely straightforward.

Personally, I try to make sure migrations always work back to the beginning because you never know when you might need to restore an old database backup and migrate it. But, I completely understand that I'm in the minority and rake db:schema:load or db:structure:load is the better way to go. I'm just not sure what is the safe way to do it.

@will-in-wi
Copy link
Contributor

I'm not sure either. My usual case is that the database already exists, but is completely empty. schema:load is then run to set up the initial structure.

It would probably be ideal to be able to run cap env rails:schema:load or something like that. But that presumes a preexisting deploy, which wouldn't exist yet…

@mattbrictson
Copy link
Member

Maybe you could do a trick like:

bundle exec rails runner -e production "exit 1 unless ActiveRecord::SchemaMigration.table_exists?"

And put that inside a Capistrano test to see if we need to load the schema.

@mattbrictson
Copy link
Member

⬆️ That's an expensive test because it needs to load the Rails environment, so never mind.

@mattbrictson
Copy link
Member

We could modify the implementation of the deploy:migrate task to add some conditionals. It looks to see if a special :rails_load_schema variable is set to true (it will default to false). If so, it runs rake db:schema:load instead of rake db:migrate.

Then, we have a special rake task, load_schema, which all it does is call set :rails_load_schema, true. First-time deployment works like this:

cap production load_schema deploy

And then do something similar for db:structure:load.

This is solution is a bit out of the ordinary, though; I can't think of another Capistrano plugin that uses additional task names on the command line preceding deploy to affect conditional behavior.

Another option would be to make a special load_schema_and_deploy task that sets the variable and then does invoke :deploy.

Or use an environment variable:

CAP_RAILS_LOAD_SCHEMA=1 cap production deploy

I dunno, none of these approaches feels exactly right.

@will-in-wi
Copy link
Contributor

Brainstorming here:

What if in Capistrano proper, a task called deploy:first_time (or something similar) was created which only invokes deploy. Then other plugins could hook into that to have tasks which should only be run the first time a deploy is done.

This would then be documented as such.

Does this sound any better?

@mattbrictson
Copy link
Member

You still need the variable trick, though, which doesn't feel particularly elegant:

before "deploy:first_time", "rails:first_time" do
  set :rails_load_schema, true
end

before "db:migrate", "db:load_schema" do
  if fetch(:rails_load_schema)
    # rake db:schema:load
  end
end

@will-in-wi
Copy link
Contributor

Could you add a task in a task?

before "deploy:first_time", "rails:first_time" do
  before "db:migrate", "db:setup"
end

task "db:setup" do
  # rake db:schema:load
end

I haven't tried this.

@mattbrictson
Copy link
Member

Perhaps slightly better: deploy:first_time sets a special Capistrano variable that is exposed in the DSL as e.g. first_deploy?. Then tasks can do conditional logic based on that method without having to manage a special variable of their own.

This is sort of like the special dry_run? method proposed in capistrano/capistrano#1564.

@mattbrictson
Copy link
Member

Could you add a task in a task?

Yes, I think that would work, too.

@shaunakv1
Copy link

Just checking in, any formal solution/tasks within Capistrano for this? Having to ssh into server to load schama for cold starts on old apps does not seem right.

@mattbrictson
Copy link
Member

Just checking in, any formal solution/tasks within Capistrano for this?

No, not to my knowledge.

@wbreeze
Copy link

wbreeze commented Apr 11, 2018

In the way-back machine, Capistrano had a "deploy:cold".

In Capistrano 3.10.1, bundle exec cap staging deploy:updating gives me enough to shell-in and run the db:structure:load or db:schema:load task manually. Shelling into a (successful or failed) deploy that has tried deploy:migrate isn't quite the same.

I definitely wouldn't like every deploy to consider wiping the database and then skip it based on a semaphor file. That feels way too dangerous to me.

@mattbrictson
Copy link
Member

You could run

cap production deploy COLD=1

And then in your deploy.rb you could customize the deployment process as:

task "deploy:db:load" do
  on primary :db do
    within release_path do
      with rails_env: fetch(:rails_env) do
        execute :rake, "db:schema:load"
      end
    end
  end
end
before "deploy:migrate", "deploy:db:load" if ENV["COLD"]

Maybe?

@wbreeze
Copy link

wbreeze commented Apr 11, 2018

Edit: I definitely wouldn't like every deploy to consider wiping the database and then skip it based on a semaphor file. That feels way too dangerous to me.

@vifreefly
Copy link

I think all what we need here is to call rails db:create at the first deploy before db:migrate. Use db:schema:load is a bit dangerous.

Anyway, this issue should be resolved. It is an official rails tasks for Capistrano. Having the first deploy always to be failed (because database is not created) and going to the server to manually type rails db:create doesn't seems for me like automation.

Or at least add to the README info about the assuming fail at the first deploy, so the Capistrano newbies will not be so scared when it's happen.

@JohnSmall
Copy link

I find that deploy:assets:precompile also fails on the first deploy as it runs before the database has been created.

@mattbrictson
Copy link
Member

Right now I don't want to automate db:create or db:schema:load. Neither is what I personally would want for real-world deployments. Plus from developers I've spoken with there is a pretty even split between those that prefer creating an empty db and migrating it vs those that prefer loading the schema file directly. There doesn't seem to be a standard way to do it.

It's OK that capistrano-rails leaves some things up to the individual developer. For example, we can't provide a deploy:restart task out of the box, because that depends on passenger, initd, systemd, or whatever each project is using to manage the server process. Likewise for the database, maybe I can rely on db:create, or maybe I need to go through AWS to provision an RDS instance. To draw an analogy to Heroku: you can't just git push heroku master the first time without manually configuring the PostgreSQL database add-on first. I don't think that's an unreasonable step.

Or at least add to the README info about the assuming fail at the first deploy, so the Capistrano newbies will not be so scared when it's happen.

Agreed. @vifreefly would you be interested in opening a PR for that?

@vifreefly
Copy link

vifreefly commented Nov 22, 2018

@mattbrictson Thanks, I see your point.

In this case, we can probably add another task deploy:db_create (or something like this) which will execute rake db:create.

This task will not be executed by default, but anyone (if it's ok for them) will be able to simply add the hook before 'deploy:migrate', 'deploy:db_create', and have database created automatically before deploy:migrate if it's not exists.
In case if db exists, this task will be executed successfully anyway, because Rails return message like Database 'app_production' already exists with successful (0) status code.

@vifreefly
Copy link

vifreefly commented Nov 22, 2018

About db:migrate vs db:schema:load to call for a first time on empty db.

Maybe db:schema:load is more true but I personally had some issues with it and custom Postgres extensions (for example https://github.com/Casecommons/pg_search). With db:schema:load some required extensions described in migrations wasn't enabled in the new database.
On the other hand, with db:migrate all was fine. I think db:migrate fine as well, unless if you're have a several hundreds/thousands of migrations to migrate and wait.

@wbreeze
Copy link

wbreeze commented Nov 22, 2018

With db:schema:load some required extensions described in migrations wasn't enabled in the new database.

Don't describe required extensions in migrations?

@vifreefly
Copy link

Don't describe required extensions in migrations?

Ok, what is a right way then?

@wbreeze
Copy link

wbreeze commented Nov 22, 2018

Ouch. Pardon me. It really was a question. It probably sounded a little glib. Excuse me. And also this is a little orthoganal to the thread. Maybe we should take it offline?

Now I've read the doc for the extension you referenced and see that they suggest exactly that, using a migration. Indeed, on a fresh install, those execute "CREATE EXTENSION IF NOT EXISTS pg_trgm;" will not have run from a db:structure:load. That doc doesn't give consideration to booting up a fresh DB without running the migration they suggest. I'm not sure it's great advice, only convenient advice.

The right way? Wow. Would it be great if the structure dump captured those? Assuming they don't (and that might not be possible as it might not be standardized). Would it serve to work it into the database install and provisioning of the DB box? The doc does go on to say that the extension installation methods vary across OS's.

Please excuse me for a. pouncing on that and b. diverting this thread. The rails guide advice quoted in the original issue description: I think it's solid advice.

@mulaiko
Copy link

mulaiko commented Sep 6, 2019

Hi,
So, no solution for this case then? (Read through the thread but maybe I missed something...)

@thbar
Copy link

thbar commented Sep 6, 2019

@mulaiko you have solutions in the thread which you can adapt. Here is my current version, using an explicit RUN_DB_SCHEMA_LOAD flag at first deploy:

# adapted from https://gist.github.com/wacaw/657744af41dcf4963646
namespace :deploy do
  desc "Load the initial schema - it will WIPE your database, use with care"
  task :db_schema_load do
    on roles(:db) do
      puts <<~WARN
        ************************** WARNING ***************************
        If you type [YES], rake db:schema:load will WIPE your database
        any other input will cancel the operation.
        **************************************************************
      WARN

      ask :answer, "Are you sure you want to WIPE your database?: "

      if fetch(:answer) == 'YES'
        within release_path do
          with rails_env: fetch(:rails_env) do
            execute :rake, 'db:schema:load'
          end
        end
      else
        puts "Cancelled."
        exit
      end
    end
  end
end

if ENV["RUN_DB_SCHEMA_LOAD"] == '1'
  before "deploy:migrate", "deploy:db_schema_load"
end

Hope this helps!

sauloperez added a commit to coopdevs/timeoverflow that referenced this issue Nov 20, 2019
This needed when deploying to a server for the first time or when you
want to wipe out a database.

The issue I'm fixing is this. Things have evolved and in
https://github.com/coopdevs/timeoverflow-provisioning we no longer set
things up to enable PostgreSQL extensions such as `pg_trgm`. Because
Capistrano runs `db:migrate` when deploying very old migrations
like `db/migrate/00000000000002_setup_pg_trgm.rb` don't work anymore.

I prefer to make this step explicit in the code and with some context
rather than having to do this manually in the server and let the dev in
charge figure it out every time.

Apparently, from capistrano/rails#118 I see
this is a known issue for which there's no provided solution. I copied
a solution proposed by a capistrano/rails member so should be good.

Just run it as follows:

```
$ bundle exec cap staging deploy COLD=1
```
sauloperez added a commit to coopdevs/timeoverflow that referenced this issue Nov 20, 2019
This needed when deploying to a server for the first time or when you
want to wipe out a database.

The issue I'm fixing is this. Things have evolved and in
https://github.com/coopdevs/timeoverflow-provisioning we no longer set
things up to enable PostgreSQL extensions such as `pg_trgm`. Because
Capistrano runs `db:migrate` when deploying very old migrations
like `db/migrate/00000000000002_setup_pg_trgm.rb` don't work anymore.

I prefer to make this step explicit in the code and with some context
rather than having to do this manually in the server and let the dev in
charge figure it out every time.

Apparently, from capistrano/rails#118 I see
this is a known issue for which there's no provided solution. I copied
a solution proposed by a capistrano/rails member so should be good.

Just run it as follows:

```
$ bundle exec cap staging deploy COLD=1
```
@remcode-remco
Copy link

      with rails_env: fetch(:rails_env) do
        execute :rake, 'db:schema:load'
      end

I had to change the line to this in production:
execute :rake, 'db:schema:load DISABLE_DATABASE_ENVIRONMENT_CHECK=1'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests