
lock_timeout + lock_timeout_retries + concurrent postgres indexes #233

Open
grk opened this issue Jul 24, 2023 · 2 comments

grk commented Jul 24, 2023

I'm not even sure if this is a bug, but I ran into the following situation this morning:

  1. I have strong_migrations configured with lock_timeout = 10.seconds and lock_timeout_retries = 3 (see the initializer sketch after this list).
  2. I have a migration that concurrently adds an index to a busy table:
    class SomeMigrationThatAddsIndex < ActiveRecord::Migration[7.0]
      disable_ddl_transaction!

      def change
        add_index :some_table, %i(col1 col2), name: "index_some_table_on_col1_and_col2", algorithm: :concurrently
      end
    end
  3. During the deploy, the migration fails to acquire a lock:
    Migrating to SomeMigrationThatAddsIndex (20230721150045)
    == 20230721150045 SomeMigrationThatAddsIndex: migrating =============
    -- add_index(:some_table, [:col1, :col2], {:name=>"index_some_table_on_col1_and_col2", :algorithm=>:concurrently})
    -- Lock timeout. Retrying in 10 seconds...
    -- add_index(:some_table, [:col1, :col2], {:name=>"index_some_table_on_col1_and_col2", :algorithm=>:concurrently})
    rails aborted!
    StandardError: An error has occurred, all later migrations canceled:
    PG::DuplicateTable: ERROR:  relation "index_some_table_on_col1_and_col2" already exists
    
  4. According to the postgres docs, this is to be expected: any error during concurrent index creation leaves behind an invalid index (which makes sense, since this was executed with disable_ddl_transaction!), and that index can be dropped or reindexed concurrently.
  5. I worked around this by connecting directly with psql to reindex, and by altering the migration to add if_not_exists: true (see the cleanup sketch after this list).
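
For context, a minimal config/initializers/strong_migrations.rb matching item 1 might look like this (a sketch; only the two settings named above are shown):

    # Abort any statement that waits more than 10 seconds for a lock...
    StrongMigrations.lock_timeout = 10.seconds

    # ...and retry the migration command up to 3 times after a lock timeout.
    StrongMigrations.lock_timeout_retries = 3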
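
The cleanup from item 5 can also be done through the Rails connection instead of raw psql; a minimal sketch, assuming the index name from the log above (neither command can run inside a transaction, which is fine in a console session):

    conn = ActiveRecord::Base.connection

    # Option A: rebuild the invalid index in place (PostgreSQL 12+).
    conn.execute('REINDEX INDEX CONCURRENTLY "index_some_table_on_col1_and_col2"')

    # Option B (instead of A): drop the invalid leftover and let the
    # migration (now idempotent via if_not_exists: true) recreate it.
    conn.execute('DROP INDEX CONCURRENTLY IF EXISTS "index_some_table_on_col1_and_col2"')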

I wonder if this is something that could be solved within the scope of this gem. Maybe a safety warning that concurrent index creation + lock_timeout can lead to this scenario?


For reference, the versions used:

  • PostgreSQL 14.7 (Ubuntu 14.7-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
  • Ruby 3.2.2
  • Rails 7.0.6
  • strong_migrations 1.6.0

jjb commented Aug 1, 2023

I am experiencing the same thing, and maybe have a different confusion about it.

I experience this from time to time. If I'm unable to get the lock, why would an invalid index be left behind? Wouldn't the attempt to create the index need to wait for the lock?

== 20230517202000 AddFoo: migrating         
-- add_index(:foo, :bar_id, {:algorithm=>:concurrently})        
-- Lock timeout. Retrying in 60 seconds...        
-- add_index(:foo, :bar_id, {:algorithm=>:concurrently})        
rake aborted!        
StandardError: An error has occurred, all later migrations canceled:        
        
PG::DuplicateTable: ERROR:  relation "foo" already exists        

I think the behavior might make sense if it were a query timeout. Is it possible there's a bug that erroneously reports query timeouts as lock timeouts, maybe only without DDL transactions and/or with concurrent indexing, something like that?
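
(The postgres docs cited in the original report suggest an answer: CREATE INDEX CONCURRENTLY runs in several transactions, and it commits the catalog entry for the new index before it waits out conflicting lock holders. A lock timeout during one of those later waits cancels the command, but the already-committed entry survives as an invalid index, and with disable_ddl_transaction! there is no surrounding transaction to roll it back.) A quick way to check for such leftovers from a Rails console; a minimal sketch querying the standard pg_index catalog:

    # List every index PostgreSQL has marked invalid, e.g. leftovers
    # from a failed CREATE INDEX CONCURRENTLY.
    invalid = ActiveRecord::Base.connection.select_values(<<~SQL)
      SELECT indexrelid::regclass
      FROM pg_index
      WHERE NOT indisvalid
    SQL
    puts invalid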

@quentindemetz

FWIW I've patched add_index to remove invalid indexes if the command fails 💡
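
A sketch of what such a patch might look like (not necessarily the actual patch; the module name is illustrative, and it assumes Rails 7's PostgreSQL adapter and a migration already running with disable_ddl_transaction!):

    # Hypothetical patch: if a concurrent add_index fails, drop the
    # invalid leftover index so the retry starts from a clean slate.
    module DropInvalidIndexOnFailure
      def add_index(table_name, column_name, **options)
        super
      rescue ActiveRecord::StatementInvalid
        if options[:algorithm] == :concurrently
          name = options[:name] || index_name(table_name, column_name)
          # DROP INDEX CONCURRENTLY cannot run inside a transaction; that's
          # fine here because concurrent index builds already require
          # disable_ddl_transaction!.
          execute("DROP INDEX CONCURRENTLY IF EXISTS #{quote_table_name(name)}")
        end
        raise
      end
    end

    ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(DropInvalidIndexOnFailure)

With something like this in place, lock_timeout_retries can retry cleanly instead of failing with PG::DuplicateTable.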
