
lock_timeout + lock_timeout_retries + concurrent postgres indexes #233

Open
grk opened this issue Jul 24, 2023 · 2 comments

grk commented Jul 24, 2023

I'm not even sure if this is a bug, but I ran into the following situation this morning:

  1. I have strong_migrations configured with lock_timeout = 10.seconds and lock_timeout_retries = 3 (see the initializer sketch after this list).
  2. I have a migration that concurrently adds an index to a busy table:
    class SomeMigrationThatAddsIndex < ActiveRecord::Migration[7.0]
      disable_ddl_transaction!

      def change
        add_index :some_table, %i(col1 col2), name: "index_some_table_on_col1_and_col2", algorithm: :concurrently
      end
    end
  3. During the deploy, the migration fails to acquire a lock:
    Migrating to SomeMigrationThatAddsIndex (20230721150045)
    == 20230721150045 SomeMigrationThatAddsIndex: migrating =============
    -- add_index(:some_table, [:col1, :col2], {:name=>"index_some_table_on_col1_and_col2", :algorithm=>:concurrently})
    -- Lock timeout. Retrying in 10 seconds...
    -- add_index(:some_table, [:col1, :col2], {:name=>"index_some_table_on_col1_and_col2", :algorithm=>:concurrently})
    rails aborted!
    StandardError: An error has occurred, all later migrations canceled:
    PG::DuplicateTable: ERROR:  relation "index_some_table_on_col1_and_col2" already exists
    
  4. According to the postgres docs, this is to be expected: any error during concurrent index creation leaves behind an invalid index (which makes sense, since this was executed with disable_ddl_transaction!), and that index can be dropped or reindexed concurrently.
  5. I worked around this by connecting directly with psql to reindex, and by altering the migration to add if_not_exists: true (see the cleanup sketch after this list).
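
For context, a minimal config/initializers/strong_migrations.rb matching item 1 might look like this (a sketch; only the two settings named above are shown):

    # Abort any statement that waits more than 10 seconds for a lock...
    StrongMigrations.lock_timeout = 10.seconds

    # ...and retry the migration command up to 3 times after a lock timeout.
    StrongMigrations.lock_timeout_retries = 3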
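
The cleanup from item 5 can also be done through the Rails connection instead of raw psql; a minimal sketch, assuming the index name from the log above (neither command can run inside a transaction, which is fine in a console session):

    conn = ActiveRecord::Base.connection

    # Option A: rebuild the invalid index in place (PostgreSQL 12+).
    conn.execute('REINDEX INDEX CONCURRENTLY "index_some_table_on_col1_and_col2"')

    # Option B (instead of A): drop the invalid leftover and let the
    # migration (now idempotent via if_not_exists: true) recreate it.
    conn.execute('DROP INDEX CONCURRENTLY IF EXISTS "index_some_table_on_col1_and_col2"')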

I wonder if this is something that could be solved within the scope of this gem. Maybe a safety warning that concurrent index creation + lock_timeout can lead to this scenario?


For reference, the versions used:

  • PostgreSQL 14.7 (Ubuntu 14.7-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
  • Ruby 3.2.2
  • Rails 7.0.6
  • strong_migrations 1.6.0

jjb commented Aug 1, 2023

I am experiencing the same thing, and maybe have a different confusion about it.

I experience this from time to time. If I'm unable to get the lock, why would an invalid index be left behind? Wouldn't the attempt to create the index need to wait for the lock?

== 20230517202000 AddFoo: migrating         
-- add_index(:foo, :bar_id, {:algorithm=>:concurrently})        
-- Lock timeout. Retrying in 60 seconds...        
-- add_index(:foo, :bar_id, {:algorithm=>:concurrently})        
rake aborted!        
StandardError: An error has occurred, all later migrations canceled:        
        
PG::DuplicateTable: ERROR:  relation "foo" already exists        

I think the behavior might make sense if it were a query timeout. Is it possible there's a bug that erroneously reports query timeouts as lock timeouts, maybe only without DDL transactions and/or with concurrent indexing, something like that?
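
(The postgres docs cited in the original report suggest an answer: CREATE INDEX CONCURRENTLY runs in several transactions, and it commits the catalog entry for the new index before it waits out conflicting lock holders. A lock timeout during one of those later waits cancels the command, but the already-committed entry survives as an invalid index, and with disable_ddl_transaction! there is no surrounding transaction to roll it back.) A quick way to check for such leftovers from a Rails console; a minimal sketch querying the standard pg_index catalog:

    # List every index PostgreSQL has marked invalid, e.g. leftovers
    # from a failed CREATE INDEX CONCURRENTLY.
    invalid = ActiveRecord::Base.connection.select_values(<<~SQL)
      SELECT indexrelid::regclass
      FROM pg_index
      WHERE NOT indisvalid
    SQL
    puts invalid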

@quentindemetz

FWIW I've patched add_index to remove invalid indexes if the command fails 💡
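
A sketch of what such a patch might look like (not necessarily the actual patch; the module name is illustrative, and it assumes Rails 7's PostgreSQL adapter and a migration already running with disable_ddl_transaction!):

    # Hypothetical patch: if a concurrent add_index fails, drop the
    # invalid leftover index so the retry starts from a clean slate.
    module DropInvalidIndexOnFailure
      def add_index(table_name, column_name, **options)
        super
      rescue ActiveRecord::StatementInvalid
        if options[:algorithm] == :concurrently
          name = options[:name] || index_name(table_name, column_name)
          # DROP INDEX CONCURRENTLY cannot run inside a transaction; that's
          # fine here because concurrent index builds already require
          # disable_ddl_transaction!.
          execute("DROP INDEX CONCURRENTLY IF EXISTS #{quote_table_name(name)}")
        end
        raise
      end
    end

    ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(DropInvalidIndexOnFailure)

With something like this in place, lock_timeout_retries can retry cleanly instead of failing with PG::DuplicateTable.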
