PG::TRDeadlockDetected #235

seanarnold · 2016-10-20T01:27:58Z

We seem to intermittently get Deadlock errors with acts_as_list and Postgres.

PG::TRDeadlockDetected: ERROR: deadlock detected DETAIL: Process 18092 waits for ShareLock on transaction 53198930; blocked by process 30411. Process 30411 waits for ShareLock on transaction 53198931; blocked by process 18092. HINT: See server log for query details. CONTEXT: while updating tuple (15,28) in relation "payment_methods" : UPDATE "payment_methods" SET "updated_at" = '2016-10-20 00:52:19.754301', "sort_order" = ("payment_methods"."sort_order" - 1) WHERE (user_id = 'user_id' AND deleted_at IS NULL) AND ("payment_methods"."sort_order" > 1) AND ("payment_methods"."sort_order" <= 3 AND "payment_methods".id != 'pm_id')

Is this a known issue? I'm using acts_as_list (0.7.7).

The text was updated successfully, but these errors were encountered:

brendon · 2016-10-20T02:22:06Z

Not that we're aware of. Can you try the latest version and see if that helps? Is it to do with with conflicting ORDER BY on this query vs another also updating records? I had a similar deadlock problem and it turned out to be due to ordering being overridden resulting in updates occurring from opposite ends of the table and clashing (that's a very technical explanation!).

We do lock a record using with_lock. I can't remember which version that came out on, so you could try the version before that to see if that is the problem. We're looking at removing or making this lock better.

ddengler · 2016-11-22T10:10:19Z

I can confirm this. Seems to happen as a combination of validate uniqueness and multiple requests hitting the servers at the same time. Currently we delay our requests (chain them to be serial requests) to prevent the deadlock issue.

brendon · 2016-11-22T20:18:22Z

Thanks @ddengler, are you using the latest version?

We really need better locking I think. If you have any experience with that side of things we'd appreciate the help. I think there's an open issue about this.

@seanarnold, have you tried the latest version?

ddengler · 2016-11-22T20:24:56Z

@brendon yes, we are using the latest version. I will ask someone on my team to have a look at it again. Might take a while, as we currently have the issue resolved and some other priorities to work on.

ddengler · 2016-11-22T20:28:32Z

@brendon can you point me to the other open issue? I could not find it right away – only an old PR #135

brendon · 2016-11-22T20:57:32Z

@ddengler, sorry I've had a look myself and can't find anything but that either. The discussion was around needing to lock all the records within a scope rather than just the record being updated (as we do now). This means that the shuffling that occurs when an item is moved will be atomic (is that the right term?). There is also an issue where the scope is empty but two threads attempt to create the first record in that scope simultaneously. We need a way to lock that empty scope also, and I'm not sure if that's possible with SQL at all?

I hope that helps :D No worries about the priorities. Gotta work on the important stuff first :D

swanandp · 2017-02-16T03:42:32Z

Some of this can be solved by using a lock on the parent record. Like in #254

brendon · 2017-08-10T07:16:39Z

@seanarnold I recently removed the only lock that we do in acts_as_list. I'd be interested to see if this fixes your deadlock issue.

brendon · 2017-10-03T19:20:31Z

@seanarnold, we made a change the other day (released as the latest version of the gem) that removed another point of contention. Can you try the latest gem to see if your deadlocks subside?

brendon · 2018-03-19T07:32:11Z

@seanarnold, is this still a problem for you?

ddengler · 2018-03-19T09:53:17Z

@brendon sorry for not getting back to you on this. Sadly I cannot remember what we did back then to "fix" the issue for us. I just checked our error reporting again to be sure. Issue did not pop up for us anymore. We are currently using the latest version.

seanarnold · 2018-03-19T19:58:13Z

We still see the issue in 0.9.9. Have changes to the locking behaviour been done since then?

brendon · 2018-03-19T21:16:13Z

Not since then. We still only lock in the insert_at_position() function. If you have any ideas for how to solve this I'd be more than grateful :D

brendon · 2018-03-19T21:16:54Z

It's probably more the implicit locks that come from updating large portions of the table at once (when shuffling positions etc...).

vi-huynh · 2018-04-13T10:58:31Z

I get deadlock PG::TRDeadlockDetected is same @seanarnold.
I'm using newest version 0.9.11 and update position with 212 rows.

Here is backtrace

- acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/position_column_method_definer.rb:34:in `block (2 levels) in define_class_methods'
 - acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/position_column_method_definer.rb:26:in `block (2 levels) in define_class_methods'
 - activerecord (4.2.7.1) lib/active_record/relation/delegation.rb:70:in `block in decrement_all'
 - activerecord (4.2.7.1) lib/active_record/relation.rb:302:in `scoping'
 - activerecord (4.2.7.1) lib/active_record/relation/delegation.rb:70:in `decrement_all'
 - acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/list.rb:367:in `shuffle_positions_on_intermediate_items'
 - acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/list.rb:418:in `update_positions'

vi-huynh · 2018-04-13T11:07:19Z

@ddengler I just reproduce deadlock on development environment.
I config puma worker is 3 and try send multiple request to server.
And then get deadlock error.

brendon · 2018-04-13T21:56:32Z

@vi-huynh, if you could do some more digging into this to figure out why, that would be helpful :) Keen to get this fixed :)

ddengler · 2018-04-16T10:09:15Z

@vi-huynh makes sense to me as we fixed this by not doing our operations in parallel anymore. We are also using puma but now with workers set to 0 (= 1) and multiple threads within a docker container ( multiple containers within a cluster).

brendon · 2018-04-23T22:12:18Z

Thanks @ddengler for offering some more info :) I'm at a loss as to the cause but it's probably to do with table scan's locking rows even if they're not the rows been updated. Could indexing help alleviate the problem?

swanandp · 2018-04-24T22:20:56Z

Some of this can be solved by using a lock on the parent record. Like in #254

Are you getting a deadlock with or without this?

brendon · 2020-08-16T20:44:53Z

Why are the requests not all ordered by the position column (either all 'asc' or 'desc')? It's not very surprising that there are deadlocks happening: if you want to access several resources, the only scenario where deadlock is guaranteed to not happen is when concurrent requests for the same resources are locked in the same order.

@patleb, I'm happy to look at any solutions you might have to this problem :)

patleb · 2020-08-17T02:46:09Z

@brendon, I realized that it wasn't an ordering problem on the position column, but the id column. Simply put, every request would have to be ordered by id (either all 'asc' or 'desc') when fetching records and then apply the transformation you want on the subquery.

It would solve the problem at the source, but you would have to pretty much rewrite the whole gem to make this happen... which might not be worth the effort.

brendon · 2020-08-17T04:31:16Z

Yea it's been an annoying problem for a long time. I don't have the time to rewrite anything, but am always happy to review PR's :D

brendon · 2024-06-04T02:08:59Z

Closing for now. Check out https://github.com/brendon/positioning which uses an advisory lock to (hopefully) prevent deadlocks.

patleb · 2024-06-04T03:01:24Z

It'll work, but you're essentially preventing parallelization.

brendon · 2024-06-04T04:12:22Z

As mentioned, I'm open to other ideas. The problem is, it needs to be something encapsulated so that it's transparent to the user of the gem. I also don't see how you could expect parallelised updating to work at all with a single list of positions. The list operations have to be done in sequence unless I'm missing something.

This has been a long running issue but so far not much has surfaced in the way of any solutions.

patleb · 2024-06-04T16:54:48Z

You need to make some compromises somewhere, but my previous explanation of why the problem occurs still hold. If you keep the group of rows ordered by id when you want to change the position, then you won't have a deadlock. It breaks when you have, for example, a process updating ids (1,2) and another process ids (2,1) where the former locks id 1 and the latter id 2 both at the same time. Then, they can't release their lock until ids 1 or 2 is releases which will never happen unless you detect the deadlock.

I did implement my own list management after our last encounter here. The biggest compromises I made are:

when the position column (which is a Numeric instead of an Integer) has too many decimals, then we lock the table and rescale the whole column to restart at the minimal number of decimals;
only one row at a time can be updated and, at most, 3 rows will be locked, always ordered by their id;
the position column is global to the table, so the default scope is discarded and it's preferable to update only the position attribute without other attributes (since they could prevent the update to succeed);

I know that the first compromise could be mitigated/solved by the Stern–Brocot tree implementation, but I didn't bother since the applications I work on have a very low churn of repositioning.

brendon · 2024-06-04T19:47:15Z

Thanks @patleb, that's great insight. I totally understand that deadlock issue. Unfortunately the solution of ordering on UPDATE only works with MySQL. Postgresl and Sqlite don't allow an explicit update order. In my gem I work around this (for the uniqueness constraint) by inverting the affected records positions to negative values and then re-inverting them with the adjustment up or down. Not so great, but it works and is predictable. This definitely requires the advisory lock. Ideally the lock would just be around the list management stuff but I end up having to wrap the entire transaction for a reason I can't remember.

Your list management looks more like ranked-model now in that it's sparse rather than sequential integers. I'm glad that you found a solution that works :)

brendon · 2024-06-04T21:32:16Z

Actually, I see that you're using a self referential join to overcome the update order problems. I might look into that for my gem. Does that actually force the records in the table to update in order? I would have thought the order of the self-join wouldn't have any effect?

In terms of order corruption, I noticed it being most prevalent (at least in the case of sequential integers) when processes were executing in parallel and were relying on stale information about the end of the list etc... An advisory lock was the only thing I tried that seemed to work.

patleb · 2024-06-05T00:18:39Z

If you're referring to this, then it's the only way I found for an ordered update. Although, in this case, there's a table lock and it's a window over the whole table where each id has a new position as an integer starting from 0. I think that you could try something similar with a limited window over your chosen ids.

In my case, this method is called periodically in a cron job to check if there's a need rewrite the positions. Since I wrote that, not once this method has been called to actually rewrite the positions. We mostly append records where this functionality is used and where there's a more frequent usage, there's only a handful of people (as admin) moving the records.

The actual code running in the request is built around this method where we assign a new position as the midpoint between the adjacent rows of choice.

patleb · 2024-06-05T00:41:48Z

To actually answer your first question: I don't know if postgres guarantees the update in the specified order, but my guess is that it would, since it's based on an on-the-fly list instead of the underlying table, so there's no previous data structure that postgres would reuse.

For the order corruption part, it makes sense that if there's no guarantee of reserved ordering between two requests acting on intersecting lists, then the assumptions about the limits wouldn't be stable. To prevent this from happening, I made the position global to the table and the position is always the result of the midpoint between two rows, so it's not absolute. For example, position 10.125 means something only with other positions (ex.: [10, 10.125, 10.5] --> [1, 2, 3]).

brendon · 2024-06-05T00:59:37Z

Thanks @patleb, that all makes sense. Unfortunately people usually use scopes (in my experience).

I tried to mitigate things further by encouraging people to position items relative to other items rather than absolutely (though both work).

patleb · 2024-06-05T01:34:43Z

You still can use scopes, it's just that those aren't relevant when updating the position since it's a relative position to adjacent rows globally, not the ones from the scope. Even if your scope filters some rows, you would still get the correct ordering.

patleb · 2024-06-05T01:43:15Z

Just to be clear, I'm talking about my implementation, there's no overlap that I'm aware of about the strategies and assumptions you used in this gem and ranked_model... so, you can't really apply what I did, here.

patleb · 2024-06-05T01:51:40Z

Also, I'm not suggesting to add this to your gems, but I shared my work simply because you said that you don't know how it would be possible to parallelize updating a list. This is how it's done. It could be better as I exposed in the compromises I made, but it allows you to have row-level locks on the absolute minimum number of rows possible while updating the position and preventing deadlocks.

brendon · 2024-06-05T02:03:24Z

You still can use scopes, it's just that those aren't relevant when updating the position since it's a relative position to adjacent rows globally, not the ones from the scope. Even if your scope filters some rows, you would still get the correct ordering.

That's an interesting approach. So do items within different scopes end up clumped together within the global ordering or are they generally dispersed as they're added and reordered?

brendon · 2024-06-05T02:04:24Z

Also, I'm not suggesting to add this to your gems, but I shared my work simply because you said that you don't know how it would be possible to parallelize updating a list. This is how it's done. It could be better as I exposed in the compromises I made, but it allows you to have row-level locks on the absolute minimum number of rows possible while updating the position and preventing deadlocks.

Thanks @patleb, that all makes sense. Thanks for sharing. Always happy to be learning more! :D

patleb · 2024-06-05T03:09:28Z

It's a bit complicated to explain it all, there's a lot of cases to take care of: at top level, there's 6 possible branches and in each of them there's more conditional branches... so, I'll try to keep it simple. To answer your question, let's take one possible scenario:

the table has already some records and we want to insert a row (id = 5) after another row with (id = 10, pos = 10.0) which has other rows after with higher positions;
we fetch the previous row (id = 10) and its immediate next (id = 22, pos = 10.5);
then, within a transaction, we sort by id the 3 rows, lock them and update the row (id = 5) with the new position at 10.25;
I forgot to say that you need a unique index on the position column as well;

It may not be obvious, but whatever happens to the whole list doesn't matter, what matters is the previous row, the immediate next one and the one we want to update. Even if there are multiple processes trying to insert a row within the same adjacent rows, the unique constraint will raise and you would have to retry with a new adjacent row which would be the one the other process might has inserted.

So do items within different scopes end up clumped together within the global ordering or are they generally dispersed as they're added and reordered?

Scope are merely filters in this situation, if you have two scopes which fetch for one [{id: 1, pos: 1.0}, {id: 3, pos: 1.5}, {id: 2, pos: 2.0}] and the other [{id: 1, pos: 1.0}, {id: 2, pos: 2.0}], only the ordering matters, you would have to let go of the absolute positioning idea: it's the position relative to one another that defines the order.

patleb · 2024-06-05T03:37:52Z

The only time where there might be reordering globally is if the position number has too many decimals (ex.: 1.125135125125125125125...5 up to 64 decimals). Otherwise, either the new position is the midpoint or the lowest position minus 1.0 or the highest position plus 1.0. The dispersion is handled by default with the NUMERIC type in postgres.

nmoadev mentioned this issue Feb 8, 2019

MySQL: "Deadlock found..." when creating a list item at position #337

Closed

brendon closed this as completed Jun 4, 2024

PG::TRDeadlockDetected #235

PG::TRDeadlockDetected #235

Comments

seanarnold commented Oct 20, 2016

brendon commented Oct 20, 2016 • edited

ddengler commented Nov 22, 2016

brendon commented Nov 22, 2016

ddengler commented Nov 22, 2016

ddengler commented Nov 22, 2016

brendon commented Nov 22, 2016

swanandp commented Feb 16, 2017

brendon commented Aug 10, 2017

brendon commented Oct 3, 2017

brendon commented Mar 19, 2018

ddengler commented Mar 19, 2018

seanarnold commented Mar 19, 2018

brendon commented Mar 19, 2018

brendon commented Mar 19, 2018

vi-huynh commented Apr 13, 2018 • edited

vi-huynh commented Apr 13, 2018

brendon commented Apr 13, 2018

ddengler commented Apr 16, 2018

brendon commented Apr 23, 2018

swanandp commented Apr 24, 2018

brendon commented Aug 16, 2020

patleb commented Aug 17, 2020 • edited

brendon commented Aug 17, 2020

brendon commented Jun 4, 2024

patleb commented Jun 4, 2024

brendon commented Jun 4, 2024

patleb commented Jun 4, 2024

brendon commented Jun 4, 2024

brendon commented Jun 4, 2024

patleb commented Jun 5, 2024

patleb commented Jun 5, 2024

brendon commented Jun 5, 2024

patleb commented Jun 5, 2024

patleb commented Jun 5, 2024

patleb commented Jun 5, 2024 • edited

brendon commented Jun 5, 2024

brendon commented Jun 5, 2024

patleb commented Jun 5, 2024 • edited

patleb commented Jun 5, 2024

brendon commented Oct 20, 2016 •

edited

vi-huynh commented Apr 13, 2018 •

edited

patleb commented Aug 17, 2020 •

edited

patleb commented Jun 5, 2024 •

edited

patleb commented Jun 5, 2024 •

edited