Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions #184

Open
fish-face opened this issue Feb 11, 2019 · 17 comments · May be fixed by #190
Open

Race conditions #184

fish-face opened this issue Feb 11, 2019 · 17 comments · May be fixed by #190

Comments

@fish-face
Copy link

This app is quite susceptible to race conditions. If two people try to move the same two objects to the top at the same time, they could both end up with the same order value. This is worse than it sounds because you can't reliably disentangle the objects - all the movement methods assume that the order is consistent and will happily swap one of the entangled objects with another, so that now that one is entangled. Moving objects to the top or bottom of the order has the same problem for the same reason.

There is a strong argument this is the application's responsibility to sort out, rather than the model. However even in that case, the admin arrows are susceptible and need to be fixed.

@fish-face
Copy link
Author

Setting unique_together on order along with anything in order_with_respect_to could help, but it cannot be the whole solution to making this robust to concurrency, because order_with_respect_to can include fields which are in another table (e.g. because they go through foreign keys... maybe inheritance can do this as well, not sure what happens to order_with_respect_to then) which cannot then be enforced by a db uniqueness constraint.

@GeeWee
Copy link
Contributor

GeeWee commented Feb 22, 2019

I'm battling the same issues as well here. One solution is to have a "deferred unique together" constraint, which isn't supported by Django, but can be run in a custom postgres migration like this:

class Migration(migrations.Migration):
    dependencies = [
        ('orders', '0003_auto_20190115_1322'),
    ]

    operations = [
        migrations.RunSQL(
            f"ALTER TABLE orders_truckplacement ADD CONSTRAINT position_date_truck_unique_together UNIQUE (execution_date, position, truck_id) DEFERRABLE INITIALLY DEFERRED",
        reverse_sql = "ALTER TABLE orders_truckplacement DROP CONSTRAINT position_date_truck_unique_together"
    )
    ]

However, I'm still able to provoke deadlocks with ATOMIC_REQUESTS. Currently trying to get a reproducible test running.

@GeeWee
Copy link
Contributor

GeeWee commented Feb 22, 2019

Alright, so here's what I've found out so far.
Simple(ish) test case demonstrating the Deadlock problem:

Test case

# In my project I use multiple schemas, there's some extra code in there for that in the actual test-case, but I think the same principle applies.

class Test(APITransactionTestCase):
    def test_deadlock_simple_case(self):
        today = timezone.now().date().isoformat()
        truck = mommy.make(Truck)

        # Create some orders here. The orders aren't particularly relevant, but what's
        # relevant is that each of them has a "Placement" which is an OrdredModel
        # that's ordered with regards to the fields "execution_date" and "truck" 
        # - the 'position is the order field'
        existing_orders = [
            mommy.make_recipe('orders.call_order', placement__execution_date=today, placement__position=i,
                              placement__truck=truck) for i in range(0, 5)]

        # Here we randomly take some existing_orders and switch them around. This will provide a deadlock.
        # This decorator runs the function is several threads. The implementation is not important but it's here: https://gist.github.com/GeeWee/bc5da4f982af4c221aa1d0b8a1a3944e
        @run_concurrently(5, existing_orders)
        def switch_position_around(order):
            with transaction.atomic():
                position = random.randint(0, 5)
                order.placement.to(position)
            connection.close()

        switch_position_around()

Error message

psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL:  Process 474 waits for ShareLock on transaction 290268; blocked by process 471.
Process 471 waits for ShareLock on transaction 290269; blocked by process 472.
Process 472 waits for AccessExclusiveLock on tuple (0,3) of relation 1506953 of database 1506144; blocked by process 474.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,3) in relation "orders_truckplacement"

I've logged the SQL and it looks like this (some things omitted)

[555] LOG:  statement: BEGIN
[553] LOG:  statement: UPDATE "orders_truckplacement" SET "position" = ("orders_truckplacement"."position" + 1) WHERE ("orders_truckplacement"."execution_date" = '2019-02-22'::date AND "orders_truckplacement"."truck_id" = 1 AND "orders_truckplacement"."position" >= 0 AND "orders_truckplacement"."position" < 1)

[556] LOG:  statement: BEGIN

[552] LOG:  statement: UPDATE "orders_truckplacement" SET "position" = ("orders_truckplacement"."position" - 1) WHERE ("orders_truckplacement"."execution_date" = '2019-02-22'::date AND "orders_truckplacement"."truck_id" = 1 AND "orders_truckplacement"."position" > 0 AND "orders_truckplacement"."position" <= 5)

[555] LOG:  statement: UPDATE "orders_truckplacement" SET "position" = ("orders_truckplacement"."position" + 1) WHERE ("orders_truckplacement"."execution_date" = '2019-02-22'::date AND "orders_truckplacement"."truck_id" = 1 AND "orders_truckplacement"."position" >= 2 AND "orders_truckplacement"."position" < 3)

[556] LOG:  statement: UPDATE "orders_truckplacement" SET "position" = ("orders_truckplacement"."position" + 1) WHERE ("orders_truckplacement"."execution_date" = '2019-02-22'::date AND "orders_truckplacement"."truck_id" = 1 AND "orders_truckplacement"."position" >= 2 AND "orders_truckplacement"."position" < 4)

[553] LOG:  statement: UPDATE "orders_truckplacement" SET "created" = '2019-02-22T12:18:25.334839+00:00'::timestamptz, "modified" = '2019-02-22T12:18:25.502266+00:00'::timestamptz, "call_order_id" = 2, "automatic_order_id" = NULL, "archived_order_id" = NULL, "execution_date" = '2019-02-22'::date, "truck_id" = 1, "position" = 0 WHERE "orders_truckplacement"."id" = 2

[552] LOG:  statement: UPDATE "orders_truckplacement" SET "created" = '2019-02-22T12:18:25.303821+00:00'::timestamptz, "modified" = '2019-02-22T12:18:25.502939+00:00'::timestamptz, "call_order_id" = 1, "automatic_order_id" = NULL, "archived_order_id" = NULL, "execution_date" = '2019-02-22'::date, "truck_id" = 1, "position" = 5 WHERE "orders_truckplacement"."id" = 1

[553] ERROR:  deadlock detected

[553] DETAIL:  Process 553 waits for ShareLock on transaction 291529; blocked by process 552.
Process 552 waits for ShareLock on transaction 291528; blocked by process 553.
Process 553: UPDATE "orders_truckplacement" SET "created" = '2019-02-22T12:18:25.334839+00:00'::timestamptz, "modified" = '2019-02-22T12:18:25.502266+00:00'::timestamptz, "call_order_id" = 2, "automatic_order_id" = NULL, "archived_order_id" = NULL, "execution_date" = '2019-02-22'::date, "truck_id" = 1, "position" = 0 WHERE "orders_truckplacement"."id" = 2
Process 552: UPDATE "orders_truckplacement" SET "created" = '2019-02-22T12:18:25.303821+00:00'::timestamptz, "modified" = '2019-02-22T12:18:25.502939+00:00'::timestamptz, "call_order_id" = 1, "automatic_order_id" = NULL, "archived_order_id" = NULL, "execution_date" = '2019-02-22'::date, "truck_id" = 1, "position" = 5 WHERE "orders_truckplacement"."id" = 1
[553] HINT:  See server log for query details.
[553] CONTEXT:  while updating tuple (0,2) in relation "orders_truckplacement"
[553] STATEMENT:  UPDATE "orders_truckplacement" SET "created" = '2019-02-22T12:18:25.334839+00:00'::timestamptz, "modified" = '2019-02-22T12:18:25.502266+00:00'::timestamptz, "call_order_id" = 2, "automatic_order_id" = NULL, "archived_order_id" = NULL, "execution_date" = '2019-02-22'::date, "truck_id" = 1, "position" = 0 WHERE "orders_truckplacement"."id" = 2

I've also been able to provoke the following deadlock:

ERROR:  deadlock detected
DETAIL:  Process 600 waits for ShareLock on transaction 291799; blocked by process 602.
Process 602 waits for ShareLock on transaction 291800; blocked by process 600.
Process 600: UPDATE "orders_truckplacement" SET "position" = ("orders_truckplacement"."position" + 1) WHERE ("orders_truckplacement"."execution_date" = '2019-02-22'::date AND "orders_truckplacement"."truck_id" = 1 AND "orders_truckplacement"."position" >= 0 AND "orders_truckplacement"."position" < 2)

It seems the issue is that django-ordered-model doesn't lock the table properly for updates. I think the correct way to do it is to call SELECT FOR UPDATE with a deterministic order. Preferably we'd do it in a subquery like this/this

Now we'd obviously prefer not to dive into raw SQL as we'll lose database portability. Django comes with select_for_update which translates into a SELECT ... FOR UPDATE sql statement.
The kicker here is that it only locks the rows when the Queryset is evaluated, and as we use our queryset to do an update function - it never is. I've confirmed this by looking at the SQL.

Calling select_for_update and forcing an evaluation of the queryset fixes the deadlock issues:

    # New implementation of get_ordering_queryset
    def get_ordering_queryset(self, qs=None):
        qs = qs or self._get_class_for_ordering_queryset().objects.all()
        order_with_respect_to = self.order_with_respect_to
        if order_with_respect_to:
            order_values = self._get_order_with_respect_to()
            qs = qs.filter(**dict(order_values))
        #### BELOW HERE IS NEW
        qs = qs.select_for_update()
        len(qs) # Force evaluation of qs
        return qs

However we still do end up with potential race-conditions as I'm able to get some conflicting position keys, but I'm not quite sure why. I can reproduce it with the same test, but it takes much longer.

@GeeWee
Copy link
Contributor

GeeWee commented Feb 22, 2019

I've managed to reproduce it. I've produced a more detailed output below, but the gist is this:

We try to move two things around at once. Order 1 and 2.
Let's assume we try to move them to the same spot, 2 - and say Order 1 is located at position 0, and order 2 is at position 1
Both requests start at the same time. Order 1 finishes before order 2.
This means that Order 2 is now actually in position 0, even though it thinks it's in position 1 (because it was, when it fetched itself from the database) - this means that when it pushes orders around, e.g.

            qs.filter(**{self.order_field_name + '__lt': getattr(self, self.order_field_name),
                         self.order_field_name + '__gte': order})\

It does so by using the "stale" value, and so it ends up with duplicate values.
I'm not sure how to solve this. I think you need to get the value of the position inside the database query, but I can't seem to make it work.

Detailed SQL log


# We're moving two orders to position 2. 

# Order 1 is in position 0 to start off with
<1> moving 0 to 2

# And order 2 is in position 1
<2> moving 1 to 2

--> Unimportant sql 
<1>, PRE_UPDATE_PLACEMENTS:  ['id: 1, pos: 0', 'id: 2, pos: 1', 'id: 3, pos: 2', 'id: 4, pos: 3', 'id: 5, pos: 4']
<1> SELECT "orders_truckplacement"."id", "orders_truckplacement"."created", "orders_truckplacement"."modified", "orders_truckplacement"."call_order_id", "orders_truckplacement"."automatic_order_id", "orders_truckplacement"."archived_order_id", "orders_truckplacement"."execution_date", "orders_truckplacement"."truck_id", "orders_truckplacement"."position" FROM "orders_truckplacement" WHERE ("orders_truckplacement"."execution_date" = 2019-02-22 AND "orders_truckplacement"."truck_id" = 1) ORDER BY "orders_truckplacement"."position" ASC FOR UPDATE
Moving orders up

<2>, PRE_UPDATE_PLACEMENTS:  ['id: 1, pos: 0', 'id: 2, pos: 1', 'id: 3, pos: 2', 'id: 4, pos: 3', 'id: 5, pos: 4']
<2> SELECT "orders_truckplacement"."id", "orders_truckplacement"."created", "orders_truckplacement"."modified", "orders_truckplacement"."call_order_id", "orders_truckplacement"."automatic_order_id", "orders_truckplacement"."archived_order_id", "orders_truckplacement"."execution_date", "orders_truckplacement"."truck_id", "orders_truckplacement"."position" FROM "orders_truckplacement" WHERE ("orders_truckplacement"."execution_date" = 2019-02-22 AND "orders_truckplacement"."truck_id" = 1) ORDER BY "orders_truckplacement"."position" ASC FOR UPDATE

<1>, PRE_UPDATE_PLACEMENTS:  ['id: 1, pos: 0', 'id: 2, pos: 1', 'id: 3, pos: 2', 'id: 4, pos: 3', 'id: 5, pos: 4']
<1> UPDATE "orders_truckplacement" SET "position" = ("orders_truckplacement"."position" - 1) WHERE ("orders_truckplacement"."execution_date" = 2019-02-22 AND "orders_truckplacement"."truck_id" = 1 AND "orders_truckplacement"."position" > 0 AND "orders_truckplacement"."position" <= 2)
<1>, PRE_UPDATE_PLACEMENTS:  ['id: 2, pos: 0', 'id: 1, pos: 0', 'id: 3, pos: 1', 'id: 4, pos: 3', 'id: 5, pos: 4']
<1> UPDATE "orders_truckplacement" SET "created" = 2019-02-22 14:05:12.429745+00:00, "modified" = 2019-02-22 14:05:12.642459+00:00, "call_order_id" = 1, "automatic_order_id" = NULL, "archived_order_id" = NULL, "execution_date" = 2019-02-22, "truck_id" = 1, "position" = 2 WHERE "orders_truckplacement"."id" = 1
<1> Finalized placement for order 1

---> ORDER 1 FINALIZED HERE. That means Order 2's position is *actually* 0, not 1, but it doesn't know that!

--> ORDER 2 is position 0 here, but thinks it's position 1
<2>, PRE_UPDATE_PLACEMENTS:  ['id: 2, pos: 0', 'id: 3, pos: 1', 'id: 5, pos: 2', 'id: 4, pos: 3', 'id: 1, pos: 4']

--> So it moves the order from position 2 to position 1.
<2> UPDATE "orders_truckplacement" SET "position" = ("orders_truckplacement"."position" - 1) WHERE ("orders_truckplacement"."execution_date" = 2019-02-22 AND "orders_truckplacement"."truck_id" = 1 AND "orders_truckplacement"."position" > 1 AND "orders_truckplacement"."position" <= 2)

--> Now we have two in position 1, and None of them are order 2.
<2>, PRE_UPDATE_PLACEMENTS:  ['id: 2, pos: 0', 'id: 5, pos: 1', 'id: 3, pos: 1', 'id: 4, pos: 3', 'id: 1, pos: 4']
--> Order 2 moves to position 2, and we're left with nothing in position 0, and two orders in position 1
<2> UPDATE "orders_truckplacement" SET "created" = 2019-02-22 14:05:12.482749+00:00, "modified" = 2019-02-22 14:05:12.728449+00:00, "call_order_id" = 2, "automatic_order_id" = NULL, "archived_order_id" = NULL, "execution_date" = 2019-02-22, "truck_id" = 1, "position" = 2 WHERE "orders_truckplacement"."id" = 2
<2> Finalized placement for order 2

@GeeWee
Copy link
Contributor

GeeWee commented Feb 22, 2019

I've seemingly solved the issues of the race conditions in the to function. If we instead of using the attribute on the model get it from the database in a subquery, we never get inconsistent results.

The filtering in the to then looks something like this:

            subquery = self._get_class_for_ordering_queryset().objects.filter(pk=self.pk).values(self.order_field_name)
            qs = qs.filter(**{self.order_field_name + '__lt': Subquery(subquery),
                         self.order_field_name + '__gte': order})

If you'd take these changes, I'd be willing to put in a PR. I'm not a database expert, and can only test on Postgres, but I think these solutions are correct.

@GeeWee
Copy link
Contributor

GeeWee commented Mar 12, 2019

I've added a PR to fix most of these issues in #190

@mkgs
Copy link

mkgs commented Jan 3, 2020

+1
Are there any arguments against these changes? The project should guarantee ordering consistency.

@haplo
Copy link

haplo commented Feb 4, 2021

Is there an issue with the given PR? I find it discouraging that the maintainers haven't merged it nor commented on this issue after almost 2 years.

@imomaliev
Copy link
Collaborator

@haplo Hi, there is no issue other than no time to work on this. If you are up to it you could pick up that PR, rebase it and I will review it.

@haplo
Copy link

haplo commented Feb 4, 2021

@imomaliev Fair enough! I will fork the project, rebase #190 and test the integration with my project. Hopefully we will be able to get this merged and released.

@haplo
Copy link

haplo commented Feb 4, 2021

I have created GeeWee#1, if @GeeWee can review and accept it. If he is not available I can create a new PR directly in this repository, but I wanted to acknowledge his original contribution.

@GeeWee
Copy link
Contributor

GeeWee commented Feb 5, 2021

I've accepted the merge, thanks!

@haplo
Copy link

haplo commented Feb 5, 2021

Can someone with a live deployment of django-ordered-model try out the branch? I'm not going to be able to test it.

@jasperoosthoek
Copy link

I just ran into this issue in a real life situation. Two instances ended up having the same order . I would really appreciate if this could be solved. The only option I have right now is to manually change the order.

@haplo
Copy link

haplo commented Jan 3, 2022

I just ran into this issue in a real life situation. Two instances ended up having the same order . I would really appreciate if this could be solved. The only option I have right now is to manually change the order.

@jasperoosthoek Could you try with @GeeWee's solution?

@sergei-maertens
Copy link

sergei-maertens commented Jun 24, 2022

Can someone with a live deployment of django-ordered-model try out the branch? I'm not going to be able to test it.

I've just set up a test case for our situation with threading to simulate concurrent requests made by our UI. I hope I'll be able to test out the branch with that test, as I was able to pin point the cause to the to method, which holds an instance that may now have stale data because of other modifications made in another thread. I was thinking that the order field should be referenced via an F object or indeed, a subquery can work too.

For now I'm looking at select_for_update to lock the rows involved, which allows atomic transactions without deadlocks it seems.

edit: turns out that the approach is basically the same - locking the involved rows and refreshing the instance to produce the correct UPDATE queries.

sergei-maertens added a commit to open-formulieren/open-forms that referenced this issue Jun 24, 2022
This is extremely hard to actually reproduce... I'm sorry.

Basically the idea is that multiple requests can arrive at the same
time (and they will do because of how the frontend is coded!) which kicks
off a number of threads that all start with the same view on the data.

django-ordered-model then performs some UPDATE queries based on that
view of the data and finally a save call to the object itself to
set the order. That last action often corrects a lot of things,
but the UPDATE query causes mayhem.

See also django-ordered-model/django-ordered-model#184.
sergei-maertens added a commit to open-formulieren/open-forms that referenced this issue Jun 24, 2022
This is extremely hard to actually reproduce... I'm sorry.

Basically the idea is that multiple requests can arrive at the same
time (and they will do because of how the frontend is coded!) which kicks
off a number of threads that all start with the same view on the data.

django-ordered-model then performs some UPDATE queries based on that
view of the data and finally a save call to the object itself to
set the order. That last action often corrects a lot of things,
but the UPDATE query causes mayhem.

See also django-ordered-model/django-ordered-model#184.
sergei-maertens added a commit to open-formulieren/open-forms that referenced this issue Jun 29, 2022
This is extremely hard to actually reproduce... I'm sorry.

Basically the idea is that multiple requests can arrive at the same
time (and they will do because of how the frontend is coded!) which kicks
off a number of threads that all start with the same view on the data.

django-ordered-model then performs some UPDATE queries based on that
view of the data and finally a save call to the object itself to
set the order. That last action often corrects a lot of things,
but the UPDATE query causes mayhem.

See also django-ordered-model/django-ordered-model#184.
sergei-maertens added a commit to open-formulieren/open-forms that referenced this issue Jun 30, 2022
This is extremely hard to actually reproduce... I'm sorry.

Basically the idea is that multiple requests can arrive at the same
time (and they will do because of how the frontend is coded!) which kicks
off a number of threads that all start with the same view on the data.

django-ordered-model then performs some UPDATE queries based on that
view of the data and finally a save call to the object itself to
set the order. That last action often corrects a lot of things,
but the UPDATE query causes mayhem.

See also django-ordered-model/django-ordered-model#184.
sergei-maertens added a commit to open-formulieren/open-forms that referenced this issue Jul 1, 2022
This is extremely hard to actually reproduce... I'm sorry.

Basically the idea is that multiple requests can arrive at the same
time (and they will do because of how the frontend is coded!) which kicks
off a number of threads that all start with the same view on the data.

django-ordered-model then performs some UPDATE queries based on that
view of the data and finally a save call to the object itself to
set the order. That last action often corrects a lot of things,
but the UPDATE query causes mayhem.

See also django-ordered-model/django-ordered-model#184.
@samul-1
Copy link

samul-1 commented Nov 30, 2022

Hello, what's the status on this? I'm looking to adopt this package for my application but I'm a little concerned about the possible race condition issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants