
Retry delete when deadlocked #91

Open · wants to merge 2 commits into master
Conversation

aaronjensen

Fixes #63

Unlike #90, this does not retry reservations because that is handled by delayed_job: https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/worker.rb#L281
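
In outline, the change wraps the delete of a finished job record in a bounded retry when MySQL reports a deadlock. A minimal sketch of the idea (the helper name and retry cap are illustrative, not the exact diff):

```ruby
require "active_record"

# Sketch only: retry the job-record delete when MySQL deadlocks.
MAX_DELETE_ATTEMPTS = 100

def destroy_with_deadlock_retry(job)
  attempts = 0
  begin
    job.destroy
  rescue ActiveRecord::StatementInvalid => e
    # MySQL's deadlock error surfaces through the adapter with this message.
    raise unless e.message.include?("Deadlock found when trying to get lock")
    attempts += 1
    raise if attempts >= MAX_DELETE_ATTEMPTS
    retry
  end
end
```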

@joshuapinter

Have been using this in production for the last few hours. Seems to be working great.

@betamatt
Collaborator

This fix looks specific to one database. MySQL perhaps?

@aaronjensen
Author

Yes, it is. It should be safe on other databases, though, and to my knowledge other databases don't have this problem. If we want to retry on deadlocks with other databases, we can look for the error messages matched here: https://github.com/mperham/deadlock_retry/blob/master/lib/deadlock_retry.rb#L16-L20
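
For reference, those messages are roughly the following; matching any of them would make the retry cross-database (a sketch; the constant and helper names here are mine, mirroring the linked gem):

```ruby
# Deadlock/lock-timeout messages matched by the deadlock_retry gem
# (per the file linked above), covering MySQL and PostgreSQL.
DEADLOCK_ERROR_MESSAGES = [
  "Deadlock found when trying to get lock", # MySQL
  "Lock wait timeout exceeded",             # MySQL
  "deadlock detected"                       # PostgreSQL
].freeze

# Hypothetical helper: true if an exception looks like a deadlock.
def deadlock_error?(exception)
  DEADLOCK_ERROR_MESSAGES.any? { |msg| exception.message.include?(msg) }
end
```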

@joshgoebel

Are 100 retries really necessary? What about a small sleep? Are there any plans to merge this?

@aaronjensen
Author

> Are 100 retries really necessary? What about a small sleep? Are there any plans to merge this?

No, probably not. I don't remember why I picked 100, but the exact number shouldn't matter: if it's a deadlock, the delete should go through eventually, and the cap just keeps the retries from running forever. A small sleep would be fine.
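
In the shape of the sketch above, a small cap plus a short, growing sleep might look like this (values illustrative):

```ruby
# Illustrative variant: far fewer attempts, with a brief pause between them
# so concurrent workers stop colliding on the same locks.
def destroy_with_backoff(job, max_attempts = 3)
  attempts = 0
  begin
    job.destroy
  rescue ActiveRecord::StatementInvalid => e
    raise unless e.message.include?("Deadlock found when trying to get lock")
    attempts += 1
    raise if attempts >= max_attempts
    sleep(0.1 * attempts) # small, growing pause before retrying
    retry
  end
end
```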

@joshgoebel

@danielmorrison Any chance of getting this or something similar merged?

@klausmeyer
Contributor

+1 We also had problems with this in the past and had to downgrade to v0.3.3, which works fine.

@snackycracky

Did you all see this approach? ndbroadbent@d311b55, as discussed in #63.

@klausmeyer
Contributor

Since #89 was merged, we use a monkey-patch to take the plain-SQL code path instead of the "optimized" MySQL query, and everything works just fine for us. No more deadlocks.
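
For anyone wanting the same behavior: later releases of delayed_job_active_record expose this choice as a configuration option, so the monkey-patch may not be needed (check your installed version before relying on it):

```ruby
# config/initializers/delayed_job.rb
# Use the plain-SQL reservation path (the one #89 introduced) instead of the
# MySQL-specific "optimized" strategy.
Delayed::Backend::ActiveRecord.configuration.reserve_sql_strategy = :default_sql
```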

@snackycracky

So then close this one here, no?

@klausmeyer
Contributor

👍 It's a hack that may address the symptom but not the actual problem.
It may be OK to use in certain circumstances, but IMO it shouldn't ship with the gem. A better approach would be a clean option to use the code path created in #89 without monkey-patching.

@aaronjensen
Author

I'm not sure I'd agree that this is a hack. If the method of reservation is faster but occasionally deadlocks (and the deadlocks can't be prevented, because locking order can't be specified for the particular locks being acquired), then retrying is the right thing to do. A solution that were just as fast without the deadlock issue would be better, of course, but absent one, this is the best option. Retrying on deadlock is not, in general, a hack.

@snackycracky

Well, it's not a hack, but it's no solution either, as @klausmeyer said. The PR feels like a try/catch added because you don't trust the code. This responsibility should be implemented by the host application.

@klausmeyer
Contributor

Or at least there should be a switch to turn it on or off.
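
Something along these lines, for example (purely hypothetical; no such setting exists in the gem):

```ruby
# Hypothetical opt-in flag, sketched for discussion only: not a real
# delayed_job setting. Deadlock retries on delete would default to off.
Delayed::Worker.retry_delete_on_deadlock = true
```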

@snackycracky

The failed job will automatically be retried as usual by DJ.
So if you think this through to the end, you'd also want such a switch for every special exception.
That's just not good.

And I don't like PRs which just cure the symptoms, like you said.

@klausmeyer
Contributor

That's true. I just wanted to say that if this were included in the official gem, I'd like a way to turn it off, though I'd still prefer it not be included at all.

@aaronjensen
Author

> Well, it's not a hack, but it's no solution either, as @klausmeyer said. The PR feels like a try/catch added because you don't trust the code. This responsibility should be implemented by the host application.

Sorry, but I still disagree. Either the optimization for selecting records needs to be replaced with something that cannot possibly deadlock, or we need to accept that this particular query can and will deadlock, and that that is OK. You cannot always prevent deadlocks, and when you cannot, the thing to do is retry. This should not be monkey-patched in by the host application. The only solutions I know of are:

  1. Leave the job locking code as is and retry on deadlock.
  2. Replace the job locking code with something that cannot possibly deadlock.

There are no other solutions. If you have any others, I'm open to them. Making the code easier to monkey-patch is not a solution. The job locking algorithm is either susceptible to deadlocks or it isn't, and this one is.

> The failed job will automatically be retried as usual by DJ.

I'm not sure what you're referring to here, but if the job were retried at this point, that would be a bad thing, since the job has already run. This retry is on the delete. I don't actually remember the effect of the delete failing (whether the job retries, the row just gets stuck, or something else).

> That's true. I just wanted to say that if this were included in the official gem, I'd like a way to turn it off, though I'd still prefer it not be included at all.

Would you mind saying what concerns you here? This retry logic applies specifically to deleting the delayed job record, and it only responds to deadlocks (deadlocks which are known to happen and which are, to my knowledge, unpreventable given the order in which MySQL acquires locks and the way this gem selects the next job).

@snackycracky

I agree that deleting the job after successful execution is a concern of this gem.

@sferik force-pushed the master branch 2 times, most recently from 1240e8d to 639c9e5 on December 22, 2014
@defeated commented May 2, 2015

+1 for this fix. We've applied it as a monkey-patch and it seems to have smoothed out our issue with deadlocks on delete crashing workers. (Our environment is multiple workers on multiple machines, with a large queue of fast-running jobs.)

@MaciekLesiczka commented Jun 24, 2020

+1 for the PR; retrying is a 100% valid strategy for deadlocks, not a hack. For heavy throughput with many workers, this is a big deal. In my experience, this problem is one of the reasons people move away from DJ.

Successfully merging this pull request may close these issues.

Mysql2::Error: Deadlock when attempting to lock a job