We monitor our queues for jobs whose status is "queued" but which are no longer part of any queue. In our case, we see only one type of issue: if workers are shut down or restarted under load, there is a high probability of orphaned jobs afterwards.
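For reference, the check we run boils down to something like the sketch below. It uses plain dicts as stand-ins for the job statuses and queue contents, and the name `find_orphaned_jobs` is made up for illustration; it is not a library function.

```python
def find_orphaned_jobs(job_statuses, queues):
    """Return the IDs of jobs marked "queued" that no queue contains.

    job_statuses: dict mapping job ID -> status string (hypothetical input)
    queues: dict mapping queue name -> list of job IDs (hypothetical input)
    """
    # Collect every job ID that is still present in some queue.
    enqueued = {job_id for ids in queues.values() for job_id in ids}
    # A job is orphaned if it claims to be queued but sits in no queue.
    return sorted(
        job_id
        for job_id, status in job_statuses.items()
        if status == "queued" and job_id not in enqueued
    )
```

Jobs in any other state (for example "finished") are ignored, so only the queued-but-missing ones are reported.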
Our workers usually process multiple queues, so I know that they use the (unreliable) BLPOP command (#1716).
I don't know exactly how the shutdown process works, but when the kill signal arrives, how is the blocking BLPOP stopped? Is it possible that the connection is aborted in the middle of popping a job ID off the queue?
I've found an older Redis issue which states that there is no way to reliably close a BLPOP command. I don't know if this is still the case.
Edit: With Redis >= 5.0 there is a CLIENT UNBLOCK command, which may help here.
If this really is an issue with Redis, I can only think of one possible solution: don't allow the Redis connection to be interrupted, and instead set the blocking timeout quite low, so that the worker's shutdown state can be checked after every timeout. When a worker gets the shutdown signal, it then waits for the BLPOP to time out. Of course, this increases the CPU usage of the worker.
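A rough sketch of that idea, to make the proposal concrete. This is not the library's actual worker code: `PollingWorker` and `make_queue_pop` are hypothetical names, and an in-memory `queue.Queue` stands in for Redis so the example runs without a server. With redis-py, the `pop` callable would presumably wrap `conn.blpop(keys, timeout=1)` instead.

```python
import queue
import threading


class PollingWorker:
    """Worker loop that never interrupts a blocking pop.

    The shutdown request only sets a flag, which is checked between pops,
    so a pop is always allowed to either return a job or time out.
    """

    def __init__(self, pop, handle, timeout=1):
        self.pop = pop          # callable(timeout) -> job ID, or None on timeout
        self.handle = handle    # callable(job_id) that processes the job
        self.timeout = timeout  # low blocking timeout, in seconds
        self.stop_requested = threading.Event()

    def request_stop(self, *_args):
        # Accepts (signum, frame) so it can be registered via signal.signal().
        self.stop_requested.set()

    def run(self):
        while not self.stop_requested.is_set():
            job_id = self.pop(self.timeout)
            if job_id is not None:
                self.handle(job_id)


def make_queue_pop(q):
    """In-memory stand-in for a blocking pop with a timeout."""
    def pop(timeout):
        try:
            return q.get(timeout=timeout)
        except queue.Empty:
            return None
    return pop
```

The handler could be registered with `signal.signal(signal.SIGTERM, worker.request_stop)`. The trade-off is the one mentioned above: shutdown is delayed by up to one timeout, and an idle worker wakes up once per timeout.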
I know there are other similar issues (#758, #1716), but this one seems to be another facet of the same problem.