Orphaned "queued" job after shutdown/restart of worker #2079

th3hamm0r · 2024-04-29T06:20:55Z

We monitor our queues for jobs in the "queued" state that are no longer part of any queue. In our case we see only one type of issue: if workers are shut down or restarted under load, there is a high probability that orphaned jobs are left behind.
Our workers usually process multiple queues, so I know that they use the (unreliable) BLPOP command (#1716).

I don't know exactly how the shutdown process works, but when the kill signal arrives, how is the blocking BLPOP stopped? Is it possible that the connection is aborted in the middle of popping a job ID off the queue?
I've found an older Redis issue which states that there is no way to reliably cancel a BLPOP command. I don't know whether this is still the case.
Edit: Redis >= 5.0 has a "CLIENT UNBLOCK" command, which may help here.
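
To make the CLIENT UNBLOCK idea concrete, here is a rough sketch of how it might be wired up with redis-py. It is not how RQ works today, and I have not verified that it closes the race. The helper names `blocking_pop` and `request_unblock` and the `registry` dict are made up for illustration; `client_id()`, `blpop()` and `client_unblock()` are existing redis-py methods. It assumes the blocking client uses one dedicated connection (for example `redis.Redis(single_connection_client=True)`), otherwise the recorded client ID may belong to a different pooled connection.

```python
def blocking_pop(conn, queue_keys, registry):
    """Record the ID of the blocking connection, then wait on BLPOP.

    `conn` is assumed to be a redis-py client bound to a single
    connection, so that client_id() and blpop() share one socket.
    """
    registry["client_id"] = conn.client_id()
    # timeout=0 blocks until a job ID arrives or the client is unblocked;
    # redis-py returns None if the call ends without popping anything.
    return conn.blpop(queue_keys, timeout=0)


def request_unblock(control_conn, registry):
    """Ask Redis (via a second connection) to release the blocked BLPOP.

    Returns True if Redis reports that a client was unblocked.
    """
    client_id = registry.get("client_id")
    if client_id is None:
        return False  # the worker never entered the blocking call
    return bool(control_conn.client_unblock(client_id))
```

The point of going through the server is that Redis itself ends the blocking call, instead of the worker tearing down the socket while a reply may be in flight.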

If this really is an issue with Redis, I can think of only one possible solution: don't allow the Redis connection to be interrupted, and instead set the blocking timeout quite low, so that the worker's shutdown state can be checked after every timeout. A worker that receives the shutdown signal would then wait for the current BLPOP to time out before exiting. Of course, this increases the worker's CPU usage.
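
A minimal sketch of the short-timeout approach described above, assuming a redis-py style client whose `blpop(keys, timeout=...)` returns a `(key, value)` tuple or `None` on timeout. The names `wait_for_job`, `install_signal_handlers`, `stop_requested` and `poll_timeout` are hypothetical and not part of RQ.

```python
import signal
import threading

# Set by the signal handler; checked only between BLPOP calls.
stop_requested = threading.Event()


def install_signal_handlers(stop_event=stop_requested):
    """Make SIGINT/SIGTERM set a flag instead of interrupting the socket."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda signum, frame: stop_event.set())


def wait_for_job(conn, queue_keys, stop_event=stop_requested, poll_timeout=1):
    """Poll with a short BLPOP timeout until a job arrives or stop is requested.

    Every BLPOP call is allowed to finish, so a popped job ID is always
    returned to the caller rather than lost with an aborted connection.
    """
    while not stop_event.is_set():
        result = conn.blpop(queue_keys, timeout=poll_timeout)
        if result is not None:
            return result  # (queue_key, job_id)
    return None  # shutdown requested and nothing was popped
```

With `poll_timeout=1` the worker reacts to a shutdown signal within about one second, at the cost of one extra round trip per second per idle worker.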

I know there are other similar issues (#758, #1716), but this seems to be another facet of the same problem.
