[Bug]: Delayed jobs don't move to waiting state after some days #2534

oanguenot · 2024-04-21T11:48:01Z

Version

v5.7.3

Platform

NodeJS

What happened?

Hello,
I have a Node application that schedules delayed jobs with delays ranging from 2 seconds to 1 hour.
When a job finishes, I remove it from the queue and add a new one (with the same id/name) with a new delay that depends on the result.

Everything works fine for a few days (1 to 3), and then, for no apparent reason, the worker stops running jobs: no more jobs are processed. My Node.js application still answers web requests, so it is still alive.

I added logs to all event handlers. I didn't notice any errors.

However, the "waiting" event from QueueEvents is not fired at the time a job needs to be launched.

What is strange is that if I manually add a new job to the queue some hours later (or at any time), the worker "wakes up" and runs all the old delayed jobs.

How can I debug this case?
As I said, I added an event listener for all Queue, Worker and QueueEvents events, but I didn't see anything unusual.
What could prevent a job from moving to the 'waiting' state when it is time to handle it?

Thanks for your help
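
One possible way to spot the stall described above is to periodically compare each delayed job's due time with the current time. The following is only a sketch under assumptions: `DelayedJobLike`, `findOverdueJobs` and the 5-second grace period are hypothetical names and values introduced here, and the due time is assumed to be the job's creation `timestamp` plus its `delay`, both in milliseconds.

```typescript
// Minimal shape of the fields used by this sketch. The interface is an
// assumption made to keep the example self-contained; it is not a BullMQ type.
interface DelayedJobLike {
  id: string;
  timestamp: number; // creation time, ms since epoch
  delay: number; // requested delay, ms
}

// Returns the ids of jobs whose due time (timestamp + delay) passed
// more than `graceMs` milliseconds before `now`.
function findOverdueJobs(
  jobs: DelayedJobLike[],
  now: number,
  graceMs: number = 5000 // hypothetical tolerance before a job counts as stuck
): string[] {
  return jobs
    .filter((job) => now - (job.timestamp + job.delay) > graceMs)
    .map((job) => job.id);
}
```

If the list returned for the jobs still sitting in the delayed set is non-empty, those jobs should already have moved to 'waiting', which would point at the stall rather than at the application code.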

How to reproduce.

No response

Relevant log output

No response

Code of Conduct

I agree to follow this project's Code of Conduct

roggervalf · 2024-04-21T15:43:31Z

hi @oanguenot we are tracking that in this issue #2466

roggervalf · 2024-04-21T15:44:21Z

btw, what would help us is to see which values are passed to the bzpopmin command

oanguenot · 2024-04-21T15:57:16Z

Thanks @roggervalf for your quick answer!
On one hand, I'm happy to see that the problem doesn't seem to be in my code, because I spent days tracking it without success; on the other hand, it is still a problem in front of us :-)
I added a comment to #2466 and will be happy to help one way or another.

I don't know what bzpopmin is. How or where can I find the values?
Thanks

roggervalf · 2024-04-21T16:20:57Z

hi @oanguenot, to see your commands you may need to connect to your Redis instance with redis-cli and then use the monitor command
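
When the monitor output is saved to a file, the bzpopmin calls can be filtered out with a small parser. This is a rough sketch: `parseBzpopmin` and `BzpopminCall` are hypothetical names, and it assumes each line follows the usual MONITOR layout (timestamp, bracketed client info, then double-quoted arguments) with the timeout as the last argument of the command.

```typescript
// Result of parsing one MONITOR line that contains a BZPOPMIN call.
interface BzpopminCall {
  keys: string[];
  timeout: number; // seconds, as passed to the command
}

// Extracts keys and timeout from a MONITOR line, or returns null when the
// line is not a BZPOPMIN call or its last argument is not a number.
function parseBzpopmin(line: string): BzpopminCall | null {
  // Collect every double-quoted argument, allowing escaped characters inside.
  const pattern = /"((?:[^"\\]|\\.)*)"/g;
  const args: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    args.push(match[1]);
  }
  if (args.length < 3 || args[0].toLowerCase() !== "bzpopmin") {
    return null;
  }
  const timeout = Number(args[args.length - 1]);
  if (!isFinite(timeout)) {
    return null;
  }
  return { keys: args.slice(1, -1), timeout };
}

// Made-up sample line, for illustration only (not taken from this thread).
const sample = '1713722236.123456 [0 127.0.0.1:52114] "bzpopmin" "example:key" "0"';
console.log(parseBzpopmin(sample)); // { keys: [ 'example:key' ], timeout: 0 }
```

Logging only the parsed timeouts keeps the capture small enough to leave running for the days it can take for the problem to show up.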

oanguenot · 2024-04-21T18:55:00Z

Is this what you need?

Should I leave the monitor open until it blocks, and then check whether I got a timeout of zero?

roggervalf · 2024-04-21T19:26:01Z

yeah, we would like to know which value is blocking that command, as we're doing some fixes to prevent passing 0

roggervalf · 2024-04-21T19:27:54Z

also, the value that is blocking that command could be something other than 0; that's what we want to know
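
For context, BZPOPMIN treats a timeout of 0 as "block indefinitely", which is why a 0 reaching the command would match the symptom of a worker that never wakes up. The sketch below shows one way such a value could be guarded against. It is not BullMQ's actual fix: `safeBlockTimeout` and both bounds are hypothetical, chosen only for illustration.

```typescript
// Hypothetical bounds, in seconds, chosen for illustration only.
const MIN_BLOCK_SECONDS = 0.001;
const MAX_BLOCK_SECONDS = 10;

// Computes a blocking timeout (seconds) until the next delayed job is due,
// clamped so that it is never 0, negative, NaN or unbounded.
function safeBlockTimeout(nextDueAtMs: number | undefined, nowMs: number): number {
  if (nextDueAtMs === undefined) {
    // No delayed job known: block for the maximum, then check again.
    return MAX_BLOCK_SECONDS;
  }
  const seconds = (nextDueAtMs - nowMs) / 1000;
  if (!isFinite(seconds)) {
    // Covers NaN and +/-Infinity coming from bad inputs.
    return MAX_BLOCK_SECONDS;
  }
  return Math.min(MAX_BLOCK_SECONDS, Math.max(MIN_BLOCK_SECONDS, seconds));
}
```

The upper bound matters as much as the lower one: with a finite maximum, a missed wake-up costs at most one timeout period instead of stalling until another job is added.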

roggervalf · 2024-04-23T04:44:51Z

hey @oanguenot, btw what are your queue settings, and which values are you using when adding delayed jobs?

oanguenot · 2024-04-23T18:19:56Z

Hi @roggervalf,

Here are my settings:

queue = new Queue("services", {
  connection: {
    host: CONFIG().redisDbUrl,
    port: CONFIG().redisDbPort,
  },
});

I use the following when adding new jobs:

const job = await queue.add(
  `${service}-${instance.id}`,
  {
    userId: instance.userId,
    instanceId: instance.id,
    serviceId: service,
    immediate: false,
    retriedCounter,
  },
  {
    jobId: `${service}-${instance.id}`,
    removeOnComplete: true,
    removeOnFail: true,
    delay: delay + randomDelay,
  }
);

Nothing really special, I think.

On my side, after around 60 hours, all jobs have been processed on time (with Redis monitoring active).

roggervalf · 2024-04-24T13:35:55Z

thank you @oanguenot, pls let us know if it happens again. One last question: how frequently did it happen before?

oanguenot · 2024-04-24T18:59:01Z

It happened every 2 or 3 days, but I can't remember when it started. It seems to have worked very well a few versions ago, or I didn't notice because of other manual restarts I did myself.

oanguenot · 2024-04-27T19:12:35Z

Everything has been running smoothly for the past 6 days. No problem so far.

roggervalf · 2024-04-27T20:02:06Z

thank you @oanguenot. We also released a new performance change regarding this topic. You can try version 5.7.6. Pls let us know how it goes

manast · 2024-04-30T09:37:19Z

I would even recommend upgrading to 5.7.7, as it will mitigate a potential issue we have discovered with IORedis in the case of network partitions.

oanguenot added the "bug" label on Apr 21, 2024
