laravel horizon jobs stuck in pending #1280

Open
khalidgxg opened this issue May 29, 2023 · 27 comments

@khalidgxg

Horizon Version

5.15

Laravel Version

10.09.0

PHP Version

8.1

Redis Driver

Predis

Redis Version

2.1.2

Database Driver & Version

No response

Description

Hi all,

I am writing to report a bug related to the job processing in Laravel Horizon. Specifically, I have encountered a situation where a job appears to be stuck in the "pending" status despite having a "completed_at" timestamp, and another job is stuck in the "reserved" status without progressing further.

Here are the details of the problematic jobs:

  • Job with "pending" status but a "completed_at" timestamp:

    "status": "pending",
    "completed_at": "1685113859.5327",
    "reserved_at": "1685113859.5284",

  • Job with "reserved" status:

    "status": "reserved",
    "completed_at": null,
    "reserved_at": "1685110558.3953",

In the first case, the job has a "completed_at" timestamp indicating its successful completion, but it remains in the "pending" status. On the other hand, the second job is stuck in the "reserved" status without progressing further.

Could you please investigate this issue and provide guidance on how to resolve it? It seems that the job statuses are not being updated correctly, causing confusion in monitoring and processing.

Thank you for your attention to this matter. I look forward to your response and assistance in resolving this bug.

Best regards,
khalid

Steps To Reproduce

  1. Set up a Laravel application with Laravel Horizon installed. Make sure you have the necessary dependencies and configurations in place.
  2. Create a job that exhibits the issue. For example, you can create a custom job class StuckJob that performs some task, such as writing to a log file or making an API request.
  3. Configure your application to use the Redis queue connection, which Horizon requires. Ensure that the queue connections and supervisors are properly set up.
  4. Push multiple instances of StuckJob onto the queue, for example with the dispatch() helper.
  5. Monitor the Horizon dashboard to observe the job processing. Keep an eye on the status of the jobs you pushed.
  6. Check if any of the jobs get stuck in the "pending" status despite having a "completed_at" timestamp. Note down the relevant job ID, connection, queue, and payload details.
  7. Repeat the process with another job to observe if any jobs get stuck in the "reserved" status without progressing further. Again, note down the relevant job ID, connection, queue, and payload details.
  8. Take note of the Laravel Horizon version you are using in your application.
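
Steps 2 and 4 could look roughly like the sketch below. It assumes a standard Laravel 10 application; the class name StuckJob comes from the steps above, while the $sequence property and the log message are made up for illustration.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class StuckJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // $sequence is a hypothetical field used only to tell the jobs apart.
    public function __construct(public int $sequence)
    {
    }

    public function handle(): void
    {
        // Deliberately trivial, so the job finishes almost immediately.
        Log::info('StuckJob handled', ['sequence' => $this->sequence]);
    }
}
```

The jobs can then be pushed in a loop, for instance from `php artisan tinker` with `StuckJob::dispatch($i)` for a few hundred values of `$i`, while watching the Pending tab of the dashboard.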
@driesvints
Member

Hey there,

Can you first please try one of the support channels below? If you can actually identify this as a bug, feel free to open up a new issue with a link to the original one and we'll gladly help you out.

Thanks!

@khalidgxg
Author

ok thanks

@fabriciojs

I'm seeing the same thing on the latest Laravel and Horizon versions.

Jobs keep accumulating in the Pending list on the dashboard even though they were processed correctly and the queues are empty.

I could not find more precise reports or solutions for this specific situation.

@khalidgxg did you learn what was causing it for you?

And @driesvints I assume that if someone can provide a repo with Laravel + Horizon that consistently reproduces the issue as reported, it would qualify for you to pursue it as a bug, right? I might try to put something together.

@cccdz

cccdz commented Sep 1, 2023

@driesvints @fabriciojs #1034

@driesvints
Member

Hey all. This should have been fixed already in 4.x, does that not work for you? laravel/telescope#1349

@driesvints
Member

Telescope that is.

@cccdz

cccdz commented Sep 1, 2023

@driesvints #1185
This happens when the consumer finishes the job before the producer-side event has been handled. Horizon tracks jobs through events, and the job is first pushed onto the queue and only then is the event triggered, so a worker can finish consuming the job before the listener for the pushed event has run.
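
To make the ordering problem concrete, here is a minimal in-memory sketch. It is not Horizon's code: FakeJobStore and its methods are hypothetical stand-ins for the per-job hash and the pending list, and the timestamps are the ones from the original report. It only shows how a late pushed step can leave a job with status "pending" and a completed_at value.

```php
<?php
declare(strict_types=1);

/**
 * In-memory stand-in for the per-job hash and the pending list.
 * Hypothetical model for illustration; this is not Horizon's code.
 */
final class FakeJobStore
{
    /** @var array<string, array<string, string>> job ID => hash fields */
    public array $hashes = [];

    /** @var array<string, bool> job IDs currently in the pending list */
    public array $pending = [];

    public function pushed(string $id): void
    {
        $this->merge($id, ['status' => 'pending']);
        $this->pending[$id] = true;
    }

    public function reserved(string $id, string $at): void
    {
        $this->merge($id, ['status' => 'reserved', 'reserved_at' => $at]);
    }

    public function completed(string $id, string $at): void
    {
        $this->merge($id, ['status' => 'completed', 'completed_at' => $at]);
        unset($this->pending[$id]);
    }

    /** Overwrite only the given fields, similar to a Redis HMSET. */
    private function merge(string $id, array $fields): void
    {
        $this->hashes[$id] = array_merge($this->hashes[$id] ?? [], $fields);
    }
}

// Expected order: the pushed step runs before the worker touches the job.
$ordered = new FakeJobStore();
$ordered->pushed('job-1');
$ordered->reserved('job-1', '1685113859.5284');
$ordered->completed('job-1', '1685113859.5327');

// Racy order: the worker reserves and completes the job first,
// and the late pushed step then overwrites the status.
$racy = new FakeJobStore();
$racy->reserved('job-1', '1685113859.5284');
$racy->completed('job-1', '1685113859.5327');
$racy->pushed('job-1');

echo json_encode($ordered->hashes['job-1']), PHP_EOL;
echo json_encode($racy->hashes['job-1']), PHP_EOL;
```

In the racy run the job ends with status "pending", keeps its completed_at value, and stays in the pending list, which matches the first job in the report. The reported timestamps are about 4.3 ms apart (1685113859.5327 minus 1685113859.5284), so the job itself was very short.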

@cccdz

cccdz commented Sep 1, 2023

[image]

@github-actions

github-actions bot commented Sep 1, 2023

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@driesvints
Member

Thank you. We'd appreciate any help through a PR for this.

@cccdz

cccdz commented Sep 1, 2023

[image] [image]

I have a suggestion: these two listener methods could use a Lua script. In the pushed method, the script would check whether the job's key already exists. If it does, the completion event has already been handled, so the job should not be written to the pending sorted set; only the hash would be updated. I don't know whether this is feasible.
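
One possible reading of this suggestion, sketched as an in-memory PHP model rather than an actual Lua script. GuardedJobStore and the payload strings are hypothetical, and the choice to leave the status untouched when the hash already exists is an interpretation of "just update the hash". In Redis, the existence check and the writes would have to run atomically, which is where a Lua script would come in.

```php
<?php
declare(strict_types=1);

/**
 * In-memory model of a guarded pushed step. Hypothetical illustration only:
 * in Redis the check and the writes would need to run atomically,
 * for example inside a single Lua script.
 */
final class GuardedJobStore
{
    /** @var array<string, array<string, string>> job ID => hash fields */
    public array $hashes = [];

    /** @var array<string, bool> job IDs currently in the pending list */
    public array $pending = [];

    public function pushed(string $id, string $payload): void
    {
        if (isset($this->hashes[$id])) {
            // A later event already created the hash: keep its status,
            // refresh the payload, and stay out of the pending list.
            $this->hashes[$id]['payload'] = $payload;
            return;
        }

        $this->hashes[$id] = ['status' => 'pending', 'payload' => $payload];
        $this->pending[$id] = true;
    }

    public function completed(string $id, string $at): void
    {
        $this->hashes[$id] = array_merge(
            $this->hashes[$id] ?? [],
            ['status' => 'completed', 'completed_at' => $at]
        );
        unset($this->pending[$id]);
    }
}

$store = new GuardedJobStore();

// job-1: completion is recorded before the pushed step runs.
$store->completed('job-1', '1685113859.5327');
$store->pushed('job-1', 'payload-1');

// job-2: the usual order, pushed first and not yet processed.
$store->pushed('job-2', 'payload-2');

echo json_encode($store->hashes), PHP_EOL;
echo json_encode(array_keys($store->pending)), PHP_EOL;
```

With the guard in place, job-1 keeps its "completed" status and never enters the pending list, while job-2 is recorded as pending as usual.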

@joelvh

joelvh commented Sep 22, 2023

@driesvints is there a way to release these reserved jobs so they can be processed again?

@driesvints
Member

driesvints commented Sep 22, 2023

I don't know, sorry.

@joelvh

joelvh commented Sep 25, 2023

@themsaid do you maybe know if it's possible to release these jobs that get stuck as reserved to be processed again? Thanks!

@graemlourens
Contributor

I'd like to add here as well that we're experiencing the same issue with jobs being stuck in 'pending' even if completed_at is present in horizon dashboard, and the jobs actually completed successfully.

We have not been able to determine the root cause. We dispatch millions of jobs a month, and it affects only approximately 20 per day. Still, it's rather unsettling and we'd love to find a solution.

@graemlourens
Contributor

@pnlinh you are WAY out of date with laravel, horizon & php. Please update to most recent versions and test again. There is no sense in asking for help with such outdated versions.

@pnlinh

pnlinh commented Nov 21, 2023

Thanks for the suggestion, but my project cannot be upgraded right now. I added a delay value to the jobs, and that seems to work.

@laravel laravel deleted a comment from pnlinh Nov 27, 2023
@driesvints
Member

@pnlinh please try to focus the discussion on supported Laravel/Horizon versions, thanks.

@fwilliamconceicao

Even with updated versions I still have this issue.

@ithuis

ithuis commented Jan 23, 2024

I set 'TELESCOPE_JOB_WATCHER' to false in the config, and the jobs all came flooding back into completed.

"Watchers\JobWatcher::class => env('TELESCOPE_JOB_WATCHER', false)"

As mentioned in laravel/telescope#1349 (comment).
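
For context, that line lives in the watchers array of config/telescope.php. A minimal excerpt, assuming the default layout of the published Telescope config file; the other watchers are left out here. Whether this helps in a given setup is not confirmed in this thread beyond the report above.

```php
<?php

// config/telescope.php (excerpt)

use Laravel\Telescope\Watchers;

return [
    'watchers' => [
        // Falls back to false, so the job watcher stays off
        // unless TELESCOPE_JOB_WATCHER=true is set in .env.
        Watchers\JobWatcher::class => env('TELESCOPE_JOB_WATCHER', false),
    ],
];
```

If the configuration is cached, the cache has to be rebuilt (for example with `php artisan config:clear`) before the change takes effect.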

@Kladislav

same issue..

@driesvints
Member

Hey all. Extra messages that you're experiencing this issue aren't really helpful. Instead, please try posting extra findings around the issue or help out with a PR, thanks.

@lucaspanik

Regarding @cccdz's suggestion above to guard the pushed listener with a Lua script:

I've been facing this problem for almost 3 years, and our internal workaround is a sleep(3) inside every job.
#1034

The suggestion above makes sense to me, since with sleep(3) the events have time to be handled in their normal order.

Would this be a possible point of investigation?

@fwilliamconceicao

The sleep(3) workaround worked for me for a couple of months, but once the application scaled up and the workload grew, it became a huge headache.

What I'm doing right now is migrating everything to serverless services, jobs, and isolated applications written in C#.

The only way I have found to stop this behavior is to stop using Horizon for huge workloads.

@cccdz

cccdz commented Feb 27, 2024

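use Laravel\Horizon\Contracts\JobRepository;
use Laravel\Horizon\JobPayload;
use Laravel\Horizon\Repositories\RedisJobRepository as HorizonRedisJobRepository;

class RedisJobRepository extends HorizonRedisJobRepository
{
    /**
     * Mark the job as reserved, waiting briefly for the pushed event first.
     *
     * @param $connection
     * @param $queue
     * @param JobPayload $payload
     * @return void
     * @throws RedisException
     */
    public function reserved($connection, $queue, JobPayload $payload): void
    {
        // Total time spent waiting, in microseconds
        $totalTime = 0;

        // Wait while Horizon has not yet recorded the job as pending.
        // $this->connection() is the parent's accessor for the 'horizon' Redis
        // connection; the original post used a project-specific redis('horizon') helper.
        while ('pending' !== $this->connection()->hget($payload->id(), 'status')) {
            // Give up once the wait reaches 1 second
            if ($totalTime >= 1000000) {
                break;
            }

            // Sleep for 5 ms
            usleep(5000);

            $totalTime += 5000;
        }

        parent::reserved($connection, $queue, $payload);
    }
}


// In a service provider's register() method:
$this->app->singleton(JobRepository::class, RedisJobRepository::class);

The problem can be temporarily avoided this way.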

@fwilliamconceicao

This is a good workaround, though. But have you tested it with a huge workload? Mine is very large, and when I added a sleep of 2000 everything started to overlap. I didn't try 5k; it might be a good workaround, but either way it's not a proper solution.

@cccdz

cccdz commented Feb 27, 2024

The workaround polls for up to 1 second to check whether the job is in the pending state, checking every 5 ms. If the state is pending, the pushed event has been handled and the next operation can proceed. I also added a safeguard: if the event still has not been handled after 1 second, the code stops waiting and lets the job stay in the pending list, but the chance of this extreme case is almost zero.
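
The bounded polling described here can be isolated into a small standalone sketch. waitUntil is a hypothetical helper, not part of Horizon, and the closures stand in for the Redis status lookup so the example runs without a Redis server.

```php
<?php
declare(strict_types=1);

/**
 * Poll $check until it returns true or the accumulated sleep time
 * reaches $timeoutUs microseconds. Hypothetical helper, not part of Horizon.
 *
 * @return bool true if the condition was met, false if the wait timed out
 */
function waitUntil(callable $check, int $timeoutUs = 1000000, int $intervalUs = 5000): bool
{
    $waitedUs = 0;

    while (!$check()) {
        if ($waitedUs >= $timeoutUs) {
            return false;
        }

        usleep($intervalUs);
        $waitedUs += $intervalUs;
    }

    return true;
}

// Stand-in for the status lookup: reports success from the third check onward.
$calls = 0;
$becamePending = waitUntil(function () use (&$calls): bool {
    return ++$calls >= 3;
});

// A condition that never holds gives up once the shortened 20 ms timeout is reached.
$neverChecks = 0;
$timedOut = !waitUntil(function () use (&$neverChecks): bool {
    $neverChecks++;
    return false;
}, 20000, 5000);

var_dump($becamePending, $timedOut);
```

With the values used in the workaround (1,000,000 µs timeout, 5,000 µs interval), the loop sleeps at most 200 times per reserved job, so the worst case adds about one second to that job.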
