[5.x] Ensure graceful termination of workers marked for termination #1433

Open
wants to merge 1 commit into
base: 5.x

Conversation


@tarexme tarexme commented Apr 27, 2024

Fix #1432

Description

There appears to be an issue where workers marked for termination while processing jobs do not terminate gracefully when horizon:terminate is subsequently invoked. These workers, while still actively running, are overlooked during the supervisor's termination process. As a result, instead of terminating gracefully, they are killed upon the supervisor's exit.
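
The behaviour described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `ToyWorker`, `ToyPool`, `mustWaitManagedOnly()` and `mustWaitIncludingTerminating()` are hypothetical names invented for illustration and are not Horizon classes or methods. It assumes the supervisor decides whether to wait by looking only at the processes it still manages, so a busy worker that was already marked for termination is not counted.

```php
<?php

// Toy model of the reported behaviour. All names here are hypothetical
// and are NOT part of Horizon.

final class ToyWorker
{
    public function __construct(
        public readonly string $name,
        public bool $busy,
    ) {}
}

final class ToyPool
{
    /** @var ToyWorker[] Workers the pool still manages. */
    public array $processes = [];

    /** @var ToyWorker[] Workers marked for termination that may still be finishing a job. */
    public array $terminating = [];

    /** Move a worker from the managed list to the terminating list. */
    public function markForTermination(ToyWorker $worker): void
    {
        $this->processes = array_values(array_filter(
            $this->processes,
            fn (ToyWorker $w) => $w !== $worker
        ));

        $this->terminating[] = $worker;
    }
}

/** Wait check that only looks at managed workers (models the reported behaviour). */
function mustWaitManagedOnly(ToyPool $pool): bool
{
    return count(array_filter($pool->processes, fn (ToyWorker $w) => $w->busy)) > 0;
}

/** Wait check that also considers workers already marked for termination. */
function mustWaitIncludingTerminating(ToyPool $pool): bool
{
    $all = array_merge($pool->processes, $pool->terminating);

    return count(array_filter($all, fn (ToyWorker $w) => $w->busy)) > 0;
}

$pool = new ToyPool();
$worker = new ToyWorker('worker-1', busy: true);
$pool->processes[] = $worker;
$pool->markForTermination($worker); // e.g. scaled down while its job is still running

var_dump(mustWaitManagedOnly($pool));          // bool(false): supervisor would exit immediately
var_dump(mustWaitIncludingTerminating($pool)); // bool(true): supervisor would keep waiting
```

In this model the first check lets the supervisor exit while the job is still running, which matches the symptom reported in the description. The second check is one possible way to account for such workers and is not necessarily how the change in this PR is implemented.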

Steps To Reproduce

  1. Launch the master supervisor with the fast_termination option set to false, using the horizon command.
  2. Send a long-running job to the queue and make sure a worker has picked it up.
  3. Wait for the scaleDown() method to be triggered on ProcessPool, so that the process handling the long-running job is marked for termination. For consistent test results, use the code snippet below to simulate a supervisor restart, during which all worker processes are marked for termination by scaling the process pools down to 0.
  4. Terminate Horizon using the horizon:terminate command.
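
use Illuminate\Support\Facades\Artisan;
use Laravel\Horizon\Console\TerminateCommand;
use Laravel\Horizon\Contracts\HorizonCommandQueue;
use Laravel\Horizon\Contracts\SupervisorRepository;
use Laravel\Horizon\SupervisorCommands\Restart;

// Dispatch a long-running job that sleeps for 60 seconds
SleepJob::dispatch(60);
sleep(10); // Make sure that the job is picked up

// Trigger a restart on all supervisors, marking workers for termination
foreach (app(SupervisorRepository::class)->names() as $name) {
    app(HorizonCommandQueue::class)->push(
        $name, Restart::class
    );
}

sleep(10); // Make sure that all supervisors have restarted

// Call the terminate command
Artisan::call(TerminateCommand::class);

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class SleepJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(protected readonly int $sleepDuration)
    {
        // Nothing
    }

    /**
     * Execute the job.
     */
    public function handle(): void
    {
        // Sleep in one-second steps for the requested duration
        for ($i = 0; $i < $this->sleepDuration; $i++) {
            sleep(1);
        }

        // If this line never shows up in the log, the worker was killed mid-job
        Log::debug('Job finished.');
    }
}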

@driesvints
Member

Please add a thorough description to your PR and not just link to an issue. This will help the people reviewing your PR.

@tarexme tarexme changed the title from "[5.x] Fix #1432" to "[5.x] Ensure graceful termination of workers marked for termination" Apr 29, 2024
@taylorotwell taylorotwell marked this pull request as draft April 29, 2024 14:48
@art-vanesyan

Are there any updates? I'm seeing the same situation.

@driesvints
Member

@taylorotwell I'm going to re-open this one, as we've now had two reports of graceful termination not working properly with Horizon. This PR seems to fix it in both cases. The other report is here: #1450

@nckrtl

nckrtl commented May 27, 2024

To add to this remark (#1450 (comment)): this only seems to be true when a single job is being processed at termination. When that is not the case, the worker is included in the runningProcesses collection. So although this PR fixes the symptom, it feels like a patch in the wrong place, because the issue seems to lie deeper.

$this->terminatingProcesses() should also not be relevant in this case, as it's only used when scaling down processes. I think the real issue lies in the scale() method in ProcessPool.php. When there is 1 idle process and a job is pushed to the queue, the scale function is called:

    public function scale($processes)
    {
        $processes = max(0, (int) $processes);

        if ($processes === count($this->processes)) {
            return;
        }

        if ($processes > count($this->processes)) {
            $this->scaleUp($processes);
        } else {
            $this->scaleDown($processes);
        }
    }
    

At the moment of that scale check, count($this->processes) is 2, because another process has already been added once the job was pushed to the queue. The requested count is lower, so scaleDown is called. scaleDown takes the first process in the array and marks it for termination. But in this scenario that process is actually doing work and shouldn't be marked for termination; the most recently added process should be.

That's why taking the last process in the array and terminating that one, instead of the first one, also fixed the issue: #1450 (comment).
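
To make the first-versus-last point concrete, here is a minimal sketch of the choice. `toyScaleDown()` and the process labels are hypothetical and only model the ordering argument; this is not Horizon's scaleDown() implementation. It assumes, as in the scenario above, that the busy process was added first and the idle one was added most recently.

```php
<?php

// Toy model of the scale-down choice. The function name and the process
// labels are hypothetical and are NOT Horizon's implementation.

/**
 * Remove processes from a pool until $target remain.
 *
 * @param  string[] $processes Process labels in the order they were added.
 * @param  int      $target    Desired number of remaining processes.
 * @param  bool     $fromEnd   false: take from the front; true: take from the end.
 * @return array{0: string[], 1: string[]} [remaining, markedForTermination]
 */
function toyScaleDown(array $processes, int $target, bool $fromEnd): array
{
    $marked = [];

    while (count($processes) > max(0, $target)) {
        $marked[] = $fromEnd ? array_pop($processes) : array_shift($processes);
    }

    return [$processes, $marked];
}

// Assumed scenario: the first process is running a job, the second was just added.
$pool = ['busy-worker', 'new-idle-worker'];

[$kept, $marked] = toyScaleDown($pool, 1, fromEnd: false);
echo implode(',', $marked), PHP_EOL; // busy-worker

[$kept, $marked] = toyScaleDown($pool, 1, fromEnd: true);
echo implode(',', $marked), PHP_EOL; // new-idle-worker
```

Taking from the end only helps under the ordering assumed here. In general the last process is not guaranteed to be idle, so this illustrates the ordering argument rather than a complete fix.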

So yeah, this PR will work but is not fixing the cause.

@driesvints
Member

Thanks @nckrtl. @taylorotwell do you feel like we should first address the real underlying issue?

Development

Successfully merging this pull request may close these issues.

Graceful termination fails for workers marked for termination during job processing
4 participants