
Limit jobs to 5 and wrap in exception handler to release #4366

Open · wants to merge 3 commits into base: master
Conversation

zestysoft

@unixfox I've narrowed down the edits to this. If I revert these changes I see the errors again.

#4362

Signed-off-by: Ian Brown <ian@zestysoft.com>
@zestysoft zestysoft requested a review from a team as a code owner December 31, 2023 17:32
@zestysoft zestysoft requested review from unixfox and removed request for a team December 31, 2023 17:32
@SamantazFox
Member

Can you please explain what this code does?

@zestysoft
Author

> Can you please explain what this code does?

@SamantazFox I've added inline comments.

@SamantazFox
Member

SamantazFox commented Jan 10, 2024

Oooh, okay, I see what you did here!

For your information, jobs in Invidious aren't scheduled in any way. Each job runs in its own "fiber" (coroutine) and is basically a `while true` loop with a delay (`sleep`) on each iteration. That `sleep` is what allows Crystal's internal scheduler to jump between fibers. The role of `job.begin` is to spawn that fiber, then return right away.

In essence, your code doesn't prevent jobs from running, and if it did, half of Invidious would break.

Please be aware that a more advanced job-management library already exists, but we don't have the time (and no urgent need) to integrate it.
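The pattern described above can be sketched as follows. This is a minimal Ruby analog, not the actual Invidious code: Crystal's `spawn` starts a cooperative fiber, and here a plain Ruby `Thread` stands in for it; the class and method names are hypothetical.

```ruby
# Sketch of the Invidious job pattern described above, in Ruby.
# In Crystal, `job.begin` calls `spawn` to start a fiber and returns
# immediately; a Thread is the closest stdlib stand-in here.
class ExampleJob
  attr_reader :ticks

  def initialize(interval)
    @interval = interval
    @ticks = 0
    @running = true
  end

  # Mirrors `job.begin`: start the loop in the background, return right away.
  def begin_job
    @thread = Thread.new do
      while @running
        @ticks += 1      # the job's actual work would go here
        sleep @interval  # like `sleep` in a Crystal fiber: lets others run
      end
    end
  end

  def stop
    @running = false
    @thread.join
  end
end

job = ExampleJob.new(0.01)
job.begin_job   # returns immediately; the loop keeps iterating in the background
sleep 0.05
job.stop
puts job.ticks
```

Because each job is just such a loop, "preventing a job from starting" would mean its fiber is never spawned at all, which is why gating `begin` would break the parts of Invidious that depend on those loops running.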

@zestysoft
Author

So I started with this purely to help in troubleshooting the issue, but somehow after implementing this change, I could not and still have not reproduced the issue. If I revert the change, the issue comes back though, so this is reproducible.

Btw, I'm not interested in the bounty at all (assuming that's even on the table).

@zestysoft
Author

zestysoft commented Jan 10, 2024

The details in the issue I created show that Invidious was only creating a few connections to the DB, but that, for reasons still unknown, something like the jobs was locking up (a race condition?) until it got to the point where it just sat there, not talking to the DB at all. It definitely wasn't using the default 100 connections the DB was configured for, and the DB wasn't overwhelmed with requests -- it had answers waiting for Invidious to pick up that it simply wasn't collecting.

So I figured I'd limit the number of jobs, and handle any exceptions that might happen inside a job, so that the slots couldn't become "zombied".
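A sketch of that idea, limiting concurrency with a counting semaphore and releasing the slot even when the job raises. This is a hypothetical Ruby stand-in for the Crystal patch, not the PR's actual code: `SizedQueue` plays the role of a bounded Crystal `Channel` with capacity 5, and all names are invented for illustration.

```ruby
# Cap concurrent jobs at 5 using a SizedQueue as a counting semaphore.
# The `ensure` block guarantees the slot is released even if the job
# raises, so no slot can become "zombied" by a crashed job.
JOB_SLOTS = SizedQueue.new(5)

def run_job(name, results)
  JOB_SLOTS.push(name)  # blocks when 5 jobs are already running
  Thread.new do
    begin
      raise "boom" if name == :bad_job  # simulate a crashing job
      results << name
    rescue => e
      # Record the failure instead of letting it kill the slot.
      results << :"#{name}_failed"
    ensure
      JOB_SLOTS.pop     # always release the slot
    end
  end
end

results = Queue.new
threads = [:job_a, :job_b, :bad_job].map { |n| run_job(n, results) }
threads.each(&:join)
puts JOB_SLOTS.empty?   # true: every slot was released, crash or not
```

The key design point is that the release lives in `ensure` (Crystal has the same construct), so an uncaught error inside the job body cannot leak a slot.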

@SamantazFox
Member

> So I started with this purely to help in troubleshooting the issue, but somehow after implementing this change, I could not and still have not reproduced the issue. If I revert the change, the issue comes back though, so this is reproducible.

I think that it might have to do with the error handling rather than the semaphore part.
Can you test different variations (no semaphore, no rescuing, fewer/more channels)?

Comment on lines -4 to -11
# Automatically generate a structure that wraps the various
# jobs' configs, so that the following YAML config can be used:
#
# jobs:
# job_name:
# enabled: true
# some_property: "value"
#
Member

Please don't touch comments for code that is not related to your PR!
