
[Bug]: Subscription Rescan Memory Usage #500

Closed
2 tasks done
PhuriousGeorge opened this issue Jul 22, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@PhuriousGeorge
Contributor

I've read the documentation

Operating System

Unraid

Your Bug Report

Describe the bug

I honestly can't understand why TA gobbles so much RAM. I've 'only' got 47GB available to the server, and recently TA has been overloading and locking up my server whenever any other container(s) cumulatively use more than 5GB RAM, requiring a manual reboot. I've discussed this a few times and received various answers before ultimately deciding to limit the container, so I've capped TA at 8GB using --memory=8G. This is not related to ES at all; TA is the only container I've had to limit. Now, with this limit in place, my subscriptions completely fail to scan every time. I receive a "lost worker" error in the log, which I can only guess comes from the rescan task, as the rescan has not completed in the last 4 days.

I do understand I'm an "edge case", but the resource usage is a bit edgier ;)
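
For anyone trying to reproduce the numbers above, a one-shot `docker stats` call is enough to watch how close the container gets to its limit. A minimal sketch, assuming the container is simply named `tubearchivist`:

```bash
# Snapshot of current memory usage vs. the configured limit for the container
# (the container name "tubearchivist" is an assumption; adjust to your setup)
docker stats tubearchivist --no-stream --format "{{.Name}}: {{.MemUsage}} ({{.MemPerc}})"
```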

Steps To Reproduce

Have 1944 subscriptions
Limit TA container to 8GB RAM
Initiate 'Rescan Subscriptions' task
Receive "Worker Lost" message

Expected behavior

Relevant log output

[2023-07-21 20:23:13,077: ERROR/MainProcess] Process 'ForkPoolWorker-32' pid:115 exited with 'signal 9 (SIGKILL)'
[2023-07-21 20:23:13,093: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 0.')
Traceback (most recent call last):
  File "/root/.local/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 0.

Anything else?

No response

@bbilly1
Member

bbilly1 commented Jul 22, 2023

I think it's time to start treating this process as a channel-by-channel task rather than a single rescan of all subscriptions. Then we can also check which video IDs need to be added per channel instead of against the total archive. That's probably slightly slower, as it will result in more complex and repeated queries against the index, but overall most likely a more robust approach.

You might also be able to tweak that with --memory-reservation and --memory-swap (not sure if Unraid has swap though...).
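
A minimal sketch of how those flags could be combined on the `docker run` line; the values are illustrative only, and the volumes/environment that TA actually needs are omitted:

```bash
# Illustrative memory flags only; not a complete TA invocation
# --memory             hard cap; exceeding it triggers the kernel OOM killer (the SIGKILL above)
# --memory-reservation soft limit the kernel tries to keep the container under when memory is tight
# --memory-swap        total of memory plus swap; only useful if the host actually has swap
docker run -d \
  --name tubearchivist \
  --memory=8g \
  --memory-reservation=6g \
  --memory-swap=12g \
  bbilly1/tubearchivist
```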

@bbilly1 bbilly1 added the bug Something isn't working label Jul 22, 2023
@Nimdae

Nimdae commented Jan 3, 2024

I appear to be encountering a related issue. I do observe the memory consumption during scanning, but this alone isn't causing me an issue. However, the memory doesn't get released. This issue was posted for a previous version of TA, so I don't know whether this was a problem before v0.4.5.

So every time I run a rescan on 133 subscriptions, 280MB of memory is consumed (it may have been higher before, I changed some scanning settings recently) and never released. After a few days, this means it can crack 10GB of memory usage.

I discovered this when I ran a rescan and my machine became unresponsive. I found it was consuming over 50GB of memory.

I'm running on Fedora Server.

I was debating opening a new issue on this but if you're considering reworking how scans work, the defect could be resolved by that work. However, if you want me to open a new issue for the leak, I can do that.

@lamusmaser
Collaborator

We have observed the leak in this thread on Discord: https://discordapp.com/channels/920056098122248193/1179480913701241002

Overall, as reported there, there are methods that can be used to allow only a certain number of tasks per individual worker, so that the worker is replaced once it reaches that point:

From the thread:

So that would be in run.sh, the entry script: change the worker initiation line to `celery -A home.tasks worker --loglevel=INFO --max-tasks-per-child 10 &`, for example, to experiment, then please report back.
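
In other words, a sketch of that run.sh change, assuming the original worker initiation line looks roughly like the first line below; only the --max-tasks-per-child flag is new:

```bash
# before (assumed original worker initiation line in run.sh):
celery -A home.tasks worker --loglevel=INFO &

# after: recycle each worker process once it has handled 10 tasks,
# so any memory it has accumulated is returned to the OS
celery -A home.tasks worker --loglevel=INFO --max-tasks-per-child 10 &
```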

PhuriousGeorge added a commit to PhuriousGeorge/tubearchivist that referenced this issue Jan 5, 2024
Limit worker lifespan to save our precious RAM as discussed on [Discord](https://discord.com/channels/920056098122248193/1179480913701241002/1180026088802496512)

Mitigates tubearchivist#500 though RAM usage can still ramp rather high before worker is culled
bbilly1 pushed a commit that referenced this issue Jan 15, 2024
Limit worker lifespan to save our precious RAM as discussed on [Discord](https://discord.com/channels/920056098122248193/1179480913701241002/1180026088802496512)

Mitigates #500 though RAM usage can still ramp rather high before worker is culled
@bbilly1 bbilly1 added the pending-release Fixed and pending release label May 11, 2024
@bbilly1 bbilly1 removed the pending-release Fixed and pending release label May 22, 2024
@bbilly1
Member

bbilly1 commented May 22, 2024

v0.4.8 brings various additional improvements in memory management. Closing this issue for now.

@bbilly1 bbilly1 closed this as completed May 22, 2024