
[Bug]: Subscription Rescan Memory Usage #500

Closed
2 tasks done
PhuriousGeorge opened this issue Jul 22, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@PhuriousGeorge
Contributor

I've read the documentation

Operating System

Unraid

Your Bug Report

Describe the bug

I honestly can't understand why TA gobbles so much RAM. I've 'only' got 47GB available to the server, and recently TA has been overloading and locking up my server whenever any other container(s) cumulatively use more than 5GB RAM, requiring a manual reboot. I've discussed this a few times and received various answers before ultimately deciding to limit the container, so I've capped TA at 8GB using --memory=8G. This is not related to ES at all; TA is the only container I've had to limit. Now, with this limit in place, my subscriptions completely fail to scan every time. I receive a "lost worker" error in the log, which I can only guess comes from the rescan task, as the rescan has not completed in the last 4 days.

I do understand I'm an "edge case", but the resource usage is a bit edgier ;)
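
For anyone trying to reproduce the numbers above, a one-shot `docker stats` call is enough to watch how close the container gets to its limit. A minimal sketch, assuming the container is simply named `tubearchivist`:

```bash
# Snapshot of current memory usage vs. the configured limit for the container
# (the container name "tubearchivist" is an assumption; adjust to your setup)
docker stats tubearchivist --no-stream --format "{{.Name}}: {{.MemUsage}} ({{.MemPerc}})"
```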

Steps To Reproduce

Have 1944 subscriptions
Limit TA container to 8GB RAM
Initiate 'Rescan Subscriptions' task
Receive "Worker Lost" message

Expected behavior

Relevant log output

[2023-07-21 20:23:13,077: ERROR/MainProcess] Process 'ForkPoolWorker-32' pid:115 exited with 'signal 9 (SIGKILL)'
[2023-07-21 20:23:13,093: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 0.')
Traceback (most recent call last):
  File "/root/.local/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 0.

Anything else?

No response

@bbilly1
Member

bbilly1 commented Jul 22, 2023

I think it's time to start treating this process as a channel-by-channel task rather than a single rescan of all subscriptions. Then we can also check which video IDs need to be added per channel instead of against the total archive. That's probably slightly slower, as it will result in more complex and repeated queries against the index, but overall most likely a more robust approach.

You might also be able to tweak that with --memory-reservation and --memory-swap (not sure if Unraid has swap though...).
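
A minimal sketch of how those flags could be combined on the `docker run` line; the values are illustrative only, and the volumes/environment that TA actually needs are omitted:

```bash
# Illustrative memory flags only; not a complete TA invocation
# --memory             hard cap; exceeding it triggers the kernel OOM killer (the SIGKILL above)
# --memory-reservation soft limit the kernel tries to keep the container under when memory is tight
# --memory-swap        total of memory plus swap; only useful if the host actually has swap
docker run -d \
  --name tubearchivist \
  --memory=8g \
  --memory-reservation=6g \
  --memory-swap=12g \
  bbilly1/tubearchivist
```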

@bbilly1 bbilly1 added the bug Something isn't working label Jul 22, 2023
@Nimdae

Nimdae commented Jan 3, 2024

I appear to be encountering a related issue. I do observe the memory consumption during scanning, but this alone isn't causing me an issue. However, the memory doesn't get released. This issue was posted for a previous version of TA, so I don't know whether this was a problem before v0.4.5.

So every time I run a rescan on 133 subscriptions, 280MB of memory is consumed (it may have been higher before, I changed some scanning settings recently) and never released. After a few days, this means it can crack 10GB of memory usage.

I discovered this when I ran a rescan and my machine became unresponsive. I found it was consuming over 50GB of memory.

I'm running on Fedora Server.

I was debating opening a new issue on this but if you're considering reworking how scans work, the defect could be resolved by that work. However, if you want me to open a new issue for the leak, I can do that.

@lamusmaser
Collaborator

We have observed the leak in this thread on Discord: https://discordapp.com/channels/920056098122248193/1179480913701241002

Overall, as reported there, there are methods that can be used to allow only a certain number of tasks per individual worker, so that the worker is replaced once it reaches that point:

From the thread:

So that would be in run.sh, the entry script: change the worker initiation line to `celery -A home.tasks worker --loglevel=INFO --max-tasks-per-child 10 &`, for example, to experiment, then please report back.
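
In other words, a sketch of that run.sh change, assuming the original worker initiation line looks roughly like the first line below; only the --max-tasks-per-child flag is new:

```bash
# before (assumed original worker initiation line in run.sh):
celery -A home.tasks worker --loglevel=INFO &

# after: recycle each worker process once it has handled 10 tasks,
# so any memory it has accumulated is returned to the OS
celery -A home.tasks worker --loglevel=INFO --max-tasks-per-child 10 &
```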

PhuriousGeorge added a commit to PhuriousGeorge/tubearchivist that referenced this issue Jan 5, 2024
Limit worker lifespan to save our precious RAM as discussed on [Discord](https://discord.com/channels/920056098122248193/1179480913701241002/1180026088802496512)

Mitigates tubearchivist#500 though RAM usage can still ramp rather high before worker is culled
bbilly1 pushed a commit that referenced this issue Jan 15, 2024
Limit worker lifespan to save our precious RAM as discussed on [Discord](https://discord.com/channels/920056098122248193/1179480913701241002/1180026088802496512)

Mitigates #500 though RAM usage can still ramp rather high before worker is culled
@bbilly1 bbilly1 added the pending-release Fixed and pending release label May 11, 2024
@bbilly1 bbilly1 removed the pending-release Fixed and pending release label May 22, 2024
@bbilly1
Member

bbilly1 commented May 22, 2024

v0.4.8 brings various additional improvements in memory management. Closing this issue for now.

@bbilly1 bbilly1 closed this as completed May 22, 2024