[1.x] system under heavy load stops handling calls via ws api #3358
Comments
May we have some partial debug_locks implementation, only for locks on session creation, to debug it?
Do other requests work, though? E.g. an "info" request, or attaching a handle for a different plugin. That could help us understand whether the deadlock lies in the transport (WebSocket) or somewhere else.
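For reference, such a probe is easy to script: the Janus API wraps every request in a JSON object with a `janus` field and a random `transaction` id, and an "info" request is answered by the core without touching session state, so it helps separate a transport hang from a session-creation deadlock. A minimal sketch (the helper name is mine, not part of Janus):

```python
import json
import uuid

def make_janus_request(kind: str) -> str:
    """Build a minimal Janus API message of the given kind, e.g. "info" or "create"."""
    return json.dumps({"janus": kind, "transaction": uuid.uuid4().hex})

# "info" needs no session; "create" is the call that reportedly times out.
info_probe = make_janus_request("info")
create_probe = make_janus_request("create")
print(info_probe)
```

If "info" still answers while "create" times out, the transport is fine and the problem sits in session creation.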
Do you mean
The debug_locks setting has a massive impact on verbosity; that's probably the reason for the performance issue. If you are using Janus in a containerized environment with cgroups v2, the rapid growth of the log file might increase the memory accounted to the container (due to pages being kept in the page cache) and might explain the OOM.
There is no such option available. You might try customizing the code and logging only the …
I mean the following request isn't working: …
I didn't try this, but I can try next time this happens.
Taking a deeper look at the logs you shared, the issue seems to start with some errors:
Those remind me of situations where the host memory is being exhausted (like the cgroups v2 issue I already mentioned). Are you running Janus in containers with a memory limit? If you suspect a memory leak, try running your Janus app in a lab environment under …
we are running it in containers, but without memory limits set
yes
All right, this is a long shot, but can you check the status of the memory in the containers?
Replace long-id with the id of the Docker container. If you see the …
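In case it helps, here is a small sketch of how the container's cgroup v2 memory readings can be interpreted. It assumes the unified hierarchy is mounted at `/sys/fs/cgroup` inside the container; the helper name is mine:

```python
from pathlib import Path

def parse_memory(current_text: str, max_text: str):
    """Parse the contents of the cgroup v2 memory.current / memory.max files.

    memory.max contains the literal string "max" when no limit is set.
    Returns (usage_bytes, limit_bytes_or_None).
    """
    usage = int(current_text.strip())
    raw_limit = max_text.strip()
    limit = None if raw_limit == "max" else int(raw_limit)
    return usage, limit

if __name__ == "__main__":
    cg = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point inside the container
    usage, limit = parse_memory((cg / "memory.current").read_text(),
                                (cg / "memory.max").read_text())
    if limit is not None and usage > 0.9 * limit:
        print("usage is close to the limit; an OOM kill is plausible")
```

Note that `memory.current` includes the page cache, so a fast-growing log file can push usage toward the limit even without a real leak, which matches the cgroups v2 remark above.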
Got this with another customer. We are using this in production, so in case we hit it on our own servers we will enable debug_locks; since we don't have as high a load as the customers' servers, we will try to catch it there.
got info from our customer, after a janus restart: …
We will try to reproduce it in our lab by enabling recording and setting memory limits on the docker container.
We are unable to reproduce it on our test stand, but it consistently repeats in customers' environments.
That data is useless after a restart; we need it while the issue exists (btw, the …
If you suspect a deadlock, wait for the issue to appear and then provide the output of this:
…
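The exact command got cut off above, but a typical way to capture what deadlock analysis needs is to dump every thread's backtrace from the live process with gdb. The sketch below only assembles such an invocation (the helper name is mine, and whether it matches the command the maintainer meant is an assumption):

```python
import shlex

def gdb_all_backtraces_cmd(pid: int) -> str:
    """Build a gdb command line that dumps backtraces of every thread of a live process.

    With all thread stacks in hand, two threads each blocked on a mutex the
    other holds is the classic signature of a deadlock.
    """
    args = [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "set pagination off",
        "-ex", "thread apply all bt",
    ]
    return shlex.join(args)

print(gdb_all_backtraces_cmd(1234))
```

Run it against the hung janus process id while the issue is ongoing; attaching after a restart shows nothing useful.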
What version of Janus is this happening on?
master
Have you tested a more recent version of Janus too?
we use master
Was this working before?
We had no such issue on older versions, but we had another issue with memory leaks on them, so we got OOMs from time to time.
Is there a gdb or libasan trace of the issue?
no
Under some circumstances, Janus stops handling any API calls via WS: WS is accepting connections, but create_session times out and Janus doesn't work. We tried enabling debug_locks, but then the Janus server ate all memory and I/O and got stuck, so we ran into performance problems before the issue even appeared. This happens at one of our customers with a very heavy load on their Janus instances.
How can we debug this, and what could be the cause?
https://gist.github.com/spscream/84aa7bca6f8e3f43e07d4c58f414e9cd - recent log of such situation with debug_level = 4
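Until the root cause is found, one way to catch the exact moment the hang starts is a watchdog that periodically times a probe request. Below is a generic asyncio sketch with stand-in coroutines instead of a real create_session call; all names are illustrative, not Janus API:

```python
import asyncio

async def probe(send_request, timeout: float = 5.0) -> bool:
    """Return True if the (hypothetical) send_request coroutine answers in time."""
    try:
        await asyncio.wait_for(send_request(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False

# Demo with stand-in coroutines instead of a real Janus request:
async def fast():   # responds immediately, like a healthy server
    return "ok"

async def slow():   # hangs past the timeout, like a deadlocked server
    await asyncio.sleep(60)

async def main():
    print(await probe(fast))               # True
    print(await probe(slow, timeout=0.1))  # False: time to grab gdb backtraces

asyncio.run(main())
```

Alerting the moment the probe first fails lets you attach gdb (or collect `docker stats`) while the deadlock is live, which is what the maintainer's suggestions above require.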