[bug]: FATAL: unhandled exception PJLIB/No memory! #727
Comments
It's not a memory issue. It's trying to allocate an absurdly large amount, most likely because it read a value from freed memory. Unless it can be reproduced, it will probably be hard to isolate and identify. The backtrace shows it has to do with a transport, though, so we'd need information about the transports in use, along with a full backtrace[1].

[1] https://docs.asterisk.org/Development/Debugging/Getting-a-Backtrace/?h=backtrace
Thanks @jcolp. Yes, I probably should have mentioned the insanely large allocation size ;) Here is my current transport configuration, which is used by all endpoints on the system. I'll run ast_coredumper in a few minutes to get those details.
@mikepultz If you still have that coredump around, I'd be interested in the results of the following...
Yup
@mikepultz Thanks. That just confirms that either the "src" pointer parameter or the contents of the location it points to are corrupted. It looks like the build is optimized, but the full coredump may help anyway.
Hey @gtjoseph, that --tarball-coredumps option is pretty aggressive: it looks like it takes a copy of most of my instance, including the contents of root's and users' home directories. I can't share that data from a production system. Would a tar of the Asterisk binaries, all the libraries on the system, and the core dump give you what you need? I definitely don't have the debug symbols on the system; it's a custom build that we package for our environment. I can include the build string as well if that's helpful. Mike
Eh what??? --tarball-coredumps should only grab the coredump itself, the *.txt files, the asterisk binary, the modules, and /etc/os-release. It should never touch root's or users' home directories or anything else, not even /etc/asterisk. I know it works fine on RHEL, but I wonder if Amazon Linux does something goofy with the directory layout. In any case, what we'd really need is the asterisk binary and modules, the accompanying debug symbols if the binaries are stripped, the coredump itself, and /etc/os-release. From that we can usually spin up a matching Docker container, copy in the binaries and symbols, and run gdb. The symbols are really important, though. Are you certain they're not available? You wouldn't have been able to run that gdb command snippet I gave you without them. What does the
OK, so it's doing something really weird, then. It was taking a while to run, so I checked the process list, which showed:

And when I looked in that /tmp directory, I saw:
The main issue is that we use our own RPM package for our systems. I'll see if I can build a debuginfo RPM for our package and then tar everything up to send over. Mike
That's weird. I'm sure it has nothing to do with this issue, but I'd like to figure out why ast_coredumper isn't working in your environment. What AMI ID are you using for Amazon Linux 2? Are you doing anything special with the filesystem layout or the Asterisk installation directories? Also, the core-asterisk-2024-04-30T15-11-04Z-info.txt file has no useful info in it, so can you tell me the exact version of Asterisk you're running? If you're building from git, the commit ID would work. Are you applying your own patches to the source?
Hey @gtjoseph, I just emailed a Dropbox link with the files. Hopefully it has everything you need; if anything is missing, or something else would make it easier, just let me know and I'll see if I can provide it.

RE: ast_coredumper: the Dropbox link I sent includes the full Asterisk install directory from my system (including ast_coredumper), a .patch file with all the changes I make, and all the config files, laid out as they are on my system. Hopefully that gives you what you need. Mike
Severity: Major
Versions: 21.2.0
Components/Modules: pjproject
Operating Environment: Amazon Linux 2 (RHEL)
Frequency of Occurrence: One Time
Issue Description
One of our Asterisk instances crashed today when trying to allocate memory to a pj_pool:
[2024-04-30 11:11:04] ERROR[28315]: pjproject:<?>: except.c ..!!!FATAL: unhandled exception PJLIB/No memory!
This is the first time it has happened. We have six identical, load-balanced instances that have been running for about a month, and so far it has happened only once.
The system is not experiencing memory issues, and it doesn't appear to be a memory leak: there is no visible downward trend over time in our available-memory graphs.
The servers are not under high load; there were only around 120 active calls when this happened. I've included a backtrace from the core dump.
It's a production device, so I'm not able to run it under Valgrind, but I can probably provide some redacted config files if that's helpful.
Relevant log output