Memory usage increases everytime tls.reload is executed #3823

Open
denzs opened this issue Apr 24, 2024 · 9 comments

denzs commented Apr 24, 2024

Description

We are using Kamailio 5.7.4 on Debian 12 (from http://deb.kamailio.org/kamailio57) with rtpengine as an edge proxy for our clients. The instance terminates SIP/TLS (with client certificates) and forwards the SIP traffic to internal systems.

After a few days we get errors like this:
tls_complete_init(): tls: ssl bug #1491 workaround: not enough memory for safe operation: shm=7318616 threshold1=8912896

At first we thought Kamailio simply didn't have enough memory, so we doubled it.

But after a few days the log message (and the user issues) occurred again.

So we monitored the shmmem statistics and found that used and max_used grow constantly until they reach the limit.

As mentioned, we are using client certificates, and so we are also using the CRL feature.
We have a systemd timer which fetches the CRL every hour and runs 'kamcmd tls.reload' when finished.

Our tls.cfg looks like this:

[server:default]
method = TLSv1.2+
private_key = /etc/letsencrypt/live/hostname.de/privkey.pem
certificate = /etc/letsencrypt/live/hostname.de/fullchain.pem
ca_list = /etc/kamailio/ca_list.pem
ca_path = /etc/kamailio/ca_list.pem
crl = /etc/kamailio/combined.crl.pem
verify_certificate = yes
require_certificate = yes

[client:default]
verify_certificate = yes
require_certificate = yes

After some testing we found that every time tls.reload is executed, Kamailio consumes a bit more memory. Eventually all the memory is consumed, which causes issues for our users.

See following example:

[0][root@edgar-dev:~]# while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd core.shmmem ; sleep 1 ; done
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 223001520
	used: 41352552
	real_used: 45433936
	max_used: 45445968
	fragments: 73
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 222377960
	used: 41975592
	real_used: 46057496
	max_used: 46069232
	fragments: 78
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 221748664
	used: 42604992
	real_used: 46686792
	max_used: 46698080
	fragments: 77
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 221110832
	used: 43242408
	real_used: 47324624
	max_used: 47335608
	fragments: 81
}
^C
[130][root@edgar-dev:~]# 
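
To make the growth per reload easier to read, the `used` values in the output above can be turned into deltas. The following is a minimal sketch, not part of the original report; the function name `shm_used_deltas` is hypothetical, and it assumes the `kamcmd core.shmmem` output format shown above.

```shell
#!/bin/sh
# Print the growth of "used" between consecutive core.shmmem dumps.
# Reads text in the format shown above from stdin.
# shm_used_deltas is a hypothetical helper name, not a Kamailio tool.
shm_used_deltas() {
    awk '
        # Match only the "used:" field, not "real_used:" or "max_used:".
        $1 == "used:" {
            if (seen) print $2 - prev   # bytes gained since the previous dump
            prev = $2
            seen = 1
        }
    '
}
```

Fed with the four dumps above, this prints 623040, 629400 and 637416, so each reload in this run costs a bit more than 600 KB of shared memory.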

Troubleshooting

Reproduction

Every time tls.reload is called, the memory consumption grows.

Debugging Data

If you let me know what would be useful for tracking this down, I am happy to provide logs/debugging data!

Log Messages

If you let me know what would be useful for tracking this down, I am happy to provide logs/debugging data!

SIP Traffic

SIP traffic doesn't seem to be relevant here.

Possible Solutions

Call tls.reload less often, or restart Kamailio before the memory is consumed ;)
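
For planning such a restart, the remaining headroom can be estimated with simple integer arithmetic. This is only a sketch under stated assumptions: `reloads_left` is a hypothetical helper, the per-reload growth is taken as constant, and the optional third argument is a safety margin such as the `threshold1` value from the log message.

```shell
#!/bin/sh
# Estimate how many more reloads fit into the remaining shared memory.
#   $1 = free bytes (from core.shmmem)
#   $2 = bytes lost per reload (assumed constant, which is a simplification)
#   $3 = safety margin in bytes (optional, defaults to 0)
# reloads_left is a hypothetical helper name.
reloads_left() {
    free=$1
    per_reload=$2
    threshold=${3:-0}
    if [ "$per_reload" -le 0 ]; then
        echo "per-reload growth must be positive" >&2
        return 1
    fi
    if [ "$free" -le "$threshold" ]; then
        echo 0
        return 0
    fi
    echo $(( (free - threshold) / per_reload ))
}
```

With the dev numbers from this report, `reloads_left 223001520 630000 8912896` gives 339, which at one reload per hour is about two weeks. Production values will differ.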

Additional Information

version: kamailio 5.7.4 (x86_64/linux) 
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown 
compiled with gcc 12.2.0
Operating System:
* Debian GNU/Linux 12 (bookworm)
* Linux edgar-dev 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

@denzs
Author

denzs commented May 7, 2024

I just realized that I forgot to mention: in addition to the logged error message, our clients start to get connection issues as well, so we have to restart Kamailio ASAP in that case.

@sergey-safarov
Member

@denzs do you have a monitoring tool? Prometheus + Grafana graphs?

@miconda
Member

miconda commented May 7, 2024

Probably this part has to be reviewed. First, the TLS reload was initially designed to be done rather rarely, when the certificates expire. The CRL feature was also not much in use, at least in my experience so far; most deployments use server-side certificates only.

Furthermore, I am not sure if old certificates can be cleared right away after the reload: existing connections are not closed, and there might still be references to their certificates.

Are you doing the reload only if the content of the CRL or certificate files has changed, or is the reload done regardless?

@denzs
Author

denzs commented May 7, 2024

@sergey-safarov yes we do :)
image

@miconda at the moment we run tls.reload unconditionally and quite frequently to ensure the CRLs are up to date. Of course we could check whether the CRL changed, but from my point of view that would only delay the necessary restart of Kamailio.
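
A check of that kind could look roughly like the sketch below. It is a mitigation only and does not address the leak itself. The CRL path is the one from the tls.cfg above; `STATE_FILE`, `RELOAD_CMD` and the function name are hypothetical, and the script assumes that `kamcmd` returns a non-zero exit status when the reload fails.

```shell
#!/bin/sh
# Reload TLS only when the CRL content changed since the last successful reload.
# CRL_FILE is the path from tls.cfg; STATE_FILE is a hypothetical location.
CRL_FILE=${CRL_FILE:-/etc/kamailio/combined.crl.pem}
STATE_FILE=${STATE_FILE:-/var/lib/kamailio/crl.sha256}
RELOAD_CMD=${RELOAD_CMD:-/usr/sbin/kamcmd tls.reload}

reload_if_crl_changed() {
    [ -r "$CRL_FILE" ] || { echo "cannot read $CRL_FILE" >&2; return 1; }
    new_sum=$(sha256sum < "$CRL_FILE" | cut -d' ' -f1)
    old_sum=$(cat "$STATE_FILE" 2>/dev/null)
    if [ "$new_sum" = "$old_sum" ]; then
        echo "CRL unchanged, skipping reload"
        return 0
    fi
    # Store the checksum only after the reload command succeeded
    # (assumes a non-zero exit status on failure).
    $RELOAD_CMD && printf '%s\n' "$new_sum" > "$STATE_FILE"
}
```

The systemd timer would then call `reload_if_crl_changed` after fetching the CRL instead of calling `kamcmd tls.reload` directly.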

@denzs
Author

denzs commented May 7, 2024

image
This screenshot is from our dev environment (with no TLS clients connected) running:

while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done

Watching the core.shmmem output in parallel looks like this:

Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 1894256
	used: 262444424
	real_used: 266541200
	max_used: 266550968
	fragments: 85
}
error: 500 - Error while fixing TLS configuration (consult server log)
{
	total: 268435456
	free: 1208784
	used: 263491296
	real_used: 267226672
	max_used: 268435208
	fragments: 11749
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: -9223372036854776
	used: 267589696
	real_used: 271686888
	max_used: 271696928
	fragments: 87
}

@sergey-safarov
Member

Could you compare it with a graph from our server covering the last 60 days with about 25 WebRTC clients?
image

and
image

Here we use Kamailio 5.7.2 with Let's Encrypt certificates.
The certificate is reloaded once every two months. We do not use a CRL.
To avoid reloading the certificate too often, we compare the currently used certificates with the latest ones using commands like:

    # Dry run: list the files that differ between source and target.
    rsync -l --recursive --info=name --dry-run "${LECRTSDIR}" "${LETARGETDIR}" >"${CHKUPDLOG}"
    # Synchronize certificates only if the dry run reported changes.
    if [ ! -s "${CHKUPDLOG}" ]; then
        echo "Check updates. No changes required"
    else
        echo "Has new certificates. Start sync"
        rsync -azlcv --recursive --delete --info=name "${LECRTSDIR}" "${LETARGETDIR}" >"${SYNCLOG}"
    fi
    rm -f "${CHKUPDLOG}"

@denzs
Author

denzs commented May 8, 2024

The problem actually occurred after we added the CRL some weeks ago; without the CRL there was no such behaviour.
And of course there are plenty of options to mitigate the issue, or rather reduce its probability, by doing fewer reloads: lengthening the reload interval and/or checking whether the CRL changed at all.

Anyhow, I thought raising an issue makes sense, because from my point of view there is definitely a memory leak when using tls.reload in combination with a CRL.

@henningw
Contributor

henningw commented May 8, 2024

If it happens only after adding a CRL, it does indeed look like an issue in this code path. After all, using a CRL is probably quite rare.

@henningw henningw assigned henningw and xkaraman and unassigned henningw May 8, 2024
@xkaraman
Contributor

xkaraman commented May 9, 2024

After some time debugging, I could reproduce the memory increase when using a CRL and tls.reload.

One possible culprit shows up in the memory statistics printed periodically while while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done is running:

INFO: qm_sums: qm_sums():  count=  5288 size=    183440 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17378 size=   1275712 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums():  count=  5341 size=    242768 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17325 size=   1381936 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums():  count=  5331 size=    248544 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17335 size=   1422112 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums():  count=  5360 size=    290560 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17306 size=   1466000 bytes from tls: tls_init.c: ser_malloc(364)

Memory here increases until we exhaust the shared memory max allocation and then tls.reload fails.

Some notes:
When using tls.reload without a CRL, I didn't see any notable increase in memory usage. The allocations noted above stay steady at around:

count=  9415 size=    948432 bytes from tls: tls_init.c: ser_malloc(364)
count=  1011 size=    151408 bytes from tls: tls_init.c: ser_realloc(372)
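
To compare such dumps over a longer run, the qm_sums lines can be reduced to one summary line per allocation site. This is a rough sketch that assumes the log format shown above; `qm_sums_growth` is a hypothetical helper name, not part of Kamailio.

```shell
#!/bin/sh
# Summarise how the allocated size per call site changes across qm_sums dumps.
# Reads log lines in the format shown above from stdin.
# qm_sums_growth is a hypothetical helper name.
qm_sums_growth() {
    awk '
        /bytes from / && match($0, /size= *[0-9]+/) {
            size = substr($0, RSTART, RLENGTH)
            sub(/size= */, "", size)
            # Everything after "bytes from " identifies the call site.
            site = substr($0, index($0, "bytes from ") + 11)
            if (!(site in first)) { first[site] = size; order[++n] = site }
            last[site] = size
        }
        END {
            # Print first and last observed size per site, in order of appearance.
            for (i = 1; i <= n; i++) {
                s = order[i]
                printf "%s: %d -> %d (%+d bytes)\n", s, first[s], last[s], last[s] - first[s]
            }
        }
    '
}
```

For the four dumps quoted above, this reports growth of 107120 bytes for ser_realloc(372) and 190288 bytes for ser_malloc(364).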
