
tcpops module event_route[tcp:closed] route block $conid is null #3767

Open
liangbaolin opened this issue Feb 26, 2024 · 3 comments
liangbaolin commented Feb 26, 2024

Description

While using Kamailio version 5.7.4, I found that $conid cannot be obtained correctly in the tcpops module's event_route[tcp:closed] block. I am certain that it works normally in version 5.4.8.

Troubleshooting

 pv [pv_core.c:2725]: pv_get_tcpconn_id()

Reproduction

In the test scenario, a client performed a REGISTER registration over TCP; after the TCP connection was disconnected, this event was triggered.

Log Messages

2024-02-26T01:35:55.902002512Z 52(98) DEBUG: <core> [core/receive.c:635]: receive_msg(): cleaning up
2024-02-26T01:35:55.902034482Z 52(98) DEBUG: <core> [core/tcp_read.c:280]: tcp_read_data(): EOF on connection 0x7f99a1546a80 (state: 0, flags: c018) - FD 24, bytes 0, rd-flags 10000 ([12.0.1.6]:15821 -> [12.0.1.6]:5060)
52(98) DEBUG: <core> [core/tcp_read.c:343]: tcp_read(): read 0 bytes:
2024-02-26T01:35:55.902051542Z
2024-02-26T01:35:55.902058282Z 52(98) DEBUG: <core> [core/tcp_read.c:1532]: tcp_read_req(): read= 0 bytes, parsed=0, state=0, error=1
2024-02-26T01:35:55.902063282Z 52(98) DEBUG: <core> [core/tcp_read.c:1534]: tcp_read_req(): last char=0x00, parsed msg=
2024-02-26T01:35:55.902069402Z
2024-02-26T01:35:55.902074032Z 52(98) DEBUG: <core> [core/tcp_read.c:1544]: tcp_read_req(): EOF
2024-02-26T01:35:55.902101372Z 52(98) DEBUG: <core> [core/io_wait.h:598]: io_watch_del(): DBG: io_watch_del (0x559e8a029740, 24, -1, 0x10) fd_no=2 called
2024-02-26T01:35:55.902153162Z 52(98) DEBUG: <core> [core/tcp_read.c:1927]: handle_io(): removing from list 0x7f99a1546a80 id 2 fd 24, state 2, flags c018, main fd 134, refcnt 2 ([12.0.1.6]:15821 -> [12.0.1.6]:5060)
2024-02-26T01:35:55.902168112Z 52(98) DEBUG: <core> [core/tcp_read.c:1702]: release_tcpconn(): releasing con 0x7f99a1546a80, state -1, fd=24, id=2 ([12.0.1.6]:15821 -> [12.0.1.6]:5060)
2024-02-26T01:35:55.902173372Z 52(98) DEBUG: <core> [core/tcp_read.c:1705]: release_tcpconn(): extra_data (nil)
2024-02-26T01:35:55.902208402Z 83(129) DEBUG: <core> [core/tcp_main.c:3747]: handle_tcp_child(): reader response= 7f99a1546a80, -1 from 1
2024-02-26T01:35:55.902224742Z 83(129) DEBUG: <core> [core/tcp_main.c:3904]: handle_tcp_child(): case CONN_EOF tcp_emit_closed_event before 5
2024-02-26T01:35:55.902229802Z 83(129) DEBUG: <core> [core/tcp_main.c:3671]: tcp_emit_closed_event(): TCP closed event creation triggered (reason: 0)
2024-02-26T01:35:55.902318572Z 83(129) DEBUG: tcpops [tcpops.c:311]: tcpops_handle_tcp_closed(): received TCP closed event
2024-02-26T01:35:55.902331352Z 83(129) DEBUG: tcpops [tcpops.c:261]: tcpops_tcp_closed_run_route(): event reason id: 0 rt: 1
2024-02-26T01:35:55.902335132Z 83(129) ERROR: tcpops [tcpops.c:278]: tcpops_tcp_closed_run_route(): run_top_route() before tev->id: 2
2024-02-26T01:35:55.902480972Z 83(129) DEBUG: <core> [core/tcp_main.c:1713]: _tcpconn_find(): 2  port 0
2024-02-26T01:35:55.902493872Z 83(129) DEBUG: pv [pv_core.c:2725]: pv_get_tcpconn_id(): no conid
2024-02-26T01:35:55.902498762Z 83(129) DEBUG: pv [pv_core.c:2726]: pv_get_tcpconn_id(): msg->rcv.proto_reserved1: 2
2024-02-26T01:35:55.902503592Z 83(129) DEBUG: <core> [core/tcp_main.c:1713]: _tcpconn_find(): 2  port 0
2024-02-26T01:35:55.902508262Z 83(129) DEBUG: <core> [core/pvapi.c:1418]: pv_printf_mode(): final buffer length 111
2024-02-26T01:35:55.902513422Z 83(129) ERROR: <script>: tcp connection closed conid: (<null>), tcp(conid): <null>, Ri: 12.0.1.185, Rp: 5060 , si: 12.0.1.6 , sp: 15821
2024-02-26T01:35:55.902607692Z 83(129) DEBUG: <core> [core/tcp_main.c:1713]: _tcpconn_find(): 2  port 0
2024-02-26T01:35:55.902625622Z 83(129) DEBUG: pv [pv_core.c:2725]: pv_get_tcpconn_id(): no conid
2024-02-26T01:35:55.902630672Z 83(129) DEBUG: pv [pv_core.c:2726]: pv_get_tcpconn_id(): msg->rcv.proto_reserved1: 2
2024-02-26T01:35:55.902635822Z 83(129) ERROR: tcpops [tcpops.c:280]: tcpops_tcp_closed_run_route(): run_top_route() after tev->id: 2
2024-02-26T01:35:58.777642759Z 41(87) DEBUG: ims_dialog [dlg_handlers.c:2026]: print_all_dlgs(): ********************
41(87) DEBUG: ims_dialog [dlg_handlers.c:2027]: print_all_dlgs(): printing 65536 dialogs

SIP Traffic

My test script block:
event_route[tcp:closed] {
        xlog("L_ERR", "$proto connection closed conid: ($conid), tcp(conid): $tcp(conid), Ri: $Ri, Rp: $Rp , si: $si , sp: $sp\n");
}

file: src/modules/pv/pv_core.c
int pv_get_tcpconn_id(struct sip_msg *msg, pv_param_t *param, pv_value_t *res)
{
...

        if((con = tcpconn_get(msg->rcv.proto_reserved1, 0, 0, 0, 0)) == NULL) /* returns NULL here */
        {
                LM_DBG("no conid\n"); /* additional custom log */
                LM_DBG("msg->rcv.proto_reserved1: %d\n", msg->rcv.proto_reserved1); /* additional custom log */
                return pv_get_null(msg, param, res);
        }
....
}

file: src/core/tcp_main.c
struct tcp_connection *tcpconn_lookup(int id, struct ip_addr *ip, int port,
		union sockaddr_union *local_addr, int try_local_port, ticks_t timeout)
{
	 ...
	if(unlikely(c == NULL)) {
		c = _tcpconn_find(id, ip, port, &local_ip, local_port); /* returns c == NULL */
	}
	...
}

struct tcp_connection *_tcpconn_find(
		int id, struct ip_addr *ip, int port, struct ip_addr *l_ip, int l_port)
{

	...
	if(likely(id)) {
		hash = tcp_id_hash(id);
		for(c = tcpconn_id_hash[hash]; c; c = c->id_next) { /* lookup in tcpconn_id_hash finds nothing, so c ends up NULL */
#ifdef EXTRA_DEBUG
			LM_DBG("c=%p, c->id=%d, port=%d\n", c, c->id, c->rcv.src_port);
			print_ip("ip=", &c->rcv.src_ip, "\n");
#endif
	....
}

inline static int tcpconn_try_unhash(struct tcp_connection *tcpconn)
{
...
			_tcpconn_detach(tcpconn); /* this is executed */
...
			lock_release(&tcpconn->write_lock);
...
}

static inline void _tcpconn_detach(struct tcp_connection *c)
{
	...
	tcpconn_listrm(tcpconn_id_hash[c->id_hash], c, id_next, id_prev); /* removes c from the id hash */
	...
}

Possible Solutions

Based on these log lines:
2024-02-26T01:35:55.902224742Z 83(129) DEBUG: <core> [core/tcp_main.c:3904]: handle_tcp_child(): case CONN_EOF tcp_emit_closed_event before 5
2024-02-26T01:35:55.902229802Z 83(129) DEBUG: <core> [core/tcp_main.c:3671]: tcp_emit_closed_event(): TCP closed event creation triggered (reason: 0)
I located the relevant code:
before:
                        if(tcpconn_try_unhash(tcpconn))
                                tcpconn_put(tcpconn);
                        LM_DBG("case CONN_EOF tcp_emit_closed_event before 5\n"); /* custom log */
                        tcp_emit_closed_event(tcpconn);
                        tcpconn_put_destroy(tcpconn); /* deref & delete if refcnt==0 */
after (my try at a fix):
                        tcp_emit_closed_event(tcpconn); /* changed execution order: emit the event first, unhash afterwards */
                        if(tcpconn_try_unhash(tcpconn))
                                tcpconn_put(tcpconn);
                        LM_DBG("case CONN_EOF tcp_emit_closed_event before 5\n"); /* custom log */
                        /* tcp_emit_closed_event(tcpconn); */
                        tcpconn_put_destroy(tcpconn); /* deref & delete if refcnt==0 */

I am not sure whether this change causes other issues, but with it the closed event is triggered normally at runtime. Other events likely have the same problem and would need a similar reordering.
Is there a better solution?

Additional Information

  • Kamailio Version - output of kamailio -v
version: kamailio 5.7.4 (x86_64/linux) 9920d9-dirty
flags: EXTRA_DEBUG, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 9920d9-dirty
compiled on 09:59:52 Feb 26 2024 with gcc 7.5.0

github-actions bot commented Apr 9, 2024

This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.

github-actions bot added the Stale label Apr 9, 2024
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 23, 2024
henningw reopened this Apr 23, 2024
miconda commented Apr 23, 2024

@liangbaolin: I pushed a few commits to address this one, the last being f5c05a6. Can you try with the git master branch and see if it is OK? If yes, the patches will be backported.


github-actions bot commented Jun 5, 2024

This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.

github-actions bot added the Stale label Jun 5, 2024