Issue with usrloc 'handle_lost_tcp' and dmq_usrloc 'usrloc_delete' #3479

Open
alternatenetwork opened this issue Jun 6, 2023 · 4 comments
@alternatenetwork

alternatenetwork commented Jun 6, 2023

Description

When a TLS connection is closed and handle_lost_tcp=1, the entry is deleted from usrloc. This deletion is not synced via DMQ, even though usrloc_delete=1 is set for dmq_usrloc.

There is also an issue on the same server: if a re-registration arrives after the entry was deleted by handle_lost_tcp (but before the registration was due to expire), save("location", "0x01") returns 2 even though the usrloc table is empty. So it seems that the way handle_lost_tcp deletes the entry does not fully remove it as far as registrar is concerned.

usrloc db_mode is 0
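
For reference, a minimal sketch of the module parameters described above. The DMQ addresses are placeholders (an assumption, not taken from the actual setup), and the rest of the configuration is omitted.

```
# usrloc: in-memory only, with lost TCP/TLS connection handling enabled
modparam("usrloc", "db_mode", 0)
modparam("usrloc", "handle_lost_tcp", 1)

# dmq: placeholder addresses (assumption), to be adjusted on each node
modparam("dmq", "server_address", "sip:192.0.2.1:5060")
modparam("dmq", "notification_address", "sip:192.0.2.2:5060")

# dmq_usrloc: replicate usrloc changes to the peer node, including deletes
modparam("dmq_usrloc", "enable", 1)
modparam("dmq_usrloc", "usrloc_delete", 1)
```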

Troubleshooting

Reproduction

Set up 2 servers with usrloc and dmq_usrloc.

  1. Register with expire=120.
  2. Kill the TCP connection.
  3. handle_lost_tcp deletes the registration from the first server's usrloc table when, let's say, expire=100.
  4. The registration still exists in the 2nd server's usrloc table.
  5. Re-register to the same server while the 2nd server's entry is still at expire=100.
  6. save("location", "0x01") returns 2.
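
The return code in step 6 can be observed by logging it in the registrar route. This is only a sketch: the route name REGISTRAR is hypothetical, and it assumes the xlog and textops modules are loaded. According to the registrar documentation, save() returns 1 when contacts were inserted and 2 when contacts were updated.

```
route[REGISTRAR] {
    if (!is_method("REGISTER")) return;

    # 0x01: save contacts in memory only, no DB operation
    save("location", "0x01");

    # $rc holds the return code of the last function call:
    # 1 = contacts inserted, 2 = contacts updated
    xlog("L_INFO", "save() returned $rc for $tu\n");
    exit;
}
```

With the steps above, a fresh registration would be expected to log 1, while the re-registration in step 5 logs 2.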

Additional Information

  • Kamailio Version - output of kamailio -v
version: kamailio 5.6.1 (x86_64/linux) d8f98b
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: d8f98b 
compiled on 19:05:04 Aug 16 2022 with gcc 8.3.0
  • Operating System:
Debian 10
@henningw henningw added the bug label Jun 8, 2023
@henningw
Contributor

henningw commented Jun 8, 2023

Thanks for the report, will have a look

@henningw
Contributor

We have managed to reproduce it. It's actually quite a tricky bug: under certain conditions the node synchronises the data back to itself.

Could you please check the output of kamcmd dmq.list_nodes? It should show the nodes only once. In our broken scenario it was showing the nodes two times. We will now investigate why this happens.

@alternatenetwork
Author

alternatenetwork commented Jun 16, 2023

@henningw I don't think we are talking about the same issue here. I don't see any data being synchronized anywhere. I checked kamcmd dmq.list_nodes but I don't see the same issue you are talking about.

Our issue is that when the handle_lost_tcp logic runs, it does not delete the entry. Instead it labels the entry as "deleted" and leaves it for a timer to remove after 10 seconds or so. This process of marking the entry "deleted" and then removing it is not synced via DMQ. It also causes issues with save("location"): when a re-registration arrives during those 10 seconds, while the entry is labeled "deleted" but not yet removed, the registrar treats it as an update instead of a new registration.

I propose changing the handle_lost_tcp logic so that it uses the same logic as unregister("location"), since the unregister command correctly removes the entry from the registrar table and correctly syncs via DMQ.

@henningw
Contributor

You are right, there are probably three issues here:

  1. If DMQ is somehow synchronizing with itself, the deletion on a lost TCP connection will not work at all.
  2. The deletion on a lost TCP connection is not synchronized via DMQ.
  3. If an entry was marked as expired because of a lost TCP connection, and a new REGISTER is received before the timer has run to delete it, the REGISTER updates the existing registration.

Regarding 1: this is a separate topic, which we are looking into right now.
Regarding 2: yes, the logic could be changed to actually remove/unregister the entry right away. This would also fix issue 3.
We will look into that as well.
