Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGABRT after tcp_close_handler derefs conn one too many times (race condition on socket close) #261

Open
balgillo opened this issue Aug 31, 2022 · 2 comments

Comments

@balgillo
Copy link

balgillo commented Aug 31, 2022

I have observed the following issue in re:

  1. Send a SIP message and store a reference to it.
    • Note: conn_send allocates a struct sip_conn with nrefs=1 (from the allocation, zero refs from adding it to ht_conn). Then the struct sip_msg stores another reference so nrefs=2.
  2. Dereference the sip struct (sip_close).
  3. Block main thread for a couple of seconds until remote party closes the socket cleanly.
  • This can occur when debug logging is enabled with a slow logging callback.
  • In the meantime, the remote party closes the socket, tcp_close_handler dereferences the connection (now nrefs=1) and sip_close flushes ht_conn which dereferences it again (now nrefs=0 and the connection is freed).
  1. Dereference the SIP message
  • but struct sip_conn has already been freed.

Result: SIGABRT after:

mem: mem_deref: magic check failed 0xb5b5b5b5 (0x120d5c0)

Reason: the connection struct has been dereferenced once by tcp_close_handler and once too many by sip_close flushing sip->ht_conn.

I can fix this by removing the call to mem_deref from tcp_close_handler in transp.c. But, is that a good change? Could it cause memory leaks?

P.S. this issue is actually occurring in a custom fork of an older version of re. I don't know for sure that it affects the latest re, but the relevant code looks the same.

@balgillo
Copy link
Author

balgillo commented Sep 5, 2022

I've been looking into this further and I now have a clearer explanation and proposed fix.

Normally, connections are cleared up by hash_flush(sip->ht_conn), which unlinks them from the conn list and also mem_deref's them.

The cleanup code in transp.c does something similar to hash_flush:

conn_close(conn); // this calls hash_unlink(&conn->he)
mem_deref(conn);

However, if hash_flush(sip->ht_conn) has already been called before this code, then hash_unlink(&conn->he) does nothing, but this code still dereferences conn one more time. So this is one too many times and we get a crash.

Proposed solution: rename the conn_close function to conn_close_deref and at the bottom of it:

  if (conn->he.list != NULL)
  {
    // do what hash_flush would do.
    hash_unlink(&conn->he);
    mem_deref(conn);
  }

Then call conn_close_deref(conn) instead of the two line pattern above.

@balgillo
Copy link
Author

balgillo commented Sep 6, 2022

I've made a PR with a fix. I hope someone can take a look and approve it.

#262

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant