-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nameserver_probe_callback() accesses a nameserver object freed by evdns_base_clear_nameservers_and_suspend() #1506
Comments
Thank you for such a detailed report! I don't remember the code precisely, but likely this simple fix will not work in multi-threaded environment, when events of the event_base could be processed from multiple threads, since in this case this deferred callback can be runned in parallel with It is "ok" right now because This can be fixed by adding some refcnt (like the code for bufferevent does), but this will require locks right now (since atomic is "on pause" - #1378), though unlikely this will be a hot path
Out of curiosity, what is libsdwan? |
oh, I meant to say we are using an older version of libevent :-) libsdwan is our internal library. |
@stablebits you were writing about the fix, any progress? or we can close this? |
@azat my last comment was incorrect and because there was no reaction from you yet, I simply deleted it. Below is my initial take (I got distracted by other activitie, so this hasn't been tested yet): Comments:
However, there are 2 issues. As you mentioned, while
ready to access the destroyed In order to address these 2 issues, this patch wraps the code accessing It may well be that I missed some cases though. We'll run this code on our systems for a 1-2 weeks to see whether it solves the problem (there are a few crashes per day on average). |
Hi @stablebits, sorry for the delay I couldn't allocate time for libevent for quit a long time...
That is correct.
Yes, your analysis looks correct. Can you please submit your patch as a PR? With a few comments:
Apart from it, great analysis, thank you! |
We are using an older version of libsdwan but, as far as I can see, the current latest upstream version seems to have the same issue.
What we see (sentry reports, didn't reproduce locally yet):
reply_run_callback()
->nameserver_probe_callback()
-> crashevdns_base_clear_nameservers_and_suspend()
was called right before the crash.What I see in the code:
Before destroying
nameserver
objects,evdns_base_clear_nameservers_and_suspend()
doeshttps://github.com/libevent/libevent/blob/c9ec6aafb6a025f03636ae2ae7c18a2d4b8a7ed8/evdns.c#L3134C1-L3137C4
but
evdns_cancel_request()
exits whenpending_cb
is setlibevent/evdns.c
Line 3655 in c9ec6aa
which means that
reply_schedule_callback()
was already called for this request and the callback is waiting to be run by the event loop.Once it runs,
reply_run_callback()
->nameserver_probe_callback()
runs and accesses the freednameserver
object (which is either still a freed or reused block of memory at this time).A quick/partial fix could probably be something like this:
reply.data.raw
andreply.cname
are still freed byreply_run_callback()
but the object is potentially fragile in this case (e.g. perhaps do a cleanup when resettinghave_reply
).Does this scenario look plausible?
The text was updated successfully, but these errors were encountered: