rpc: Improve rpc clnt connection cleanup process #4329
base: devel
/run regression
/run regression
/run regression
0 test(s) failed, 1 test(s) generated core, 1 test(s) needed retry, 1 flaky test(s) marked as success even though they failed
During the first rpc clnt submission we take an rpc reference and register the call_bail function with the timer thread. The timer thread calls call_bail every 10s. When a client triggers a shutdown request, it calls rpc_clnt_connection_cleanup to clean up the rpc connection. The cleanup cannot release the rpc connection successfully because the cleanup_started flag has already been set by the upper xlator. The rpc reference is released only after call_bail is triggered, so if call_bail fires just before the shutdown process starts, the application has to wait 10s for the rpc connection to be cleaned up, which slows down the process.

Solution: Unref the rpc object based on the conn->timer/conn->reconnect pointer values, as is already done for ping_timer. These pointers are always modified under the critical section, so a valid pointer implies that the rpc reference is also valid.

Fixes: gluster#4320
credits: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: Ib947b8bfcbe1b49e1ed05a50a84de6f92afbca13
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
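To illustrate the idea in the solution above, here is a minimal standalone sketch, not the actual patch. All names (`sketch_conn`, `sketch_timer`, `sketch_cleanup`, `sketch_unref`) are hypothetical stand-ins rather than GlusterFS types or functions, and the sketch assumes that each armed timer holds exactly one reference on the rpc object. The cleanup path inspects the timer pointers under the lock, cancels whichever timers are still armed, and drops the matching references immediately instead of waiting for the next timer callback.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer handle; not a GlusterFS type. */
struct sketch_timer {
    int cancelled;
};

/* Hypothetical stand-in for the connection and the rpc object it pins. */
struct sketch_conn {
    pthread_mutex_t lock;
    struct sketch_timer *timer;     /* call-bail timer, NULL when not armed */
    struct sketch_timer *reconnect; /* reconnect timer, NULL when not armed */
    int refcount;                   /* references held on the rpc object */
};

/* Drop one reference on the rpc object. */
void sketch_unref(struct sketch_conn *conn)
{
    pthread_mutex_lock(&conn->lock);
    assert(conn->refcount > 0);
    conn->refcount--;
    pthread_mutex_unlock(&conn->lock);
}

/*
 * Cancel any armed timers and release the reference each one holds.
 * The pointers are only read and cleared under the lock, so a non-NULL
 * pointer is taken to mean the corresponding reference is still held
 * (an assumption of this sketch). Returns the number of references dropped.
 */
int sketch_cleanup(struct sketch_conn *conn)
{
    int unref = 0;

    pthread_mutex_lock(&conn->lock);
    if (conn->timer != NULL) {
        conn->timer->cancelled = 1;
        conn->timer = NULL;
        unref++;
    }
    if (conn->reconnect != NULL) {
        conn->reconnect->cancelled = 1;
        conn->reconnect = NULL;
        unref++;
    }
    pthread_mutex_unlock(&conn->lock);

    /* Release the references outside the critical section. */
    for (int i = 0; i < unref; i++)
        sketch_unref(conn);

    return unref;
}
```

Because the pointers are cleared in the same critical section that decides how many references to drop, a second call to `sketch_cleanup` finds both pointers NULL and releases nothing, so the references are not dropped twice.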
/run regression
1 test(s) failed, 0 test(s) generated core, 3 test(s) needed retry
/run regression
1 test(s) failed, 0 test(s) generated core, 1 test(s) needed retry