Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly exit ShuffleWorker process when terminate future finished #75

Open
Aitozi opened this issue Jul 24, 2022 · 1 comment
Open

Comments

@Aitozi
Copy link
Contributor

Aitozi commented Jul 24, 2022

In our usage, we encounter a case where the shuffle worker registers timeout and triggers a fatal error, but the shuffle worker process does not exit and this leads to no new worker being spawned to replace the current one .

The reason behind this is that the shuffle worker will execute closeAsync and shutdown all the component services. Obviously, the process will exit after all the non-daemon threads exit. But our metric client start extra thread not close rightly which cause this problem, this should fix by close these threads in the reporter#close method.

But I still think we should improve the shutdown logic a bit. We could explicitly exit the shuffle worker when the termination future completed. So that it will be safe for any situation when there are threads that can not be freed timely.

@Aitozi
Copy link
Contributor Author

Aitozi commented Jul 24, 2022

cc @wsry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant