Allow that one role of task executor could make other roles exit #636

Open
zuston opened this issue Jan 19, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@zuston
Member

zuston commented Jan 19, 2022

Why

Sometimes, when using the TensorFlow Estimator API, users do additional work in the CHIEF role after training has finished. The PS tasks keep running during that time, which wastes the resources they hold.

So maybe we need to introduce a new mechanism that lets users mark the training job as finished from their Python script and notify the AM to stop the other task executors.

@zuston
Member Author

zuston commented Jan 19, 2022

This could be a great improvement for saving resources. @oliverhu Please let me know what you think.

@oliverhu
Member

Can you elaborate a bit more? It is not a problem for us.

@zuston
Member Author

zuston commented Jan 19, 2022

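As we know, the PS won't stop until the chief finishes. But the PS is actually needed only for training.
Suppose the chief has two tasks:

  1. Training. This needs to cooperate with the PS.
  2. Some other work, which may be time-consuming. This does not need to cooperate with the PS.

During the second task the PS keeps running and holding resources even though nothing uses it.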
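The two chief phases could be sketched roughly as follows. This is only an illustration of the ordering being proposed: `run_chief` and `notify_training_finished` are hypothetical names, not an existing API, and the stand-in callables only record events rather than talking to a real AM.

```python
from typing import Callable, List


def run_chief(
    train: Callable[[], None],
    post_process: Callable[[], None],
    notify_training_finished: Callable[[], None],
) -> None:
    """Run the chief's two phases, releasing the PS between them."""
    train()                     # phase 1: needs to cooperate with the PS
    notify_training_finished()  # hypothetical hook: ask the AM to stop the other roles
    post_process()              # phase 2: the PS is no longer needed


# Record the order of events with stand-in callables.
events: List[str] = []
run_chief(
    train=lambda: events.append("train"),
    post_process=lambda: events.append("post_process"),
    notify_training_finished=lambda: events.append("notify_am"),
)
print(events)
```

The point of the sketch is only that the notification happens between training and the follow-up work, so the PS can exit before the time-consuming part starts.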

zuston added the enhancement label Mar 12, 2022