Distributed training does not exit properly after completion #129

Open
Licko0909 opened this issue Jun 22, 2022 · 0 comments

Dear Author,
Great work on this project. I have recently been using TAPE for a sequence design task. Single-GPU training works without any problems, but I have run into a small issue with distributed training.

Problem description:

  • After distributed training finishes, the program does not exit normally; the worker processes remain in a running state.
  • The training process itself completes without errors.
  • Hardware: 4× RTX 6000 GPUs on a single machine.
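The issue body does not show the launch code, but with PyTorch distributed training a common cause of this symptom is that worker processes never tear down the process group, so some ranks are left waiting on a collective after training ends. A minimal sketch of a clean shutdown, assuming the script is launched with `torchrun` (the function name `train_and_exit` and the placeholder training loop are hypothetical, not TAPE's actual code):

```python
import torch.distributed as dist

def train_and_exit(backend: str = "nccl") -> None:
    """Hypothetical sketch: run training, then tear the process group down.

    Assumes the launcher (e.g. torchrun) has set RANK, WORLD_SIZE,
    MASTER_ADDR, and MASTER_PORT in the environment.
    """
    dist.init_process_group(backend=backend)

    # ... the actual training loop would run here ...

    # Synchronize all ranks before tearing down, so that no straggler
    # process is left blocked on a collective that never completes.
    dist.barrier()

    # Without this call, worker processes can linger after training
    # finishes, which matches the behavior described above.
    dist.destroy_process_group()
```

If the hang persists even with an explicit `destroy_process_group()`, it may instead come from non-daemon `DataLoader` worker processes or a rank that exits early (e.g. only rank 0 runs a final evaluation while the others wait at a barrier).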

[Screenshots attached in the original issue.]

I'm looking forward to hearing from you!
