Got ChkStopThr and IntMsgThr after the training finished #7468
Comments
Hey @ease-zh, thanks for flagging this! Would you mind sending the
Hi @ease-zh, thanks for sharing the
Hey @ease-zh -- following up here for @luisbergua did you get a chance to locate the Look forward to hearing back
@luisbergua No, there are no other error logs; everything else seems fine.
Hi @ease-zh, thanks for confirming this! It might then be that the error is on the server side, since it seems you're running wandb on a local server. Have you been able to execute runs successfully in the past using that server? Also, could you please reproduce the error and, right after, pull the Debug Bundle of the instance and share it with us?
@luisbergua Yes, most of the time I can execute runs successfully. The errors were frequent only around the time I raised the issue; recent tasks have run normally.
After the training finished, wandb logged "Run history", "Run summary" and "Find logs at ...", then threw two exceptions: ChkStopThr and IntMsgThr. I am using Ubuntu and have already set "WANDB_START_METHOD=thread" as suggested in #3223; unfortunately, it did not work for me.
Below are the console logs:
And the debug.log:
The debug-internal.log is too large to upload here; please tell me which part would be useful for solving the problem.
Also, this problem only occurs intermittently, even when I run the same code.
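For reference, here is a minimal sketch of how the `WANDB_START_METHOD=thread` setting mentioned above might be applied from Python. It is not the actual training script: the `train` function, the `demo-project` name, and the logged metric are hypothetical placeholders, and the explicit `run.finish()` call is only an assumption about how the run is closed.

```python
import os

# Set the variable before wandb.init() is called so it is present at startup.
os.environ["WANDB_START_METHOD"] = "thread"


def train(project: str = "demo-project") -> None:
    """Hypothetical training entry point; the project name is a placeholder."""
    import wandb  # imported lazily, after the environment variable is set

    run = wandb.init(project=project)
    try:
        run.log({"loss": 0.0})  # placeholder metric, not real training output
    finally:
        # Explicitly mark the run as finished instead of relying on
        # interpreter exit to close it.
        run.finish()
```

The same variable can also be exported in the shell before launching the script, e.g. `export WANDB_START_METHOD=thread`.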