Exception during training #324
On 22/02/11 12:16AM, J. R. Schmid wrote:
Latest `binary_datasets` commit, same command and data that have always run for hundreds of epochs without problems:
As the error message says, "please report a bug to PyTorch". This is a PyTorch-internal crash that we don't really have any influence on. Does it occur reliably? If yes, an immediate workaround would be to disable workers in your call, as the crash happens somewhere in the shared memory code.
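For context, "disabling workers" means passing `num_workers=0` to PyTorch's `DataLoader`, so batches are loaded in the main process and the multiprocessing shared-memory path is never exercised. A minimal sketch of the idea, using a toy dataset rather than kraken's actual loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for kraken's training dataset.
dataset = TensorDataset(
    torch.randn(16, 3, 48, 200),
    torch.zeros(16, dtype=torch.long),
)

# num_workers=0 loads batches in the main process: no worker
# subprocesses, hence no shared-memory (/dev/shm) tensors to crash on.
loader = DataLoader(dataset, batch_size=4, num_workers=0)

for batch, labels in loader:
    ...  # training step
```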
There's this issue on the main repository: pytorch/pytorch#1355. Possible reasons might indeed be running into shared memory limits or, if you're using augmentation, the OpenCV OpenMP issue mentioned in there as well. Kraken sets OpenMP threads to 0 when using the GPU, but the last time I looked, the only way to do that reliably system-wide is through the environment variable.
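As a hedged sketch of the environment-variable route: `OMP_NUM_THREADS` has to be in the environment before any OpenMP-using library initialises its thread pool, which is why setting it from inside Python only works if it happens before the imports (the `cv2` lines assume opencv-python is installed). Whether this cures this specific crash is an assumption:

```python
import os

# Equivalent to `export OMP_NUM_THREADS=1` in the shell. It must be
# set before torch/cv2 are imported; once OpenMP has spun up its
# thread pool, changing the variable has no effect.
os.environ["OMP_NUM_THREADS"] = "1"

import torch
import cv2  # only relevant when augmentation pulls in OpenCV

print(torch.get_num_threads())  # should now report 1
cv2.setNumThreads(0)            # additionally disable OpenCV's own thread pool
```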
My bad, I didn't see this. Currently running again with the original