
THREAD PANIC and Segmentation fault when passing data parallelized model between threads #483

farleylai opened this issue Sep 29, 2017 · 0 comments


The use case is to create multi-GPU model variants in multiple threads, with multi-threaded training as a later goal. The THREAD PANIC and segmentation fault below are thrown only when the model is made data parallel with DataParallelTable and then passed between the main thread and the worker threads (a minimal sketch of that pattern follows the log).

FATAL THREAD PANIC: (read) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:370: table index is nil
THCudaCheck FAIL file=../Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c line=238 error=29 : driver shutting down
FATAL THREAD PANIC: (write) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:210: cuda runtime error (29) : driver shutting down at /home/ml/farleylai/Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c:238	
Segmentation fault (core dumped)
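
A minimal sketch of the pattern that fails, assuming a standard 'threads' worker pool; buildModel() and nWorkers are placeholders rather than actual project code, and Models.parallelize is the helper shown further below. The wrapped model is an upvalue of the job closure, so the threads library serializes it with torch.File when the job is queued, and that is where the panic appears.

-- Hedged reconstruction of the failing pattern; buildModel() and nWorkers
-- are placeholders, Models.parallelize is the helper shown further below.
require 'nn'
require 'cunn'
require 'cudnn'
local threads = require 'threads'

local model = Models.parallelize(buildModel())  -- DataParallelTable when opt.nGPU > 1

local pool = threads.Threads(nWorkers, function()
   -- worker init: required so the Torch/CUDA classes can be reconstructed
   -- when the serialized model is read back in the worker thread
   require 'nn'
   require 'cunn'
   require 'cudnn'
end)

-- 'model' is an upvalue of the job function, so the threads library
-- serializes it when the job is queued and the result returned;
-- this is the step that triggers the THREAD PANIC and the segfault.
pool:addjob(
   function()
      return torch.type(model)
   end,
   function(name)
      print(name)
   end
)
pool:synchronize()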

The model is made data parallel with the following helper, based on the multi-GPU example code:
function Models.parallelize(model)
   if opt.nGPU > 1 then
      local gpus = torch.range(1, opt.nGPU):totable()
      local dpt = nn.DataParallelTable(1, true, true)
         :add(model, gpus)
         :threads(function()
            require 'cudnn'
            cudnn.benchmark = true
         end)
      dpt.gradInput = nil
      model = dpt:cuda()
   end
   return model
end
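
For reference on the flags and tweaks above: nn.DataParallelTable(1, true, true) splits the batch along dimension 1, flattens parameters, and enables NCCL when available; the :threads(...) call makes DataParallelTable drive its GPU replicas from its own internal threads, each of which loads cudnn with benchmarking enabled; and dpt.gradInput = nil follows the multi-GPU example's convention of dropping the wrapper's gradInput, since nothing upstream of the outermost container consumes it.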

Any ideas or explanations?
