This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Got stuck in getBatch with larger batch size #21

Open

joeyhng opened this issue Mar 27, 2016 · 7 comments

joeyhng commented Mar 27, 2016

The following code reproduces the error:

local Dataset = require 'dataset.Dataset'
local paths = require 'paths'   -- for tmpname
local lapp = require 'pl.lapp'  -- command-line option parsing

local opt = lapp[[
Got stuck in torch-dataset with batchSize == 128

(options)
   --batchSize     (default 128)    how many images in a mini-batch?
]]

-- create a temporary CSV file containing many rows
local tmpcsv = paths.tmpname() .. '.csv'
local f = assert(io.open(tmpcsv, 'w'))
f:write('filename\n')
for i=1,300 do
  f:write(paths.tmpname() .. '\n')
end
f:close()

dataset = Dataset(tmpcsv)

getBatch, numBatches, reset = dataset.sampledBatcher({
  batchSize = opt.batchSize,
  inputDims = {10, 256},
  verbose = true,
  poolSize = 4,
  get = function(x)
    return torch.FloatTensor(10,256)
  end,
  processor = function(res, processorOpt, input) 
    return true
  end,
})

print('before getBatch')
local batch = getBatch()
print('finish getBatch')

Strangely, the program works when batchSize is 64, but gets stuck in getBatch() when batchSize is 128.

I have run into this in several different projects that use a custom get function and load data with a non-default method such as image.load: batchSize 64 works, but 128 does not.

Any ideas are appreciated. Thanks!

zakattacktwitter (Contributor) commented

Hi,

I am not sure what you are trying to accomplish with this sample code. Can you provide a high-level explanation of what you want to use Dataset for?

Thanks,
Zak

joeyhng (Author) commented Mar 29, 2016

In my actual application, I'm usually trying to do something like this:

getBatch, numBatches, reset = dataset.sampledBatcher({
  batchSize = opt.batchSize,
  inputDims = {10, 256},
  verbose = true,
  poolSize = 4,
  get = function(x)
    return torch.load(x) -- or some other loading function like image.load / npy4th.load
  end,
  processor = function(res, processorOpt, input) 
    local x = augment(res) -- some data augmentation function
    input:copy(x)
    return true
  end,
})

That is, I use a custom get function to load the data and do some data augmentation in processor.

This issue comes up in several similar scenarios where a larger batch size gets stuck. Thanks for your help.

zakattacktwitter (Contributor) commented

Try not setting the poolSize option; that's a tricky one to set.
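For reference, that just means dropping the poolSize key from the options table and letting Dataset pick its own default. A sketch based on joeyhng's snippet earlier in the thread (augment is the user's own function):

getBatch, numBatches, reset = dataset.sampledBatcher({
  batchSize = opt.batchSize,
  inputDims = {10, 256},
  verbose = true,
  -- poolSize intentionally omitted so Dataset uses its internal default
  get = function(x)
    return torch.load(x)
  end,
  processor = function(res, processorOpt, input)
    input:copy(augment(res))
    return true
  end,
})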

joeyhng (Author) commented Mar 29, 2016

Yes, I find that not setting poolSize removes this error, but sometimes after running for a longer time the process gets killed (it just prints "Killed" to stderr), and I haven't figured out why yet. I suspect it is creating too many threads.

Should poolSize be limited by the number of cores on the machine? Are there any guidelines for how to set it?

zakattacktwitter (Contributor) commented

It's not really meant for users to set. I should probably remove it.

The threads are created once at startup and no more are created after that, so it doesn't make sense that the crash was due to too many threads.

The way you are using Dataset, putting torch.load in a custom get function, will create a lot of garbage and definitely won't be fast.

How is your data laid out? Is it a whole bunch of little files on disk? If you describe your data, I can help you use Dataset to sample it efficiently.

joeyhng (Author) commented Mar 29, 2016

I'm processing video data, which is saved on a hard drive mounted on the system. I usually save it in one of two formats:

  1. Extracted frame-level features, usually in npy or t7 format. Each file contains the extracted features of a specific video as a T x D tensor.
  2. Video frames as images. Each video has its own directory containing a number of .jpg files, one per frame. I usually sample and load a few consecutive frames from the directory in the get or processor function.
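For the second layout, one way to avoid loading inside get is to index every frame file into a CSV up front, so Dataset samples individual frames directly. A hypothetical sketch (the root /data/videos and the one-.jpg-per-frame layout are assumptions based on the description above):

local paths = require 'paths'

-- Build a CSV with one row per frame image, assuming the layout
-- /data/videos/<video_id>/<frame>.jpg described above.
local f = assert(io.open('frames.csv', 'w'))
f:write('filename\n')
for video in paths.iterdirs('/data/videos') do
  local vdir = paths.concat('/data/videos', video)
  for frame in paths.iterfiles(vdir) do
    if frame:match('%.jpg$') then
      f:write(paths.concat(vdir, frame) .. '\n')
    end
  end
end
f:close()

local Dataset = require 'dataset.Dataset'
local dataset = Dataset('frames.csv')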

Thanks a lot for your help!

zakattacktwitter (Contributor) commented

Hi,

You can now adjust poolSize as much as you want.

The deadlock has been fixed in the IPC ( https://github.com/twitter/torch-ipc ) package. Just get the latest version of it and you should be good to go.

Thanks,
Zak
