train.cfg in video-c3d #448

lixiangchun · 2018-03-19T10:55:29Z

Is there an example of the train.cfg used in the video-c3d example?

baojun-nervana · 2018-03-19T16:46:01Z

@lixiangchun Below is an example. I hope it helps.

manifest = [train:/dataset/aeon/V3D/ucf-extracted/train-index.csv, test:/dataset/aeon/V3D/ucf-extracted/test-index.csv]
manifest_root = /dataset/aeon/V3D/ucf-extracted
backend = gpu
epochs = 10
batch_size = 32
eval_freq = 1
log = video-c3d.log
output_file = video-c3d.hdf5
device_id = 0
data_dir = /dataset
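
For anyone adapting this example, a quick way to catch a mistyped path before training is to check the manifest entries first. The sketch below is a standalone helper and not part of neon; the function names `parse_manifest` and `missing_manifests` are hypothetical, and it assumes the `[name:path, name:path]` layout shown above.

```python
import os


def parse_manifest(value):
    """Split '[train:/a.csv, test:/b.csv]' into {'train': '/a.csv', 'test': '/b.csv'}.

    Assumes the bracketed, comma-separated layout from the example config.
    """
    inner = value.strip().strip("[]")
    entries = {}
    for item in inner.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first ':' only, so the path itself stays intact.
        name, _, path = item.partition(":")
        entries[name.strip()] = path.strip()
    return entries


def missing_manifests(value):
    """Return the names of manifest entries whose file does not exist."""
    entries = parse_manifest(value)
    return [name for name in sorted(entries) if not os.path.isfile(entries[name])]
```

Calling `missing_manifests(...)` with the `manifest` value from your own train.cfg returns an empty list when every index file is present.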

lixiangchun · 2018-03-21T13:46:51Z

@baojun-nervana Thanks for your help.

Running python3 examples/video-c3d/train.py fails with:

Traceback (most recent call last):
  File "/media/storage1/software/github/neon/examples/video-c3d/train.py", line 31, in <module>
    parser = NeonArgparser(__doc__, default_config_files=config_files)
  File "/usr/local/lib/python3.5/dist-packages/neon/util/argparser.py", line 80, in __init__
    super(NeonArgparser, self).__init__(*args, **kwargs)
TypeError: __init__() got multiple values for argument 'add_config_file_help'

baojun-nervana · 2018-03-21T17:55:06Z

This may be related to the configargparse version: the error occurs with the newest release of configargparse. The requirements.txt file recommends the following version.

configargparse==0.10.0
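
To confirm which version is installed before rerunning, a small check like this can help. It is only a sketch: `installed_version` and `matches_pin` are hypothetical helper names, it relies on `importlib.metadata` (Python 3.8+, so newer than the Python 3.5 in the traceback above), and it assumes plain numeric versions such as 0.10.0.

```python
def parse_version(text):
    """Turn '0.10.0' into (0, 10, 0). Assumes purely numeric, dot-separated parts."""
    return tuple(int(part) for part in text.strip().split("."))


def matches_pin(installed, pinned="0.10.0"):
    """True if the installed version equals the pinned one from requirements.txt."""
    return parse_version(installed) == parse_version(pinned)


def installed_version(package="configargparse"):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters have no importlib.metadata.
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("configargparse not found")
    else:
        print(found, "matches pin" if matches_pin(found) else "differs from pin")
```

Comparing tuples rather than raw strings avoids treating "0.10.0" as smaller than "0.9.0".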

lixiangchun · 2018-03-22T01:53:12Z

Thanks, it works now.

However, I found that this repo only supports CPU or MKL as the backend, and training is very slow.

How can I enable the GPU backend for this repo?

baojun-nervana · 2018-03-22T02:38:58Z

@lixiangchun The example can run with the GPU backend. What error did you see with it?
You might need to install the GPU dependencies:
https://github.com/NervanaSystems/neon/blob/master/gpu_requirements.txt

lixiangchun · 2018-03-22T03:00:12Z

@baojun-nervana After installing all packages in gpu_requirements.txt, the GPU backend can be used; however, the following error occurs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pycuda/tools.py", line 426, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7f3534cbe450>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/storage1/software/github/neon/examples/video-c3d/train.py", line 57, in <module>
    model.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)
  File "/usr/local/lib/python3.5/dist-packages/neon/models/model.py", line 183, in fit
    self._epoch_fit(dataset, callbacks)
  File "/usr/local/lib/python3.5/dist-packages/neon/models/model.py", line 205, in _epoch_fit
    x = self.fprop(x)
  File "/usr/local/lib/python3.5/dist-packages/neon/models/model.py", line 236, in fprop
    res = self.layers.fprop(x, inference)
  File "/usr/local/lib/python3.5/dist-packages/neon/layers/container.py", line 395, in fprop
    x = l.fprop(x, inference=inference)
  File "/usr/local/lib/python3.5/dist-packages/neon/layers/layer.py", line 1061, in fprop
    bias=self.weight_bias, bsum=self.batch_sum, layer_op=self)
  File "/usr/local/lib/python3.5/dist-packages/neon/backends/nervanagpu.py", line 1990, in fprop_conv
    return self._execute_conv("fprop", layer, layer.fprop_kernels, repeat)
  File "/usr/local/lib/python3.5/dist-packages/neon/backends/nervanagpu.py", line 2072, in _execute_conv
    kernels.execute(repeat)
  File "/usr/local/lib/python3.5/dist-packages/neon/backends/convolution.py", line 551, in execute
    kernel = kernel_specs.get_kernel(self.kernel_name, self.kernel_options)
  File "<decorator-gen-35>", line 2, in get_kernel
  File "/usr/local/lib/python3.5/dist-packages/pycuda/tools.py", line 430, in context_dependent_memoize
    result = func(*args)
  File "/usr/local/lib/python3.5/dist-packages/neon/backends/kernel_specs.py", line 842, in get_kernel
    run_command([ "ptxas -v -arch", arch, "-o", cubin_file, ptx_file ])
  File "/usr/local/lib/python3.5/dist-packages/neon/backends/kernel_specs.py", line 785, in run_command
    raise RuntimeError("Error(%d):\n%s\n%s" % (proc.returncode, cmd, err))
RuntimeError: Error(136):
ptxas -v -arch sm_61 -o /home/lixc/.cache/neon/kernels/cubin/sconv_direct_fprop_64x32_SN_bias.cubin /home/lixc/.cache/neon/kernels/ptx/sconv_direct_fprop_64x32_SN_bias.ptx
b'Floating point exception (core dumped)\n'

My train.cfg is:

manifest = [train:/media/storage1/project/deep_learning/c3d_ucf/data/ucf-extracted/train-index.csv, test:/media/storage1/project/deep_learning/c3d_ucf/data/ucf-extracted/test-index.csv]
manifest_root = /media/storage1/project/deep_learning/c3d_ucf/data/ucf-extracted
backend = gpu
epochs = 10
batch_size = 16
eval_freq = 1
log = video-c3d.log
output_file = video-c3d.hdf5
device_id = 1
data_dir = train_output_dir
serialize = 1

Training was done via:

export LD_LIBRARY_PATH=/media/storage1/software/github/neon/mklml_lnx_2018.0.1.20171227/lib:$LD_LIBRARY_PATH
python3 /media/storage1/software/github/neon/examples/video-c3d/train.py -c train.cfg

baojun-nervana · 2018-03-22T22:16:03Z

@lixiangchun Are you using CUDA 9?
I am using CUDA 8, and an issue has been reported with CUDA 9.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
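
To check the installed toolkit from Python, the release number can be pulled out of the `nvcc --version` output shown above. This is a minimal sketch; `parse_nvcc_release` and `nvcc_release` are hypothetical helper names, and it assumes nvcc prints a line containing `release X.Y` as in the output above.

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_release():
    """Run `nvcc --version` and return (major, minor), or None if nvcc is unavailable."""
    try:
        proc = subprocess.run(
            ["nvcc", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        # nvcc is not on PATH or cannot be executed.
        return None
    return parse_nvcc_release(proc.stdout)


if __name__ == "__main__":
    print(nvcc_release())
```

For the output pasted above, `parse_nvcc_release` returns `(8, 0)`.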

lixiangchun · 2018-03-23T15:27:14Z

@baojun-nervana Thanks. Yes, I am using CUDA 9. I will go back to CUDA 8 and try again.
