imagenet_256_cc.yml runtime error #9

mateibejan1 · 2022-07-25T07:33:48Z

I'm trying to test the 256 ImageNet model on the deblurring task, using the OOD data you provide in your adjacent repository. I'm getting this error:

ERROR - main.py - 2022-07-25 10:25:13,026 - Traceback (most recent call last):
  File "/Users/mbejan/Documents/diffusion/ddrm/main.py", line 164, in main
    runner.sample()
  File "/Users/mbejan/Documents/diffusion/ddrm/runners/diffusion.py", line 161, in sample
    self.sample_sequence(model, cls_fn)
  File "/Users/mbejan/Documents/diffusion/ddrm/runners/diffusion.py", line 249, in sample_sequence
    for x_orig, classes in pbar:
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 384, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1048, in __init__
    w.start()
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Diffusion.sample_sequence.<locals>.seed_worker'

This is the command that produces the error above:

python main.py --ni \
  --config imagenet_256_cc.yml \
  --doc ood \
  --timesteps 20 \
  --eta 0.85 \
  --etaB 1 \
  --deg deblur_uni \
  --sigma_0 0.05

My imagenet_256_cc.yml is the same as the one you provide, apart from the out_of_distribution argument, which is set to true.


lshaw8317 · 2023-02-28T08:33:49Z

#18 is related. I had the same error. Adding global seed_worker to Diffusion.sample_sequence in diffusion.py does not resolve the issue; it only changes the error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\shaw\Anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\shaw\Anaconda3\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'seed_worker' on <module 'runners.diffusion' from 'C:\\Users\\shaw\\Documents\\Year 2\\Diffusion Models\\ddrm\\runners\\diffusion.py'>

The reason (in my case) is that when running on Windows the multiprocessing module uses spawn and so one must (according to docs):

Wrap most of your main script’s code within an if __name__ == '__main__': block, to make sure it doesn’t run again (most likely generating an error) when each worker process is launched. You can place your dataset and DataLoader instance creation logic here, as it doesn’t need to be re-executed in workers.

Make sure that any custom collate_fn, worker_init_fn or dataset code is declared as top level definitions, outside of the main check. This ensures that they are available in worker processes. (this is needed since functions are pickled as references only, not bytecode.)

It is difficult to follow this advice directly, since the seed_worker function needs access to the input args coming from the config file.
The simplest "solution" was to set the worker_init_fn argument to None, as below (within Diffusion.sample_sequence):

val_loader = data.DataLoader(
    test_dataset,
    batch_size=config.sampling.batch_size,
    shuffle=True,
    num_workers=config.data.num_workers,
    worker_init_fn=None,
    generator=g,
)
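
If you would rather keep per-worker seeding than drop it, one possible alternative (a sketch only, not tested against this repo) is to move the seeding function to module level and bind the seed with functools.partial. A partial of a top-level function is pickled by reference, so it should survive the spawn start method. The names make_worker_init_fn and base_seed are hypothetical, and the sketch seeds only Python's random module; the repo's real seed_worker may seed other libraries as well.

```python
import functools
import pickle
import random


def seed_worker(worker_id, base_seed):
    """Module-level worker_init_fn: seeds Python's RNG for one worker.

    base_seed is a hypothetical stand-in for whatever seed the config supplies.
    """
    random.seed((base_seed + worker_id) % 2**32)


def make_worker_init_fn(base_seed):
    # A partial of a top-level function pickles by reference, unlike a
    # function defined inside Diffusion.sample_sequence.
    return functools.partial(seed_worker, base_seed=base_seed)


worker_init_fn = make_worker_init_fn(1234)

# Round-trip through pickle, which is what multiprocessing does under spawn.
restored = pickle.loads(pickle.dumps(worker_init_fn))
restored(0)  # seeds the RNG exactly as worker 0 would
```

The DataLoader call above would then take worker_init_fn=make_worker_init_fn(seed), where seed is whatever value the run is configured with. Setting num_workers to 0 is another way around the error, since no worker processes are started at all.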
