
How should training be configured when image sizes in the training set vary greatly? #3083

Open
weirman opened this issue Jan 26, 2024 · 10 comments

@weirman

weirman commented Jan 26, 2024

Could you tell me how to configure ResizeImage for a classification dataset whose image sizes vary greatly?
I would like to resize the short side of each image to a uniform size and scale the long side proportionally.
So far I have found this answer.
When I set

        - ResizeImage:
            resize_short: 48

I get the error

    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
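
(For context: the error comes from the default batch collate. With only resize_short set, each image keeps a different long side, and np.stack cannot combine arrays of different shapes. A minimal reproduction outside the dataloader, using made-up shapes, looks like this:)

import numpy as np

# Two images whose short side is 48 px but whose widths differ,
# which is what ResizeImage produces when only resize_short is set.
img_a = np.zeros((48, 320, 3), dtype=np.float32)
img_b = np.zeros((48, 96, 3), dtype=np.float32)

# The default collate stacks the samples into one batch array, which
# requires identical shapes, hence the ValueError above.
np.stack([img_a, img_b], axis=0)
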
@changdazhou
Contributor

@weirman
Author

weirman commented Jan 26, 2024

The yml file I use during training is as follows:

DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: /
      cls_label_path:  /mnt/cls_train/train_new.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 48
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
            prob: 1.0
            config_str: rand-m9-mstd0.5-inc1
            interpolation: bicubic
            img_size: [320, 48]
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 1.0
            sl: 0.02
            sh: 1.0/3.0
            r1: 0.3
            attempt: 10
            use_log_aspect: True
            mode: pixel
    sampler:
      name: DistributedBatchSampler
      batch_size: 512
      drop_last: False
      shuffle: True
    loader:
      num_workers: 8
      use_shared_memory: True

But I get the following error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/myconda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/myconda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 604, in _thread_loop
    batch = self._get_data()
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 752, in _get_data
    batch.reraise()
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/worker.py", line 178, in reraise
    raise self.exc_type(msg)
ValueError: DataLoader worker(6) caught ValueError with message:
Traceback (most recent call last):
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/worker.py", line 363, in _worker_loop
    batch = fetcher.fetch(indices)
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/fetcher.py", line 86, in fetch
    data = self.collate_fn(data)
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/collate.py", line 75, in default_collate_fn
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/collate.py", line 75, in <listcomp>
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/collate.py", line 56, in default_collate_fn
    batch = np.stack(batch, axis=0)
  File "<__array_function__ internals>", line 180, in stack
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/numpy/core/shape_base.py", line 426, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

Traceback (most recent call last):
  File "tools/train.py", line 32, in <module>
    engine.train()
  File "/mnt/PaddleClas-release-2.5_240124/PaddleClas-release-2.5/ppcls/engine/engine.py", line 356, in train
    self.train_epoch_func(self, epoch_id, print_batch_step)
  File "/mnt/PaddleClas-release-2.5_240124/PaddleClas-release-2.5/ppcls/engine/train/train.py", line 24, in train_epoch
    for iter_id, batch in enumerate(engine.train_dataloader):
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 825, in __next__
    self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at ../paddle/fluid/operators/reader/blocking_queue.h:175)

I assume this is because the images were scaled according to resize_short but not padded, so the input images do not all end up with the same width?
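
(That matches the collate step: batches need a fixed shape. One way to keep the aspect ratio and still get stackable batches is to resize to a fixed height and then pad or crop the width to a fixed value. The sketch below is a hypothetical custom transform, not a built-in PaddleClas op, and it assumes landscape text-line images:)

import cv2
import numpy as np

def resize_h_and_pad_w(img, target_h=48, target_w=320, pad_value=0):
    """Resize so the height (the short side of a text-line image) equals
    target_h while keeping the aspect ratio, then right-pad or crop the
    width to target_w. Every sample then has shape (target_h, target_w, C),
    so the default collate can stack the batch."""
    h, w = img.shape[:2]
    new_w = max(1, int(round(w * target_h / h)))
    img = cv2.resize(img, (new_w, target_h), interpolation=cv2.INTER_LINEAR)
    if new_w >= target_w:
        return img[:, :target_w]
    pad = np.full((target_h, target_w - new_w, img.shape[2]), pad_value,
                  dtype=img.dtype)
    return np.concatenate([img, pad], axis=1)
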

@changdazhou
Contributor

Try adding a crop and see how it goes. Could you also tell us which model you are training?

@weirman
Author

weirman commented Jan 26, 2024

I am training a language classification model. By crop, do you mean adding a size to CropImage?
Looking at the code, it seems only RandomResizedCrop has CropWithPadding inside it.

@changdazhou
Contributor

We suggest referring to this configuration file and adjusting your config accordingly.

@weirman
Author

weirman commented Jan 26, 2024

OK, I will give it a try. This model feels hard to train, especially since the aspect ratio varies so much.

@weirman
Author

weirman commented Jan 28, 2024

We suggest referring to this configuration file and adjusting your config accordingly.

Has this file already been deprecated?

My configuration file is as follows:

DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: /
      cls_label_path:  /mnt/cls_train/train.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
            
        - CropWithPadding:
            prob: 0.2
            padding_num: 0
            size: [112, 112]
            scale: [0.2, 1.0]
            ratio: [0.75, 1.3333333333333333]
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
            prob: 1.0
            config_str: rand-m9-mstd0.5-inc1
            interpolation: bicubic
            img_size: [320, 48]
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 1.0
            sl: 0.02
            sh: 1.0/3.0
            r1: 0.3
            attempt: 10
            use_log_aspect: True
            mode: pixel
    sampler:
      name: DistributedBatchSampler
      batch_size: 512
      drop_last: False
      shuffle: True
    loader:
      num_workers: 8
      use_shared_memory: True

The error message is: with msg: 'CropWithPadding' object has no attribute '_get_param'. Also, I noticed that this file uses transform, whereas all the other files use transform_ops.

Cause of the error: _get_param is referenced but never defined, so this appears to be a bug.

@cuicheng01
Collaborator

Please share the full configuration and we will help you reproduce the problem.

@weirman
Author

weirman commented Feb 1, 2024

I am using the default parameters of PULC language and only modified

DataLoader:
        - CropWithPadding:
            prob: 0.2
            padding_num: 0
            size: [112, 112]
            scale: [0.2, 1.0]
            ratio: [0.75, 1.3333333333333333]

Nothing else was changed. The problem lies in _get_param(): the CropWithPadding class calls self._get_param, but _get_param is never defined.
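
(For reference, such a parameter sampler usually just draws a crop box in the style of RandomResizedCrop. The sketch below only illustrates that typical logic and is an assumption on my part, not the actual PaddleClas implementation of _get_param:)

import math
import random

def sample_crop_box(img_h, img_w, scale=(0.2, 1.0),
                    ratio=(3 / 4, 4 / 3), attempts=10):
    """Sample (top, left, height, width) for a crop whose area fraction lies
    in `scale` and whose aspect ratio lies in `ratio`; fall back to a
    centered crop if no valid box is found within `attempts` tries."""
    area = img_h * img_w
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = random.uniform(*scale) * area
        aspect = math.exp(random.uniform(*log_ratio))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= img_w and 0 < h <= img_h:
            return (random.randint(0, img_h - h),
                    random.randint(0, img_w - w), h, w)
    # Fallback: a centered crop clamped to the allowed aspect-ratio range.
    in_ratio = img_w / img_h
    if in_ratio < ratio[0]:
        w, h = img_w, min(img_h, int(round(img_w / ratio[0])))
    elif in_ratio > ratio[1]:
        h, w = img_h, min(img_w, int(round(img_h * ratio[1])))
    else:
        h, w = img_h, img_w
    return (img_h - h) // 2, (img_w - w) // 2, h, w
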

@changdazhou
Contributor

OK, we have noted this and will test it later.
