The training was broken at 1k #3640

Open
zrr1005 opened this issue Apr 17, 2024 · 3 comments
zrr1005 commented Apr 17, 2024

I set max_iters to 80k and val_interval to 1000, and training crashed at iteration 1k.

Here is the error message:

```text
Traceback (most recent call last):
  File "tools/train.py", line 104, in <module>
    main()
  File "tools/train.py", line 100, in main
    runner.train()
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 292, in run
    self.runner.val_loop.run()
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 371, in run
    self.run_iter(idx, data_batch)
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 392, in run_iter
    self.evaluator.process(data_samples=outputs, data_batch=data_batch)
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 60, in process
    metric.process(data_batch, _data_samples)
  File "/media/amax/Newsmy1/A_project/mmsegmentation-main/mmseg/evaluation/metrics/iou_metric.py", line 85, in process
    self.intersect_and_union(pred_label, label, num_classes,
  File "/media/amax/Newsmy1/A_project/mmsegmentation-main/mmseg/evaluation/metrics/iou_metric.py", line 186, in intersect_and_union
    pred_label = pred_label[mask]
IndexError: The shape of the mask [497, 512] at index 0 does not match the shape of the indexed tensor [1080, 1920] at index 0
```

How should I solve this problem?

Zoulinx commented Apr 18, 2024

**Inference error: image and label size mismatch**

This error indicates that there is a discrepancy between the input image and the label during inference. There are several potential causes, including:

- Inconsistent image and label sizes in the dataset itself.
- Data augmentation issues: the augmentation pipeline might resize only the image, leaving the label untouched.

It's crucial to ensure that the original image and label sizes are consistent, especially for beginners, because mmseg currently restores the inference results to their original size based on the augmentations applied during testing.
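For example, a quick way to verify this on disk (a minimal sketch, not from the original comment; the folder names and the `.png` label extension are assumptions based on the dataset layout discussed in this issue):

```python
# Minimal sketch: check that every validation image and its segmentation map
# share the same spatial size. Adjust paths/extensions to your dataset.
import os
from PIL import Image

img_dir = 'images/validation'
ann_dir = 'annotations/validation'

for name in sorted(os.listdir(img_dir)):
    stem = os.path.splitext(name)[0]
    with Image.open(os.path.join(img_dir, name)) as img, \
            Image.open(os.path.join(ann_dir, stem + '.png')) as ann:
        if img.size != ann.size:  # PIL size is (width, height)
            print(f'size mismatch: {name} image={img.size} label={ann.size}')
```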

zrr1005 commented Apr 18, 2024

> Inference error: image and label size mismatch. This error indicates that there is a discrepancy between the input image and the label during inference. […]

Thank you for your thoughts. I checked that the image and label sizes match each other, and I modified my dataset configuration file based on ./config/datasets/ade20k. I changed dataset_type, data_root, scale, img_path, and seg_map_path, and nothing else. Below is my dataset configuration file:

```python
# dataset settings
dataset_type = 'CoalDataset'
data_root = '/media/amax/Newsmy1/A_data/coalFlow_1'
crop_size = (512, 512)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=False),
    dict(
        type='RandomResize',
        scale=(1920, 1080),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(1920, 1080), keep_ratio=True),
    # add loading annotation after Resize because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations', reduce_zero_label=False),
    dict(type='PackSegInputs')
]
img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
tta_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(
        type='TestTimeAug',
        transforms=[
            [
                dict(type='Resize', scale_factor=r, keep_ratio=True)
                for r in img_ratios
            ],
            [
                dict(type='RandomFlip', prob=0., direction='horizontal'),
                dict(type='RandomFlip', prob=1., direction='horizontal')
            ], [dict(type='LoadAnnotations')], [dict(type='PackSegInputs')]
        ])
]
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/training',
            seg_map_path='annotations/training'),
        pipeline=train_pipeline)
)
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=train_pipeline)
)
test_dataloader = val_dataloader
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
```

zrr1005 commented Apr 22, 2024

I found my problem: in the above configuration file, the pipeline in val_dataloader should be test_pipeline, but I had written train_pipeline.
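For anyone hitting the same error, a sketch of the corrected val_dataloader from the config above (only the pipeline key changes):

```python
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=test_pipeline))  # was train_pipeline
```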
