The training was broken at 1k #3640

Open
zrr1005 opened this issue Apr 17, 2024 · 3 comments
zrr1005 commented Apr 17, 2024

I set max_iters to 80k and val_interval to 1000, and training crashed at iteration 1k.

Here is the error message:

```text
Traceback (most recent call last):
  File "tools/train.py", line 104, in <module>
    main()
  File "tools/train.py", line 100, in main
    runner.train()
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 292, in run
    self.runner.val_loop.run()
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 371, in run
    self.run_iter(idx, data_batch)
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 392, in run_iter
    self.evaluator.process(data_samples=outputs, data_batch=data_batch)
  File "/home/amax/anaconda3/envs/r_mmseg/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 60, in process
    metric.process(data_batch, _data_samples)
  File "/media/amax/Newsmy1/A_project/mmsegmentation-main/mmseg/evaluation/metrics/iou_metric.py", line 85, in process
    self.intersect_and_union(pred_label, label, num_classes,
  File "/media/amax/Newsmy1/A_project/mmsegmentation-main/mmseg/evaluation/metrics/iou_metric.py", line 186, in intersect_and_union
    pred_label = pred_label[mask]
IndexError: The shape of the mask [497, 512] at index 0 does not match the shape of the indexed tensor [1080, 1920] at index 0
```

How should I solve this problem?

Zoulinx commented Apr 18, 2024

**Inference error: image and label size mismatch**

This error indicates that there is a discrepancy between the input image and the label during inference. There are several potential causes, including:

- Inconsistent image and label sizes in the dataset itself.
- Data augmentation issues: the augmentation pipeline might resize only the image, leaving the label untouched.

It's crucial to ensure that the original image and label sizes are consistent, especially for beginners, because mmseg currently restores the inference results to their original size based on the augmentations applied during testing.
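For example, a quick way to verify this on disk (a minimal sketch, not from the original comment; the folder names and the `.png` label extension are assumptions based on the dataset layout discussed in this issue):

```python
# Minimal sketch: check that every validation image and its segmentation map
# share the same spatial size. Adjust paths/extensions to your dataset.
import os
from PIL import Image

img_dir = 'images/validation'
ann_dir = 'annotations/validation'

for name in sorted(os.listdir(img_dir)):
    stem = os.path.splitext(name)[0]
    with Image.open(os.path.join(img_dir, name)) as img, \
            Image.open(os.path.join(ann_dir, stem + '.png')) as ann:
        if img.size != ann.size:  # PIL size is (width, height)
            print(f'size mismatch: {name} image={img.size} label={ann.size}')
```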

zrr1005 commented Apr 18, 2024

> Inference error: image and label size mismatch. This error indicates that there is a discrepancy between the input image and the label during inference. […]

Thank you for your thoughts. I checked that the image and label sizes match each other, and I modified my dataset configuration file based on ./config/datasets/ade20k. I changed dataset_type, data_root, scale, img_path, and seg_map_path, and nothing else. Below is my dataset configuration file:

```python
# dataset settings
dataset_type = 'CoalDataset'
data_root = '/media/amax/Newsmy1/A_data/coalFlow_1'
crop_size = (512, 512)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=False),
    dict(
        type='RandomResize',
        scale=(1920, 1080),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(1920, 1080), keep_ratio=True),
    # add loading annotation after Resize because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations', reduce_zero_label=False),
    dict(type='PackSegInputs')
]
img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
tta_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(
        type='TestTimeAug',
        transforms=[
            [
                dict(type='Resize', scale_factor=r, keep_ratio=True)
                for r in img_ratios
            ],
            [
                dict(type='RandomFlip', prob=0., direction='horizontal'),
                dict(type='RandomFlip', prob=1., direction='horizontal')
            ], [dict(type='LoadAnnotations')], [dict(type='PackSegInputs')]
        ])
]
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/training',
            seg_map_path='annotations/training'),
        pipeline=train_pipeline)
)
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=train_pipeline)
)
test_dataloader = val_dataloader
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
```

zrr1005 commented Apr 22, 2024

I found my problem: in the above configuration file, the pipeline in val_dataloader should be test_pipeline, but I had written train_pipeline.
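For anyone hitting the same error, a sketch of the corrected val_dataloader from the config above (only the pipeline key changes):

```python
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=test_pipeline))  # was train_pipeline
```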
