
[BUG] can not train yolov3 #60

Open
lucasjinreal opened this issue Jun 24, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@lucasjinreal

Hi, I'm not sure whether this is a bug or something I missed on my part, but I have a question about yolov3's preprocess_image.

Here is the error I got:

   mask[b, a, gj, gi] = 1
IndexError: index 23 is out of bounds for dimension 3 with size 14

This means the labels (w, h) do not match the resized image: if the image is resized to 544 and the stride is 32, the grid index should not exceed 17, but I got indices like 23.
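To make what I mean concrete, here is a tiny hypothetical sketch (my own illustration, not your code) of how the grid index is usually computed in a YOLO-style target builder; a label that was never rescaled with the image overflows the grid exactly like this:

    # Hypothetical illustration (not the repo's actual code) of how the grid
    # index is usually derived in a YOLO-style target builder.
    stride = 32
    input_size = 448                 # 448 / 32 = 14, matching "size 14" in the traceback
    grid = input_size // stride      # 14 cells per side

    cx = 750.0                       # box center still in coordinates of a larger image
    gi = int(cx / stride)            # -> 23, which overflows a 14-cell grid
    print(gi, grid, gi < grid)       # 23 14 False -> this is the IndexError above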

And here is the preprocess_image code:

def preprocess_image(self, batched_inputs, training):
        """
        Normalize, pad and batch the input images.
        """
        images = [x["image"].to(self.device) for x in batched_inputs]
        bs = len(images)
        images = [self.normalizer(x) for x in images]

        images = ImageList.from_tensors(
            images, size_divisibility=0, pad_ref_long=True)
        logger.info('images ori shape: {}'.format(images.tensor.shape))
        logger.info('images ori shape: {}'.format(images.image_sizes))

        # sync image size for all gpus
        comm.synchronize()
        if training and self.iter % self.change_iter == 0:
            if self.iter < self.max_iter - 20000:
                meg = torch.LongTensor(1).to(self.device)
                comm.synchronize()
                if comm.is_main_process():
                    size = np.random.choice(self.multi_size)
                    meg.fill_(size)

                if comm.get_world_size() > 1:
                    comm.synchronize()
                    dist.broadcast(meg, 0)
                self.size = meg.item()

                comm.synchronize()
            else:
                self.size = 608

        if training:
            # resize image inputs
            modes = ['bilinear', 'nearest', 'bicubic', 'area']
            mode = modes[random.randrange(4)]
            if mode == 'bilinear' or mode == 'bicubic':
                images.tensor = F.interpolate(
                    images.tensor, size=[self.size, self.size], mode=mode, align_corners=False)
            else:
                images.tensor = F.interpolate(
                    images.tensor, size=[self.size, self.size], mode=mode)

            if "instances" in batched_inputs[0]:
                gt_instances = [
                    x["instances"].to(self.device) for x in batched_inputs
                ]
            elif "targets" in batched_inputs[0]:
                log_first_n(
                    logging.WARN,
                    "'targets' in the model inputs is now renamed to 'instances'!",
                    n=10)
                gt_instances = [
                    x["targets"].to(self.device) for x in batched_inputs
                ]
            else:
                gt_instances = None

            targets = [
                torch.cat(
                    [instance.gt_classes.float().unsqueeze(-1), instance.gt_boxes.tensor], dim=-1
                )
                for instance in gt_instances
            ]
            labels = torch.zeros((bs, 100, 5))
            for i, target in enumerate(targets):
                labels[i][:target.shape[0]] = target
            labels[:, :, 1:] = labels[:, :, 1:] / 512. * self.size
        else:
            labels = None

        self.iter += 1
        return images, labels

The image is resized twice, but the labels don't seem to change accordingly. Any idea how this error happens? (Maybe your code has some automatic way of keeping images and labels consistent, but I don't know where.)
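For what it's worth, here is a rough sketch of what I would have expected instead of the hard-coded / 512. — just my assumption, and rescale_labels is a made-up helper, not something from your repo:

    import torch

    def rescale_labels(labels, padded_size, new_size):
        # labels: (bs, 100, 5) with [cls, x1, y1, x2, y2] in padded-image pixels
        labels = labels.clone()
        labels[:, :, 1:] = labels[:, :, 1:] * (new_size / padded_size)
        return labels

    # padded_size would come from images.tensor.shape[-1] *before* F.interpolate,
    # new_size is the self.size picked for multi-scale training.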

@lucasjinreal added the bug (Something isn't working) label on Jun 24, 2021
@FateScript
Member

Is this yolov3 on the COCO dataset or on your own dataset?

@lucasjinreal
Author

COCO. I found it hard to converge on COCO as well; I changed to a rectangular input size rather than a forced square resize. Also, your code has a bug in build_target that swaps w and h, as illustrated below.
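To show what I mean (a hypothetical snippet, not your actual build_target): on a rectangular feature map the two grid sizes differ, so computing gj from the x coordinate overflows the height dimension.

    # Hypothetical snippet (not the actual build_target) showing why swapping
    # w and h blows up on a rectangular feature map: the two grid sizes differ.
    stride = 32
    grid_h, grid_w = 14, 18          # e.g. a 448 x 576 input
    cx, cy = 550.0, 100.0            # box center in input pixels

    gi = int(cx / stride)            # 17, indexes the width dimension (< grid_w)
    gj = int(cy / stride)            # 3,  indexes the height dimension (< grid_h)

    gj_swapped = int(cx / stride)    # 17 >= grid_h -> IndexError on mask[b, a, gj, gi]
    print(gi, gj, gj_swapped)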

@FateScript
Member

Thanks for your report, we will try it in our internal version.
