Question regarding the role of max_image_size in image quantization #420

Open
gpantaz opened this issue Sep 29, 2023 · 2 comments


gpantaz commented Sep 29, 2023

Hello!

I would like to ask a question about the image quantization. I don't really understand why you divide the bounding box coordinates by max_image_size (= 512) instead of patch_image_size:

OFA/utils/transforms.py

Lines 240 to 243 in a36b91c

if "boxes" in target:
    boxes = target["boxes"]
    boxes = boxes / self.max_image_size
    target["boxes"] = boxes

Assuming a bounding box [x1, y1, x2, y2] in an image of width w and height h, it seems to me that each coordinate should be quantized relative to the image size, e.g. x1 / w * (num_bins - 1). For example, for a bounding box [120, 200, 150, 220] with w = 600 and h = 800, the quantized x1 would be 120 / 600 * (num_bins - 1).
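To make the two conventions concrete, here is a minimal sketch that quantizes the example box both ways. Several things in it are assumptions rather than facts from the repository: `num_bins = 1000`, `patch_image_size = 384`, the image being resized to a square of side `patch_image_size` before the division, and rounding to the nearest bin. All function names are hypothetical.

```python
def quantize(value, denom, num_bins):
    """Map a value in [0, denom] to an integer bin in [0, num_bins - 1]."""
    return round(value / denom * (num_bins - 1))


def quantize_by_image_size(box, w, h, num_bins):
    """Scheme proposed in the question: normalize by the image width/height."""
    x1, y1, x2, y2 = box
    return [
        quantize(x1, w, num_bins),
        quantize(y1, h, num_bins),
        quantize(x2, w, num_bins),
        quantize(y2, h, num_bins),
    ]


def quantize_by_max_image_size(box, w, h, patch_image_size, max_image_size, num_bins):
    """Scheme resembling the quoted snippet: divide by max_image_size.

    Assumption: the box is first rescaled to a patch_image_size x patch_image_size
    image, and only then divided by max_image_size.
    """
    w_ratio = patch_image_size / w
    h_ratio = patch_image_size / h
    x1, y1, x2, y2 = box
    resized = [x1 * w_ratio, y1 * h_ratio, x2 * w_ratio, y2 * h_ratio]
    return [quantize(c, max_image_size, num_bins) for c in resized]


box = [120, 200, 150, 220]
print(quantize_by_image_size(box, w=600, h=800, num_bins=1000))
# [200, 250, 250, 275]
print(quantize_by_max_image_size(box, w=600, h=800, patch_image_size=384,
                                 max_image_size=512, num_bins=1000))
# [150, 187, 187, 206]
```

Under these assumed values, the second scheme only ever uses bins up to about 384 / 512 * 999 ≈ 749, because no resized coordinate can exceed `patch_image_size`.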

Could you also explain the choice behind the value of the max_image_size?

Thanks :)


JJJYmmm commented Mar 2, 2024

I have the same problem. : )


JJJYmmm commented Mar 2, 2024

Maybe it's just a coordinate normalization that is applied the same way in both training and prediction.
However, when using bin2coord, it lets the decoded coordinates fall outside the image (since task.cfg.max_image_size >= task.cfg.patch_image_size).

def bin2coord(bins, w_resize_ratio, h_resize_ratio):
    # `task` is expected to be defined in the enclosing scope.
    # Each whitespace-separated token is sliced with [5:-1] to extract its integer bin index.
    bin_list = [int(token[5:-1]) for token in bins.strip().split()]
    coord_list = []
    # Map each bin index to [0, max_image_size], then undo the resize ratio.
    coord_list += [bin_list[0] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[1] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    coord_list += [bin_list[2] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[3] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    return coord_list
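The out-of-image behaviour can be reproduced with a standalone variant of the function that takes the config values as arguments. This is only a sketch: the token format `<bin_N>`, the values `num_bins = 1000` and `patch_image_size = 384`, the resize ratios being `patch_image_size / w` and `patch_image_size / h`, and the `clamp_box` workaround are all assumptions for illustration, not something the repository is known to do.

```python
def decode_bins(bins, w_resize_ratio, h_resize_ratio, num_bins, max_image_size):
    """Standalone variant of bin2coord with explicit config values.

    Assumes tokens of the form "<bin_N>", so token[5:-1] is the integer N.
    """
    values = [int(token[5:-1]) for token in bins.strip().split()]
    ratios = [w_resize_ratio, h_resize_ratio, w_resize_ratio, h_resize_ratio]
    return [v / (num_bins - 1) * max_image_size / r for v, r in zip(values, ratios)]


def clamp_box(box, w, h):
    """Hypothetical workaround: clip a decoded box to the original image bounds."""
    x1, y1, x2, y2 = box
    return [min(max(x1, 0.0), w), min(max(y1, 0.0), h),
            min(max(x2, 0.0), w), min(max(y2, 0.0), h)]


w, h = 600, 800                      # original image size (from the question)
patch_image_size = 384               # assumed value
w_ratio, h_ratio = patch_image_size / w, patch_image_size / h

# A box predicted well inside the image decodes close to the original coordinates.
inside = decode_bins("<bin_150> <bin_187> <bin_187> <bin_206>",
                     w_ratio, h_ratio, num_bins=1000, max_image_size=512)

# The largest bin decodes to max_image_size / ratio, which exceeds the image size
# whenever max_image_size > patch_image_size.
top = decode_bins("<bin_999> <bin_999> <bin_999> <bin_999>",
                  w_ratio, h_ratio, num_bins=1000, max_image_size=512)

print(inside)
print(top)
print(clamp_box(top, w, h))
```

With these assumed values, any bin above roughly 749 decodes to a coordinate beyond the image border, so clipping after decoding is one possible way to keep boxes valid.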
