
Choose either 'long' or 'short' options for the resize anchor edge if the size variable is scalar #8358

Open
sidijju opened this issue Mar 28, 2024 · 3 comments

@sidijju

sidijju commented Mar 28, 2024

🚀 The feature

Choose either 'long' or 'short' options for the resize anchor edge if the size variable is scalar

Motivation, pitch

torchvision.transforms.Resize() does not provide a clean interface for resizing images based on their longer edge.

Consider the following use case: a user wants to resize a set of images such that their dimensions are constrained by size, i.e. the longer edge is always equal to size. Take two images of sizes [1000, 500] and [500, 1000]. We want to resize both so that the maximum dimension is 500, i.e. resize the first image to [500, 250] and the second to [250, 500].

The naive approach would be to set size = 500. As noted in the docs,

If size is an int, smaller edge of the image will be matched to this number.

But in both our cases, the smaller edge of the image is already 500, so this essentially does nothing.

Setting max_size = 500 doesn't solve the issue either, since the current implementation explicitly disallows max_size == size. While we could pick a value for size that is less than max_size, there's no principled way to choose one that produces the desired result for every image.

Right now there's no clean way to resize images based solely on the length of the longer edge. Adding the ability to pick the resize anchor edge would allow this, as the sketch below illustrates.
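
For concreteness, a minimal sketch of the limitation (assuming channels-first tensor images):

```python
import torch
from torchvision.transforms import Resize

img_a = torch.rand(3, 1000, 500)  # longer edge is the height
img_b = torch.rand(3, 500, 1000)  # longer edge is the width

# size=500 matches the *shorter* edge, which is already 500,
# so both images come back unchanged.
resize = Resize(size=500)
print(resize(img_a).shape)  # torch.Size([3, 1000, 500])
print(resize(img_b).shape)  # torch.Size([3, 500, 1000])

# Resize(size=500, max_size=500) is rejected with a ValueError:
# max_size must be strictly greater than size.
```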

Alternatives

No response

Additional context

A similar comment was made in #2868, but it seems the discussion about the longer edge was lost in the final implementation.

@NicolasHug
Member

Thanks for the feature request @sidijju .

In order to get [500, 250] and [250, 500] from these specific input images, you could set size=499, max_size=500. But of course, this isn't a great UX, and it might not be possible to find a size value that would satisfy all input images.
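
As a sketch of that workaround on the two example images (channels-first tensors assumed):

```python
import torch
from torchvision.transforms import Resize

img_a = torch.rand(3, 1000, 500)
img_b = torch.rand(3, 500, 1000)

# The shorter edge is first mapped to 499; the resulting longer edge
# (998) then exceeds max_size, so the image is scaled back down until
# the longer edge is exactly 500.
resize = Resize(size=499, max_size=500)
print(resize(img_a).shape)  # torch.Size([3, 500, 250])
print(resize(img_b).shape)  # torch.Size([3, 250, 500])
```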

There's been discussion of adding an edge parameter in the past, but the parameters of resize are already fairly confusing. It seems that if we were to allow size=None, max_size=500, we could implement the behaviour you are looking for, and this should cover all of the potential use cases:

  • size=tuple -> resize to fixed size
  • size=int -> resize shorter edge to size while preserving aspect ratio
  • size=int, max_size=int -> try to resize shorter edge to size while preserving aspect ratio but if resulting longer edge exceeds max_size, then scale down. This corresponds to the resizing strategy of some detection models.
  • size=None, max_size=int -> resize longer edge to max_size while preserving aspect ratio.

The first three are already implemented; the last one isn't (a sketch of its semantics follows below). Any thoughts @pmeier @vfdev-5?
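
A minimal sketch of the proposed size=None semantics (not actual torchvision behaviour yet; the helper name below is hypothetical):

```python
def resize_longer_edge(h: int, w: int, max_size: int) -> tuple[int, int]:
    """Output size such that the longer edge equals max_size,
    preserving the aspect ratio (rounded to the nearest pixel)."""
    if h >= w:
        return max_size, round(w * max_size / h)
    return round(h * max_size / w), max_size

print(resize_longer_edge(1000, 500, 500))  # (500, 250)
print(resize_longer_edge(500, 1000, 500))  # (250, 500)
```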

@sidijju
Author

sidijju commented Apr 23, 2024

Thanks for the reply @NicolasHug! I agree that setting size=499, max_size=500 would work for this set of input images, but I'm not sure about the effect this would have on a more varied dataset. I also agree that it isn't the best UX since it's not very intuitive.

I think the proposal for a size=None option is a good stopgap for now. If others agree and I have some guidance from more experienced contributors, I can attempt to implement this feature.

@NicolasHug
Member

Thanks for your feedback @sidijju . I'm happy to review a PR from you if you'd like to try to submit one. Our contributing guide is here: https://github.com/pytorch/vision/blob/main/CONTRIBUTING.md

For this specific change you'd only need to update torchvision.transforms.v2 (transform class, PIL functional, and tensor functional). No need to change the "v1" transforms, i.e. the stuff in torchvision.transforms.
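
For reference, the end state might look like the following (hypothetical usage; size=None is not accepted by Resize as of this discussion):

```python
import torch
from torchvision.transforms import v2

# Hypothetical: once supported, this would resize so that the longer
# edge equals max_size, preserving the aspect ratio.
resize = v2.Resize(size=None, max_size=500)
print(resize(torch.rand(3, 1000, 500)).shape)  # expected: torch.Size([3, 500, 250])
```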
