🚀 The feature
Choose either 'long' or 'short' options for the resize anchor edge if the size variable is scalar
Motivation, pitch
torchvision.transforms.Resize() does not provide a clean interface for resizing images based on the longer edge.
Consider the following use case: a user wants to resize a set of images so that their dimensions are constrained by size, i.e. the longer edge of each image is always equal to size. Consider two images of size [1000, 500] and [500, 1000]. We want to resize both such that the maximum dimension is 500, i.e. resize the first image to [500, 250] and the second to [250, 500].
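The desired behaviour boils down to a simple output-size computation. The sketch below uses a hypothetical helper (not part of torchvision) that anchors the resize on the longer edge:

```python
# Hypothetical helper (not a torchvision API): compute the output size when
# anchoring the resize on the *longer* edge, preserving aspect ratio.
def resize_longer_edge(height, width, target):
    # Scale so that max(height, width) == target.
    scale = target / max(height, width)
    return round(height * scale), round(width * scale)

print(resize_longer_edge(1000, 500, 500))  # (500, 250)
print(resize_longer_edge(500, 1000, 500))  # (250, 500)
```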
The naive approach would be to set size = 500. As noted in the docs,
If size is an int, smaller edge of the image will be matched to this number.
But in both our cases, the smaller edge of the image is already 500 so this essentially does nothing.
Setting max_size = 500 also doesn't solve the issue, since the current implementation explicitly disallows max_size == size. While we could select a value for size that is less than max_size, there's no clear way to pick a value of size that would produce the desired effect.
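To see why size = 500 is a no-op here, consider a pure-Python replica of the documented shorter-edge semantics (for illustration only; the real logic lives inside torchvision.transforms.functional.resize):

```python
# Illustrative replica of Resize's documented shorter-edge behaviour when
# size is an int: scale so that min(height, width) == size.
def resize_shorter_edge(height, width, size):
    scale = size / min(height, width)
    return round(height * scale), round(width * scale)

# Both example images already have a shorter edge of 500,
# so size=500 leaves them unchanged:
print(resize_shorter_edge(1000, 500, 500))  # (1000, 500)
print(resize_shorter_edge(500, 1000, 500))  # (500, 1000)
```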
Right now there's no clean way to resize images based solely on the length of the longer edge. Adding the ability to choose the resize anchor edge would allow this.
Alternatives
No response
Additional context
A similar comment was made in #2868, but it seems like the discussion about the longer edge was lost in the final implementation.
In order to get [500, 250] and [250, 500] from these specific input images, you could set size=499, max_size=500. But of course, this isn't a great UX, and it might not be possible to find a size value that would satisfy all input images.
There's been discussion of adding an edge parameter in the past, but the parameters of resize are already fairly confusing. It seems that if we were to allow size=None, max_size=500, we could implement the behaviour you are looking for, and this should cover all of the potential use cases:
size=tuple -> resize to fixed size
size=int -> resize shorter edge to size while preserving aspect ratio
size=int, max_size=int -> try to resize shorter edge to size while preserving aspect ratio but if resulting longer edge exceeds max_size, then scale down. This corresponds to the resizing strategy of some detection models.
size=None, max_size -> resize longer edge to max_size while preserving aspect ratio.
The first three are already implemented; the last one isn't. Any thoughts @pmeier @vfdev-5?
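The four size/max_size combinations above can be sketched as a single output-size computation. This is a hypothetical compute_resize_output helper written for this discussion, not torchvision's actual implementation:

```python
from typing import Optional, Tuple, Union


def compute_resize_output(
    height: int,
    width: int,
    size: Union[int, Tuple[int, int], None],
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
    if isinstance(size, tuple):
        # Case 1: resize to a fixed (height, width), ignoring aspect ratio.
        return size
    if size is None:
        # Case 4 (proposed): anchor on the longer edge, preserving aspect ratio.
        if max_size is None:
            raise ValueError("size=None requires max_size to be set")
        scale = max_size / max(height, width)
        return round(height * scale), round(width * scale)
    # Cases 2 and 3: anchor on the shorter edge, preserving aspect ratio...
    scale = size / min(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # ...then, if max_size is given and the longer edge exceeds it, scale down.
    if max_size is not None and max(new_h, new_w) > max_size:
        scale = max_size / max(new_h, new_w)
        new_h, new_w = round(new_h * scale), round(new_w * scale)
    return new_h, new_w


# Proposed case 4 gives the behaviour requested in the issue:
print(compute_resize_output(1000, 500, None, 500))  # (500, 250)
# The size=499, max_size=500 workaround happens to land on the same result:
print(compute_resize_output(1000, 500, 499, 500))  # (500, 250)
```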
Thanks for the reply @NicolasHug! I agree that setting size=499, max_size=500 would work for this set of input images, but I'm not sure about the effect this would have on a more varied dataset. I also agree that it isn't the best UX since it's not very intuitive.
I think the proposal for a size=None option is a good stopgap for now. If others agree and I have some guidance from more experienced contributors, I can attempt to implement this feature.
For this specific change you'd only need to update torchvision.transforms.v2 (the transform class, the PIL functional, and the tensor functional). No need to change the "v1" transforms, i.e. the stuff in torchvision.transforms.