-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Allow configuration of MAX_IMAGE_PIXELS in ImageDatasource #45415
base: master
Are you sure you want to change the base?
Conversation
i rebased the pr to the latest of master to avoid some bugs in ci, my apologies if this interrupt your work in any ways, thankks |
b2aeee2
to
75a2c81
Compare
a44f713
to
e881c58
Compare
990cac5
to
06c7488
Compare
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
06c7488
to
4db9ec1
Compare
I tried to add a unit test but I don't think they exercise reading actual images. I manually ran this script to test the changes:
Exception raised:
|
cc @c21 would you mind reviewing this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a test?
@@ -75,6 +77,9 @@ def _read_stream( | |||
) -> Iterator[Block]: | |||
from PIL import Image, UnidentifiedImageError | |||
|
|||
if self.max_image_pixels is not None: | |||
Image.MAX_IMAGE_PIXELS = self.max_image_pixels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this updates the the global variable, and will also affect the next read_images
call.
Is there a way to avoid this?
@@ -791,6 +791,7 @@ def read_images( | |||
file_extensions: Optional[List[str]] = ImageDatasource._FILE_EXTENSIONS, | |||
concurrency: Optional[int] = None, | |||
override_num_blocks: Optional[int] = None, | |||
max_image_pixels: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this arg to something like pil_image_args
? so we don't need to keep adding new parameters in the future.
I tried to add a unit test but I couldn't get it passing because it didn't seem like the test actually reads images. Can you point me to an example unit test that is actually read images? |
Why are these changes needed?
PIL.Image sets a default limit to image sizes to prevent decompression bomb DOS attacks. Example stack trace:
However, the limit should be configurable when processing images with a trusted dataset and users should be able to configure the limit when calling
ray.data.read_images
.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.