Merge/concatenate all pillow images to one single image #238

Open
rushabh-wadkar opened this issue Jul 20, 2022 · 2 comments

Comments


rushabh-wadkar commented Jul 20, 2022

Hi Edouard,

Is there a way to merge the PIL images into one single image for a multipage PDF?
I see that in parsers.py, the function parse_buffer_to_jpeg converts the hex buffer to an array of PIL images using Image.open( .. ) by splitting on \xff\xd9, which is great, but when I printed the buffer I saw the following:

Any JPEG file buffer starts with \xff\xd8 (which marks the start) and ends with \xff\xd9 (which marks the end). But with a multipage PDF, there is something like this in the middle: \xff\xd9\xff\xd8 (the end of page 1 followed by the start of page 2). I tried removing it and replacing it with some random hex code, but it didn't work. The Pillow image is created, but when I try to save it, it fails. Am I missing something here?
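
For illustration, here is a minimal sketch of cutting a concatenated buffer at the page boundary described above (an end marker immediately followed by a start marker) while keeping both markers intact. `split_jpeg_buffer` is a hypothetical helper, not the project's `parse_buffer_to_jpeg`, and it assumes the byte sequence `\xff\xd9\xff\xd8` only ever appears between pages, which may not hold for every JPEG stream.

```python
from typing import List

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_buffer(buffer: bytes) -> List[bytes]:
    """Split concatenated JPEGs wherever EOI is immediately followed by SOI.

    Hypothetical helper: each returned chunk keeps its own start and end
    markers, so it should still be a complete JPEG on its own.
    """
    boundary = EOI + SOI
    chunks = []
    start = 0
    while True:
        idx = buffer.find(boundary, start)
        if idx == -1:
            # No further boundary: the rest of the buffer is the last page.
            chunks.append(buffer[start:])
            break
        end = idx + len(EOI)  # cut after EOI, before the next SOI
        chunks.append(buffer[start:end])
        start = end
    # Drop empty chunks (for example, when the input buffer is empty).
    return [chunk for chunk in chunks if chunk]
```

Each chunk could then be opened separately with `Image.open(io.BytesIO(chunk))`. Removing or overwriting the `\xff\xd9\xff\xd8` bytes instead would leave one page without an end marker and the next without a start marker.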

Can we have a solution or workaround for this, as we need this ASAP?

The reason for this request is that we want to concatenate all the images into one single image and pass it to another system, and the merge is consuming a lot of RAM.

Here is the logic that merges the images into one single image, in case you're interested in debugging why it consumes so much memory:

from PIL import Image

def merge(pil_images):
    # Stack the pages vertically, leaving a 10 px gap between consecutive pages.
    widths, heights = zip(*(i.size for i in pil_images))

    total_width = max(widths)
    max_height = sum(heights) + (len(pil_images) - 1) * 10

    merged_pil_image = Image.new('RGB', (total_width, max_height))
    x_offset = 0  # vertical paste position (pages are stacked top to bottom)
    for im in pil_images:
        merged_pil_image.paste(im, (0, x_offset))
        x_offset += im.height + 10
    return merged_pil_image

rushabh-wadkar (Author) commented

@Belval ^^

rushabh-wadkar (Author) commented

Attaching the profiler output:

There were 2 pages in the PDF, so each image conversion took about 350 MB :'(

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   272    675.0 MiB    675.0 MiB           1   @profile
   273                                         def merge_pil_images(pil_images: Union[PIL.Image.Image, List[PIL.Image.Image]]) -> PIL.Image.Image:
   274    675.0 MiB      0.0 MiB           9       widths, heights = zip(*(i.size for i in pil_images))
   275                                             
   276    675.0 MiB      0.0 MiB           1       total_width = max(widths)
   277    675.0 MiB      0.0 MiB           1       max_height = sum(heights) + (len(pil_images) -1)* 10
   278                                             
   279   1016.9 MiB    341.9 MiB           1       merged_pil_image = PIL.Image.new('RGB', (total_width, max_height))
   280   1016.9 MiB      0.0 MiB           1       x_offset = 0
   281   1358.6 MiB      0.0 MiB           4       for im in pil_images:
   282   1358.6 MiB    341.7 MiB           3               merged_pil_image.paste(im, (0, x_offset))
   283   1358.6 MiB      0.0 MiB           3               x_offset += im.height +10
   284   1358.6 MiB      0.0 MiB           1       return merged_pil_image
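
The 341.9 MiB increment at `PIL.Image.new` in the profile above is the cost of allocating the full merged canvas up front. As a rough sanity check, the sketch below estimates that allocation from the page sizes alone. `estimate_canvas_bytes` is a hypothetical helper, not part of Pillow or this project, and its default of 4 bytes per pixel is an assumption about how Pillow stores `RGB` images in memory; adjust it if measurements disagree.

```python
from typing import Iterable, Tuple


def estimate_canvas_bytes(
    sizes: Iterable[Tuple[int, int]],
    gap: int = 10,
    bytes_per_pixel: int = 4,  # assumption: in-memory size of one RGB pixel
) -> int:
    """Estimate the memory needed for the vertically stacked canvas.

    `sizes` is a sequence of (width, height) pairs, for example
    `[im.size for im in pil_images]`. The layout mirrors the merge logic:
    canvas width is the widest page, canvas height is the sum of all page
    heights plus `gap` pixels between consecutive pages.
    """
    sizes = list(sizes)
    if not sizes:
        return 0
    width = max(w for w, _ in sizes)
    height = sum(h for _, h in sizes) + (len(sizes) - 1) * gap
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical page sizes, chosen only to show the call.
    example_sizes = [(2480, 3508), (2480, 3508)]
    mib = estimate_canvas_bytes(example_sizes) / (1024 * 1024)
    print(f"Estimated canvas size: {mib:.1f} MiB")
```

Because the estimate grows with width times total height, halving both dimensions of each page before merging would cut the canvas to roughly a quarter of the size.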
