
Many images on page slows down request when using page.images.with_size #242

Open
martijndekuijper opened this issue Nov 6, 2018 · 4 comments


@martijndekuijper

Hey, when you request a page with many images (e.g. 80) and use page.images.with_size, the request is so slow that our application times out (>30s).

Is there a way to limit the number of images it calculates the size for? I'd love to be able to fetch the size of only the first 10 images, for example.

@jaimeiniesta
Owner

jaimeiniesta commented Nov 6, 2018

You can disable image downloading entirely if you want:

https://github.com/jaimeiniesta/metainspector#image-downloading

Downloading images can take a lot of time if there are many of them or they are slow to download. It's better to do this in a background job so that the server does not time out.

To improve this, I can think of:

a) Introduce a new option to limit the number of images to download. Or maybe we could change download_images to be an integer instead of a boolean: for example, instead of download_images: false we could have download_images: 0 or download_images: 10. By default we would download all the images, but if an integer is specified we would limit downloads to that many.

b) We could try downloading the images in parallel.

What do you think?
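A rough Ruby sketch of how options (a) and (b) could fit together. The method name, the integer-or-boolean download_images handling, and the fetch_size block are illustrative assumptions, not the gem's actual internals; fetch_size stands in for the per-image HTTP request (which MetaInspector could still do via something like the fastimage gem):

```ruby
# Hypothetical sketch of proposals (a) and (b) combined.
def images_with_size(image_urls, download_images: true, &fetch_size)
  limit =
    case download_images
    when false   then 0                  # current behavior: skip downloads
    when true    then image_urls.length  # current behavior: download all
    when Integer then download_images    # proposal (a): cap the downloads
    end
  # Proposal (b): fetch the sizes in parallel threads instead of sequentially.
  image_urls.take(limit)
            .map { |url| Thread.new { [url, fetch_size.call(url)] } }
            .map(&:value) # Thread#value joins and collects results in order
end

urls = (1..80).map { |i| "https://example.com/img#{i}.png" }
# Only the first 10 images would be fetched:
images_with_size(urls, download_images: 10) { |_url| [100, 100] }.length # => 10
```

This also shows why (a) and (b) are complementary: the limit bounds how much work there is, and the parallelism bounds how long the remaining work takes.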

@martijndekuijper
Author

Oh wow, this speeds things up drastically and fixed my issue for now! Thanks so much.

I feel this can indeed be improved even more with an option to limit the number of images to download. Loading in parallel might also fix the load time, but it would still download all the images, which is redundant if you don't need, say, the last 70 images. Right?

@jaimeiniesta
Owner

That's right: if you're downloading images to get better results for their dimensions, then this should definitely go into a background job. In fact, since any external request can take a long time, as a general rule all scraping should probably happen in background jobs.
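As a minimal plain-Ruby illustration of the background-job idea (in a real app this would be an ActiveJob/Sidekiq job rather than a bare Thread, and the pushed hash stands in for the real MetaInspector result):

```ruby
# Hypothetical sketch: move the slow scrape off the request path.
results = Queue.new # thread-safe queue from Ruby core

worker = Thread.new do
  # Stand-in for the slow MetaInspector scrape; in production this
  # would be a proper background job with retries and persistence.
  results << { url: 'https://example.com', image_count: 80 }
end

# The web request can respond immediately; the result is consumed
# later (e.g. shown on the next page load or pushed to the client).
worker.join
scraped = results.pop
```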

You're right that we could do a) and b), that is, let us specify the number of images to download, and also try to download them in parallel.

@ishields
Contributor

ishields commented Jun 13, 2020

I think the idea of making download_images an integer instead of a boolean is a good one too. I will make a PR. Alternatively, since there are more and more image options, maybe we just add a hash: image_options: { download: true, max_downloaded_images: 300 },
and include the other options I've added in https://github.com/metainspector/metainspector/pull/267 (fetch_all_image_meta and image_blacklist_words).
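A sketch of how the gem might validate and default such an image_options hash. The key names mirror the ones proposed in this thread; the normalization helper itself is hypothetical, not part of MetaInspector:

```ruby
# Hypothetical defaults for the proposed image_options hash.
DEFAULTS = {
  download: true,             # current download_images behavior
  max_downloaded_images: nil, # nil = no limit (this issue's proposal)
  fetch_all_image_meta: false,
  image_blacklist_words: []
}.freeze

# Reject unknown keys early, then merge user options over the defaults.
def normalize_image_options(opts = {})
  unknown = opts.keys - DEFAULTS.keys
  raise ArgumentError, "unknown image_options: #{unknown}" if unknown.any?
  DEFAULTS.merge(opts)
end

normalize_image_options(download: true, max_downloaded_images: 300)
```

Grouping the options this way keeps the top-level constructor signature stable as more image-related settings are added.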
