Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to reuse the puppeteer instance for better performance? #80

Open
tomerb15 opened this issue Nov 15, 2020 · 22 comments
Open
Labels
enhancement New feature or request help wanted Extra attention is needed question Further information is requested
Projects

Comments

@tomerb15
Copy link

What is your recommended settings to maximize performance in the following case:

  1. Small HTMLs usually resulting jpeg images of 10K-50K in size
  2. HTML content is 100% trusted
  3. HTML is send via data uri
  4. The image is returned in memory as a buffer and not saved to disk.

Thanks!

@frinyvonnick
Copy link
Owner

Hi @tomerb15 👋

I don't understand what is the issue you encountered. Is it too slow? If it is the case can you share your code, please?

@frinyvonnick frinyvonnick added the question Further information is requested label Dec 1, 2020
@tomerb15
Copy link
Author

tomerb15 commented Dec 1, 2020

Hey, we use it to generate many small screenshots. Basically small HTMLs to JPEGs.
The problem is that we have high volumes.

We found puppeteer-cluster and with the right configuration, we manage to crack it down and go from ~900ms per image (Which grow up to 60 seconds at peak) to a steady ~350ms per image which solved our problem.

Hope this will help anyone in the future.

@frinyvonnick
Copy link
Owner

@tomerb15 as you can see here node-html-to-image uses puppeteer-cluster under the hood. Maybe you could help optimize it with your configuration?

@tomerb15
Copy link
Author

tomerb15 commented Dec 1, 2020

Here is our configuration for puppeteer-cluster which works very well:

      {
            concurrency: Cluster.CONCURRENCY_CONTEXT,
            maxConcurrency: 10,
            puppeteerOptions: {args: ['--no-sandbox', '--headless', '--disable-gpu']}
        }

We don't see any memory leak from the browsers.
We also taking screenshot from a specific element in the dom rather then the entire body.

@erohana
Copy link

erohana commented Dec 13, 2020

@tomerb15 how did you configure it ? can you please provide an example ?

@KL13NT
Copy link

KL13NT commented Dec 29, 2020

Perhaps the maxConcurrency option can be specified as part of the options? I have a bot that produces images and spawning an instance every time to render images is a huge performance hit for me (about 500ms on average + network overhead). Different systems have different capabilities and the maxConcurrency option should be dependent on users, perhaps with the default as 2. Willing to PR this.

@KL13NT
Copy link

KL13NT commented Dec 29, 2020

Just discovered another performance-related issue. You create a cluster for a batch of jobs, but the maxConcurrency option fails to use the full advantage of puppeteer-cluser AND the fact that you terminated the whole cluster after the job is done is also a performance hit.

So issues currently present are:

  • In case someone needs to render multiple images at the same time, they're limited to 2 instances at a time which doesn't utilise the full potential of puppeteer-cluster.
  • In my case, I need to re-use the same cluster/instance for on-demand requests. The library currently terminates clusters at the end of each call, which is useless for me and actually adds more overhead than before.

Solutions:

  • Make maxConcurrency part of the options passed to the function.
  • Add an option to re-use the same cluster.

I'm ready to work on implementing these solutions.

@KL13NT
Copy link

KL13NT commented Dec 29, 2020

Spawning a cluster on a clean start of the bot took 3,000ms.
image

Subsequent spawns took an average of over 300ms.
image

@joshk0
Copy link

joshk0 commented Jan 28, 2021

@frinyvonnick
It would be nice to create some kind of instance object, this way puppeteer can be launched in the background as soon as the module is required or the object is instantiated. Because the instantiation of puppeteer occurs within the nodeHtmlToImage() call, we can't safely start chrome until that very moment.

In other words, my Dream Code:

// imageMaker contains the puppeteer instance
const imageMaker = require('node-html-to-image');

imageMaker.createImage({html: html}) // as many times as you want... reuses the same browser

joshk0 added a commit to joshk0/node-html-to-image that referenced this issue Jan 28, 2021
**This is a breaking API change (for now)**

node-html-to-image's default export is now a function which returns an
instance object with a render() method which is the old
nodeHtmlToImage() functionality. The reason for doing this is to allow
us to instantiate a single 'image maker' object that we call render() on
multiple times and reuse the same puppeteer cluster.

In addition,
* Update the type definitions.
* Use Cluster.execute() instead of Cluster.queue() plus Cluster.idle()
  as this will allow better use in concurrent situations. We will only
  wait for our jobs to finish rather than for the entire cluster to go
  idle.

Fixes frinyvonnick#80
@KL13NT
Copy link

KL13NT commented Jan 29, 2021

@frinyvonnick The changes would be significant in terms of performance and not go beyond a few lines of code.

@frinyvonnick
Copy link
Owner

@KL13NT I look at all issues and new messages. Answering takes some time. Avoid pinging me without new information. I'll answer as soon as I can. I maintain this package in my free time. Please, keep that in mind 😄

@BrnPer
Copy link

BrnPer commented Feb 3, 2021

I'm facing this issue... it takes about 2 seconds to render per image. I've already tried to change puppeter settings, but the time didn't change... can you please help me? I generate a buffer with the image in jpeg.
Thanks in advance.

@joshk0
Copy link

joshk0 commented Feb 3, 2021

@BrnPer @KL13NT
I'd love a tester for my change #95 . If you're willing to tweak the way you call the library, you can test it easily.

@frinyvonnick
Copy link
Owner

@joshk0 I would prefer to avoid breaking change. I created this package in the idea of having something simple and easy to use. Initially I thought people might directly use puppeteer for more advance use cases. I changed a bit my mind since people seems to enjoy this package.

Here are the existing use cases:

Single image

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello world!</body></html>',
})

Batch image creation

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

We might expose a name export that let you reuse the instance like it:

const { launch } = require('node-html-to-image');

const instance = launch({ puppeteerArgs })

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: { name: 'Michel' }
})
  .then(() => console.log('The image was created successfully!'))

@BrnPer
Copy link

BrnPer commented Feb 6, 2021

@frinyvonnick I think it would be a good idea to expose the instance.
In my testing I shave some milliseconds off by using the following puppeterArgs:

['--disable-gpu', '--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage', '--disable-accelerated-2d-canvas', '--no-first-run', '--no-zygote'] 

And changing the waitUntil to 'load'.
It now takes about 1 second to generate the image. But if was possible to generate even faster, would be awesome 😎
And btw @frinyvonnick awesome job with this library, well documented and very useful 🎉

@frinyvonnick frinyvonnick added enhancement New feature or request help wanted Extra attention is needed labels Feb 14, 2021
@zeel01
Copy link

zeel01 commented Apr 20, 2021

This would be of great use to me as well, I'm attempting to use this to create information cards to display in Discord embeds. Unfortunately it takes a rather long time for the images to be generated (around 4-5 seconds).

@bekworks
Copy link

@joshk0 I would prefer to avoid breaking change. I created this package in the idea of having something simple and easy to use. Initially I thought people might directly use puppeteer for more advance use cases. I changed a bit my mind since people seems to enjoy this package.

Here are the existing use cases:

Single image

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello world!</body></html>',
})

Batch image creation

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

We might expose a name export that let you reuse the instance like it:

const { launch } = require('node-html-to-image');

const instance = launch({ puppeteerArgs })

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: { name: 'Michel' }
})
  .then(() => console.log('The image was created successfully!'))

Could we get an update on the whole idea about reusing the instance?

The usecase i'm facing is the following:

12 different HTML templates
10 different sets of content

Right now, we have to go through them, one by one, because we have multiple different templates, which in turn means that we have to create and load a new puppeteer instance for every single image.

I could live with us being able to make html and content pairs

@frinyvonnick
Copy link
Owner

I discover that puppeteer-cluster can take a puppeteer object as parameter of the launch method (see https://github.com/thomasdondorf/puppeteer-cluster#clusterlaunchoptions). This could be an option to pass your own instance of puppeteer. Is someone interested in opening a pull request to add this option?

Another performance improvement could be to use an instance of puppeteer-core with headless shell (see https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#puppeteer-vs-puppeteer-core). Could be useful in some contextes where disk space matters. Anyone interested in by this too?

@luanraithz
Copy link

I had performance issues for a scenario where a user request would start some good (~300) amount of screenshots, so if more than a small amount of users decided to do the same thing, the machine would run out of memory. I decided to create a singleton puppeteer cluster and enqueue manually the screenshots, another improvement was to render all the html in the same page and run query selectors (all the divs were small, height around 350px).
I went from ~58 seconds to take the screenshots to ~15.

@AramZS
Copy link

AramZS commented Jan 2, 2023

I'm very interested in this feature!

@BlazingTide
Copy link

Any update on this?

@Tiffceet
Copy link
Contributor

What is the current workaround for performance issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed question Further information is requested
Projects
Roadmap
To do
Development

Successfully merging a pull request may close this issue.