[Question]: I found a way to use a single playwright instance in a multithreaded context #2001

Yomguithereal · 2023-07-06T09:04:39Z

Your question

Hello playwright team,

I know the consensus about multithreading & playwright is that you should create one playwright instance per thread because playwright is not threadsafe, which is right. But one instance per thread seems quite costly and it is sad to lose the ability of the asyncio implementation to do multiple things at once on multiple tabs (it can do that, no?)

So I dug into the problem and found a way to use an asyncio playwright safely from a multithreaded context. This means you can still do multiple things concurrently with a single playwright instance while interacting with the browser from multiple threads, which is a legitimate use case for legacy and usability reasons. In my case, I have a webmining project named minet in which I sometimes need to interleave multithreaded urllib3 or pycurl calls with playwright tasks for complex web crawling, and I orchestrate the threaded work using the quenouille library. In this context, mixing threads and asyncio for orchestration is a fully-fledged nightmare, so I wanted a way to pilot an asynchronous playwright instance from multiple threads.

The solution is therefore the following:

You need to have some class that will spawn a thread in which a new asyncio loop will run
Then you need to start the playwright instance in said thread and make the loop run forever
Then you can send "jobs" using coroutine functions called through asyncio.run_coroutine_threadsafe

There is some threading glue code involved of course for synchronization but the rest is pretty straightforward.
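
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the minet implementation linked below: `LoopThread`, `setup`, `fake_setup` and `job` are hypothetical names, and the playwright-specific part is left out. In real use, the `setup` coroutine would presumably start playwright (for instance with `await async_playwright().start()`) and launch a browser, and the object it returns would be handed to every job.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class LoopThread:
    """Runs one asyncio event loop forever in a dedicated thread."""

    def __init__(self, setup=None):
        # `setup` is a hypothetical hook: an optional coroutine function run
        # once inside the loop thread (e.g. to start playwright).
        self.loop = asyncio.new_event_loop()
        self.resource = None
        self._error = None
        self._ready = threading.Event()
        self._thread = threading.Thread(
            target=self._run, args=(setup,), daemon=True
        )
        self._thread.start()
        self._ready.wait()  # block the caller until setup has finished
        if self._error is not None:
            raise self._error

    def _run(self, setup):
        try:
            if setup is not None:
                self.resource = self.loop.run_until_complete(setup())
        except BaseException as exc:
            self._error = exc
        finally:
            self._ready.set()
        if self._error is None:
            self.loop.run_forever()  # serve jobs until stop() is called
        self.loop.close()

    def run(self, coro_fn, *args, timeout=None):
        """Callable from any thread: schedule a job on the loop and wait."""
        future = asyncio.run_coroutine_threadsafe(
            coro_fn(self.resource, *args), self.loop
        )
        return future.result(timeout)

    def stop(self):
        # loop.stop() is not thread-safe, so ask the loop to stop itself.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()


async def fake_setup():
    # Stand-in for starting playwright and launching a browser.
    return {"name": "stand-in for a browser"}


async def job(resource, n):
    # Stand-in for a playwright task such as opening a page.
    await asyncio.sleep(0.01)
    return n * 2


if __name__ == "__main__":
    worker = LoopThread(fake_setup)
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(lambda n: worker.run(job, n), range(5))))
    worker.stop()
```

Each calling thread blocks on its own `future.result()`, while the jobs themselves interleave on the single loop, so only the loop thread ever touches the shared resource.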

Here is an example of such a class: https://github.com/medialab/minet/blob/master/minet/browser/threadsafe_browser.py
Here is an example of it being used: https://github.com/medialab/minet/blob/master/ftest/playwright-threading.py

But now I have some questions:

Is it really safe or is there some hidden footgun I have not yet seen?
Is this useful to anyone else? (I am not advocating for API additions nor code modification, here, but maybe documentation)
Is playwright actually able to run concurrent actions in multiple tabs? (It certainly does look like it.)
Is it actually worth it not to create multiple playwright instances because the underlying playwright controlling process might be some weird singleton I don't know about?

Some other related notes:

This does not work out-of-the-box in py3.7 because of the lack of ThreadedChildWatcher which means you cannot stop the playwright instance from a non-main thread. This is fixable by backporting the class from py3.8 source code lol https://github.com/medialab/minet/blob/master/minet/browser/threadsafe_browser.py#L10
It would be useful to expose some way to run the playwright command line programmatically from python. I do it like so by copying/repurposing some internal code: https://github.com/medialab/minet/blob/master/minet/browser/plawright_shim.py
It would be useful to be able to create pyinstaller builds that don't package a browser and can instead use the ones normally installed in a user's home directory.


ukenmisneru · 2023-07-14T04:39:02Z

Hi Yomguithereal,
I have a similar question.
Is it possible to open pages with different contexts all in one playwright instance (one browser window)?
I need to post a lot of different queries to the same website, using different proxies to avoid being detected.
It would consume a lot of system resources if the different contexts were opened in different browser windows.
Thanks

dgtlmoon · 2023-11-08T09:58:16Z

Always consider that playwright is primarily a testing framework, not a web scraping framework.

mxschmitt added the P3-collecting-feedback label Jul 17, 2023
