I know the consensus about multithreading & playwright is that you should create one playwright instance per thread because playwright is not threadsafe, which is right. But one instance per thread seems quite costly, and it is sad to lose the ability of the asyncio implementation to do multiple things at once on multiple tabs (it can do that, no?).
So I dug into the problem and found a way to use an asyncio playwright safely from a multithreaded context. This means you can still do multiple things concurrently with a single playwright instance while interacting with the browser from multiple threads, which is a legitimate use case for legacy and other usability reasons. In my case, I have a webmining project named minet in which I need to interleave multithreaded urllib3 or pycurl calls with some playwright tasks for complex web crawling, and I orchestrate the threaded work using the quenouille library. In this context, mixing threads and asyncio for orchestration is a fully-fledged nightmare, so I wanted to find a way to pilot an asynchronous playwright instance from multiple threads.
The solution is therefore the following:
You need to have some class that will spawn a thread in which a new asyncio loop will run
Then you need to start the playwright instance in said thread and make the loop run forever
Then you can send "jobs" using coroutine functions called through asyncio.run_coroutine_threadsafe
There is some threading glue code involved of course for synchronization but the rest is pretty straightforward.
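To make the steps above concrete, here is a minimal sketch of the pattern, not the actual minet class. The names LoopThread, setup, submit, run and stop are hypothetical, and the playwright part is only shown in comments so the block runs without a browser installed; I am assuming the usual async_playwright().start() entry point there.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Awaitable, Callable, Optional


class LoopThread:
    """Hypothetical helper owning one asyncio loop that runs in its own thread."""

    def __init__(self, setup: Optional[Callable[[], Awaitable[Any]]] = None):
        self.loop = asyncio.new_event_loop()
        self.resource: Any = None  # e.g. a launched browser, created by `setup`
        self._error: Optional[BaseException] = None
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, args=(setup,), daemon=True)
        self._thread.start()
        self._ready.wait()  # glue: block the caller until the loop is usable
        if self._error is not None:
            raise self._error

    def _run(self, setup: Optional[Callable[[], Awaitable[Any]]]) -> None:
        asyncio.set_event_loop(self.loop)
        try:
            if setup is not None:
                # Start the shared resource inside the loop's own thread.
                self.resource = self.loop.run_until_complete(setup())
        except BaseException as error:
            self._error = error
            self._ready.set()
            return
        self.loop.call_soon(self._ready.set)
        self.loop.run_forever()

    def submit(self, job: Callable[..., Awaitable[Any]], *args: Any) -> Future:
        # Safe to call from any thread: the coroutine is scheduled on the loop
        # thread and a concurrent.futures.Future is handed back to the caller.
        return asyncio.run_coroutine_threadsafe(job(self.resource, *args), self.loop)

    def run(self, job: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        # Blocking variant: wait for the job's result in the calling thread.
        return self.submit(job, *args).result()

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


# With playwright installed, the hooks could look like this (assumption, not run here):
#
#   from playwright.async_api import async_playwright
#
#   async def start_browser():
#       pw = await async_playwright().start()
#       return await pw.chromium.launch()
#
#   async def fetch_title(browser, url):
#       page = await browser.new_page()
#       await page.goto(url)
#       title = await page.title()
#       await page.close()
#       return title
#
#   worker = LoopThread(start_browser)
#   title = worker.run(fetch_title, "https://example.com")  # callable from any thread
```

Every job receives the shared resource as its first argument, so all playwright calls happen on the loop thread, while worker threads only ever touch the returned futures. A real implementation would also need to close the browser and stop playwright before stopping the loop.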
Is it really safe or is there some hidden footgun I have not yet seen?
Is this useful to anyone else? (I am not advocating for API additions nor code modification, here, but maybe documentation)
Is playwright actually able to run concurrent actions in multiple tabs? (It certainly does look like it.)
Is it actually worth it not to create multiple playwright instances because the underlying playwright controlling process might be some weird singleton I don't know about?
Hi Yomguithereal,
I have a similar question.
Is it possible to have pages with different contexts all open in one playwright instance (one browser window)?
I need to post a lot of different queries to the same website, using different proxies to avoid being detected.
It will consume a lot of system resources if the different contexts are opened in different browser windows.
Thanks
Your question
Hello playwright team,
Here is an example of such a class: https://github.com/medialab/minet/blob/master/minet/browser/threadsafe_browser.py
Here is an example of it being used: https://github.com/medialab/minet/blob/master/ftest/playwright-threading.py
But now I have some questions:
Some other related notes:
[...] playwright command line programmatically from Python. I do it like so by copying/repurposing some internal code: https://github.com/medialab/minet/blob/master/minet/browser/plawright_shim.py