
[BUG] asyncio.exceptions.InvalidStateError: invalid state thrown by exit in async context manager #2238

Open
pjsg opened this issue Jan 8, 2024 · 7 comments


@pjsg

pjsg commented Jan 8, 2024

System info

  • Playwright Version: v1.40
  • Operating System: macOS 14.2.1
  • Browser: Chromium
  • Other info:

Source code

from playwright.async_api import async_playwright
import asyncio

async def doit(url):
    print(f"Processing {url}")
    try:
        async with async_playwright() as p:

                browser_type = p.chromium

                browser = await browser_type.launch(
                    headless=True,
                )

                page = await browser.new_page(
                    bypass_csp=True,
                    ignore_https_errors=True,
                )

                res = await page.goto(url, wait_until="load", timeout=30 * 1000)

                await page.wait_for_load_state(state="networkidle")
                await browser.close()

    except Exception as e:
        print(f"Got exception {e}")
        raise e

asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))

Steps

  • Save the code above and run it. I'm using Python 3.10.7.

Expected

It should complete without error.

Actual

  • It throws an InvalidStateError. If it happens to succeed, run it a couple more times; it nearly always fails for me.
Processing https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html
Got exception invalid state
Traceback (most recent call last):
  File "/Users/philip/play-dir/playtest.py", line 22, in doit
    await page.wait_for_load_state(state="networkidle")
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 9367, in wait_for_load_state
    await self._impl_obj.wait_for_load_state(state=state, timeout=timeout)
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_page.py", line 491, in wait_for_load_state
    return await self._main_frame.wait_for_load_state(**locals_to_params(locals()))
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 237, in wait_for_load_state
    return await self._wait_for_load_state_impl(state, timeout)
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 265, in _wait_for_load_state_impl
    await waiter.result()
playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/philip/play-dir/playtest.py", line 29, in <module>
    asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))
  File "/Users/philip/.pyenv/versions/3.10.7/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/philip/.pyenv/versions/3.10.7/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/Users/philip/play-dir/playtest.py", line 27, in doit
    raise e
  File "/Users/philip/play-dir/playtest.py", line 7, in doit
    async with async_playwright() as p:
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/async_api/_context_manager.py", line 58, in __aexit__ 
    await self._connection.stop_async()
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 288, in stop_async
    self.cleanup()
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 299, in cleanup
    callback.future.set_exception(self._closed_error)
asyncio.exceptions.InvalidStateError: invalid state
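
(For reference: this last frame is reproducible with plain asyncio, independent of Playwright. set_exception() raises InvalidStateError whenever the target future is already done or cancelled, which appears to be the state the pending callback future is in after the TimeoutError above. A minimal sketch:)

import asyncio

async def main() -> None:
    fut = asyncio.get_running_loop().create_future()
    fut.cancel()  # stand-in for a protocol callback cancelled after a timeout
    # The future is already done, so this raises
    # asyncio.exceptions.InvalidStateError -- the same failure mode as
    # callback.future.set_exception(self._closed_error) in Connection.cleanup().
    fut.set_exception(RuntimeError("connection closed"))

asyncio.run(main())
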
@dgozman
Contributor

dgozman commented Jan 12, 2024

I was able to repro in 1 out of 5 runs. However, I was not able to repro with the following snippet. Not yet sure what's going on.

from playwright.async_api import async_playwright
import asyncio

async def doit(url):
    print(f"Processing {url}")

    async with async_playwright() as p:
        browser_type = p.chromium
        browser = await browser_type.launch(
            headless=True,
        )

        try:
            page = await browser.new_page(
                bypass_csp=True,
                ignore_https_errors=True,
            )
            res = await page.goto(url, wait_until="load", timeout=30 * 1000)
            await page.wait_for_load_state(state="networkidle")
        except Exception as e:
            print(f"Got exception {e}")
            raise e
        finally:
            await browser.close()

asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))

@dgozman dgozman transferred this issue from microsoft/playwright Jan 12, 2024
@pjsg
Author

pjsg commented Jan 12, 2024

It appears that browser.close() is the key difference. In @dgozman's example it is executed, whereas in my example it is not (the exception has already been thrown by then). That said, if you don't call close(), other URLs such as https://cnn.com/ throw a different exception.
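
A condensed sketch of that difference (essentially the snippet above, with the except narrowed to Playwright's TimeoutError, which playwright.async_api exports, so close() always runs while unrelated failures still propagate):

from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError
import asyncio

async def doit(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page(bypass_csp=True, ignore_https_errors=True)
            await page.goto(url, wait_until="load", timeout=30 * 1000)
            await page.wait_for_load_state(state="networkidle")
        except PlaywrightTimeoutError as e:
            print(f"Timed out: {e}")
        finally:
            # Runs even when goto()/wait_for_load_state() time out, so the
            # connection shuts down cleanly before async_playwright's __aexit__.
            await browser.close()

asyncio.run(doit("https://example.com"))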

@mxschmitt
Member

I'm unfortunately not able to reproduce it. I tried running it 10 times on macOS with Python 3.10 and Python 3.12.

@mxschmitt
Member

Closing for now since we can't reproduce it.

@danphenderson
Contributor

danphenderson commented Feb 26, 2024

I don't think this should be closed. I can reproduce the error. Whenever there is a timeout error, it appears that the event loop is closing, resulting in an InvalidStateError.

In [3]: from playwright.async_api import async_playwright
   ...: import asyncio
   ...:
   ...: async def doit(url):
   ...:     print(f"Processing {url}")
   ...:     try:
   ...:         async with async_playwright() as p:
   ...:
   ...:                 browser_type = p.chromium
   ...:
   ...:                 browser = await browser_type.launch(
   ...:                     headless=True,
   ...:                 )
   ...:
   ...:                 page = await browser.new_page(
   ...:                     bypass_csp=True,
   ...:                     ignore_https_errors=True,
   ...:                 )
   ...:
   ...:                 res = await page.goto(url, wait_until="load", timeout=30 * 1000)
   ...:
   ...:                 await page.wait_for_load_state(state="networkidle")
   ...:                 await browser.close()
   ...:
   ...:     except Exception as e:
   ...:         print(f"Got exception {e}")
   ...:         raise e
   ...:
   ...: asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))
Processing https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html
Got exception Timeout 30000ms exceeded.
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
Cell In[3], line 29
     26         print(f"Got exception {e}")
     27         raise e
---> 29 asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))

File ~/.pyenv/versions/3.10.6/lib/python3.10/asyncio/runners.py:44, in run(main, debug)
     42     if debug is not None:
     43         loop.set_debug(debug)
---> 44     return loop.run_until_complete(main)
     45 finally:
     46     try:

File ~/.pyenv/versions/3.10.6/lib/python3.10/asyncio/base_events.py:646, in BaseEventLoop.run_until_complete(self, future)
    643 if not future.done():
    644     raise RuntimeError('Event loop stopped before Future completed.')
--> 646 return future.result()

Cell In[3], line 27, in doit(url)
     25 except Exception as e:
     26     print(f"Got exception {e}")
---> 27     raise e

Cell In[3], line 20, in doit(url)
     11 browser = await browser_type.launch(
     12     headless=True,
     13 )
     15 page = await browser.new_page(
     16     bypass_csp=True,
     17     ignore_https_errors=True,
     18 )
---> 20 res = await page.goto(url, wait_until="load", timeout=30 * 1000)
     22 await page.wait_for_load_state(state="networkidle")
     23 await browser.close()

File ~/Desktop/open-source/playwright-python/playwright/async_api/_generated.py:8612, in Page.goto(self, url, timeout, wait_until, referer)
   8551 async def goto(
   8552     self,
   8553     url: str,
   (...)
   8559     referer: typing.Optional[str] = None
   8560 ) -> typing.Optional["Response"]:
   8561     """Page.goto
   8562
   8563     Returns the main resource response. In case of multiple redirects, the navigation will resolve with the first
   (...)
   8608     Union[Response, None]
   8609     """
   8611     return mapping.from_impl_nullable(
-> 8612         await self._impl_obj.goto(
   8613             url=url, timeout=timeout, waitUntil=wait_until, referer=referer
   8614         )
   8615     )

File ~/Desktop/open-source/playwright-python/playwright/_impl/_page.py:500, in Page.goto(self, url, timeout, waitUntil, referer)
    493 async def goto(
    494     self,
    495     url: str,
   (...)
    498     referer: str = None,
    499 ) -> Optional[Response]:
--> 500     return await self._main_frame.goto(**locals_to_params(locals()))

File ~/Desktop/open-source/playwright-python/playwright/_impl/_frame.py:145, in Frame.goto(self, url, timeout, waitUntil, referer)
    135 async def goto(
    136     self,
    137     url: str,
   (...)
    140     referer: str = None,
    141 ) -> Optional[Response]:
    142     return cast(
    143         Optional[Response],
    144         from_nullable_channel(
--> 145             await self._channel.send("goto", locals_to_params(locals()))
    146         ),
    147     )

File ~/Desktop/open-source/playwright-python/playwright/_impl/_connection.py:59, in Channel.send(self, method, params)
     58 async def send(self, method: str, params: Dict = None) -> Any:
---> 59     return await self._connection.wrap_api_call(
     60         lambda: self.inner_send(method, params, False)
     61     )

File ~/Desktop/open-source/playwright-python/playwright/_impl/_connection.py:509, in Connection.wrap_api_call(self, cb, is_internal)
    507 self._api_zone.set(_extract_stack_trace_information_from_stack(st, is_internal))
    508 try:
--> 509     return await cb()
    510 finally:
    511     self._api_zone.set(None)

File ~/Desktop/open-source/playwright-python/playwright/_impl/_connection.py:97, in Channel.inner_send(self, method, params, return_as_dict)
     95 if not callback.future.done():
     96     callback.future.cancel()
---> 97 result = next(iter(done)).result()
     98 # Protocol now has named return values, assume result is one level deeper unless
     99 # there is explicit ambiguity.
    100 if not result:

TimeoutError: Timeout 30000ms exceeded.

@yijiyap

yijiyap commented Apr 4, 2024

I am facing a similar problem with my scraper. The code base is quite large, so I can't post it here. The scraper visits about 1400+ pages, each with a timeout of about 10 seconds, and a full run takes 12+ hours when nothing goes wrong.

Where the error happens isn't exactly consistent, but it seems to occur after about 3 hours of scraping, at around 350 links. The error only shows up when I stop the Python program; it does not terminate the script on its own the way a normal exception would.

Some workarounds I've used:

  • created a CSV that records the last link scraped before the error occurred, so the next run resumes from where it left off (see the sketch below);
  • automatically restart the scraper after 2 hours, before it hits the error.

Edit: Happens on Python 3.10 on macOS and Python 3.11 on Windows.
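
A minimal sketch of the first workaround (the file name and helpers are hypothetical, not part of the scraper above):

import csv
import os

CHECKPOINT = "scraped.csv"  # hypothetical checkpoint file

def load_done() -> set:
    # URLs that were already scraped in a previous run
    if not os.path.exists(CHECKPOINT):
        return set()
    with open(CHECKPOINT, newline="") as f:
        return {row[0] for row in csv.reader(f) if row}

def mark_done(url: str) -> None:
    # Append each finished URL so a restart resumes where it left off
    with open(CHECKPOINT, "a", newline="") as f:
        csv.writer(f).writerow([url])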

@haf

haf commented Apr 22, 2024

Another stacktrace:

 .venv/lib/python3.11/site-packages/playwright/_impl/_connection.py:296, in Connection.cleanup(self, cause)
     294     ws_connection._transport.dispose()
     295 for callback in self._callbacks.values():
 --> 296     callback.future.set_exception(self._closed_error)
     297 self._callbacks.clear()
     298 self.emit("close")
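
(Same frame as in the original report. A hypothetical mitigation sketch -- not the actual upstream change -- would be to guard set_exception() against futures that are already done, e.g.:)

import asyncio

def safe_set_exception(fut: asyncio.Future, exc: BaseException) -> None:
    # A future that was already cancelled or resolved (for example by a
    # timed-out call) rejects set_exception() with InvalidStateError,
    # so only set the exception while it is still pending.
    if not fut.done():
        fut.set_exception(exc)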

With anyio:

async with (
    async_playwright() as p,
    create_task_group() as tg
):
    browser = await p.chromium.launch()
    list_spider = await SpiderAPI[ListingLink, ListPageLink].create(browser)
    tg.start_soon(list_spider.run, spider_list(config)) # curried
    await sleep(5)
    tg.cancel_scope.cancel()
