
gevent.exceptions.InvalidSwitchError: Invalid switch into AsyncResult.wait(): None #2005

Open

r-bk opened this issue Nov 7, 2023 · 1 comment

r-bk commented Nov 7, 2023

  • gevent version: pip3 install gevent==23.9.1 greenlet==3.0.1 celery[redis]==5.3.4 redis==4.6.0
  • Python version: python:3.10-slim-bookworm downloaded from hub.docker.com
  • Operating System: debian:bookworm in the container; ubuntu:focal on the Host

Description:

The following exception started showing up after upgrading to Python 3.10 (it also happens with 3.11); it never happened with Python 3.8 or 3.9. As a consequence, celery stops processing its task queue.

Exception ignored in: <function AsyncResult.__del__ at 0x7fa913b38040>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/celery/result.py", line 416, in __del__
    self.backend.remove_pending_result(self)
  File "/usr/local/lib/python3.10/site-packages/celery/backends/asynchronous.py", line 208, in remove_pending_result
    self.on_result_fulfilled(result)
  File "/usr/local/lib/python3.10/site-packages/celery/backends/asynchronous.py", line 216, in on_result_fulfilled
    self.result_consumer.cancel_for(result.id)
  File "/usr/local/lib/python3.10/site-packages/celery/backends/redis.py", line 184, in cancel_for
    self._pubsub.unsubscribe(key)
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1659, in unsubscribe
    return self.execute_command("UNSUBSCRIBE", *args)
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1469, in execute_command
    self.connection = self.connection_pool.get_connection(
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 1461, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 707, in connect
    sock = self.retry.call_with_retry(
  File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 708, in <lambda>
    lambda: self._connect(), lambda error: self.disconnect(error)
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 974, in _connect
    for res in socket.getaddrinfo(
  File "/usr/local/lib/python3.10/site-packages/gevent/_socketcommon.py", line 225, in getaddrinfo
    addrlist = get_hub().resolver.getaddrinfo(host, port, family, type, proto, flags)
  File "/usr/local/lib/python3.10/site-packages/gevent/resolver/thread.py", line 63, in getaddrinfo
    return self.pool.apply(_socket.getaddrinfo, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/gevent/pool.py", line 161, in apply
    return self.spawn(func, *args, **kwds).get()
  File "src/gevent/event.py", line 329, in gevent._gevent_cevent.AsyncResult.get
  File "src/gevent/event.py", line 356, in gevent._gevent_cevent.AsyncResult.get
  File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  File "src/gevent/_abstract_linkable.py", line 455, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
gevent.exceptions.InvalidSwitchError: Invalid switch into AsyncResult.wait(): None

What I've run:

The error happens in our test environment and can be reproduced pretty consistently.
The exception's documentation says "This is usually a bug in gevent, greenlet, or the event loop", so I have opened the issue here.
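
For reference, here is a minimal sketch of the kind of setup in which this occurs (the app, task, and worker flags below are hypothetical placeholders, not our actual code):

    # app.py: a hypothetical minimal Celery app with the Redis backend
    from celery import Celery

    app = Celery(
        "demo",
        broker="redis://localhost:6379/0",
        backend="redis://localhost:6379/0",
    )

    @app.task
    def add(x, y):
        return x + y

    # The worker runs with the gevent pool, e.g.:
    #   celery -A app worker -P gevent -c 100
    #
    # Callers then block on results; this is where the AsyncResult
    # objects (whose __del__ appears in the traceback above) come from:
    #   add.delay(2, 2).get(timeout=10)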

This exception resembles the bug fixed in python-greenlet/greenlet@f6fd00f. However, I have run the code with greenlet==3.0.1 compiled in debug mode: no assertions were triggered, and the exception was raised with exactly the same traceback, which suggests this is not the same underlying bug.

jamadden (Member) commented Nov 7, 2023

Thanks for the report.

From the traceback, we can guess that this is occurring during garbage collection. And celery's code tries to do networking during garbage collection, when things are in a potentially unknown and fragile state. Sigh. Probably what's happening is that we're already running gevent code when the GC is triggered; the finalizer's network call then switches greenlets out from underneath that code, leading to this error.
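
To illustrate the hazard, here is a minimal sketch (the class and function names are hypothetical) of a finalizer making a gevent-cooperative call at the moment the cyclic GC happens to fire:

    import gc
    import gevent

    class NetworkyResult:
        """Hypothetical stand-in for an object whose __del__ does networking."""
        def __del__(self):
            # Any gevent-cooperative call here (socket I/O, DNS lookup,
            # waiting on an AsyncResult) switches greenlets out from under
            # whatever code happened to trigger the collection.
            gevent.sleep(0)

    def churn():
        # A reference cycle forces the cyclic collector (not plain
        # refcounting) to run the finalizer, at an arbitrary point
        # in the middle of this greenlet's execution.
        obj = NetworkyResult()
        obj.cycle = obj
        del obj
        gc.collect()

    gevent.spawn(churn).join()

This sketch merely demonstrates the switch inside a finalizer; whether it corrupts a wait depends on what the interrupted code was doing at that moment.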

I don't have a good general solution for that right now.

As a mitigation, you can try increasing your GC thresholds so collections run less frequently, manually running GC at "safe" points when the process is otherwise idle, or even disabling GC entirely. (I realize that all of that might be anywhere from non-trivial to impossible, depending on how you're running celery.)
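
For example, a sketch of those mitigations using the standard gc module (the threshold value is arbitrary, and where to run this depends on how the worker process is started):

    import gc

    # Option 1: raise the generation-0 threshold so collections happen
    # far less often. The default is (700, 10, 10); 100_000 is arbitrary.
    gc.set_threshold(100_000, 10, 10)

    # Option 2: trigger collections yourself at a known-idle "safe"
    # point, e.g. between tasks, instead of at allocation-driven
    # arbitrary points.
    def collect_at_safe_point():
        gc.collect()

    # Option 3: turn the cyclic collector off entirely. Reference
    # counting still frees most objects, but uncollected reference
    # cycles will accumulate.
    gc.disable()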
