
[Bug] GeneratorExit possibly causing issues on context detach in OTel finally #441

Open · cretz opened this issue Dec 7, 2023 · 9 comments
Labels: bug (Something isn't working)

cretz (Member) commented Dec 7, 2023

Describe the bug

Reports of:

Failed to detach context

Traceback (most recent call last):
  File "/path/to/temporalio/contrib/opentelemetry.py", line 427, in _top_level_workflow_context
    yield None
  File "/path/to/temporalio/contrib/opentelemetry.py", line 340, in execute_workflow
    return await super().execute_workflow(input)
GeneratorExit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/opentelemetry/context/__init__.py", line 157, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/path/to/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x79340011d590> at 0x7933a4b4c9c0> was created in a different Context

There are outstanding issues like open-telemetry/opentelemetry-python#2606 that may be related.
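For reference, here is a minimal, self-contained illustration of the underlying `ContextVar` behavior that produces the error above. This is not Temporal or OTel code; the variable names are only illustrative, and the OTel `detach` call is simulated by resetting the token from a different `Context`:

```python
# Illustrative repro: a ContextVar reset token is only valid in the Context
# object it was created in. Resetting it from another Context raises the same
# ValueError seen in the traceback above.
import contextvars

current_context = contextvars.ContextVar("current_context", default={})

# "Attach": set the variable and keep the token, like opentelemetry.context.attach.
token = current_context.set({"span": "workflow"})

# Simulate the finally block running in a different Context, as happens when
# GeneratorExit is delivered by the garbage collector on another thread/task.
other_context = contextvars.copy_context()
try:
    other_context.run(current_context.reset, token)
except ValueError as err:
    print("detach failed:", err)  # "... was created in a different Context"
```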

cretz added the bug label on Dec 7, 2023
jpcarlino commented
Is there any workaround for this? We have tons of these errors in our logs, forcing us to disable tracing in our workers.

cretz (Member, Author) commented Apr 19, 2024

I am not aware of an obvious one (see the linked upstream ticket). On our side, we may be able to make sure that everywhere we detach a context we skip the detach when we are unwinding due to a GeneratorExit.
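A rough sketch of that idea (not the SDK's actual change; `safe_attach` is a hypothetical helper), assuming a generator-based context manager around `opentelemetry.context.attach`/`detach`:

```python
from contextlib import contextmanager

from opentelemetry import context as otel_context


@contextmanager
def safe_attach(ctx):
    # Hypothetical helper: attach an OTel context, but skip the detach when the
    # generator is torn down via GeneratorExit (e.g. GC'd from another thread),
    # where resetting the token would fail anyway.
    token = otel_context.attach(ctx)
    try:
        yield
    except GeneratorExit:
        # Deliberately do not detach; the token may belong to another Context.
        raise
    except BaseException:
        otel_context.detach(token)
        raise
    else:
        otel_context.detach(token)
```

Normal completion and real exceptions still detach as before; only the GC-driven teardown path skips it.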

However, we have found that GeneratorExit causes other problems as well, so we have made a fairly substantial change to workflow teardown in #499 that should prevent GeneratorExit from occurring in common workflow use cases (i.e. we try to force tasks to complete before letting the garbage collector take over and raise it). This will come out in the next release, but if you'd like, you can build main now and test it. Your errors may go away.

I wouldn't be surprised if this is the cause of some other OpenTelemetry users' issues too. GeneratorExit can be raised in a completely different thread than the one the async code originally ran in, by which point all contextual information may be lost. It's a very dangerous thing Python does here (waking up tasks on GC in any thread).

benliddicott commented Apr 21, 2024

@cretz As I commented on #2606, I was able to work around a similar issue. Feel free to contact me if you are interested in more details:
open-telemetry/opentelemetry-python#2606 (comment)

I suspect that if the OTel teardown_request hook "opentelemetry-flask.activation_key" could somehow know that it was not in the correct context (e.g. on the wrong thread, or in a different async context) and therefore decide to do nothing, this might resolve the issue.
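One way to approximate "do nothing when not in the correct context" is to remember which thread attached the context and skip the detach anywhere else. This is only a sketch of the idea, not the actual Flask instrumentation hook; the class and method names are hypothetical:

```python
import threading

from opentelemetry import context as otel_context


class GuardedActivation:
    """Hypothetical sketch: only detach from the thread that attached."""

    def __init__(self, ctx):
        self._ctx = ctx
        self._token = None
        self._thread_id = None

    def activate(self):
        self._token = otel_context.attach(self._ctx)
        self._thread_id = threading.get_ident()

    def deactivate(self):
        # If teardown runs on a different thread, the token cannot belong to
        # this thread's Context, so do nothing instead of logging an error.
        if self._token is not None and threading.get_ident() == self._thread_id:
            otel_context.detach(self._token)
        self._token = None
```

Note this heuristic does not catch the case where teardown runs on the same thread but in a different async Context.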

matthewgrossman commented Apr 22, 2024

Hey @benliddicott, I've been following this issue for Python OTel and put out a Flask-specific fix a while back: open-telemetry/opentelemetry-python-contrib#1692. Since you're still running into issues recently, I'm guessing that didn't resolve it for you (unless you're on an old version?).

Maybe the changes / PR description could be helpful for debugging your issue?

cretz (Member, Author) commented Apr 22, 2024

For the Temporal SDK in this GitHub issue, we don't even use Flask. The GC can wake up the coroutine in any thread, including the one it originally started in, which may currently be used for something else by the thread pool. There's not much you can do but avoid letting coroutines/generators get GC'd at all (i.e. cancel them and wait for graceful completion).
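A minimal sketch of the "cancel and wait for graceful completion" idea in plain asyncio (not the actual #499 change; the function name is illustrative):

```python
import asyncio


async def drain_pending_tasks() -> None:
    # Cancel and await every remaining task so their try/finally blocks (and
    # any OTel context detach) run now, on this loop's thread, rather than
    # later when the garbage collector closes the abandoned coroutines.
    pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
```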

matthewgrossman commented Apr 22, 2024

Yeah, sorry about the Flask noise; it's probably better to post what I did back in open-telemetry/opentelemetry-python#2606.

I'm now on a new project where we're running into the same error, and it's more related to the generators/coroutines you are mentioning; that's why I'm also following this thread.

cretz (Member, Author) commented Apr 22, 2024

No problem. Yeah, I flat out could not find a good solution besides never reaching GeneratorExit (or coroutine GC) in the first place. There's no way to disable this behavior, no way to intercept it, and with threading (especially thread pools) no guarantee of which thread the coroutine may wake up in.

So that's how we solved it: just making sure all tasks complete (leaving this issue open until we release the fix).

benliddicott commented Apr 22, 2024

@matthewgrossman I am not having issues, since I have a workaround as described in open-telemetry/opentelemetry-python#2606 (comment), so there is no urgency on my side.

@cretz I haven't tried to find out whether my workaround is still necessary in the latest release; all I can say is that the version we are using hasn't broken the workaround.

hellais commented Apr 26, 2024

@benliddicott would you be available to share the code you used as a solution to this? I am running into the same issue as well.
