Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent IdleInTransactionSessionTimeout #28

Open
gavinmh opened this issue Dec 4, 2020 · 5 comments
Open

Intermittent IdleInTransactionSessionTimeout #28

gavinmh opened this issue Dec 4, 2020 · 5 comments

Comments

@gavinmh
Copy link

gavinmh commented Dec 4, 2020

Thanks for this useful package. Intermittently, my applications encounter the following exception:

IdleInTransactionSessionTimeout: terminating connection due to idle-in-transaction timeout
SSL connection has been closed unexpectedly

  File "sqlalchemy/engine/base.py", line 1245, in _execute_context
    self.dialect.do_execute(
  File "sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
  File "ddtrace/contrib/dbapi/__init__.py", line 99, in execute
    return self._trace_method(self.__wrapped__.execute, self._self_datadog_name, query, {}, query, *args, **kwargs)
  File "ddtrace/contrib/psycopg/patch.py", line 56, in _trace_method
    return super(Psycopg2TracedCursor, self)._trace_method(method, name, resource, extra_tags, *args, **kwargs)
  File "ddtrace/contrib/dbapi/__init__.py", line 69, in _trace_method
    return method(*args, **kwargs)
InternalError: (psycopg2.errors.IdleInTransactionSessionTimeout) terminating connection due to idle-in-transaction timeout
SSL connection has been closed unexpectedly

[SQL: set lock_timeout = %(timeout)s]
[parameters: {'timeout': 10000}]
(Background on this error at: http://sqlalche.me/e/2j85)
  File "flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "ddtrace/contrib/flask/helpers.py", line 22, in wrapper
    return func(pin, wrapped, instance, args, kwargs)
  File "ddtrace/contrib/flask/helpers.py", line 31, in wrapper
    return wrapped(*args, **kwargs)
  File "flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "flask/_compat.py", line 39, in reraise
    raise value
  File "flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "ddtrace/contrib/flask/helpers.py", line 22, in wrapper
    return func(pin, wrapped, instance, args, kwargs)
  File "ddtrace/contrib/flask/patch.py", line 474, in _traced_request
    return wrapped(*args, **kwargs)
  File "flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "ddtrace/contrib/flask/wrappers.py", line 25, in trace_func
    return wrapped(*args, **kwargs)
  File "connexion/decorators/decorator.py", line 48, in wrapper
    response = function(request)
  File "connexion/decorators/uri_parsing.py", line 144, in wrapper
    response = function(request)
  File "connexion/decorators/validation.py", line 384, in wrapper
    return function(request)
  File "connexion/decorators/parameter.py", line 121, in wrapper
    return function(**kwargs)
  File "src/api.py", line 179, in get_premises
    "location_risks": _get_risks(id, cached_only)['risks'],
  File "src/api.py", line 52, in _get_risks
    with lock:
  File "pals/core.py", line 132, in __enter__
    if not self.acquire():
  File "pals/core.py", line 101, in acquire
    self.conn.execute(timeout_sql, timeout=acquire_timeout)
  File "sqlalchemy/engine/base.py", line 982, in execute
    return meth(self, multiparams, params)
  File "sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "sqlalchemy/engine/base.py", line 1095, in _execute_clauseelement
    ret = self._execute_context(
  File "sqlalchemy/engine/base.py", line 1249, in _execute_context
    self._handle_dbapi_exception(
  File "sqlalchemy/engine/base.py", line 1476, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "sqlalchemy/engine/base.py", line 1245, in _execute_context
    self.dialect.do_execute(
  File "sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
  File "ddtrace/contrib/dbapi/__init__.py", line 99, in execute
    return self._trace_method(self.__wrapped__.execute, self._self_datadog_name, query, {}, query, *args, **kwargs)
  File "ddtrace/contrib/psycopg/patch.py", line 56, in _trace_method
    return super(Psycopg2TracedCursor, self)._trace_method(method, name, resource, extra_tags, *args, **kwargs)
  File "ddtrace/contrib/dbapi/__init__.py", line 69, in _trace_method
    return method(*args, **kwargs)

Are connections being left in the "idle in transaction" state? Can these transactions be committed or rolled back?

@rsyring
Copy link
Member

rsyring commented Dec 4, 2020

My guess is that there is probably a PostgreSQL setting at the database or connection level that is limiting how long a connection can remain open. And, that limit is lower than the lock expiration time you are setting.

Can you do some additional research and confirm or refute that guess?

@gavinmh
Copy link
Author

gavinmh commented Dec 4, 2020

Thanks for the quick response. The database's idle_in_transaction_session_timeout is 1d. I am letting pals use the default create_engine settings.

that limit is lower than the lock expiration time you are setting.

I don't understand this; I don't know how to set a lock expiration time. I do use the default Lock acquire timeout, -1.

@rsyring
Copy link
Member

rsyring commented Dec 4, 2020

I don't understand this; I don't know how to set a lock expiration time. I do use the default Lock acquire timeout, -1.

Sorry, I was actually thinking of the aquire timeout, which probably isn't affected here.

The idle_in_transaction_session_timeout value can be set on a user or connection basis. Just make sure you don't have anything else setting it to a value lower than you would expect.

Is it possible that sometimes the code that is being protected by the lock is pausing/sleeping/waiting for longer than one day?

I don't think I have any obvious answers for you. And I don't actually know the internals of postgresql that well or how the idle_in_transaction_session_timeout would interact with the pg_advisory_lock calls that we are making. On the face of it, it surprises me that a transaction would even be active. But that could actually be a side effect of using pyscopg2, see their docs on autocommit.

Perhaps PALs should be setting autocommit=True on the connection pool we are creating.

@rsyring
Copy link
Member

rsyring commented Jun 19, 2022

@gavinmh I realize this thread is pretty old but I'm trying to track down the same problem now. A few things:

  • I'm also seeing this occur in an app that is using datadog's ddtrace.
  • My exception is happening in a celery worker where we have forked the engine and have not disposed of the connections in the underlying pool as we should have. Could that be happening in your case?
  • Did you happen to find a solution?

1 similar comment
@rsyring
Copy link
Member

rsyring commented Jun 19, 2022

@gavinmh I realize this thread is pretty old but I'm trying to track down the same problem now. A few things:

  • I'm also seeing this occur in an app that is using datadog's ddtrace.
  • My exception is happening in a celery worker where we have forked the engine and have not disposed of the connections in the underlying pool as we should have. Could that be happening in your case?
  • Did you happen to find a solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants