Simultaneous access issues - Deadlock found when trying to get lock; try restarting transaction #608

Open
guillaume-perreal opened this issue Jun 2, 2023 · 4 comments
Labels: help wanted, New (new issue that needs to be evaluated)

Comments

@guillaume-perreal

Describe the bug

I am using Ansible to configure targets that are only accessible through passhport,
using the same user for all targets.

When accessing one target at a time, everything works fine.

However, when accessing several targets in parallel, connections start to fail randomly. I have tracked the error down to this message in the passhportd logs:

[Thu Jun 01 15:35:26.896602 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534] [2023-06-01 15:35:26,895] ERROR in app: Exception on /user/accessible_idtargets/guillaume.perreal@inrae.fr [GET]
[Thu Jun 01 15:35:26.896627 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534] Traceback (most recent call last):
[Thu Jun 01 15:35:26.896631 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
[Thu Jun 01 15:35:26.896635 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     cursor, statement, parameters, context
[Thu Jun 01 15:35:26.896637 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
[Thu Jun 01 15:35:26.896641 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     cursor.execute(statement, parameters)
[Thu Jun 01 15:35:26.896645 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/cursors.py", line 148, in execute
[Thu Jun 01 15:35:26.896650 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     result = self._query(query)
[Thu Jun 01 15:35:26.896654 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/cursors.py", line 310, in _query
[Thu Jun 01 15:35:26.896658 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     conn.query(q)
[Thu Jun 01 15:35:26.896662 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/connections.py", line 548, in query
[Thu Jun 01 15:35:26.896665 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
[Thu Jun 01 15:35:26.896668 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/connections.py", line 775, in _read_query_result
[Thu Jun 01 15:35:26.896671 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     result.read()
[Thu Jun 01 15:35:26.896673 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/connections.py", line 1156, in read
[Thu Jun 01 15:35:26.896676 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     first_packet = self.connection._read_packet()
[Thu Jun 01 15:35:26.896679 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/connections.py", line 725, in _read_packet
[Thu Jun 01 15:35:26.896682 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     packet.raise_for_error()
[Thu Jun 01 15:35:26.896685 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/protocol.py", line 221, in raise_for_error
[Thu Jun 01 15:35:26.896688 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     err.raise_mysql_exception(self._data)
[Thu Jun 01 15:35:26.896690 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]   File "/home/passhport/passhport-run-env/lib/python3.7/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
[Thu Jun 01 15:35:26.896707 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534]     raise errorclass(errno, errval)
[Thu Jun 01 15:35:26.896710 2023] [wsgi:error] [pid 28845] [remote 127.0.0.1:36534] pymysql.err.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

According to what I have found (https://stackoverflow.com/questions/2332768/how-to-avoid-mysql-deadlock-found-when-trying-to-get-lock-try-restarting-trans), this might be related to the order in which rows are locked in MySQL. I think multiple instances of passhportd try to access the same rows of the database in parallel and it somehow fails. This might also be related to the SQL transaction isolation level, as IIRC there are different isolation levels, depending on the operations planned on the rows (read, write, etc.).
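The MySQL error message itself suggests restarting the transaction, so one possible mitigation is to retry the failing operation when error 1213 is raised. The following is only a minimal sketch of that idea, not passhport's actual code: `is_deadlock`, `run_with_deadlock_retry`, and their parameters are hypothetical names introduced for illustration. It assumes that PyMySQL stores `(errno, message)` in the exception's `args` (as the traceback above shows) and that SQLAlchemy exposes the wrapped driver error on `.orig`.

```python
import random
import time

MYSQL_DEADLOCK_ERRNO = 1213  # error code shown in the traceback above


def is_deadlock(exc):
    """Return True if exc looks like a MySQL deadlock error.

    SQLAlchemy wraps driver errors and exposes the original one on `.orig`;
    PyMySQL stores (errno, message) in `.args`.
    """
    original = getattr(exc, "orig", None) or exc
    args = getattr(original, "args", ())
    return bool(args) and args[0] == MYSQL_DEADLOCK_ERRNO


def run_with_deadlock_retry(operation, rollback=None, attempts=3, base_delay=0.05):
    """Call operation(); on a deadlock, roll back, wait briefly, and retry.

    `operation` and `rollback` are caller-supplied callables (hypothetical
    hooks, e.g. a query function and the session's rollback method).
    Any non-deadlock error, or a deadlock on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_deadlock(exc) or attempt == attempts:
                raise
            if rollback is not None:
                rollback()
            # Short, jittered pause so competing transactions do not collide again.
            time.sleep(base_delay * attempt + random.uniform(0, base_delay))
```

With a SQLAlchemy session, a call might look like `run_with_deadlock_retry(lambda: do_query(session), rollback=session.rollback)`, where `do_query` stands for whatever function issues the statements. Retrying only hides the symptom; whether it is appropriate here depends on why the endpoint takes conflicting locks in the first place.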

To Reproduce

  1. Set up passhport with multiple targets.
  2. Try accessing several targets with the same user in parallel (using some kind of tool).

Expected behavior

All targets can be accessed simultaneously without errors.

guillaume-perreal added the New label Jun 2, 2023
@elg
Contributor

elg commented Jun 2, 2023

Hi,

Thanks for this feedback. I've never experienced that, even though I agree that Ansible is apparently a pain to use with passhport.

Since I don't have an easy way to reproduce this, can you provide a tool that contacts multiple targets at the same time, or describe step by step how to achieve that with a fresh Ansible install? Thanks.

@guillaume-perreal
Author

guillaume-perreal commented Jun 2, 2023

I can reproduce it with the following bash script. Ansible is not even necessary.

#!/usr/bin/env bash
set -exuo pipefail

# Usage: hammer.sh passhport@bastion.example.com target-1 target-2 target-3 ...

BASTION="$1"
shift

for TARGET in "$@"; do
    ssh -xT -o "BatchMode=yes" -o "ControlMaster=no" "$BASTION" "$TARGET" hostname &
done

wait

Errors start to happen with 3 targets or more.
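For anyone who wants per-target results, the same reproduction can be sketched in Python. In the bash script, a bare `wait` returns status zero regardless of how the background jobs ended, so failed connections only show up in the ssh output. The sketch below is an assumption-laden illustration, not part of passhport: `build_ssh_command` and `run_parallel` are hypothetical helper names, and the ssh options are copied from the script above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(bastion, target):
    """Build the same ssh invocation as the bash script above."""
    return ["ssh", "-xT", "-o", "BatchMode=yes", "-o", "ControlMaster=no",
            bastion, target, "hostname"]


def run_parallel(commands):
    """Run all commands concurrently and return their exit codes in order."""
    def run(command):
        completed = subprocess.run(command,
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
        return completed.returncode

    # One worker per command so that all connections start at about the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run, commands))


def hammer(bastion, targets):
    """Connect to every target in parallel; return {target: exit code}."""
    codes = run_parallel([build_ssh_command(bastion, t) for t in targets])
    return dict(zip(targets, codes))
```

Calling `hammer("passhport@bastion.example.com", ["target-1", "target-2", "target-3"])` returns a dictionary in which any non-zero value marks a failed connection, which makes it easier to count how many parallel accesses fail per run.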

@Raphux
Contributor

Raphux commented Jun 2, 2023

@guillaume-perreal, do you use the Apache server instead of the embedded server?
http://docs.passhport.org/en/latest/installation-and-configuration/apache-wsgi-for-production.html
Parallelization won't work with the embedded server; try the WSGI setup described at the link above.

@guillaume-perreal
Author

Yes, we are using Apache with the WSGI interface.
