
Connection management improvements for multi-database setup #7244

Open
ivyazmitinov opened this issue Oct 2, 2023 · 5 comments · May be fixed by #7286

@ivyazmitinov
Contributor

This issue is more of an invitation to a technical discussion than a solid proposal.

Currently, the main issue with having multiple DATABASEs in a cluster is connection management, which is not adapted to this scenario. The issue can be split into two parts:

  1. The most problematic areas are the transaction recovery and distributed deadlock detection mechanisms, which do not respect citus.max_shared_pool_size and open at least n * d connections per worker, where n is the number of nodes in the cluster and d is the number of DATABASEs.
  2. Even if they respected citus.max_shared_pool_size, this limit is applied per DATABASE, meaning that n * d connections would still be required.

In order to overcome this, I propose to:

  1. Make citus.max_shared_pool_size cluster-wide. A per-database setting may then be used to set a connection quota for a database within the global citus.max_shared_pool_size.
  2. Make the transaction recovery and distributed deadlock detection respect the improved citus.max_shared_pool_size.

Since those changes make sense mostly for a multi-database setup, they may be enabled by a single GUC, like citus.multi_database, or by separate GUCs for the behaviour of citus.max_shared_pool_size, transaction recovery, and distributed deadlock detection.
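
A rough sketch of how the proposed configuration could look, assuming the semantics above (citus.max_shared_pool_size is an existing GUC; citus.multi_database, the per-database quota, and the analytics database below are hypothetical names used only for illustration):

```sql
-- Existing GUC, proposed to become a cluster-wide limit (rather than per-database)
-- on connections opened from this node to each worker.
ALTER SYSTEM SET citus.max_shared_pool_size = 300;

-- Hypothetical switch enabling the multi-database behaviour described above.
ALTER SYSTEM SET citus.multi_database = on;

-- Hypothetical per-database quota carved out of the global pool.
ALTER DATABASE analytics SET citus.max_shared_pool_size_per_database = 50;

-- Reload so the ALTER SYSTEM settings take effect (assuming SIGHUP GUCs).
SELECT pg_reload_conf();
```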

@marcoslot
Collaborator

marcoslot commented Oct 2, 2023

Make citus.max_shared_pool_size cluster-wide. A per-database setting may then be used to set a connection quota for a database within the global citus.max_shared_pool_size

Seems reasonable, not sure why we decided to include database in the key.

Make the transaction recovery and distributed deadlock detection respect the improved citus.max_shared_pool_size.

I think as a short-term fix they probably should not count towards max_shared_pool_size, since there is not much to be gained from it.

The longer term fix is to avoid a persistent connection per database. Connections from the maintenance daemon are cached because of how frequently the deadlock detector asks for lock graphs. However, we don't actually need a deadlock detector per database. Just one is enough.

The other (database-specific) operations in the maintenance daemon are much less frequent and could afford to open a new connection every time, and either honour max_shared_pool_size (block if none available) or have a separate shared pool.
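
To put rough numbers on "frequently" versus "much less frequent", here is a sketch using existing GUCs; the intervals mentioned in the comments reflect my understanding of the usual defaults and may differ in a given installation:

```sql
-- Distributed deadlock detection runs roughly every
-- citus.distributed_deadlock_detection_factor * deadlock_timeout
-- (on the order of every few seconds with typical defaults),
-- which is why its worker connections are worth caching.
SHOW citus.distributed_deadlock_detection_factor;
SHOW deadlock_timeout;

-- Transaction recovery runs every citus.recover_2pc_interval
-- (on the order of a minute by default), so opening a fresh
-- connection each time would be comparatively cheap.
SHOW citus.recover_2pc_interval;
```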

Thanks for sharing your feedback on this!

@ivyazmitinov
Contributor Author

not sure why we decided to include database in the key

According to the comment, it is there specifically to support the per-database setting 🙂

The longer term fix is to avoid a persistent connection per database. Connections from the maintenance daemon are cached because of how frequently the deadlock detector asks for lock graphs. However, we don't actually need a deadlock detector per database. Just one is enough.
The other (database-specific) operations in the maintenance daemon are much less frequent and could afford to open a new connection every time, and either honour max_shared_pool_size (block if none available) or have a separate shared pool.

Agreed, this improvement looks great! I was more focused on limiting the number of connections to workers than on their caching, but removing this unnecessary caching will also be beneficial.

either honour max_shared_pool_size (block if none available) or have a separate shared pool.

I would go with a separate shared pool, but introduce an option to configure it as a percentage of max_shared_pool_size, like max_shared_pool_maintenance_percent, dedicated to maintenance operations only. This way, all operations will be bound by the actual number of connection slots available, but maintenance operations won't compete with user queries.
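
To make the arithmetic concrete (max_shared_pool_maintenance_percent is the hypothetical GUC proposed above, not an existing setting; the numbers are made up):

```sql
-- Existing GUC: at most 100 connections from this node to each worker.
ALTER SYSTEM SET citus.max_shared_pool_size = 100;

-- Hypothetical GUC from this proposal: reserve 10% of the pool, i.e. 10 slots,
-- for maintenance operations (transaction recovery, deadlock detection, ...),
-- leaving 90 slots for user queries.
ALTER SYSTEM SET citus.max_shared_pool_maintenance_percent = 10;
```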

Thanks for sharing your feedback on this!

I am actually up for contributing this (we have suffered from this long enough), so if you approve, I can start right away 🙂

@onderkalaci
Member

either honour max_shared_pool_size (block if none available) or have a separate shared pool.

I would go with a separate shared pool, but introduce an option to configure it as a percentage of max_shared_pool_size, like max_shared_pool_maintenance_percent, dedicated to maintenance operations only. This way, all operations will be bound by the actual number of connection slots available, but maintenance operations won't compete with user queries.

The main reason we implemented citus.max_shared_pool_size is to prevent the coordinator from establishing more connections than any worker can accept (i.e., keep citus.max_shared_pool_size < the worker's max_connections).
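
For illustration (both settings shown are existing GUCs; the numbers are made up), if each worker allows 500 connections in total, the coordinator's shared pool should stay below that:

```sql
-- On each worker: the hard limit on incoming connections (changing it requires a restart).
ALTER SYSTEM SET max_connections = 500;

-- On the coordinator: keep the Citus shared pool below the workers' max_connections,
-- leaving headroom for superuser, replication, and non-Citus connections.
ALTER SYSTEM SET citus.max_shared_pool_size = 400;
SELECT pg_reload_conf();
```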

I think making citus.max_shared_pool_size global (i.e., not per database) sounds useful and aligns with why we added it. One important caveat here is that citus.max_shared_pool_size cannot, and should not, be settable per database, otherwise things might become complicated. As far as I know, SIGHUP GUCs, such as max_shared_pool_size, cannot be set per database.

When it comes to a separate pool vs some percentage in the pool, I'm very much in favor of the latter. The former could be very hard to implement.

I think even the latter is non-trivial to implement; there are a bunch of areas that need to be thought through carefully, such as making sure that this is not broken: https://github.com/citusdata/citus/blob/main/src/backend/distributed/connection/locally_reserved_shared_connections.c or https://github.com/citusdata/citus/blob/main/src/backend/distributed/executor/adaptive_executor.c#L4707

So, please make sure we have good test coverage involving these areas as well as base scenarios with multiple databases.

The longer term fix is to avoid a persistent connection per database

Also, see #3671 as a reference.

I am actually up for contributing this (we have suffered from this long enough), so if you approve, I can start right away 🙂

Great!

@ivyazmitinov
Contributor Author

Thanks for the useful insights, @onderkalaci!

Before I start, I have one question regarding the deadlock detector:

However, we don't actually need a deadlock detector per database. Just one is enough

I assume the way to solve this is to have one "management" database specifically for such cluster-wide procedures, and that this is what is currently being implemented in the pool_2pc branch?

@marcoslot
Collaborator

I assume the way to solve this is to have one "management" database specifically for such cluster-wide procedures, and that this is what is currently being implemented in the pool_2pc branch?

Yes, I think so. The goal of that infrastructure is to be able to sync and track role & database-related commands across the cluster by funneling them all through the management database (even from non-Citus databases). We could then also decide to run the deadlock detector only in the management database.

ivyazmitinov linked a pull request (#7286) on Oct 27, 2023 that will close this issue