
7244/multi db connection improvements #7286

Status: Open. ivyazmitinov wants to merge 28 commits into main from 7244/multi_db_connection_improvements.

Conversation

@ivyazmitinov (Contributor) commented Oct 27, 2023

This PR introduces the following changes:

  1. Makes citus.max_shared_pool_size cluster-wide. Introduces a new citus.max_databases_per_worker_tracked GUC to adapt shared_connection_stats to this change.
  2. Introduces new "maintenance" connections and a dedicated connection pool for them. The new pool is configured by the new citus.max_maintenance_shared_pool_size GUC, which defaults to 5.
  3. Makes transaction recovery and distributed deadlock detection respect citus.max_shared_pool_size.
  4. Adds a citus.shared_pool_size_maintenance_quota GUC to reserve a fraction of citus.max_shared_pool_size exclusively for maintenance operations. So, by default, regular and maintenance operations have isolated fractions of the pool and do not compete with each other (see the sketch below).
  5. Adds the possibility to designate one database as dedicated to maintenance via the new citus.maintenance_management_database GUC. I'll integrate it later with #7254 if it is merged.
  6. Disables connection caching for all maintenance backends, except the one for citus.maintenance_management_database.
  7. Makes distributed deadlock detection run only on citus.maintenance_management_database, when it is set.

When citus.maintenance_management_database is not set (the default), all maintenance daemons perform distributed deadlock detection, but they never cache connections, and they respect citus.shared_pool_size_maintenance_quota or citus.max_shared_pool_size.
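For illustration, here is a minimal sketch, under assumed names (ComputePoolBudgets and its parameters are invented for this example and are not part of Citus), of how the quota could translate citus.max_shared_pool_size into two isolated connection budgets:

/*
 * Hypothetical sketch only: split citus.max_shared_pool_size into a
 * maintenance budget and a regular budget based on a quota in [0, 1].
 */
static void
ComputePoolBudgets(int maxSharedPoolSize, double maintenanceQuota,
				   int *regularBudget, int *maintenanceBudget)
{
	/* reserve a fraction of the shared pool exclusively for maintenance */
	*maintenanceBudget = (int) (maxSharedPoolSize * maintenanceQuota);

	/* the remainder is available to regular (client-driven) backends */
	*regularBudget = maxSharedPoolSize - *maintenanceBudget;
}

For example, with citus.max_shared_pool_size = 100 and citus.shared_pool_size_maintenance_quota = 0.1 (values assumed for illustration), maintenance operations would be limited to 10 connections and regular operations to 90.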

Logic related to citus.main_db will be introduced in a separate PR.

This is a draft implementation to check if I am moving in the right direction, so only the base scenario is covered by tests.
I also haven't found an example of a distributed deadlock detection test, so I'd appreciate some advice on this, as well as on how to better address @onderkalaci's concern:

I think even the latter is non-trivial to implement; there are a bunch of areas that need to be thought through, such as making sure that this is not broken: https://github.com/citusdata/citus/blob/main/src/backend/distributed/connection/locally_reserved_shared_connections.c or https://github.com/citusdata/citus/blob/main/src/backend/distributed/executor/adaptive_executor.c#L4707

cc @marcocitus

closes #7244

@marcoslot (Collaborator) commented Nov 1, 2023

Makes transaction recovery and distributed deadlock detection respect citus.max_shared_pool_size

I think this is probably the more controversial / hazardous part of this PR. It is required for stability that max_shared_pool_size >= max_client_connections; otherwise not every client connection can get a slot. We generally recommend max_client_connections == max_shared_pool_size such that you will not make more outbound connections per worker than you receive in inbound client connections, which means there is no slack in max_shared_pool_size in most setups.

If we change the semantics of max_shared_pool_size, we might risk a lot of setups becoming unstable unless they set max_shared_pool_size >= max_client_connections + # reserved maintenance connections

Adds the possibility to designate one database as dedicated to maintenance via the new citus.maintenance_management_database GUC. I'll integrate it later with #7254 if it is merged
Disables connection caching for all maintenance backends, except the one for citus.maintenance_management_database
Makes distributed deadlock detection run only on citus.maintenance_management_database, when it is set.

Could this part be a separate PR? It probably doesn't require much discussion and #7254 is merged now.

@ivyazmitinov (Contributor, Author) commented Nov 1, 2023

If we change the semantics of max_shared_pool_size, we might risk a lot of setups becoming unstable unless they set max_shared_pool_size >= max_client_connections + # reserved maintenance connections

I see your point, but it is still a more predictable way to manage a cluster, I think. Currently, DBAs just have to keep in mind that there are some operations that require additional connections, so max_connections is typically slightly larger than max_shared_pool_size + max_client_connections. With this PR, the "slightly larger" part just becomes "official": configurable and isolated into its own part of max_shared_pool_size.
And, of course, it is always possible to implement a backup feature flag 🙂

Could this part be a separate PR?

I'd have to move those points as well, then:

  • Disables connection caching for all maintenance backends, except the one for citus.maintenance_management_database
  • Makes distributed deadlock detection run only on citus.maintenance_management_database, when it is set.

If it is okay, then no problem 🙂

@marcoslot (Collaborator):

I see your point, but it is still a more predictable way to manage a cluster,

Perhaps, if we were starting from scratch. In practice, due to a large user / customer base, and platforms that need to support a large number of version combinations, we try to avoid breaking backwards compatibility and/or introducing custom upgrade steps whenever possible.

An alternative option could be to have 2 separate pool settings, e.g. max_maintenance_pool_size & max_shared_pool_size. The former could default to some small number (e.g. 5), to keep the total number of outbound connections predictable.

If it is okay, then no problem 🙂

Yup

@ivyazmitinov (Contributor, Author):

Sounds reasonable, and shouldn't be too hard to implement...
Ok, I'll be back with that and a new PR for main_db then 🙂

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from 06cccd8 to 2e1e780 on November 28, 2023 11:16
@ivyazmitinov (Contributor, Author):

@marcoslot, done, now it is a separate pool 🙂

	int currentConnectionsCount;
	if (maintenanceConnection)
	{
		currentConnectionsLimit = GetMaxMaintenanceSharedPoolSize();

Contributor (Author):

@marcoslot, I'm not sure about this point: the current implementation uses a common MaxMaintenanceSharedPoolSize for both remote and local connections, because it is simpler. Is it necessary to introduce a separate limit for local maintenance connections?
Another effect of this is that when GetLocalSharedPoolSize() == DISABLE_REMOTE_CONNECTIONS_FOR_LOCAL_QUERIES, remote connections are disabled for both maintenance and regular connections.

@ivyazmitinov (Contributor, Author):

@onderkalaci, kind reminder about this PR 🙂
I noticed that @marcoslot is at Crunchy Data now, so I wanted to make sure that the PR won't be abandoned due to organizational changes.

@ivyazmitinov (Contributor, Author):

Well, that's awkward...
@JelteF, maybe you could help :) ?

@JelteF (Contributor) commented Jan 11, 2024

I'll try to take a look at this soon, thank you for your work on this.

@JelteF marked this pull request as ready for review on January 18, 2024 10:59
@JelteF (Contributor) commented Jan 18, 2024

Could you run citus_indent on your PR? It's hard for me to see what code is actually changed because some tabs have been changed to spaces. Instructions to run it can be found here: https://github.com/citusdata/citus/blob/main/CONTRIBUTING.md#following-our-coding-conventions

@JelteF (Contributor) commented Jan 18, 2024

FYI I pushed a commit to your branch to reindent it.

@JelteF (Contributor) left a comment:

Overall I quite like the idea and implementation. The build is failing though, and thus the tests are not actually run in CI. So I don't know if this completely breaks some tests.

I left a few small comments now. Once the tests are passing I'll take another close look.



PG_FUNCTION_INFO_V1(citus_remote_connection_stats);

Contributor: unnecessary diff

Comment on lines -72 to -75:

/*
 * Given that citus.shared_max_pool_size can be defined per database, we
 * should keep track of shared connections per database.
 */

@JelteF (Contributor), Jan 29, 2024:

Removing this comment indeed seems like the correct thing: based on this comment, throttling was originally implemented per database on purpose, but I don't think that makes much sense. max_connections is at the server level, and citus.max_shared_pool_size is PGC_SIGHUP, so it cannot be set at the database level.
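For context, here is a hedged sketch of what a PGC_SIGHUP GUC registration looks like; the description, default, and bounds are illustrative rather than Citus's actual values, and MaxSharedPoolSize stands in for the backing C variable:

DefineCustomIntVariable(
	"citus.max_shared_pool_size",
	gettext_noop("Illustrative description: maximum shared outbound "
				 "connections per worker."),
	NULL,
	&MaxSharedPoolSize,
	0, -1, INT_MAX,
	PGC_SIGHUP,		/* cluster-wide; reloaded on SIGHUP, not settable per
					 * database or per session */
	GUC_STANDARD,
	NULL, NULL, NULL);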

Comment on lines -287 to -296:

	bool counterIncremented = false;
	SharedConnStatsHashKey connKey;

	strlcpy(connKey.hostname, hostname, MAX_NODE_LENGTH);
	if (strlen(hostname) > MAX_NODE_LENGTH)
	{
		ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
						errmsg("hostname exceeds the maximum length of %d",
							   MAX_NODE_LENGTH)));
	}

Contributor: Why remove this check?

Contributor (Author): It was moved to PrepareWorkerNodeHashKey.

	}
	else

	if (IsLoggableLevel(DEBUG4))

Contributor: nit: this if check shouldn't be necessary; ereport already does something like that internally.

Contributor (Author): I thought it was there to prevent building the expensive errmsg. Maybe it is better to leave it this way just in case, to avoid a performance regression?
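For context, a sketch of the pattern under discussion; the message text, hostname, and port are placeholders rather than the PR's actual code:

/*
 * Guarding with IsLoggableLevel() means the errmsg() arguments are only
 * built when DEBUG4 is actually logged; as the reviewer notes, ereport()
 * itself already short-circuits on the level check.
 */
if (IsLoggableLevel(DEBUG4))
{
	ereport(DEBUG4, (errmsg("maintenance connection to %s:%d is not cached",
							hostname, port)));
}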

Comment on lines 967 to 968:

	uint32 hash = string_hash(entry->workerNodeKey.hostname, NAMEDATALEN);
	hash = hash_combine(hash, hash_uint32(entry->workerNodeKey.port));

Contributor: nit: it would be good to call SharedConnectionHashHash here instead of re-implementing it. Same for the Compare function.
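A sketch of the suggestion, assuming SharedConnectionHashHash keeps the usual HashValueFunc signature used with hash_create (the workerNodeKey field is taken from the hunk above; everything else is an assumption):

/* delegate to the existing hash function instead of re-implementing it */
uint32 hash = SharedConnectionHashHash(&entry->workerNodeKey,
									   sizeof(entry->workerNodeKey));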

@ivyazmitinov (Contributor, Author):

@JelteF Great, thanks! I'll address your comments shortly.
The test failures don't seem to be caused by my changes, right? But the branch should be synced with main anyway; maybe it will help 🤷‍♂️

I am also concerned about the test coverage: as I mentioned, I covered only the basic scenario. Maybe there are some use cases that should also be covered? Especially distributed deadlock detection, which I am not sure how to test with the existing tool set.

@JelteF (Contributor) commented Jan 30, 2024

The build error on PG16 seems to be because you included #include "distributed/connection_management.h" after pg_version_compat.h in maintenanced.c. (This is arguably a bug in how we define have_createdb_privilege in pg_version_compat.h, but it's only being triggered because of your changes.)

@JelteF (Contributor) left a comment:

This at least needs some changes to make the PG16 build and test pass (I think I pressed approve by accident before). General idea and implementation still seem good.

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from 3417599 to 889d5bd on January 30, 2024 21:32
@ivyazmitinov (Contributor, Author):

@JelteF, I fixed the PG16 build and addressed the review comments.

The #define have_createdb_privilege() have_createdb_privilege() in pg_version_compat.h seems to be unnecessary, and it breaks the build if it is included before commands/dbcommands.h. Everything seems to work fine without it.

@ivyazmitinov (Contributor, Author):

I also experienced some flakiness with multi_extension; is this expected?

codecov bot commented Feb 1, 2024

Codecov Report

Attention: Patch coverage is 80.45455%, with 43 lines in your changes missing coverage. Please review.

Project coverage is 82.65%. Comparing base (553d5ba) to head (02e7191).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7286      +/-   ##
==========================================
- Coverage   89.69%   82.65%   -7.04%     
==========================================
  Files         283      283              
  Lines       60506    60581      +75     
  Branches     7538     7541       +3     
==========================================
- Hits        54270    50073    -4197     
- Misses       4081     8068    +3987     
- Partials     2155     2440     +285     

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from bb48581 to 2e5b010 on February 2, 2024 09:33
@ivyazmitinov (Contributor, Author) commented Feb 2, 2024

Tried to fix the check_gucs_are_alphabetically_sorted and got a bit confused: it seems to be failing on citus.enable_repartitioned_insert_select even on main, on my machine. But it somehow passes on your CI 🤔

ivan@ivan-pc:~/IdeaProjects/citus-fork$ ./ci/check_gucs_are_alphabetically_sorted.sh 
sort: gucs.out:40: disorder:            "citus.enable_repartitioned_insert_select",

@JelteF (Contributor) commented Feb 6, 2024

I haven't looked at it in more detail yet, but at least the flaky test detection is failing because this test randomly fails:

+++ /__w/citus/citus/src/test/regress/results/multi_maintenance_multiple_databases.out.modified	2024-02-02 13:40:41.496394020 +0000
@@ -277,21 +277,21 @@
                    (SELECT setting::int FROM pg_settings WHERE name = 'port')),
             $statement$
             SELECT groupid, gid
             FROM pg_dist_transaction
             WHERE gid LIKE 'citus_0_1234_4_0_%'
                 OR gid LIKE 'citus_0_should_be_forgotten_%'
             $statement$) AS t(groupid integer, gid text)
 WHERE datname LIKE 'db%';
  pg_dist_transaction_after_recovery_coordinator_test 
 -----------------------------------------------------
- t
+ f
 (1 row)
 
 SELECT count(*) = 0 AS cached_connections_after_recovery_coordinator_test
 FROM pg_stat_activity
 WHERE state = 'idle'
   AND now() - backend_start > '5 seconds'::interval;
  cached_connections_after_recovery_coordinator_test 
 ----------------------------------------------------

@@ -2021,6 +2033,20 @@ RegisterCitusConfigVariables(void)
GUC_STANDARD,
NULL, NULL, NULL);

DefineCustomIntVariable(
"citus.max_databases_per_worker_tracked",
Contributor:

Tried to fix the check_gucs_are_alphabetically_sorted and got a bit confused: it seems to be failing on citus.enable_repartitioned_insert_select even on main, on my machine. But it somehow passes on your CI 🤔

ivan@ivan-pc:~/IdeaProjects/citus-fork$ ./ci/check_gucs_are_alphabetically_sorted.sh 
sort: gucs.out:40: disorder:            "citus.enable_repartitioned_insert_select",

Looking at your changes, both new GUCs that you're introducing are not in the correct place alphabetically (both need to be moved up). I expect that if you fix their location in the list, the error will go away.

I agree that it's weird that it reports citus.enable_repartitioned_insert_select though. Sounds like a reporting bug. @onurctirtir do you know more about this? (you changed stuff about this script recently)

Contributor (Author):

OK, I figured it out: I didn't have LC_COLLATE=C set, so sort on my machine treated the underscore character differently. The easiest fix is to force it explicitly for the sort in check_gucs_are_alphabetically_sorted.sh:

grep -P "^[\t][\t]\"citus\.[a-zA-Z_0-9]+\"" RegisterCitusConfigVariables_func_def.out > gucs.out
LC_COLLATE=C sort -c gucs.out

Contributor: Ah, interesting. Could you make a separate PR to fix that?

Contributor (Author):

Certainly 🙂
But I am not sure that the easiest fix would be the best one: maybe it's better to force it via ./configure or something like that, so that every dev setup is as close to CI as possible.

Contributor:

Well, in test runs we already set collation etc. to the expected settings. But these are simple shell scripts where we don't have that infrastructure. Normally that doesn't matter, but for sorting it does.

@JelteF (Contributor), Feb 7, 2024:

So, to clarify: if you open a PR where you only add LC_COLLATE=C to the sort command, I'd be happy to accept that. (If not, I'll probably do it myself in the near future, if I don't forget.)

Contributor (Author):

Yes, I'll add it tomorrow, don't worry :)
Got distracted trying to reproduce the flakiness :)

@ivyazmitinov (Contributor, Author):

@JelteF, is there a way I can download the results of the failed tests?

I tried to reproduce it locally, but with no luck. My tests take twice as long, though, so my theory is that it is reproducible on weaker machines.

@JelteF (Contributor) commented Feb 7, 2024

Yes if you go to the overview and scroll to the bottom you will see all the artifacts produced and also the diff of the failures inline: https://github.com/citusdata/citus/actions/runs/7812230917/attempts/2#summary-21309102210

@JelteF (Contributor) commented Feb 7, 2024

You might be able to reproduce the flaky tests locally by running the test in question multiple times (this runs it 200 times):

src/test/regress/citus_tests/run_test.py multi_maintenance_multiple_databases  --repeat 200

@ivyazmitinov (Contributor, Author):

src/test/regress/citus_tests/run_test.py multi_maintenance_multiple_databases --repeat 200

Yep, that's what I tried :)
It seems it requires something more sophisticated...

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from 1901be2 to a48f274 on March 6, 2024 12:19
@ivyazmitinov (Contributor, Author):

@JelteF, reminder about this PR, just in case 🙂

@JelteF (Contributor) commented Mar 21, 2024

I'll try to make some time soon to do a full review of this; I think it's pretty close to being mergeable.

For now, could you look at the check-multi tests? They are timing out in CI.

@ivyazmitinov (Contributor, Author):

I think it's pretty close to being mergeable.

I would still like to add more test coverage, especially for deadlock detection, but I'm not sure how to do this with the existing toolkit.
The scenario I have added covers only the basic flow.

@JelteF (Contributor) commented Mar 21, 2024

but I'm not sure how to do this with the existing toolkit.

I think the Python test framework that we have would be a good fit for that.
The readme and examples can be found here: https://github.com/citusdata/citus/tree/main/src/test/regress/citus_tests/test

@ivyazmitinov (Contributor, Author):

For now, could you look at the check-multi tests. Because they are timing out in CI.

I'm not sure when I'll be able to get back to this, so I'll just leave a note here: the tests are timing out because of a low citus.max_maintenance_shared_pool_size. Some operations require more than one connection to complete, especially with the new MainDb logic, which requires an extra maintenance connection for a SELECT citus_internal.commit_management_command_2pc() call.
The proper fix would be to implement some API to reserve the necessary number of connections from the maintenance pool and use it for all maintenance operations, but that doesn't seem feasible. Instead, it is better to introduce a simple timeout for acquiring a connection from the maintenance pool, with a meaningful error message: this way a user will be able to see that citus.max_maintenance_shared_pool_size is too small and increase it accordingly. It would also be appropriate to disable the maintenance pool limit by default: issues with maintenance connections are more likely to arise with a considerable number of databases; with only a few, the limit may not be necessary. A sketch of such a timeout is shown below.
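A minimal sketch of that idea; the helper TryIncrementMaintenancePoolCounter and the timeout constant are hypothetical, not the PR's actual API:

#include "postgres.h"

#include "miscadmin.h"

/* hypothetical helper and timeout, for illustration only */
extern bool TryIncrementMaintenancePoolCounter(const char *hostname, int port);
static const int MaintenancePoolAcquireTimeoutMs = 5000;

/*
 * Wait a bounded amount of time for a maintenance pool slot; instead of
 * hanging indefinitely, fail with a hint pointing at the relevant GUC.
 */
static void
WaitForMaintenancePoolSlot(const char *hostname, int port)
{
	const int pollIntervalMs = 100;
	int waitedMs = 0;

	while (!TryIncrementMaintenancePoolCounter(hostname, port))
	{
		if (waitedMs >= MaintenancePoolAcquireTimeoutMs)
		{
			ereport(ERROR,
					(errmsg("timed out while acquiring a maintenance connection "
							"to %s:%d", hostname, port),
					 errhint("Consider increasing "
							 "citus.max_maintenance_shared_pool_size.")));
		}

		pg_usleep(pollIntervalMs * 1000L);
		waitedMs += pollIntervalMs;
	}
}

In the actual implementation, a ConditionVariable-based wait would be preferable to polling, which lines up with the condition-variable fix mentioned in the follow-up comment.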

@ivyazmitinov (Contributor, Author):

The timeout is basically implemented, along with a fix to the condition variables for the new pool, which I had missed before, but there are still issues related to #7203, and the test coverage should be improved.

Successfully merging this pull request may close these issues: Connection management improvements for multi-database setup (#7244)