
7244/multi db connection improvements #7286

Status: Open. ivyazmitinov wants to merge 28 commits into main from 7244/multi_db_connection_improvements.

Conversation

@ivyazmitinov (Contributor) commented Oct 27, 2023

This PR introduces the following changes:

  1. Makes citus.max_shared_pool_size cluster-wide. Introduces a new citus.max_databases_per_worker_tracked GUC to adapt shared_connection_stats to this change.
  2. Introduces new "maintenance" connections and a dedicated connection pool for them. The new pool is configured by the new citus.max_maintenance_shared_pool_size GUC, which defaults to 5.
  3. Makes transaction recovery and distributed deadlock detection respect citus.max_shared_pool_size.
  4. Adds a citus.shared_pool_size_maintenance_quota GUC to reserve a fraction of citus.max_shared_pool_size exclusively for maintenance operations. So, by default, regular and maintenance operations have isolated fractions of the pool and do not compete with each other (see the sketch below).
  5. Adds the possibility to designate one database as dedicated to maintenance via the new citus.maintenance_management_database GUC. I'll integrate it later with #7254 if it is merged.
  6. Disables connection caching for all maintenance backends, except the one for citus.maintenance_management_database.
  7. Makes distributed deadlock detection run only on citus.maintenance_management_database, when it is set.

When citus.maintenance_management_database is not set (the default), all maintenance daemons perform distributed deadlock detection, but they never cache connections, and they respect citus.shared_pool_size_maintenance_quota or citus.max_shared_pool_size.
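For illustration, here is a minimal sketch, under assumed names (ComputePoolBudgets and its parameters are invented for this example and are not part of Citus), of how the quota could translate citus.max_shared_pool_size into two isolated connection budgets:

/*
 * Hypothetical sketch only: split citus.max_shared_pool_size into a
 * maintenance budget and a regular budget based on a quota in [0, 1].
 */
static void
ComputePoolBudgets(int maxSharedPoolSize, double maintenanceQuota,
				   int *regularBudget, int *maintenanceBudget)
{
	/* reserve a fraction of the shared pool exclusively for maintenance */
	*maintenanceBudget = (int) (maxSharedPoolSize * maintenanceQuota);

	/* the remainder is available to regular (client-driven) backends */
	*regularBudget = maxSharedPoolSize - *maintenanceBudget;
}

For example, with citus.max_shared_pool_size = 100 and citus.shared_pool_size_maintenance_quota = 0.1 (values assumed for illustration), maintenance operations would be limited to 10 connections and regular operations to 90.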

Logic related to citus.main_db will be introduced in a separate PR.

This is a draft implementation to check if I am moving in the right direction, so only the base scenario is covered by tests.
I also haven't found an example of a distributed deadlock detection test, so I'd appreciate some advice on this, as well as on how to better address @onderkalaci's concern:

I think even the latter is non-trivial to implement; there are a bunch of areas that need to be thought through, such as making sure that this is not broken: https://github.com/citusdata/citus/blob/main/src/backend/distributed/connection/locally_reserved_shared_connections.c or https://github.com/citusdata/citus/blob/main/src/backend/distributed/executor/adaptive_executor.c#L4707

cc @marcocitus

closes #7244

@marcoslot (Collaborator) commented Nov 1, 2023

Makes transaction recovery and distributed deadlock detection respect citus.max_shared_pool_size

I think this is probably the more controversial / hazardous part of this PR. It is required for stability that max_shared_pool_size >= max_client_connections; otherwise not every client connection can get a slot. We generally recommend max_client_connections == max_shared_pool_size such that you will not make more outbound connections per worker than you receive in inbound client connections, which means there is no slack in max_shared_pool_size in most setups.

If we change the semantics of max_shared_pool_size, we might risk a lot of setups becoming unstable unless they set max_shared_pool_size >= max_client_connections + # reserved maintenance connections

Adds the possibility to designate one database as dedicated to maintenance via the new citus.maintenance_management_database GUC. I'll integrate it later with #7254 if it is merged
Disables connection caching for all maintenance backends, except the one for citus.maintenance_management_database
Makes distributed deadlock detection run only on citus.maintenance_management_database, when it is set.

Could this part be a separate PR? It probably doesn't require much discussion and #7254 is merged now.

@ivyazmitinov (Contributor, Author) commented Nov 1, 2023

If we change the semantics of max_shared_pool_size, we might risk a lot of setups becoming unstable unless they set max_shared_pool_size >= max_client_connections + # reserved maintenance connections

I see your point, but it is still a more predictable way to manage a cluster, I think. Currently, DBAs just have to keep in mind that there are some operations that require additional connections, so max_connections is typically slightly larger than max_shared_pool_size + max_client_connections. With this PR, the "slightly larger" part just becomes "official": configurable and isolated into its own part of max_shared_pool_size.
And, of course, it is always possible to implement a backup feature flag 🙂

Could this part be a separate PR?

I'd have to move those points as well, then:

  • Disables connection caching for all maintenance backends, except the one for citus.maintenance_management_database
  • Makes distributed deadlock detection run only on citus.maintenance_management_database, when it is set.

If it is okay, then no problem 🙂

@marcoslot (Collaborator):

I see your point, but it is still a more predictable way to manage a cluster,

Perhaps, if we were starting from scratch. In practice, due to a large user / customer base, and platforms that need to support a large number of version combinations, we try to avoid breaking backwards compatibility and/or introducing custom upgrade steps whenever possible.

An alternative option could be to have 2 separate pool settings, e.g. max_maintenance_pool_size & max_shared_pool_size. The former could default to some small number (e.g. 5), to keep the total number of outbound connections predictable.

If it is okay, then no problem 🙂

Yup

@ivyazmitinov (Contributor, Author):

Sounds reasonable, and shouldn't be too hard to implement...
Ok, I'll be back with that and a new PR for main_db then 🙂

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from 06cccd8 to 2e1e780 on November 28, 2023 11:16
@ivyazmitinov (Contributor, Author):

@marcoslot, done, now it is a separate pool 🙂

	int currentConnectionsCount;
	if (maintenanceConnection)
	{
		currentConnectionsLimit = GetMaxMaintenanceSharedPoolSize();

Contributor (Author):

@marcoslot, I'm not sure about this point: the current implementation uses a common MaxMaintenanceSharedPoolSize for both remote and local connections, because it is simpler. Is it necessary to introduce a separate limit for local maintenance connections?
Another effect of this is that when GetLocalSharedPoolSize() == DISABLE_REMOTE_CONNECTIONS_FOR_LOCAL_QUERIES, remote connections are disabled for both maintenance and regular connections.

@ivyazmitinov (Contributor, Author):

@onderkalaci, kind reminder about this PR 🙂
I noticed that @marcoslot is at Crunchy Data now, so I wanted to make sure that the PR won't be abandoned due to organizational changes.

@ivyazmitinov (Contributor, Author):

Well, that's awkward...
@JelteF, maybe you could help :) ?

@JelteF (Contributor) commented Jan 11, 2024

I'll try to take a look at this soon, thank you for your work on this.

@JelteF marked this pull request as ready for review on January 18, 2024 10:59
@JelteF (Contributor) commented Jan 18, 2024

Could you run citus_indent on your PR? It's hard for me to see what code is actually changed because some tabs have been changed to spaces. Instructions to run it can be found here: https://github.com/citusdata/citus/blob/main/CONTRIBUTING.md#following-our-coding-conventions

@JelteF (Contributor) commented Jan 18, 2024

FYI I pushed a commit to your branch to reindent it.

@JelteF (Contributor) left a comment:

Overall I quite like the idea and implementation. The build is failing though, and thus the tests are not actually run in CI. So I don't know if this completely breaks some tests.

I left a few small comments now. Once the tests are passing I'll take another close look.



PG_FUNCTION_INFO_V1(citus_remote_connection_stats);

Contributor: unnecessary diff

Comment on lines -72 to -75:

/*
 * Given that citus.shared_max_pool_size can be defined per database, we
 * should keep track of shared connections per database.
 */

@JelteF (Contributor), Jan 29, 2024:

Removing this comment indeed seems like the correct thing: based on this comment, throttling was originally implemented per database on purpose, but I don't think that makes much sense. max_connections is at the server level, and citus.max_shared_pool_size is PGC_SIGHUP, so it cannot be set at the database level.
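For context, here is a hedged sketch of what a PGC_SIGHUP GUC registration looks like; the description, default, and bounds are illustrative rather than Citus's actual values, and MaxSharedPoolSize stands in for the backing C variable:

DefineCustomIntVariable(
	"citus.max_shared_pool_size",
	gettext_noop("Illustrative description: maximum shared outbound "
				 "connections per worker."),
	NULL,
	&MaxSharedPoolSize,
	0, -1, INT_MAX,
	PGC_SIGHUP,		/* cluster-wide; reloaded on SIGHUP, not settable per
					 * database or per session */
	GUC_STANDARD,
	NULL, NULL, NULL);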

Comment on lines -287 to -296:

	bool counterIncremented = false;
	SharedConnStatsHashKey connKey;

	strlcpy(connKey.hostname, hostname, MAX_NODE_LENGTH);
	if (strlen(hostname) > MAX_NODE_LENGTH)
	{
		ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
						errmsg("hostname exceeds the maximum length of %d",
							   MAX_NODE_LENGTH)));
	}

Contributor: Why remove this check?

Contributor (Author): It was moved to PrepareWorkerNodeHashKey.

	}
	else

	if (IsLoggableLevel(DEBUG4))

Contributor: nit: this if check shouldn't be necessary; ereport already does something like that internally.

Contributor (Author): I thought it was there to prevent building the expensive errmsg. Maybe it is better to leave it this way just in case, to avoid a performance regression?
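For context, a sketch of the pattern under discussion; the message text, hostname, and port are placeholders rather than the PR's actual code:

/*
 * Guarding with IsLoggableLevel() means the errmsg() arguments are only
 * built when DEBUG4 is actually logged; as the reviewer notes, ereport()
 * itself already short-circuits on the level check.
 */
if (IsLoggableLevel(DEBUG4))
{
	ereport(DEBUG4, (errmsg("maintenance connection to %s:%d is not cached",
							hostname, port)));
}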

Comment on lines 967 to 968:

	uint32 hash = string_hash(entry->workerNodeKey.hostname, NAMEDATALEN);
	hash = hash_combine(hash, hash_uint32(entry->workerNodeKey.port));

Contributor: nit: it would be good to call SharedConnectionHashHash here instead of re-implementing it. Same for the Compare function.
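A sketch of the suggestion, assuming SharedConnectionHashHash keeps the usual HashValueFunc signature used with hash_create (the workerNodeKey field is taken from the hunk above; everything else is an assumption):

/* delegate to the existing hash function instead of re-implementing it */
uint32 hash = SharedConnectionHashHash(&entry->workerNodeKey,
									   sizeof(entry->workerNodeKey));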

@ivyazmitinov (Contributor, Author):

@JelteF Great, thanks! I'll address your comments shortly.
The test failures don't seem to be caused by my changes, right? But the branch should be synced with main anyway; maybe it will help 🤷‍♂️

I am also concerned about the test coverage: as I mentioned, I covered only the basic scenario. Maybe there are some use cases that should also be covered? Especially distributed deadlock detection, which I am not sure how to test with the existing tool set.

@JelteF (Contributor) commented Jan 30, 2024

The build error on PG16 seems to be because you included #include "distributed/connection_management.h" after pg_version_compat.h in maintenanced.c. (This is arguably a bug in how we define have_createdb_privilege in pg_version_compat.h, but it's only being triggered because of your changes.)

@JelteF (Contributor) left a comment:

This at least needs some changes to make the PG16 build and test pass (I think I pressed approve by accident before). General idea and implementation still seem good.

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from 3417599 to 889d5bd on January 30, 2024 21:32
@ivyazmitinov (Contributor, Author):

@JelteF, I fixed the PG16 build and addressed the review comments.

The #define have_createdb_privilege() have_createdb_privilege() in pg_version_compat.h seems to be unnecessary, and it breaks the build if it is included before commands/dbcommands.h. Everything seems to work fine without it.

@ivyazmitinov (Contributor, Author):

I also experienced some flakiness with multi_extension; is this expected?

codecov bot commented Feb 1, 2024

Codecov Report

Attention: Patch coverage is 80.45455%, with 43 lines in your changes missing coverage. Please review.

Project coverage is 82.65%. Comparing base (553d5ba) to head (02e7191).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7286      +/-   ##
==========================================
- Coverage   89.69%   82.65%   -7.04%     
==========================================
  Files         283      283              
  Lines       60506    60581      +75     
  Branches     7538     7541       +3     
==========================================
- Hits        54270    50073    -4197     
- Misses       4081     8068    +3987     
- Partials     2155     2440     +285     

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from bb48581 to 2e5b010 on February 2, 2024 09:33
@ivyazmitinov (Contributor, Author) commented Feb 2, 2024

Tried to fix the check_gucs_are_alphabetically_sorted and got a bit confused: it seems to be failing on citus.enable_repartitioned_insert_select even on main, on my machine. But it somehow passes on your CI 🤔

ivan@ivan-pc:~/IdeaProjects/citus-fork$ ./ci/check_gucs_are_alphabetically_sorted.sh 
sort: gucs.out:40: disorder:            "citus.enable_repartitioned_insert_select",

@JelteF (Contributor) commented Feb 6, 2024

I haven't looked at it in more detail yet, but at least the flaky test detection is failing because this test randomly fails:

+++ /__w/citus/citus/src/test/regress/results/multi_maintenance_multiple_databases.out.modified	2024-02-02 13:40:41.496394020 +0000
@@ -277,21 +277,21 @@
                    (SELECT setting::int FROM pg_settings WHERE name = 'port')),
             $statement$
             SELECT groupid, gid
             FROM pg_dist_transaction
             WHERE gid LIKE 'citus_0_1234_4_0_%'
                 OR gid LIKE 'citus_0_should_be_forgotten_%'
             $statement$) AS t(groupid integer, gid text)
 WHERE datname LIKE 'db%';
  pg_dist_transaction_after_recovery_coordinator_test 
 -----------------------------------------------------
- t
+ f
 (1 row)
 
 SELECT count(*) = 0 AS cached_connections_after_recovery_coordinator_test
 FROM pg_stat_activity
 WHERE state = 'idle'
   AND now() - backend_start > '5 seconds'::interval;
  cached_connections_after_recovery_coordinator_test 
 ----------------------------------------------------

@@ -2021,6 +2033,20 @@ RegisterCitusConfigVariables(void)
GUC_STANDARD,
NULL, NULL, NULL);

DefineCustomIntVariable(
"citus.max_databases_per_worker_tracked",
Contributor:

Tried to fix the check_gucs_are_alphabetically_sorted and got a bit confused: it seems to be failing on citus.enable_repartitioned_insert_select even on main, on my machine. But it somehow passes on your CI 🤔

ivan@ivan-pc:~/IdeaProjects/citus-fork$ ./ci/check_gucs_are_alphabetically_sorted.sh 
sort: gucs.out:40: disorder:            "citus.enable_repartitioned_insert_select",

Looking at your changes, both new GUCs that you're introducing are not in the correct place alphabetically (both need to be moved up). I expect that if you fix their location in the list, the error will go away.

I agree that it's weird that it reports citus.enable_repartitioned_insert_select though. Sounds like a reporting bug. @onurctirtir do you know more about this? (you changed stuff about this script recently)

Contributor (Author):

OK, I figured it out: I didn't have LC_COLLATE=C set, so sort on my machine treated the underscore character differently. The easiest fix is to force it explicitly for the sort in check_gucs_are_alphabetically_sorted.sh:

grep -P "^[\t][\t]\"citus\.[a-zA-Z_0-9]+\"" RegisterCitusConfigVariables_func_def.out > gucs.out
LC_COLLATE=C sort -c gucs.out

Contributor: Ah, interesting. Could you make a separate PR to fix that?

Contributor (Author):

Certainly 🙂
But I am not sure that the easiest fix would be the best one: maybe it's better to force it via ./configure or something like that, so that every dev setup is as close to CI as possible.

Contributor:

Well, in test runs we already set collation etc. to the expected settings. But these are simple shell scripts where we don't have that infrastructure. Normally that doesn't matter, but for sorting it does.

@JelteF (Contributor), Feb 7, 2024:

So, to clarify: if you open a PR where you only add LC_COLLATE=C to the sort command, I'd be happy to accept that. (If not, I'll probably do it myself in the near future, if I don't forget.)

Contributor (Author):

Yes, I'll add it tomorrow, don't worry :)
Got distracted trying to reproduce the flakiness :)

@ivyazmitinov (Contributor, Author):

@JelteF, is there a way I can download the results of the failed tests?

I tried to reproduce it locally, but with no luck. My tests take twice as long, though, so my theory is that it is reproducible on weaker machines.

@JelteF (Contributor) commented Feb 7, 2024

Yes if you go to the overview and scroll to the bottom you will see all the artifacts produced and also the diff of the failures inline: https://github.com/citusdata/citus/actions/runs/7812230917/attempts/2#summary-21309102210

@JelteF (Contributor) commented Feb 7, 2024

You might be able to reproduce the flaky tests locally by running the test in question multiple times (this runs it 200 times):

src/test/regress/citus_tests/run_test.py multi_maintenance_multiple_databases  --repeat 200

@ivyazmitinov (Contributor, Author):

src/test/regress/citus_tests/run_test.py multi_maintenance_multiple_databases --repeat 200

Yep, that's what I tried :)
It seems it requires something more sophisticated...

@ivyazmitinov force-pushed the 7244/multi_db_connection_improvements branch from 1901be2 to a48f274 on March 6, 2024 12:19
@ivyazmitinov (Contributor, Author):

@JelteF, reminder about this PR, just in case 🙂

@JelteF (Contributor) commented Mar 21, 2024

I'll try to make some time soon to do a full review of this; I think it's pretty close to being mergeable.

For now, could you look at the check-multi tests? They are timing out in CI.

@ivyazmitinov (Contributor, Author):

I think it's pretty close to being mergeable.

I would still like to add more test coverage, especially for deadlock detection, but I'm not sure how to do this with the existing toolkit.
The scenario I have added covers only the basic flow.

@JelteF (Contributor) commented Mar 21, 2024

but I'm not sure how to do this with the existing toolkit.

I think the Python test framework that we have would be a good fit for that.
The readme and examples can be found here: https://github.com/citusdata/citus/tree/main/src/test/regress/citus_tests/test

@ivyazmitinov (Contributor, Author):

For now, could you look at the check-multi tests. Because they are timing out in CI.

I'm not sure when I'll be able to get back to this, so I'll just leave a note here: the tests are timing out because of a low citus.max_maintenance_shared_pool_size. Some operations require more than one connection to complete, especially with the new MainDb logic, which requires an extra maintenance connection for a SELECT citus_internal.commit_management_command_2pc() call.
The proper fix would be to implement some API to reserve the necessary number of connections from the maintenance pool and use it for all maintenance operations, but that doesn't seem feasible. Instead, it is better to introduce a simple timeout for acquiring a connection from the maintenance pool, with a meaningful error message: this way a user will be able to see that citus.max_maintenance_shared_pool_size is too small and increase it accordingly. It would also be appropriate to disable the maintenance pool limit by default: issues with maintenance connections are more likely to arise with a considerable number of databases; with only a few, the limit may not be necessary. A sketch of such a timeout is shown below.
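A minimal sketch of that idea; the helper TryIncrementMaintenancePoolCounter and the timeout constant are hypothetical, not the PR's actual API:

#include "postgres.h"

#include "miscadmin.h"

/* hypothetical helper and timeout, for illustration only */
extern bool TryIncrementMaintenancePoolCounter(const char *hostname, int port);
static const int MaintenancePoolAcquireTimeoutMs = 5000;

/*
 * Wait a bounded amount of time for a maintenance pool slot; instead of
 * hanging indefinitely, fail with a hint pointing at the relevant GUC.
 */
static void
WaitForMaintenancePoolSlot(const char *hostname, int port)
{
	const int pollIntervalMs = 100;
	int waitedMs = 0;

	while (!TryIncrementMaintenancePoolCounter(hostname, port))
	{
		if (waitedMs >= MaintenancePoolAcquireTimeoutMs)
		{
			ereport(ERROR,
					(errmsg("timed out while acquiring a maintenance connection "
							"to %s:%d", hostname, port),
					 errhint("Consider increasing "
							 "citus.max_maintenance_shared_pool_size.")));
		}

		pg_usleep(pollIntervalMs * 1000L);
		waitedMs += pollIntervalMs;
	}
}

In the actual implementation, a ConditionVariable-based wait would be preferable to polling, which lines up with the condition-variable fix mentioned in the follow-up comment.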

@ivyazmitinov (Contributor, Author):

The timeout is basically implemented, along with a fix to the condition variables for the new pool, which I had missed before, but there are still issues related to #7203, and the test coverage should be improved.

Successfully merging this pull request may close these issues: Connection management improvements for multi-database setup (#7244)