Resolve upgrade problem related to non-transactional metadata synchronization. #7239

Draft
wants to merge 3 commits into base: main

gokhangulbiz
Contributor

Fixes #7238
DESCRIPTION: PR description that will go into the change log, up to 78 characters

Member

@onderkalaci left a comment

Sharing our internal chat discussion:

At a high level, we should strongly avoid non-transactional operations. Non-transactional operations on a transactional database are very hard to reason about, a source of many bugs, and most likely not idempotent.

We had to make Citus metadata sync non-transactional due to a Postgres limitation; see problem number 2 discussed here. Basically, Postgres does NOT allow very large transaction blocks with lots of DDLs, which are common when syncing metadata for lots of distributed tables/shards.

Now, this PR suggests we should expand the non-transactional logic to citus_finalize_upgrade_to_citus11, which I really don't think is a good idea.

To make start_metadata_sync_to_all_nodes non-transactional, we spent a significant amount of time and energy. There is a lot of sophisticated retry logic associated with start_metadata_sync_to_all_nodes to make it idempotent.

If we want to make citus_finalize_upgrade_to_citus11 non-transactional, we should ensure it is idempotent. Note that I'm not suggesting going that route, just stating what would need to be done.
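
To illustrate what "idempotent" demands of a non-transactional, multi-step operation, here is a minimal Python sketch. It is not how Citus implements its retry logic; the step names, the `run_steps` helper, and the in-memory `completed` set are all hypothetical. The idea is that each step is recorded only after it succeeds, so a rerun after a failure resumes from the first unfinished step instead of repeating earlier work.

```python
from typing import Callable, List, Set, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step], completed: Set[str], max_attempts: int = 3) -> None:
    """Run each step at most once; on failure, retry from the first unfinished step."""
    for attempt in range(1, max_attempts + 1):
        try:
            for name, action in steps:
                if name in completed:
                    continue  # applied by an earlier attempt, so skip it
                action()
                completed.add(name)  # record only after the step succeeded
            return
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up; `completed` still reflects what was applied


# Hypothetical usage: the second step fails once, then succeeds on retry.
applied: List[str] = []
failures = {"sync_shards": 1}


def make_step(name: str) -> Callable[[], None]:
    def action() -> None:
        if failures.get(name, 0) > 0:
            failures[name] -= 1
            raise RuntimeError(f"{name} failed")
        applied.append(name)
    return action


names = ["sync_tables", "sync_shards", "sync_placements"]
completed: Set[str] = set()
run_steps([(n, make_step(n)) for n in names], completed)
```

Even in this toy version, a crash between `action()` and `completed.add(name)` would cause the step to run twice, so each individual step still has to be safe to repeat. That is the kind of reasoning a transactional block makes unnecessary.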

I absolutely know that it is painful to handle upgrades that fail due to this problem; I'm probably the person most commonly involved in resolving these issues in production. Still, I'm in favor of leaving it as-is. Otherwise, we might find ourselves debugging clusters with broken metadata or a broken upgrade. At least now we have a lot of control over what is going on.

Also, the number of clusters that might fail to upgrade due to metadata syncing (and upgrades to Citus 11) is very limited.

So, instead of making this piece of code non-transactional, I suggested that @gokhangulbiz create a list of steps that can be applied to mitigate such issues, mostly following the transactional route and running only the metadata syncing in a non-transactional manner. We should perhaps share the steps on this PR as a reference. With such a recipe, it should be fairly easy for a single operator to mitigate incidents like this.

(Also note that this behavior was intentional: we didn't want to make any piece of code non-transactional other than start_metadata_sync_to_all_nodes. If you dig enough into #6728, I think you can find references to those discussions.)

RAISE NOTICE 'Preparing to sync the metadata to all nodes %', current_setting('citus.metadata_sync_mode');
IF current_setting('citus.metadata_sync_mode') = 'transactional' THEN
DECLARE
BEGIN
Collaborator

Are you sure this matters? BEGIN in PL/pgSQL starts a subtransaction, not a transaction. We are already in a multi-statement transaction.

If it does matter, I'm wondering why and whether we should fix that.


Successfully merging this pull request may close these issues.

Citus v10.x to v11.x or Newer Major Version Upgrade Error in Non-Transactional Metadata Sync Mode
3 participants