
[Bug]: Replica cluster requires a restart after promotion #4172

Closed
4 tasks done
litaocdl opened this issue Mar 26, 2024 · 2 comments · Fixed by #4399
Labels: bug 🐛 Something isn't working

Comments

@litaocdl
Collaborator

Is there an existing issue already for this bug?

  • I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • I am running a supported version of CloudNativePG.

Contact Details

No response

Version

1.22.2

What version of Kubernetes are you using?

1.29

What is your Kubernetes environment?

Self-managed: kind (evaluation)

How did you install the operator?

YAML manifest

What happened?

When a replica cluster is promoted, `archive_mode` changes from `always` to `on`, but the primary is not restarted. The pending restart is only applied later, when the cluster is scaled up and the primary restarts unexpectedly.

Cluster resource

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@litaocdl litaocdl added the triage Pending triage label Mar 26, 2024
@litaocdl litaocdl added bug 🐛 Something isn't working and removed triage Pending triage labels Apr 3, 2024
@litaocdl
Collaborator Author

litaocdl commented Apr 8, 2024

More details

For replica clusters, even if the replica cluster has a single instance, `archive_mode` is set to `always` on the designated primary.
Once the designated primary is promoted to primary, `archive_mode` is changed to `on`, but the primary is not restarted. That restart is delayed until the new cluster is scaled up; since a primary restart during scale-up is not expected, we should restart the primary at promotion time.
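To make the role-dependent setting concrete, here is a minimal sketch (hypothetical names, not the operator's actual code) of how `archive_mode` is derived from the cluster role:

package sketch

// Cluster is a stand-in for the operator's cluster API type.
type Cluster struct {
	ReplicaCluster bool
}

// IsReplica mirrors the kind of check the operator performs.
func (c *Cluster) IsReplica() bool { return c.ReplicaCluster }

// archiveMode returns the archive_mode GUC for the instance:
// "always" while the cluster is a replica, "on" once it is promoted.
// Note that archive_mode has postmaster context in PostgreSQL, so
// changing it always requires a restart, never just a reload.
func archiveMode(c *Cluster) string {
	if c.IsReplica() {
		return "always"
	}
	return "on"
}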

@litaocdl
Collaborator Author

When we promote a cluster, the instance reconciler writes a new custom.conf file with `archive_mode=on` and returns a `reloadNeeded=true` flag:

	// Reconcile cluster role without DB
	reloadClusterRoleConfig, err := r.reconcileClusterRoleWithoutDB(ctx, cluster)
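The contract behind that flag can be sketched like this (an illustration under assumed names, not the upstream implementation): the reconciler rewrites custom.conf and reports whether the on-disk content changed.

package sketch

import (
	"bytes"
	"os"
)

// writeCustomConf writes the rendered configuration and reports whether the
// on-disk content actually changed; a true result is what ultimately sets
// reloadNeeded in the reconciler.
func writeCustomConf(path string, rendered []byte) (changed bool, err error) {
	current, err := os.ReadFile(path)
	if err == nil && bytes.Equal(current, rendered) {
		return false, nil // identical content: no reload needed
	}
	if err := os.WriteFile(path, rendered, 0o600); err != nil {
		return false, err
	}
	return true, nil // file changed: the caller must reload (or restart)
}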

But this reload never happens: on the same pass, the replica cluster is promoted via `pg_ctl promote`, and `reconcilePrimary` returns `restarted=true` to indicate that `pg_ctl promote` was called.

restarted, err := r.reconcilePrimary(ctx, cluster)

Note that `pg_ctl promote` does not pick up the changed configuration in custom.conf.
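A hypothetical sketch of that path (the pg_ctl command is real, the surrounding names are illustrative, not the upstream code):

package sketch

import (
	"context"
	"os/exec"
)

// promoteIfNeeded promotes the designated primary via pg_ctl and returns
// restarted=true so the caller knows a promotion took place. pg_ctl promote
// only ends recovery; it does not re-read custom.conf, so the pending
// archive_mode change is not applied here.
func promoteIfNeeded(ctx context.Context, isDesignatedPrimary bool, pgData string) (restarted bool, err error) {
	if !isDesignatedPrimary {
		return false, nil
	}
	if err := exec.CommandContext(ctx, "pg_ctl", "promote", "-D", pgData).Run(); err != nil {
		return false, err
	}
	return true, nil
}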

As the following code shows, the instance reload is only performed when there was no restart, so we miss the reload here:

if reloadNeeded && !restarted {
	contextLogger.Info("reloading the instance")
	if err = r.instance.Reload(ctx); err != nil {
		return reconcile.Result{}, fmt.Errorf("while reloading the instance: %w", err)
	}
	if err = r.processConfigReloadAndManageRestart(ctx, cluster); err != nil {
		return reconcile.Result{}, fmt.Errorf("cannot apply new PostgreSQL configuration: %w", err)
	}
}

And in the next reconcile loop, since the configuration file has not changed again, no reload is deemed necessary.

So the reload is skipped entirely.
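The whole failure mode can be condensed into a toy model (not the operator's code) of the two reconcile loops around a promotion:

package main

import "fmt"

// reconcileStep models the gating quoted above: the reload branch is the
// only path that notices the rewritten custom.conf and schedules the
// restart that archive_mode requires.
func reconcileStep(reloadNeeded, restarted bool) string {
	if reloadNeeded && !restarted {
		return "reload issued -> pending archive_mode restart detected"
	}
	return "reload skipped -> archive_mode change left pending"
}

func main() {
	// Loop right after promotion: custom.conf changed (reloadNeeded=true)
	// but pg_ctl promote also ran (restarted=true), so the branch is skipped.
	fmt.Println(reconcileStep(true, true))
	// Next loop: custom.conf is unchanged, so reloadNeeded=false; the
	// pending change survives until an unrelated restart, e.g. a scale-up.
	fmt.Println(reconcileStep(false, false))
}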

mnencia pushed a commit that referenced this issue May 22, 2024
When a replica cluster is promoted, the `archive_mode` is changed from
`always` to `on`. This change requires a restart because Postgres 
does not reload the configuration during the promotion.

Closes: #4172

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Co-authored-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
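A minimal sketch of the fix's direction, inferred from the commit message rather than copied from the #4399 diff: stop gating the configuration handling on !restarted, so the promotion path can still detect and apply the pending restart.

if reloadNeeded {
	contextLogger.Info("reloading the instance")
	if err = r.instance.Reload(ctx); err != nil {
		return reconcile.Result{}, fmt.Errorf("while reloading the instance: %w", err)
	}
	if err = r.processConfigReloadAndManageRestart(ctx, cluster); err != nil {
		return reconcile.Result{}, fmt.Errorf("cannot apply new PostgreSQL configuration: %w", err)
	}
}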
cnpg-bot pushed a commit that referenced this issue May 22, 2024
When a replica cluster is promoted, the `archive_mode` is changed from
`always` to `on`. This change requires a restart because Postgres
does not reload the configuration during the promotion.

Closes: #4172

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Co-authored-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
(cherry picked from commit 33d7b65)
dougkirkley pushed a commit to dougkirkley/cloudnative-pg that referenced this issue Jun 11, 2024

When a replica cluster is promoted, the `archive_mode` is changed from
`always` to `on`. This change requires a restart because Postgres
does not reload the configuration during the promotion.

Closes: cloudnative-pg#4172

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Co-authored-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Signed-off-by: Douglass Kirkley <dkirkley@eitccorp.com>