
[Bug]: Replica cluster requires a restart after promotion #4172

Closed
4 tasks done
litaocdl opened this issue Mar 26, 2024 · 2 comments · Fixed by #4399
Labels: bug 🐛 Something isn't working

Comments

@litaocdl
Collaborator

Is there an existing issue already for this bug?

  • I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • I am running a supported version of CloudNativePG.

Contact Details

No response

Version

1.22.2

What version of Kubernetes are you using?

1.29

What is your Kubernetes environment?

Self-managed: kind (evaluation)

How did you install the operator?

YAML manifest

What happened?

When a replica cluster is promoted, `archive_mode` changes from `always` to `on`, but the primary is not restarted. The pending restart is only applied later, when the cluster is scaled up and the primary restarts unexpectedly.

Cluster resource

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@litaocdl litaocdl added the triage Pending triage label Mar 26, 2024
@litaocdl litaocdl added bug 🐛 Something isn't working and removed triage Pending triage labels Apr 3, 2024
@litaocdl
Collaborator Author

litaocdl commented Apr 8, 2024

More details

For replica clusters, even if the replica cluster has a single instance, `archive_mode` is set to `always` on the designated primary.
Once the designated primary is promoted to primary, `archive_mode` is changed to `on`, but the primary is not restarted. That restart is delayed until the new cluster is scaled up; since a primary restart during scale-up is not expected, we should restart the primary at promotion time.
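To make the role-dependent setting concrete, here is a minimal sketch (hypothetical names, not the operator's actual code) of how `archive_mode` is derived from the cluster role:

package sketch

// Cluster is a stand-in for the operator's cluster API type.
type Cluster struct {
	ReplicaCluster bool
}

// IsReplica mirrors the kind of check the operator performs.
func (c *Cluster) IsReplica() bool { return c.ReplicaCluster }

// archiveMode returns the archive_mode GUC for the instance:
// "always" while the cluster is a replica, "on" once it is promoted.
// Note that archive_mode has postmaster context in PostgreSQL, so
// changing it always requires a restart, never just a reload.
func archiveMode(c *Cluster) string {
	if c.IsReplica() {
		return "always"
	}
	return "on"
}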

@litaocdl
Collaborator Author

When we promote a cluster, the instance reconciler writes a new custom.conf file with `archive_mode=on` and returns a `reloadNeeded=true` flag:

	// Reconcile cluster role without DB
	reloadClusterRoleConfig, err := r.reconcileClusterRoleWithoutDB(ctx, cluster)
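The contract behind that flag can be sketched like this (an illustration under assumed names, not the upstream implementation): the reconciler rewrites custom.conf and reports whether the on-disk content changed.

package sketch

import (
	"bytes"
	"os"
)

// writeCustomConf writes the rendered configuration and reports whether the
// on-disk content actually changed; a true result is what ultimately sets
// reloadNeeded in the reconciler.
func writeCustomConf(path string, rendered []byte) (changed bool, err error) {
	current, err := os.ReadFile(path)
	if err == nil && bytes.Equal(current, rendered) {
		return false, nil // identical content: no reload needed
	}
	if err := os.WriteFile(path, rendered, 0o600); err != nil {
		return false, err
	}
	return true, nil // file changed: the caller must reload (or restart)
}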

But this reload never happens: on the same pass, the replica cluster is promoted via `pg_ctl promote`, and `reconcilePrimary` returns `restarted=true` to indicate that `pg_ctl promote` was called.

restarted, err := r.reconcilePrimary(ctx, cluster)

Note that `pg_ctl promote` does not pick up the changed configuration in custom.conf.
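A hypothetical sketch of that path (the pg_ctl command is real, the surrounding names are illustrative, not the upstream code):

package sketch

import (
	"context"
	"os/exec"
)

// promoteIfNeeded promotes the designated primary via pg_ctl and returns
// restarted=true so the caller knows a promotion took place. pg_ctl promote
// only ends recovery; it does not re-read custom.conf, so the pending
// archive_mode change is not applied here.
func promoteIfNeeded(ctx context.Context, isDesignatedPrimary bool, pgData string) (restarted bool, err error) {
	if !isDesignatedPrimary {
		return false, nil
	}
	if err := exec.CommandContext(ctx, "pg_ctl", "promote", "-D", pgData).Run(); err != nil {
		return false, err
	}
	return true, nil
}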

As the following code shows, the instance reload is only performed when there was no restart, so we miss the reload here:

if reloadNeeded && !restarted {
	contextLogger.Info("reloading the instance")
	if err = r.instance.Reload(ctx); err != nil {
		return reconcile.Result{}, fmt.Errorf("while reloading the instance: %w", err)
	}
	if err = r.processConfigReloadAndManageRestart(ctx, cluster); err != nil {
		return reconcile.Result{}, fmt.Errorf("cannot apply new PostgreSQL configuration: %w", err)
	}
}

And in the next reconcile loop, since the configuration file has not changed again, no reload is deemed necessary.

So the reload is skipped entirely.
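The whole failure mode can be condensed into a toy model (not the operator's code) of the two reconcile loops around a promotion:

package main

import "fmt"

// reconcileStep models the gating quoted above: the reload branch is the
// only path that notices the rewritten custom.conf and schedules the
// restart that archive_mode requires.
func reconcileStep(reloadNeeded, restarted bool) string {
	if reloadNeeded && !restarted {
		return "reload issued -> pending archive_mode restart detected"
	}
	return "reload skipped -> archive_mode change left pending"
}

func main() {
	// Loop right after promotion: custom.conf changed (reloadNeeded=true)
	// but pg_ctl promote also ran (restarted=true), so the branch is skipped.
	fmt.Println(reconcileStep(true, true))
	// Next loop: custom.conf is unchanged, so reloadNeeded=false; the
	// pending change survives until an unrelated restart, e.g. a scale-up.
	fmt.Println(reconcileStep(false, false))
}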

mnencia pushed a commit that referenced this issue May 22, 2024
When a replica cluster is promoted, the `archive_mode` is changed from
`always` to `on`. This change requires a restart because Postgres 
does not reload the configuration during the promotion.

Closes: #4172

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Co-authored-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
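A minimal sketch of the fix's direction, inferred from the commit message rather than copied from the #4399 diff: stop gating the configuration handling on !restarted, so the promotion path can still detect and apply the pending restart.

if reloadNeeded {
	contextLogger.Info("reloading the instance")
	if err = r.instance.Reload(ctx); err != nil {
		return reconcile.Result{}, fmt.Errorf("while reloading the instance: %w", err)
	}
	if err = r.processConfigReloadAndManageRestart(ctx, cluster); err != nil {
		return reconcile.Result{}, fmt.Errorf("cannot apply new PostgreSQL configuration: %w", err)
	}
}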
cnpg-bot pushed a commit that referenced this issue May 22, 2024
When a replica cluster is promoted, the `archive_mode` is changed from
`always` to `on`. This change requires a restart because Postgres
does not reload the configuration during the promotion.

Closes: #4172

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Co-authored-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
(cherry picked from commit 33d7b65)
dougkirkley pushed a commit to dougkirkley/cloudnative-pg that referenced this issue Jun 11, 2024

When a replica cluster is promoted, the `archive_mode` is changed from
`always` to `on`. This change requires a restart because Postgres
does not reload the configuration during the promotion.

Closes: cloudnative-pg#4172

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Co-authored-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Signed-off-by: Douglass Kirkley <dkirkley@eitccorp.com>