Longhorn v0.6.0 Upgrade: Workaround for recovering from a rollback failure in Rancher

Note

Please make a backup of the volumes if possible before proceeding.

Background

Due to a Longhorn bug, some users failed to upgrade Longhorn to version v0.6.0: https://github.com/longhorn/longhorn/issues/754

They then tried to roll back to v0.5.0 via the Rancher UI. Unfortunately, this triggered another Rancher/Helm bug and left them stuck in a rollback failure state: https://github.com/longhorn/longhorn/issues/755

Here we document the workaround to help users recover from a rollback failure of the Longhorn app.

Workaround applies only if:

The Longhorn app is stuck in an upgrade/rollback from v0.6.0 to v0.5.0.
One of the following error messages is shown on the app's detail page in the Rancher UI:

Failed to install app longhorn-system. Error: UPGRADE FAILED: timed out waiting for the condition

or

Failed to install app longhorn-system. Error: UPGRADE FAILED: transport is closing

Steps

1. Delete all workloads

Delete all Longhorn system workloads on the app's detail page, including longhorn-driver-deployer, longhorn-manager, longhorn-post-upgrade, and longhorn-ui. DO NOT DELETE OTHER PODS.

This step prevents the subsequent upgrade from getting stuck.
The deletion is safe for the data in Longhorn as long as the CRD objects and the old engine/replica pods from v0.5.0 remain intact.

If you prefer kubectl to the Rancher UI, you can use the following commands to clean up the workloads:

kubectl -n longhorn-system delete daemonset longhorn-manager
kubectl -n longhorn-system delete deployment longhorn-driver-deployer longhorn-ui
kubectl -n longhorn-system delete job longhorn-post-upgrade
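
Before moving on, it can help to confirm that no pods from the deleted workloads are still around. The following is only a sketch: the function names are hypothetical, and it assumes pod names start with the name of the workload that created them, as Kubernetes normally generates them. It only lists pods and deletes nothing.

```shell
# Reads `kubectl get pods -o name` style lines (pod/<name>) on stdin and
# prints only the pods that belong to the workloads deleted in this step.
# Assumption: pod names begin with their workload name followed by "-".
remaining_workload_pods() {
  grep -E '^pod/longhorn-(manager|driver-deployer|ui|post-upgrade)-' || true
}

# Hypothetical helper: list leftover workload pods in longhorn-system.
# Empty output means the workloads from this step are gone.
show_remaining_workload_pods() {
  kubectl -n longhorn-system get pods -o name | remaining_workload_pods
}
```

Run show_remaining_workload_pods after the delete commands. Any other pod in the namespace, such as the old engine/replica pods, is deliberately ignored by the filter and should be left alone.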

2. Delete release histories of Helm

Delete all ConfigMaps named longhorn-system.v<version number> in the longhorn-system namespace, e.g. longhorn-system.v2.

These are the release histories of Longhorn, recorded by Helm.
Do not remove any ConfigMaps that lack the longhorn-system.v prefix.

These ConfigMaps can be deleted via the Rancher UI or kubectl.

If you prefer kubectl to the Rancher UI, you can use the following commands to list and then delete all related ConfigMaps.

kubectl -n longhorn-system get cm
kubectl -n longhorn-system delete cm <longhorn-system.vxxx>
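
When there are many release histories, the selection can be scripted. This is a minimal sketch, not an official tool: the function names and the CONFIRM variable are hypothetical, and it assumes the version suffix is purely numeric, as in longhorn-system.v2. It prints what it would delete unless CONFIRM=yes is set.

```shell
# Reads `kubectl get cm -o name` style lines (configmap/<name>) on stdin and
# keeps only the Helm release-history ConfigMaps, longhorn-system.v<number>.
# Assumption: the version suffix consists of digits only.
filter_release_cms() {
  grep -E '^configmap/longhorn-system\.v[0-9]+$' || true
}

# Hypothetical helper: dry run by default; set CONFIRM=yes to really delete.
delete_release_cms() {
  kubectl -n longhorn-system get cm -o name \
    | filter_release_cms \
    | while read -r cm; do
        if [ "${CONFIRM:-no}" = "yes" ]; then
          kubectl -n longhorn-system delete "$cm"
        else
          echo "would delete: $cm"
        fi
      done
}
```

Run delete_release_cms once to review the list, then CONFIRM=yes delete_release_cms to delete. The anchored pattern keeps ConfigMaps such as longhorn-default-setting out of the list.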

3. Clean up the resources introduced by the failed v0.6.0 upgrade

kubectl patch -p '{"metadata":{"finalizers": null}}' crd instancemanagers.longhorn.rancher.io
kubectl delete crd instancemanagers.longhorn.rancher.io
kubectl -n longhorn-system delete cm longhorn-default-setting

You can check this doc for the details.

4. Upgrade to the version v0.6.2

Use the Rancher App page to upgrade Longhorn to the latest version. The error message in the Rancher UI will disappear.

Since there is no release history now, Helm will perform an install rather than an upgrade, which avoids the panic caused by the force upgrade function.

5. Check the image version

Check the image version of the Longhorn system workloads. If it is incorrect, go back to step 1 and redo the whole workaround.

An incorrect image version means Helm has somehow messed up the current release. In that case, we need to delete those workloads and the related release history so that Helm reinstalls the whole app.
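
The image check can also be done from the command line. The sketch below is an illustration under assumptions: the function names are hypothetical, each workload is assumed to run a single container, and the image tag is assumed to equal the release version (for example v0.6.2). It inspects only the DaemonSet and Deployments, not the engine/replica pods.

```shell
# Print "<workload><TAB><image>" for each DaemonSet and Deployment.
list_workload_images() {
  kubectl -n longhorn-system get daemonset,deployment \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'
}

# Reads "<workload><TAB><image>" lines on stdin and prints those whose image
# tag differs from the expected tag given as $1.
# Returns 1 if any mismatch was found, 0 otherwise.
find_mismatched_images() {
  awk -F '\t' -v want="$1" '
    NF == 0 { next }                      # skip blank lines
    {
      n = split($2, parts, ":")           # the tag is the text after the last ":"
      if (parts[n] != want) { print; bad = 1 }
    }
    END { exit bad }
  '
}
```

For example, list_workload_images | find_mismatched_images v0.6.2 prints nothing and returns 0 when every workload uses the expected tag. Any printed line is a reason to go back to step 1.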

6. Verify Longhorn works as normal.
