[IMPROVEMENT] Make gRPC service timeout configurable #8590
Labels
area/volume-replica-rebuild
Volume replica rebuilding related
component/longhorn-instance-manager
Longhorn instance manager (interface between control and data plane)
kind/improvement
Request for improvement of existing function
require/backport
Require backport. Only used when the specific versions to backport have not been defined.
require/doc
Require updating the longhorn.io documentation
require/manual-test-plan
Require adding/updating manual test cases if they can't be automated
Is your improvement request related to a feature? Please describe (👍 if you like this request)
We are running Longhorn 1.5.x in several environments.
One of them has a PVC approaching 1 TB and a slower network than the others.
From time to time, network issues there cause replica failures and Longhorn salvages the volume. Longhorn then tries to rebuild a replica from the remaining healthy ones. With the slower network and that amount of data, the rebuild takes hours and eventually times out when it reaches 24 h. This is frustrating after spending those hours watching the rebuild percentage climb slowly to 90+% before dropping back to 0%.
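For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the full ~1 TB has to be transferred (decimal units) and that 24 h is a hard limit; the function name is only illustrative.

```python
def min_throughput_mb_per_s(data_bytes: float, timeout_hours: float) -> float:
    """Minimum sustained throughput (decimal MB/s) needed to move
    data_bytes before the timeout expires."""
    return data_bytes / (timeout_hours * 3600) / 1e6


# Assumed inputs: ~1 TB of replica data and a 24 h timeout.
required = min_throughput_mb_per_s(1e12, 24)
print(f"{required:.1f} MB/s")  # prints "11.6 MB/s"
```

Under these assumptions, any sustained rebuild rate below roughly 11.6 MB/s (about 93 Mbit/s) cannot finish before the timeout.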
Describe the solution you'd like
Making the rebuild itself faster may be difficult, since it is bounded by network speed relative to data size. The alternative would be to give it more time, i.e., as far as I understand, to make the gRPC service long timeout configurable.
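To illustrate the kind of knob I have in mind, here is a minimal sketch. It is not Longhorn's actual code: the environment variable name `GRPC_SERVICE_LONG_TIMEOUT_HOURS` and the helper are hypothetical, and the 24 h default simply mirrors the limit I observed.

```python
import math
import os

# Assumed default, based on the 24 h limit observed during rebuilds.
DEFAULT_LONG_TIMEOUT_HOURS = 24.0


def long_timeout_seconds(env=os.environ) -> float:
    """Return the long-operation timeout in seconds.

    Reads a hypothetical GRPC_SERVICE_LONG_TIMEOUT_HOURS variable and
    falls back to the default when it is unset, unparsable, or not a
    positive finite number.
    """
    raw = env.get("GRPC_SERVICE_LONG_TIMEOUT_HOURS")
    try:
        hours = float(raw) if raw is not None else DEFAULT_LONG_TIMEOUT_HOURS
    except ValueError:
        hours = DEFAULT_LONG_TIMEOUT_HOURS
    if not math.isfinite(hours) or hours <= 0:
        hours = DEFAULT_LONG_TIMEOUT_HOURS
    return hours * 3600


# Example: an operator raises the limit to 72 h for a slow environment.
print(long_timeout_seconds({"GRPC_SERVICE_LONG_TIMEOUT_HOURS": "72"}))
```

The resulting value would then be used as the deadline for the long-running rebuild call instead of a hard-coded constant. Falling back to the default on bad input keeps existing deployments behaving as they do today.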
Describe alternatives you've considered
Alternatives: