[BUG] Disks become over provisioned when Storage Over Provisioning Percentage is set to 100 #8450
Comments
Looking. The first thing I notice is that in the Harvester settings, `yamls/cluster/harvesterhci.io/v1beta1/settings.yaml`, there is an `overcommit-config` entry:

```yaml
- apiVersion: harvesterhci.io/v1beta1
  default: '{"cpu":1600,"memory":150,"storage":200}'
  kind: Setting
  metadata:
    ...
    name: overcommit-config
    resourceVersion: "13529643"
    uid: fa02c2bd-f32b-4872-b803-112aec13351d
  status: {}
```

I would suspect that may be the cause of the observed behavior, and so it seems like a reasonable workaround to change that setting. I have not tried it myself, and I cannot say what would happen if the system is already over 100% and there is an attempt to change the config.
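A hedged sketch of what changing that setting could look like, assuming the `Setting` object is overridden through a top-level `value` field (that field name is an assumption here; verify it against your Harvester version before applying anything):

```yaml
# Illustrative only: bring the Harvester storage overcommit down to 100% so it
# matches a Longhorn Storage Over Provisioning Percentage of 100.
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: overcommit-config
value: '{"cpu":1600,"memory":150,"storage":100}'
```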
I discussed this with @james-munson offline and took a look at the support bundle. The bundle can't be posted here, but using the various generic object names, for one of the overscheduled disks:
@PhanLe1010 and I (and probably others) discussed this while I was working on #8043, but we need a follow-up ticket for it. That issue was more specific, but in general, we think Longhorn is vulnerable to accidental overscheduling if it is scheduling replicas for multiple volumes simultaneously. The general flow of the replica scheduling is:
Now, imagine two different volume controllers are scheduling for two different volumes at the same time.
In summary, I believe this happened as a result of many volumes being created simultaneously. We need to improve Longhorn replica scheduling to ensure it cannot happen.
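The check-then-act race described above can be sketched as follows. This is a minimal illustrative model, not Longhorn's actual scheduler code; the disk bookkeeping, controller names, and sizes are all made up:

```python
# Minimal model of the scheduling race: two volume controllers each check a
# disk's remaining space, then reserve it. Neither sees the other's pending
# reservation, so the disk ends up over-scheduled.

def has_room(disk, size):
    # Step 1 (check): would this replica fit, given what is scheduled so far?
    return disk["scheduled"] + size <= disk["capacity"]

def reserve(disk, size):
    # Step 2 (act): record the reservation on the disk.
    disk["scheduled"] += size

disk = {"capacity": 100, "scheduled": 0}

# The problematic interleaving: both controllers check before either acts.
controller_a_ok = has_room(disk, 60)  # True: 0 + 60 <= 100
controller_b_ok = has_room(disk, 60)  # True: still sees scheduled == 0
if controller_a_ok:
    reserve(disk, 60)
if controller_b_ok:
    reserve(disk, 60)

print(disk["scheduled"])  # 120, which exceeds the capacity of 100
```

The fix, in whatever form it takes, amounts to making the check and the reservation atomic with respect to other schedulers (for example, by serializing scheduling decisions).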
We can probably just use this ticket to track the need for an enhancement. When we discussed it previously, the vulnerability was somewhat theoretical. This appears to be a textbook example of it actually occurring.
**Workaround to avoid the issue**

This issue seems to be quite rare. It is probably because:
I am not sure why all the volumes were created simultaneously in this ticket. Perhaps there was some other factor at work. If your workflow involves creating many volumes simultaneously, it may be best to intentionally slow it down a bit until a fix is implemented (e.g. create one volume, wait a second, create another volume).

**Workaround if you have hit the issue**

It should be possible to evict individual replicas from the overscheduled disks. Longhorn will find a different disk and move the data. This can be done while the workload is running.
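For reference, a disk eviction can be requested by editing the Longhorn `Node` custom resource for the affected node. The excerpt below is a hedged sketch: the node name comes from this issue, but the disk key is a placeholder, and the field names should be checked against the Longhorn Node CRD for your version:

```yaml
# Illustrative excerpt of a nodes.longhorn.io object after requesting eviction
# for one disk. Disk key "disk-example" is a placeholder.
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: harvester-node-2
  namespace: longhorn-system
spec:
  disks:
    disk-example:
      allowScheduling: false   # stop new replicas from landing on this disk
      evictionRequested: true  # ask Longhorn to rebuild its replicas elsewhere
```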
The ticket tracking a general solution (picking a leader from all longhorn-managers to avoid races and conflicts) is #5571.
Thank you for the detailed response and workaround. We are creating all of our machines and disks in one go with the Harvester Terraform provider; we'll add a pause somewhere to help avoid this issue. Marking the disk as unschedulable and using eviction allowed us to rebalance things. I'm not clear on how the Harvester overcommit settings factor into this. In Slack, Connor Kuehl said:
Is it the case that our disks became over-provisioned due to the Harvester overcommit settings, the bug you have described, or both?
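For anyone adding a pause between Terraform-created volumes, one hedged sketch uses the hashicorp/time provider's `time_sleep` resource. The resource name, duration, and wiring here are illustrative assumptions, not something from this thread:

```hcl
# Illustrative only: stagger volume creation by inserting a delay that later
# resources can depend on via depends_on.
resource "time_sleep" "stagger_volumes" {
  create_duration = "2s"
}
```

How the delay is chained between your machine and disk resources depends on your actual module layout.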
This is a good question. My current belief is that:
This is because the support bundle clearly shows this. I will ask Connor to take a look at the first part and help decide if it is a Harvester bug.
This bug allows Longhorn to over-schedule some disks where it clearly might have been possible to consider other disks. If the Harvester value of 200% were used, Longhorn might still have scheduled the same way, but would not have reported the disks as over-scheduled later on. Whether they actually are over-committed depends on how much data is written to the volumes over time. I think we still have work to do to ensure that the Longhorn and Harvester settings are in step.
cc @derekbit
Describe the bug
With Harvester 1.3.0 and Longhorn 1.6.0, we have observed several disks becoming over provisioned despite Storage Over Provisioning Percentage being set to 100.
To Reproduce
Theoretical steps (we haven't reproduced this):
Expected behavior
Disks do not become over provisioned, and VMs fail to schedule if there isn't enough storage.
Support bundle for troubleshooting
Please do not post any URLs or VM details from the bundle to this issue.
Sent to longhorn-support-bundle@suse.com.
Environment
harvester-node-2 disk /var/lib/harvester/extra-disks/0e75b3ff4813c3cae0f71a1e9f3ac893
harvester-node-5 disk /var/lib/harvester/extra-disks/89ade731face5f52e750ade464ca09bc