
vSphere CSI: Storage DRS support #686

Open
yuga711 opened this issue Mar 4, 2021 · 64 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@yuga711

yuga711 commented Mar 4, 2021

The vSphere CSI driver page states that the vSphere CSI driver and Cloud Native Storage in vSphere do not currently support the Storage DRS feature in vSphere. Since the CSI driver already creates FCDs, can you please explain what the limitation is in supporting the Storage DRS feature? Also, is this feature in the CSI driver roadmap? Thanks!

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug
/kind feature

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
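
For background on the question above: today the vSphere CSI driver targets provisioning at a single datastore, or at datastores matching an SPBM storage policy, through StorageClass parameters; it cannot be pointed at an SDRS datastore cluster. A minimal sketch of the single-datastore case, with the datastore URL as a placeholder:

  # Sketch only: the datastore URL below is a placeholder, not a real value.
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: vsphere-block-single-datastore
  provisioner: csi.vsphere.vmware.com
  parameters:
    # Pins provisioning to one specific datastore; an SDRS datastore
    # cluster cannot be named here.
    datastoreurl: "ds:///vmfs/volumes/datastore-uuid-placeholder/"
    csi.storage.k8s.io/fstype: ext4
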
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 4, 2021
@yuga711
Author

yuga711 commented Mar 4, 2021

cc @divyenpatel @msau42 @jingxu97

@divyenpatel
Member

@yuga711

Since the CSI driver already creates FCDs, can you please explain what the limitation is in supporting the Storage DRS feature?

FCD does not support SDRS yet.

Also, is this feature in the CSI driver roadmap?

Yes, we have SDRS support in our roadmap.

@SandeepPissay
Contributor

@yuga711 Can you explain your use case or your customers' use cases for SDRS support? SDRS does storage load balancing between datastores based on performance, latency, capacity usage, etc. It also helps in moving storage objects (currently only VMs) out of a datastore when you put it into maintenance mode. We have commonly heard of the datastore maintenance mode and capacity load balancing use cases. So having some details on the immediate problems to be solved will help prioritize the feature. Please reach out to me in Kubernetes Slack (#provider-vsphere) if you would like to share some details.

@SandeepPissay SandeepPissay self-assigned this Mar 15, 2021
@limb-adobe

@SandeepPissay our immediate need for SDRS support falls under the common requests you have already stated: datastore maintenance mode and capacity load balancing.

@SandeepPissay
Contributor

@limb-adobe what version of vSphere are you using? I'm wondering whether you are fine with upgrading vSphere or whether you are looking for capacity load balancing and datastore maintenance mode support for already-released vSphere versions.

@tgelter

tgelter commented Mar 15, 2021

@limb-adobe what version of vSphere are you using? I'm wondering whether you are fine with upgrading vSphere or whether you are looking for capacity load balancing and datastore maintenance mode support for already-released vSphere versions.

@SandeepPissay, @limb-adobe is my colleague, so I can field this question for now.
We are using vCenter Server 7.0 Update 1c. Landon would need to provide guidance about future version timelines. I think we'd be open to it, but we would likely prefer support for this version of vCenter since we're currently in the process of globally migrating off of 6.7.

@yuga711
Author

yuga711 commented Mar 15, 2021

@yuga711 Can you explain your use case or your customers' use cases for SDRS support? SDRS does storage load balancing between datastores based on performance, latency, capacity usage, etc. It also helps in moving storage objects (currently only VMs) out of a datastore when you put it into maintenance mode. We have commonly heard of the datastore maintenance mode and capacity load balancing use cases. So having some details on the immediate problems to be solved will help prioritize the feature. Please reach out to me in Kubernetes Slack (#provider-vsphere) if you would like to share some details.

Our customers are looking for the common SDRS use cases stated here: capacity and IO load balancing, and storage-object migration.

@SandeepPissay
Contributor

@yuga711 Let's say we cannot support SDRS in the near future. Is there any preference on which use cases are more important to your customers:

  1. Datastore maintenance mode to relocate volumes (and VMs) out of the datastore. This should balance the capacity usage on the other available datastores.
  2. Capacity load balancing during provisioning operations so that the datastores are balanced on usage.
  3. IO load balancing between datastores, dynamically moving volumes to balance IO performance.

IMHO (3) will mean we need SDRS support, which may take a long time.

How would you (or your customers) prioritize these? A few more questions that come to mind:

  1. How many of your customers are asking for these features?
  2. Which version of vSphere are they using?
  3. When do they need this feature, and are they open to upgrading vSphere?
  4. We can probably enhance CSI to address datastore maintenance and capacity load balancing sooner. Would that be a better short-term solution for your customers?
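
For reference, one interim way to give CNS a choice of placement at provisioning time, short of full SDRS, is to reference an SPBM storage policy that matches several compatible datastores instead of pinning a single datastore URL. A hypothetical sketch; the policy name is assumed, and this does not provide ongoing capacity or IO rebalancing:

  # Sketch only: "k8s-vmfs-gold" is an assumed SPBM policy that matches
  # several compatible VMFS datastores.
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: vsphere-block-policy
  provisioner: csi.vsphere.vmware.com
  allowVolumeExpansion: true
  parameters:
    # CNS places each new volume on one of the policy-compatible datastores;
    # existing volumes are not moved afterwards.
    storagepolicyname: "k8s-vmfs-gold"
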

@mitchellmaler

We are working on upgrading to 7.0u2, are investigating using the vSphere CSI driver for our vanilla Kubernetes clusters, and are excited for the upcoming online resize support.

We are mostly looking for capacity and IO load balancing support, and we would pick capacity/space load balancing as a good first feature to support for SDRS.

@SandeepPissay
Contributor

@mitchellmaler can you answer these questions:

  1. How many datastores are present in the vCenter inventory?
  2. Are they all VMFS datastores? I'm wondering if you are looking at capacity load balancing for the same datastore type or between different datastore types like VMFS to NFS, VVOL to VMFS, VSAN to VMFS, etc.
  3. Which version of vSphere do you have?
  4. How many vSphere clusters do you have? Are you looking for capacity load balancing between datastores across datacenters?

@mitchellmaler

mitchellmaler commented Apr 7, 2021

@SandeepPissay

  1. The current SDRS datastore clusters that we will be working with contain around 15 to 20 datastores.
  2. They are all VMFS. We are only looking for the same type as we do not use the other types much.
  3. We are currently on 6.7u3 but looking to upgrade to 7 soon.
  4. We have multiple clusters per datacenter but only looking to load balance within the same datacenter and not across.

@tgelter

tgelter commented Apr 7, 2021

@SandeepPissay

  1. The current SDRS datastore clusters that we will be working with contain around 15 to 20 datastores.
  2. They are all VMFS. We are only looking for the same type as we do not use the other types much.
  3. We are currently on 6.7u3 but looking to upgrade to 7 soon.
  4. We have multiple clusters per datacenter but only looking to load balance within the same datacenter and not across.

FWIW our configuration looks almost identical to this. We use a smaller number of datastores at present, but all are VMFS. Also, we're using vSphere 7.0u2.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 6, 2021
@vrabbi

vrabbi commented Jul 20, 2021

Any updates on this?

@SandeepPissay
Contributor

@vrabbi we are tracking this ask in our internal backlog. This seems to be a large work item, so I do not have visibility into when this feature will be released.

@vrabbi

vrabbi commented Jul 20, 2021

@SandeepPissay thanks for the quick reply. This would be huge for many of our customers. If there is any information that would be helpful to get from me, please let me know. I get that this is a large work item and completely understand it may take some time. If there is anything we can do to help add context, use cases, references, etc. in order to help push this forward, I would be glad to help with that.

@SandeepPissay
Contributor

If there is anything we can do to help add context, use cases, references, etc. in order to help push this forward, I would be glad to help with that.

@vrabbi yes, this will be super useful! Could you send this info in an email to me? My email is ssrinivas@vmware.com. Thanks!

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 19, 2021
@tgelter

tgelter commented Aug 19, 2021

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 19, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 17, 2021
@tgelter

tgelter commented Nov 17, 2021

/remove-lifecycle rotten

@McAndersDK

Any news on this? And how do we recover when CNS tells us it can't find the VMDK after it has been moved by Storage DRS?

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@MarkMorsing

@vrabbi That's great, but it should really be a top priority to get it integrated into the CSI driver. It'd boost stability and usability quite a bit.

@jingxu97

jingxu97 commented Feb 2, 2022

I am using vCenter 6.7u3, and it works.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 3, 2022
@tgelter

tgelter commented May 3, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 1, 2022
@tgelter

tgelter commented Aug 1, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 30, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 29, 2022
@tgelter

tgelter commented Nov 30, 2022

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 30, 2022
@tgelter

tgelter commented Nov 30, 2022

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 30, 2022
@MarkMorsing

@vrabbi How far are we from adding this feature to the CSI driver?

@MarkMorsing

MarkMorsing commented Jan 3, 2023

Any updates so far?

@bsdnet

bsdnet commented Jan 4, 2023

Any updates so far?

+1

@ignisanulum

@vrabbi How far are we from adding this feature to the CSI driver?
+1

@vrabbi

vrabbi commented Mar 2, 2023

I don't work on the CSI driver, so I don't know.

@MarkMorsing

@SandeepPissay what's the status on getting SDRS support added to the driver?

@luanaBanana

Any updates so far?

+1

@SandeepPissay
Contributor

SandeepPissay commented Mar 4, 2024

Supporting SDRS necessitates collaboration across multiple teams to integrate it into vSphere. Currently, there hasn't been significant progress in enabling SDRS support for CNS/CSI in vSphere due to the absence of a compelling business case. We acknowledge that this is frequently requested, but the lack of a clear business case makes it difficult to advance the initiative. To help prioritize this work, please contact the VMware team or me via email (sandeep.pissay-srinivasa-rao@broadcom.com).

@MarkMorsing

Hello @SandeepPissay
What's your e-mail? We face this issue on a near-daily basis, and our only fix is to recreate PVCs, which isn't really a great solution.

@SandeepPissay
Contributor

Hello @SandeepPissay What's your e-mail? We face this issue on a near-daily basis, and our only fix is to recreate PVCs, which isn't really a great solution.

I assumed that everyone could see my email on my GitHub profile; maybe it is hidden? Anyway, I added my email to my previous message.

@Jeremy-Boyle

+1, I would like to see this as well. I also have a business/customer use case for this, @SandeepPissay; I'll go through the proper Broadcom channels.
