
Support Data Repository Associations for Lustre 2.12 or newer filesystems (e.g. PERSISTENT_2 deployment type) #368

Open

everpeace wants to merge 9 commits into master

Conversation

@everpeace commented Dec 27, 2023

Is this a bug fix or adding new feature?

new feature
fixes #367

What is this PR about? / Why do we need it?

This PR adds support for Data Repository Associations (API reference) for Lustre 2.12 or newer filesystems (e.g. the PERSISTENT_2 deployment type), as shown below:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-0d7b5e117ad7b4961
  securityGroupIds: sg-05a37bfe01467059a
  deploymentType: PERSISTENT_2
  perUnitStorageThroughput: "125"
  # User can specify multiple data repository associations like this
  dataRepositoryAssociations: |
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-000
      fileSystemPath: /ml-training-data-000
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-001
      fileSystemPath: /ml-training-data-001
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]

  # NOTE: These parameters can't be set when using dataRepositoryAssociations,
  #       as the documentation explains: https://docs.aws.amazon.com/fsx/latest/APIReference/API_CreateFileSystemLustreConfiguration.html
  # s3ImportPath: s3://ml-training-data-000
  # s3ExportPath: s3://ml-training-data-000/export
  # autoImportPolicy: NEW_CHANGED
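
A filesystem with these associations is then provisioned dynamically through an ordinary claim. A minimal PersistentVolumeClaim sketch (the claim name and size are illustrative, not part of this PR):

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: fsx-claim             # hypothetical name, for illustration only
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc    # the StorageClass defined above
  resources:
    requests:
      storage: 1200Gi         # 1.2 TiB is the smallest FSx for Lustre capacity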

What testing is done?

make test

# with GINKGO_FOCUS=".*fsx-csi-e2e.*PERSISTENT_2.*" set
make test-e2e

@k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA.) Dec 27, 2023
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: everpeace
Once this PR has been reviewed and has the lgtm label, please assign olemarkus for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) Dec 27, 2023
@everpeace changed the title from "Support Data Repository Associations for PERSISTENT_2 deployment type filesystems" to "Support Data Repository Associations for Lustre 2.12 or newer filesystems (e.g. PERSISTENT_2 deployment type)" Dec 27, 2023
@everpeace (Author)

/retest pull-aws-fsx-csi-driver-e2e

@k8s-ci-robot (Contributor)

@everpeace: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test pull-aws-fsx-csi-driver-e2e
  • /test pull-aws-fsx-csi-driver-unit
  • /test pull-aws-fsx-csi-driver-verify

Use /test all to run all jobs.

In response to this:

/retest pull-aws-fsx-csi-driver-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@everpeace (Author)

/test pull-aws-fsx-csi-driver-e2e

@@ -62,6 +64,10 @@ var (
 // disks are found with the same volume name.
 ErrMultiFileSystems = errors.New("Multiple filesystems with same ID")

+// ErrMultiAssociations is an error that is returned when multiple
+// associations are found with the same volume name.
Contributor commented:

If I understand correctly, this would be if there are multiple DRAs with the same association id, not multiple associations with the same volume.

@everpeace (Author) commented Dec 28, 2023

Thanks. Fixed in bba7484.

@everpeace (Author) left a comment:

@jacobwolfaws Thank you for the quick review. I addressed your feedback. PTAL 🙇

@@ -65,6 +65,7 @@ controller:
   - effect: NoExecute
     operator: Exists
     tolerationSeconds: 300
+  provisionerTimeout: 5m
@everpeace (Author) commented Dec 28, 2023

Should we make the default provisioner timeout longer in the helm chart? It often takes more time to prepare an FSx filesystem when it has data repository associations.

A single FSx for Lustre filesystem can have up to 8 data repository associations.

In my experience, it usually takes around 7-10 minutes to make a single data repository association available, even for an empty S3 bucket.
Moreover, the data repository associations on a given filesystem appear to be set up sequentially.

So I think 90 min = 10 min × 8 (data repository associations) + 5 min (FSx filesystem) + a buffer would be safe, because the current CreateVolume operation is synchronous and is not safe when a timeout happens.

What do you think?

Contributor commented:

I think keeping the default timer the same and clearly documenting the need to change the timeout when using DRAs would be the correct move. This ensures consistent behavior for users who aren't using DRAs. Extending it is a one-way door (because reducing the timeout would break compatibility for users who are using a large number of DRAs).

@everpeace (Author) commented:

> Extending it is a one-way door (because reducing the timeout would break compatibility for users who are using a large number of DRAs)

It makes sense.

> the default timer the same + clearly documenting the need to change the timeout if using DRAs would be the correct move

OK, let me add the documentation.
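
For reference, once the chart exposes controller.provisionerTimeout, users provisioning filesystems with many DRAs can raise it at install time. A minimal sketch, assuming the chart's documented helm repository (the release name, namespace, and 90m value are illustrative):

$ helm repo add aws-fsx-csi-driver https://kubernetes-sigs.github.io/aws-fsx-csi-driver
$ helm upgrade --install aws-fsx-csi-driver aws-fsx-csi-driver/aws-fsx-csi-driver \
    --namespace kube-system \
    --set controller.provisionerTimeout=90m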

@everpeace (Author) commented Jan 18, 2024

Addressed in the commits below:

Contributor commented:

I'm not sure about the information for users; it seems like users using DRAs will still be fine in most cases:
https://github.com/kubernetes-csi/external-provisioner?tab=readme-ov-file
https://github.com/kubernetes-csi/external-provisioner?tab=readme-ov-file#csi-error-and-timeout-handling
The CreateVolume call will time out, and subsequent calls will be made with exponential backoff. It's only in the case of a large number of DRAs that this will be an issue.
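
For context, the chart's provisionerTimeout value presumably ends up as the external-provisioner sidecar's --timeout flag, which bounds each CreateVolume gRPC call before the backoff-and-retry described above. A sketch of the relevant sidecar args (the image tag and exact templating are assumptions, not confirmed by this PR):

  - name: csi-provisioner
    image: registry.k8s.io/sig-storage/csi-provisioner:v3.6.2  # illustrative tag
    args:
      - --csi-address=$(ADDRESS)
      - --timeout=5m   # assumed to be templated from controller.provisionerTimeout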

@everpeace (Author)

/test pull-aws-fsx-csi-driver-e2e

@@ -0,0 +1 @@
+CONTROLLER_PROVISIONER_TIMEOUT=5m
Contributor commented:

What's the value of creating a separate file for this vs. putting it in the values.yaml:
https://github.com/kubernetes-sigs/aws-fsx-csi-driver/blob/master/charts/aws-fsx-csi-driver/values.yaml#L42-L67

@everpeace (Author) commented Feb 28, 2024

This file is for the kustomize-only manifests; values.yaml is dedicated to the helm chart. I understand this driver supports both kustomize and helm.

In kustomize, injecting a parameter while building manifests requires a bit of a hack. This env file is needed so that kustomize users can change the timeout value. I also updated install.md as below:

https://github.com/everpeace/aws-fsx-csi-driver/blob/suppor-dra/docs/install.md#deploy-driver

# To set CSI controller's provisioner timeout,
# Please follow the instruction
$ cd $(mktemp -d)
$ kustomize init
$ kustomize edit add resource "github.com/kubernetes-sigs/aws-fsx-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.1"
$ kustomize edit add configmap fsx-csi-controller --from-literal=CONTROLLER_PROVISIONER_TIMEOUT=30m --behavior=merge
$ kubectl apply -k .
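
Those commands should leave behind a kustomization.yaml roughly like this (a sketch; generated field ordering may differ):

resources:
- github.com/kubernetes-sigs/aws-fsx-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.1
configMapGenerator:
- name: fsx-csi-controller
  behavior: merge
  literals:
  - CONTROLLER_PROVISIONER_TIMEOUT=30m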

Contributor commented:

I think we should avoid hacks when possible and this seems like an avoidable instance. If users want to configure their kustomize templates, they can download them, configure them, and deploy them freely. We should follow precedent in terms of implementation, which is to put it in the values.yaml.


 // target file system values
-PollCheckTimeout = 10 * time.Minute
+PollCheckTimeout = 15 * time.Minute
Contributor commented:

If provisionerTimeout < PollCheckTimeout, the provisionerTimeout will always kill the CreateVolume call before the PollCheckTimeout is hit. I don't think incrementing this should make a difference.

Labels
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Supporting (multiple) data repository association
3 participants