NetApp DataOps Toolkit for Kubernetes

The NetApp DataOps Toolkit for Kubernetes is a Python library that makes it simple for developers, data scientists, DevOps engineers, and data engineers to perform data management tasks within a Kubernetes cluster. Key capabilities include provisioning a new persistent volume or data science workspace, cloning a volume or workspace almost instantaneously, saving a snapshot of a volume or workspace almost instantaneously for traceability/baselining, and moving data between S3-compatible object storage and a Kubernetes persistent volume.

Compatibility

The NetApp DataOps Toolkit for Kubernetes supports Linux and macOS hosts.

The toolkit must be used in conjunction with a Kubernetes cluster. Additionally, Trident, NetApp's dynamic storage orchestrator for Kubernetes, and/or the BeeGFS CSI driver must be installed within the Kubernetes cluster. The toolkit simplifies data management tasks that are actually executed by a NetApp-maintained CSI driver. To do so, the toolkit communicates with the appropriate driver via the Kubernetes API.

The toolkit is currently compatible with Kubernetes versions 1.20 and above, and OpenShift versions 4.7 and above.

The toolkit is currently compatible with Trident versions 20.07 and above. Additionally, the toolkit is compatible with the following Trident backend types:

  • ontap-nas
  • ontap-nas-flexgroup
  • gcp-cvs
  • azure-netapp-files

The toolkit is currently compatible with all versions of the BeeGFS CSI driver, though not all functionality is supported by BeeGFS. Operations that are not supported by BeeGFS are noted within the documentation.
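The version and backend requirements above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of the toolkit: the names `parse_version` and `is_compatible` are hypothetical, and the version strings are assumed to be supplied by the caller (for example, copied from `kubectl version` output) rather than queried from the cluster.

```python
# Minimum versions and backend types taken from the compatibility notes above.
MIN_KUBERNETES = (1, 20)
MIN_TRIDENT = (20, 7)
SUPPORTED_TRIDENT_BACKENDS = {
    "ontap-nas",
    "ontap-nas-flexgroup",
    "gcp-cvs",
    "azure-netapp-files",
}


def parse_version(text):
    """Turn a string such as 'v1.27.3' or '20.07' into a (major, minor) tuple."""
    parts = text.strip().lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def is_compatible(kubernetes, trident, backend):
    """Hypothetical helper: True when all three values meet the documented requirements."""
    return (
        parse_version(kubernetes) >= MIN_KUBERNETES
        and parse_version(trident) >= MIN_TRIDENT
        and backend in SUPPORTED_TRIDENT_BACKENDS
    )
```

This sketch only covers the Trident case; clusters that use the BeeGFS CSI driver would need a separate check, since the toolkit accepts all versions of that driver.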

Installation

Prerequisites

The NetApp DataOps Toolkit for Kubernetes requires that Python 3.8, 3.9, 3.10, or 3.11 be installed on the local host. Additionally, the toolkit requires that pip for Python3 be installed on the local host. For more details regarding pip, including installation instructions, refer to the pip documentation.
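One way to confirm that the local interpreter is one of the versions listed above is a short standard-library check. This is an illustrative sketch; `is_supported_python` is a hypothetical helper name and is not provided by the toolkit.

```python
import sys

# Interpreter versions listed in the prerequisites above.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_supported_python(version_info=None):
    """Hypothetical helper: True when (major, minor) is one of the listed versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    current = "%d.%d" % sys.version_info[:2]
    if is_supported_python():
        print("Python %s is in the supported range." % current)
    else:
        print("Python %s is not listed as supported." % current)
```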

Installation Instructions

To install the NetApp DataOps Toolkit for Kubernetes, run the following command.

python3 -m pip install netapp-dataops-k8s

Getting Started: Standard Usage

The NetApp DataOps Toolkit for Kubernetes can be utilized from any Linux or macOS host that has network access to the Kubernetes cluster.

The toolkit requires that a valid kubeconfig file be present on the local host, located at $HOME/.kube/config or at another path specified by the KUBECONFIG environment variable. Refer to the Kubernetes documentation for more information regarding kubeconfig files.
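The lookup order described above (the KUBECONFIG environment variable first, then $HOME/.kube/config) can be sketched as follows. This is an assumption-laden illustration of the rule, not the toolkit's own code; `resolve_kubeconfig_path` is a hypothetical name, and the handling of multi-path KUBECONFIG values is deliberately simplified.

```python
import os


def resolve_kubeconfig_path(environ=None):
    """Hypothetical helper: return the kubeconfig path described above.

    Uses KUBECONFIG when it is set and non-empty, otherwise falls back
    to $HOME/.kube/config.
    """
    if environ is None:
        environ = os.environ
    explicit = environ.get("KUBECONFIG", "")
    if explicit:
        # Simplification: KUBECONFIG may hold several paths joined by
        # os.pathsep; this sketch only looks at the first one.
        return explicit.split(os.pathsep)[0]
    home = environ.get("HOME") or os.path.expanduser("~")
    return os.path.join(home, ".kube", "config")
```

Checking that the returned path exists before running toolkit operations can make a missing or misplaced kubeconfig easier to diagnose.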

Getting Started: In-cluster Usage (for advanced Kubernetes users)

The NetApp DataOps Toolkit for Kubernetes can also be utilized from within a pod that is running in the Kubernetes cluster. If the toolkit is being utilized within a pod in the Kubernetes cluster, then the pod's ServiceAccount must have the following permissions:

- apiGroups: [""]
  resources: ["persistentvolumeclaims", "persistentvolumeclaims/status", "services"]
  verbs: ["get", "list", "create", "delete"]
- apiGroups: ["snapshot.storage.k8s.io"]
  resources: ["volumesnapshots", "volumesnapshots/status", "volumesnapshotcontents", "volumesnapshotcontents/status"]
  verbs: ["get", "list", "create", "delete"]
- apiGroups: ["apps", "extensions"]
  resources: ["deployments", "deployments/scale", "deployments/status"]
  verbs: ["get", "list", "create", "delete", "patch", "update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list"]
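To compare an existing Role or ClusterRole against the rules above, the rules can be transcribed as plain Python data and checked one permission at a time. This is a minimal sketch under stated assumptions: `rule_allows` and `missing_permissions` are hypothetical helpers, the granted rules are assumed to be already loaded as a list of dictionaries, and wildcard entries such as "*" are not interpreted.

```python
# The rules above, transcribed as Python data.
REQUIRED_RULES = [
    {
        "apiGroups": [""],
        "resources": ["persistentvolumeclaims", "persistentvolumeclaims/status", "services"],
        "verbs": ["get", "list", "create", "delete"],
    },
    {
        "apiGroups": ["snapshot.storage.k8s.io"],
        "resources": ["volumesnapshots", "volumesnapshots/status",
                      "volumesnapshotcontents", "volumesnapshotcontents/status"],
        "verbs": ["get", "list", "create", "delete"],
    },
    {
        "apiGroups": ["apps", "extensions"],
        "resources": ["deployments", "deployments/scale", "deployments/status"],
        "verbs": ["get", "list", "create", "delete", "patch", "update"],
    },
    {"apiGroups": [""], "resources": ["nodes"], "verbs": ["get", "list"]},
]


def rule_allows(rule, api_group, resource, verb):
    """True when a single rule grants the verb on the resource (no wildcard handling)."""
    return (
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
    )


def missing_permissions(granted_rules, required_rules=REQUIRED_RULES):
    """List the (apiGroup, resource, verb) triples that are required but not granted."""
    missing = []
    for rule in required_rules:
        for group in rule["apiGroups"]:
            for resource in rule["resources"]:
                for verb in rule["verbs"]:
                    if not any(rule_allows(g, group, resource, verb) for g in granted_rules):
                        missing.append((group, resource, verb))
    return missing
```

An empty result means every required permission is covered by at least one granted rule.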

In the Examples directory, you will find the following examples pertaining to utilizing the toolkit within a pod in the Kubernetes cluster:

  • service-account-netapp-dataops.yaml: Manifest for a Kubernetes ServiceAccount named 'netapp-dataops' that has all of the required permissions for executing toolkit operations.
  • job-netapp-dataops.yaml: Manifest for a Kubernetes Job named 'netapp-dataops' that can be used as a template for executing toolkit operations.

Refer to the Kubernetes documentation for more information on accessing the Kubernetes API from within a pod.

Capabilities

The NetApp DataOps Toolkit for Kubernetes provides the following capabilities.

Workspace Management

The NetApp DataOps Toolkit can be used to manage data science workspaces within a Kubernetes cluster. Key capabilities include provisioning a new JupyterLab workspace, cloning a JupyterLab workspace almost instantaneously, and saving a snapshot of a JupyterLab workspace almost instantaneously for traceability/baselining.

Refer to the NetApp DataOps Toolkit for Kubernetes Workspace Management documentation for more details.

Volume Management

The NetApp DataOps Toolkit can be used to manage persistent volumes within a Kubernetes cluster. Key capabilities include provisioning a new persistent volume, cloning a persistent volume almost instantaneously, and saving a snapshot of a persistent volume almost instantaneously for traceability/baselining.

Refer to the NetApp DataOps Toolkit for Kubernetes Volume Management documentation for more details.

Data Movement

The NetApp DataOps Toolkit can move data between Kubernetes persistent volumes and external services. The data movement operations currently provided are for use with S3-compatible services.

Refer to the NetApp DataOps Toolkit for Kubernetes Data Movement documentation for more details.

NVIDIA Triton Inference Server Management

The NetApp DataOps Toolkit provides the ability to manage NVIDIA Triton Inference Server instances within a Kubernetes cluster.

Refer to the NetApp DataOps Toolkit for NVIDIA Triton Inference Server Management documentation for more details.

Tips and Tricks

Support

Report any issues via GitHub: https://github.com/NetApp/netapp-data-science-toolkit/issues.