Commit dd2c03e by PhanLe1010, committed Feb 13, 2024 (1 parent: e64e70d)

Report for Reference Setup, Performance, Scalability, and Sizing Guidelines

longhorn-2598

Signed-off-by: Phan Le <phan.le@suse.com>

Showing 10 changed files with 183 additions and 0 deletions.

# Reference Setup, Performance, Scalability, and Sizing Guidelines

In this document, we present reference setups, performance, scalability, and sizing guidelines for the Longhorn storage system.
In practice, users deploy Longhorn on a vast array of different cluster specifications, making it impossible for us to test every potential setup.
Therefore, we select and test some typical environments for users' reference.
With these references, users can gain insight into how Longhorn would perform on a cluster with a similar specification.

## Public Cloud
1. [Medium Node Spec](./public-cloud/medium-node-spec.md)
1. [Big Node Spec](./public-cloud/big-node-spec.md)

## On-Prem
1. [Medium Node Spec](./on-prem/medium-node-spec.md)
1. [Big Node Spec](./on-prem/big-node-spec.md)
# Reference Setup, Performance, Scalability, and Sizing Guidelines: Public Cloud - Medium Node Spec

## Cluster Spec
Node spec:
* EC2 instance type: m5.2xlarge
  * 8 vCPUs, 32 GiB RAM
* Root disk:
  * Size: 50 GB
  * Type: EBS gp3
* OS: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
* Kernel version: 6.2.0-1017-aws

Network:
* Network bandwidth:
  * Baseline bandwidth: 2.5 Gbps
  * Burst bandwidth: up to 10 Gbps for short periods
  * Actual speed measured by [iperf](https://www.cyberciti.biz/faq/how-to-test-the-network-speedthroughput-between-two-linux-servers/): 4.79 Gbps
* Network latency: ~0.3 ms RTT measured with the ping command

Disk spec:
* We use a dedicated disk on each node for the Longhorn volume replicas (a sketch of the corresponding Longhorn node disk configuration follows this list).
We selected a typical EBS volume with average IOPS and bandwidth performance:
  * Single EC2 EBS gp3 volume
  * 1 TB
  * IOPS set to 8000
  * Throughput set to 500 MiB/s
  * Formatted with an ext4 filesystem
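
For reference, registering a dedicated disk like this with Longhorn typically means mounting it on the node and adding it to the Longhorn `Node` custom resource (or configuring it through the Longhorn UI). The sketch below assumes a hypothetical node name and mount path; it is an illustration, not the exact configuration used for this report.

```yaml
# Sketch only: node name, disk name, and mount path are placeholders.
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: worker-1                  # Kubernetes node name (placeholder)
  namespace: longhorn-system
spec:
  disks:
    dedicated-disk:               # arbitrary disk name
      path: /mnt/longhorn-disk    # mount point of the 1 TB gp3 EBS volume (placeholder)
      allowScheduling: true
      storageReserved: 0          # little reserve is needed on a dedicated disk
      tags: []
```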

Kubernetes spec:
* Kubernetes Version: v1.27.8+rke2r1
* CNI plugin: Calico
* Control plane nodes are separated from worker nodes

Longhorn config:
* Longhorn version: v1.6.0
* Settings (a configuration sketch follows this list):
  * Use a dedicated disk for Longhorn instead of the root disk
  * The number of replicas per volume is 3
  * Storage Minimal Available Percentage setting: 10%
    * Because we are using a dedicated disk, we do not need a big storage reserve, as mentioned in the best practices: https://longhorn.io/docs/1.6.0/best-practices/#minimal-available-storage-and-over-provisioning
  * Storage Over Provisioning Percentage setting: 110%
    * We plan to fill 15 GiB of each 20 GiB volume.
    If we schedule the maximum amount, 1100 GiB, the actual usage will be (15/20) * 1100 = 825 GiB. This leaves about 100 GiB for the 10% Storage Minimal Available Percentage setting plus some filesystem overhead for the volumes.
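
As a rough illustration of these settings (the names here are placeholders, not the exact manifests used for this report), the replica count is typically pinned in a StorageClass, while the two percentage settings are cluster-wide Longhorn settings configured through the Longhorn UI or the default settings ConfigMap:

```yaml
# Sketch only: a StorageClass that provisions Longhorn volumes with 3 replicas.
# Storage Minimal Available Percentage (10%) and Storage Over Provisioning
# Percentage (110%) are global Longhorn settings, not StorageClass parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3-replicas       # placeholder name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"       # minutes
```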

Additional components:
* We deployed [Rancher monitoring](https://ranchermanager.docs.rancher.com/integrations-in-rancher/monitoring-and-alerting), which is a downstream version of [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack).
Note that this monitoring system generally consumes a certain amount of CPU, memory, and disk space on the nodes. We deploy it with the following resources (a values sketch follows this list):
  * CPU request: 750m
  * CPU limit: 1000m
  * Memory request: 750Mi
  * Memory limit: 3000Mi
  * Data retention size: 25 GiB
* We deployed [Local Path Provisioner](https://github.com/rancher/local-path-provisioner) to measure the baseline storage performance of the local storage on each node.
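
For reference, these resource and retention settings roughly correspond to the following kube-prometheus-stack (and therefore Rancher monitoring) Helm values. Treat this as a sketch of where the knobs live rather than the exact values file used for this deployment:

```yaml
# Sketch only: Prometheus resource requests/limits and retention size expressed
# as Helm values for kube-prometheus-stack / rancher-monitoring.
prometheus:
  prometheusSpec:
    resources:
      requests:
        cpu: 750m
        memory: 750Mi
      limits:
        cpu: 1000m
        memory: 3000Mi
    retentionSize: 25GiB
```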

## Workload design
We use [Kbench](https://github.com/yasker/kbench), a tool for testing Kubernetes storage.
Kbench deploys a pod with a Longhorn volume attached.
It then runs various `fio` jobs (specified by the user) to test multiple performance aspects of the volume (IOPS, bandwidth, latency).
The pod runs the tests repeatedly and exposes the results as Prometheus metrics.
We then collect and visualize the data using Grafana.

Traditionally, Kbench deploys only a single pod to test a single volume.
However, in this report, we gradually scale up the number of Kbench pods.
Since each pod stress-tests its own Longhorn volume, increasing the number of Kbench pods
simulates a cluster in which many pods are doing IO aggressively.
From there, we can see the performance characteristics of the Longhorn storage system as more and more IO-intensive workloads are added.

Although the IO stress tests help us discover the IO performance boundary of the Longhorn system, in practice it is uncommon for all workloads in a cluster to generate stress IO constantly.
Therefore, besides the tests with many IO-intensive workloads in the cluster, we also perform tests in which the Kbench pods are rate-limited.
From the rate-limited tests, we can answer questions such as:
1. The maximum/recommended number of Longhorn volumes in the cluster if each volume is doing a certain amount of IO
1. The maximum/recommended number of Longhorn volumes attached to a node if each volume is doing a certain amount of IO
1. The maximum/recommended number of Longhorn replicas on a node if the volume of each replica is doing a certain amount of IO


Note that in this report, we distribute the load evenly across the worker nodes:

1. Each worker node has a roughly similar number of Kbench pods
2. Each worker node has a roughly similar number of Longhorn replicas

This setup optimizes the efficiency of the cluster and the Longhorn system. In practice, this balance is also what users usually strive for. One possible way to deploy the test pods with this even spread is sketched below.
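
The sketch below illustrates one way to achieve this layout: a StatefulSet whose replica count is scaled up step by step, with a `volumeClaimTemplates` entry giving each pod its own Longhorn PVC and a topology spread constraint keeping the pods balanced across worker nodes. This is an illustration under those assumptions, not necessarily how the pods in this report were deployed; the image reference, names, and mount path are placeholders, and the `MODE` and `SIZE` values are the ones used later for the random read tests.

```yaml
# Sketch only: each replica gets its own 20Gi Longhorn volume and runs the fio
# job against it; scale .spec.replicas (3 -> 6 -> 9 -> ...) to add more IO-intensive pods.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kbench
spec:
  serviceName: kbench                           # headless Service name (not shown)
  replicas: 3
  selector:
    matchLabels:
      app: kbench
  template:
    metadata:
      labels:
        app: kbench
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread pods evenly across worker nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: kbench
      containers:
        - name: kbench
          image: yasker/kbench:latest           # placeholder image reference
          env:
            - name: MODE
              value: random-read-iops           # fio job to run
            - name: SIZE
              value: 15G                        # fio test size
          volumeMounts:
            - name: vol
              mountPath: /volume                # placeholder mount path
  volumeClaimTemplates:
    - metadata:
        name: vol
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn              # placeholder StorageClass name
        resources:
          requests:
            storage: 20Gi
```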

## Read Performance

### Random Read IOPS - Stress Tests
The first performance metric we look at is random read IOPS.
This metric is usually important for applications such as databases, virtualization, and online transaction processing (OLTP) systems.

#### 1 control plane node + 3 worker nodes

We start with a cluster that has 1 control plane node and 3 worker nodes.

First, we compare a single RWO Longhorn PVC against a single Local Path Provisioner volume.
We deploy one Kbench pod that attaches a Local Path Provisioner PVC, with the following Kbench parameters:
* `MODE: random-read-iops`: specifies that we are running the random read IOPS job. # TODO add the link to the job here
* `SIZE: 15G`: the `fio` test size is 15G, which avoids cache effects.
* The PVC size is 20G.

Then we delete the above Kbench pod and PVC and repeat the test with a Longhorn PVC instead. A sketch of the PVC used for this comparison follows.
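
As a rough sketch (names are placeholders, not the exact manifests used here), the only difference between the two runs is the `storageClassName` of the PVC that the Kbench pod attaches; the pod itself uses the `MODE` and `SIZE` parameters listed above, similar to the container spec in the sketch under Workload design:

```yaml
# Sketch only: the PVC used by the single Kbench pod. For the baseline run the
# PVC uses the Local Path Provisioner StorageClass; for the Longhorn run only
# storageClassName changes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kbench-pvc                 # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path     # swap to the Longhorn StorageClass for the Longhorn test
  resources:
    requests:
      storage: 20Gi                # 20G PVC as described above
```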

> Result:
> * Local Path Provisioner: 8000 random read IOPS
> * Longhorn: 18500 random read IOPS
![single-pod-localpath-vs-longhorn.png](assets/single-pod-localpath-vs-longhorn.png)

**Comment**:
* Because the Longhorn volume has 3 replicas, it can read from all 3 replicas concurrently and thus may deliver better read performance

Next, we use Kbench with Longhorn PVCs and scale up the number of Kbench pods to see how random read IOPS is affected as more and more IO-intensive pods are added.

Scaling the workload from 1 to 3 pods.
> Result:
> * Each Kbench pod is able to achieve 8500 random read IOPS on its Longhorn volume
> * The total random read IOPS achieved by all 3 Longhorn volumes is 25500
![random-read-iops-1-3.png](assets/random-read-iops-1-3.png)

**Comment**:
* Each EBS volume on the host is provisioned with 8000 IOPS, with roughly a 10% margin, so the total IOPS of the 3 EBS volumes is around 24000
* It looks like the Longhorn system is able to reach the maximum IOPS capacity of the 3 host EBS volumes.

Scaling the workload from 3 to 6 pods.
> Result:
> * Each Kbench pod is able to achieve 4200 random read IOPS on its Longhorn volume
> * The total random read IOPS achieved by all 6 Longhorn volumes is 25200
![random-read-iops-3-6.png](assets/random-read-iops-3-6.png)

**Comment**:
* We can see that the total random read IOPS across all Longhorn volumes remains relatively the same while the average IOPS per volume decreases.

Scaling the workload from 6 to 9 pods, then 9 to 12, then 12 to 15.
> Result:
> * At 9 pods, the average random read IOPS per Longhorn volume is 2836. The total random read IOPS is 25524
> * At 12 pods, the average random read IOPS per Longhorn volume is 2836. The total random read IOPS is 25632
> * At 15 pods, the average random read IOPS per Longhorn volume is 1700. The total random read IOPS is 25500
![random-read-iops-6-9-12-15.png](assets/random-read-iops-6-9-12-15.png)

**Comment**:
* From the scaling tests so far, we can see that the total random read IOPS of all Longhorn volumes remains roughly the same, around 25000, as the number of Kbench pods increases.
If we call the average random read IOPS each volume can achieve (x) and the number of volumes (y), they form a reciprocal function: x * y = 25000.
Users can use this information to make predictions for this cluster:
  * The upper bound that the Longhorn system can achieve in this cluster is about 25000 random read IOPS
  * If each of your workload pods does 1000 random read IOPS on average, you can run an estimated 25 pods
  * If you keep scaling up the number of pods, this reciprocal relation (x * y = 25000) might eventually no longer hold as CPU contention and other factors kick in (i.e., x * y will gradually decrease)
* The bottleneck in this cluster seems to be the IOPS performance of the EBS volumes on the hosts rather than CPU, memory, or network bandwidth.

#### 1 control plane node + 6 worker nodes
We double the number of worker nodes (from 3 to 6) and double the number of Kbench pods (from 15 to 30).

> Result:
> * The average random read IOPS per Longhorn volume stays about the same, at 1677
> * The total random read IOPS achieved by all Longhorn volumes doubles to 50310
![random-read-iops-with-6-nodes-30-pods.png](assets/random-read-iops-with-6-nodes-30-pods.png)

**Comment**:
* Since the load is evenly distributed, we can see a linear relationship between the total random read IOPS and the number of nodes: when the number of nodes is doubled, the total random read IOPS doubles
* From this reference, users can estimate how many worker nodes with this spec they need to achieve their target total random read IOPS

### Random Read IOPS - Rate-Limited Tests

## Volume max size

## Backup/Restore speed with AWS S3

