Commit dd2c03e by PhanLe1010, committed Feb 13, 2024 (1 parent: e64e70d)

Report for Reference Setup, Performance, Scalability, and Sizing Guidelines

longhorn-2598

Signed-off-by: Phan Le <phan.le@suse.com>

Showing 10 changed files with 183 additions and 0 deletions.

# Reference Setup, Performance, Scalability, and Sizing Guidelines

In this document, we present reference setups, performance, scalability, and sizing guidelines for the Longhorn storage system.
In practice, users deploy Longhorn on a vast array of different cluster specifications, making it impossible for us to test every potential setup.
Therefore, we select and test some typical environments for users' reference.
With these references, users can gain insight into how Longhorn would perform on a cluster with a similar specification.

## Public Cloud
1. [Medium Node Spec](./public-cloud/medium-node-spec.md)
1. [Big Node Spec](./public-cloud/big-node-spec.md)

## On-Prem
1. [Medium Node Spec](./on-prem/medium-node-spec.md)
1. [Big Node Spec](./on-prem/big-node-spec.md)
# Reference Setup, Performance, Scalability, and Sizing Guidelines: Public Cloud - Medium Node Spec

## Cluster Spec
Node spec:
* EC2 instance type: m5.2xlarge
  * 8 vCPUs, 32 GiB RAM
* Root disk:
  * Size: 50 GB
  * Type: EBS gp3
* OS: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
* Kernel version: 6.2.0-1017-aws

Network:
* Network bandwidth:
  * Baseline bandwidth: 2.5 Gbps
  * Burst bandwidth: up to 10 Gbps for short periods
  * Actual speed measured by [iperf](https://www.cyberciti.biz/faq/how-to-test-the-network-speedthroughput-between-two-linux-servers/): 4.79 Gbps
* Network latency: ~0.3 ms RTT measured with the ping command

Disk spec:
* We use a dedicated disk on each node for the Longhorn volume replicas (a sketch of the corresponding Longhorn node disk configuration follows this list).
We selected a typical EBS volume with average IOPS and bandwidth performance:
  * Single EC2 EBS gp3 volume
  * 1 TB
  * IOPS set to 8000
  * Throughput set to 500 MiB/s
  * Formatted with an ext4 filesystem
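
For reference, registering a dedicated disk like this with Longhorn typically means mounting it on the node and adding it to the Longhorn `Node` custom resource (or configuring it through the Longhorn UI). The sketch below assumes a hypothetical node name and mount path; it is an illustration, not the exact configuration used for this report.

```yaml
# Sketch only: node name, disk name, and mount path are placeholders.
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: worker-1                  # Kubernetes node name (placeholder)
  namespace: longhorn-system
spec:
  disks:
    dedicated-disk:               # arbitrary disk name
      path: /mnt/longhorn-disk    # mount point of the 1 TB gp3 EBS volume (placeholder)
      allowScheduling: true
      storageReserved: 0          # little reserve is needed on a dedicated disk
      tags: []
```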

Kubernetes spec:
* Kubernetes Version: v1.27.8+rke2r1
* CNI plugin: Calico
* Control plane nodes are separated from worker nodes

Longhorn config:
* Longhorn version: v1.6.0
* Settings (a configuration sketch follows this list):
  * Use a dedicated disk for Longhorn instead of the root disk
  * The number of replicas per volume is 3
  * Storage Minimal Available Percentage setting: 10%
    * Because we are using a dedicated disk, we do not need a big storage reserve, as mentioned in the best practices: https://longhorn.io/docs/1.6.0/best-practices/#minimal-available-storage-and-over-provisioning
  * Storage Over Provisioning Percentage setting: 110%
    * We plan to fill 15 GiB of each 20 GiB volume.
    If we schedule the maximum amount, 1100 GiB, the actual usage will be (15/20) * 1100 = 825 GiB. This leaves about 100 GiB for the 10% Storage Minimal Available Percentage setting plus some filesystem overhead for the volumes.
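
As a rough illustration of these settings (the names here are placeholders, not the exact manifests used for this report), the replica count is typically pinned in a StorageClass, while the two percentage settings are cluster-wide Longhorn settings configured through the Longhorn UI or the default settings ConfigMap:

```yaml
# Sketch only: a StorageClass that provisions Longhorn volumes with 3 replicas.
# Storage Minimal Available Percentage (10%) and Storage Over Provisioning
# Percentage (110%) are global Longhorn settings, not StorageClass parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3-replicas       # placeholder name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"       # minutes
```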

Additional components:
* We deployed [Rancher monitoring](https://ranchermanager.docs.rancher.com/integrations-in-rancher/monitoring-and-alerting), which is a downstream version of [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack).
Note that this monitoring system generally consumes a certain amount of CPU, memory, and disk space on the nodes. We deploy it with the following resources (a values sketch follows this list):
  * CPU request: 750m
  * CPU limit: 1000m
  * Memory request: 750Mi
  * Memory limit: 3000Mi
  * Data retention size: 25 GiB
* We deployed [Local Path Provisioner](https://github.com/rancher/local-path-provisioner) to measure the baseline storage performance of the local storage on each node.
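
For reference, these resource and retention settings roughly correspond to the following kube-prometheus-stack (and therefore Rancher monitoring) Helm values. Treat this as a sketch of where the knobs live rather than the exact values file used for this deployment:

```yaml
# Sketch only: Prometheus resource requests/limits and retention size expressed
# as Helm values for kube-prometheus-stack / rancher-monitoring.
prometheus:
  prometheusSpec:
    resources:
      requests:
        cpu: 750m
        memory: 750Mi
      limits:
        cpu: 1000m
        memory: 3000Mi
    retentionSize: 25GiB
```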

## Workload design
We use [Kbench](https://github.com/yasker/kbench), a tool for testing Kubernetes storage.
Kbench deploys a pod with a Longhorn volume attached.
It then runs various `fio` jobs (specified by the user) to test multiple performance aspects of the volume (IOPS, bandwidth, latency).
The pod runs the tests repeatedly and exposes the results as Prometheus metrics.
We then collect and visualize the data using Grafana.

Traditionally, Kbench deploys only a single pod to test a single volume.
However, in this report, we gradually scale up the number of Kbench pods.
Since each pod stress-tests its own Longhorn volume, increasing the number of Kbench pods
simulates a cluster in which many pods are doing IO aggressively.
From there, we can see the performance characteristics of the Longhorn storage system as more and more IO-intensive workloads are added.

Although the IO stress tests help us discover the IO performance boundary of the Longhorn system, in practice it is uncommon for all workloads in a cluster to generate stress IO constantly.
Therefore, besides the tests with many IO-intensive workloads in the cluster, we also perform tests in which the Kbench pods are rate-limited.
From the rate-limited tests, we can answer questions such as:
1. The maximum/recommended number of Longhorn volumes in the cluster if each volume is doing a certain amount of IO
1. The maximum/recommended number of Longhorn volumes attached to a node if each volume is doing a certain amount of IO
1. The maximum/recommended number of Longhorn replicas on a node if the volume of each replica is doing a certain amount of IO


Note that in this report, we distribute the load evenly across the worker nodes:

1. Each worker node has a roughly similar number of Kbench pods
2. Each worker node has a roughly similar number of Longhorn replicas

This setup optimizes the efficiency of the cluster and the Longhorn system. In practice, this balance is also what users usually strive for. One possible way to deploy the test pods with this even spread is sketched below.
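
The sketch below illustrates one way to achieve this layout: a StatefulSet whose replica count is scaled up step by step, with a `volumeClaimTemplates` entry giving each pod its own Longhorn PVC and a topology spread constraint keeping the pods balanced across worker nodes. This is an illustration under those assumptions, not necessarily how the pods in this report were deployed; the image reference, names, and mount path are placeholders, and the `MODE` and `SIZE` values are the ones used later for the random read tests.

```yaml
# Sketch only: each replica gets its own 20Gi Longhorn volume and runs the fio
# job against it; scale .spec.replicas (3 -> 6 -> 9 -> ...) to add more IO-intensive pods.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kbench
spec:
  serviceName: kbench                           # headless Service name (not shown)
  replicas: 3
  selector:
    matchLabels:
      app: kbench
  template:
    metadata:
      labels:
        app: kbench
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread pods evenly across worker nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: kbench
      containers:
        - name: kbench
          image: yasker/kbench:latest           # placeholder image reference
          env:
            - name: MODE
              value: random-read-iops           # fio job to run
            - name: SIZE
              value: 15G                        # fio test size
          volumeMounts:
            - name: vol
              mountPath: /volume                # placeholder mount path
  volumeClaimTemplates:
    - metadata:
        name: vol
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn              # placeholder StorageClass name
        resources:
          requests:
            storage: 20Gi
```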

## Read Performance

### Random Read IOPS - Stress Tests
The first performance metric we look at is random read IOPS.
This metric is usually important for applications such as databases, virtualization, and online transaction processing (OLTP) systems.

#### 1 control plane node + 3 worker nodes

We start with a cluster that has 1 control plane node and 3 worker nodes.

First, we compare a single RWO Longhorn PVC against a single Local Path Provisioner volume.
We deploy one Kbench pod that attaches a Local Path Provisioner PVC, with the following Kbench parameters:
* `MODE: random-read-iops`: specifies that we are running the random read IOPS job. # TODO add the link to the job here
* `SIZE: 15G`: the `fio` test size is 15G, which avoids cache effects.
* The PVC size is 20G.

Then we delete the above Kbench pod and PVC and repeat the test with a Longhorn PVC instead. A sketch of the PVC used for this comparison follows.
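
As a rough sketch (names are placeholders, not the exact manifests used here), the only difference between the two runs is the `storageClassName` of the PVC that the Kbench pod attaches; the pod itself uses the `MODE` and `SIZE` parameters listed above, similar to the container spec in the sketch under Workload design:

```yaml
# Sketch only: the PVC used by the single Kbench pod. For the baseline run the
# PVC uses the Local Path Provisioner StorageClass; for the Longhorn run only
# storageClassName changes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kbench-pvc                 # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path     # swap to the Longhorn StorageClass for the Longhorn test
  resources:
    requests:
      storage: 20Gi                # 20G PVC as described above
```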

> Result:
> * Local Path Provisioner: 8000 random read IOPS
> * Longhorn: 18500 random read IOPS
![single-pod-localpath-vs-longhorn.png](assets/single-pod-localpath-vs-longhorn.png)

**Comment**:
* Because the Longhorn volume has 3 replicas, it can read from all 3 replicas concurrently and thus may deliver better read performance

Next, we use Kbench with Longhorn PVCs and scale up the number of Kbench pods to see how random read IOPS is affected as more and more IO-intensive pods are added.

Scaling the workload from 1 to 3 pods.
> Result:
> * Each Kbench pod is able to achieve 8500 random read IOPS on its Longhorn volume
> * The total random read IOPS achieved by all 3 Longhorn volumes is 25500
![random-read-iops-1-3.png](assets/random-read-iops-1-3.png)

**Comment**:
* Each EBS volume on the host is provisioned with 8000 IOPS, with roughly a 10% margin, so the total IOPS of the 3 EBS volumes is around 24000
* It looks like the Longhorn system is able to reach the maximum IOPS capacity of the 3 host EBS volumes.

Scaling the workload from 3 to 6 pods.
> Result:
> * Each Kbench pod is able to achieve 4200 random read IOPS on its Longhorn volume
> * The total random read IOPS achieved by all 6 Longhorn volumes is 25200
![random-read-iops-3-6.png](assets/random-read-iops-3-6.png)

**Comment**:
* We can see that the total random read IOPS across all Longhorn volumes remains relatively the same while the average IOPS per volume decreases.

Scaling the workload from 6 to 9 pods, then 9 to 12, then 12 to 15.
> Result:
> * At 9 pods, the average random read IOPS per Longhorn volume is 2836. The total random read IOPS is 25524
> * At 12 pods, the average random read IOPS per Longhorn volume is 2836. The total random read IOPS is 25632
> * At 15 pods, the average random read IOPS per Longhorn volume is 1700. The total random read IOPS is 25500
![random-read-iops-6-9-12-15.png](assets/random-read-iops-6-9-12-15.png)

**Comment**:
* From the scaling tests so far, we can see that the total random read IOPS of all Longhorn volumes remains roughly the same, around 25000, as the number of Kbench pods increases.
If we call the average random read IOPS each volume can achieve (x) and the number of volumes (y), they form a reciprocal function: x * y = 25000.
Users can use this information to make predictions for this cluster:
  * The upper bound that the Longhorn system can achieve in this cluster is about 25000 random read IOPS
  * If each of your workload pods does 1000 random read IOPS on average, you can run an estimated 25 pods
  * If you keep scaling up the number of pods, this reciprocal relation (x * y = 25000) might eventually no longer hold as CPU contention and other factors kick in (i.e., x * y will gradually decrease)
* The bottleneck in this cluster seems to be the IOPS performance of the EBS volumes on the hosts rather than CPU, memory, or network bandwidth.

#### 1 control plane node + 6 worker nodes
We double the number of worker nodes (from 3 to 6) and double the number of Kbench pods (from 15 to 30).

> Result:
> * The average random read IOPS per Longhorn volume stays about the same, at 1677
> * The total random read IOPS achieved by all Longhorn volumes doubles to 50310
![random-read-iops-with-6-nodes-30-pods.png](assets/random-read-iops-with-6-nodes-30-pods.png)

**Comment**:
* Since the load is evenly distributed, we can see a linear relationship between the total random read IOPS and the number of nodes: when the number of nodes is doubled, the total random read IOPS doubles
* From this reference, users can estimate how many worker nodes with this spec they need to achieve their target total random read IOPS

### Random Read IOPS - Rate-Limited Tests

## Volume max size

## Backup/Restore speed with AWS S3

