Actual Report for Reference Setup, Performance, Scalability, and Sizing Guidelines #7905
Hi team, I am starting to push the report continuously, starting with:
It may be good to add a section of "whys" at some point. For example, why 1TiB? Why those particular replica scheduling settings? This can help us reason about the testing in the future.
Good idea, I will add the explanation:
Update: I am benchmarking the Longhorn control plane. It looks like Longhorn cannot scale past 310 pods (with 310 volumes) in the medium-spec cluster (1 control plane node + 3 worker nodes, EC2 instance type: m5zn.2xlarge, 8 vCPUs, 32 GB RAM). The error event on pending pods:
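One possible back-of-the-envelope check on that ceiling (my assumption, not confirmed anywhere in this thread): the default kubelet `--max-pods` limit is 110 pods per node, so 3 workers top out near 330 pods, with system and Longhorn components consuming part of that budget:

```python
# Hypothetical sanity check (assumes the default kubelet --max-pods of 110
# is in effect; the thread does not confirm this).
max_pods_per_node = 110          # kubelet default
worker_nodes = 3
cluster_pod_cap = max_pods_per_node * worker_nodes
observed_ceiling = 310           # workload pods scheduled before pods went Pending
non_workload_budget = cluster_pod_cap - observed_ceiling

print(cluster_pod_cap)           # 330
print(non_workload_budget)       # 20 pods of headroom for system + Longhorn pods
```

If this assumption held, the ~310-pod ceiling would be a scheduling limit rather than a Longhorn data-plane limit; the linked known issue is the authoritative explanation.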
The error in #7905 (comment) is a known issue: #7919.
This PR is ready for review. I finished the report for the cloud medium-spec cluster (the last piece of information, about max volume size, is coming soon). As discussed in the US team meeting, I decided to leave the reports for the cloud big-spec and bare-metal clusters for the next release and focus on the 1.7.0 backlog. This first version of the report was produced manually; for the next version, I will try to add automation to speed up the process. Thank you for all the helpful feedback!
Sorry for the nitpicky review! I don't think @jillian-maroket will be handling this one, so I paid a bit more attention to it grammatically than I normally would. From a technical perspective, things are looking quite good.
I only made it about halfway through so far. There aren't any dealbreakers for me yet, so please feel free to consider my comments/suggestions, either adopt them or not, and directly resolve the conversations.
**Comment:**
* We choose 10000 for the EBS disk's IOPS simply because it is a middle value between the minimum (3000) and the maximum (16000) of the gp3 EBS disk.
* We choose 360 MiB/s for the EBS disk's bandwidth because the m5zn.2xlarge EC2 instance has an EBS bandwidth of 396.25 MiB/s. If we chose a value bigger than 396.25 MiB/s for the EBS disk's bandwidth, the EC2 instance would not be able to push the EBS disk to that value.
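The two choices above can be checked with quick arithmetic (a sketch; the 3170 Mbps figure is AWS's advertised dedicated EBS bandwidth for m5zn.2xlarge, which I am assuming is the source of the 396.25 number):

```python
# m5zn.2xlarge advertises 3170 Mbps of dedicated EBS bandwidth;
# 3170 / 8 = 396.25, the per-instance throughput cap the comment refers to.
instance_ebs_cap = 3170 / 8      # 396.25
chosen_disk_bw = 360             # provisioned gp3 throughput chosen in the report
assert chosen_disk_bw < instance_ebs_cap   # instance can actually saturate the disk

# gp3 IOPS range is 3000 (baseline) to 16000 (max); 10000 sits in between.
gp3_min_iops, gp3_max_iops, chosen_iops = 3000, 16000, 10000
assert gp3_min_iops < chosen_iops < gp3_max_iops

print(instance_ebs_cap)          # 396.25
```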
NIT: Don't bother if it's too difficult or annoying, but links to where the reader can find this information might be useful.
I decided to keep the info here since there isn't a single page that explains this behavior. It would require multiple links if we decided to put them here.
> Result:
> * Each Kbench pod is able to achieve 386 MiB/s random read bandwidth on its Longhorn volume
> * Total random read bandwidth can be achieved by all 3 Longhorn volumes is 1158
Suggested change: `random` → `sequential`, i.e. "Total sequential read bandwidth can be achieved by all 3 Longhorn volumes is 1158"
I think it should be random
Scaling workload from 3 to 6, then 6 to 9, then 9 to 12, then 12 to 15

> Result:
> * At 6 pods, the average random read bandwidth per Longhorn volume is 196 MiB/s. Total random bandwidth is 1176 MiB/s
Suggested change: `random` → `sequential` in the first sentence, i.e. "At 6 pods, the average sequential read bandwidth per Longhorn volume is 196 MiB/s." And below.
I think it should be random
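Whichever label is used, the totals in the quoted results are internally consistent with the per-volume figures; a quick check (pure arithmetic on the numbers quoted above):

```python
# Per-volume read bandwidth figures quoted in the report (MiB/s),
# keyed by pod count.
per_volume_bw = {3: 386, 6: 196}

for pods, bw in per_volume_bw.items():
    total = pods * bw
    print(f"{pods} pods: {total} MiB/s total")
# 3 pods: 1158 MiB/s total
# 6 pods: 1176 MiB/s total
```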
### Random Read Bandwidth - Stress Tests
Stopped here for now. Will continue later.
…elines longhorn-2598 Signed-off-by: Phan Le <phan.le@suse.com>
Thanks @ejweber, I resolved most of the comments. Looking forward to your next review!