WT-12954: optimizing Cluster performance jitter caused by checkpoint #10565

y123456yz · 2024-05-07T10:21:31Z

Today a user ran into a similar issue. They are putting a lot of pressure on us and have asked us to resolve it within a week. I want to address it by limiting the checkpoint speed, but I am not sure this will solve the problem, and I am concerned about bugs in my code. I would therefore appreciate your help in making sure the change is safe.

With this PR, the checkpoint write speed of MongoDB can be limited with the following command:
db.adminCommand( { setParameter : 1, "wiredTigerEngineRuntimeConfig" : "io_capacity=(checkpoint=1M)"})
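To make the effect of the `1M` value concrete, here is a small hypothetical JavaScript sketch (not part of the PR) that builds the same `setParameter` document and estimates the minimum time a capped checkpoint needs. The helper names `parseWtSize`, `ioCapacityCommand` and `minCheckpointSeconds` are invented for illustration, and the sketch assumes WiredTiger-style K/M/G/T size suffixes are 1024-based. The `checkpoint` key inside `io_capacity` is the option this PR proposes; an unmodified WiredTiger build will likely reject it.

```javascript
// Hypothetical helpers, not from the PR.

// Convert a WiredTiger-style size such as "1M" to bytes.
// Assumption: K/M/G/T suffixes are 1024-based.
function parseWtSize(size) {
  const m = /^(\d+)([BKMGT]?)$/i.exec(String(size).trim());
  if (!m) throw new Error(`unrecognized size: ${size}`);
  const exponent = { "": 0, B: 0, K: 1, M: 2, G: 3, T: 4 }[m[2].toUpperCase()];
  return Number(m[1]) * 1024 ** exponent;
}

// Build the argument for db.adminCommand(). The checkpoint key is the
// option proposed by this PR, not one that stock WiredTiger accepts.
function ioCapacityCommand(size) {
  parseWtSize(size); // fail early on a malformed size
  return {
    setParameter: 1,
    wiredTigerEngineRuntimeConfig: `io_capacity=(checkpoint=${size})`,
  };
}

// Lower bound, in seconds, on how long a checkpoint needs to write
// `dirtyBytes` when its I/O is capped at `size` bytes per second.
function minCheckpointSeconds(dirtyBytes, size) {
  return dirtyBytes / parseWtSize(size);
}
```

In mongosh, against a server built with this change, `db.adminCommand(ioCapacityCommand("1M"))` sends the same document as the command above. The estimate shows the trade-off: at `1M`, 200 MiB of dirty data needs at least 200 seconds to write, so a cap spreads the I/O out but lengthens each checkpoint. Presumably the cap should still let a checkpoint finish well before the next one is due.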

If convenient, please prioritize reviewing this PR. Thank you.

wiredtiger-pr-bot · 2024-05-07T10:21:35Z

Hi @y123456yz, thank you for your submission!
Please make sure to sign our Contributor Agreement (if you haven't already) and provide us with editor permissions on your branch. Instructions on how to do that can be found here.

y123456yz · 2024-05-07T10:37:50Z

Background:
We run many low-spec MongoDB instances in the cloud, for example 2 CPU cores with 4 GB of memory ("2C4G"). When the write QPS during a checkpoint cycle is even slightly higher than usual, we often run into the following problems:

CPU spikes and jitter
Slow queries affecting business

Analysis of diagnostic.data confirms that the checkpoint is the main cause: each checkpoint completes within a few seconds, so its writes hit the disk as a short, intense burst. This is the root cause of the problem.
This PR limits the write speed of checkpoint I/O so that checkpoints run more smoothly and steadily.

Looking through historical MongoDB support tickets, we found this issue in at least dozens of user clusters. That count excludes clusters whose users never reported it, so the actual number of affected clusters is higher.
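One way to limit the write speed of checkpoint I/O, as described in the background above, is to pace writes against a virtual clock: every write reserves a time slot proportional to its size, and the caller waits until its slot begins. The JavaScript sketch below is a hypothetical illustration of that idea only, not the PR's C implementation; `CheckpointThrottle` and `reserve` are invented names, and the current time is passed in explicitly so the behaviour is deterministic.

```javascript
// Hypothetical pacing sketch, not the PR's implementation: each write
// reserves a time slot proportional to its size, so sustained throughput
// cannot exceed `bytesPerSec` and a burst is spread out over time.
class CheckpointThrottle {
  constructor(bytesPerSec) {
    if (!(bytesPerSec > 0)) throw new RangeError("bytesPerSec must be positive");
    this.bytesPerSec = bytesPerSec;
    this.nextFreeMs = 0; // time at which already-reserved slots run out
  }

  // Reserve capacity for a write of `bytes` issued at `nowMs` and return
  // how long (in ms) the caller should wait before performing it.
  reserve(bytes, nowMs) {
    const startMs = Math.max(nowMs, this.nextFreeMs);
    this.nextFreeMs = startMs + (bytes / this.bytesPerSec) * 1000;
    return startMs - nowMs;
  }
}

// A checkpoint that tries to flush 4 MiB in 1 MiB chunks, all at t = 0:
const throttle = new CheckpointThrottle(1024 * 1024); // 1 MiB/s
const waits = [0, 1, 2, 3].map(() => throttle.reserve(1024 * 1024, 0));
console.log(waits); // [ 0, 1000, 2000, 3000 ]
```

Instead of four writes landing at once, they are spaced one second apart. Unlike a token bucket, this scheme grants no burst credit after an idle period, which favours steady I/O over short peaks; whether that matches the PR's intent is an assumption.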

y123456yz · 2024-05-07T10:55:35Z

WT-12954

y123456yz · 2024-05-07T13:53:58Z

A question: this PR needs a test case for io_capacity.checkpoint, similar to the test case for WT-11877. However, WT-11877 was reverted. I think the WT-11877 PR is worthwhile, so the bug in its test script needs to be fixed. What is the best way to handle this?

Cluster performance jitter caused by optimizing checkpoint

7647c16

y123456yz changed the title ~~optimizing Cluster performance jitter caused checkpoint~~ [Urgent, please prioritize processing]-optimizing Cluster performance jitter caused checkpoint May 7, 2024

y123456yz changed the title ~~[Urgent, please prioritize processing]-optimizing Cluster performance jitter caused checkpoint~~ [Urgent, please help prioritize review code]-optimizing Cluster performance jitter caused checkpoint May 7, 2024

y123456yz changed the title ~~[Urgent, please help prioritize review code]-optimizing Cluster performance jitter caused checkpoint~~ WT-12954: [Urgent, please help prioritize review code]-optimizing Cluster performance jitter caused checkpoint May 7, 2024

y123456yz changed the title ~~WT-12954: [Urgent, please help prioritize review code]-optimizing Cluster performance jitter caused checkpoint~~ WT-12954: [Urgent, please help prioritize review code]-optimizing Cluster performance jitter caused by checkpoint May 7, 2024

bug fix

cdcbf0d

y123456yz changed the title ~~WT-12954: [Urgent, please help prioritize review code]-optimizing Cluster performance jitter caused by checkpoint~~ WT-12954: [Urgent, please help prioritize deal this PR]-optimizing Cluster performance jitter caused by checkpoint May 7, 2024

bug fix

029f1c3

y123456yz changed the title ~~WT-12954: [Urgent, please help prioritize deal this PR]-optimizing Cluster performance jitter caused by checkpoint~~ WT-12954: optimizing Cluster performance jitter caused by checkpoint May 17, 2024
