MDEV-34166 Server could hang with BP < 80M under stress #3256

Merged: 1 commit, May 21, 2024

mariadb-DebarunBanerjee (Contributor)

  • The Jira issue number for this PR is: MDEV-34166

Description

BUF_LRU_MIN_LEN (256 pages) is too high a value for a small buffer pool (BP). For example, with a 16K page size, the limit is more than 5% of the total BP for any BP smaller than 80M, and for the smallest BP of 5M it is 80% of the BP. Non-data objects such as explicit locks can occupy part of the BP, reducing the pages available for the LRU list. If the LRU list reaches its minimum length and no free pages are available, the server hangs because the page cleaner cannot free any more pages.

Fix: To avoid such a hang, we lower the LRU limit below the limit for data objects that buf_LRU_check_size_of_non_data_objects() checks, i.e. to one page less than 5% of the BP.

Release Notes

This could happen in rare cases with a BP size < 80M. Creating too many lock objects, for example with UPDATE, DELETE, or INSERT INTO ... SELECT from the same table with queries over a large range under REPEATABLE READ or SERIALIZABLE isolation, could lead to the issue.

How can this PR be tested?

./mtr innodb.lock_memory_debug

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m (Contributor) left a comment

The idea looks good. Did you check all other occurrences of BUF_LRU_MIN_LEN? Should some of the other references be adjusted?

If the limit is being consulted often, we might consider introducing a global variable buf_pool.LRU_min_len (protected by buf_pool.mutex) and adjust it in buf_pool_t::resize().

Resolved review threads (now outdated) on:
  • mysql-test/suite/innodb/r/lock_memory_debug.result
  • mysql-test/suite/innodb/t/lock_memory_debug.test (2 threads)
  • storage/innobase/buf/buf0flu.cc
@mariadb-DebarunBanerjee (Contributor, Author)

Did you check all other occurrences of BUF_LRU_MIN_LEN?

Thanks for pointing it out. I did check the usage.

  1. In 10.5, the only other place it is used is in calculating the minimum BP size. Perhaps it was initially thought sufficient to have just +20%. We don't want to change the min/max specification, so it is better to keep this part as is.

  2. In later versions, the limit is used in other places, but those uses are not critical and we should be able to carry on with the constant.

If the limit is being consulted often, we might consider introducing a global variable buf_pool.LRU_min_len (protected by buf_pool.mutex) and adjust it in buf_pool_t::resize().

I had considered this option and decided otherwise because ...

  1. We are not trying to make a design change in which the limit continually varies with the BP size. This is rather a special case for small BP sizes (< 80M), while the constant works fine in the vast majority of user scenarios today. Replacing the constant with a variable could start affecting the general case.

  2. Changing it to a variable would implicitly change all the other usages in later versions at merge time, which could introduce bugs that would be difficult to check or find during the merge.

Since the scenario affects only a limited set of cases (BP < 80M), I think it is much better to consider the other usages case by case and modify them only if there is a visible issue.

@mariadb-DebarunBanerjee merged commit b2944ad into 10.5 on May 21, 2024
17 of 19 checks passed
@mariadb-DebarunBanerjee deleted the 10.5-MDEV-34166 branch on May 21, 2024 at 10:49