Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WT-12861 consolidate cache config validation into a single function #10548

Open
wants to merge 19 commits into
base: develop
Choose a base branch
from

Conversation

bzm-jas
Copy link
Contributor

@bzm-jas bzm-jas commented May 2, 2024

No description provided.

@bzm-jas bzm-jas force-pushed the wt-12861-consolidate-cache-config-validation-into-a-single-function branch from aaa0d7c to fb6ec16 Compare May 2, 2024 06:41
Copy link

evergreen-ci-prod bot commented May 2, 2024

Test coverage is too low, this change probably shouldn't be merged as-is.

Metric (for added/changed code) Coverage
Line coverage 77% (41/53)
Branch coverage 53% (32/60)

@bzm-jas bzm-jas marked this pull request as ready for review May 30, 2024 07:11
Copy link
Contributor

@etienneptl etienneptl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! Left a few comments.

if (cache->eviction_dirty_target > cache->eviction_target) {
if (debug_config)
__wt_verbose_warning(session, WT_VERB_CONFIGURATION,
"config eviction dirty target=%f is larger than eviction target=%f",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say the new value we set it to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added in: d2ff734

@@ -419,3 +463,13 @@ __wti_cache_destroy(WT_SESSION_IMPL *session)
__wt_free(session, conn->cache);
return (ret);
}

#ifdef HAVE_UNITTEST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that used anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was for the unit test, now removed: 820f876

@@ -2601,6 +2601,8 @@ extern int __ut_ovfl_discard_wrapup(WT_SESSION_IMPL *session, WT_PAGE *page)
WT_GCC_FUNC_DECL_ATTRIBUTE((warn_unused_result));
extern int __ut_ovfl_track_init(WT_SESSION_IMPL *session, WT_PAGE *page)
WT_GCC_FUNC_DECL_ATTRIBUTE((warn_unused_result));
extern int __ut_validate_cache_config(WT_SESSION_IMPL *session, const char *cfg[], bool create)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed here: 820f876

test/suite/test_config12.py Outdated Show resolved Hide resolved
* Validate trigger and target values of given configs.
*/
static int
__validate_cache_config(WT_SESSION_IMPL *session, const char *cfg[], bool create)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For naming, the static function should start with the subsystem name. Also we generally put the verb last. So I suggest __conn_cache_config_validate as the name. The name also makes me think it is only validating the existing values. This function is both setting up the values and then validating so it is a bit misleading. Some possibilities:

  • Split it into two functions like conn_cache_config to extract all the values and then conn_cach_config_validate that does all the checking.
  • Like the above but have it called by a wrapper that higher level functions call so callers don't need to make 2 different calls.
  • Keep it as is but change validate to something that reflects both purposes (by looking at other similar functions and use some consistent word).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, Sue! I will be renaming the function for better clarity.

Regarding splitting the functions: the original purpose of this ticket was to change the logic for checking values in WT_CACHE so they would only be set after validation. In other words, for invalid configurations, they would not be set, and the function would fail immediately.

However, since we need to keep auto-fixing for now, extracting the values and setting them in WT_CACHE is the best approach. In the future, we do want to progressively move to the original idea. In that case, if we have a function that extracts and sets the cache values, it would need to be modified to include a temporary structure for storing the config values to validate and another function for setting all valid configs to the WT_CACHE structure of the current session. So it will be changed entirely.

Since there could be major changes to this part of the code, I'm open to discussing the best way to structure it now to benefit and simplify future modifications. I could also rename the function to reflect both extraction and validation, like __conn_cache_config_prepare or __conn_cache_config_setup. Please let me know what you think!

cache = conn->cache;

WT_RET(__wt_config_gets(session, cfg, "debug_mode.configuration", &cval));
debug_config = cval.val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already done from wti_debug_mode_configuration and, instead of needing the boolean, you can just use FLD_ISSET(conn, WT_CONN_DEBUG_CONFIGURATION)? It is better to have one place getting the configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the debug flags are set after setting the cache. @bzm-jas posted a comment about it in the JIRA ticket here.
I asked in our team channel if there was any specific reason why we do it in this order. The current code seems to indicate this order is required but I don't understand why and I wonder if this comment was valid before but now now:

    /*
     * This function expects the cache to be created so parse this after the rest of the connection
     * is set up.
     */
    WT_ERR(__wti_debug_mode_config(session, cfg));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree using FLD_ISSET(conn, WT_CONN_DEBUG_CONFIGURATION) is preferable. I'd suggest

/* Debug flags are not yet set when this function runs during connection open. Set it now. */
WT_RET(__wt_config_gets(session, cfg, "debug_mode.configuration", &cval));
if (cval.val)
    FLD_SET(conn->debug_flags, WT_CONN_DEBUG_CONFIGURATION);
else
    FLD_CLR(conn->debug_flags, WT_CONN_DEBUG_CONFIGURATION);

if(FLD_ISSET(conn, WT_CONN_DEBUG_CONFIGURATION))
    ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we didn't have to read the configuration twice and call __wti_debug_mode_config beforehand but if that's the only solution, I'd be happy with that as well!

}

WT_RET(__wt_config_gets(session, cfg, "eviction_target", &cval));
cache->eviction_target = (double)cval.val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is confusing to see that cache->eviction_target was just checked above and now we're setting it. Maybe add a comment after the conditional that now we're resetting the values via reconfigure or restart or however we get here.

Copy link
Contributor Author

@bzm-jas bzm-jas May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, comments added in: c7c0742

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug into the callstacks a bit from what I can tell the code looks like:

__wti_cache_create()
    __wti_cache_config()
        __cache_config_local()
            __validate_cache_config(create=false) // This makes sure that triggers are greater than targets
    __validate_cache_config(create=true)          // This re-validates that triggers are greater than targets

so whenever we call __validate_cache_config(create=true) we must have already set the triggers to be greater than the targets in __validate_cache_config(create=false).
I think we can drop the second call to __validate_cache_config(create=true) and remove the create=true logic from this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the create=true logic and the second call to __validate_cache_config(create=true) in: 9e4baa7

cache->eviction_target = (double)cval.val;

WT_RET(__wt_config_gets(session, cfg, "eviction_trigger", &cval));
cache->eviction_trigger = (double)cval.val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone can reconfigure all the values, don't we need to repeat all the same invalid checks we did on create? For example, if one can reconfigure both cache->eviction_target and cache->eviction_trigger don't we need to check like at line 69? The checks below do not repeat the checks above but do potentially change values in an invalid way.

If so, then this code should be re-arranged to have one set of those EINVAL conditionals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that both checks should be performed, but using EINVAL for all checks causes a lot of failures, I will look into finding an approach to log these messages without breaking our tests.

src/conn/conn_cache.c Outdated Show resolved Hide resolved
@@ -577,6 +577,9 @@ def __ge__(self, other):
adjust log removal to retain the log records of this number of checkpoints. Zero
or one means perform normal removal.''',
min='0', max='1024'),
Config('configuration', 'false', r'''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to extend this functionality to other configs? If debug_mode.configuration only applies to the cache config I think debug_mode.cache_configuration is a better name.
If we make the above change WT_CONN_DEBUG_CONFIGURATION could also do with a rename.

cache = conn->cache;

WT_RET(__wt_config_gets(session, cfg, "debug_mode.configuration", &cval));
debug_config = cval.val;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree using FLD_ISSET(conn, WT_CONN_DEBUG_CONFIGURATION) is preferable. I'd suggest

/* Debug flags are not yet set when this function runs during connection open. Set it now. */
WT_RET(__wt_config_gets(session, cfg, "debug_mode.configuration", &cval));
if (cval.val)
    FLD_SET(conn->debug_flags, WT_CONN_DEBUG_CONFIGURATION);
else
    FLD_CLR(conn->debug_flags, WT_CONN_DEBUG_CONFIGURATION);

if(FLD_ISSET(conn, WT_CONN_DEBUG_CONFIGURATION))
    ...

@@ -61,6 +175,8 @@ __cache_config_local(WT_SESSION_IMPL *session, bool shared, const char *cfg[])
conn = S2C(session);
cache = conn->cache;

__validate_cache_config(session, cfg, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
__validate_cache_config(session, cfg, false);
WT_RET(__validate_cache_config(session, cfg, false));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed here: 9e4baa7

WT_RET(__wt_config_gets(session, cfg, "cache_overhead", &cval));
cache->overhead_pct = (u_int)cval.val;

WT_RET(__wt_config_gets(session, cfg, "eviction_target", &cval));
cache->eviction_target = (double)cval.val;
WT_RET(
__cache_config_abs_to_pct(session, &(cache->eviction_target), "eviction target", shared));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These __cache_config_abs_to_pct calls are modifying the field we used to read on the previous line. They need to be kept directly after the = cval.val calls to make sure our validations checks don't compare (for example) a trigger measured as an absolute value against a target defined as a percentage.

This could lead to a case like

total_cache = 200MB
trigger = 100MB (50% of cache)
target = 0.8 (80% of cache)

but our validation reports a valid config.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally __cache_config_abs_to_pct uses conn->cache_size to convert to a percentage, so when we move these __cache_config_abs_to_pctcalls we also need to make sure conn->cache_size gets read before we call __validate_cache_config. Currently we're calling __validate_cache_config first.

The way we access conn->cache inside __cache_config_abs_to_pct feels like a footgun to me. I'd prefer if we passed conn->cache into __cache_config_abs_to_pct as an argument instead of invisibly accessing S2C(session)->conn_cache. We can do this in a separate ticket if you don't want to add more changes to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up, I've modified __validate_cache_config so now the values gets changed to percentage straight away after retrieving and setting the value. See this commit: 9e4baa7
I'd prefer to change the way we access conn->cache in another ticket because I think there's already quite a bit of changes going on for this PR, it might be clearer to do for a seperate ticket.

}

WT_RET(__wt_config_gets(session, cfg, "eviction_target", &cval));
cache->eviction_target = (double)cval.val;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug into the callstacks a bit from what I can tell the code looks like:

__wti_cache_create()
    __wti_cache_config()
        __cache_config_local()
            __validate_cache_config(create=false) // This makes sure that triggers are greater than targets
    __validate_cache_config(create=true)          // This re-validates that triggers are greater than targets

so whenever we call __validate_cache_config(create=true) we must have already set the triggers to be greater than the targets in __validate_cache_config(create=false).
I think we can drop the second call to __validate_cache_config(create=true) and remove the create=true logic from this function.

if (cache->eviction_dirty_target > cache->eviction_target) {
if (debug_config)
__wt_verbose_warning(session, WT_VERB_CONFIGURATION,
"config eviction_dirty_target=%f is larger than eviction_target=%f, changing "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General wording suggestion for this and the other warning messages

Suggested change
"config eviction_dirty_target=%f is larger than eviction_target=%f, changing "
"config eviction_dirty_target (%f) cannot exceed than eviction_target (%f). Setting"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment rephrased in: 9e4baa7

Comment on lines 135 to 136
"config eviction_updates_target=%f is less than DBL_EPSILON=%f,changing "
"eviction_updates_target to %f.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBL_EPSILON is an internal variable name. We should avoid displaying it to the end user:

Suggested change
"config eviction_updates_target=%f is less than DBL_EPSILON=%f,changing "
"eviction_updates_target to %f.",
"config eviction_updates_target (%f) cannot be zero. Setting to 50% of eviction_updates_target (%f)",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment rephrased in: 9e4baa7

class test_config12(wttest.WiredTigerTestCase):

@contextmanager
def expect_verbose(self, config, patterns, expect_output = True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can define this function inside WiredTigerTestCase instead of having two definitions here and in test_verbose01.py.
@etienneptl you wrote the original test_verbose01.py. Was there a reason we defined expect_verbose in test_verbose01.py?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply because it was the only test using that function and it was designed to meet the needs of verbose01. This version is slightly different than the one defined in verbose01 but I agree that it should be accessible from any test, it can be useful.

with self.expect_verbose('debug_mode=(configuration=true)', ['config checkpoint target=.*is less than eviction dirty target=.*',
'config eviction updates target=.*is less than DBL_EPSILON=.*',
'config eviction updates trigger=.*is less than DBL_EPSILON=.*']):
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some more tests cases for non-default configs? For example we're currently not testing cases like cache->eviction_dirty_trigger > cache->eviction_trigger

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add at least one test for each of the checks.

def test_config12(self):
self.conn.close()
# Test the default WT connection configuration with debug mode enabled. Expect warning messages.
with self.expect_verbose('debug_mode=(configuration=true)', ['config eviction_checkpoint_target=.*cannot be less than eviction_dirty_target=.*Setting eviction_checkpoint_target to.*',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define the second arg as a variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, defined as a variable here: 5f55183

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants