Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WT-12096 Don't validate timestamp against the on-page value if the update is globally visible #10595

Merged
merged 1 commit into from
Jun 1, 2024

Conversation

tod-johnson-mongodb
Copy link
Contributor

@tod-johnson-mongodb tod-johnson-mongodb commented May 16, 2024

If an update is globally visible it can be evicted and EBUSY should not be returned. Thus the check for EBUSY should be skipped if an update is globally visible. The bug is the check for EBUSY was not skipped if an update is globally visible.

Please review just all commits. There were temporary changes that have been backed out.

@tod-johnson-mongodb tod-johnson-mongodb self-assigned this May 16, 2024
@sueloverso
Copy link
Member

sueloverso commented May 16, 2024

@tod-johnson-mongodb do you want me to run this branch against the format reproducer I've used so far?

@sueloverso github won't let me add a reply. So I am editing the comment.
You can do so. However, there have been some source changes, not mine, that do not build with the v3 toolchain you have. They require the v4 toolchain. If you want to do that work, then go ahead and run this branch. You could also run the new test wiredtiger/test/suite/test_bug033.py.

The error message from not using the v4 toolchain on file wiredtiger/bench/workgen/workgen.c
../bench/workgen/workgen.cpp:43:10: fatal error: filesystem: No such file or directory

src/btree/bt_split.c Outdated Show resolved Hide resolved
@sueloverso
Copy link
Member

My build on kodkod-aws completes. If there is a failure in building I don't see it and the process exits with success (unlike say a compilation error). I have started the format run.

test/suite/test_bug033.py Outdated Show resolved Hide resolved
test/suite/test_bug033.py Outdated Show resolved Hide resolved
test/suite/test_bug033.py Outdated Show resolved Hide resolved
test/suite/test_bug033.py Outdated Show resolved Hide resolved
@sueloverso sueloverso requested review from luke-pearson and removed request for sueloverso May 20, 2024 13:35
@sueloverso
Copy link
Member

This is not code that I know well enough to say this is the correct fix. So I removed myself as reviewer and added @luke-pearson.

I will say that my repro that readily failed has run for the last 4 days, over 2000 iterations, which is far longer than any of the failure runs took to hit. So that is a positive.

@sueloverso
Copy link
Member

The test failure in PR testing should be investigated to see if it is related to this change/fix since the test has to do with updates and tombstones (based on the name).

@luke-pearson
Copy link
Contributor

This is a reconciliation change that should have a lot of eyes on it, I've added Chenhao.

src/btree/bt_split.c Outdated Show resolved Hide resolved
if (prev_upd->start_ts < vpack->tw.start_ts ||
(WT_TIME_WINDOW_HAS_STOP(&vpack->tw) && prev_upd->start_ts < vpack->tw.stop_ts)) {
/*
* If prev_upd is globally visible it can be evicted. Thus this check applies only if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evicted is not the right word. I think it should be the following updates can be removed by obsolete check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is evicted not the right word? That is the word used throughout this function. I will talk with you about this.

 381      * If eviction reconciliation starts before checkpoint, it is fine to evict updates without
 382      * timestamps.

 389         WT_STAT_CONN_DATA_INCR(session, cache_eviction_blocked_remove_hs_race_with_checkpoint);

 401         WT_STAT_CONN_DATA_INCR(session, cache_eviction_blocked_no_ts_checkpoint_race_2);

 433             WT_STAT_CONN_DATA_INCR(session, cache_eviction_blocked_no_ts_checkpoint_race_4);

 475          * If prev_upd is globally visible it can be evicted. Thus this check applies only if
 476          * prev_upd is not globally visible.

 482             WT_STAT_CONN_DATA_INCR(session, cache_eviction_blocked_no_ts_checkpoint_race_1);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obsolete check may happen at any time not necessarily in eviction.

Copy link
Contributor

@quchenhao quchenhao May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollback to stable may recover updates from the history store that is out of order to the on-disk value. Normally these updates have the WT_UPDATE_RESTORED_FROM_HS flag on them. However, in rare cases, if the newer update becomes globally visible, the restored update may be removed by the obsolete check. This may lead to an out of order edge case but it is benign. Check the global visibility of the update and ignore this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -471,8 +471,13 @@ __rec_validate_upd_chain(WT_SESSION_IMPL *session, WT_RECONCILE *r, WT_UPDATE *s
prev_upd->start_ts == prev_upd->durable_ts || !WT_TIME_WINDOW_HAS_STOP(&vpack->tw) ||
prev_upd->durable_ts >= vpack->tw.durable_stop_ts,
"Durable timestamps cannot be out of order for prepared updates");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me and tod discussed about changing these assert as

if (WT_TIME_WINDOW_HAS_STOP(&vpack->tw))
...
else
...

This improves readability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done.

  1. When we discussed this I misread this. The first WT_ASSERT_ALWAYS() is checking vpack->tw.durable_start_ts. The second WT_ASSERT_ALWAYS() is checking vpack->tw.durable_stop_ts.

  2. If WT_ASSERT_ALWAYS() fails the assert condition is printed. Moving part of the condition out of the WT_ASSERT_ALWAYS() prints just part of the assert condition if the assert fails.

I did change the asserts to have different comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part that is moved out of the assert is not important.

"Durable timestamps cannot be out of order for prepared updates");
if (prev_upd->start_ts < vpack->tw.start_ts ||
(WT_TIME_WINDOW_HAS_STOP(&vpack->tw) && prev_upd->start_ts < vpack->tw.stop_ts)) {
if (prev_upd->prepare_state == WT_PREPARE_INPROGRESS ||
Copy link
Contributor

@quchenhao quchenhao May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we agree to do

  if (WT_TIME_WINDOW_HAS_STOP(&vpack->tw))
            WT_ASSERT_ALWAYS(session,
               prev_upd->prepare_state == WT_PREPARE_INPROGRESS || prev_upd->start_ts == prev_upd->durable_ts ||
                prev_upd->durable_ts >= vpack->tw.durable_stop_ts,
              "Stop: Durable timestamps cannot be out of order for prepared updates");
   else
             WT_ASSERT_ALWAYS(session, prev_upd->prepare_state == WT_PREPARE_INPROGRESS || prev_upd->start_ts == prev_upd->durable_ts || prev_upd->durable_ts >= vpack->tw.durable_start_ts,
              "Start: Durable timestamps cannot be out of order for prepared updates");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be best?

        if (prev_upd->prepare_state != WT_PREPARE_INPROGRESS && prev_upd->start_ts != prev_upd->durable_ts) {
            WT_ASSERT_ALWAYS(session, prev_upd->durable_ts >= vpack->tw.durable_start_ts && (!WT_TIME_WINDOW_HAS_STOP(&vpack->tw) ||  prev_upd->durable_ts >= vpack->tw.durable_stop_ts));

        }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer my version because I think the code is clearer in that way. We compare the stop time pair if on the disk image we have one. Otherwise, we compare with the start time pair.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay fair enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@luke-pearson
Copy link
Contributor

This solution skips comparing the last update in the update chain with the on disk update if the last update is globally visible. This ignores the fact that it may have already walked obsolete updates prior to that. Would it not be better if the loop on line 420 became:

    /* Loop forward from update after the selected on-page update. */
    for (prev_upd = select_upd, upd = select_upd->next; upd != NULL; upd = upd->next) {
        if (upd->txnid == WT_TXN_ABORTED)
            continue;
        WT_ASSERT_ALWAYS(session,
          prev_upd->prepare_state == WT_PREPARE_INPROGRESS ||
            prev_upd->start_ts == prev_upd->durable_ts || prev_upd->durable_ts >= upd->durable_ts,
          "Durable timestamps cannot be out of order for prepared updates");
        /* Validate that the updates older than us have older timestamps. */
        if (prev_upd->start_ts < upd->start_ts) {
            WT_ASSERT_ALWAYS(
              session, prev_upd->start_ts == WT_TS_NONE, "Previous update missing start timestamp");
            WT_STAT_CONN_DATA_INCR(session, cache_eviction_blocked_no_ts_checkpoint_race_4);
            return (EBUSY);
        }
        /*
         * Rollback to stable may restore older updates from the data store or history store. In
         * this case, the restored update has older update than the onpage value, which is expected.
         * Reconciliation may restore the onpage value to the update chain. In this case, no need to
         * check further as the value is the same as the onpage value. If we have a committed
         * prepared update restored from the onpage value, no need to check further as well because
         * the update chain after it should only contain committed prepared updates from the same
         * transaction.
         */
        if (F_ISSET(upd,
              WT_UPDATE_RESTORED_FROM_DS | WT_UPDATE_RESTORED_FROM_HS |
                WT_UPDATE_PREPARE_RESTORED_FROM_DS))
            return (0);

        /* Don't walk obsolete updates. */
        if (__wt_txn_visible_all(session, upd))
            return (0);

        prev_upd = upd;
    }

The only concern I have with this solution is that that globally visible update could still be OOO with respect to the on disk update but it shouldn't matter as its globally visible?

@quchenhao
Copy link
Contributor

quchenhao commented May 22, 2024

This solution skips comparing the last update in the update chain with the on disk update if the last update is globally visible. This ignores the fact that it may have already walked obsolete updates prior to that. Would it not be better if the loop on line 420 became:

    /* Loop forward from update after the selected on-page update. */
    for (prev_upd = select_upd, upd = select_upd->next; upd != NULL; upd = upd->next) {
        if (upd->txnid == WT_TXN_ABORTED)
            continue;
        WT_ASSERT_ALWAYS(session,
          prev_upd->prepare_state == WT_PREPARE_INPROGRESS ||
            prev_upd->start_ts == prev_upd->durable_ts || prev_upd->durable_ts >= upd->durable_ts,
          "Durable timestamps cannot be out of order for prepared updates");
        /* Validate that the updates older than us have older timestamps. */
        if (prev_upd->start_ts < upd->start_ts) {
            WT_ASSERT_ALWAYS(
              session, prev_upd->start_ts == WT_TS_NONE, "Previous update missing start timestamp");
            WT_STAT_CONN_DATA_INCR(session, cache_eviction_blocked_no_ts_checkpoint_race_4);
            return (EBUSY);
        }
        /*
         * Rollback to stable may restore older updates from the data store or history store. In
         * this case, the restored update has older update than the onpage value, which is expected.
         * Reconciliation may restore the onpage value to the update chain. In this case, no need to
         * check further as the value is the same as the onpage value. If we have a committed
         * prepared update restored from the onpage value, no need to check further as well because
         * the update chain after it should only contain committed prepared updates from the same
         * transaction.
         */
        if (F_ISSET(upd,
              WT_UPDATE_RESTORED_FROM_DS | WT_UPDATE_RESTORED_FROM_HS |
                WT_UPDATE_PREPARE_RESTORED_FROM_DS))
            return (0);

        /* Don't walk obsolete updates. */
        if (__wt_txn_visible_all(session, upd))
            return (0);

        prev_upd = upd;
    }

The only concern I have with this solution is that that globally visible update could still be OOO with respect to the on disk update but it shouldn't matter as its globally visible?

This can work because we are here only in eviction. My concern is that checking global visibility for each update is expensive.

globally visible update could still be OOO with respect to the on disk update but it shouldn't matter as its globally visible

This is OK and it is exactly the exception case that is false positive.

@luke-pearson
Copy link
Contributor

luke-pearson commented May 22, 2024

My concern is that checking global visibility for each update is expensive.

I suppose that is true, we can definitely reduce the number of checks by keeping only one... I guess for performance reasons we'll only do one check but it should be detailed in a comment why we chose to do it that way @tod-johnson-mongodb .

Copy link
Contributor

@quchenhao quchenhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tod-johnson-mongodb tod-johnson-mongodb force-pushed the wt-12096-start_ts_assert branch 3 times, most recently from 6fde309 to 26abd29 Compare May 24, 2024 01:37
@tod-johnson-mongodb tod-johnson-mongodb force-pushed the wt-12096-start_ts_assert branch 2 times, most recently from f389f62 to 34b2389 Compare May 24, 2024 01:54
…visible.

* Fix EBUSY check and add an explaining comment.
* Improve two WT_ASSERT_ALWAYS() preceeding the EBUSY check.
* Add a test program.
Copy link

Test coverage is too low, this change probably shouldn't be merged as-is.

Metric (for added/changed code) Coverage
Line coverage 71% (5/7)
Branch coverage 40% (12/30)

@tod-johnson-mongodb tod-johnson-mongodb added this pull request to the merge queue May 24, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 24, 2024
Copy link
Contributor

@luke-pearson luke-pearson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@quchenhao quchenhao changed the title WT-12096 failed: __rec_validate_upd_chain start_ts assertion fails on ubuntu2004-stress-tests [wiredtiger @ eca47553] WT-12096 Don't validate timestamp against the on-page value if the update is globally visible May 29, 2024
@quchenhao
Copy link
Contributor

@tod-johnson-mongodb Are you ready to merge this?

@tod-johnson-mongodb
Copy link
Contributor Author

I am merging this. I have two approvals. Sue said she didn't need to re-review since I already had two approvals. I didn't hear whether Sean wanted to re-review.

@tod-johnson-mongodb tod-johnson-mongodb added this pull request to the merge queue Jun 1, 2024
Merged via the queue into develop with commit 3fabafa Jun 1, 2024
7 checks passed
@tod-johnson-mongodb tod-johnson-mongodb deleted the wt-12096-start_ts_assert branch June 1, 2024 01:57
github-merge-queue bot pushed a commit that referenced this pull request Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants