Sync WAL during db Close() #12556
base: main
Conversation
Suggested change:
- reopen upon unsync data loss in crash test
+ reopen with unsynced data loss in crash test
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Can we sync the WAL during DB close?

@hx235 has updated the pull request. You must reimport the pull request before landing.
> with unsynced data loss in crash test

Good point - updated.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -0,0 +1 @@
+ DB `Close()` will sync the WALs now
It seems to me that we're trading performance for consistency here: a clean closing of the database now ensures we're safe from host crashes but closing itself can become much more expensive, right? This feels like a big change; I'm wondering if we should make it optional (and disabled by default to prevent surprises for clients).
That's a good point - let me think a bit more about this. I also wonder how much Close() performance matters, since adding a DB option is "expensive" in terms of usability.
With more research, I realized this is in our API. So it seems the responsibility of WAL sync has been documented as the user's @ajkr:

// This will not fsync the WAL files.
// If syncing is required, the caller must first call SyncWAL(), or Write()
// using an empty write batch with WriteOptions.sync=true.
Let's not block our crash test on this discussion - #12567
> With more research, I realized this is in our API. So it seems the responsibility of WAL sync has been documented as the user's @ajkr

I think good documentation cannot make up for poor API quality. In my opinion, a safe default for Close() would be to sync the data that we will have no future opportunity to sync.
I don't have a strong opinion on this, but in general we don't explicitly sync the WAL files we're done with unless we absolutely have to. One thing this means is that if we're going to do it in this case, we'll probably have to call SyncClosedLogs to make sure any earlier unsynced WALs are also synced.
> One thing this means is that if we're going to do it in this case, we'll probably have to call SyncClosedLogs to make sure any earlier unsynced WALs are also synced

Let me look into that.
There's a SyncWAL() function that is safe
Fixed with SyncWAL()
Summary:
**Context/Summary:** See #12556 for the original problem. The [fix](#12556) encountered some design [discussion](#12556 (comment)) that might take longer than I expected. Temporarily disable reopen with unsynced data loss for now, just to stabilize our crash test, since the original problem is already root-caused.

Pull Request resolved: #12567
Test Plan: CI
Reviewed By: ltamasi
Differential Revision: D56365503
Pulled By: hx235
fbshipit-source-id: 0755e82617c065f42be4c8429e86fa289b250855
Working on a UT

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Looking into the ASAN test failure. The other one seems flaky.

@hx235 has updated the pull request. You must reimport the pull request before landing.
@@ -495,9 +495,6 @@ IOStatus WritableFileWriter::SyncWithoutFlush(const IOOptions& opts,
  IOStatus s = SyncInternal(io_options, use_fsync);
  TEST_SYNC_POINT("WritableFileWriter::SyncWithoutFlush:2");
  if (!s.ok()) {
#ifndef NDEBUG
    sync_without_flush_called_ = true;
@anand1976 Can we remove sync_without_flush_called_? If we run DBCompactionTest.ManualCompactionFailsInReadOnlyMode with this PR without removing it, we can hit Assertion `sync_without_flush_called_' failed.

This is what happens: mock_env->SetFilesystemActive(false); will call set_seen_error(). Later, in DB close, we sync the WAL and thus enter AssertFalseAndGetStatusForPrevError() even though SyncWithoutFlush() has not previously been called. So it seems to me that "// This should only happen if SyncWithoutFlush() was called." is no longer true.

I also still don't understand why we want the behavior below in debug mode. Can we just return IOStatus::IOError("Writer has previous error."); instead of crashing?
// SyncWithoutFlush() is the function that is allowed to be called
// concurrently with other function. One of the concurrent call
// could set seen_error_, and the other one would hit assertion
// in debug mode.
@hx235 has updated the pull request. You must reimport the pull request before landing.
@@ -626,6 +626,19 @@ Status DBImpl::CloseHelper() {
    job_context.Clean();
    mutex_.Lock();
  }
  if (!mutable_db_options_.avoid_flush_during_shutdown && !logs_.empty()) {
    mutex_.Unlock();
    Status s = SyncWAL();
Looking for some ideas on benchmarking @ajkr @ltamasi: since DB Close() with WAL sync will now be slower, how should we answer the question "how much slower?"

I'm thinking of:
(1) Give a rough estimate of per-sync time in the OS (probably by googling for this number?) and tell people to multiply this number by their number of WAL files.
(2) Similar to (1), but microbenchmark SyncWAL() on 1 WAL to estimate the per-sync time.
(3) ...?
> how should we answer the question of "how much slower?"

IMO, "a lot slower", plus a reminder that the old behavior of not guaranteeing durability is available via avoid_flush_during_shutdown == true.

On the topic of durability violations: avoid_flush_during_shutdown == true now both leaves WAL-enabled writes unsynced and drops WAL-disabled writes. I thought about the possible use cases (especially ones that mix WAL-enabled/disabled writes) and think the risk of bundling these violations into one option is low, but I could be wrong.
If you wanted the risk to be reduced, you could introduce avoid_sync_during_shutdown. It would allow the following (hypothetical, AFAIK) use case to get the old behavior: (1) sometimes writes with WAL disabled, (2) must not lose WAL-disabled writes on clean shutdown, and (3) does not want to pay WAL sync overhead. It's kind of a confusing use case because (2) implies they are willing to pay SST sync overhead, which makes (3) sound contradictory. But that is the case I thought of, and the struggle to make sense of it is why I thought the risk is low.
> I thought the risk is low.

... with respect to bundling the durability violations into one option. Regarding the whole thing about changing the default behavior, that feels more like medium risk.
Okay - now I'm leaning more toward introducing avoid_sync_during_shutdown and setting it to true by default, so there is no existing behavior change.
Ok but we will need to reserve time to actually follow through with users to get them to migrate / eventually change the default. Otherwise it'll be confusing for years to come regarding why we built a database without durability by default.
Sounds good - will do, particularly with avoid_flush_during_shutdown=false users.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM!
@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 has updated the pull request. You must reimport the pull request before landing.
Context/Summary:
The crash test below found that we don't sync the WAL upon DB close, which can lead to unsynced data loss. This PR syncs it.
Test: