New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: check for errors after close and read in AutoFile #29307
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
8b02e03
to
7c58f6e
Compare
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
7c58f6e
to
5543990
Compare
I don't think any care close-check needs to be done when reading from a file. So what about removing the |
That is my understanding too.
|
The assumption would be that the close was called before the destructor is called. This is a code correctness question and seems like the perfect fit for Assume. That is, if the Assume fails, the code was missing a close, and the code must be changed to fix the internal bug. The Assume does not in any way check for IO errors. |
Then I misunderstood this:
How? |
AutoFile holds a file, which will be nullptr when it is closed. So if the file is nullptr, then it can be assumed to be flushed. |
How/when would the file be closed? |
An |
Ok, the users of the class then have to explicitly close. That is 1.4. from the OP, I elaborated it with more words. |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
Explicit callers of `AutoFile::fclose()` should check whether `fclose(3)` failed because this may be due to a failure to flush written data to disk. Such failures after writing to a file should be treated as failures to `fwrite(3)`. Github-Pull: bitcoin#29307 Rebased-From: e9abf95
`fread(3)` does not distinguish between end-of-file and error, and callers must use `feof(3)` and `ferror(3)` to determine which occurred. Github-Pull: bitcoin#29307 Rebased-From: 5543990
Seems like there is some discussion in the description that is a better fit for a brainstorming issue. |
Converted to draft due to failing CI and inactivity |
5543990
to
1fa61d9
Compare
Ready for review. I think this would be a strict improvement over what we have now in |
1fa61d9
to
b44c33a
Compare
@vasild CI seems unhappy. |
b44c33a
to
13f540d
Compare
Explicit callers of `AutoFile::fclose()` should check whether `fclose(3)` failed because this may be due to a failure to flush data to disk. Such failures should be treated as failures to `fwrite(3)`. Also terminate the program if `AutoFile::~AutoFile()` is unable to close the file because that is a file write failure which cannot be handled inside the destructor or reported to the caller.
`fread(3)` does not distinguish between end-of-file and error, and callers must use `feof(3)` and `ferror(3)` to determine which occurred.
13f540d
to
661b7d8
Compare
Fixed the CI! |
@@ -23,8 +25,9 @@ std::size_t AutoFile::detail_fread(Span<std::byte> dst) | |||
|
|||
void AutoFile::read(Span<std::byte> dst) | |||
{ | |||
if (detail_fread(dst) != dst.size()) { | |||
throw std::ios_base::failure(feof() ? "AutoFile::read: end of file" : "AutoFile::read: fread failed"); | |||
if (detail_fread(dst) != dst.size() || ferror(m_file)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
de23848: I still don't understand this. Why would fread succeed to read the requested number of bytes, but then mark the stream with an error? If the goal is to somehow detect errors after the read succeeded, I am not sure this should be a goal. Otherwise, it would be good to explain what real user-facing bug this fixes or introduces.
Either this is a refactor, in which case it seems pointless, or it is a behavior change, in which case it needs to explain why it beneficial and correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was somewhat answered in #29307 (comment) above but let me elaborate - the thing that prompted me to do this change is this text from the fread(3)
manual page:
The function fread() does not distinguish between end-of-file and error,
and callers must use feof(3) and ferror(3) to determine which occurred.
I think checking for ferror(3)
after fread(3)
is a kind of harmless "belt-and-suspenders". If you are sure that if detail_fread(dst) == dst.size()
then ferror(m_file)
will always be false
or feel strongly against this then I could drop the commit de23848 util: check for error after reading from a file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are sure that if detail_fread(dst) == dst.size() then ferror(m_file) will always be false or feel strongly against this then I could drop the commit de23848 util: check for error after reading from a file.
I don't feel strongly in one way or another. I am just saying that with the currently available information, it is not possible to confidently review this and know it is a good change. At a minimum, it should be clear whether this is a refactor or a behavior change, and what user-facing bug this could possibly prevent or fix. (I can't think of one, but maybe I am missing something?)
661b7d8
to
de23848
Compare
@@ -213,6 +218,7 @@ bool DumpMempool(const CTxMemPool& pool, const fs::path& dump_path, FopenFn mock | |||
fs::file_size(dump_path)); | |||
} catch (const std::exception& e) { | |||
LogInfo("Failed to dump mempool: %s. Continuing anyway.\n", e.what()); | |||
(void)file.fclose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this change either. This seems to be mixing the third alternative in the pull request description with the approach you wanted to take?
Defeats the purpose of a RAII wrapper for FILE* which automatically closes the file when it goes out of scope and there are a lot of users of AutoFile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is best if all users of AutoFile
explicitly call the fclose()
method and check that it succeeds (option 3. from PR description). This is not feasible because there are many users of AutoFile
. However, some of those users already do call the fclose()
method explicitly. For those users it is easy and I added checks whether fclose()
succeeds. DumpMempool()
is such a user and the change is:
throw std::runtime_error("FileCommit failed");
- file.fclose();
+ if (file.fclose() != 0) {
+ throw std::runtime_error(
+ strprintf("Error closing %s: %s", fs::PathToString(file_fspath), SysErrorString(errno)));
+ }
if (!RenameOver(dump_path + ".new", dump_path)) {
For consistency in DumpMempool()
I changed all code paths in that function to explicitly call fclose()
. There are two codepaths and this added (void)file.fclose();
is the other one. I ignore the return value because this is already failure handling and the function is going to return false
. I didn't want to do something like "while handling error1, error2 occurred".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I see this is required, because otherwise the node would crash on this error instead of continuing with the current code, if this was missing?
Not sure this is desirable, see also https://github.com/bitcoin/bitcoin/pull/29307/files#r1612065514
🐙 This pull request conflicts with the target branch and needs rebase. |
~AutoFile() | ||
{ | ||
if (fclose() != 0 && m_was_written) { | ||
LogPrintLevel(BCLog::ALL, BCLog::Level::Error, "Error closing file: %s\n", SysErrorString(errno)); | ||
std::abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about unconditionally crashing the node on any failure here. This may or may not be an issue, so the call site will have to decide about this.
For example, If an RPC user dumps the mempool to an external (corrupt) USB thumb drive, a failure shouldn't crash the node, but notify the user over the RPC.
Also, I am not sure about involving logging in this low-level code. Logging may not be set up, so this may never be printed before the crash.
Also, I am not sure about the log message. It just says that there was an error, but not whose fault it was or what should be done about it.
A better solution overall may be to just call if (m_was_written) Assume(IsNull());
. This should highlight all violating call sites during CI, because the test should be deterministic. (If there is a code path that is not deterministically tested, all hopes are lost anyway, and I don't think the solution should be to crash the users node completely and without considering the context?)
fclose(3)
may fail to flush the previously written data to disk, thus a failingfclose(3)
is as serious as a failingfwrite(3)
.Previously the code ignored
fclose(3)
failures. This PR improves the explicit callers to check whether it failed. However there is a design issue thatfclose(3)
is also called from theAutoFile
destructor. There is no good way to signal a failure to the caller from the destructor. So, if closing from the destructor fails, then log a message (the file name is unavailable at this time 😞) and terminate the program.Also check for
ferror(3)
after calls tofread(3)
.Other alternatives are:
fflush(3)
after each write to the file (and throw if it fails from theAutoFile::write()
method) and hope thatfclose(3)
will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.AutoFile
so that its destructor cannot fail. Adjust all its users 😭. For example, if the file has been written to, then require the callers to explicitly call theAutoFile::fclose()
method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper forFILE*
which automatically closes the file when it goes out of scope and there are a lot of users ofAutoFile
.