util: check for errors after close and read in AutoFile #29307

vasild · 2024-01-24T14:35:29Z

fclose(3) may fail to flush the previously written data to disk, thus a failing fclose(3) is as serious as a failing fwrite(3).

Previously the code ignored fclose(3) failures. This PR improves the explicit callers to check whether it failed. However there is a design issue that fclose(3) is also called from the AutoFile destructor. There is no good way to signal a failure to the caller from the destructor. So, if closing from the destructor fails, then log a message (the file name is unavailable at this time 😞) and terminate the program.

Also check for ferror(3) after calls to fread(3).

Other alternatives are:

fflush(3) after each write to the file (and throw if it fails from the AutoFile::write() method) and hope that fclose(3) will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.
Throw nevertheless from the destructor. Exception within the exception in C++ I think results in terminating the program without a useful message.
Redesign AutoFile so that its destructor cannot fail. Adjust all its users 😭. For example, if the file has been written to, then require the callers to explicitly call the AutoFile::fclose() method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper for FILE* which automatically closes the file when it goes out of scope and there are a lot of users of AutoFile.

DrahtBot · 2024-01-24T14:35:33Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#29612 (rpc: Optimize serialization and enhance metadata of dumptxoutset output by fjahr)
#27432 (contrib: add tool to convert compact-serialized UTXO set to SQLite database by theStack)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot · 2024-01-24T14:58:28Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

_{Debug: https://github.com/bitcoin/bitcoin/runs/20819752683}

src/streams.cpp

src/test/fuzz/autofile.cpp

maflcko · 2024-01-29T11:32:54Z

I don't think any care close-check needs to be done when reading from a file.

So what about removing the write method from AutoFile, and introduce a new derived class to add it back. This class could Assume that the file was flushed/closed before the destructor is called?

vasild · 2024-02-04T10:48:36Z

I don't think any care close-check needs to be done when reading from a file.

That is my understanding too.

So what about removing the write method from AutoFile, and introduce a new derived class to add it back. This class could Assume that the file was flushed/closed before the destructor is called?

Assume is more for code correctness, not for external errors (like IO error). If it does not fail during testing and on CI, that does not mean IO errors are absent and will not occur on the users' machines. Given that Assume() is removed from production builds, it still means that file corruption could remain unnoticed on live environments, like now on master.

maflcko · 2024-02-05T09:45:05Z

This class could Assume that the file was flushed/closed before the destructor is called?

Assume is more for code correctness, not for external errors (like IO error).

The assumption would be that the close was called before the destructor is called. This is a code correctness question and seems like the perfect fit for Assume. That is, if the Assume fails, the code was missing a close, and the code must be changed to fix the internal bug. The Assume does not in any way check for IO errors.

vasild · 2024-02-05T15:17:17Z

Then I misunderstood this:

This class could Assume that the file was flushed/closed before the destructor is called

How?

maflcko · 2024-02-05T15:34:33Z

AutoFile holds a file, which will be nullptr when it is closed. So if the file is nullptr, then it can be assumed to be flushed.

vasild · 2024-02-05T16:40:43Z

How/when would the file be closed?

maflcko · 2024-02-06T08:00:33Z

How/when would the file be closed?

An AutoFile can be closed by calling the fclose() method in the code.

vasild · 2024-02-06T09:45:13Z

Ok, the users of the class then have to explicitly close. That is 1.4. from the OP, I elaborated it with more words.

DrahtBot · 2024-03-12T15:42:43Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

_{Debug: https://github.com/bitcoin/bitcoin/runs/20850328376}

Explicit callers of `AutoFile::fclose()` should check whether `fclose(3)` failed because this may be due to a failure to flush written data to disk. Such failures after writing to a file should be treated as failures to `fwrite(3)`. Github-Pull: bitcoin#29307 Rebased-From: e9abf95

`fread(3)` does not distinguish between end-of-file and error, and callers must use `feof(3)` and `ferror(3)` to determine which occurred. Github-Pull: bitcoin#29307 Rebased-From: 5543990

achow101 · 2024-04-09T15:43:01Z

Seems like there is some discussion in the description that is a better fit for a brainstorming issue.

DrahtBot · 2024-04-23T06:50:32Z

Converted to draft due to failing CI and inactivity

vasild · 2024-05-10T10:28:50Z

5543990321...1fa61d9edc: rebase and log a message from ~AutoFile() if closing fails and terminate the program.

Ready for review. I think this would be a strict improvement over what we have now in master - a silent ignoring of a possible file corruption.

Sjors · 2024-05-10T16:59:06Z

@vasild CI seems unhappy.

Explicit callers of `AutoFile::fclose()` should check whether `fclose(3)` failed because this may be due to a failure to flush data to disk. Such failures should be treated as failures to `fwrite(3)`. Also terminate the program if `AutoFile::~AutoFile()` is unable to close the file because that is a file write failure which cannot be handled inside the destructor or reported to the caller.

`fread(3)` does not distinguish between end-of-file and error, and callers must use `feof(3)` and `ferror(3)` to determine which occurred.

vasild · 2024-05-14T09:51:45Z

Fixed the CI!

maflcko · 2024-05-14T10:57:05Z

src/streams.cpp

@@ -23,8 +25,9 @@ std::size_t AutoFile::detail_fread(Span<std::byte> dst)

 void AutoFile::read(Span<std::byte> dst)
 {
-    if (detail_fread(dst) != dst.size()) {
-        throw std::ios_base::failure(feof() ? "AutoFile::read: end of file" : "AutoFile::read: fread failed");
+    if (detail_fread(dst) != dst.size() || ferror(m_file)) {


de23848: I still don't understand this. Why would fread succeed to read the requested number of bytes, but then mark the stream with an error? If the goal is to somehow detect errors after the read succeeded, I am not sure this should be a goal. Otherwise, it would be good to explain what real user-facing bug this fixes or introduces.

Either this is a refactor, in which case it seems pointless, or it is a behavior change, in which case it needs to explain why it beneficial and correct.

This was somewhat answered in #29307 (comment) above but let me elaborate - the thing that prompted me to do this change is this text from the fread(3) manual page:

The function fread() does not distinguish between end-of-file and error,
and callers must use feof(3) and ferror(3) to determine which occurred.

I think checking for ferror(3) after fread(3) is a kind of harmless "belt-and-suspenders". If you are sure that if detail_fread(dst) == dst.size() then ferror(m_file) will always be false or feel strongly against this then I could drop the commit de23848 util: check for error after reading from a file.

If you are sure that if detail_fread(dst) == dst.size() then ferror(m_file) will always be false or feel strongly against this then I could drop the commit de23848 util: check for error after reading from a file.

I don't feel strongly in one way or another. I am just saying that with the currently available information, it is not possible to confidently review this and know it is a good change. At a minimum, it should be clear whether this is a refactor or a behavior change, and what user-facing bug this could possibly prevent or fix. (I can't think of one, but maybe I am missing something?)

maflcko · 2024-05-15T12:54:07Z

src/kernel/mempool_persist.cpp

@@ -213,6 +218,7 @@ bool DumpMempool(const CTxMemPool& pool, const fs::path& dump_path, FopenFn mock
                  fs::file_size(dump_path));
    } catch (const std::exception& e) {
        LogInfo("Failed to dump mempool: %s. Continuing anyway.\n", e.what());
+        (void)file.fclose();


Not sure I understand this change either. This seems to be mixing the third alternative in the pull request description with the approach you wanted to take?

Defeats the purpose of a RAII wrapper for FILE* which automatically closes the file when it goes out of scope and there are a lot of users of AutoFile.

It is best if all users of AutoFile explicitly call the fclose() method and check that it succeeds (option 3. from PR description). This is not feasible because there are many users of AutoFile. However, some of those users already do call the fclose() method explicitly. For those users it is easy and I added checks whether fclose() succeeds. DumpMempool() is such a user and the change is:

throw std::runtime_error("FileCommit failed"); - file.fclose(); + if (file.fclose() != 0) { + throw std::runtime_error( + strprintf("Error closing %s: %s", fs::PathToString(file_fspath), SysErrorString(errno))); + } if (!RenameOver(dump_path + ".new", dump_path)) {

For consistency in DumpMempool() I changed all code paths in that function to explicitly call fclose(). There are two codepaths and this added (void)file.fclose(); is the other one. I ignore the return value because this is already failure handling and the function is going to return false. I didn't want to do something like "while handling error1, error2 occurred".

Oh, now I see this is required, because otherwise the node would crash on this error instead of continuing with the current code, if this was missing?

Not sure this is desirable, see also https://github.com/bitcoin/bitcoin/pull/29307/files#r1612065514

DrahtBot · 2024-05-23T16:47:30Z

🐙 This pull request conflicts with the target branch and needs rebase.

maflcko · 2024-05-23T17:22:44Z

src/streams.h

+    ~AutoFile()
+    {
+        if (fclose() != 0 && m_was_written) {
+            LogPrintLevel(BCLog::ALL, BCLog::Level::Error, "Error closing file: %s\n", SysErrorString(errno));
+            std::abort();


not sure about unconditionally crashing the node on any failure here. This may or may not be an issue, so the call site will have to decide about this.

For example, If an RPC user dumps the mempool to an external (corrupt) USB thumb drive, a failure shouldn't crash the node, but notify the user over the RPC.

Also, I am not sure about involving logging in this low-level code. Logging may not be set up, so this may never be printed before the crash.

Also, I am not sure about the log message. It just says that there was an error, but not whose fault it was or what should be done about it.

A better solution overall may be to just call if (m_was_written) Assume(IsNull());. This should highlight all violating call sites during CI, because the test should be deterministic. (If there is a code path that is not deterministically tested, all hopes are lost anyway, and I don't think the solution should be to crash the users node completely and without considering the context?)

DrahtBot added the Utils/log/libs label Jan 24, 2024

vasild mentioned this pull request Jan 24, 2024

Make (Read/Write)BinaryFile work with char vector, use AutoFile #29229

Closed

vasild force-pushed the AutoFile_error_check branch from 8b02e03 to 7c58f6e Compare January 24, 2024 14:58

DrahtBot added the CI failed label Jan 24, 2024

maflcko reviewed Jan 24, 2024

View reviewed changes

src/streams.cpp Outdated Show resolved Hide resolved

vasild commented Jan 24, 2024

View reviewed changes

src/test/fuzz/autofile.cpp Outdated Show resolved Hide resolved

This was referenced Jan 24, 2024

PoC: fuzz chainstate and block managers #29158

Draft

rpc: Optimize serialization disk space of dumptxoutset #26045

Closed

vasild force-pushed the AutoFile_error_check branch from 7c58f6e to 5543990 Compare January 25, 2024 08:30

DrahtBot removed the CI failed label Jan 25, 2024

Sjors mentioned this pull request Jan 25, 2024

CKey: add Serialize and Unserialize #29295

Draft

DrahtBot mentioned this pull request Mar 10, 2024

rpc: Optimize serialization and enhance metadata of dumptxoutset output #29612

Merged

vasild mentioned this pull request Mar 11, 2024

BufferedFile: fclose at destruction #29614

Open

DrahtBot added the CI failed label Mar 12, 2024

DrahtBot mentioned this pull request Apr 1, 2024

contrib: add tool to convert compact-serialized UTXO set to SQLite database #27432

Open

DrahtBot marked this pull request as draft April 23, 2024 06:50

vasild force-pushed the AutoFile_error_check branch from 5543990 to 1fa61d9 Compare May 10, 2024 10:27

vasild marked this pull request as ready for review May 10, 2024 10:28

vasild force-pushed the AutoFile_error_check branch from 1fa61d9 to b44c33a Compare May 10, 2024 10:58

vasild force-pushed the AutoFile_error_check branch from b44c33a to 13f540d Compare May 13, 2024 12:24

vasild marked this pull request as draft May 13, 2024 12:25

vasild added 2 commits May 13, 2024 18:19

util: check for error after reading from a file

de23848

`fread(3)` does not distinguish between end-of-file and error, and callers must use `feof(3)` and `ferror(3)` to determine which occurred.

vasild force-pushed the AutoFile_error_check branch from 13f540d to 661b7d8 Compare May 13, 2024 16:27

DrahtBot removed the CI failed label May 13, 2024

vasild marked this pull request as ready for review May 14, 2024 09:50

maflcko reviewed May 14, 2024

View reviewed changes

vasild force-pushed the AutoFile_error_check branch from 661b7d8 to de23848 Compare May 14, 2024 12:32

maflcko reviewed May 15, 2024

View reviewed changes

DrahtBot added the Needs rebase label May 23, 2024

maflcko reviewed May 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

util: check for errors after close and read in AutoFile #29307

util: check for errors after close and read in AutoFile #29307

vasild commented Jan 24, 2024 •

edited

DrahtBot commented Jan 24, 2024 •

edited

DrahtBot commented Jan 24, 2024

maflcko commented Jan 29, 2024

vasild commented Feb 4, 2024

maflcko commented Feb 5, 2024

vasild commented Feb 5, 2024

maflcko commented Feb 5, 2024

vasild commented Feb 5, 2024

maflcko commented Feb 6, 2024

vasild commented Feb 6, 2024

DrahtBot commented Mar 12, 2024

achow101 commented Apr 9, 2024

DrahtBot commented Apr 23, 2024

vasild commented May 10, 2024 •

edited

Sjors commented May 10, 2024

vasild commented May 14, 2024

maflcko May 14, 2024

vasild May 14, 2024

maflcko May 23, 2024

maflcko May 15, 2024

vasild May 23, 2024

maflcko May 23, 2024

DrahtBot commented May 23, 2024

maflcko May 23, 2024

util: check for errors after close and read in AutoFile #29307

Are you sure you want to change the base?

util: check for errors after close and read in AutoFile #29307

Conversation

vasild commented Jan 24, 2024 • edited

DrahtBot commented Jan 24, 2024 • edited

Code Coverage

Reviews

Conflicts

DrahtBot commented Jan 24, 2024

maflcko commented Jan 29, 2024

vasild commented Feb 4, 2024

maflcko commented Feb 5, 2024

vasild commented Feb 5, 2024

maflcko commented Feb 5, 2024

vasild commented Feb 5, 2024

maflcko commented Feb 6, 2024

vasild commented Feb 6, 2024

DrahtBot commented Mar 12, 2024

achow101 commented Apr 9, 2024

DrahtBot commented Apr 23, 2024

vasild commented May 10, 2024 • edited

Sjors commented May 10, 2024

vasild commented May 14, 2024

maflcko May 14, 2024

Choose a reason for hiding this comment

vasild May 14, 2024

Choose a reason for hiding this comment

maflcko May 23, 2024

Choose a reason for hiding this comment

maflcko May 15, 2024

Choose a reason for hiding this comment

vasild May 23, 2024

Choose a reason for hiding this comment

maflcko May 23, 2024

Choose a reason for hiding this comment

DrahtBot commented May 23, 2024

maflcko May 23, 2024

Choose a reason for hiding this comment

vasild commented Jan 24, 2024 •

edited

DrahtBot commented Jan 24, 2024 •

edited

vasild commented May 10, 2024 •

edited