Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 24165 - Failed readf leaves File in inconsistent state #8826

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pbackus
Copy link
Contributor

@pbackus pbackus commented Oct 14, 2023

Previously, a failed call to readf resulted in multiple copies of the
same LockingTextReader being destroyed. Since LockingTextReader's
destructor calls ungetc on the last-read character, this caused that
character to appear multiple times in subsequent reads from the File.

This change ensures that the destructor in question is only run once by
making LockingTextReader a reference-counted type.

Ideally, to avoid unnecessary overhead, this issue would have been fixed
by making LockingTextReader non-copyable. However, non-copyable ranges
are poorly-supported by Phobos, and this approach would have required
extensive changes to several other modules, including changes to the
interfaces of some public symbols.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @pbackus! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
24165 normal Failed readf leaves File in inconsistent state

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8826"

@schveiguy
Copy link
Member

schveiguy commented Oct 15, 2023

Given that LockingTextReader is used mainly for performance, this is going to remove some of the benefits of it, for something that happens very rarely.

Don't we just need to ensure that only one legit instance is valid at a time? What about adding a copy constructor that hollows out the original?

Previously, a failed call to readf resulted in multiple copies of the
same LockingTextReader being destroyed. Since LockingTextReader's
destructor calls ungetc on the last-read character, this caused that
character to appear multiple times in subsequent reads from the File.

This change ensures that the destructor in question is only run once by
making LockingTextReader a reference-counted type.

Ideally, to avoid unnecessary overhead, this issue would have been fixed
by making LockingTextReader non-copyable. However, non-copyable ranges
are poorly-supported by Phobos, and this approach would have required
extensive changes to several other modules, including changes to the
interfaces of some public symbols.
@pbackus
Copy link
Contributor Author

pbackus commented Oct 15, 2023

Whoops, my bad, this affects LockingTextReader, not Writer.

@pbackus
Copy link
Contributor Author

pbackus commented Oct 15, 2023

What about adding a copy constructor that hollows out the original?

Seems like this could break code that currently works (e.g., anything that makes a copy and then calls .front on the original). Maybe we can get away with it in practice though.

@schveiguy
Copy link
Member

The question is, what is intended to be supported? Typically with input-only ranges (not forward ranges), only one instance should be used. Not preventing copying on input ranges was a design mistake, but we have to live with it now.

even if you don't hollow it out, what about removing the character to put back so it's not put back twice? That also seems like a kludge as well.

@pbackus
Copy link
Contributor Author

pbackus commented Oct 15, 2023

even if you don't hollow it out, what about removing the character to put back so it's not put back twice?

Isn't that functionally the same as hollowing it out?

I guess I could add a bool flag that gets set to false on copy, and only call ungetc if the flag is true. I'll give it a try locally and amend the PR if it works. EDIT: actually it'd have to be true in the copy and false in the original.

@pbackus
Copy link
Contributor Author

pbackus commented Oct 15, 2023

Actually never mind, LockingTextReader is public, which means we absolutely cannot assume that code using it follows input range best practices and only ever uses one instance at a time. If it were internal to Phobos we could maybe get away with a kludge, but for a public API, we have to code defensively.

@pbackus
Copy link
Contributor Author

pbackus commented Oct 15, 2023

Attempted the hollow-out approach anyway and it turns out that replacing LockingTextReader's postblit with a copy ctor is not going to be an option—it requires doing the same to File, which in turn breaks all kinds of stuff.

@schveiguy
Copy link
Member

Another idea -- since File is a refcounted type anyway, can we stuff the char in there? Still doesn't help the performance, but at least we don't have the overhead of allocating another refcounted impl.

@schveiguy
Copy link
Member

Another idea -- upon creation of the LockingTextReader, store a bool in the File impl that indicates whether the ungtc has been called once. Then on destruction, if the hasChar is true, only do ungetc if that hasn't been done yet.

@Imperatorn
Copy link
Contributor

Another idea -- upon creation of the LockingTextReader, store a bool in the File impl that indicates whether the ungtc has been called once. Then on destruction, if the hasChar is true, only do ungetc if that hasn't been done yet.

Couldn't something like this work @pbackus ?

@pbackus
Copy link
Contributor Author

pbackus commented Nov 20, 2023

The problem with that suggestion is that it calls ungetc on the destruction of the first instance, not the last instance. Which means that if you copy the reader while hasChar == true, destroy the copy, and then continue reading from the original, the character will be duplicated.

If that sounds unrealistic, keep in mind that passing by value creates and destroys a copy. For example, a function like the following would trigger the failure mode described above:

// Note: takes range by value
void enforceNonEmpty(Range)(Range range)
if (isInputRange!Range)
{
    enforce(!r.empty, "Expected non-empty range");
}

Now, maybe our official position is that such code is not supported—if you copy a range, and then read from the original, you get what you get and there are no guarantees, even if you never read from the copy. But that seems like unnecessarily hostile API design to me.

@schveiguy
Copy link
Member

The problem with that suggestion is that it calls ungetc on the destruction of the first instance, not the last instance.

We can have 3 states:

  1. we have no char in the buffer
  2. we have a char in our buffer, and the next char has not been fetched before
  3. we have a char in our buffer, and the stream has had a char put back into it from a copy of LockingTextReader.

We already have a hasChar in the LockingFileReader. That covers whether there is a char in the buffer or not.

We need a second boolean, which can be stored in the main File refcounted struct, which is set to true when ungetc is called. Then we can cover this situation fine.

What I'm trying to avoid is yet another allocation for something that could easily be short-lived. We already have the pointer stored in the private File _f, and we have to go through that anyway to access the FILE *. So there is no "extra cost" by storing another state boolean in that Impl struct.

You can even encapsulate this functionality in 2 new private functions inside the File: ungetcIfNotDone(char) and getcAlreadyHadChar() or something like that.

I think this works, but I haven't tried it out or thought of all the details. The main goal should be to avoid putting back data that was never on the stream in the first place. Even if this is at the expense of "just don't do that" type warnings, fine. The secondary goal is to try and avoid reading data that isn't there. I think we can do both, without resorting to an extra indirection and an extra allocation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants