Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: io: CopierTo and CopierFrom interfaces #67074

Open
neild opened this issue Apr 26, 2024 · 7 comments
Open

proposal: io: CopierTo and CopierFrom interfaces #67074

neild opened this issue Apr 26, 2024 · 7 comments
Labels
Milestone

Comments

@neild
Copy link
Contributor

neild commented Apr 26, 2024

Proposal Details

The io.Copy and io.CopyBuffer functions will use WriteTo when the source supports it and ReadFrom when the destination supports it.

There are cases when a type can efficiently support WriteTo or ReadFrom, but only some of the time. For example, CL 472475 added an WriteTo method to *os.File which performs a more efficient copy on Linux systems when the destination is a Unix or TCP socket. In all other cases, it falls back to io.Copy.

The io.Copy fallback means that io.CopyBuffer with an *os.File source will no longer use the provided buffer, because the buffer is not plumbed through to WriteTo. (It also resulted in https://go.dev/issue/66988, in which the fallback resulted in the fast path in *net.TCPConn.ReadFrom not being taken.)

There have been previous proposals to fix this problem:

Those issues also contain some links to various other instances of this problem turning up, but I think the *os.File.WriteTo case is a sufficient example of a problem that needs solving, since io.CopyBuffer no longer works as expected on files.

I propose that we add two new interfaces to the io package:

package io

// CopierTo is the interface that wraps the CopyTo method.
//
// CopyTo writes data to w until there is no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
//
// When len(buf) > 0, CopyTo may use the provided buffer as temporary space.
//
// CopyTo may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy to w.
//
// The Copy function uses CopierTo if available.
type CopierTo interface {
	CopyTo(w Writer, buf []byte) (n int64, err error)
}

// CopierFrom is the interface that wraps the CopyFrom method.
//
// CopyFrom reads data from r until EOF or error.
// The return value n is the number of bytes read.
// Any error except EOF encountered during the read is also returned.
//
// When len(buf) > 0, CopyFrom may use the provided buffer as temporary space.
//
// CopyFrom may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy from r.
//
// The Copy function uses CopierFrom if available.
type CopierFrom interface {
	CopyFrom(r Reader, buf []byte) (n int64, err error)
}

The CopierTo and CopierFrom interfaces supersede WriterTo and ReaderFrom when available. They provide a way to plumb the copy buffer down through the copy operation, and permit an implementation to implement a fast path for some subset of sources or destinations while deferring to io.Copy for other cases.

We will update io.Copy and io.CopyBuffer to prefer CopierTo and CopierFrom when available.

We will update *os.File and *net.TCPConn (and possibly other types) to add CopyTo and CopyFrom methods.

@jfrech
Copy link

jfrech commented Apr 26, 2024

Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(,errors.ErrUnsupported) error?

@neild
Copy link
Contributor Author

neild commented Apr 26, 2024

Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(,errors.ErrUnsupported) error?

Yes.

@jfrech
Copy link

jfrech commented Apr 28, 2024

Maybe I'm missing the point but what precisely is the issue with solving this how you opt out of any unwanted type unerasure by forcing a barrier struct (as already mentioned in #16474)? If you are using io.CopyBuffer, chances are you specially prepared for the buffer, so we are talking about a deliberate, pin-pointed optimization, not implicit system-wide optimizations which io.WriterTo do help with.

io.CopyBuffer is not alone in exhibiting behaviour not expressible in the type system which one needs to be wary about: compress/zlib.NewReader may discard data if the reader cannot unerase to an exact one and bufio.(*Reader).WriteTo can even call ReadFrom (although undocumented). If you want to prohibit a function from utilizing its fast paths, hide your own capabilities.

@neild
Copy link
Contributor Author

neild commented Apr 29, 2024

The motivation for this proposal is to fix io.Copy when used with common standard library types such as *os.File and *net.TCPConn.

It's perhaps not immediately obvious that io.Copy is broken. Let's look at a concrete example from the net/http package (https://go.googlesource.com/go/+/refs/tags/go1.22.2/src/net/http/transfer.go#412):

func (t *transferWriter) doBodyCopy(dst io.Writer, src io.Reader) (n int64, err error) {
	buf := getCopyBuf()
	defer putCopyBuf(buf)
	n, err = io.CopyBuffer(dst, src, buf)
	if err != nil && err != io.EOF {
		t.bodyReadError = err
	}
	return
}

This function copies from a user-provided io.Reader to an io.Writer. It doesn't know anything about the source and destination, although the Writer has usually been created by the net/http package. It uses io.CopyBuffer and does buffer pooling to minimize allocations. The copy operation takes advantage of sendfile when available.

Let's consider the case where we're on macOS, copying from an *os.File created by os.Pipe, and to a *net.TCPConn.

In this case:

  • doBodyCopy allocates a copy buffer from its pool.
  • io.CopyBuffer notices that the source supports WriteTo and calls it.
  • *os.File.WriteTo doesn't have any special handling on macOS, so it wraps the source in a type that removes the WriteTo method and calls io.Copy.
  • io.Copy notices that the destination supports ReadFrom and calls it.
  • *net.TCPConn.ReadFrom calls sendfile, which fails, because macOS doesn't support sendfile from a pipe.
  • *net.TCPConn.ReadFrom wraps the destination in a type that removes the ReadFrom method and calls io.Copy.f
  • io.Copy allocates a buffer and does the copy.

Note that there are three io.Copy operations in the call stack. We lost the user-allocated buffer provided to CopyBuffer after the first one. It is entirely not obvious to the original caller (doBodyCopy) that the buffer provided to CopyBuffer won't be used; doBodyCopy was, after all, operating on a plain io.Reader and io.Writer.

We could change doBodyCopy to mask any ReadFrom or WriteTo methods on the reader and writer, at the cost of disabling the sendfile optimization when we do want it. Requiring every caller of io.CopyBuffer everywhere to do this would be unfortunate.

This is a mess.

At a high level, I have two goals. When using common types from the standard library (files, network sockets):

  • io.Copy (and related functions) should not recursively call itself; and
  • io.CopyBuffer should use the buffer provided to it.

Arguably, the recursive calls to Copy aren't necessarily a problem, but they add so much confusion to the call stack that it has become very difficult to understand what's actually going on in a copy operation. The CopyBuffer buffer getting lost is, I think, at this point a straight up bug: As of Go 1.22, CopyBuffer with an *os.File as the source will never use the provided buffer. The fact that this changed between Go 1.21 and 1.22 seems to me to be a plain regression.

The Go compatibility promise closes off most avenues for fixing this:

  • We document io.Copy as calling ReadFrom/WriteTo when available.
  • Changing io.CopyBuffer to not call ReadFrom/WriteTo seems likely to cause regressions in code that expects the current behavior.
  • We can't remove the WriteTo method from *os.File.
  • We can't remove the ReadFrom method from *net.TCPConn.
  • We can't start returning errors.ErrUnsupported from existing ReadFrom or WriteTo methods.

(Of these options, the last one seems like the most feasible: If could return ErrUnsupported from *os.File.WriteTo and *net.TCPConn.ReadFrom when the fast sendfile path is not available, then we can get by without any new APIs. That was #21904, which was rejected. Perhaps it should be reopened?)

The root of the problem is that WriteTo and ReadFrom aren't the right interface between io.Copy and an io.Reader/io.Writer:

  • Support for fast-path copies may be conditional on the pairing of (source, destination), and the WriteTo/ReadFrom are necessarily associated with only one or the other. There needs to be a way for a type to support fast copies, with a fallback to the usual copy path.
  • If a type needs a buffer to support its copy, it has WriteTo/ReadFrom have no access to the buffer provided to io.CopyBuffer.

The proposed CopierTo/CopierFrom aim to fix io.Copy on os and net package types by providing an interface that does include the necessary features.

Adding yet more magic methods to dig us out of the hole caused by the existing magic methods does seem somewhat odd. It's quite possible that if we were designing io.Copy from the start, there would be a better way to do this. But I don't see another way out of this given the current constraints, I believe this proposal does address the problem, and I think that the addition of *os.File.WriteTo in Go 1.22 has put us at a point where surprising io.Copy behavior is common enough that we need to do something to address the situation.

@ianlancetaylor
Copy link
Contributor

We could constrain this by using an internal package, and by looking for methods that return types defined in that internal package. That would mean that only types in the standard library would implement the new interfaces. On the one hand that would prevent other packages from taking advantage of these new methods. On the other hand it would constrain the additional complexity.

For example

package iofd

type FD uintptr

type SysFDer interface {
    SysFD() FD
    CopyBetween(dst, src FD, buf []byte) (int64, bool, error)
}

Then in package io

    srcfd, srcOK := src.(iofd.SysFDer)
    dstfd, dstOK := dst.(iofd.SysFDer)
    if srcOK && dstOK {
        len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)
        if handled {
            return len, err
        }
    }

@neild
Copy link
Contributor Author

neild commented Apr 30, 2024

We could constrain this to be internal-only, but I don't think it's worth it. We'd still have the complexity of the new methods, we'd just not be allowing anyone else to take advantage of the benefits.

@ianlancetaylor
Copy link
Contributor

I think the complexity of the new methods is significantly less if nobody else can implement them. In particular we can design the new methods so that they work on the pair of source and destination, which is what we need. One of the problems with the current methods is that they only work on the source or on the destination, which is how we've gotten into the position of needing to use both, and therefore needing to turn off both, and therefore increasing complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

4 participants