
Stop sending network requests in seek operation for fetching chunked blobs #126

Open · qweeah opened this issue Apr 2, 2022 · 6 comments

qweeah (Contributor) commented Apr 2, 2022

If an oras-go user wants to get a blob chunk from a remote target, they can (a sketch of this flow follows the list):

  1. fetch the blob's io reader via a resolved descriptor
  2. seek the reader to a target offset
  3. read out the chunked content
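A minimal sketch of that flow against oras-go v2, with a placeholder registry reference and digest, and error handling reduced to panics for brevity:

package main

import (
	"context"
	"fmt"
	"io"

	"oras.land/oras-go/v2/registry/remote"
)

func main() {
	ctx := context.Background()

	// Placeholder repository; substitute a real registry and repo.
	repo, err := remote.NewRepository("registry.example.com/hello-world")
	if err != nil {
		panic(err)
	}

	// 1. Fetch the blob reader via a resolved descriptor (placeholder digest).
	desc, err := repo.Blobs().Resolve(ctx, "sha256:0000000000000000000000000000000000000000000000000000000000000000")
	if err != nil {
		panic(err)
	}
	rc, err := repo.Blobs().Fetch(ctx, desc)
	if err != nil {
		panic(err)
	}
	defer rc.Close()

	// 2. Seek to the target offset. Today each Seek call on a
	//    seekable blob reader costs one HTTP round trip.
	if seeker, ok := rc.(io.Seeker); ok {
		if _, err := seeker.Seek(1024, io.SeekStart); err != nil {
			panic(err)
		}
	}

	// 3. Read out the chunked content.
	chunk := make([]byte, 512)
	n, err := io.ReadFull(rc, chunk)
	if err != nil && err != io.ErrUnexpectedEOF {
		panic(err)
	}
	fmt.Printf("read %d bytes at offset 1024\n", n)
}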

In the current implementation, every call to the seek operation sends an HTTP request to the remote registry. So if a user seeks multiple times before reading out a single chunk, more than one network request is sent.

We should change the implementation to maintain an internal position during the seek operation, and only send the ranged network request when the chunked content is actually read.

This is the same design used by common file systems: when reading one chunk of a large file, fseek does not cause any I/O operation; only fread does.
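A minimal sketch of that lazy design as a wrapper over net/http (type and field names here are illustrative, not the actual oras-go internals):

package httputil

import (
	"errors"
	"fmt"
	"io"
	"net/http"
)

// lazyReadSeeker defers the network round trip: Seek only updates the
// logical offset, and the ranged GET is issued on the next Read.
type lazyReadSeeker struct {
	client *http.Client
	req    *http.Request // original GET request for the blob
	body   io.ReadCloser // current response body; nil when a re-fetch is pending
	offset int64         // current logical position
	size   int64         // total blob size
}

func (l *lazyReadSeeker) Seek(offset int64, whence int) (int64, error) {
	var abs int64
	switch whence {
	case io.SeekStart:
		abs = offset
	case io.SeekCurrent:
		abs = l.offset + offset
	case io.SeekEnd:
		abs = l.size + offset
	default:
		return 0, errors.New("invalid whence")
	}
	if abs < 0 {
		return 0, errors.New("negative position")
	}
	if abs != l.offset {
		// Bookkeeping only: drop the stale body, send no HTTP request.
		if l.body != nil {
			l.body.Close()
			l.body = nil
		}
		l.offset = abs
	}
	return abs, nil
}

func (l *lazyReadSeeker) Read(p []byte) (int, error) {
	if l.body == nil {
		// The ranged request happens lazily, only when data is needed.
		req := l.req.Clone(l.req.Context())
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", l.offset))
		resp, err := l.client.Do(req)
		if err != nil {
			return 0, err
		}
		if resp.StatusCode != http.StatusPartialContent {
			resp.Body.Close()
			return 0, fmt.Errorf("unexpected status %s", resp.Status)
		}
		l.body = resp.Body
	}
	n, err := l.body.Read(p)
	l.offset += int64(n)
	return n, err
}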

@qweeah qweeah changed the title Stop sending out seek operation for fetching chunked blobs Stop sending network requests in seek operation for fetching chunked blobs Apr 2, 2022
@shizhMSFT shizhMSFT added this to the future milestone May 16, 2022
sparr (Contributor) commented Jun 20, 2023

What is in that HTTP request? Can you link to the relevant parts of the code?

qweeah (Contributor, Author) commented Jun 26, 2023

What is in that HTTP request? Can you link to the relevant parts of the code?

Thanks for the reminder. This is about the implementation of the seekable reader utility in

func (rsc *readSeekCloser) Seek(offset int64, whence int) (int64, error) {

which is used in the code below so that the blob store fetch operation supports resumable pulling:

// check server range request capability.
// Docker spec allows range header form of "Range: bytes=<start>-<end>".
// However, the remote server may still not be RFC 7233 compliant.
// Reference: https://docs.docker.com/registry/spec/api/#blob
if rangeUnit := resp.Header.Get("Accept-Ranges"); rangeUnit == "bytes" {
	return httputil.NewReadSeekCloser(s.repo.client(), req, resp.Body, target.Size), nil
}
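To sparr's question about what is in that HTTP request: each Seek today effectively replays the blob GET with a Range header starting at the new offset, roughly like this (a simplified sketch, not the exact oras-go code):

package httputil

import (
	"fmt"
	"net/http"
)

// seekRequest sketches the request the current Seek implementation
// sends: the original blob GET, re-issued with a Range header that
// starts at the new offset. One such round trip happens per Seek.
func seekRequest(client *http.Client, orig *http.Request, offset int64) (*http.Response, error) {
	req := orig.Clone(orig.Context())
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))
	return client.Do(req) // the HTTP request fired from inside Seek
}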

dtroyer-salad commented
Is this still on anyone's radar? I ran into this behaviour while working on resumable downloads and just removed it.

@shizhMSFT shizhMSFT added enhancement New feature or request perf Performance related issues labels Mar 19, 2024
Wwwsylvia (Member) commented
I ran into this behaviour while working on resumable downloads and just removed it.

Hi @dtroyer-salad, we have not put effort into this perf enhancement because we were not aware of a real use case. Could you share your resumable download scenarios?

dtroyer-salad commented
Pulling large images over networks that we do not control has led to a number of occasions where downloads get 'stuck' on a large layer, restarting the entire layer from scratch repeatedly. As a specific example, a ~20GB image had a single ~18GB layer that never completed downloading due to the restarts.

My current implementation drops readSeekCloser altogether in the resume code path in favor of just setting the Range header for the remaining blob bytes not already on disk.
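A minimal sketch of that resume approach, assuming the partial layer is already on disk (the function name and direct blob URL are illustrative):

package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
)

// resumeBlob requests only the missing suffix of a blob instead of
// restarting the layer from scratch. partial holds the bytes already
// downloaded; url is a hypothetical direct blob URL.
func resumeBlob(ctx context.Context, client *http.Client, url string, partial *os.File) error {
	have, err := partial.Seek(0, io.SeekEnd) // bytes already on disk
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", have))
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("server ignored Range: %s", resp.Status)
	}
	// Append the remaining bytes after the existing content.
	_, err = io.Copy(partial, resp.Body)
	return err
}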

ktarplee (Contributor) commented Apr 15, 2024

Another (partial) use case for io.ReadSeekCloser is in a copy, @Wwwsylvia. oras-go currently does this...

oras-go/copy.go, lines 347 to 359 at 11d464f:

// doCopyNode copies a single content from the source CAS to the destination CAS.
func doCopyNode(ctx context.Context, src content.ReadOnlyStorage, dst content.Storage, desc ocispec.Descriptor) error {
	rc, err := src.Fetch(ctx, desc)
	if err != nil {
		return err
	}
	defer rc.Close()
	err = dst.Push(ctx, desc, rc)
	if err != nil && !errors.Is(err, errdef.ErrAlreadyExists) {
		return err
	}
	return nil
}

rc is an io.ReadSeekCloser if the server supports Range requests.

If dst.Push() to a remote repository fails due to a network issue, it is never retried (even when using the retry client). It could be retried by seeking back to the start and replaying the HTTP request. In other words, we could set req.GetBody to seek to 0, and then the retry logic would work (see the sketch below).

This same resiliency feature applies to #338
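A minimal sketch of that idea, assuming the push body is an io.ReadSeekCloser (the function and URL are illustrative, not the oras-go API):

package main

import (
	"context"
	"io"
	"net/http"
)

// newPushRequest wires req.GetBody so a retrying transport can rewind
// the seekable blob reader and replay the upload after a network error.
func newPushRequest(ctx context.Context, url string, rc io.ReadSeekCloser, size int64) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodPut, url, rc)
	if err != nil {
		return nil, err
	}
	req.ContentLength = size
	// A retry layer calls GetBody to obtain a fresh body for the replay;
	// seeking back to offset 0 lets the same reader serve it again.
	req.GetBody = func() (io.ReadCloser, error) {
		if _, err := rc.Seek(0, io.SeekStart); err != nil {
			return nil, err
		}
		return io.NopCloser(rc), nil
	}
	return req, nil
}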
