Support of "HTTP/1.1 byte range request" in file retrieval #1599

Closed
hkawaji opened this issue Sep 9, 2018 · 17 comments

hkawaji commented Sep 9, 2018

I have a feature request for Zenodo: could the Zenodo server support HTTP/1.1 byte range requests (https://tools.ietf.org/html/rfc7233)?

The Zenodo platform is already incredible, and supporting byte range requests would further increase the value of deposited data, since some applications rely on them, in particular when dealing with large files.

To make my point clear, here is an example of how a byte range request works. GitHub (raw.githubusercontent.com) supports byte range requests, as shown below:

###
### The entire README file is retrieved and processed locally
###
$ curl  https://raw.githubusercontent.com/zenodo/zenodo/master/README.rst |head -5 | tail -1
    Zenodo is free software; you can redistribute it

###
### Only the specified bytes of the file are retrieved, so no local processing is required
###
$ curl -H "range: bytes=72-125"  https://raw.githubusercontent.com/zenodo/zenodo/master/README.rst 
    Zenodo is free software; you can redistribute it

However, the byte range request is ignored by zenodo.org:

###
### The entire file is retrieved
###
$ curl   https://zenodo.org/record/1407145/files/DOI_Test.txt
This is a test of the Zenodo DOI functionality for GitLab. 

###
### Only a few bytes are requested, but the entire file is retrieved
###
$ curl -H "range: bytes=6-7"  https://zenodo.org/record/1407145/files/DOI_Test.txt
This is a test of the Zenodo DOI functionality for GitLab.
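
The same check can be scripted. The following is a minimal Python sketch, not part of the original report: it sends a Range header and reports whether the server honored it. A server that supports range requests answers with status 206 and a Content-Range header, while a server that ignores the header answers 200 with the whole file, as in the zenodo.org example above. The helper names (`range_header`, `range_was_honored`, `fetch_range`) are made up for illustration.

```python
import urllib.request


def range_header(start, end):
    """Build a Range header value for the inclusive byte range [start, end]."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"bytes={start}-{end}"


def range_was_honored(status, content_range):
    """True for a partial response: status 206 plus a Content-Range header."""
    return status == 206 and content_range is not None


def fetch_range(url, start, end):
    """Request bytes [start, end] of url and return (honored, body).

    If honored is False, the server ignored the Range header and body
    holds the entire file.
    """
    request = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    with urllib.request.urlopen(request) as response:
        body = response.read()
        honored = range_was_honored(response.status, response.headers.get("Content-Range"))
    return honored, body
```

Calling `fetch_range(url, 72, 125)` with the README URL above corresponds to the `curl -H "range: bytes=72-125"` command.
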

kpalin commented Jan 31, 2019

I'll second this. It would be very useful, e.g., for accessing genomics datasets directly with tabix. It seems to require only a config change in the Zenodo web server: setting 'max_ranges' to a positive number.

Is there some technical reason not to do that?

slint (Member) commented Jan 31, 2019

Our file storage backend at the moment is not optimized to serve HTTP range requests (meaning that enabling this feature would potentially lead to significant slowdowns for the file upload/download API). Of course, there are people working on making it possible, though we can't give an accurate ETA on it...

@rabernat

I just wanted to add my 👍 to state that enabling range requests would be very useful for geospatial data formats. Cloud Optimized GeoTIFF in particular would benefit a lot from this. Allowing range requests could really reduce the bandwidth needed from zenodo.


zhanxw commented Oct 10, 2020

> Our file storage backend at the moment is not optimized to serve HTTP range requests (meaning that enabling this feature would potentially lead to significant slowdowns for the file upload/download API). Of course, there are people working on making it possible, though we can't give an accurate ETA on it...

Many people cannot download large genetic files (several GB), e.g.,
#460 (comment)

Some have to retry many times, and that actually wastes your bandwidth...


thengl commented Feb 4, 2021

For our project it is also important that we can use Cloud-Optimized GeoTIFFs (see e.g. https://zenodo.org/record/4483227) directly from Zenodo. Figshare apparently works with COGs, but Zenodo does not? We wrote a tutorial for users on how to get small chunks of data using COG files.


oeway commented Mar 25, 2021

Could you please support this?

We need it to serve large image files (in Zarr format) in chunks, which allows us to visualize the files in the browser instantly. It won't be possible for the browser to download an entire file of, e.g., 10 GB and then display it.

@jakirkham

Just noting the value for the Zarr use case. Thanks all for your work on Zenodo!

@rabernat

For Zarr, we could hypothetically get zenodo working today, without any changes. Zenodo does not support directories, but if we could map a regular zarr directory store to some sort of flat hierarchy, via a special character, we could make it work. For example, if the special character is __

.zgroup
foo__.zarray
foo__.zattrs
foo__0.0
foo__0.1

etc.
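
A minimal Python sketch of that mapping, assuming the separator is `__` and that Zarr keys use `/` between path components. The function names are hypothetical and not part of any Zarr API; the sketch also rejects keys that already contain the separator, since the reverse mapping would be ambiguous for them.

```python
SEPARATOR = "__"  # assumed special separator, as in the listing above


def flatten_key(key, separator=SEPARATOR):
    """Map a directory-store key such as 'foo/.zarray' to a flat file name."""
    if separator in key:
        raise ValueError(f"key {key!r} already contains {separator!r}")
    return key.replace("/", separator)


def unflatten_key(name, separator=SEPARATOR):
    """Map a flat file name such as 'foo__0.1' back to the store key."""
    return name.replace(separator, "/")


if __name__ == "__main__":
    for key in [".zgroup", "foo/.zarray", "foo/.zattrs", "foo/0.0", "foo/0.1"]:
        print(key, "->", flatten_key(key))
```
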

@jakirkham

Could you please raise an issue here ( https://github.com/zarr-developers/zarr-specs/issues )?


oeway commented Mar 25, 2021

@rabernat I'm afraid that won't scale, because Zenodo only allows a maximum of 100 files.

Total files size limit per record is 50GB (max 100 files). One-time 100GB quota can be requested and granted on a case-by-case basis.

source: https://www.openaire.eu/technical-requirements

slint (Member) commented Nov 17, 2021

HTTP Range support is now available for file downloads!

# Fetches only the last two lines of the CSV file
curl -r -182 https://zenodo.org/record/5702574/files/articles_by_influence.csv
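
For reference, `curl -r -182` sends the header `Range: bytes=-182`, a suffix range that asks for the last 182 bytes of the file. The following Python sketch (helper names are illustrative assumptions) builds that header and shows which absolute byte positions a suffix range resolves to for a file of known size. Per RFC 7233, a suffix longer than the file selects the whole file.

```python
def suffix_range_header(length):
    """Build a Range header value asking for the last `length` bytes."""
    if length <= 0:
        raise ValueError("suffix length must be positive")
    return f"bytes=-{length}"


def resolve_suffix_range(length, size):
    """Return the inclusive (first, last) byte positions a suffix range selects.

    `size` is the total file size in bytes and must be positive.
    """
    if length <= 0 or size <= 0:
        raise ValueError("length and size must be positive")
    first = max(size - length, 0)  # a suffix longer than the file selects all of it
    return first, size - 1


if __name__ == "__main__":
    print(suffix_range_header(182))
    print(resolve_suffix_range(182, 1000))
```
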

slint closed this as completed Nov 17, 2021

@rabernat

Wow this is huge!


oeway commented Nov 17, 2021

This is awesome! However, with the recent update, CORS is disabled ;( Submitted an issue here: #2246


iprapas commented Oct 25, 2022

Just found this, and it is quite awesome that Zenodo supports this. What is the recommended way to upload a Zarr store to Zenodo, given that a Zarr store can contain a lot of subfiles?

@rabernat

It's probably best to use a zipped store, which is discoverable and accessible via range requests.
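
To illustrate why a zip archive works well with range requests: the zip index (the central directory) sits at the end of the file, so a reader can fetch it with one small ranged read and then fetch only the members it needs. The sketch below uses only the Python standard library and is not how zarr or fsspec implement this. `RangeReader` and `read_member` are hypothetical names, and `fetch(start, end)` stands in for an HTTP request with `Range: bytes=start-end`.

```python
import io
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by ranged reads.

    `fetch(start, end)` is an assumed callable returning the bytes of the
    inclusive range [start, end], like an HTTP Range request would.
    """

    def __init__(self, fetch, size):
        super().__init__()
        self._fetch = fetch
        self._size = size
        self._pos = 0
        self.bytes_fetched = 0  # running total, to show how little is read

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def readinto(self, buffer):
        count = min(len(buffer), self._size - self._pos)
        if count <= 0:
            return 0  # end of file
        data = self._fetch(self._pos, self._pos + count - 1)
        buffer[:len(data)] = data
        self._pos += len(data)
        self.bytes_fetched += len(data)
        return len(data)


def read_member(fetch, size, name):
    """Read one archive member; return (content, total bytes fetched)."""
    reader = RangeReader(fetch, size)
    with zipfile.ZipFile(reader) as archive:
        content = archive.read(name)
    return content, reader.bytes_fetched
```

With an in-memory archive standing in for the remote file, `read_member(lambda s, e: blob[s:e + 1], len(blob), "foo/.zarray")` reads a single small member while fetching only a small fraction of `blob`.
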


iprapas commented Oct 25, 2022

Thank you Ryan! I wonder if/how I could then open the zipped store using xarray. It looks like the zarr ZipStore only accepts files from the local filesystem: https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.ZipStore

@rabernat

Yeah I think you have to go through fsspec. Here's an example: pangeo-forge/staged-recipes#90 (comment)
