Support of "HTTP/1.1 byte range request" in file retrieval #1599

hkawaji · 2018-09-09T02:35:52Z

I have a feature request for Zenodo: could the Zenodo server support HTTP/1.1 byte range requests (https://tools.ietf.org/html/rfc7233)?

The Zenodo platform is already incredible, and support for byte range requests would further increase the value of deposited data, since some applications rely on byte range requests, in particular when dealing with large files.

To make my point clear, here is an example of how a byte range request works. GitHub (raw.githubusercontent.com) supports byte range requests, as shown below:

###
### The entire README file is retrieved and then processed locally
###
$ curl  https://raw.githubusercontent.com/zenodo/zenodo/master/README.rst |head -5 | tail -1
    Zenodo is free software; you can redistribute it

###
### Only the specified bytes of the file are retrieved, so no local processing is required
###
$ curl -H "range: bytes=72-125"  https://raw.githubusercontent.com/zenodo/zenodo/master/README.rst 
    Zenodo is free software; you can redistribute it
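To make the arithmetic in the example above explicit: a range of the form `bytes=first-last` is inclusive at both ends, so `bytes=72-125` asks for 54 bytes. The Python sketch below resolves the three single-range forms described in RFC 7233 against a known file size. `resolve_range` is a hypothetical helper written for illustration; it is not part of curl or of any server.

```python
import re


def resolve_range(header: str, size: int):
    """Return inclusive (start, end) byte offsets for a single-range header.

    Returns None if the header is malformed or cannot be satisfied
    for a file of `size` bytes.
    """
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if m is None:
        return None
    first, last = m.group(1), m.group(2)
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        n = int(last)
        if n == 0 or size == 0:
            return None
        return (max(size - n, 0), size - 1)
    start = int(first)
    if start >= size:
        return None
    # Open-ended form "bytes=N-" runs to the end of the file;
    # an explicit last offset is clamped to the file size.
    end = size - 1 if last == "" else min(int(last), size - 1)
    if end < start:
        return None
    return (start, end)
```

With `start, end = resolve_range("bytes=72-125", len(data))`, the requested bytes are `data[start:end + 1]`.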

However, the byte range request is ignored by zenodo.org:

###
### The entire file is retrieved
###
$ curl   https://zenodo.org/record/1407145/files/DOI_Test.txt
This is a test of the Zenodo DOI functionality for GitLab. 

###
### Only a few bytes are requested, but the entire file is retrieved
###
$ curl -H "range: bytes=6-7"  https://zenodo.org/record/1407145/files/DOI_Test.txt
This is a test of the Zenodo DOI functionality for GitLab.
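A client can tell these two behaviours apart without inspecting the body: per RFC 7233, a server that honours the range answers `206 Partial Content` with a `Content-Range` header, while a server that ignores it answers `200 OK` with the whole file. A minimal Python sketch follows; `range_was_honored` is a hypothetical helper name, and the header values in the usage note are illustrative, not captured from zenodo.org.

```python
import re


def range_was_honored(status: int, content_range, first: int, last: int) -> bool:
    """Check a response against a request for `bytes=first-last`.

    `content_range` is the value of the Content-Range response header,
    or None if the header is absent.
    """
    if status != 206 or content_range is None:
        # 200 with a full body means the Range header was ignored.
        return False
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", content_range.strip())
    if m is None:
        return False
    # The server may stop early at end of file, but must start where asked.
    return int(m.group(1)) == first and int(m.group(2)) <= last
```

For example, `range_was_honored(206, "bytes 6-7/100", 6, 7)` is true, while a `200` response with no `Content-Range` header gives false.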


kpalin · 2019-01-31T08:40:29Z

I'll second this. It would be very useful, e.g., for genomics datasets to be accessed directly with tabix. It seems to require a config change in the Zenodo web server: setting 'max_ranges' to a positive number.

Is there some technical reason not to do that?

slint · 2019-01-31T15:31:15Z

Our file storage backend at the moment is not optimized to serve HTTP range requests (meaning that enabling this feature would potentially lead to significant slowdowns for the file upload/download API). Of course, there are people working on making it possible, though we can't give an accurate ETA on it...

rabernat · 2020-09-18T13:49:31Z

I just wanted to add my 👍 to state that enabling range requests would be very useful for geospatial data formats. Cloud Optimized GeoTIFF in particular would benefit a lot from this. Allowing range requests could really reduce the bandwidth needed from zenodo.

zhanxw · 2020-10-10T23:37:36Z

> Our file storage backend at the moment is not optimized to serve HTTP range requests (meaning that enabling this feature would potentially lead to significant slowdowns for the file upload/download API). Of course, there are people working on making it possible, though we can't give an accurate ETA on it...

Many people cannot download large genetic files (several GB), e.g.:
#460 (comment)

Some have to retry many times, and that is actually wasting your bandwidth...
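For illustration, this is roughly how a client could resume an interrupted download instead of starting over, once the server honours range requests. It is a sketch using only the Python standard library; `resume_request` and `resume_download` are hypothetical helper names, and the fallback to a full download on a `200` response is an assumption about how one would want to handle servers that ignore the header.

```python
import os
import urllib.request


def resume_request(url: str, bytes_on_disk: int) -> urllib.request.Request:
    """Build a request that asks only for the bytes not yet downloaded."""
    req = urllib.request.Request(url)
    if bytes_on_disk > 0:
        # Open-ended range: from the first missing byte to the end of file.
        req.add_header("Range", f"bytes={bytes_on_disk}-")
    return req


def resume_download(url: str, path: str, chunk_size: int = 1 << 20) -> None:
    """Continue downloading `url` into `path`, appending if possible."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    with urllib.request.urlopen(resume_request(url, have)) as resp:
        # 206: only the remainder was sent, so append to the partial file.
        # 200: the range was ignored and the full file is coming, so overwrite.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Note that if the file on disk is already complete, a range-aware server is expected to answer `416 Range Not Satisfiable`, which `urlopen` raises as an `HTTPError`; a real tool would need to handle that case.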

thengl · 2021-02-04T14:44:10Z

For our project it is also important that we can use Cloud-Optimized GeoTIFFs (see e.g. https://zenodo.org/record/4483227) directly from Zenodo. Figshare apparently works with COGs; Zenodo does not? We wrote a tutorial for users on how to get small chunks of data from COG files.

oeway · 2021-03-25T15:29:52Z

Could you please support this?

We need it to serve large image files (in Zarr format) in chunks, which allows us to visualize the files in the browser instantly. It is not feasible for the browser to download an entire file of, e.g., 10 GB and then display it.

jakirkham · 2021-03-25T19:17:56Z

Just noting the value for the Zarr use case. Thanks all for your work on Zenodo!

rabernat · 2021-03-25T19:38:31Z

For Zarr, we could hypothetically get Zenodo working today, without any changes. Zenodo does not support directories, but if we could map a regular Zarr directory store to some sort of flat hierarchy, via a special character, we could make it work. For example, if the special character is __:

.zgroup
foo__.zarray
foo__.zattrs
foo__0.0
foo__0.1

etc.
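The mapping sketched above amounts to replacing the `/` in each store key with the separator and reversing that on read. A minimal Python illustration follows; `flatten_key` and `unflatten_key` are hypothetical names, and this is not an existing Zarr or Zenodo API.

```python
SEP = "__"  # the "special character" from the example above


def flatten_key(key: str) -> str:
    """Map a Zarr store key such as 'foo/0.0' to a flat file name."""
    if SEP in key:
        # A key that already contains the separator could not be
        # mapped back unambiguously, so refuse it.
        raise ValueError(f"key {key!r} already contains {SEP!r}")
    return key.replace("/", SEP)


def unflatten_key(name: str) -> str:
    """Map a flat file name back to the original Zarr store key."""
    return name.replace(SEP, "/")
```

The scheme only round-trips if no array or group name contains `__` itself, which is why `flatten_key` rejects such keys.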

jakirkham · 2021-03-25T19:49:17Z

Could you please raise an issue here ( https://github.com/zarr-developers/zarr-specs/issues )?

oeway · 2021-03-25T20:27:10Z

@rabernat I'm afraid that won't scale, because Zenodo only allows a maximum of 100 files per record.

> Total files size limit per record is 50GB (max 100 files). One-time 100GB quota can be requested and granted on a case-by-case basis.

source: https://www.openaire.eu/technical-requirements

slint · 2021-11-17T12:36:04Z

HTTP Range support is now available for file downloads!

# Fetches only the last two lines of the CSV file
curl -r -182 https://zenodo.org/record/5702574/files/articles_by_influence.csv
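For reference, `curl -r -182` sends the header `Range: bytes=-182`, i.e. a suffix range asking for the last 182 bytes. A rough Python equivalent using only the standard library is sketched below; `tail_request` and `fetch_tail` are hypothetical helper names, and raising on a non-206 response is just one possible way to handle a server that ignores the header.

```python
import urllib.request


def tail_request(url: str, n_bytes: int) -> urllib.request.Request:
    """Build a request for the last `n_bytes` bytes of `url`."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return urllib.request.Request(url, headers={"Range": f"bytes=-{n_bytes}"})


def fetch_tail(url: str, n_bytes: int) -> bytes:
    """Fetch only the tail of a remote file, failing if the range is ignored."""
    with urllib.request.urlopen(tail_request(url, n_bytes)) as resp:
        if resp.status != 206:
            # A 200 here would mean the whole file is being sent.
            raise RuntimeError(f"range not honored (HTTP {resp.status})")
        return resp.read()
```

Calling `fetch_tail(url, 182)` with the CSV URL above should then correspond to the curl command.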

rabernat · 2021-11-17T12:59:31Z

Wow this is huge!

oeway · 2021-11-17T14:02:21Z

This is awesome! However, with the recent update, CORS is disabled ;( Submitted an issue here: #2246

iprapas · 2022-10-25T16:18:59Z

Just found this, and it is quite awesome that Zenodo supports this. What is the recommended way to upload a Zarr store to Zenodo, assuming the Zarr store can have a lot of subfiles?

rabernat · 2022-10-25T16:29:48Z

It's probably best to use a zipped store, which is discoverable and accessible via range requests.
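To illustrate why a zipped store works with range requests: a zip reader first reads the directory at the end of the archive, then seeks straight to the member it needs. The sketch below uses an in-memory file as a stand-in for the remote archive and counts the bytes actually read; each seek-and-read would correspond to one range request. `CountingBytesIO` and `build_zip` are hypothetical helpers, and the members are stored uncompressed (`ZIP_STORED`), which is an assumption of this example.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that counts how many bytes are actually read."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


def build_zip(n_chunks: int = 20, chunk_size: int = 10_000) -> bytes:
    """Create a zip with one uncompressed member per (fake) Zarr chunk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for i in range(n_chunks):
            zf.writestr(f"foo/0.{i}", bytes([i]) * chunk_size)
    return buf.getvalue()


archive = build_zip()
remote = CountingBytesIO(archive)  # stand-in for a file behind HTTP
with zipfile.ZipFile(remote) as zf:
    chunk = zf.read("foo/0.7")  # read a single chunk only

print(f"read {remote.bytes_read} of {len(archive)} bytes")
```

Only the archive directory and the one requested member are read, which is a small fraction of the whole archive.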

iprapas · 2022-10-25T17:03:38Z

Thank you Ryan! I wonder if/how I could then open the zipped store using xarray. It looks like the zarr zipstore only accepts files from the local filesystem https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.ZipStore

rabernat · 2022-10-25T17:12:43Z

Yeah I think you have to go through fsspec. Here's an example: pangeo-forge/staged-recipes#90 (comment)

slint added Enhancement Needs investigation labels Jan 31, 2019

kbg mentioned this issue Nov 19, 2020

views: support for Range requests inveniosoftware/invenio-files-rest#71

Open

oeway mentioned this issue Apr 9, 2021

BioImage.IO Meeting Minutes bioimage-io/bioimage.io#28

Open

slint closed this as completed Nov 17, 2021

mvdbeek mentioned this issue Mar 2, 2023

Archive upload UI galaxyproject/galaxy#15296

Open
