deposit: allow users to provide file structure for upload #1089

Open
slint opened this issue May 17, 2017 · 14 comments

Comments

@slint
Member

slint commented May 17, 2017

Uploaded files currently get passed through the secure_filename function, which strips any path-like information. Allowing users to upload entire folders and/or define a structure in the files upload box ("New folder" button, drag-n-drop UI elements of folders/files) might provide this feature and still stay "secure" if folder creation/deletion is regulated through the REST API. The end result should be something similar to Google Drive's file management interface.
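
To illustrate the idea of keeping directory components while still refusing unsafe input, here is a minimal sketch. It is not Zenodo's implementation: `safe_relative_path` is a hypothetical helper, and the rules it applies (no absolute paths, no `..`, no drive letters) are assumptions about what "secure" would need to mean here.

```python
def safe_relative_path(user_path: str) -> str:
    """Hypothetical helper: keep directory components, reject unsafe paths."""
    # Treat backslashes as separators so Windows-style input is handled too.
    normalized = user_path.replace("\\", "/")
    if normalized.startswith("/"):
        raise ValueError(f"absolute paths are not allowed: {user_path!r}")
    # Drop empty and "." components ("a//b/./c" becomes "a/b/c").
    parts = [p for p in normalized.split("/") if p not in ("", ".")]
    if not parts:
        raise ValueError(f"empty path: {user_path!r}")
    # Reject parent-directory references and drive letters such as "C:".
    if ".." in parts or ":" in parts[0]:
        raise ValueError(f"unsafe path: {user_path!r}")
    return "/".join(parts)
```

With this sketch, `safe_relative_path("sub-01/anat/T1w.nii.gz")` is returned unchanged, while `"../etc/passwd"` raises `ValueError`. A real implementation would also need to sanitise each individual component.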

@hoechenberger

hoechenberger commented Oct 12, 2018

I recently found myself in a situation where I needed to share complex file/directory structures, and ended up encoding all information (that was supposed to at least partially go into directory names) as file names. It does not look nice and is pretty inconvenient. Support for directory structures would be most helpful in such cases.

@Kodiologist

I, too, would appreciate directories. I want to provide a separate data file for each year of my data, because each weighs 180 MB and users should only have to download the ones they want. Being able to put them in a directory would help declutter the project files.

@kellerassel007

+1

@r03ert0

r03ert0 commented Feb 18, 2020

The data in my community (neuroimaging) is also organised with a standardised directory structure, which we need to flatten to upload to Zenodo, and unflatten upon download. It'd be really nice to be able to deal with directory structures directly! (also, a single download button for the complete dataset?)
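
The flatten/unflatten round trip described here can be sketched as follows. This is only an illustration under assumptions: the `__` separator and the `flatten_names` helper are hypothetical, and the mapping is kept in a JSON manifest so that the original paths can be restored unambiguously after download.

```python
import json


def flatten_names(paths):
    """Map each relative path to a flat file name.

    Returns a manifest {flat_name: original_path}; raises on collisions.
    """
    manifest = {}
    for path in paths:
        flat = path.replace("/", "__")  # hypothetical separator
        if flat in manifest:
            raise ValueError(f"name collision: {flat!r}")
        manifest[flat] = path
    return manifest


# Flatten before upload, and store the manifest alongside the files.
manifest = flatten_names(["sub-01/anat/T1w.nii.gz", "README"])
manifest_json = json.dumps(manifest, indent=2)

# After download, the manifest gives back the original structure.
restored = json.loads(manifest_json)
```

Here `restored["sub-01__anat__T1w.nii.gz"]` yields `"sub-01/anat/T1w.nii.gz"`, so the directory tree can be rebuilt from the flat listing.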

@sebastientourbier

+1

@ogourgue

+1

@aliFrancis

+1

@klapo

klapo commented Dec 9, 2020

I'm also going to ping this issue. I have approximately 5000 files representing just under 50 GB to upload, even after aggregating my data. It would be wonderful to be able to mirror the file structure from our internal storage on Zenodo, so as not to overwhelm the UI with a massive list.

@joshmoore

Seconding @r03ert0's comment: a number of microscopy formats are organized into directories where the path information is critical for interpreting the data (e.g. defining the ordering of a time series). Other, more general formats like Zarr face the same issue.

@caleblareau

Bumping that this would be extremely useful for me in computational biology / genomics research as well.

@hoechenberger

The Brain Imaging Data Structure (BIDS) also demands nested directory structures. Currently, Zenodo is unsuitable for storing such datasets.

@lnielsen
Member

lnielsen commented Feb 2, 2021

Thanks for the suggestions. First of all, please note that there is currently a workaround: you can store a ZIP file, and the file structure will be shown on Zenodo.

It's not straightforward in our case to support file structure. Some of the issues include:

  • Our storage cluster (CERN EOS) is not geared for many smaller files, but for larger files.
  • The repository software is not geared for storing a large number (1000+) of smaller files associated with a record.
  • The user experience for uploading/downloading/browsing many smaller files is difficult to address (e.g. download all, find a specific file among 10000 files, make changes to the structure, etc.).
  • Zenodo is not meant to be used as a live datastore behind e.g. an HPC cluster or similar. We're an archive that you put things in for longevity, and take them out again to put them on a live datastore for a computing cluster.

Obviously, you can address all of the above issues in some way or another, but the simple solution of packaging up the data prior to upload already works today, and would essentially be what we would need to mimic if we were to support large data structures.
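
As a rough sketch of the "package before upload" approach, a directory tree can be written into a single ZIP with Python's standard `zipfile` module while keeping the relative paths. The `zip_directory` helper name is an assumption made for this example; the archive should be written outside the source directory so that it does not include itself.

```python
import zipfile
from pathlib import Path


def zip_directory(src_dir, zip_path):
    """Hypothetical helper: pack src_dir into zip_path, preserving relative paths."""
    src = Path(src_dir)
    # allowZip64=True (the default in current Python versions) lets the
    # archive grow beyond the 4 GiB limit of the classic ZIP format.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for file in sorted(src.rglob("*")):
            if file.is_file():
                # Store each file under its path relative to src_dir.
                zf.write(file, arcname=file.relative_to(src).as_posix())
    return zip_path
```

For example, `zip_directory("dataset", "dataset.zip")` would store `dataset/sub-01/anat/T1w.nii.gz` as the member `sub-01/anat/T1w.nii.gz`.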

Most new feature development for Zenodo is now happening in the InvenioRDM project, and the partners there are discussing whether to support file structure or not. So far, the first version of InvenioRDM won't support it.

Thanks a lot for providing the use cases and specific examples.

@r03ert0

r03ert0 commented Feb 2, 2021

Thank you for the clarification!
What would be important, in many use cases I think, is to be able to have a URL for each file, regardless of the way in which the data is actually stored. That URL would preserve a hierarchical file structure, and it would be OK if the files are all stored in a single big zip. It's not so much the storage part of Zenodo that's interesting (although it's very welcome! 🙏 ), but the data organisation and indexing aspect, the link with a DOI, etc.

From what you describe, it could be possible that people upload a single zip, but that the API can provide a URL for a specific file within that zip (it's probably already done, because the GUI does show individual files within zips)
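
A minimal sketch of that idea, assuming a hypothetical path scheme of the form `<archive>.zip/<member path>`: the path is split into the archive name and the member name, and only that member is read from the archive. Neither helper below is part of any Zenodo API; they only rely on Python's standard `zipfile` module.

```python
import io
import zipfile


def split_archive_path(url_path):
    """Split 'dataset.zip/sub-01/file' into ('dataset.zip', 'sub-01/file')."""
    archive, sep, member = url_path.partition(".zip/")
    if not sep or not member:
        raise ValueError(f"not a path into a zip archive: {url_path!r}")
    return archive + ".zip", member


def read_member(zip_file, member_path):
    """Return the bytes of a single member without extracting the rest."""
    with zipfile.ZipFile(zip_file) as zf:
        with zf.open(member_path) as fh:  # raises KeyError if missing
            return fh.read()


# Build a small in-memory archive to demonstrate the lookup.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sub-01/anat/notes.txt", "hello")

archive_name, member = split_archive_path("dataset.zip/sub-01/anat/notes.txt")
content = read_member(buffer, member)
```

Because a ZIP file carries a central directory, a single member can be located and read without unpacking the whole archive.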

@abubelinha

abubelinha commented May 11, 2022

> you can store a ZIP file, and the file structure will be shown on Zenodo.

@lnielsen is there support for other compressed formats?
My full folder structure is less than 50 GB (uncompressed), but I am not able to produce a single .zip file (my WinRAR says "zip file size too big" and suggests that I produce a multi-part archive in .rar format instead).

Which other compression tool would you suggest for producing a big zip file up to 50 GB? (I am on Windows XP)

> Most new feature development for Zenodo is now happening in the InvenioRDM project, and the partners there are discussing whether to support file structure or not. So far, the first version of InvenioRDM won't support it.

Could you please provide links to the relevant discussions?
I couldn't find them.

Thanks
@abubelinha
