IDEA: move hosting of data cubes to bintray.com #7

Open
r4lv opened this issue Oct 16, 2018 · 0 comments
r4lv commented Oct 16, 2018

problem

Git is not made for storing large binary files: on every change to a binary file, the entire file is stored again. A git clone downloads the entire history, and with it all versions of the binary files.

Small files which are never changed are an acceptable overhead, like the current files in this repository, but I think VIP_extras will grow over time, e.g. the 4D IFS cube I have ready weighs 22 MB (cropped).
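
To put a number on this, here is a rough back-of-the-envelope sketch. It assumes the worst case, in which every revision of a binary file is kept in full and compression saves nothing; the function name is made up for illustration.

```python
def worst_case_history_mb(file_size_mb, n_versions):
    """Upper bound on what a full clone downloads for one binary file.

    Assumption: every committed version is stored in full, with no
    savings from compression or deltas.
    """
    return file_size_mb * n_versions


# A 22 MB cube that gets re-uploaded a few times
for n_versions in (1, 5, 20):
    print(n_versions, "versions:", worst_case_history_mb(22, n_versions), "MB")
```

Under that assumption, five revisions of the 22 MB cube already cost 110 MB on every fresh clone.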

alternatives

git-lfs

git-lfs is the "large file storage" extension for git, developed (and supported) by GitHub to address exactly that problem. Once git-lfs is set up for a repository (e.g. "track every .fits and .npz file"), one can use the regular git commands as before. Under the hood, the large files themselves are not stored in the repository, only small references to them, while git-lfs uploads the actual files to a separate server.
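
To illustrate what such a reference looks like, here is a minimal sketch that builds the text of a git-lfs pointer file (spec v1: a version line, a SHA-256 object id and the size in bytes). The helper name lfs_pointer is made up for this example; git-lfs generates these files itself, so nobody would write this by hand.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the pointer text that would stand in for `content` in the repo."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a large data cube: the pointer stays at three short lines
# no matter how big the real file is.
fake_cube = b"\x00" * 1_000_000
print(lfs_pointer(fake_cube))
```

The repository only ever stores these few lines per version, which is why its size stays small.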

advantages

  • regular git
  • as large files are not part of the repo, the repo size does not increase with new/changed files

disadvantages

  • slightly more complicated setup (difficult to move existing files to lfs → rewrite history, etc.)
  • users need git-lfs installed to clone the repo
  • Binder does not seem to support lfs

bintray

advantages

  • free for open source, tightly integrated with GitHub (e.g. organizations)
  • simple to use (web interface for uploading, curl for downloading and astropy.utils.data.download_file for Python)
  • keeps multiple file versions (like git or git-lfs)

disadvantages

  • none?

demo

I created a bintray project for VIP, and uploaded the IFS cube for testing.

Take a look at the project site: https://bintray.com/r4lv/vip/data-cubes

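Using the files in Python would look like this:

import vip_hci as vip  # assumption: VIP is installed as the vip_hci package
from astropy.utils.data import download_file

# download the cube to a local file and get its path back
fn = download_file("https://dl.bintray.com/r4lv/vip/IFS_HD64568.vip.npz")
dataset = vip.HCIDataset.load(fn)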
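
If more cubes end up on bintray, the URL handling could be wrapped in a small helper. The following is only a sketch: data_cube_url, fetch_data_cube and the downloader parameter are hypothetical names, the base URL is taken from the demo above, and cache=True is used so that astropy keeps a local copy instead of downloading the file again on every call.

```python
from urllib.parse import quote

# Base URL taken from the demo above
BASE_URL = "https://dl.bintray.com/r4lv/vip"


def data_cube_url(filename, base_url=BASE_URL):
    """Build the download URL for one hosted data cube."""
    return f"{base_url}/{quote(filename)}"


def fetch_data_cube(filename, downloader=None):
    """Download a cube and return the local path.

    `downloader` is any callable taking a URL and returning a path; it
    defaults to astropy's download_file with caching turned on.
    """
    if downloader is None:
        # imported lazily so that building URLs works without astropy
        from astropy.utils.data import download_file

        def downloader(url):
            return download_file(url, cache=True)

    return downloader(data_cube_url(filename))


print(data_cube_url("IFS_HD64568.vip.npz"))
```

The returned path can then be handed to the loader exactly as in the demo.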
@carlos-gg carlos-gg self-assigned this Oct 22, 2018