Slim down the repository for cloning by identifying big files. #49

Open
michaelJwilson opened this issue Sep 22, 2020 · 5 comments

Comments

@michaelJwilson
Contributor

Cloning and Binder can take a while. Identify opportunities to slim down or repackage the repository.

@cylammarco
Collaborator

Minimal work, purely for faster cloning: use git clone --depth 1 to avoid cloning the history of the repository.

More work: BFG Repo-Cleaner can get rid of the big files in the older commits, which basically means rewriting history. You still need to identify the existing big files to improve cloning speed.
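For the "identify the existing big files" step, a minimal sketch along these lines could list the largest files in a checked-out working tree. The function name `largest_files` and its parameters are made up for illustration and are not part of this repository; it only looks at the current checkout, not at the history.

```python
import os
from pathlib import Path


def largest_files(root, top_n=10, skip_dirs=(".git",)):
    """Return up to top_n (size_in_bytes, relative_path) pairs under root, largest first.

    Hypothetical helper: it walks the working tree only and ignores git history.
    """
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip links so their targets are not counted
            sizes.append((path.stat().st_size, str(path.relative_to(root))))
    sizes.sort(reverse=True)
    return sizes[:top_n]
```

Calling `largest_files(".", top_n=20)` from the repository root would give a first list of candidates to shrink or repackage.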

@michaelJwilson
Contributor Author

Agreed. Numbering the two options above as (1) and (2):

For (1), this relies on the big files being in the history but not in the current repo. Is this true? Related question: we have an existing Dockerfile that seemingly overrides the environment.yml on Binder and results in a slower build. I paused the Docker builds for this reason. Ideally, we'd clean up the Dockerfile to implement --depth and get a fast, tailored build.
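One way to check whether the big files live in the history rather than the current checkout is to size every blob reachable from any ref. The sketch below assumes git is on the PATH and pipes `git rev-list --objects --all` into `git cat-file --batch-check`; the helper names `parse_batch_check` and `history_blobs` are hypothetical. The sizes reported are uncompressed object sizes, so they indicate relative weight rather than exact download cost.

```python
import subprocess


def parse_batch_check(text):
    """Parse '<type> <sha> <size> <path>' lines; return (size, path) for blobs, largest first."""
    blobs = []
    for line in text.splitlines():
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)


def history_blobs(repo="."):
    """List every blob reachable from any ref in repo, with its uncompressed size."""
    # Object ids (plus paths for trees and blobs) across the whole history.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path back.
    sized = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sized)
```

Paths that show up near the top of `history_blobs()` but are absent from the current checkout are the ones a shallow clone would skip; anything still present in the checkout would be downloaded even with --depth 1.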

For (2), yeah, I was thinking BFG - great name! I'm agnostic on rewriting the history. I'd be surprised if the big files didn't have a much smaller equivalent.

Thanks for the input!

@cylammarco
Collaborator

Git is very inefficient at versioning Jupyter notebooks, so the repository can accumulate a lot of big files very quickly.
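A likely contributor, though this is an assumption and not something measured in this repository, is that executed notebooks embed their outputs (plots, tables) in the file, so each re-run commits a new large blob. A minimal sketch of clearing outputs before committing, assuming the nbformat 4 JSON layout and using a made-up helper name `strip_outputs`:

```python
import json


def strip_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs removed.

    Hypothetical helper; the input dict is left unmodified.
    """
    stripped = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped
```

The notebook dict would come from `json.load` on the .ipynb file, and the stripped copy can be written back with `json.dump`. Whether cleared outputs are acceptable depends on whether the notebooks are meant to be read without being run.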

@michaelJwilson
Contributor Author

Reviving this,

git clone --depth 1

+1

Also building from minimal_environment (which needs to be updated to include the requests package, at least). The question is, how do we tell Binder to do this? I've totally forgotten, if I ever knew.

@michaelJwilson
Contributor Author

michaelJwilson commented Feb 17, 2022

As pointed out by @cylammarco, some merge of this and --depth

https://discourse.jupyter.org/t/how-to-reduce-mybinder-org-repository-startup-time/4956

Including apt-get install of texlive.


2 participants