Repo is unnecessarily very large (504 MB) #190

Open
bstrand opened this issue Oct 25, 2022 · 0 comments

bstrand commented Oct 25, 2022

Problem

This repo's size is currently 504 MB. This is an accessibility concern, in particular for users with slower or cost-metered internet. It also requires users to give up half a GB of disk space to keep the repo in sync (or to be savvier about Git than your audience is likely to be). Finally, it sets a questionable example for people new to programming and version control.

Description

The repo's total size is 504 MB, 90% of which comes from 15 image files present in Python/Threading and Python/MultiProcessing. Each of these directories holds 145 MB of .jpg image files.

In Python/Threading, these image files are downloaded from Unsplash by the tutorial script, so there seems to be little value in having them checked in with the code.

For Python/MultiProcessing, those image files are used as input. Having them checked in to the repo is convenient, but not strictly necessary. Instead, the user could be asked to download them from Unsplash with Threading/download-images.py as a prerequisite. (Or allow them to use their own set of images by revising the tutorial script to target an arbitrary set of jpg's, e.g., *.jpg in a subdirectory.)
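
To illustrate the "arbitrary set of jpg's" idea, here is a minimal sketch of collecting input files from a subdirectory instead of listing them by name. The function name `find_images` and the `input` directory are hypothetical, chosen for illustration; they are not taken from the tutorial script.

```python
from pathlib import Path


def find_images(directory, pattern="*.jpg"):
    """Return the files in `directory` that match `pattern`, sorted by name.

    `directory` and `pattern` are illustrative defaults, not names used
    by the existing tutorial code.
    """
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


if __name__ == "__main__":
    # Assumes the user has placed (or downloaded) images into ./input
    for image_path in find_images("input"):
        print(image_path)
```

The returned list could stand in for a hard-coded list of file names, so the tutorial would work with whatever images the user supplies.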

At the very least, these image files could be much smaller (~10x) with heavier compression. (NB the repo would need to be filtered afterwards to remove the large files from the commit history.)

code_snippets on master✔ » du -h -d 1 . | sort -rh | head
504M	.
318M	./Python
161M	./.git
 24M	./Django_Blog
 28K	./Terminal
 …
code_snippets on master✔ » du -h -d 1 ./Python | sort -rh | head
318M	./Python
145M	./Python/Threading
145M	./Python/MultiProcessing
 20M	./Python/Flask_Blog
4.8M	./Python/Matplotlib
…
code_snippets on master✔ » du -hsc ./Python/Threading/*.jpg | tail -n1
145M	total

Suggestions

Minimally:

  1. Recompress all large jpg's in the repo to reduce their file size
  2. Filter the repo to remove the larger versions (≥2 MB) of the files from the git history. (Could make for a good tutorial.)
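
As a starting point for both steps, a short script can list the jpg's that are at or above the 2 MB threshold. This is only a sketch: the function name `large_jpgs` and the default threshold are assumptions for illustration, and it looks at the working tree only.

```python
from pathlib import Path


def large_jpgs(root, min_bytes=2 * 1024 * 1024):
    """Yield (path, size_in_bytes) for each .jpg under `root` of at least `min_bytes`.

    The 2 MB default mirrors the threshold suggested above; adjust as needed.
    """
    for path in Path(root).rglob("*.jpg"):
        # Skip anything inside Git's own metadata directory.
        if ".git" in path.parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size >= min_bytes:
            yield path, size


if __name__ == "__main__":
    # Print the largest files first, with sizes in MB.
    for path, size in sorted(large_jpgs("."), key=lambda item: item[1], reverse=True):
        print(f"{size / (1024 * 1024):6.1f} MB  {path}")
```

Because it inspects only the files currently checked out, copies already stored in the commit history would still need the filtering described in step 2.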

Alternatively:

  1. Delete the .jpg files in Threading and MultiProcessing from the repo entirely, and have the user download the images with the Threading/download-images.py script.
  2. Replace the hard-coded file names in MultiProcessing with loading *.jpg from a subdirectory so users can use their own / an arbitrary set of images.
  3. Add input/output directories and exclude them in .gitignore
  4. Filter the repo to remove the larger versions of the files from the git history