Repository is unexpectedly large #71

Open
ghost opened this issue Jul 19, 2017 · 2 comments

Comments

ghost commented Jul 19, 2017

The GitHub API gives the size of the Norma repository as 362425 KB and the AMI repository as 301415 KB.
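For anyone wanting to reproduce these figures, here is a minimal sketch, assuming the numbers come from the `size` field of the GitHub REST API's repository endpoint (which reports kilobytes). The function names `fetch_repo_metadata` and `size_in_mib` are made up for illustration, and the request is unauthenticated, so it is subject to rate limits.

```python
import json
import urllib.request


def fetch_repo_metadata(owner: str, repo: str) -> dict:
    """Fetch repository metadata from the GitHub REST API (unauthenticated)."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def size_in_mib(metadata: dict) -> float:
    """Convert the API's `size` field (reported in KB) to MiB."""
    return metadata["size"] / 1024
```

Calling `size_in_mib(fetch_repo_metadata("ContentMine", "ami"))` should give the current figure for the AMI repository; the value will differ from the one quoted above if the repository has changed since.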

The recent experience of two new developers, both of whom needed to buy additional hardware in order to be able to clone and work with these repositories, suggests that new users or developers are unlikely to expect these repositories to be so large.

Ways of reducing the size of the repositories should be investigated. For instance, could the repositories' test corpora be factored out into a different module that can be shared, as a dependency, between Norma, AMI, and perhaps other modules in the AMI stack?

(Corresponding AMI issue: ContentMine/ami#70 .)

mdales commented Jun 23, 2019

This is a problem when trying to create a docker image from these tools. As of today:

  • Cephis: 1.5 GB
    • Git: 380 MB
    • Test: 1.1 GB
  • Normami: 5.5 GB
    • Test: 3.1 GB
    • Git: 2.2 GB

I can exclude the test and git directories from the Docker build, but the normami build process generates a Debian package (which, AFAICT, isn't used when running the tools) that relies on some example files from test. Locally I've just commented out building the .deb file for now.

Once filtered, only 350 MB is left for building the image (5% of the storage!). I suspect that could go down further, but just moving the test data out and then purging it from the git history would be a massive start.
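The before and after figures can be checked with something like the following. This is a rough sketch only: `directory_size` is a hypothetical helper, and it assumes the data to leave out lives in directories literally named `.git` and `test` (at any depth), which may not match the actual layout of these repositories.

```python
import os


def directory_size(root: str, excluded_dirs: frozenset = frozenset()) -> int:
    """Sum file sizes in bytes under root, skipping directories by name."""
    total = 0
    for current, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in filenames:
            path = os.path.join(current, name)
            if not os.path.islink(path):  # ignore symlinks
                total += os.path.getsize(path)
    return total
```

Comparing `directory_size(path)` with `directory_size(path, frozenset({".git", "test"}))` on a local checkout gives the share of the working copy that the filtering removes.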

Git really isn't the best place to store large test data. If you are going to do this, you at least want it in a submodule so that the main repository can remain lean. Git LFS may also be a solution here.
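Before purging anything, it helps to know which paths account for most of the history. The sketch below is one possible way to list them, assuming `git` is on the PATH; it pipes `git rev-list --objects --all` into `git cat-file --batch-check`. The helper names `parse_batch_check` and `largest_blobs` are hypothetical, and this only reports sizes, it does not rewrite anything.

```python
import subprocess


def parse_batch_check(output: str) -> list:
    """Parse 'type sha size path' lines, returning (size, path) for blobs, largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # maxsplit keeps spaces inside the path
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)


def largest_blobs(repo_path: str, limit: int = 20) -> list:
    """List the largest blobs reachable from any ref in the repository at repo_path."""
    # Every object reachable from any ref, with the path it was found at.
    objects = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path back.
    sizes = subprocess.run(
        ["git", "-C", repo_path, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes)[:limit]
```

The sizes reported are uncompressed object sizes, so they overstate what the packed repository occupies on disk, but the ranking is usually enough to decide what to move out. Actually removing the data from history is a separate step with a history-rewriting tool and is not shown here.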

petermr (Member) commented Jun 30, 2019

Agreed.
The *.deb is not critical. Some people used it in the past. The appassembler script included it from way back.
Yes, the test and git can be dropped as well.
The test stuff needs purging anyway but that's a month of my time I suspect.
Happy to talk about the best strategy when we meet.
Docker will be critical to our future plans. I hope to demo it at Oxford in Sept.
