
index page is really slow with 1000+ repositories #238

Open
jelmer opened this issue May 11, 2019 · 7 comments

Comments

@jelmer
Contributor

jelmer commented May 11, 2019

Filing this mostly to track the work I'm doing in this area. With ~2000 repositories loaded, klaus still works well. However, there are two caveats:

  • the index page takes a while to load, since it has to open all the repositories
  • it runs out of file descriptors, since it (or rather, Dulwich) holds on to file handles to the pack files in all repositories
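The file-descriptor problem can be mitigated by bounding how many repositories are open at once. Here is a minimal, hypothetical sketch (not klaus's or Dulwich's actual code) of an LRU handle cache whose eviction closes the least recently used repository, releasing its pack file handles:

```python
from collections import OrderedDict


class RepoHandleCache:
    """Keep at most `maxsize` repositories open at once. Evicting the
    least recently used repository closes it, releasing its pack file
    descriptors. Purely illustrative; `open_repo` is any callable that
    returns an object with a close() method."""

    def __init__(self, open_repo, maxsize=128):
        self._open_repo = open_repo
        self._maxsize = maxsize
        self._cache = OrderedDict()  # path -> open repo, in LRU order

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path]
        repo = self._open_repo(path)
        self._cache[path] = repo
        if len(self._cache) > self._maxsize:
            _, evicted = self._cache.popitem(last=False)  # oldest entry
            evicted.close()  # release its file handles
        return repo
```

With a few thousand repositories, this caps open descriptors at `maxsize` regardless of how many repositories the index touches.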
@jonashaag
Owner

@jelmer Do you have an idea how to fix this in klaus? Have a cache that updates whenever a change is made to the repository? (Somehow circumventing Dulwich / avoiding "properly" loading the repo in Dulwich, to save the time a full load takes?)

@jelmer
Contributor Author

jelmer commented Jul 3, 2019

Yeah, I think we'd want a cache rather than actually reading the repositories every time. Perhaps we could make FancyRepo a wrapper for dulwich.Repo rather than being derived from it?
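The wrapper idea can be sketched roughly like this: instead of subclassing the backend repo (which forces a full open on construction), hold only the path and open the underlying repository lazily on first real use. This is a hypothetical illustration, assuming an injected `opener` callable standing in for something like `dulwich.repo.Repo`:

```python
class FancyRepo:
    """Sketch of FancyRepo as a wrapper rather than a subclass.
    The backend repository is only opened when an attribute that
    needs it is first accessed. Illustrative only."""

    def __init__(self, path, opener):
        self.path = path
        self._opener = opener   # callable: path -> backend repo object
        self._repo = None       # opened on demand

    @property
    def repo(self):
        if self._repo is None:
            self._repo = self._opener(self.path)  # deferred, one-time open
        return self._repo

    def __getattr__(self, name):
        # Delegate everything we don't define to the lazily opened repo.
        return getattr(self.repo, name)
```

An index page that only touches cached metadata (name, description) would then never trigger the expensive open at all.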

@jonashaag
Owner

OK will look into this soon.

Curious: Do you actually have that kind of use case with 1000+ repos?

jonashaag added a commit that referenced this issue Jul 4, 2019
@jonashaag
Owner

Check this out.

I guess there are a lot more ways to do caching, but this is one of the simplest things to do.

@jelmer
Contributor Author

jelmer commented Jul 4, 2019 via email

@jonashaag
Owner

I'll have to ramp up my benchmark repository then! Tested it with 1k repos, but let me test and optimize with 20k ;)

If you have the time, maybe you could help me think about how we can cache ref listing. I was thinking about checking the stat() of some Git file or folder for cache invalidation; though I'm not sure there is such a thing as a filesystem modification timestamp covering "any of the recursive folders or files" that you could use for that. Other caching/cache invalidation ideas?

Of course we can always use simple time-based caching, particularly for information like repository description. But I'd rather use that as a last resort only.

Also inotify etc., but I'm not too keen on integrating that, TBH.
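The stat()-based idea can be sketched as follows. Since a directory's own mtime does not change when a file deeper in the tree changes, the refs tree has to be walked; but the walk only stats files, which is far cheaper than opening the repository. This is a hypothetical sketch of the invalidation key being discussed, not klaus's implementation, and the set of files checked (`HEAD`, `packed-refs`, everything under `refs/`) is an assumption about what Git touches on ref updates:

```python
import os


def repo_state_token(git_dir):
    """Build a cheap cache-invalidation key from the mtimes of files
    Git typically touches when refs change. If the token differs from
    the cached one, the cached ref listing is stale. Sketch only."""
    token = []
    for name in ("HEAD", "packed-refs"):
        path = os.path.join(git_dir, name)
        if os.path.exists(path):
            token.append((name, os.stat(path).st_mtime_ns))
    # Walk refs/ recursively: a parent directory's mtime alone does not
    # reflect modifications to files nested deeper in the tree.
    for root, _dirs, files in os.walk(os.path.join(git_dir, "refs")):
        for fname in files:
            path = os.path.join(root, fname)
            token.append((os.path.relpath(path, git_dir),
                          os.stat(path).st_mtime_ns))
    return tuple(sorted(token))
```

A cache entry would store the token alongside the ref listing and recompute the listing only when the token changes; note that mtime granularity and clock skew make this a heuristic, not a guarantee.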

jonashaag added a commit that referenced this issue Jul 4, 2019
@jelmer
Contributor Author

jelmer commented Jul 19, 2020

I've worked around this for now by adding an app that just shows a single repository (a list of 10k repositories is not very usable anyway...) and loads that repository on demand. This works, but is a bit ugly since it has to duplicate some of klaus's logic (e.g. the route table).

See e.g. https://janitor.debian.net/git/klaus
