
Performance problems with large wiki, with long history and large pages #1940

Open
felixwellen opened this issue Jan 13, 2023 · 6 comments
felixwellen commented Jan 13, 2023

I tried to use gollum (latest master) with this repo:

https://github.com/ncatlab/nlab-content

but it is somewhat too slow to be usable:

  • Browsing the directory structure takes 2-3 seconds per click
  • Loading small pages takes 5-10 seconds each time

I guess the second problem could be solved by caching (I don't mind some waiting time after updating a page), which essentially amounts to finishing up this:

For the first problem I only have a wild guess: it might come from the long history (>100k commits), since that also seemed to cause problems here:

I'll close with some details of my setup, just in case they play a role:

  • I'm running on Fedora 37 with recent hardware and lots of RAM
  • On startup, gollum tells me (twice) that it is ignoring some gems, since their native extensions haven't been built:
Ignoring psych-5.0.1 because its extensions are not built. Try: gem pristine psych --version 5.0.1
Ignoring racc-1.6.2 because its extensions are not built. Try: gem pristine racc --version 1.6.2
Ignoring sassc-2.4.0 because its extensions are not built. Try: gem pristine sassc --version 2.4.0
Ignoring stringio-3.0.4 because its extensions are not built. Try: gem pristine stringio --version 3.0.4
Ignoring unf_ext-0.0.8.2 because its extensions are not built. Try: gem pristine unf_ext --version 0.0.8.2
@felixwellen (Author)
I managed to make the "Ignoring ..." messages go away. If there is any speedup, it is barely noticeable.
Really small pages with only a couple of lines actually take only 1 or 2 seconds. Looking at the history of a page takes a long time, even if it does not have many entries.

@felixwellen (Author)
Maybe I should add that the nlab-content repo has a weird structure from a wiki standpoint, since, so far, it is only used as a backup method (for this wiki: https://ncatlab.org/). In particular, it has a deep folder structure, the folder names are numbers, each page is called content.md, and the internal links do not work as they are.

dometto (Member) commented Jan 13, 2023

Thanks for letting us know, and for providing a test repo. See gollum/gollum-lib#437 for an explanation of what's causing the poor performance. Note: the fix there only takes care of the page load times; the Overview logic still uses the slower complete-tree-map approach.

We could ultimately combine the approach in gollum/gollum-lib#437 with caching, but I think caching is less important than improving our logic at this point!

dometto (Member) commented Jan 13, 2023

I should also say: any feedback and help is appreciated!

@felixwellen (Author)

Many thanks for looking into this!
I haven't figured out yet how I can use your PR. Is that documented somewhere?

@benjaminwil (Member)

Hi there,

I cloned the NLab wiki and tested it against Gollum 6.0. Individual pages load in an acceptable amount of time, in my opinion. But the "Overview" view is still unbearably slow.

Below I outline my proposed strategy for making overviews faster, along with a link to my first PR addressing this.

After looking at the overview controller code, it is easy to see why this is so slow for a large wiki: it loops over all of the files in the wiki (like content.md) on every page request. The NLab wiki has 30,000+ files.

In my opinion, the first step to making this faster is to let us access the directory tree without having to look at all 30,000 files in the repo. Right now we only "build" the directory tree by checking where files live (i.e. if pages/0/0/0/0/10000/my_file.md exists, then the directories pages, pages/0, pages/0/0, pages/0/0/0, and so on, must exist). This seems a bit backwards to me. It would be much faster if we could say, "I am currently looking at the pages directory. The directory 0 is visible."
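To make the cost of the current approach concrete, here is a minimal pure-Ruby sketch (not Gollum's actual code) of deriving the directory tree from a flat list of file paths, the way the overview logic effectively does:

```ruby
# Sketch: reconstruct every intermediate directory from file paths.
# The work grows with the number of files, not the number of directories.
paths = [
  "pages/0/0/0/0/10000/my_file.md",
  "pages/0/1/readme.md",
]

dirs = paths.flat_map do |path|
  parts = path.split("/")[0...-1]                       # drop the filename
  (1..parts.length).map { |n| parts.first(n).join("/") } # all prefixes
end.uniq.sort
# dirs now contains "pages", "pages/0", "pages/0/0", ... for every file
```

With 30,000+ files and deep nesting, this prefix expansion is repeated for every path on every request, which matches the slowness described above.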

To start this work I've created a pull request against gollum-rugged_adapter (gollum/rugged_adapter#65) that lets us query for just the directory list as it's checked into git. Once this works, it will be much easier to build the overview page UI quickly. We can also more easily filter out out-of-scope pages per overview page (i.e. when viewing pages we do not need to load all deeply nested results inside of pages/0/0/... and so on; we only need current-directory entries like pages/0 and pages/my_page.md).
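The filtering described above can be sketched in plain Ruby. This is a simplified illustration, not the rugged_adapter PR's actual code: given a flat list of repo paths, it returns only the entries directly under one directory, which is all an overview page for that directory needs (a real tree lookup via libgit2 would provide this without enumerating paths at all):

```ruby
# Hypothetical helper: list only the direct children of `dir`.
# A trailing "/" marks an entry that is itself a subdirectory.
def direct_entries(paths, dir)
  prefix = dir.empty? ? "" : dir + "/"
  paths.filter_map do |path|
    next unless path.start_with?(prefix)
    rest = path[prefix.length..]
    head, tail = rest.split("/", 2)
    tail ? head + "/" : head
  end.uniq
end

paths = ["pages/0/0/0/0/10000/my_file.md", "pages/my_page.md"]
direct_entries(paths, "pages")   # => ["0/", "my_page.md"]
```

The key point is that deeply nested results like pages/0/0/... collapse into a single "0/" entry, so the overview only ever renders what is visible at the current level.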
