-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Add a CI check to rebuild documentation from a cached documentation build #16161
Comments
Incidentally, PlasmaPy's documentation guide has a troubleshooting guide which describes a whole bunch of cryptic Sphinx errors and possible ways to fix them. If the documentation build in CI fails, it prints out a link to the troubleshooting guide. |
Over here, doc build CI is handled directly by RTD via https://docs.readthedocs.io/en/stable/pull-requests.html because the HTML preview is really essential in some PRs. To add another CI via tox to essentially do the same build would be a waste of resources. I wonder if RTD will be able to offer caching directly. Maybe @humitos can advice or point us to someone who can. 🙏 |
Thank you for the feedback! My view is that it would be a worthwhile use of resources since it has the potential to shorten the feedback loop for the documentation build by possibly ten minutes or so. This would only get us the Sphinx error messages, so the time for the HTML preview (which I agree is absolutely essential!) would remain unchanged. (Relatedly, addressing #15149 might reduce the documentation build time by several minutes across platforms.) |
Hmm ok. I added this to 2024-03-07 dev telecon agenda. Thanks! |
Hello 👋🏼 . Thanks for pinging me on this issue. In the past, Read the Docs used to cache the Git repository cloned, the virtualenv with all the requirements installed and the Sphinx environment to be able to re-use it between builds. However, this brought a lot of inconsistencies in the documentation (e.g. documentation not updated with latest changes, different build times when using/not using the cache, etc) and plenty of confusions to our users (e.g. how to purge the cache). Due to this, we decided to remove this feature and make all the builds consistent between them. This has helped a lot to remove these inconsistencies and users confusions 👍🏼 Later, with the introduction of the Even if we implement the I'm open to receive more feedback here or on the linked issue where we are having the |
@humitos — Thank you for the insightful response! Given your response, along with the scale and tradeoffs to consider for the RTD ecosystem, it makes much more sense how caching doc builds would be really difficult there! My inclination would be for Astropy's CI to include a quick rebuild of the docs on GitHub Actions, in addition to the much more reliable RTD build with a preview. Being able to get rapid feedback is essential for CI, even if the feedback is partial. GitHub Actions also lets workflows be run on two processors, which itself might be able to reduce fresh doc build times by ∼25-40% compared to a serial run. Perhaps the next step would be for me to try this out for PlasmaPy to see how much time caching the doc build would actually save in CI, and then compare it to other possibilities for speeding up doc build times. |
This a good point. I just wanted to note that Read the Docs uses AWS instances with 2 CPU cores as well. However, the default Sphinx command to build the documentation 1 does not use Because of this, together with not a lot of benefits in most of the projects (a few seconds faster) and the low adoption of the feature flag, we removed it in readthedocs/readthedocs.org#10423. I'd be curious of the performance/optimization numbers you may get in case you do this research with GitHub Action or even with Read the Docs using Let me know if I can be useful here in something else 👍🏼 Edit: another interesting issues
Footnotes |
I just did some timings on freshly building PlasmaPy's docs locally with The biggest successes I've had with speeding up doc builds in CI for PlasmaPy have been caching Before anything else, I should do some timings to see how much caching the doc build will help after making a minor change in the docs in between. |
After doing some timings and having discussions here and at the astropy dev telecon, I'm now more inclined to support an additional fresh "lite" build of the docs that skips evaluating example Jupyter notebooks. Best case scenario: someone rewrites Sphinx (or at least the most time-consuming parts of it) in Rust. I wish Astral had a bat-signal... 🦇 |
We discussed this in the astropy-dev telecon. I'm trying to summarize the discussion here, so what I write is not necessarily my opinion:
Personally, I suggest to not add another redundant CI job; instead we can look if the docs for how to build the docs locally can be improved. We can also do other speedups e.g. replace the rst epilogue with some other dictionary mechanism that is faster, if there are easy ways to do them. |
At the dev telecon today (2024-03-07), @hamogu and I voiced concerns for this request:
Instead of re-adding this doc CI check (we used to have one before RTD PR build was a thing), these alternatives were discussed:
|
Oh oops, I didn't refresh the browser tab, so I missed the other comments posted at the time of dev telecon, sorry if I duplicated some stuff in my comment above. For completeness... Nick also mentioned in the dev telecon that he tried using parallel build feature but didn't notice any speed up.
This is a very old, still open issue at #10470 . Our docs are so intertwined with each other, no one could think of a simple solution to make it happen. |
@pllim I'm sorry I'm chiming in again here 😅 , but I just wanted to note that we've built a feature to skip a build some time ago: "if there is any command that exits with code 183, the whole build is skipped". We talked about this in the past in readthedocs/readthedocs.org#9649 (comment) and I suggested a way to check for
There is a variation of that command that is:
Not sure what's the best one and which work for your use case, but I you can get the idea from there and adapt as you need. Let me know if that helps 👍🏼 |
Thanks got @humitos to point out this feature of RTD. |
@humitos , yes, I actually implemented that for another package here: However, implementing it for
|
I just did some benchmarks for PlasmaPy's documentation in PlasmaPy/PlasmaPy#2562 and found that:
I did another benchmark on my workstation and found that:
Given this final result, I'm going to close this issue and go back to praying that Sphinx gets re-implemented in Rust. Edited to add: The timings below show that caching could significantly speed up the doc build, but there were wasn't much support at the dev meeting for a GitHub Action to rebuild docs, so I'll leave it closed. The caveat here is that there might be something about our Sphinx config over there which is leading to longer rebuild times, but I've jumped into enough Sphinx dragon holes for the time being... 🐉 🕳️ Here's a summary of what actually has helped with speeding up PlasmaPy's doc build in CI:
|
Thanks for investigating! Really helpful insights here.
I have trouble understanding this conclusion though. 😅 |
Oops! I fixed it above. |
For astropy, locally a full build take ~6 min and a rebuild a bit more than 1 min. For rebuilds, most of the time is spent writing files again, I think mostly due to always writing all API files, since automodapi will always regenerate those. Not sure if there is a way to avoid that. |
What is the problem this feature will solve?
It takes a long time to build Astropy's documentation from scratch. From recent builds on Read the Docs, it takes ∼1000 s.
One of the most important qualities of continuous integration testing is to get rapid feedback. One thing that has happened a lot to me over at PlasmaPy has been that I run into a cryptic Sphinx error, and then have to wait ∼500 s between iterations trying to get the reStructuredText syntax right. (We have a saying over at PlasmaPy that "Sphinx rabbit holes often have dragons in them." 🐰 🕳️ 🐉)
Describe the desired outcome
We could use the GitHub Action for caching to cache a documentation build from
main
, as well as the Python environment used to build it. This cache could be recreated daily.We could then add a workflow to be run in CI that would rebuild the documentation from the cached build (using the
-j auto
flag to have thesphinx-build
be done in parallel while auto-detecting the number of processes to use). We could potentially reduce the documentation build time in CI to ≲ 3 min.Additional context
This probably should not be a required check, at least not at first. I suspect there may be some occasional situations where a rebuild might fail or succeed incorrectly, such as when adding a Sphinx extension or modifying the configuration file. It might help to have a label for skipping the documentation cache, and/or for the cache key to use
hashFiles
onconf.py
and maybe alsopyproject.toml
.I've been trying to do this in PlasmaPy/PlasmaPy#2553 by using a
tox
environment to do this, but it seems that it rebuilds the documentation build from scratch (though caching.tox
between runs did speed things up by a minute or two). It may be necessary to usesphinx-build
directly rather than creating atox
environment for this.Edited to add: Locally, it took my computer ∼7 minutes to do a fresh doc build for Astropy, and ∼2.4 min to do a repeat documentation build. This was with
make html -j 2
for two processors, which is what each GitHub Action allows.The text was updated successfully, but these errors were encountered: