Docs take a long time to build #1420

Closed
freddyaboulton opened this issue Nov 10, 2020 · 8 comments · Fixed by #1654
Labels: documentation (Improvements or additions to documentation), testing (Issues related to testing)

Comments

@freddyaboulton
Contributor

freddyaboulton commented Nov 10, 2020

Lately our docs take ~14 minutes to build on CircleCI, whereas they took about 6 minutes in the previous release. The root cause of the slowdown seems to be that woodwork infers some categorical variables as text, which then causes AutoML to use the TextFeaturizer. However, even if woodwork fixes the categorical vs. text inference, the build time will inevitably increase as we write more documentation. This makes it hard for developers to iterate on the docs locally.

Possible solutions:

  • Add some hidden code in the notebook that would skip long-running computations.
  • Have nbsphinx or Read the Docs cache long-running computations.
@freddyaboulton freddyaboulton added the documentation Improvements or additions to documentation label Nov 10, 2020
@dsherry dsherry added the testing Issues related to testing. label Dec 7, 2020
@dsherry
Contributor

dsherry commented Dec 7, 2020

Yep. I also changed the default automl stopping criterion to max_batches=1 a couple of weeks back, which didn't help.

I like the solutions you listed! Plus one of my own:

  1. Add some hidden code in the notebook that would skip long-running computations. This could be code which mocks pipeline fit/predict. Advantage: works. Disadvantage: may not match what users get when they run by hand, plus hidden code is confusing.
  2. For long-running notebooks, pre-run locally one time and save the output in the notebook. nbsphinx will use a saved execution if one exists instead of rerunning. Advantage: works. Disadvantage: we may forget to periodically update the output.
  3. Simplify / delete some of the notebook content. For example, consider lowering data size, stopping criterion, etc. if possible. Advantage: speedups. Disadvantage: can't show full output for some examples, like text.
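
To make option 1 concrete, here is a rough sketch of what a hidden cell that mocks fit/predict might look like. `SlowPipeline`, `run_example`, and the `fast_docs_build` switch are all hypothetical stand-ins invented for illustration, not our actual pipeline classes, and the real mock would need to return whatever the notebook displays.

```python
from unittest.mock import patch


class SlowPipeline:
    """Hypothetical stand-in for a pipeline whose fit is expensive."""

    def __init__(self):
        self.fitted_for_real = False

    def fit(self, X, y):
        # Imagine minutes of training happening here.
        self.fitted_for_real = True
        return self

    def predict(self, X):
        return [0 for _ in X]


def run_example(fast_docs_build):
    """Run the notebook example, mocking fit/predict when fast_docs_build is True."""
    X, y = [[1.0], [2.0], [3.0]], [0, 1, 0]
    pipeline = SlowPipeline()
    if fast_docs_build:
        # Hidden cell: swap the expensive calls for canned results.
        with patch.object(SlowPipeline, "fit", return_value=pipeline):
            with patch.object(SlowPipeline, "predict", return_value=[0, 1, 0]):
                pipeline.fit(X, y)
                return pipeline.predict(X), pipeline.fitted_for_real
    pipeline.fit(X, y)
    return pipeline.predict(X), pipeline.fitted_for_real
```

Note that the mocked path returns canned predictions and never runs the real fit, which is exactly the disadvantage listed above: the rendered output may not match what users see when they run the notebook by hand.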

I recommend we go with option 2, but with option 3 in mind.
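
For reference, option 2 relies on the nbsphinx execution setting in the Sphinx `conf.py`. A minimal fragment, assuming nbsphinx is loaded as an extension (the timeout value below is an illustrative assumption, not something we have tuned):

```python
# Sphinx conf.py fragment (sketch; adapt to the project's real configuration).
extensions = ["nbsphinx"]

# "auto" is the nbsphinx default: a notebook is executed during the build
# only if it has no stored outputs. Notebooks committed with saved output
# are rendered as-is, without being rerun.
nbsphinx_execute = "auto"

# Per-cell timeout in seconds for notebooks that are still executed.
# 600 is an assumed example value.
nbsphinx_timeout = 600
```

With this setting, pre-running a long notebook locally and committing it with its output is enough to skip its execution in the docs build. The other accepted values are "always" and "never".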

@angela97lin
Contributor

#1627 was closed as a duplicate, but I think there could still be something there that wasn't covered in this issue, so posting here:

I noticed that docs have been taking much longer to build. I think this is likely because the automl docs were changed in c871f3b to use the fraud dataset instead of the breast cancer dataset (+ elsewhere?) to showcase infer_problem_types, since the breast cancer dataset only has numeric columns.

I suspect this is a different issue / reason for the even-longer build time of docs, from the previous 20 minutes to now >30 minutes, and could be worth mentioning!

@dsherry FYI

@dsherry dsherry assigned bchen1116 and unassigned bchen1116 Jan 5, 2021
@bchen1116 bchen1116 self-assigned this Jan 6, 2021
@freddyaboulton
Contributor Author

Another possible solution is to use multiple processors to build the docs:

https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-j
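
A minimal sketch of what that could look like when driven from Python. The helper names and the directory paths in the usage line are assumptions for illustration; `-j auto` asks sphinx-build to use as many processes as there are CPUs.

```python
import subprocess


def sphinx_build_command(source_dir, output_dir, jobs="auto", builder="html"):
    """Return the sphinx-build argument list for a parallel build."""
    return ["sphinx-build", "-b", builder, "-j", str(jobs), source_dir, output_dir]


def build_docs(source_dir, output_dir, jobs="auto"):
    """Run the build; raises CalledProcessError if sphinx-build fails."""
    subprocess.run(sphinx_build_command(source_dir, output_dir, jobs), check=True)
```

For example, `build_docs("docs/source", "docs/build/html")` would run the HTML build in parallel (both paths are hypothetical). Sphinx only parallelizes when every loaded extension declares itself parallel-safe, so the actual speedup depends on the extensions in use.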

@bchen1116
Contributor

Update following discussion with @dsherry.

Adding the -j flag to our Makefile allows the build docs test on CircleCI to finish faster, as seen here. Unfortunately, Read the Docs doesn't run this command, which means that generating the published documentation still takes a while and often errors out.

This is what a successful build looks like for Read the Docs, taking a little over 20 minutes to complete. The difference between the HTML and LaTeX build times suggests that building the Jupyter notebooks themselves does not take a lot of time, which is good.

However, we're also finding instances where the build fails like this. We noticed that, for some reason, Read the Docs is running the full sequence of commands twice, which makes the build take much longer (well over 30 minutes each to create the HTML and LaTeX files) and causes the doc build to fail. I'll follow up with the Read the Docs support team to see why this is happening and how we can fix it, and I'll post the results here when I get feedback.

@dsherry
Contributor

dsherry commented Jan 12, 2021

@bchen1116 contacted support and they said:

It looks like the underlying cause of this bug is the number of active versions that you have. I see a few errors in our logs related to this.
To work around this for now, you might reduce the number of active versions that you keep. It looks like you are building versions for individual branches or pull requests, have you tried our pull request building feature? This would help remove the unneeded versions after building, while still keeping the built content.

I believe the "pull request building feature" referenced here is this, confirming.

@bchen1116
Contributor

Update:
We've updated RTD to build from pull requests only, removing the unnecessary builds of the different versions (branches) that we push. Additionally, we've deleted all unnecessary (untagged) versions from RTD (miscellaneous branches that we use for PRs), which seems to have helped the doc builds. We haven't noticed any doc builds timing out since, so we will close this issue tomorrow unless we begin seeing timeouts again.

@dsherry
Contributor

dsherry commented Feb 2, 2021

@bchen1116 is this closeable now?

@bchen1116
Contributor

Closing now, as there's been no issue with slow doc builds.

4 participants