Enable optional parallel site building #383

jozefhajnala · 2019-05-15T17:43:00Z

This PR proposes an option to build_site() using multiple parallel-running R processes, utilizing only the base package parallel. This can result in significant speed improvements with hardware common in 2019.

To enable the parallelization, the user must specify 2 options:

options(
  blogdown.use_parallel = TRUE,
  blogdown.use_parallel.cores = number_of_cores
)

The functionality is only triggered if:

the option blogdown.use_parallel is TRUE
the option blogdown.use_parallel.cores is > 1
the length of files is > 1
the parallel package is available

If this direction of functionality is accepted, we can make the implementation less conservative and easier to use.

yihui

Yes, I'd love to have this feature. Please finish up whatever you planned to do, and let me know when the PR is ready for review. Thank you very much!

jozefhajnala · 2019-05-16T17:43:11Z

I would consider this ready to be reviewed, just not sure about the "UI" in terms of using the feature. I am a fan of conservative introduction of new features, but happy to change.

jozefhajnala · 2019-05-30T07:31:18Z

@yihui, is there anything waited on from my side?

yihui

Sorry about the delay. I just added my own proposal (which is even more conservative but also more general). Please let me know what you think. Thanks!

yihui · 2019-06-11T19:05:14Z

R/render.R

@@ -44,7 +47,11 @@ build_site = function(
  if (local && length(files)) {
    files = getOption('blogdown.files_filter', timestamp_filter)(files)
  }
-  build_rmds(files)


I thought a while about it and I tend to let users decide how to parallelize it because there are multiple ways of parallelization. Here is how I'd implement it:

build_fun = getOption('blogdown.build_rmds', function(files, build) { build(files) }) build_fun(files, build_rmds)

Then users can set their own functions (in .Rprofile), e.g.

options(blogdown.build_rmds = function(files, build) { parallel::mclapply(files, build) })

Updated the approach to hopefully be in line with your though process, albeit still exposing the build_fun as a parameter, not just retrieving it from options within the function. We can also remove the parameter if you like the other approach better.

Would suggest to keep the build_rmds_parallel(), as it gives users an easy way to get parallelization without having to implement themselves, using either

options("blogdown.build_rmds" = blogdown:::build_rmds_parallel)

or (assuming we keep the parameter) calling directly

blogdown::build_site(build_fun = blogdown:::build_rmds_parallel)

cderv · 2019-06-15T10:55:23Z

Just some thoughts about letting user choose how to do parallel. 💭

The future package is really helpful for that. It allows a clean separation between what is parallelized and how it should be run. However, it would mean adding it as dependency, at least in suggest, and could be too heavy 🤔
A vignette that gives examples of how to use this option could be insteresting. It would allow to show build_rmds_parallel and explain how it build on parallel package. Either the user can copy paste from the vignette, or know that a helper blogdown::build_rmds_parallel exists.

Thanks for this feature by the way !

jozefhajnala · 2019-06-18T18:08:46Z

Just some thoughts about letting user choose how to do parallel. 💭

The future package is really helpful for that. It allows a clean separation between what is parallelized and how it should be run. However, it would mean adding it as dependency, at least in suggest, and could be too heavy 🤔

A vignette that gives examples of how to use this option could be insteresting. It would allow to show build_rmds_parallel and explain how it build on parallel package. Either the user can copy paste from the vignette, or know that a helper blogdown::build_rmds_parallel exists.

Thanks for this feature by the way !

Thanks for the feedback!

The currently suggested PR would let you choose your own way of parallelization, so you could define options("blogdown.build_rmds" = my_parallelization_fun), where my_parallelization_fun() could use the future package to parallelize, without having to introduce a dependency to blogdown
I will happily spend some time writing a vignette if we get the PR merged ;-)

jozefhajnala · 2019-09-25T18:56:20Z

Hi @yihui, do we still want to move this forward?

yihui reviewed May 15, 2019

View reviewed changes

yihui added this to the v0.13 milestone May 15, 2019

yihui modified the milestones: v0.13, v0.14 Jun 11, 2019

yihui reviewed Jun 11, 2019

View reviewed changes

jozefhajnala force-pushed the parallel-build branch 2 times, most recently from 102e98f to 129cf50 Compare June 14, 2019 21:08

support custom build functions for build_site()

129cf50

yihui removed this from the v0.14 milestone Aug 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable optional parallel site building #383

Enable optional parallel site building #383

jozefhajnala commented May 15, 2019

yihui left a comment

jozefhajnala commented May 16, 2019

jozefhajnala commented May 30, 2019

yihui left a comment

yihui Jun 11, 2019

jozefhajnala Jun 14, 2019

cderv commented Jun 15, 2019

jozefhajnala commented Jun 18, 2019

jozefhajnala commented Sep 25, 2019

Enable optional parallel site building #383

Are you sure you want to change the base?

Enable optional parallel site building #383

Conversation

jozefhajnala commented May 15, 2019

yihui left a comment

Choose a reason for hiding this comment

jozefhajnala commented May 16, 2019

jozefhajnala commented May 30, 2019

yihui left a comment

Choose a reason for hiding this comment

yihui Jun 11, 2019

Choose a reason for hiding this comment

jozefhajnala Jun 14, 2019

Choose a reason for hiding this comment

cderv commented Jun 15, 2019

jozefhajnala commented Jun 18, 2019

jozefhajnala commented Sep 25, 2019