Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable optional parallel site building #383

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jozefhajnala
Copy link
Contributor

This PR proposes an option to build_site() using multiple parallel-running R processes, utilizing only the base package parallel. This can result in significant speed improvements with hardware common in 2019.

To enable the parallelization, the user must specify 2 options:

options(
  blogdown.use_parallel = TRUE,
  blogdown.use_parallel.cores = number_of_cores
)

The functionality is only triggered if:

  • the option blogdown.use_parallel is TRUE
  • the option blogdown.use_parallel.cores is > 1
  • the length of files is > 1
  • the parallel package is available

If this direction of functionality is accepted, we can make the implementation less conservative and easier to use.

Copy link
Member

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd love to have this feature. Please finish up whatever you planned to do, and let me know when the PR is ready for review. Thank you very much!

@yihui yihui added this to the v0.13 milestone May 15, 2019
@jozefhajnala
Copy link
Contributor Author

I would consider this ready to be reviewed, just not sure about the "UI" in terms of using the feature. I am a fan of conservative introduction of new features, but happy to change.

@jozefhajnala
Copy link
Contributor Author

@yihui, is there anything waited on from my side?

@yihui yihui modified the milestones: v0.13, v0.14 Jun 11, 2019
Copy link
Member

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay. I just added my own proposal (which is even more conservative but also more general). Please let me know what you think. Thanks!

@@ -44,7 +47,11 @@ build_site = function(
if (local && length(files)) {
files = getOption('blogdown.files_filter', timestamp_filter)(files)
}
build_rmds(files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a while about it and I tend to let users decide how to parallelize it because there are multiple ways of parallelization. Here is how I'd implement it:

build_fun = getOption('blogdown.build_rmds', function(files, build) {
  build(files)
})
build_fun(files, build_rmds)

Then users can set their own functions (in .Rprofile), e.g.

options(blogdown.build_rmds = function(files, build) {
  parallel::mclapply(files, build)
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Updated the approach to hopefully be in line with your though process, albeit still exposing the build_fun as a parameter, not just retrieving it from options within the function. We can also remove the parameter if you like the other approach better.

  • Would suggest to keep the build_rmds_parallel(), as it gives users an easy way to get parallelization without having to implement themselves, using either

    options("blogdown.build_rmds" = blogdown:::build_rmds_parallel)

    or (assuming we keep the parameter) calling directly

    blogdown::build_site(build_fun = blogdown:::build_rmds_parallel)

@jozefhajnala jozefhajnala force-pushed the parallel-build branch 2 times, most recently from 102e98f to 129cf50 Compare June 14, 2019 21:08
@cderv
Copy link
Collaborator

cderv commented Jun 15, 2019

Just some thoughts about letting user choose how to do parallel. 💭

  • The future package is really helpful for that. It allows a clean separation between what is parallelized and how it should be run. However, it would mean adding it as dependency, at least in suggest, and could be too heavy 🤔
  • A vignette that gives examples of how to use this option could be insteresting. It would allow to show build_rmds_parallel and explain how it build on parallel package. Either the user can copy paste from the vignette, or know that a helper blogdown::build_rmds_parallel exists.

Thanks for this feature by the way !

@jozefhajnala
Copy link
Contributor Author

Just some thoughts about letting user choose how to do parallel. 💭

  • The future package is really helpful for that. It allows a clean separation between what is parallelized and how it should be run. However, it would mean adding it as dependency, at least in suggest, and could be too heavy 🤔
  • A vignette that gives examples of how to use this option could be insteresting. It would allow to show build_rmds_parallel and explain how it build on parallel package. Either the user can copy paste from the vignette, or know that a helper blogdown::build_rmds_parallel exists.

Thanks for this feature by the way !

Thanks for the feedback!

  • The currently suggested PR would let you choose your own way of parallelization, so you could define options("blogdown.build_rmds" = my_parallelization_fun), where my_parallelization_fun() could use the future package to parallelize, without having to introduce a dependency to blogdown
  • I will happily spend some time writing a vignette if we get the PR merged ;-)

@yihui yihui removed this from the v0.14 milestone Aug 10, 2019
@jozefhajnala
Copy link
Contributor Author

Hi @yihui, do we still want to move this forward?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants