
Commit

bootdht can now run in parallel via the foreach/doMC packages, see the cores argument.

closes #44
David Lawrence Miller committed Jun 8, 2021
1 parent a04ce8b commit 9450d68
Showing 4 changed files with 52 additions and 18 deletions.
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -11,7 +11,7 @@ Description: A simple way of fitting detection functions to distance sampling
 Horvitz-Thompson-like estimator) if survey area information is provided. See
 Miller et al. (2019) <doi:10.18637/jss.v089.i01> for more information on
 methods and <https://examples.distancesampling.org/> for example analyses.
-Version: 1.0.2.9008
+Version: 1.0.2.9009
 URL: https://github.com/DistanceDevelopment/Distance/
 BugReports: https://github.com/DistanceDevelopment/Distance/issues
 Language: en-GB
@@ -24,6 +24,8 @@ Imports:
 Suggests:
 covr,
 progress,
+doMC,
+foreach,
 testthat
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
1 change: 1 addition & 0 deletions NEWS
@@ -8,6 +8,7 @@ Distance 1.0.3
 * fix issue #85 when species was used in the detection function and for post-stratification. Thanks to jason-airst for reporting the bug.
 * fix dht2 bug where stratification="replicate" variance estimation was 0 due to order of operations
 * fix dht2 bug where stratification="effort_sum" encounter rate variance estimation, due to incorrect grouping of transects into strata. Thanks to Samantha Ball and Jamie McKaughan for reporting this issue.
+* bootdht can now run in parallel via the foreach/doMC packages, see the cores argument.
 
 Distance 1.0.2
 ----------------------
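The NEWS entry points readers to the new `cores` argument. The roxygen documentation for that argument in `R/bootdht.R` advises setting `cores` to at most one less than the number of cores on the machine. Below is a minimal base-R sketch for choosing such a value. `safe_cores()` is a hypothetical helper. Falling back to 1 when `parallel::detectCores()` returns `NA` is an assumed default, not something the commit specifies.

```r
# Hypothetical helper: one core fewer than detected, but never below 1.
# parallel::detectCores() can return NA on some platforms; fall back to 1 then.
safe_cores <- function() {
  n <- parallel::detectCores()
  if (is.na(n)) 1L else max(1L, as.integer(n) - 1L)
}

safe_cores()
```

The result could then be supplied as `cores = safe_cores()` in a call to `bootdht()`.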
60 changes: 44 additions & 16 deletions R/bootdht.R
@@ -30,6 +30,7 @@
 #' @param progress_bar which progress bar should be used? Default "base" uses
 #' `txtProgressBar`, "none" suppresses output, "progress" uses the
 #' `progress` package, if installed.
+#' @param cores number of cores to use to compute the estimates. If >1 then the `foreach` package will be used to run the computation over multiple cores of the computer. It is advised that you do not set `cores` to be greater than one less than the number of cores on your machine.
 #'
 #' @section Summary Functions:
 #' The function `summary_fun` allows the user to specify what summary
@@ -59,6 +60,8 @@
 #' @importFrom utils txtProgressBar setTxtProgressBar getTxtProgressBar
 #' @importFrom stats as.formula AIC
 #' @importFrom mrds ddf dht
+# @importFrom foreach foreach "%dopar%"
+# @importFrom doMC registerDoMC
 #' @seealso [`summary.dht_bootstrap`][summary.dht_bootstrap] for how to
 #' summarize the results, [`bootdht_Nhat_summarize`][bootdht_Nhat_summarize]
 #' for an example summary function.
@@ -90,7 +93,8 @@ bootdht <- function(model,
 convert.units=1,
 select_adjustments=FALSE,
 sample_fraction=1,
-progress_bar="base"){
+progress_bar="base",
+cores=1){
 
 if(!any(c(resample_strata, resample_obs, resample_transects))){
 stop("At least one of resample_strata, resample_obs, resample_transects must be TRUE")
@@ -140,9 +144,8 @@ bootdht <- function(model,
 # count failures
 nbootfail <- 0
 # function to do a single bootstrap iteration
-bootit <- function(bootdat, our_resamples, groups,
-convert.units, pb){
-
+bootit <- function(bootdat, our_resamples, summary_fun,
+convert.units, pb, ...){
 # sample at the right levels
 for(sample_thingo in our_resamples){
 # what are the possible samples at this level
@@ -169,7 +172,6 @@ bootdht <- function(model,
 bootdat$Sample.Label <- paste0(bootdat[[sample_label]], "-",
 bootdat[[paste0(sample_thingo, "_ID")]])
 
-
 aics <- rep(NA, length(models))
 for(i in seq_along(models)){
 model <- models[[i]]
@@ -227,35 +229,61 @@ bootdht <- function(model,
 pb <- list(pb = txtProgressBar(0, nboot, style=3),
 increment = function(pb){
 setTxtProgressBar(pb, getTxtProgressBar(pb)+1)
+},
+done = function(pb){
+setTxtProgressBar(pb, environment(pb$up)$max)
 })
 }else if(progress_bar == "none"){
 pb <- list(pb = NA,
-increment = function(pb){
-invisible()
-})
+increment = function(pb) invisible(),
+done = function(pb) invisible())
 }else if(progress_bar == "progress"){
 if (!requireNamespace("progress", quietly = TRUE)){
 stop("Package 'progress' not installed!")
 }else{
 pb <- list(pb = progress::progress_bar$new(
 format=" [:bar] :percent eta: :eta",
 total=nboot, clear=FALSE, width=80),
-increment = function(pb) pb$tick())
+increment = function(pb) pb$tick(),
+done = function(pb) pb$update(1))
 pb$pb$tick(0)
 }
 }else{
 stop("progress_bar must be one of \"none\", \"base\" or \"progress\"")
 }
 
 # run the code
-boot_ests <- replicate(nboot,
-bootit(dat, our_resamples,
-summary_fun, convert.units=convert.units,
-pb=pb), simplify=FALSE)
+if(cores > 1){
+if (!requireNamespace("foreach", quietly = TRUE) &&
+(!requireNamespace("doMC", quietly = TRUE))){
+stop("Packages 'foreach' and `doMC` need to be installed to use multiple cores.")
+}
 
-# the above is then a list of thingos, do the "right" thing and assume
-# they are data.frames and then rbind them all together
-boot_ests <- do.call(rbind.data.frame, boot_ests)
+# build the cluster
+doMC::registerDoMC(cores=cores)
+# needed to avoid a syntax error/check fail
+`%dopar2%` <- foreach::`%dopar%`
+# fit the model nboot times over cores cores
+# note there is a bit of fiddling here with the progress bar to get it to
+# work (updates happen in this loop rather than in bootit)
+boot_ests <- foreach::foreach(i=1:nboot,
+.combine=rbind.data.frame) %dopar2% {
+r <- bootit(dat, our_resamples=our_resamples,
+summary_fun=summary_fun, convert.units=convert.units,
+pb=list(increment=function(pb){invisible()}))
+pb$increment(pb$pb)
+r
+}
+pb$done(pb$pb)
+}else{
+boot_ests <- replicate(nboot,
+bootit(dat, our_resamples,
+summary_fun, convert.units=convert.units,
+pb=pb), simplify=FALSE)
+# the above is then a list of thingos, do the "right" thing and assume
+# they are data.frames and then rbind them all together
+boot_ests <- do.call(rbind.data.frame, boot_ests)
+}
 cat("\n")
 
 attr(boot_ests, "nboot") <- nboot
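The `cores > 1` branch is meant to stop early when the parallel back-end is missing. As written, though, `!requireNamespace("foreach", ...) && (!requireNamespace("doMC", ...))` is only `TRUE` when *both* packages are absent. With `foreach` installed but `doMC` missing, execution falls through to `doMC::registerDoMC()`. That is consistent with the `there is no package called 'doMC'` error reported in the comments. Below is a minimal sketch of a guard that stops when *any* required package is missing. `missing_packages()` and `check_parallel_deps()` are hypothetical helpers, not part of Distance.

```r
# Return the subset of `pkgs` whose namespaces cannot be loaded.
# (Hypothetical helper for illustration; not part of Distance.)
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Stop with an informative error if *any* required package is missing,
# rather than only when all of them are.
check_parallel_deps <- function(pkgs = c("foreach", "doMC")) {
  absent <- missing_packages(pkgs)
  if (length(absent) > 0) {
    stop("Package(s) ", paste0("'", absent, "'", collapse = ", "),
         " need to be installed to use multiple cores.", call. = FALSE)
  }
  invisible(TRUE)
}

# Example: "stats" ships with R; the second name is deliberately bogus.
missing_packages(c("stats", "notARealPackageXYZ"))
#> [1] "notARealPackageXYZ"
```

Within `bootdht()` itself, the same effect would come from replacing `&&` with `||` in the existing condition.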
5 changes: 4 additions & 1 deletion man/bootdht.Rd

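In the `R/bootdht.R` diff, each `progress_bar` option is now a list. The list holds the bar object plus `increment` and `done` closures, so the bootstrap code can call `pb$increment(pb$pb)` and `pb$done(pb$pb)` without knowing which kind of bar it has. Below is a minimal base-R sketch of that pattern for the `"base"` and `"none"` options, assuming only `utils::txtProgressBar()`. `make_pb()` is a hypothetical name, not a Distance function.

```r
# Build a progress-bar "object": the bar itself plus increment/done closures.
# make_pb() is a hypothetical helper mirroring the pattern in bootdht().
make_pb <- function(type = c("base", "none"), n) {
  type <- match.arg(type)
  if (type == "base") {
    list(pb = utils::txtProgressBar(0, n, style = 3),
         # advance the bar by one step
         increment = function(pb) {
           utils::setTxtProgressBar(pb, utils::getTxtProgressBar(pb) + 1)
         },
         # jump straight to the bar's maximum, e.g. after a parallel loop
         done = function(pb) {
           utils::setTxtProgressBar(pb, environment(pb$up)$max)
         })
  } else {
    # "none": same interface, but every call is a silent no-op
    list(pb = NA,
         increment = function(pb) invisible(),
         done = function(pb) invisible())
  }
}

n <- 5
pb <- make_pb("base", n)
for (i in seq_len(n)) pb$increment(pb$pb)   # caller never checks the type
pb$done(pb$pb)
close(pb$pb)

quiet <- make_pb("none", n)
quiet$increment(quiet$pb)                    # prints nothing
```

Because `done` reads the maximum from the bar's own closure environment, it can finish the bar even when the per-iteration updates happened elsewhere.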

7 comments on commit 9450d68

@erex (Member) commented on 9450d68 Jun 9, 2021

Not sure of the situation with the package doMC. Apparently there is not a Windows version on CRAN.

Console capture when invoking bootdht in Windows:

Error in loadNamespace(x) : there is no package called ‘doMC’
> install.packages("doMC")
Installing package into ‘C:/Users/erexs/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘doMC’ is not available for this version of R

@dill (Contributor) commented on 9450d68 Jun 9, 2021

Thanks for letting me know. This is frustrating. Okay, I'll have to rethink how to do this in a platform-independent way. Probably using snow.

@erex (Member) commented on 9450d68 Jun 9, 2021

I tried to hunt down a GitHub version, or any conversations about a Windows version, but came up empty. Apparently this is *nix only? If so, I would have expected comments to that effect somewhere.

@erex (Member) commented on 9450d68 Jun 9, 2021

I guess GitHub Actions came to a similar conclusion about doMC.

@dill (Contributor) commented on 9450d68 Jun 9, 2021

Best I can find is here.

I'll see if I can get snow to work this afternoon and hopefully that's an easier cross-platform solution.

@lenthomas (Member)

In case it's any use: I think the parallel package is the replacement for snow. I've used the doParallel package to facilitate the use of foreach. Something like

library(doParallel) 
cl <- makeCluster(n.threads) 
registerDoParallel(cl)
ests <- foreach (sim = 1:n.sims, .combine = rbind) %dopar% {
  # [stuff in here]
}  

I think doParallel works on multiple platforms, as I think I've used it on isbjorn as well as my Windows machine. There's also doRNG if you need to preserve random seeds across parallel instances.
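The following is a minimal sketch of this suggestion that uses only the base `parallel` package. `makeCluster()` starts socket (PSOCK) workers rather than forking, so it should also work on Windows. `clusterSetRNGStream()` gives each worker its own L'Ecuyer-CMRG random-number stream, which makes results reproducible for a fixed seed and a fixed number of workers. `one_boot()` and `run_boot()` are hypothetical stand-ins for `bootit()` and its driver loop, not Distance code. The foreach/doParallel (plus doRNG) route described above would be the closer drop-in for the existing `%dopar%` loop.

```r
library(parallel)

# Hypothetical stand-in for one bootstrap iteration: resample `data` with
# replacement and return a one-row data.frame, as bootit() does.
# (The argument is not called `x`: parLapply() already passes its own `x`
# to clusterApply(), so an extra `x = ...` would clash.)
one_boot <- function(i, data) {
  xb <- sample(data, replace = TRUE)
  data.frame(rep = i, mean = mean(xb))
}

# Run `nboot` iterations over `cores` socket workers and rbind the results.
run_boot <- function(x, nboot, cores = 2, seed = 1) {
  cl <- makeCluster(cores)                 # PSOCK workers, no forking
  on.exit(stopCluster(cl), add = TRUE)     # always shut the workers down
  clusterSetRNGStream(cl, iseed = seed)    # one RNG stream per worker
  res <- parLapply(cl, seq_len(nboot), one_boot, data = x)
  do.call(rbind.data.frame, res)           # same combine step as bootdht()
}

ests <- run_boot(x = rnorm(50), nboot = 20, cores = 2, seed = 42)
head(ests)
```

`parLapply()` splits the iterations into one fixed chunk per worker. Rerunning with the same `seed` and `cores` therefore reproduces the same estimates, while changing `cores` changes which stream each iteration draws from.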

@dill (Contributor) commented on 9450d68 Jun 9, 2021 via email
