bootdht fails when cores > 1 and model object depends on objects in global environment #168

lenthomas · 2023-04-04T23:11:11Z

Distance 1.0.7, R4.2.3

bootdht does not seem to pass variables from the global environment into the environments used when cores > 1. This is likely expected behaviour, but it causes problems for the code below, so perhaps something code be done to either facilitate passing the required objects, or raising an error message.

The scenario is that one fitted ds object is used to provide starting values for a second ds object, and this second ds object is passed in to bootdht as the model:

#Import data
DuikerCameraTraps <- read.csv(file="https://datadryad.org/stash/downloads/file_stream/73221", 
                              header=TRUE, sep="\t")
DuikerCameraTraps$object <- NA
DuikerCameraTraps$object[!is.na(DuikerCameraTraps$distance)] <- 1:sum(!is.na(DuikerCameraTraps$distance))
DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)

#Fit detection function
library(Distance)
trunc.list <- list(left=2, right=15)
mybreaks <- c(2, 3, 4, 5, 6, 7, 8, 10, 12, 15)
#Note uni3 uses uni2 as starting values
uni2 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=2, cutpoints = mybreaks, truncation = trunc.list)
uni3 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=3, cutpoints = mybreaks, truncation = trunc.list, 
           initial_values = list(adjustment = c(as.numeric(uni2$ddf$par), 0)))

#Get bootstrap estimate of variance
#Summary statistic
mysummary <- function(ests, fit){
  return(data.frame(Label = ests$individuals$D$Label,
                    Dhat = ests$individuals$D$Estimate))
}
n.boot <- 3
#Single thread - 3 successes
#Note you get warnings, which it would be better if were turned off or
# stored, but not a big deal
set.seed(1235561)
daytime.boot.uni3 <- bootdht(model = uni3, flatfile = DuikerCameraTraps,
                         resample_transects = TRUE, nboot = n.boot, cores = 1,
                         summary_fun = mysummary)
daytime.boot.uni3

#Multi thread - all reps fail
set.seed(1235561)
daytime.boot.uni3.par <- bootdht(model = uni3, flatfile = DuikerCameraTraps,
                             resample_transects = TRUE, nboot = n.boot, cores = 3,
                             summary_fun = mysummary)
daytime.boot.uni3.par

When a version is run where start values are not set, it runs fine:

#Do a multi-thread run without setting start values - 3 successes
set.seed(1235561)
uni3.ownstart <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=3, cutpoints = mybreaks, truncation = trunc.list)
daytime.boot.uni3.ownstart <- bootdht(model = uni3.ownstart, flatfile = DuikerCameraTraps,
                                 resample_transects = TRUE, nboot = n.boot, cores = 3,
                                 summary_fun = mysummary)
daytime.boot.uni3.ownstart

So the problem appears to be that uni3 depends on uni2 for its start values, and this causes an error inside the bootstrap when cores >1.

In the code below, I do a further check that it's not a problem with setting start values per se, by providing some manual start values.

#The above run is quite slow, so start hard-coding the uni2 starting values
print (c(as.numeric(uni2$ddf$par), 0))
set.seed(1235561)
uni3.hardstart <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
                    nadj=3, cutpoints = mybreaks, truncation = trunc.list,
                    initial_values = list(adjustment = c(0.97178179, 0.03541634, 0)))
daytime.boot.uni3.hardstart <- bootdht(model = uni3.hardstart, flatfile = DuikerCameraTraps,
                                      resample_transects = TRUE, nboot = n.boot, cores = 3,
                                      summary_fun = mysummary)
daytime.boot.uni3.hardstart

#Even faster if you use the uni3 values as starting values
print (as.numeric(uni3$ddf$par))
set.seed(1235561)
uni3.hardstart2 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
                     nadj=3, cutpoints = mybreaks, truncation = trunc.list,
                     initial_values = list(adjustment = c(0.93541257, -0.05304261, -0.08043369)))
daytime.boot.uni3.hardstart2 <- bootdht(model = uni3.hardstart2, flatfile = DuikerCameraTraps,
                                       resample_transects = TRUE, nboot = n.boot, cores = 3,
                                       summary_fun = mysummary)
daytime.boot.uni3.hardstart2

The text was updated successfully, but these errors were encountered:

lenthomas · 2023-11-24T10:53:25Z

@LHMarshall to look into how parallelization is done in dht vs dims and potentially change it. Also look briefly into other methods for getting objects passed through. If not fixable relatively quickly then simply document this issue in the bootdht help and leave this as a future enhancement.

LHMarshall · 2023-12-20T10:37:20Z

Check to see if eval/parse functions can force these starting values to be created within bootdht and see if this is a way to ensure that they are passed in to the parallel processes.

lenthomas · 2023-12-20T21:40:10Z

Hi @LHMarshall, regarding an update to the documentation for bootdht, we currently say (under "Parallelization"):

It is also hard to debug any issues in summary_fun so it is best to run a small number of bootstraps first in parallel to check that things work. On Windows systems summary_fun does not have access to the global environment when running in parallel, so all computations must be made using only its ests and fit arguments (i.e., you can not use R objects from elsewhere in that function, even if they are available to you from the console).

This implies summary_fun does have access to the global environment when running in non-Windows OSs. Is that actually correct?

Anyway, here's an additional suggested paragraph, to go after the one above (i.e., still under parallelization):

Another consequence of the global environment being unavailable inside parallel bootstraps is that any starting values in the model object passed in to bootdht must be hard coded (otherwise you get back 0 successful bootstraps). For a worked example showing this, see the camera trap distance sampling online example.

lenthomas added the enhancement label Apr 4, 2023

LHMarshall added this to the Distance 1.0.9 milestone Dec 5, 2023

LHMarshall self-assigned this Dec 5, 2023

LHMarshall added a commit that referenced this issue Dec 21, 2023

Documentation #168

4f5dbd5

LHMarshall modified the milestones: Distance 1.0.9, Distance 1.0.10 Dec 21, 2023

LHMarshall added a commit that referenced this issue Dec 22, 2023

Documentation #168

189c6fa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bootdht fails when cores > 1 and model object depends on objects in global environment #168

bootdht fails when cores > 1 and model object depends on objects in global environment #168

lenthomas commented Apr 4, 2023

lenthomas commented Nov 24, 2023

LHMarshall commented Dec 20, 2023

lenthomas commented Dec 20, 2023

bootdht fails when cores > 1 and model object depends on objects in global environment #168

bootdht fails when cores > 1 and model object depends on objects in global environment #168

Comments

lenthomas commented Apr 4, 2023

lenthomas commented Nov 24, 2023

LHMarshall commented Dec 20, 2023

lenthomas commented Dec 20, 2023