`Bootdht` could stand a performance boost #44

erex · 2020-01-31T14:52:30Z

I know Distance now lives with the CRAN gods. But next release could certainly use some help from parallel processing.

Running 99 replicates of the pretty average savannah sparrow data set (150 detections) takes something like 10-15 minutes. Doing a real bootstrap on this data set would take 2 hours.

The multi-model (hn, hr, Fourier) savannah sparrow bootstrap takes effectively the same amount of time.

dill · 2020-07-31T07:34:00Z

Per @lenthomas's suggestion in #75, the bootstrap could be simply parallelised to give a speed-up.

dill · 2021-06-08T16:32:06Z

This was a bit tricky to implement while still including the progress bar. It doesn't seem possible to do this with the R "recommended" package parallel at the moment (meaning no additional dependencies). The best solution I could find was using foreach.

As of 9450d68 you can now specify cores= and if cores >1 it'll use the doMC backend to run your bootstrap over multiple cores. In future we could do something fancier here and allow user-specified backends (e.g., using snow) but I thought this would cover many use cases (e.g., people with laptops with multiple cores).
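For readers unfamiliar with the pattern, here is a minimal sketch of spreading bootstrap replicates over worker processes. It is not the bootdht implementation: it swaps in base parallel::parLapply (no progress bar) instead of foreach/doMC, and the names one_rep and boot_mean are hypothetical.

```r
library(parallel)

# One bootstrap replicate: resample the data with replacement, summarise.
# Defined at top level so only the function itself is shipped to workers.
one_rep <- function(i, dat) mean(sample(dat, replace = TRUE))

# Hypothetical helper (not part of Distance): run nboot replicates,
# in parallel when cores > 1, serially otherwise.
boot_mean <- function(x, nboot = 20, cores = 2) {
  if (cores > 1) {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)   # shut the workers down on exit
    reps <- parLapply(cl, seq_len(nboot), one_rep, dat = x)
  } else {
    reps <- lapply(seq_len(nboot), one_rep, dat = x)
  }
  unlist(reps)
}

res <- boot_mean(c(2.1, 3.4, 1.9, 5.6, 4.2), nboot = 20, cores = 2)
```

Each replicate is independent of the others, which is why this kind of job splits cleanly across cores.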

A quick test on my laptop:

library(Distance)
data("Savannah_sparrow_1981")
ss81 <- Savannah_sparrow_1981
cf <- convert_units("meter", NULL, "hectare")
it <- ds(ss81, key="hr", formula = ~Region.Label, convert.units = cf)
what <- dht2(it, flatfile = ss81, convert_units = cf,
             strat_formula=~Region.Label,
             stratification = "geographical")
print(what, report = "density")


# bootdht, no parallel
system.time(boo <- bootdht(it, flatfile = ss81))

# with parallel?
system.time(boo <- bootdht(it, flatfile = ss81, cores=3))

Timings, without parallelization:

   user  system elapsed 
400.198   3.572 404.495

vs. with the parallelization:

   user  system elapsed 
429.654   4.747 150.935
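
As a quick worked check on the elapsed times above (nothing here beyond the two numbers already reported), the speed-up and the per-core efficiency on 3 cores can be computed as:

```r
elapsed_serial   <- 404.495  # seconds, no parallelization
elapsed_parallel <- 150.935  # seconds, cores = 3

speedup    <- elapsed_serial / elapsed_parallel  # how many times faster
efficiency <- speedup / 3                        # fraction of ideal 3x

round(c(speedup = speedup, efficiency = efficiency), 2)
```

That is roughly a 2.7-fold speed-up, a little short of the ideal 3-fold, which is expected given the overhead of starting workers and collecting results.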

Please let me know of any successes/failures with this, including with installation/initial setup.

dill · 2021-06-09T14:38:12Z

Re-opening due to issues listed at 9450d68#commitcomment-51931831

dill · 2021-06-09T14:45:36Z

@erex @lenthomas 2f8334d moves to the doParallel backend, hopefully this works on non-unix systems now.

erex · 2021-06-09T16:16:50Z

No luck I'm afraid.

Getting out our old friends the minkes

library(Distance)
data("minke")
easy <- ds(data=minke, key="hr", truncation=1.5)
easyboot <- bootdht(model=easy, flatfile=minke, nboot=10, cores = 3)

Performing 10 bootstraps
  |                                                                                                |   0%
Error in { : task 1 failed - "object 'models' not found"

I can't make much sense of the interactive debugger so I'm not sure where it falls over, here are the last few lines from the console with debug(bootdht)

Browse[2]> n
debug: cl <- parallel::makeCluster(cores)
Browse[2]> n
debug: doParallel::registerDoParallel()
Browse[2]> n
debug: `%dopar2%` <- foreach::`%dopar%`
Browse[2]> n
debug: boot_ests <- foreach::foreach(i = 1:nboot, .combine = rbind.data.frame) %dopar2% 
    {
        r <- bootit(dat, our_resamples = our_resamples, summary_fun = summary_fun, 
            convert.units = convert.units, pb = list(increment = function(pb) {
                invisible()
            }))
        pb$increment(pb$pb)
        r
    }
Browse[2]> n
Error in { : task 1 failed - "object 'models' not found"

I'm leaving this alone for the rest of the day.

dill · 2021-06-09T20:39:07Z

I've tried to fix this, which required a bit more faff but see how that goes.

erex · 2021-06-10T06:33:12Z

Success with 10 bootstraps of the minke data set 🎉. A more serious test comes with bootstrapping the Howe camera trap analysis from the case study web page. That analysis produces no results, but does not report an error.

I've stripped out the debris from the Rmarkdown file leaving only the necessaries below

## ---- readin, message=FALSE----------------------------------------------------------------
library(Distance)
data("DuikerCameraTraps")
## ----fit-----------------------------------------------------------------------------------
conversion <- convert_units("meter", NULL, "square kilometer")
trunc.list <- list(left=2, right=15)
mybreaks <- c(seq(2,8,1), 10, 12, 15)
hr0 <- ds(DuikerCameraTraps, transect = "point", key="hr", adjustment = NULL,
          cutpoints = mybreaks, truncation = trunc.list)

## ---- sampfrac-----------------------------------------------------------------------------
viewangle <- 42 # degrees
samfrac <- viewangle / 360
conversion <- convert_units("meter", NULL, "square kilometer")
peak.hr.dens <- dht2(hr0, flatfile=DuikerCameraTraps, strat_formula = ~1,
                     sample_fraction = samfrac, er_est = "P2", convert_units = conversion)
print(peak.hr.dens, report="density")

## ---- bootstrap, results='hide', eval=TRUE-------------------------------------------------
mysummary <- function(ests, fit){
  return(data.frame(Dhat = ests$individuals$D$Estimate))
}
duiker.boot.hr <- bootdht(model=hr0, flatfile=DuikerCameraTraps, resample_transects = TRUE,
                       nboot=99, summary_fun=mysummary, sample_fraction = samfrac,
                       convert.units = conversion, cores = 2)

## ----bootresult, eval=TRUE-----------------------------------------------------------------
print(summary(duiker.boot.hr))

The object created by bootdht fills with NAs and the execution time of the bootstrap is nearly instantaneous. Structure of bootdht result is odd:

> str(duiker.boot.hr)
List of 1
 $ c.NA..NA..NA..NA..NA..NA..NA..NA..NA..NA..NA..NA..NA..NA..NA..: logi [1:99] NA NA NA NA NA NA ...
 - attr(*, "class")= chr "dht_bootstrap"
 - attr(*, "row.names")= int [1:99] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "nboot")= num 99
 - attr(*, "nbootfail")= num 0

Afraid I can't provide any more information.

dill · 2021-06-10T09:38:28Z

Thanks for testing Eric.

This is frustrating, as again this works fine on my machine. I'll test on other platforms to see whether something else is going on here. That seems weird, though, if that's the case and minke works fine for you.

erex · 2021-06-10T10:06:37Z

Yep, frustrating is right. The duiker code is identical to the code that ran back in July 2020 when the case study was compiled.

I've rerun minkes with 99 reps. It still runs to conclusion, however the summary report:

> summary(easyboot)
Bootstrap results

Boostraps          : 99 
Successes          : 99 
Failures           : 0 

       median     mean       se     lcl      ucl   cv
Nhat 11448.12 13935.86 19646.03 2325.94 46261.87 1.72

0 failures; however, when looking at the replicate estimates,

> sum(is.na(easyboot$Nhat))
[1] 1
> length(easyboot$Nhat)
[1] 295

which is a bit odd, because 99 replicates X (2 strata + 1 total) = 297

So all doesn't seem to be completely fine with the minkes either.

dill · 2021-06-14T11:03:47Z

Okay, trying this on bluewhale I get the same error, so there's still a platform-dependent issue here somewhere...

dill · 2021-06-14T11:41:04Z

Some further debugging reveals that this is down to the way the global environment is handled in Windows for doParallel. I found the error was:

[1] "Error in get_truncation(truncation, cutpoints, data) : \n  object 'trunc.list' not found\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in get_truncation(truncation, cutpoints, data): object 'trunc.list' not found>

Looking into work-arounds...
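
To illustrate the kind of failure above in isolation: socket (PSOCK) workers, the only kind available on Windows, are fresh R sessions and do not see objects from the calling session's global environment unless they are sent over. The sketch below uses base parallel rather than doParallel, and is only meant to show the mechanism, not the fix that went into bootdht.

```r
library(parallel)

trunc.list <- list(left = 2, right = 15)  # exists only in the calling session

cl <- makeCluster(2)                      # PSOCK cluster by default

# Looking the object up on the workers fails: they never received it.
before <- tryCatch(
  clusterEvalQ(cl, trunc.list$right),
  error = function(e) conditionMessage(e)
)

# Copy the object into each worker's global environment, then retry.
clusterExport(cl, "trunc.list", envir = environment())
after <- clusterEvalQ(cl, trunc.list$right)

stopCluster(cl)
```

Here before holds an error message mentioning trunc.list, while after holds the value from each worker. Forked workers on unix-alikes inherit the parent's memory, which would explain why the same code can run fine on a Mac.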

dill · 2021-06-14T12:04:46Z

can you try 6fa2826 @erex, I think that fixes your issue (works on bluewhale)

erex · 2021-06-14T18:36:29Z

Sorry, I'm not able to try your fix to bootdht as I'm travelling with a Chromebook, not a Windows machine. I won't be sitting in front of my home computer until late in the day on 18 June.

lenthomas · 2021-06-14T23:56:27Z

Did a quick test using the Duiker code @erex posted above, using 3 cores and doing 9 bootstraps (on my 4-core machine). Results look sensible. I did notice that the progress bar didn't update until it had finished. Tried to find out more about this but it seems not straightforward to remedy, if it is indeed an issue. I found
https://gist.github.com/kvasilopoulos/d49499ea854541924a8a4cc43a77fed0
and
http://5.9.10.113/66604588/showing-progress-bar-with-doparallel-foreach
in case either are helpful.

dill · 2021-06-15T07:39:28Z

Thanks Len. Yes, the progress bar is a pretty tough problem to sort out across platforms. It seems to work on Mac but will jump backwards and forwards by a few % as updates don't come back in order. I think using progress="progress" on my machine did slightly better, though I'm sure your issue is down to the platform rather than the package used. I'd seen that combine approach in the first link you sent and if I have time I'll give it a try. I'll add a note to the documentation about the progress bar so folks are aware.

dill · 2021-06-15T08:01:53Z

@lenthomas I've just pushed a change that might fix this problem on Windows (has null effect on my Mac). Let me know how you go with it.

lenthomas · 2021-06-15T14:26:25Z

No luck: didn't update until it had finished, and this time I got a warning message also (see below). I suppose it might be more efficient for you to test fixes on bluewhale than to wait for me/Eric?

> duiker.boot.hr <- bootdht(model=hr0, flatfile=DuikerCameraTraps, resample_transects = TRUE,
+                           nboot=9, summary_fun=mysummary, sample_fraction = samfrac,
+                           convert.units = conversion, cores = 3)
Performing 9 bootstraps
  |===========================================================| 100%
Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
  already exporting variable(s): pb

dill · 2021-06-15T21:33:53Z

Okay so switching the backend over to doSNOW seems to solve this issue (working on my laptop and bluewhale with both progress bar frameworks for me at least). Closing now but please re-open if you have issues.

lenthomas · 2021-06-16T18:07:45Z

The first time I ran it, I got

Error in loadNamespace(name) : there is no package called ‘snow’

I guess you need to list the package in the dependencies? Ditto for doSNOW!

On the plus side, I'm happy to report that after I installed those packages, it ran and gave a nice progress bar that updated as it went along. Yeah!

On the minus side, after I ran again with 20 reps and 10 cores, and checked the Windows process monitor, I found a whole load of left-over R sessions (see below). I guess you need to clear up the cluster after running, with stopCluster() or some such (I have not used snow so very likely have the suggested solution incorrect).
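
A minimal sketch of that clean-up idea, assuming a cluster created with base parallel::makeCluster (the thread's code used snow/doSNOW at this point; with_cluster and square are hypothetical names). Registering stopCluster() with on.exit() means the workers are shut down even if the job fails part-way.

```r
library(parallel)

square <- function(i) i^2

# Hypothetical wrapper: start workers, run a job, always shut workers down.
with_cluster <- function(cores, job) {
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)  # runs on normal exit and on error
  job(cl)
}

squares <- with_cluster(2, function(cl) parSapply(cl, 1:4, square))
```

After with_cluster() returns there should be no left-over worker sessions in the process monitor.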

dill · 2021-06-16T18:35:11Z

Thanks @lenthomas

On dependencies, I had included these as "suggested" packages and then check with a requireNamespace() in bootdht when cores>1, to avoid installing too many packages on installation. I hadn't added snow but have corrected that.
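
The "suggested package plus run-time check" pattern described here could look something like the sketch below. The helper name check_parallel_deps and the wording of the message are hypothetical, not what bootdht actually does.

```r
# Hypothetical helper: fail early, with a clear message, when optional
# packages needed only for parallel runs are not installed.
check_parallel_deps <- function(cores, pkgs) {
  if (cores <= 1) return(invisible(TRUE))  # serial runs need nothing extra
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!installed]
  if (length(absent) > 0) {
    stop("cores > 1 needs these packages: ",
         paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Serial use never touches the optional packages.
check_parallel_deps(cores = 1, pkgs = c("snow", "doSNOW"))
```

The upside is a lighter default install; the downside, as seen in this thread, is that the first parallel run on a fresh machine stops until the extra packages are installed.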

Thanks for the tip on stopCluster(), this seems to be handled differently on Mac. I'll try that out on bluewhale after dinner.

lenthomas · 2021-06-16T18:47:45Z

Great, thanks @dill. One other thing that occurs to me is whether it is possible to set an RNG seed in a consistent and reproducible way in the new parallel code, for the sake of reproducible research, etc. There is a doRNG package for this in the context of foreach; I expect there are other ways to do it also. If there's no time to do it now, perhaps raise it as an issue for the next release?
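
One of the "other ways" is built into base R: parallel::clusterSetRNGStream() gives each worker its own L'Ecuyer-CMRG stream derived from a single seed. The sketch below is an illustration with a toy job, not a proposal for how bootdht should do it; draw and run_once are hypothetical names.

```r
library(parallel)

draw <- function(i) mean(rnorm(5))

# Hypothetical helper: run the same toy job on a fresh 2-worker cluster.
run_once <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)
  clusterSetRNGStream(cl, iseed = seed)  # one independent stream per worker
  unlist(parLapply(cl, 1:6, draw))
}

a <- run_once(123)
b <- run_once(123)
identical(a, b)
```

The two runs should match as long as the seed, the number of workers and the way tasks are split between them stay the same; changing cores would change which stream each replicate draws from.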

dill · 2021-06-16T19:20:25Z

Noted and a good idea. I'll add that as a separate issue and see if I have time to work this out when I get back.

Running on bluewhale I saw the nodes shutdown so I think that's working now. I'll close again for now.

lenthomas · 2021-06-16T19:38:45Z

Yup, @dill, confirm it works as planned on my desktop machine also. Thanks!

lenthomas · 2021-06-25T21:51:12Z

Tried some code

remotes::install_github("https://github.com/DistanceDevelopment/Distance")

library(Distance)
data("minke")
easy <- ds(data=minke, key="hr", truncation=1.5)
easyboot <- bootdht(model=easy, flatfile=minke, nboot=99, cores = 3)
summary(easyboot)
str(easyboot)

First time I ran it I got the following message
mespace(x) : there is no package called ‘snow’
and second time, after installing snow I got
mespace(x) : there is no package called ‘doSNOW’
So I guess these need added to the list of requirements?

dill · 2021-06-28T08:59:53Z

See comment above but I'd added these packages as suggested since they will not be needed by every user and bootdht will run with cores=1 without them.

FWIW, if you do:

remotes::install_github("https://github.com/DistanceDevelopment/Distance", dependencies=TRUE)

then all of "Depends", "Imports", "LinkingTo" and "Suggests" will be installed for you.

erex · 2021-06-28T12:37:13Z

Continuing the minke saga... Seems that a replicate failure causes problems. See minke example:

library(Distance)
data("minke")
easy <- ds(data=minke, key="hr", truncation=1.5)
easyboot <- bootdht(model=easy, flatfile=minke, nboot=99, cores = 3)

nindex <- seq(1, length(easyboot$Nhat), by=3)
sindex <- seq(2, length(easyboot$Nhat), by=3)
totindex <- seq(3, length(easyboot$Nhat), by=3)

> summary(easyboot)
Bootstrap results

Boostraps          : 99 
Successes          : 98 
Failures           : 1 

       median     mean       se     lcl      ucl   cv
Nhat 11761.19 13000.87 15163.89 2224.63 33617.11 1.29    # of course this summary is not useful as strata are not recognised

> str(easyboot)
List of 1
 $ Nhat: num [1:292] 10663 2556 13220 11761 5989 ...
 
 ** 292 elements seems odd; 99*3=297, 98*3=294, why are there 292 elements?
 > summary(easyboot$Nhat[nindex])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   9263   13138   15739   18780   18823  198211       1 
> summary(easyboot$Nhat[sindex])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1724    7484   11464   12575   14194  123824 
> summary(easyboot$Nhat[totindex])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1226    3206    4048    7648    7415   74386

Seems there are the wrong number of elements in the resulting Nhat vector. Consequently, trying to "unshuffle" the stratum-specific estimates might go wrong.

Here is a second attempt with 450 replicates:

bob3 <- function(ests, fit) {
  return(list(#data.frame(params=fit$par),
              data.frame(north=ests$individuals$N$Estimate[1]),
              data.frame(south=ests$individuals$N$Estimate[2]),
              data.frame(total=ests$individuals$N$Estimate[3])))
}

easyboot <- bootdht(model=easy, flatfile=minke, nboot=450, cores = 3, 
                    summary_fun = bob3, progress="progress")

tmp <- sapply(easyboot, FUN=sum, simplify=TRUE)
north <- tmp[1:450]
south <- tmp[451:900]
total <- tmp[901:1350]
summary(north)
summary(south)
summary(total)
> summary(north)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   2770   10274   12601   18902   15640  796622       4 
> summary(south)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
   617.4   3085.8   3839.7   7783.2   4862.5 611410.3        4 
> summary(total)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   3991   14057   16525   26690   19915 1408032       7

I can't understand why the odd number of NAs, the model is a pooled detection function. If there were 4 missing for North, there should be the same number missing for South and Total, I would have expected.

dill · 2021-06-29T16:42:45Z

Is this an issue with the parallelization or the bootstrapping code in general? It sounds like this is a problem when doing geo stratification and some bootstrap replicates have no copies of some strata? I'll look into this further tomorrow, but that is an eventuality I had not considered. Two options if that's the case: (1) bootdht needs to know about strata or (2) more sophisticated error handling in the summary function.
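
A rough sketch of what option (2) might look like: a summary function that returns one labelled row per expected stratum, padding with NA when a stratum is absent from a replicate, so every replicate contributes the same number of rows and nothing has to be "unshuffled" by position. This assumes ests$individuals$N carries a Label column alongside Estimate; make_labelled_summary and the mock object are hypothetical and only there to show the behaviour.

```r
# Hypothetical factory: fix the expected strata up front and return a
# function with the (ests, fit) signature used for summary_fun above.
make_labelled_summary <- function(strata) {
  function(ests, fit) {
    N <- ests$individuals$N
    data.frame(Label = strata,
               Nhat  = N$Estimate[match(strata, N$Label)])  # NA if absent
  }
}

# Mock of a replicate in which the "South" stratum is missing entirely.
mock_ests <- list(individuals = list(N = data.frame(
  Label    = c("North", "Total"),
  Estimate = c(12000, 12000))))

nhat_summary <- make_labelled_summary(c("North", "South", "Total"))
nhat_summary(mock_ests, fit = NULL)
```

With labels attached, stratum-specific summaries can be taken by subsetting on Label rather than relying on every replicate returning exactly three values in a fixed order.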

dill · 2021-06-30T13:08:36Z

Combination of mrds 2.2.4.9005 and 549cbba should work-around this issue. Running your examples now to test but that might take a while.

dill · 2021-06-30T14:47:28Z

Unfortunately CRAN will reject Distance as-is because doSNOW and snow are "superseded". Looking into this now...

dill · 2021-06-30T14:50:54Z

Okay, so snow/doSNOW are superseded by parallel/doParallel (see here), which is what I had originally used sigh.

dill · 2021-06-30T16:13:54Z

Switching to parallel/doParallel done, but this means that the progress bar will not work very well. Restricting to the base progress bar and using the hack found here I can get something working. Unfortunately this leads to other additional output, e.g.,

> boo <- bootdht(it, flatfile = ss81, cores=3, nboot=10, summary_fun=fn)
Performing 10 bootstraps
  |                                                                      |   0%s
tarting worker pid=56057 on localhost:11375 at 17:06:25.410
starting worker pid=56059 on localhost:11375 at 17:06:25.409
starting worker pid=56058 on localhost:11375 at 17:06:25.409
Loading required package: Distance
Loading required package: mrds
This is mrds 2.2.4.9004
Built: R 4.1.0; ; 2021-06-30 12:57:25 UTC; unix

Attaching package: 'Distance'

The following object is masked from 'package:mrds':

    create.bins

loaded Distance and set parent environment
Loading required package: Distance
Loading required package: mrds
This is mrds 2.2.4.9004
Built: R 4.1.0; ; 2021-06-30 12:57:25 UTC; unix

Attaching package: 'Distance'

The following object is masked from 'package:mrds':

    create.bins

loaded Distance and set parent environment
Loading required package: Distance
Loading required package: mrds
This is mrds 2.2.4.9004
Built: R 4.1.0; ; 2021-06-30 12:57:25 UTC; unix

Attaching package: 'Distance'

The following object is masked from 'package:mrds':

    create.bins

loaded Distance and set parent environment
  |======================================================================| 100%
>

The ticker will also bounce back and forward a bit because it updates from the progress bar object that was passed, so possibly other jobs will have completed and updated before a given one is able to make its update. Very annoying.

dill · 2021-06-30T16:29:26Z

Of course, this doesn't work in Windows.

dill · 2021-06-30T16:32:51Z

Proposal: submit Distance 1.0.3 to CRAN with parallel support but without progress bars (progress bars still work in non-parallel settings). Push progress bars for parallel bootstrapping to 1.0.4.

@lenthomas @erex @LHMarshall: let me know if you have other suggestions. I need to submit mrds first, so I will not submit Distance immediately.

erex · 2021-06-30T16:37:10Z

not that fussed about progress bars, TBH. Usually running bootstrap for reports (i.e. knitting documents) so progress bars are uninformative.

dill · 2021-07-01T14:53:07Z

thanks @erex, in which case I will close here and reopen a new issue for the progress bar.

erex added the enhancement label Jan 31, 2020

dill added the mrds problems likely with mrds rather than Distance label May 13, 2020

dill added this to the 1.0.2 release milestone Jul 13, 2020

dill modified the milestones: 1.0.3 release, 1.0.4 release Sep 7, 2020

dill self-assigned this Jan 12, 2021

dill closed this as completed in 9450d68 Jun 8, 2021

dill reopened this Jun 9, 2021

dill pushed a commit that referenced this issue Jun 9, 2021

pass models to bootit, #44

d5aa7ae

dill pushed a commit that referenced this issue Jun 14, 2021

correct failure counting. partially addresses #44

4b128ab

dill pushed a commit that referenced this issue Jun 14, 2021

avoid falling back to environment, possible fix to #44

6fa2826

dill pushed a commit that referenced this issue Jun 15, 2021

add extra documentation about parallelization for bootdht, per #44

58e9af4

dill pushed a commit that referenced this issue Jun 15, 2021

try exporting progress bar to get it to work on Windows #44

f86599d

dill pushed a commit that referenced this issue Jun 15, 2021

fix crash-on-completion for progress #44

fcf6121

dill closed this as completed Jun 15, 2021

lenthomas reopened this Jun 16, 2021

dill pushed a commit that referenced this issue Jun 16, 2021

suggest/requireNamespace snow, see #44

e4b6320

dill pushed a commit that referenced this issue Jun 16, 2021

stop the cluster! see #44

e9e2fbe

dill closed this as completed Jun 16, 2021

dill mentioned this issue Jun 16, 2021

bootdht parallel random number generation seed and replication #94

Closed

lenthomas reopened this Jun 25, 2021

dill pushed a commit that referenced this issue Jun 30, 2021

enable setting se arg to dht() to turn off calculation of uncertainty

549cbba

(for bootdht) addresses #44

dill pushed a commit that referenced this issue Jun 30, 2021

remvoe progress bars when cores > 1 #44

cd853c8

dill closed this as completed Jul 1, 2021

dill mentioned this issue Jul 1, 2021

Parallel bootstrap progress bar #97

Open
