bootdht could stand a performance boost #44
Per @lenthomas's suggestion in #75, the bootstrap could simply be parallelised to give a speed-up.
This was a bit tricky to implement while still including the progress bar. It doesn't seem possible to do this with the R "recommended" package

As of 9450d68 you can now specify

A quick test on my laptop:

```r
library(Distance)
data("Savannah_sparrow_1981")
ss81 <- Savannah_sparrow_1981
# distances in metres, density reported per hectare
cf <- convert_units("meter", NULL, "hectare")
it <- ds(ss81, key="hr", formula = ~Region.Label, convert.units = cf)
what <- dht2(it, flatfile = ss81, convert_units = cf,
             strat_formula=~Region.Label,
             stratification = "geographical")
print(what, report = "density")
# bootdht, no parallel
system.time(boo <- bootdht(it, flatfile = ss81))
# with parallel?
system.time(boo <- bootdht(it, flatfile = ss81, cores=3))
```

Timings, without parallelization:

vs. with the parallelization:
Please let me know of any successes/failures with this, including with installation/initial setup.
Re-opening due to issues listed at 9450d68#commitcomment-51931831
@erex @lenthomas 2f8334d moves to the
No luck, I'm afraid. Getting out our old friends, the minkes:
I can't make much sense of the interactive debugger, so I'm not sure where it falls over. Here are the last few lines from the console with
I'm leaving this alone for the rest of the day.
I've tried to fix this, which required a bit more faff; see how that goes.
Success with 10 bootstraps of the
I've stripped out the debris from the R Markdown file, leaving only the necessaries below:
The object created by
Afraid I can't provide any more information.
Thanks for testing, Eric. This is frustrating, as again this works fine on my machine. I'll test on other platforms to see whether something else is going on here, though that seems odd if minke works fine for you.
Yep, frustrating is right. The duiker code is identical to the code that ran back in July 2020 when the case study was compiled. I've rerun minkes with 99 reps. It still runs to conclusion; however, the summary report:
0 failures; however, when looking at the replicate estimates,
which is a bit odd, because 99 replicates × (2 strata + 1 total) = 297. So not everything seems to be completely fine with the minkes either.
Okay, trying this on bluewhale I get the same error, so there's still a platform-dependent issue here somewhere...
Some further debugging reveals that this is down to the way the global environment is handled in Windows for
Looking into work-arounds...
Sorry, not able to try your fix to
Did a quick test using the Duiker code @erex posted above, using 3 cores and doing 9 bootstraps (on my 4-core machine). Results look sensible. I did notice that the progress bar didn't update until it had finished. I tried to find out more about this, but it doesn't seem straightforward to remedy, if it is indeed an issue. I found https://gist.github.com/kvasilopoulos/d49499ea854541924a8a4cc43a77fed0 and http://5.9.10.113/66604588/showing-progress-bar-with-doparallel-foreach in case either is helpful.

Thanks Len. Yes, the progress bar is a pretty tough problem to sort out across platforms. It seems to work on Mac, but it will jump backwards and forwards by a few % as updates don't come back in order. I think using `progress="progress"` on my machine did slightly better, though I'm sure your issue is down to the platform rather than the package used. I'd seen that combine approach in the first link you sent, and if I have time I'll give it a try. I'll add a note to the documentation about the progress bar so folks are aware.
@lenthomas I've just pushed a change that might fix this problem on Windows (it has no effect on my Mac). Let me know how you go with it.
No luck: the progress bar didn't update until it had finished, and this time I also got a warning message (see below). I suppose it might be more efficient for you to test fixes on bluewhale than to wait for me/Eric?
Okay, so switching the backend over to
The first time I ran it, I got

I guess you need to list the package in the dependencies? Ditto for

On the plus side, I'm happy to report that after I installed those packages, it ran and gave a nice progress bar that updated as it went along. Yeah! On the minus side, after I ran again with 20 reps and 10 cores and checked the Windows process monitor, I found a whole load of left-over R sessions (see below). I guess you need to clear up the cluster after running, with
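A minimal sketch of the clean-up pattern, using only the base `parallel` package and assuming a PSOCK cluster: creating the cluster inside a function and registering `stopCluster()` with `on.exit()` shuts the worker R sessions down whether the function returns normally or errors part-way through. The helper `run_replicates()` is hypothetical and is not the code inside `bootdht()`.

```r
library(parallel)

# Hypothetical helper: run `nboot` replicates of `fun` on `cores` workers.
run_replicates <- function(fun, nboot, cores = 2) {
  cl <- makeCluster(cores)
  # Stop the worker R sessions however this function exits (normal
  # return or error), so no orphaned processes are left behind.
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, seq_len(nboot), fun)
}

# Toy "replicate": square the replicate index.
res <- run_replicates(function(i) i^2, nboot = 5, cores = 2)
unlist(res)
```

Tying the clean-up to `on.exit()` rather than calling `stopCluster()` at the end of the function means a failed replicate cannot leave the cluster running.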
Thanks @lenthomas. On dependencies, I had included these as "suggested" packages and then check with a

Thanks for the tip on
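One way to handle packages listed under "Suggests" is to test for them at run time and fail with an informative message. The sketch below assumes that approach, using base R's `requireNamespace()`; the helper name `check_suggested()` is hypothetical, and this is not necessarily how Distance does it.

```r
# Hypothetical helper: stop with an informative message if any of the
# "Suggests" packages needed for parallel bootstraps is missing.
check_suggested <- function(pkgs) {
  # requireNamespace() returns FALSE (rather than erroring) when a
  # package is not installed; quietly = TRUE suppresses its messages.
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!ok]
  if (length(missing) > 0) {
    stop("Parallel bootstraps need the following package(s): ",
         paste(missing, collapse = ", "),
         ". Please install them with install.packages().",
         call. = FALSE)
  }
  invisible(TRUE)
}

# "parallel" ships with base R, so this check always passes.
check_suggested("parallel")
```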
Great, thanks @dill. One other thing that occurs to me is whether it is possible to set an RNG seed in a consistent and reproducible way in the new parallel code, for the sake of reproducible research, etc. There is a
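A minimal sketch of reproducible parallel random numbers, assuming a PSOCK cluster from the base `parallel` package: `clusterSetRNGStream()` gives each worker its own L'Ecuyer-CMRG stream derived from a single seed, so two runs with the same seed and the same number of workers return identical draws. The helper `parallel_draws()` is hypothetical and stands in for the resampling step of a bootstrap.

```r
library(parallel)

# Hypothetical helper: one random draw per replicate, computed in parallel
# with worker streams seeded from `seed`.
parallel_draws <- function(nboot, seed, cores = 2) {
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)
  # Give each worker its own reproducible L'Ecuyer-CMRG stream.
  clusterSetRNGStream(cl, iseed = seed)
  unlist(parLapply(cl, seq_len(nboot), function(i) runif(1)))
}

a <- parallel_draws(10, seed = 2021)
b <- parallel_draws(10, seed = 2021)
identical(a, b)
```

This reproducibility depends on `parLapply()` splitting the replicates into the same fixed chunks each time, so changing the number of cores changes the results. For `foreach`-based backends, the `doRNG` package addresses the same problem.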
Noted, and a good idea. I'll add that as a separate issue and see if I have time to work this out when I get back. Running on bluewhale, I saw the nodes shut down, so I think that's working now. I'll close again for now.
Yup, @dill, I can confirm it works as planned on my desktop machine also. Thanks!
Tried some code:
The first time I ran it, I got the following message:
See comment above, but I'd added these packages as suggested since they will not be needed by every user and

FWIW, if you do:

```r
remotes::install_github("https://github.com/DistanceDevelopment/Distance", dependencies=TRUE)
```

then all of "Depends", "Imports", "LinkingTo" and "Suggests" will be installed for you.
Continuing the minke saga... It seems that a replicate failure causes problems. See minke example:
There seem to be the wrong number of elements in the resulting Nhat vector. Consequently, trying to "unshuffle" the stratum-specific estimates might go wrong. Here is a second attempt with 450 replicates:
I can't understand the odd number of NAs, given that the model uses a pooled detection function. If 4 were missing for North, I would have expected the same number to be missing for South and Total.
Is this an issue with the parallelization or with the bootstrapping code in general? It sounds like this is a problem when doing geographical stratification and some bootstrap replicates have no copies of some strata? I'll look into this further tomorrow, but that is an eventuality I had not considered. Two options if that's the case: (1)
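The mismatch noted above (297 expected rows, uneven NA counts) is what happens when replicate results are concatenated into one vector and a failed replicate contributes fewer elements than the others. A minimal sketch, assuming results are collected per replicate: each replicate returns a fixed-shape data frame with one row per stratum plus the total, filled with `NA` on failure, so rows can never shift between strata. `safe_replicate()`, `toy_fit()` and the stratum labels are hypothetical stand-ins, not the internals of `bootdht()`.

```r
# Hypothetical stratum labels: one row per stratum plus the total.
strata <- c("North", "South", "Total")

# Always return length(strata) rows, using NA if the fit fails.
safe_replicate <- function(i, fit_fun) {
  est <- tryCatch(fit_fun(i), error = function(e) NULL)
  if (is.null(est)) est <- rep(NA_real_, length(strata))
  data.frame(replicate = i, Region.Label = strata, Nhat = est)
}

# Toy stand-in for one bootstrap fit: replicate 3 "fails".
toy_fit <- function(i) {
  if (i == 3) stop("detection function did not converge")
  c(100, 200, 300) + i
}

reps <- do.call(rbind, lapply(1:5, safe_replicate, fit_fun = toy_fit))
nrow(reps)             # 5 replicates x 3 rows = 15
sum(is.na(reps$Nhat))  # 3: one per row of the failed replicate
```

With this layout, a failure shows up as the same number of NAs for every stratum and the total, which is the pattern one would expect from a pooled detection function.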
Combination of
Unfortunately CRAN will reject
Okay, so
Switching to
The ticker will also bounce back and forth a bit because it updates from the progress bar object that was passed, so other jobs may have completed and updated before a given one is able to make its update. Very annoying.
Of course, this doesn't work in Windows.
Proposal: submit
@lenthomas @erex @LHMarshall: let me know if you have other suggestions. I need to submit
Not that fussed about progress bars, TBH. I'm usually running bootstraps for reports (i.e., knitting documents), so progress bars are uninformative.
Thanks @erex, in which case I will close here and open a new issue for the progress bar.
I know Distance now lives with the CRAN gods, but the next release could certainly use some help from parallel processing. Running 99 replicates of the pretty average Savannah sparrow data set (150 detections) takes something like 10-15 minutes. Doing a real bootstrap on this data set would take 2 hours.
The multi-model (hn, hr, Fourier) Savannah sparrow bootstrap takes effectively the same amount of time.