mrgsim in nested futures/parallel settings #1178
Comments
Hi @MahmAbdelwahab - A couple of thoughts:

`mod <- mread(..., soloc = "build-dir")`

This makes sure the model is built locally rather than in the temporary directory specific to your (main) R session.

So load and cache the model locally prior to starting the parallel job:

`mod <- mread_cache(..., soloc = "build-dir")`

Then for each chunk it's the same call, but when you do this on the chunk, it's a quick read ... no compile:

`mod <- mread_cache(..., soloc = "build-dir")`

Could you let me know what happens? If it's still not working, please email me at my GitHub email address and we can meet on Zoom to look at this.

Kyle
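The cache-then-read pattern suggested above can be sketched as follows. This assumes mrgsolve is installed; `"pk1"` is a model from mrgsolve's internal model library, used here purely for illustration (the real workflow uses the poster's own model file):

```r
library(mrgsolve)

# Before launching the parallel job: build once and cache the compiled
# shared object in a local directory ("build-dir") rather than in the
# session-specific tempdir, which workers cannot see.
mod <- mread_cache("pk1", modlib(), soloc = "build-dir")

# Inside each parallel chunk: the same call, but now it is a fast read
# of the cached model with no C++ compile step.
simulate_chunk <- function(ids) {
  mod <- mread_cache("pk1", modlib(), soloc = "build-dir")
  mrgsim(mod, events = ev(amt = 100), end = 24)
}
```

The key point is that `soloc` pins the build location to a path that outlives the main R session and is visible to the worker processes.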
Hello @kylebaron, setting `soloc = "build-dir"` as you suggested resolved it. Many thanks for your help!

Best,
Mahmoud
Thanks for reporting back, @MahmAbdelwahab, and glad it got resolved. Wondering if you'd be willing to share the relevant parts of your setup? I did this a long time ago with future.batchtools on SGE, but it got unstable on our system. It sounds like your setup is working well apart from the multisession issue.

Kyle
Hello @kylebaron, here are the relevant parts of the setup; I will try to post a full example later if needed.

```r
# setting up the slurm plan
slurm <- future::tweak(
  future.batchtools::batchtools_slurm,
  template = system.file("templates/slurm-simple.tmpl", package = "batchtools"),
  workers = 2,
  resources = list(
    partition = "general",
    walltime = 60 * 5,
    ncpus = 4
  )
)
```
```r
nsims <- 1E6  # number of simulated patients/profiles

# chunking the nsims
# function taken from https://cran.r-project.org/web/packages/bhmbasket/bhmbasket.pdf
# (used internally as bhmbasket:::chunkVector)
chunkVector <- function(x, n_chunks) {
  if (n_chunks <= 1) {
    chunk_list <- list(x)
  } else {
    chunk_list <- unname(split(x, cut(seq_along(x), n_chunks, labels = FALSE)))
  }
  return(chunk_list)
}
```
```r
set.seed(1234)  # seed needs to be set outside the foreach call (any fixed value)

plan(list(slurm, callr))
# plan(list(slurm, multisession))  # ran into some issues loading the model object on the worker node
registerDoFuture()

chunk_outer <- chunkVector(seq_len(Ntasks), getDoParWorkers())

sim_results <-
  foreach(k = chunk_outer, .combine = c) %dorng% {  # uses slurm plan
    chunk_inner <- chunkVector(k, getDoParWorkers())
    foreach(j = chunk_inner, .combine = c) %dorng% {  # uses multisession/callr plan
      lapply(j, function(x) {
        sim_chunk <- expand.ev(
          ID = x,
          dose =          # values omitted in the original post
          amt =
          ii = ii,
        )
        mrgsim(mod, sim_chunk) %>% ..
      })
    }
  }
```
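The two-level topology used above can be sketched in miniature with the future package alone (assumed installed); here both levels use `sequential` so the structure can be seen without a cluster, whereas the real setup substitutes the slurm plan at the outer level and `callr`/`multisession` at the inner level:

```r
library(future)

# Outer level: one future per Slurm job in the real setup.
# Inner level: one future per local worker process on that node.
plan(list(sequential, sequential))

f_outer <- future({
  # Code here runs at the outer level (a Slurm job in the real setup);
  # a nested future() call here is dispatched with the inner plan.
  inner <- future(Sys.getpid())
  value(inner)
})
value(f_outer)
```

With `%dorng%`, each `foreach()` level resolves against the corresponding element of this plan list, which is why the outer loop lands on Slurm and the inner loop on local workers.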
Additionally, you can wrap the whole `foreach()` code block with [...].

Mahmoud
Hello everyone,
I am setting up a big simulation workflow and I am making use of an HPC cluster to submit the jobs. The workflow is as follows:

What I noticed is that with the above steps/workflow I get the following error:
```
MultisessionFuture (doFuture2-1) failed to receive message results from cluster
RichSOCKnode #1 (PID 14433 on localhost 'localhost'). The reason reported was
'error reading from connection'. Post-mortem diagnostic: No process exists with
this PID, i.e. the localhost worker is no longer alive. The total size of the 4
globals exported is 396.99 KiB. The three largest globals are 'modList'
(382.41 KiB of class 'list'), '...future.x_ii' (7.86 KiB of class 'list') and
'makeEventDataset' (6.18 KiB of class 'function')
Calls: %dofuture% -> doFuture2
```
If I move the model code into the innermost foreach loop (compiling the model for each chunk), the workflow works fine,
as it does when using `future_mrgsim_d()` (setting nchunk to 1, but maybe that's not an issue).
Any idea what causes that behavior?
best,
Mahmoud