r_parallel_hpc.rmd

---
title: "managing R on HPC systems"
author: "Ben Bolker and Jennifer Freeman"
date: '`r format(Sys.time(), "%d %b %Y")`'
---

In general, R scripts can be run just like any other kind of program on an HPC (high-performance computing) system. However, there are a few peculiarities. This document compiles some helpful practices; it should be useful to people who are familiar with R but unfamiliar to HPC, and the reverse.

Some of these instructions will be specific to Compute Canada ca. 2022, and particularly to the Graham cluster.

There is also useful information at the [Digital Research Alliance of Canada wiki](https://docs.alliancecan.ca/wiki/R) ([en français aussi](https://docs.alliancecan.ca/wiki/R/fr).

I assume that you're slightly familiar with HPC machinery (i.e. you've taken the Compute Canada orientation session, know how to use `sbatch`/`squeue`/etc. to work with the batch scheduler)

Below, "batch mode" means running R code from an R script rather than starting R and typing commands at the prompt (i.e. "interactive mode"); "on a worker" means running a batch-mode script via the SLURM scheduler (i.e. using `sbatch`) rather than in a terminal session on the head node. Commands to be run within R will use an `R>` prompt, those to be run in the shell will use `sh>`.

## running scripts in batch mode

Given an R script stored in a `.R` file, there are a few ways to run it in batch mode:

- `r`: an improved batch-R version by Dirk Eddelbuettel. You can install it by installing the `littler` package from CRAN (see "Installing Packages" below) and running
    ```{bash littler_setup, eval = FALSE}
    mkdir ~/bin
    cd ~/bin
    ln -s ~/R/x86_64-pc-linux-gnu-library/4.1/littler/bin/r
    ```
    in the shell. (You may need to adjust the path name for your R version.)
- `Rscript <filename>`: by default, output will be printed to the standard output (which will end up in your `.log` file)
     - one weirdness of `Rscript` is that it does not load the `methods` package by default, which may occasionally surprise you - if your script directly or indirectly uses stuff from `methods` you need to load it explicitly with `library("methods")`
- `R CMD BATCH <filename>`: this is similar but automatically sends output to `<filename>.out`
- [This StackOverflow question](https://stackoverflow.com/questions/21969145/why-or-when-is-rscript-or-littler-better-than-r-cmd-batch) says that `r` > `Rscript` > `R CMD BATCH` (according to the author of `r` ...)

## loading modules

R is often missing from the set of programs that is available by default on HPC systems. Most HPC systems use the `module` command to make different programs, and different versions of those programs, available for your use.

- If you try to run `R` in interactive mode on Graham, the system will pop up a long list of possible modules to load and ask you which one you want. The first choice (currently `r/4.1.2` ) is the default, and generally the best option.
- You can make this happen manually by typing `module load r/4.1.2` at the shell prompt.
- if you try to run R in batch mode without loading the module first, *or* if you try to run R on a worker, you'll get an error. In order to run R on a worker you should add `module load r/4.1.2` to your batch script.
- From time to time you may find that your scripts require other modules ...

## installing packages

- if done in batch mode, need to specify the repository, i.e. something like `options(repos =  c(CRAN = "https://cloud.r-project.org"))` (this is a safe default value)
- in order to install your own packages you need to have created and specified a user-writable directory. If you are working interactively the first time you try to install packages *for a particular version of R*, R will prompt you for whether you want to create such a directory (yes) and where to put it (the default is `~/R/x86_64-pc-linux-gnu-library/<R-version>`). (If you are in batch mode you'll get an error.)
- It's probably easiest to install packages from the head node, either by running `install.packages("<pkg>"` in an interactive R session, or by running an R script that defines and installs a long list of packages, e.g.
    ```r
    pkgs <- c("broom", "mvtnorm", <...>)
    install.packages(pkgs)
    ```
    It's generally OK to run *short* (<10 minutes) interactive jobs like this in interactive mode, on the head node.
- The main reason to do package installation on the head node is that worker nodes don't have network access, so you won't be able to download packages from CRAN. If absolutely necessary you can work around this by downloading tarballs from CRAN  (onto the head node, or onto your own machine and then copying them to SHARCnet), but this can be really annoying because you will also have to download tarballs for all of the dependencies of the package you want, and install them in the right order - when you install directly from CRAN this all gets handled automatically. If you have downloaded a package tarball `mypkg_0.1.5.tar.gz`, use `install.packages("mypkg_0.1.5.tar.gz", repos=NULL)` from within R or `R CMD INSTALL mypkg_0.1.5.tar.gz` from the shell to install it.
- if you really do want to install packages in batch mode, we can probably put together some machinery using
    - https://stackoverflow.com/a/40391302/190277


## running R jobs via job array

- A [job array](https://docs.alliancecan.ca/wiki/Job_arrays) is the preferred method for submitting multiple batch jobs
- The number of jobs and the indices used in the job array are restricted by the slurm configuration variable `MAX_ARRAY_SIZE`. This means even if you use steps in your array indices ex. `--array=0-20000:10000` where the number of jobs is only 3, the job array will not run because the maximum index (20000) is larger than `MAX_ARRAY_SIZE`-1. Run something like
`$scontrol show config | grep -E 'MaxArraySize|MaxJobCount'` to determine SLURM configuration. ([slurm job array support](https://slurm.schedmd.com/job_array.html))
- To use job arrays effectively with R scripts, you need to know how to use `commandArgs()` to read command-line arguments from within an R script. For example, if `batch.R` contains:

```r=
cc <- commandArgs(trailingOnly  = TRUE)
intarg <- as.integer(cc[1])
chararg <- cc[2]
cat(sprintf("int arg = %d, char arg = %s\n", intarg, chararg))
```

then running `Rscript batch.R 1234 hello` will produce
```
int arg = 1234, char arg = hello
```
(note that all command-line arguments are passed as character, must be converted to numeric as necessary). If you want fancier argument processing than base R provides (e.g. default argument values, named rather than positional arguments), see [this Stack Overflow question](https://stackoverflow.com/questions/3433603/parsing-command-line-arguments-in-r-scripts) for some options.

## general performance tips

- vectorize!

## interactive sessions

While Compute Canada is generally meant to be run in batch mode, it is sometimes convenient to do some development/debugging in *short* (<3 hour) interactive sessions.

### console

See https://docs.alliancecan.ca/wiki/Running_jobs#Interactive_jobs

### in RStudio (or Jupyter notebook)

- log into https://jupyterhub.sharcnet.ca with your Compute Canada username/password
- click on the 'softwares' icon (left margin), load the `rstudio-server-...` module
- an RStudio icon will appear -- click it!
- this session does **not** have internet access, but it **does** see all of the files in your user space (including packages that you have installed locally)
- You can run Jupyter notebooks, etc. too (I don't know if there is a way to run a Jupyter notebook with a Python kernel ...)
    - https://docs.computecanada.ca/wiki/Jupyter
    - https://docs.alliancecan.ca/wiki/JupyterHub#RStudio
- it might make sense to 'reserve' your session in advance (so you don't have to wait a few minutes for it to start up), not yet sure how to do that ...

## questions for SHARCnet folks

- can you use multithreading and multiprocessing at once? 
- confirming SLURM argument `--cpus-per-task` in single threaded computing

##  miscellaneous useful (?) links

- https://wiki.math.uwaterloo.ca/fluidswiki/index.php?title=Graham_Tips
- https://helpwiki.sharcnet.ca/wiki/images/3/36/Webinar2016-parallel-hpc-R.pdf
    - this is useful in being written by SHARCnet folks, but is not super-useful: (1) out of date in some ways (i.e. SHARCnet no longer recommends spawning multiple batch submissions via shell `for` loop, they prefer [job arrays](https://docs.alliancecan.ca/wiki/Job_arrays)), (2) much of the document focuses on *general* performance tips for high-performance computing in R (using vectorization, packages for out-of-memory computation, etc..) that are not specific to running on HPC clusters
- https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html
    - SHARCnet support recommends not using this package. "The slurm settings on the national systems are set to include group accounting and memory request information that determines your group fairshare and priority. Bypassing that may result in unexpected results."
- https://docs.alliancecan.ca/wiki/META:_A_package_for_job_farming.
    - this was recommended by SHARCnet support for embarrassingly parallel jobs in conjunction with job arrays

## levels of parallelization


- **threading**: shared memory, lightweight. Base R doesn't do any multithreading. Multithreaded computations can be done in R (1) by making use of a multithreaded BLAS (linear algebra) library, (2) using a package that includes multithreaded computation (typically via the `OpenMP` system) within C++ code (e.g. `glmmTMB`). 
    - It may sometimes be necessary to *suppress* multithreaded computation
        - `OpenMP` is usually controlled by setting the shell environment `export OMP_NUM_THREADS=1`, but there may be controls within an R package as well.
        - BLAS threading specifically can be controlled via `RhpcBLASctl::blas_set_num_threads()` (see [here](https://github.com/lme4/lme4/issues/492))
- **multicore**/**multiprocess**: parallelization at these levels can be implemented within R. There are a bunch of different packages/front ends to handle this, almost all ultimately rest on either the `RMPI` package (see below) or the `parallel` package: `foreach`, `doParallel`, `future`, `furrr`, ...
    - if using these tools (i.e. parallelization within R) you probably want to figure out the number of chunks `N` and then define a virtual cluster with `N` cores within R (e.g. `parallel::makeCluster(N)`) and set `ntasks==N` in your submission script (and let the scheduler pick the number of CPUs etc.)
    - MPI-based: you probably *don't* want to use MPI unless you are doing 'fancy' parallel computations that require inter-process communication during the course of the job (you can still use MPI but it's a waste if you don't need the communication); requires more specialized cluster resources etc.
- via batch scheduler: this is probably the most efficient way to handle distributed/embarrassingly parallel problems
    - see comments elsewhere about META/job arrays
- deciding on chunk size/number of chunks: META documentation suggests that (1) lower bound on chunking is that each chunk should take >20 minutes computation time (otherwise too much scheduler overhead); max < 1000.
    - overall speed/wall-clock time of completion
    - resource grabbiness/queue wait time


- a [useful primer on threading vs multicore](https://medium.com/mineiros/how-to-use-multithreading-and-multiprocessing-a-beginners-guide-to-parallel-and-concurrent-a69b9dd21e9d)
- some [tips/things to avoid when parallelizing](https://towardsdatascience.com/parallelization-caveats-in-r-1-the-basics-multiprocessing-and-multithreading-performance-eb584b7e850e) (e.g. be aware of memory constraints, overparallelization ...)

## parallelization and SLURM

Determining the number of nodes/cores/processes to request using SLURM will depend on which R package is used for parallelization. The `foreach` package supports both multi-core (multiple cores on a single node/computer) and multiprocessing (multiple processes within a single node or across multiple nodes in a cluster) parellelization. This is an example on how to run both using `foreach`, including how to ensure R and SLURM are communicating via the shell script, https://docs.alliancecan.ca/wiki/R#Exploiting_parallelism_in_R

You should not let the R package you are using detect and try to use the number of available cores when using HPC, you should instead always specify the number to use.

When setting SLURM `#SBATCH` arguments, here are some helpful notes:
- A `task` in SLURM is a process - a process uses one CPU core if it is single threaded.
- How tasks are allocated across cores and nodes can be specified using the arguments `--nodes`, `--ntasks`,  and `--ntasks-per-node` (`--cpus-per-task` is specific to multi-threading). Some helpful task allocation examples: 
https://support.ceci-hpc.be/doc/_contents/SubmittingJobs/SlurmFAQ.html#q05-how-do-i-create-a-parallel-environment
- The task allocation you choose will affect job scheduling. Requesting multiple tasks without specifying the number of nodes (if you don't require all tasks to be on the same node) puts fewer constraints on the system. Requesting a full node `--nodes=1 --ntasks-per-node=32` on the Graham cluster has a scheduling advantage but can be seen as abuse if this is not required.
https://docs.alliancecan.ca/wiki/Job_scheduling_policies#Whole_nodes_versus_cores