Default number of workers for multiprocessing #62

Open
ieivanov opened this issue Jul 21, 2023 · 6 comments
@ieivanov
Collaborator

@talonchandler and I found that on bruno multiprocessing.cpu_count() returns the total number of CPUs on the given machine, which is usually more than the user has reserved. In that sense, multiprocessing.cpu_count() is not a good default value on bruno (as @edyoshikun has also found). We should check whether bruno exposes an env variable that gives the number of CPUs reserved and use that as the default. If that variable doesn't exist (e.g. when not working on bruno), we can fall back to multiprocessing.cpu_count().

@talonchandler
Contributor

Thanks for summarizing @ieivanov.

@edyoshikun I think you've been using multiprocessing on slurm the most, so I'd appreciate your comments. I just took a quick look with srun --cpus-per-task=13 --pty bash followed by env | grep SLURM, and I found some good candidate environment variables: SLURM_CPUS_PER_TASK=13, SLURM_JOB_CPUS_PER_NODE=13, SLURM_CPUS_ON_NODE=13.
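
For reference, a minimal sketch of the fallback logic @ieivanov described, assuming we settle on SLURM_CPUS_PER_TASK (the helper name here is just illustrative):

import multiprocessing
import os

def default_num_workers() -> int:
    # Prefer the SLURM reservation if present, otherwise fall back to all visible CPUs
    reserved = os.environ.get("SLURM_CPUS_PER_TASK")
    return int(reserved) if reserved else multiprocessing.cpu_count()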

@ieivanov
Collaborator Author

I think this now becomes a question for griznog: what are the differences between these variables?

@edyoshikun
Contributor

Thanks for finding this. When we run the bash script for slurm we use --cpus-per-task as the variable to reserve CPUs, so I think this is probably the best candidate.

# Get the number of CPUs per task from the job record
CPUS_PER_TASK=$(scontrol show job "$SLURM_JOB_ID" | tr ' ' '\n' | awk -F= '/^CPUs\/Task/ {print $2}')

I think griznog or chatgpt would be good candidates to ask.

@edyoshikun
Contributor

Also related, though maybe part of a different conversation: I also found the sacct command, which I think Jordao mentioned at some point, for checking how many resources a job took (i.e. CPU and memory usage).

# View resource usage for a specific completed job (replace JOB_ID with the actual job ID)
sacct -j JOB_ID

@talonchandler
Contributor

Sounds good, I'll trial SLURM_CPUS_PER_TASK and see how it works. (@edyoshikun FYI this variable is directly available from within a SLURM job, so I don't think I'll need to use the scontrol and awk stuff?)


I also recently discovered seff JOB_ID, which directly solves at least one of the headaches we were dealing with: monitoring maximum memory utilization. @ieivanov

Job ID: 10022176
Cluster: cluster
User/Group: eduardo.hirata/eduardo.hirata.grp
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 32
CPU Utilized: 00:09:02
CPU Efficiency: 12.01% of 01:15:12 core-walltime
Job Wall-clock time: 00:02:21
Memory Utilized: 6.17 GB
Memory Efficiency: 38.58% of 16.00 GB

@ieivanov
Collaborator Author

We should unify the --num-processes option across CLI calls using a decorator, similar to input_position_dirpaths for example. This will ensure we have the right number of workers for every CLI call.
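
As a rough sketch of what that decorator could look like, assuming the CLI is built on click (the option and helper names here are illustrative, not the existing API):

import multiprocessing
import os

import click

def _default_num_workers() -> int:
    # Prefer the SLURM reservation; fall back to all visible CPUs
    reserved = os.environ.get("SLURM_CPUS_PER_TASK")
    return int(reserved) if reserved else multiprocessing.cpu_count()

def num_processes_option(func):
    # Reusable decorator so every CLI command exposes the same --num-processes option
    return click.option(
        "--num-processes",
        "-j",
        type=int,
        default=_default_num_workers,  # click accepts a callable default
        help="Number of worker processes.",
    )(func)

@click.command()
@num_processes_option
def example_command(num_processes):
    click.echo(f"Using {num_processes} workers")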
