
Hydra Torchrun Launcher

This plugin makes launching PyTorch distributed training configurable in Hydra.

The configuration is as follows:

hydra:
  launcher:
    _target_: hydra_plugins.hydra_torchrun_launcher.distributed_launcher.TorchDistributedLauncher
    nnodes: '1:1'
    nproc_per_node: '1'
    rdzv_backend: static
    rdzv_endpoint: ''
    rdzv_id: none
    rdzv_conf: ''
    standalone: false
    max_restarts: 0
    monitor_interval: 5
    start_method: spawn # start_method=spawn is supported and is required when using CUDA
    role: default
    module: false
    no_python: false
    run_path: false
    log_dir: null
    redirects: '0'
    tee: '0'
    node_rank: 0
    master_addr: '127.0.0.1'
    master_port: 29500
    local_addr: null
    training_script: ''
    training_script_args: [ ]

Each parameter corresponds exactly to the torchrun argument of the same name; refer to the torchrun documentation for a detailed description of each one.
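
Any of these parameters can be overridden with the standard Hydra override syntax. For example (the values here are purely illustrative; run_net.py is the example script used in the Usage section below):

python3 run_net.py --multirun hydra/launcher=torchrun hydra.launcher.nproc_per_node=4 hydra.launcher.master_port=29501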

Installation

pip3 install git+https://github.com/acherstyx/hydra-torchrun-launcher.git

Usage

python3 run_net.py --multirun hydra/launcher=torchrun hydra.launcher.nproc_per_node=8

This example is equivalent to launching the script directly with torchrun:

torchrun --nproc_per_node=8 run_net.py
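
For reference, run_net.py can be any ordinary Hydra application. A minimal sketch (the config path, config name, and the body of main() are illustrative assumptions) might look like:

import os

import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # When launched through torchrun, each worker process receives
    # LOCAL_RANK / RANK / WORLD_SIZE in its environment.
    rank = int(os.environ.get("RANK", 0))
    print(f"[rank {rank}] config:\n{OmegaConf.to_yaml(cfg)}")


if __name__ == "__main__":
    main()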

Acknowledgement

This plugin is modified from the hydra-torchrun-launcher plugin in hydra/contrib. The main differences are:

  • Following loky, the pickling error described in facebookresearch/hydra#2038 is fixed by using cloudpickle; as a result, the launcher now supports start_method=spawn, which is required by CUDA (see pytorch/pytorch#40403).
  • The config is adjusted to match the arguments of torchrun.
  • Fixed hydra.runtime.output_dir being missing after spawn.
  • Fixed the return value of multi-node training.