dptech-corp/deepmd-pytorch

This repository was written by Hang'rui Bi, based on Shaochen Shi's implementation of DeePMD-kit using PyTorch. It aims to offer accuracy and performance comparable to the TensorFlow implementation.

Quick Start

Install

This package requires PyTorch 2.

# PyTorch 2 recommends Python >= 3.8.
conda create -n deepmd-pt python=3.10
conda activate deepmd-pt
# Following instructions on pytorch.org
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/dptech-corp/deepmd-pytorch.git
pip install ./deepmd-pytorch

# ... or
pip install git+https://github.com/dptech-corp/deepmd-pytorch.git

Run

conda activate deepmd-pt
python3 dp train tests/water/se_e2_a.json

Profiling

# you may change the number of training steps before profiling
PYTHONPATH=/root/deepmd_on_pytorch python3 -m cProfile -o profile deepmd_pt/main.py train tests/water/se_e2_a.json 2>&1
python3 -m pstats profile
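
The saved profile can also be inspected programmatically with the standard-library pstats module; a minimal sketch (the file name "profile" matches the -o argument above):

import pstats

stats = pstats.Stats("profile")                  # load the output of cProfile -o profile
stats.sort_stats("cumulative").print_stats(20)   # show the 20 most expensive call paths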

References

Structure

# deepmd_pt
entrypoints
    main.py
train
    train.py
infer
    inference.py
model
    model.py
    descriptor
        descriptor.py
        embedding_net.py
    task
        fitting.py
loss
    loss.py
optimizer
    LKF.py
    KFWrapper.py
utils
    dataset.py
    env.py
    learning_rate.py
    my_random.py
    stat.py

Deploy

Tested with the libtorch pre-CXX11 ABI build for cu116, CUDA 11.6, and torch 1.13.

python test.py
export CMAKE_PREFIX_PATH=`python -c "import torch;print(torch.__path__[0])"`/share/cmake:$CMAKE_PREFIX_PATH
cmake -B build
cd build
cmake --build .
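
libtorch can only load a serialized TorchScript module, so the model has to be exported on the Python side first. A minimal sketch of such an export, with a toy module standing in for a trained model from this repository:

import torch

class ToyNet(torch.nn.Module):
    # stand-in for the trained model; only illustrates the export mechanics
    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return (coords * coords).sum()

scripted = torch.jit.script(ToyNet())  # serialize the module to TorchScript
scripted.save("model.pt")              # on the C++ side: torch::jit::load("model.pt")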

Test

First modify TEST_CONFIG in env.py to point to the input config you want to test. For example, tests/water/se_e2.json is the config for a tiny water problem. The water dataset is included in the repository.
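
For instance, in deepmd_pt/utils/env.py:

# point the tests at the desired input config
TEST_CONFIG = "tests/water/se_e2.json"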

The tests are aligned with DeePMD-kit 2.1.5 and may fail with DeePMD-kit 2.2 or later.

Distributed Data Parallelism

Currently, we support input files in the traditional dp format. We construct a PyTorch Dataset for each system and fetch batched data with a dedicated DataLoader. This guarantees that the input data for one rank in one mini-batch comes from the same system, i.e. has the same number of atoms, which is required by the model. Using DistributedSampler, each frame is used for training exactly once per epoch, no matter how many ranks there are.
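
A minimal sketch of the per-system setup (SystemDataset is an illustrative stand-in for the repository's dataset class; DistributedSampler requires an initialized process group):

from torch.utils.data import Dataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

class SystemDataset(Dataset):
    # toy stand-in: all frames belong to one system, hence share the same number of atoms
    def __init__(self, frames):
        self.frames = frames
    def __len__(self):
        return len(self.frames)
    def __getitem__(self, i):
        return self.frames[i]

def create_dataloader(system, batch_size=1):
    dataset = SystemDataset(system)                       # frames of a single system
    sampler = DistributedSampler(dataset, shuffle=True)   # shards frames across ranks; each frame used once per epoch
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)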

Systems vary in size, so the number of mini-batches obtainable from each DataLoader differs. An index table is therefore built on each rank: for every DataLoader, its position is appended to the table once per mini-batch, i.e. len(dl) times. In pseudocode:

self.index: List[int] = []
self.dataloaders: List[DataLoader] = []
for system in systems:
    dl = create_dataloader(system)
    self.dataloaders.append(dl)
    for _ in range(len(dl)):  # len(dl) == number of mini-batches in this system
        self.index.append(len(self.dataloaders) - 1)  # record which DataLoader this entry belongs to

We initialize a meta-dataset named dploaderset with this index. Each training step draws an index at random using RandomSampler and fetches data from the corresponding DataLoader. Hence, within one epoch, each DataLoader is accessed exactly as many times as its length, so every input frame is visited without omission.
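
A minimal sketch of this idea (the class below is illustrative, not the actual implementation in dataloader.py, which additionally feeds batches through a buffered queue, as in the flowchart below):

from torch.utils.data import Dataset, RandomSampler

class DPLoaderSet(Dataset):
    # meta-dataset: item i is the next mini-batch from the DataLoader named by index[i]
    def __init__(self, dataloaders, index):
        self.dataloaders = dataloaders
        self.index = index                        # one entry per mini-batch of the whole epoch
        self.iters = [iter(dl) for dl in dataloaders]
    def __len__(self):
        return len(self.index)
    def __getitem__(self, i):
        return next(self.iters[self.index[i]])    # advance the chosen system's DataLoader by one batch

# one epoch: RandomSampler visits each entry once, so every DataLoader is drained exactly len(dl) times
meta = DPLoaderSet(dataloaders, index)            # the lists built in the pseudocode above
for i in RandomSampler(meta):
    batch = meta[i]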

flowchart LR

    subgraph systems
        subgraph system1
            direction LR
            frame1[frame 1]
            frame2[frame 2]
        end

        subgraph system2
            direction LR
            frame3[frame 3]
            frame4[frame 4]
            frame5[frame 5]
        end
    end

    subgraph dataset
        dataset1[dataset 1]
        dataset2[dataset 2]
    end
    system1 -- frames --> dataset1
    system2 --> dataset2

    subgraph distributed sampler
        ds1[distributed sampler 1]
        ds2[distributed sampler 2]
    end
    dataset1 --> ds1
    dataset2 --> ds2

    subgraph dataloader
        dataloader1[dataloader 1]
        dataloader2[dataloader 2]
    end
    ds1 -- mini batch --> dataloader1
    ds2 --> dataloader2

    subgraph index[index on Rank 0]
        dl11[dataloader 1, entry 1]
        dl21[dataloader 2, entry 1]
        dl22[dataloader 2, entry 2]
    end
    dataloader1 --> dl11
    dataloader2 --> dl21
    dataloader2 --> dl22

    index -- for each step, choose 1 system --> RandomSampler
    RandomSampler --> dploaderset --> bufferedq[buffered queue] --> model

For more details, please see deepmd-pytorch/deepmd_pt/utils/dataloader.py.

Run on a local cluster

We use torchrun to launch a DDP training session.

To start training with multiple GPUs in one node, set the parameter nproc_per_node to the number of GPUs:

torchrun --nproc_per_node=4 --no-python dp_pt train input.json
# Not setting `nproc_per_node` uses only 1 GPU
torchrun --no-python dp_pt train input.json

If you wish to run code under active development without installing it via pip, try:

PYTHONPATH=~/deepmd-pytorch torchrun ~/deepmd-pytorch/deepmd_pt/entrypoints/main.py train input.json

To train a model on a cluster, one can manually launch the task using the commands below (usually this should be done by your job management system). Set nnodes to the number of available nodes, node_rank to the rank of the current node among all nodes (not the rank of processes!), and nproc_per_node to the number of available GPUs on one node. Please make sure that every node can reach the rendezvous address and port (rdzv_endpoint in the command) and has the same number of GPUs.

# Running DDP on 2 nodes with 4 GPUs each
# On node 0:
torchrun --rdzv_endpoint=node0:12321 --nnodes=2 --nproc_per_node=4 --node_rank=0 --no_python dp train tests/water/se_e2_a.json
# On node 1:
torchrun --rdzv_endpoint=node0:12321 --nnodes=2 --nproc_per_node=4 --node_rank=1 --no_python dp train tests/water/se_e2_a.json

Note: set environment variables in advance to tune CPU-specific optimizations.

Note for developers: torchrun by default passes its settings to the worker processes as environment variables (see the torchrun documentation for the full list).

To check forward, backward, and communication time, set the environment variables TORCH_CPP_LOG_LEVEL=INFO and TORCH_DISTRIBUTED_DEBUG=DETAIL. More details can be found in the PyTorch distributed debugging documentation.

Run on a Slurm system

Use the .sbatch file in slurm/; you may need to modify some settings to run it on your system:

sbatch distributed_data_parallel_slurm_setup.sbatch

These files are modified from: https://github.com/lkskstlr/distributed_data_parallel_slurm_setup

Track runs using W&B

wandb is automatically installed as a requirement for deepmd-pytorch.

First set up W&B with wandb login, then set the corresponding fields under the "training" section of your input file (typically input.json) as follows:

// "training": {
    "wandb_config": {
        "job_name": "Cu-dpa_adam_bz1_at2",
        "wandb_enabled": true,
        "entity": "dp_model_engineering", // a username or team name
        "project": "DPA-2"
    },

To disable logging temporarily, set env var WANDB_MODE=disabled.

Known Problems & TODO