
Denoise Pretraining for ML Potentials

Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials
Journal of Chemical Theory and Computation [Paper] [arXiv] [PDF]
Yuyang Wang, Changwen Xu, Zijie Li, Amir Barati Farimani
Carnegie Mellon University

This is the official implementation of "Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials". In this work, we propose denoise pretraining on non-equilibrium molecular conformations to achieve more accurate and transferable potential predictions with invariant and equivariant graph neural networks (GNNs). If you find our work useful in your research, please cite:

@article{wang2023denoise,
  title={Denoise Pre-training on Non-equilibrium Molecules for Accurate and Transferable Neural Potentials},
  author={Wang, Yuyang and Xu, Changwen and Li, Zijie and Barati Farimani, Amir},
  journal={Journal of Chemical Theory and Computation},
  doi={10.1021/acs.jctc.3c00289},
  year={2023}
}
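At a high level, the pre-training objective perturbs each molecular conformation with Gaussian noise and trains the GNN to recover that noise. The snippet below is only a minimal sketch of this idea, not the repo's actual training loop (see pretrain.py); gnn, denoise_step, and the noise scale are illustrative placeholders.

# Minimal sketch of the denoising objective described above (illustrative only;
# the actual training loop lives in pretrain.py and may differ in detail).
# `gnn` stands in for any of the invariant/equivariant GNNs used in this repo.
import torch
import torch.nn.functional as F

def denoise_step(gnn, atomic_numbers, coords, noise_std=0.1):
    # Perturb the input coordinates with Gaussian noise
    noise = torch.randn_like(coords) * noise_std
    noisy_coords = coords + noise
    # The GNN predicts the per-atom noise vectors from the noisy conformation
    pred_noise = gnn(atomic_numbers, noisy_coords)
    # Pre-training loss: mean squared error between predicted and true noise
    return F.mse_loss(pred_noise, noise)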

Getting Started

  1. Installation
  2. Dataset
  3. Pre-training
  4. Fine-tuning
  5. Pre-trained models

Installation

Set up a conda environment and clone the GitHub repository:

# create a new environment
$ conda create --name ml_potential python=3.8
$ conda activate ml_potential

# install requirements
$ conda install pytorch==1.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
$ conda install pyg -c pyg
$ conda install -c dglteam/label/cu116 dgl
$ conda install -c conda-forge tensorboard openmm
$ pip install PyYAML rdkit ase
$ pip install git+https://github.com/AMLab-Amsterdam/lie_learn

# clone the source code
$ git clone https://github.com/yuyangw/Denoise-Pretrain-ML-Potential.git
$ cd Denoise-Pretrain-ML-Potential
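After installation, an optional sanity check confirms that the core dependencies import and that CUDA is visible; the snippet only assumes the packages installed by the commands above.

# Optional sanity check that the core dependencies import and see the GPU
import torch
import torch_geometric
import dgl

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("PyG:", torch_geometric.__version__)
print("DGL:", dgl.__version__)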

Dataset

The datasets used in this work are summarized in the table below, including the download link, the number of molecules, conformations, and elements, the number of atoms per molecule, the molecule types, and whether each dataset is used for pre-training (PT) and/or fine-tuning (FT). GNNs are pre-trained on the combination of ANI-1 and ANI-1x, and fine-tuned on each dataset separately.

| Dataset | Link   | # Mol. | # Conf.    | # Ele. | # Atoms | Molecule types                                                 | Usage   |
|---------|--------|--------|------------|--------|---------|----------------------------------------------------------------|---------|
| ANI-1   | [link] | 57,462 | 24,687,809 | 4      | 2~26    | Small molecules                                                | PT & FT |
| ANI-1x  | [link] | 63,865 | 5,496,771  | 4      | 2~63    | Small molecules                                                | PT & FT |
| ISO17   | [link] | 129    | 645,000    | 3      | 19      | Isomers of C7O2H10                                             | FT      |
| MD22    | [link] | 7      | 223,422    | 4      | 42~370  | Proteins, lipids, carbohydrates, nucleic acids, supramolecules | FT      |
| SPICE   | [link] | 19,238 | 1,132,808  | 15     | 3~50    | Small molecules, dimers, dipeptides, solvated amino acids      | FT      |
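For readers unfamiliar with how such data is typically represented, the sketch below shows one common way to wrap a single conformation (atomic numbers, coordinates, energy) in a torch_geometric Data object. The field names and values are placeholders, not the repo's actual dataset format.

# Illustrative only: one way to represent a single conformation
# (atomic numbers, Cartesian coordinates, total energy) as a PyG Data object
import torch
from torch_geometric.data import Data

z = torch.tensor([6, 1, 1, 1, 1])        # e.g. methane: one C and four H atoms
pos = torch.randn(5, 3)                  # placeholder 3D coordinates
energy = torch.tensor([-40.5])           # placeholder total energy label

conformation = Data(z=z, pos=pos, y=energy)
print(conformation)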

Pre-training

To pre-train the invariant or equivariant GNNs, run the command below. The configurations and a detailed explanation of each variable can be found in config_pretrain.yaml.

$ python pretrain.py

To monitor training via TensorBoard, run tensorboard --logdir {PATH} and open http://127.0.0.1:6006/ in a browser.
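Since all pre-training hyperparameters live in config_pretrain.yaml, one convenient way to double-check them before launching is to load the file with PyYAML (installed above); the snippet assumes it is run from the repository root.

# Optional: print the pre-training configuration before launching
import yaml

with open("config_pretrain.yaml") as f:
    config = yaml.safe_load(f)

for key, value in config.items():
    print(f"{key}: {value}")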

Fine-tuning

To fine-tune the pre-trained GNN models on molecular potential prediction, run the command below. The configurations and a detailed explanation of each variable can be found in config.yaml.

$ python train.py

Pre-trained models

We also provide a pre-trained checkpoint model.pth and the corresponding configuration config_pretrain.yaml for each model in the ckpt folder. A sketch of loading one of these checkpoints for fine-tuning follows the list below. Pre-trained models include:

  • Pre-trained SchNet in ckpt/schnet folder
  • Pre-trained SE(3)-Transformer in ckpt/se3transformer folder
  • Pre-trained EGNN in ckpt/egnn folder
  • Pre-trained TorchMD-Net in ckpt/torchmdnet folder
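The checkpoint format is defined by the training scripts, but a typical way to restore one of these checkpoints looks like the hedged sketch below. It assumes model.pth stores a state_dict, possibly nested under a "state_dict" key; inspect the file and train.py for the authoritative format.

# Hedged sketch of restoring a provided checkpoint for fine-tuning
import torch

checkpoint = torch.load("ckpt/schnet/model.pth", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint

# `model` would be the matching architecture (e.g. SchNet) built with the
# hyperparameters from ckpt/schnet/config_pretrain.yaml:
# model.load_state_dict(state_dict, strict=False)
print(type(checkpoint))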

Acknowledgement

The implementation of GNNs in this work is based on:
