Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

This repository contains the implementation code for the paper Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction (ICML 2023).

In this work, we propose a data-efficient property predictor (Geo-DEG) by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks.

[Figure: overview]
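To give a rough intuition for "diffusion over a geometry", the sketch below runs plain linear heat diffusion on a toy three-node path graph. It is only an illustration: the learned graph neural diffusion in Geo-DEG is more involved, the toy graph merely stands in for the grammar-induced geometry, and the function name `diffuse` and its parameters are hypothetical.

```python
from typing import List, Tuple


def diffuse(values: List[float], edges: List[Tuple[int, int]],
            dt: float = 0.1, steps: int = 200) -> List[float]:
    """Explicit-Euler heat diffusion dx/dt = -L x on an undirected graph.

    L is the unnormalized graph Laplacian implied by `edges`.
    Illustrative only; this is not the Geo-DEG model.
    """
    x = list(values)
    for _ in range(steps):
        delta = [0.0] * len(x)
        for i, j in edges:
            flow = x[j] - x[i]   # signal moves from the higher to the lower node
            delta[i] += flow
            delta[j] -= flow
        x = [xi + dt * di for xi, di in zip(x, delta)]
    return x


# Toy path graph 0 - 1 - 2; only node 0 starts with a nonzero signal.
edges = [(0, 1), (1, 2)]
smoothed = diffuse([1.0, 0.0, 0.0], edges)
print(smoothed)
```

The signal spreads along the edges until every node holds roughly the same value, so nodes that are close in the graph end up with similar values early on. That is the sense in which a geometry over molecules can act as a prior on structural similarity.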

Installation

Prerequisites

  • Pretrained GNN: We use this codebase for the pretrained GNN in our paper. The necessary code and pretrained models are already included in this repository.

  • Datasets: Download the datasets and meta-geometry from this link.

Install Prerequisites

Install the dependencies for Geo-DEG with the following commands:

conda create -n Geo_DEG python=3.6
conda activate Geo_DEG
conda install scipy==1.2.1 pandas==0.23.4 numpy==1.16.2 scikit-learn
conda install pytorch torchvision torchaudio cpuonly -c pytorch
conda install -c rdkit rdkit
pip install ogb pykeops
pip install torchdiffeq -f https://pytorch-geometric.com/whl/torch-1.10.1+cpu.html
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.10.0+cpu.html
pip install torch-geometric
pip install torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-1.10.0+cpu.html
pip install setproctitle
pip install graphviz
pip install matplotlib
pip install typed-argument-parser
pip install tensorboardX
pip install hyperopt
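After installation, a quick check that every package can be found in the active environment may save a failed training run. The script below is a minimal sketch, not part of the repository: the helper `missing_modules` and the list `GEO_DEG_MODULES` are hypothetical names, and the list maps the packages installed above to their import names, which sometimes differ from the package name.

```python
import importlib.util
from typing import List

# Import names for the packages installed above. Some differ from the
# pip/conda package name (scikit-learn -> sklearn, typed-argument-parser -> tap).
GEO_DEG_MODULES = [
    "scipy", "pandas", "numpy", "sklearn", "torch", "torchvision",
    "torchaudio", "rdkit", "ogb", "pykeops", "torchdiffeq",
    "torch_scatter", "torch_sparse", "torch_geometric", "torch_cluster",
    "torch_spline_conv", "setproctitle", "graphviz", "matplotlib",
    "tap", "tensorboardX", "hyperopt",
]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(GEO_DEG_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All Geo-DEG dependencies were found.")
```

This only checks that each module can be located. It does not import the modules or verify that the installed versions match the wheels referenced in the commands above.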

Train

Training with MPN

cd GrammarDAG
python main.py --dataset crow_smiles_and_Tg_celsius.txt --feat_arch MPN --motif motif --adam

Training with GNN

cd GrammarDAG
python main.py --dataset crow_smiles_and_Tg_celsius.txt --feat_arch GNN --motif motif
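The two commands above differ only in the `--feat_arch` value and the `--adam` flag. For scripted runs, they can be assembled programmatically, as in the sketch below. The helpers `build_train_command` and `run_training` are hypothetical and not part of the repository. They use only the flags shown above, and they assume, following the commands as written, that `--adam` is passed for MPN only.

```python
import subprocess
from typing import List


def build_train_command(feat_arch: str,
                        dataset: str = "crow_smiles_and_Tg_celsius.txt",
                        motif: str = "motif") -> List[str]:
    """Assemble the training command shown in this README for one feature architecture."""
    if feat_arch not in ("MPN", "GNN"):
        raise ValueError("only the MPN and GNN settings appear in this README")
    cmd = ["python", "main.py", "--dataset", dataset,
           "--feat_arch", feat_arch, "--motif", motif]
    if feat_arch == "MPN":
        cmd.append("--adam")  # the README passes --adam only in the MPN command
    return cmd


def run_training(feat_arch: str, workdir: str = "GrammarDAG") -> subprocess.CompletedProcess:
    """Run training from the GrammarDAG directory; raises if the process fails."""
    return subprocess.run(build_train_command(feat_arch), cwd=workdir, check=True)


# Print both commands without launching a training run.
for arch in ("MPN", "GNN"):
    print(" ".join(build_train_command(arch)))
```

Calling `run_training("MPN")` from the repository root would then launch the first command. Whether `--adam` is also valid with `--feat_arch GNN` is not stated here, so the sketch leaves it out for that setting.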

Acknowledgements

The implementation of Geo-DEG is heavily based on Data-Efficient Graph Grammar Learning for Molecular Generation and graph-neural-pde, and partly based on Molecular Optimization Using Molecular Hypergraph Grammar and Hierarchical Generation of Molecular Graphs using Structural Motifs.

Citation

If you find the idea or code useful for your research, please cite our paper:

@article{guo2023hierarchical,
  title={Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction},
  author={Guo, Minghao and Thost, Veronika and Song, Samuel W and Balachandran, Adithya and Das, Payel and Chen, Jie and Matusik, Wojciech},
  year={2023}
}

Contact

Please contact guomh2014@gmail.com if you have any questions. Enjoy!