GDL-DS

This repository contains the official implementation of GDL-DS as described in the paper: GDL-DS: A Benchmark for Geometric Deep Learning under Distribution Shifts by Deyu Zou, Shikun Liu, Siqi Miao, Victor Fung, Shiyu Chang, and Pan Li.

Introduction

We propose GDL-DS, a comprehensive benchmark designed for evaluating the performance of geometric deep learning (GDL) models in scenarios where scientific applications encounter distribution shift challenges. Our evaluation datasets cover diverse scientific domains from particle physics and materials science to biochemistry, and encapsulate a broad spectrum of distribution shifts including conditional, covariate, and concept shifts. Furthermore, we study three levels of information access from the out-of-distribution (OOD) testing data, including no OOD information (No-Info), only OOD features without labels (O-Feature), and OOD features with a few labels (Par-Label).

Datasets

Figure 1 provides illustrations of some distribution shifts mentioned in this paper. Dataset statistics could be found in our paper. All processed datasets are available for manual download from Zenodo (Here is another upload due to the space limit). For the HEP dataset, we highly recommend using the processed files directly because the raw files would consume a significant amount of disk space and require a longer time for processing. Regarding DrugOOD-3D, in this paper, we utilized three cases of distribution shifts, including lbap_core_ic50_assay, lbap_core_ic50_scaffold, lbap_core_ic50_size, and we recommend readers to find more details in https://github.com/tencent-ailab/DrugOOD. As for QMOF, our data is sourced from https://github.com/Andrew-S-Rosen/QMOF.

Figure 2. Illustrations of the four scientific datasets in this work to study interpretable GDL models.

Installation

We have tested our code on Python 3.9 with PyTorch 1.12.1, PyG 2.2.0 and CUDA 11.3. Please follow the following steps to create a virtual environment and install the required packages.

Step 1: Clone the repository

git clone https://github.com/Graph-COM/GDL_DS.git
cd GDL_DS

Step 2: Create a virtual environment

conda create --name GDL_DS python=3.9 -y
conda activate GDL_DS

Step 3: Install dependencies

conda install -y pytorch==1.12.1 torchvision cudatoolkit=11.3 -c pytorch
pip install torch-scatter==2.1.0 torch-sparse==0.6.16 torch-cluster==1.6.0 torch-geometric==2.2.0 -f https://data.pyg.org/whl/torch-1.12.0+cu113.html
pip install -r requirements.txt

Reproducing Results

We train a model by run.sh file:

cd ./scripts
sh run.sh

Specifically, use the following command in this file:

python run.py --dataset [dataset_name] --method [method_name] --shift [shift_name] --target [target] --setting [setting_name] --backbone [backbone_name]

dataset_name can be chosen from Track, DrugOOD-3D, and QMOF, and the dataset specified will be downloaded automatically.

method_name can be chosen from erm, lri_bern, mixup, dir, groupdro, VREx, coral, DANN.

shift_name can be chosen from pileup (corresponding to target of 50 or 90), signal (corresponding to target of tau, zp_10 or zp_20), assay (corresponding to target of lbap_core_ic50_assay), scaffold (corresponding to target of lbap_core_ic50_scaffold), size (corresponding to target of lbap_core_ic50_size), fidelity (corresponding to target of hse06 or hse06_10hf).

setting_name can be chosen from No-Info, O-Feature, and Par-Label.

backbone_model can be chosen from dgcnn, pointtrans and egnn.

The tuned hyperparameters in the egnn backbone for all distribution shifts and cases can be found in ./src/configs.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
dataset		dataset
logging		logging
scripts		scripts
src		src
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dataset

dataset

logging

logging

scripts

scripts

src

src

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

GDL-DS

Introduction

Datasets

Installation

Reproducing Results

Reference

About

Releases

Packages

Languages

License

Graph-COM/GDL_DS

Folders and files

Latest commit

History

Repository files navigation

GDL-DS

Introduction

Datasets

Installation

Reproducing Results

Reference

About

Resources

License

Stars

Watchers

Forks

Languages