Dir-GNN: Graph Neural Networks for Directed Graphs

Dir-GNN is a machine learning model that enables learning on directed graphs. This repository contains the official implementation of the paper "Edge Directionality Improves Learning on Heterophilic Graphs", where we introduce Dir-GNN and show that leveraging edge directionality leads to improved learning on heterophilic graphs.


Overview

Graph Neural Networks (GNNs) have become the de facto standard tool for modeling relational data. However, while many real-world graphs are directed, the majority of today's GNN models discard this information altogether by simply making the graph undirected. The reasons for this are historical: 1) many early variants of spectral GNNs explicitly required undirected graphs, and 2) the first benchmarks on homophilic graphs did not show a significant gain from using direction.

In our paper, we show that in heterophilic settings, treating the graph as directed increases the effective homophily of the graph, suggesting a potential gain from the correct use of directionality information. To this end, we introduce Directed Graph Neural Network (Dir-GNN), a novel general framework for deep learning on directed graphs. Dir-GNN can be used to extend any Message Passing Neural Network (MPNN) to account for edge directionality information by performing separate aggregations of the incoming and outgoing edges.
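The idea of aggregating incoming and outgoing edges separately can be illustrated with a minimal pure-Python sketch. It is not the code in model.py: it omits the learnable weights and non-linearity, uses mean aggregation, and the function name directed_mean_aggregate is hypothetical. Which direction alpha weights is also an assumption made here for illustration.

```python
def directed_mean_aggregate(num_nodes, edges, features, alpha=0.5):
    """Combine the mean over in-neighbours and the mean over out-neighbours.

    edges is a list of (src, dst) pairs; features is a list of feature lists.
    Assumption: alpha weights the in-neighbour mean, (1 - alpha) the out-neighbour mean.
    """
    in_nbrs = [[] for _ in range(num_nodes)]
    out_nbrs = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        out_nbrs[src].append(dst)  # src points to dst
        in_nbrs[dst].append(src)   # dst is pointed to by src

    dim = len(features[0])

    def mean_of(nodes):
        # Nodes without neighbours in this direction contribute a zero vector.
        if not nodes:
            return [0.0] * dim
        return [sum(features[n][k] for n in nodes) / len(nodes) for k in range(dim)]

    out = []
    for v in range(num_nodes):
        m_in = mean_of(in_nbrs[v])
        m_out = mean_of(out_nbrs[v])
        out.append([alpha * a + (1 - alpha) * b for a, b in zip(m_in, m_out)])
    return out


# Directed path 0 -> 1 -> 2 with one scalar feature per node.
edges = [(0, 1), (1, 2)]
features = [[1.0], [2.0], [4.0]]
h = directed_mean_aggregate(3, edges, features, alpha=0.5)
print(h)
```

With alpha set to 0 or 1 only one direction is used, while 0.5 weights both directions equally.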

We prove that Dir-GNN matches the expressivity of the Directed Weisfeiler-Lehman test, exceeding that of conventional MPNNs. In extensive experiments, we validate that while our framework leaves performance unchanged on homophilic datasets, it leads to large gains over base models such as GCN, GAT and GraphSage on heterophilic benchmarks, outperforming much more complex methods and achieving new state-of-the-art results.
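To give some intuition for why direction adds discriminative power, the following sketch runs a simplified colour refinement on a directed path, once keeping in- and out-neighbours apart and once merging them. It is written in the spirit of the Directed Weisfeiler-Lehman test, and the formal definition in the paper may differ in details; the function name wl_colors is hypothetical.

```python
def wl_colors(num_nodes, edges, rounds=2, directed=True):
    """Simplified colour refinement on a graph given as (src, dst) pairs."""
    in_nbrs = [[] for _ in range(num_nodes)]
    out_nbrs = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)

    colors = [0] * num_nodes  # every node starts with the same colour
    for _ in range(rounds):
        signatures = []
        for v in range(num_nodes):
            ins = sorted(colors[u] for u in in_nbrs[v])
            outs = sorted(colors[u] for u in out_nbrs[v])
            if directed:
                # Keep the two multisets of neighbour colours separate.
                sig = (colors[v], tuple(ins), tuple(outs))
            else:
                # Merge them, as happens when the graph is made undirected.
                sig = (colors[v], tuple(sorted(ins + outs)))
            signatures.append(sig)
        # Relabel distinct signatures with small integers.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        colors = [palette[sig] for sig in signatures]
    return colors


path = [(0, 1), (1, 2)]  # directed path 0 -> 1 -> 2
directed_colors = wl_colors(3, path, directed=True)
undirected_colors = wl_colors(3, path, directed=False)
print(directed_colors, undirected_colors)
```

On this path the directed variant separates the source node from the sink node, whereas the undirected variant assigns both endpoints the same colour.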

Getting Started

To get up and running with the project, you need to first set up your environment and install the necessary dependencies. This guide will walk you through the process step by step.

Setting Up the Environment

The project is designed to run on Python 3.10. We recommend using Conda to set up the environment as follows:

conda create -n directed_gnn python=3.10
conda activate directed_gnn

Installing Dependencies

Once the environment is activated, install the required packages:

conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install pyg pytorch-sparse -c pyg
pip install ogb==1.3.6
pip install pytorch_lightning==2.0.2
pip install gdown==4.7.1

Please ensure that the version of pytorch-cuda matches your CUDA version. If your system does not have a GPU, use the following command to install PyTorch:

conda install pytorch==2.0.1 -c pytorch

For M1/M2/M3 Mac users, pyg (PyTorch Geometric) needs to be installed from source. Detailed instructions for this process can be found here.
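After installation, a quick check of the pinned versions can catch mismatches early. The sketch below uses only the standard library; the distribution names (for example torch for the conda package pytorch) are assumptions, and check_versions is a hypothetical helper that is not part of the repository.

```python
from importlib import metadata

# Versions pinned in the installation commands above.
# Assumption: these are the distribution names under which the packages register.
PINNED = {
    "torch": "2.0.1",
    "ogb": "1.3.6",
    "pytorch-lightning": "2.0.2",
    "gdown": "4.7.1",
}


def check_versions(pinned=PINNED):
    """Return {name: (expected, installed_or_None, matches)} for each package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu117" when comparing.
        matches = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, matches)
    return report


for name, (expected, installed, matches) in check_versions().items():
    status = "ok" if matches else "check"
    print(f"{name}: expected {expected}, found {installed} [{status}]")
```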

Code Structure

run.py: This script is used to run the models.
model.py: Contains the definition of the directed convolutions (DirGCN, DirSage, and DirGAT) explored in the paper.
homophily.py: Houses functions for computing the weighted node homophily and the weighted compatibility matrix, as defined in our paper.
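For readers unfamiliar with compatibility matrices, here is a minimal sketch of the standard, unweighted variant on a directed edge list. It is not the weighted definition from the paper and does not reproduce homophily.py; the function name and the toy data are hypothetical.

```python
def compatibility_matrix(edges, labels, num_classes):
    """Entry [a][b] is the fraction of edges leaving class-a nodes that end in class b."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for src, dst in edges:
        counts[labels[src]][labels[dst]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Classes without outgoing edges get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy directed graph with four nodes and two classes.
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
labels = [0, 1, 1, 0]
H = compatibility_matrix(edges, labels, num_classes=2)
print(H)
```

Reversing every edge pair before calling the function gives the matrix for the opposite direction, which in general differs on directed graphs.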

Running Experiments

This section provides instructions on how to reproduce the experiments outlined in the paper. Note that some of the results may not be reproduced exactly, given that some of the operations used are intrinsically non-deterministic on the GPU, as explained here. However, you should obtain results very close to those in the paper.

Dir-GNN Experiments

To reproduce the best Dir-GNN results on heterophilic datasets (Table 3 in our paper), use the following command:

python -m src.run --dataset chameleon --use_best_hyperparams --num_runs 10

The --dataset parameter specifies the dataset to be used. Replace chameleon with the name of the dataset you want to use.

Ablation Study on Using Directionality

Tables 2 and 4 contain the results of an ablation study where we compare common undirected GNN models (GCN, GraphSage, GAT) with their Dir-GNN counterparts.

To reproduce the results for the undirected models, run the following command:

python -m src.run --dataset chameleon --conv_type gcn --num_runs 10 --patience 200 --normalize --undirected

Here, --conv_type specifies the convolution type and can be set to gcn, sage or gat.

To instead reproduce the Dir-GNN results for different values of $\alpha$, use the following commands:

python -m src.run --dataset chameleon --conv_type dir-gcn --num_runs 10 --patience 200 --normalize --alpha 1
python -m src.run --dataset chameleon --conv_type dir-gcn --num_runs 10 --patience 200 --normalize --alpha 0
python -m src.run --dataset chameleon --conv_type dir-gcn --num_runs 10 --patience 200 --normalize --alpha 0.5

In this case, --conv_type can be set to dir-gcn, dir-sage or dir-gat.
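To sweep over all convolution types and alpha values without typing each command, the command lines can be generated programmatically. The sketch below only builds and prints the commands using the flags shown above; ablation_command is a hypothetical helper, not part of the repository.

```python
import shlex


def ablation_command(dataset, conv_type, alpha=None, undirected=False,
                     num_runs=10, patience=200):
    """Build the argument list for one ablation run, mirroring the commands above."""
    cmd = [
        "python", "-m", "src.run",
        "--dataset", dataset,
        "--conv_type", conv_type,
        "--num_runs", str(num_runs),
        "--patience", str(patience),
        "--normalize",
    ]
    if undirected:
        cmd.append("--undirected")
    if alpha is not None:
        cmd += ["--alpha", str(alpha)]
    return cmd


# Print every directed configuration for one dataset.
for conv in ("dir-gcn", "dir-sage", "dir-gat"):
    for alpha in (0, 0.5, 1):
        print(shlex.join(ablation_command("chameleon", conv, alpha=alpha)))
```

Each returned list could be passed to subprocess.run(cmd, check=True) from the repository root to launch the run.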

Synthetic Experiments

You can run the synthetic experiment (Figure 2b) using the following command:

./run_synthetic_experiment.sh sage

You can also replace sage with gcn or gat to specify the base model used in the synthetic experiment and obtain the results in Figure 8.

Dataset Fix

For Citeseer-Full and Cora-ML datasets, PyG loads them as undirected by default. To utilize these datasets in their directed form, a slight modification is required in the PyG local installation. Please comment out the line edge_index = to_undirected(edge_index, num_nodes=x.size(0)) in the file located at:

/miniconda3/envs/your_env/lib/python3.10/site-packages/torch_geometric/io/npz.py
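Because the installation prefix differs between machines, it may be easier to ask the active environment where npz.py lives and on which lines to_undirected appears. The sketch below uses only the standard library; locate_module_file and find_lines are hypothetical helper names, and the lookup returns None when PyG is not installed.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source file of a module, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return None
    return None if spec is None else spec.origin


def find_lines(path, needle):
    """Return (line_number, text) pairs for lines of the file that contain needle."""
    with open(path, encoding="utf-8") as fh:
        return [(i, line.rstrip()) for i, line in enumerate(fh, start=1) if needle in line]


npz_path = locate_module_file("torch_geometric.io.npz")
print(npz_path)
if npz_path is not None:
    for number, text in find_lines(npz_path, "to_undirected("):
        print(number, text)
```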

Command Line Arguments

The following command line arguments can be used with the code:

Dataset Arguments

Argument	Type	Default Value	Description
--dataset	str	"chameleon"	Name of the dataset
--dataset_directory	str	"dataset"	Directory to save datasets
--checkpoint_directory	str	"checkpoint"	Directory to save checkpoints

Preprocessing Arguments

Argument	Action	Description
--undirected	store_true	Use undirected version of graph
--self_loops	store_true	Add self-loops to the graph
--transpose	store_true	Use transpose of the graph

Model Arguments

Argument	Type	Default Value	Description
--model	str	"gnn"	Model type
--hidden_dim	int	64	Hidden dimension of model
--num_layers	int	3	Number of GNN layers
--dropout	float	0.0	Feature dropout
--alpha	float	0.5	Weight of the convex combination between the two edge directions
--learn_alpha	action	-	If set, learn alpha
--conv_type	str	"dir-gcn"	DirGNN Model
--normalize	action	-	If set, normalize
--jk	str	"max"	Either "max", "cat" or None

Training Arguments

Argument	Type	Default Value	Description
--lr	float	0.001	Learning Rate
--weight_decay	float	0.0	Weight decay
--num_epochs	int	10000	Max number of epochs
--patience	int	10	Patience for early stopping
--num_runs	int	1	Max number of runs

System Arguments

Argument	Type	Default Value	Description
--use_best_hyperparams	flag		If specified, use the best hyperparameters
--gpu_idx	int	0	Index of the GPU to run the program on
--num_workers	int	0	Number of workers for the dataloader
--log	str	"INFO"	Log Level. Choices: ["DEBUG", "INFO", "WARNING"]
--profiler	flag		If specified, enable profiler
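To make the difference between typed arguments and flags concrete, here is a sketch of how a subset of the options above could be declared with argparse. It is reconstructed from the tables, not copied from src/run.py, and it assumes that the rows marked action or flag behave like store_true.

```python
import argparse


def build_parser():
    """Sketch of a parser covering a subset of the documented options."""
    p = argparse.ArgumentParser(description="Subset of the documented Dir-GNN options")
    p.add_argument("--dataset", type=str, default="chameleon")
    p.add_argument("--undirected", action="store_true")
    p.add_argument("--conv_type", type=str, default="dir-gcn")
    p.add_argument("--alpha", type=float, default=0.5)
    # Assumption: "action" and "flag" rows are boolean switches.
    p.add_argument("--learn_alpha", action="store_true")
    p.add_argument("--normalize", action="store_true")
    p.add_argument("--patience", type=int, default=10)
    p.add_argument("--num_runs", type=int, default=1)
    p.add_argument("--use_best_hyperparams", action="store_true")
    return p


# Flags default to False and become True only when present on the command line.
args = build_parser().parse_args(["--conv_type", "dir-sage", "--alpha", "1", "--normalize"])
print(args)
```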

Citation

@misc{dirgnn_rossi_2023,
    title={Edge Directionality Improves Learning on Heterophilic Graphs},
    author={Emanuele Rossi and Bertrand Charpentier and Francesco Di Giovanni and Fabrizio Frasca and Stephan Günnemann and Michael Bronstein},
    publisher={arXiv},
    year={2023}
}

Contact

If you have any questions, issues or feedback, feel free to reach out to Emanuele Rossi at emanuele.rossi1909@gmail.com or Bertrand Charpentier at bertrand.henri.charpentier@gmail.com.
