- Experiments on the public split of Cora
- Fix a potentially flawed design: for inductive learning, the current framework splits the training graph and the validation (testing) graph at the edge level. Although the two graphs share no edges, some training nodes also appear in the validation graph, which may inflate the validation (testing) results.
- The mini-batch sampling proposed by GraphSAGE is currently enabled only for negative graph sampling in the link prediction task, where it is necessary. Mini-batch sampling should also be implemented and tested in cases where the full graph fits into memory (e.g. in the node classification pipelines).
- Consider changing the current dataloaders to PyG's official Lightning data wrapper.
- Refactor the training and testing pipelines into py scripts.
- Add GAT into available models in Anomaly-Detection-GNN.
- Add GraphSAINT mini-batch sampling strategies in Anomaly-Detection-GNN.
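The edge-level split issue noted in the list above could be avoided by splitting at the node level instead. The following is a minimal sketch of that idea in plain Python, not the repo's implementation: the function name `node_level_split`, its parameters, and the toy ring graph are all hypothetical, and dropping every edge that crosses the two node sets is only one possible policy.

```python
import random


def node_level_split(num_nodes, edges, val_fraction=0.2, seed=0):
    """Split a graph by nodes so the two resulting graphs share no nodes.

    `edges` is a list of (u, v) pairs with node ids in range(num_nodes).
    Edges that connect a training node to a validation node are dropped.
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)

    num_val = int(num_nodes * val_fraction)
    val_nodes = set(nodes[:num_val])
    train_nodes = set(nodes[num_val:])

    # Keep an edge only if both endpoints fall on the same side of the split.
    train_edges = [(u, v) for u, v in edges if u in train_nodes and v in train_nodes]
    val_edges = [(u, v) for u, v in edges if u in val_nodes and v in val_nodes]
    return train_nodes, val_nodes, train_edges, val_edges


# Toy example (hypothetical data): a ring of 10 nodes.
ring_edges = [(i, (i + 1) % 10) for i in range(10)]
train_nodes, val_nodes, train_edges, val_edges = node_level_split(
    10, ring_edges, val_fraction=0.3
)
print(len(train_nodes), "training nodes,", len(val_nodes), "validation nodes")
```

With this kind of split, no training node can appear in the validation graph, at the cost of discarding the crossing edges.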
GNN-based methods for anomaly detection on various datasets will be implemented in this repo: Anomaly-Detection-GNN.
This repo contains PyTorch Lightning implementations of GCN and GraphSAGE for node classification and link prediction (used as a recommendation system) on the Cora dataset and the CUHKSZ-AG dataset. Size of this repo: ~23MB
This project consists of two main parts: dataset creation (dataPrep) and GNN application (all Jupyter notebooks). The source code for our data processing and models is under the folder ./utils.
- Install Anaconda or any other conda distribution.
- Create a new, empty environment by running: conda create --name yourname python==3.10.9
- Activate the environment you just created.
- Key libraries required for this project: PyTorch 2.0, PyTorch Sparse, PyTorch Lightning 2.0, PyTorch Geometric, TensorBoard
Note: if any libraries are missing when you run our code, check the imports at the beginning of the notebook.
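A quick way to spot missing libraries before opening a notebook is a small check script. The sketch below uses only the Python standard library; the import names in `REQUIRED` are assumptions based on the usual package names for the libraries listed above, so adjust them if your installation differs.

```python
from importlib import util

# Display name -> import name. The import names are assumptions based on
# the commonly used packages, not taken from this repo.
REQUIRED = {
    "PyTorch": "torch",
    "PyTorch Sparse": "torch_sparse",
    "PyTorch Lightning": "pytorch_lightning",
    "PyTorch Geometric": "torch_geometric",
    "TensorBoard": "tensorboard",
}


def find_missing(required):
    """Return the display names of libraries whose module cannot be found."""
    return [name for name, module in required.items() if util.find_spec(module) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All key libraries were found.")
```

The check only confirms that each module can be located; it does not verify versions.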
When running the first ipynb notebook, which initializes TensorBoard, change the IP address to your own and choose a port that is not already in use on your computer. When you run the remaining notebooks, all results are shown in the same TensorBoard instance.
Alternatively, you can launch TensorBoard from a terminal at the root of this repo:
tensorboard --logdir 'lightning_logs' --port 6003
Then open TensorBoard in the browser: http://localhost:6003/
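To choose a port that is not already in use, a small helper can probe candidate ports before TensorBoard is started. This is a sketch using only the standard library; the helper names `port_is_free` and `first_free_port` and the default host are hypothetical, and the starting port 6003 simply mirrors the command above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=6003, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"No free port found in {start}..{start + attempts - 1}")
```

Calling `first_free_port()` returns a candidate that can be passed to `--port`. Another process could still claim the port between the check and the launch, so treat the result as a hint.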
- The notebook in ./dataPrep contains the web crawling procedure.
- Run everything in 1_NC-Cora_GCN.ipynb to see the node classification result for the Cora dataset using the GCN model.
- Run everything in 2_NC-AG_GCN.ipynb to see the node classification result for our academic dataset using the GCN model.
- Run everything in 3_NC-Cora_GraphSAGE.ipynb to see the node classification result for the Cora dataset using the GraphSAGE model.
- Run everything in 4_NC-AG_GraphSAGE.ipynb to see the node classification result for our academic dataset using the GraphSAGE model.
- Run everything in 5_RS-Cora_GCN.ipynb to see the recommendation result for the Cora dataset using the GCN model and the transductive method.
- Run everything in 6_RS-AG_GCN.ipynb to see the recommendation result for our academic dataset using the GCN model and the transductive method.
- Run everything in 7_RS-Cora_GraphSAGE_transductive.ipynb to see the recommendation result for the Cora dataset using the GraphSAGE model and the transductive method.
- Run everything in 8_RS-AG_GraphSAGE_transductive.ipynb to see the recommendation result for our academic dataset using the GraphSAGE model and the transductive method.
- Run everything in 9_RS-Cora_GraphSAGE_inductive.ipynb to see the recommendation result for the Cora dataset using the GraphSAGE model and the inductive method.
- Run everything in 10_RS-AG_GraphSAGE_inductive.ipynb to see the recommendation result for our academic dataset using the GraphSAGE model and the inductive method.
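The notebooks above can also be executed in sequence from a script instead of opening each one by hand. The sketch below is an assumption about how one might do this, not part of the repo: it sorts notebooks by their numeric prefix (plain alphabetical order would place 10_ before 1_) and runs each through the `jupyter nbconvert --execute` command. It presumes that Jupyter is installed and that the TensorBoard IP address and port in the first notebook have already been edited.

```python
import re
import subprocess
from pathlib import Path


def numeric_prefix(path):
    """Return the leading integer of a name such as '10_RS-AG_...ipynb'."""
    match = re.match(r"(\d+)_", Path(path).name)
    # Notebooks without a numeric prefix are sorted last.
    return int(match.group(1)) if match else float("inf")


def ordered_notebooks(paths):
    """Sort notebook paths by numeric prefix rather than alphabetically."""
    return sorted(paths, key=numeric_prefix)


def run_all(root="."):
    """Execute every notebook found in `root`, in numeric order, in place."""
    for notebook in ordered_notebooks(Path(root).glob("*.ipynb")):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute",
             "--inplace", str(notebook)],
            check=True,  # stop at the first notebook that fails
        )
```

Calling `run_all()` from the root of the repo would then run notebooks 1 through 10 in order; results should appear in TensorBoard as described earlier.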
Please refer to report.pdf.