
How Graph Structure and Label Dependencies Contribute to Node Classification in a Large Network of Documents

A PyTorch implementation

Abstract

We introduce a new dataset named WikiVitals, which contains a large graph of 48k mutually referring Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to label prediction in a semi-supervised node classification setting: the content of the articles, their connections with each other, and the correlations among their labels. We perform this evaluation with a Graph Markov Neural Network, which provides a theoretically principled model for this task, and we assess the contribution of each source of information using a clear separation of model selection and model assessment. One interesting observation is that including the effect of label dependencies is more relevant for sparse train sets than for dense train sets.
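For intuition, here is a minimal sketch of the GMNN-style EM training loop (Qu et al., 2019) that this evaluation builds on: an object classifier predicts labels from node features, while a label classifier predicts a node's label from its neighbours' labels, and the two are trained alternately on pseudo-labels. This is an illustrative toy example only, not the code of this repository; the random graph, the sizes, and the training schedule below are invented for the sketch, and the E-step is simplified to hard pseudo-labels.

# Toy GMNN-style EM loop -- illustration only, not this repository's training code.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_nodes, n_feats, n_classes, hidden = 100, 32, 4, 16

# Toy graph: symmetric random adjacency with self-loops, row-normalised.
adj = (torch.rand(n_nodes, n_nodes) < 0.05).float()
adj = ((adj + adj.t()) > 0).float() + torch.eye(n_nodes)
adj = adj / adj.sum(1, keepdim=True)

x = torch.randn(n_nodes, n_feats)            # node features (article content)
y = torch.randint(0, n_classes, (n_nodes,))  # ground-truth labels
train_mask = torch.zeros(n_nodes, dtype=torch.bool)
train_mask[:20] = True                       # small labelled set

class GCN(nn.Module):
    """Two-layer graph convolution over a dense normalised adjacency."""
    def __init__(self, in_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, n_classes)
    def forward(self, feats):
        h = F.relu(self.lin1(adj @ feats))
        return self.lin2(adj @ h)

gnn_q = GCN(n_feats)    # object classifier: features -> labels
gnn_p = GCN(n_classes)  # label classifier: neighbour labels -> labels

def fit(model, feats, targets, mask, epochs=100):
    opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(feats)[mask], targets[mask])
        loss.backward()
        opt.step()

# Pre-train the object classifier on the labelled nodes only.
fit(gnn_q, x, y, train_mask)

for _ in range(3):  # a few EM iterations
    # M-step: train p on pseudo-labels produced by q (one-hot inputs carry label dependencies).
    with torch.no_grad():
        pseudo = gnn_q(x).argmax(1)
        pseudo[train_mask] = y[train_mask]
    fit(gnn_p, F.one_hot(pseudo, n_classes).float(), pseudo, torch.ones_like(train_mask))
    # E-step (simplified): retrain q on hard pseudo-labels produced by p.
    with torch.no_grad():
        pseudo_p = gnn_p(F.one_hot(pseudo, n_classes).float()).argmax(1)
        pseudo_p[train_mask] = y[train_mask]
    fit(gnn_q, x, pseudo_p, torch.ones_like(train_mask))

print("q accuracy:", (gnn_q(x).argmax(1) == y).float().mean().item())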

WikiVitals dataset

More information about the WikiVitals dataset is available here: https://github.com/ToineSayan/wikivitals-lvl5-04-2022

Requirements

  • Python > 3.10.6
  • numpy
  • scikit-learn
  • scipy
  • torch
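The dependencies can be installed with pip, for example (a suggestion only; no versions are pinned here, so adjust to your environment):

pip install numpy scikit-learn scipy torch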

Usage

To run model selection: python main_selection_NN.py --dataset CORA-ORIG --model FAGCN --split_suffix s
To run model assessment: python main_assessment_GMNN.py --dataset CORA-ORIG --model FAGCN --split_suffix s

Note 1: For more information about default configurations used to train the models, check this subfolder.

Note 2: For more information about split suffixes, check this subfolder.
The split suffix can be one of the following (example commands using it are shown after the list):

  • s: standard splits with dense inner-train sets (10 pre-defined splits)
  • 20: splits with sparse balanced train sets composed of 20 nodes of each class (10 pre-defined splits)
  • spstrat: splits with sparse stratified train sets (10 pre-defined splits)
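For example, using the dataset and model flags shown above, a sparse balanced split is selected with --split_suffix 20 in both phases:

python main_selection_NN.py --dataset CORA-ORIG --model FAGCN --split_suffix 20
python main_assessment_GMNN.py --dataset CORA-ORIG --model FAGCN --split_suffix 20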

Note 3: Results of the model selection phase are provided for each split and each architecture evaluated; model assessment relies on them. If you decide to re-calculate the best hyperparameter settings, the x-file must be updated or replaced in the code.

General information

The code was written following the directions given in the articles or adapted from the implementations provided by their authors. The main articles and implementations used are:
