This repository provides complementary code and data for the article Evaluating Node Embeddings of Complex Networks.
The repository groups files into four folders:
- `data` stores graphs and community files used in the experiments
- `experiments` contains scripts for conducting the experiments, a script with utility functions, and data used to produce specific graphs
- `results` stores .csv files with the results of the experiments
- `src` includes external scripts, source files and packages used in the experiments
In the experiments we used both synthetic and real-world graphs. Synthetic data was generated using the ABCD framework. For more information about the real-world graphs, please refer to the following sources:
- Airports - https://www.kaggle.com/flashgordon/usa-airport-dataset#Airports2.csv
- Email-EU - https://snap.stanford.edu/data/email-Eu-core-temporal.html
- Github Developers - https://github.com/benedekrozemberczki/MUSAE
- Mouse Brain - http://networkrepository.com/bn-mouse-kasthuri-graph-v4.php
We assigned a numerical ID to each experiment to simplify notation in the scripts. Each ID corresponds to the following task:
| Experiment ID | Description |
|---|---|
| 1 | Divergence and variance on one ABCD graph with default parameters |
| 2 | Sensitivity analysis for ξ |
| 3 | Sensitivity analysis for β |
| 4 | Sensitivity analysis for γ |
| 5 | Sensitivity analysis for node2vec p and q parameters |
| 6 | Sensitivity analysis for n |
| 7 | Node Classification |
| 8 | Sensitivity analysis for Δ |
| 9 | Divergence and variance for Mouse Brain, Airports, Github Developers and Email-EU graphs |
| 10 | Community Detection |
| 11 | Link Prediction |
| 50k | Divergence and variance on one ABCD graph with 50k nodes |
Most of the experiments were run in a cloud environment because of their high computational requirements. To prepare a local environment for the experiments, please follow the steps below:
- Install Julia (the experiments were run using Julia 1.5.3)
- Add the required Julia packages

      julia -e 'using Pkg; Pkg.add(url="https://github.com/KrainskiL/CGE.jl", rev="v1.2.2")'
      julia -e 'using Pkg; Pkg.add(url="https://github.com/bkamins/ABCDGraphGenerator.jl")'

- Install Python dependencies from `requirements.txt`

      pip install -r requirements.txt

- Download and install the OpenNE package

      git clone https://github.com/thunlp/OpenNE.git
      cd OpenNE/src
      python setup.py install

- Download and compile the VERSE executable for your OS. The `src/verse/src` directory contains an executable built for Ubuntu 18.04. For more details, please check the VERSE repository.

      git clone https://github.com/xgfs/verse.git
      cd verse/src && make

Each experiment can be run by executing `experiment.py` in the appropriate folder of the `experiments` directory.
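As a minimal sketch of running `experiment.py` from the appropriate experiment folder, the helper below lists every `experiment.py` found one level below the `experiments` directory and prints a command for each. The function names `find_experiment_scripts` and `build_command` are hypothetical and not part of the repository. The sketch assumes one `experiment.py` per experiment folder and that each script is run from inside its own folder; it makes no assumption about folder names.

```python
import sys
from pathlib import Path


def find_experiment_scripts(root="experiments"):
    """Return sorted paths of experiment.py files one level below root."""
    return sorted(Path(root).glob("*/experiment.py"))


def build_command(script):
    """Build an argument list that runs the script with the current interpreter."""
    return [sys.executable, script.name]


if __name__ == "__main__":
    for script in find_experiment_scripts():
        # Assumption: each script is run from inside its own folder.
        print(f"cd {script.parent} && {' '.join(build_command(script))}")
```

Run from the repository root, this prints one line per experiment folder, which can be copied into a shell to start the chosen experiment.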
Presented experiments include multiple random processes, in particular:
- generation of synthetic ABCD graphs
- generation of embeddings (excluding deterministic HOPE and LINE)
- splitting data into train and test subsets
- training of classification models (XGBoost)
All of the processes above were controlled with fixed seeds, except for the generation of embeddings. Seeding that step would require modifying the OpenNE package and adding constraints for specific embedding algorithms (Node2Vec and DeepWalk rely on an external Word2Vec implementation with a different seeding mechanism). Because the embedding algorithms contribute little to the overall variance of the output measures, running the experiments in the current setup should still produce results that closely resemble the original ones.
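To illustrate what controlling a random step with a seed means here, the sketch below performs a seeded train/test split of node IDs using only the Python standard library. It is an illustration under stated assumptions, not the repository's code: the function name `seeded_split`, the default `test_fraction` and the seed values are hypothetical.

```python
import random


def seeded_split(items, test_fraction=0.3, seed=42):
    """Shuffle a copy of items with a private, seeded generator and split it."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled items form the test set, the rest the train set.
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    nodes = range(10)
    train_a, test_a = seeded_split(nodes, seed=1)
    train_b, test_b = seeded_split(nodes, seed=1)
    # The same seed reproduces the same split.
    print(train_a == train_b and test_a == test_b)
```

A step without a fixed seed, such as the embedding generation, is not guaranteed to repeat in this way, which is why repeated runs can differ slightly from the published results.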
The experiments were conducted on the SOSCIP Cloud infrastructure, which is based on the OpenStack cloud system.
We used the OpenNE framework, which exposes a common interface to many embedding algorithms. For the VERSE algorithm, we used the implementation available at https://github.com/xgfs/verse.