MCNS

Arxiv

Understanding Negative Sampling in Graph Representation Learning.

Zhen Yang*, Ming Ding*, Chang Zhou, Hongxia Yang, Jingren Zhou, Jie Tang. (*These authors contributed equally to this work.)

In KDD 2020 (Research Track)

Introduction

We systematically analyze the role of negative sampling from the perspectives of both objective and risk, and quantify that the negative sampling distribution should be positively but sub-linearly correlated to their positive sampling distribution. With the guidance of the theory, we propose MCNS, approximating the positive distribution with self-contrast approximation and accelerating negative sampling by Metropolis-Hastings.

Preparation

Python 3.7
Tensorflow 1.14.0

Training

Training on the existing datasets

For GraphSAGE:

You can use $ ./experiments/graphsage/***.sh to train MCNS model on the recommendation task. For example, if you want to train on the Amazon dataset, you can run $ ./experiments/graphsage/amazon.sh or python main.py --input data/amazon/ --model graphsage_mean to train MCNS model.

For DeepWalk:

You can use $ ./experiments/deepwalk/***.sh to train MCNS model on the recommendation task. For example, if you want to train on the ml-100k dataset, you can run $ ./experiments/deepwalk/ml.sh or python main.py --input data/ml-100k/ --model deepwalk to train MCNS model.

For GCN:

You can use $ ./experiments/gcn.sh to train MCNS model on the ml-100k dataset for recommendation task.

Training on your own datasets

if you want to train MCNS on your own dataset, you should prepare the following four files:

train.txt: Each line represents an edge <node1> <node2>.
valid.txt: the same format with train.txt
test.txt: the same format with train.txt
test_neg.txt: For each node, we select some unconnected nodes as negs for evaluation. For Amazon and Alibaba datasets, we select 500 negs, and all unconnected negs for ml-100k to evaluate hits@k and MRR. (generated by load_data.py/load_test_neg function).

Dataset

ml-100k contains 943 users, 1,682 items and 100,000 edges.
Amazon contains 192,403 users, 63,001 items and 1,689,188 edges.
Alibaba contains 106,042 users, 53591 items and 907,470 edges.

Acknowledgement

The trainable encoder of our code is based on GraphSAGE.

Cite

Please cite our paper if you find this code useful for your research:

@misc{yang2020understanding,
Author = {Zhen Yang and Ming Ding and Chang Zhou and Hongxia Yang and Jingren Zhou and Jie Tang},
Title = {Understanding Negative Sampling in Graph Representation Learning},
Year = {2020},
Eprint = {arXiv:2005.09863},
}

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
link_prediction		link_prediction
recommendation		recommendation
.DS_Store		.DS_Store
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

link_prediction

link_prediction

recommendation

recommendation

.DS_Store

.DS_Store

README.md

README.md

Repository files navigation

MCNS

Arxiv

Introduction

Preparation

Training

Training on the existing datasets

For GraphSAGE:

For DeepWalk:

For GCN:

Training on your own datasets

Dataset

Acknowledgement

Cite

About

Releases

Packages

Languages

THUDM/MCNS

Folders and files

Latest commit

History

Repository files navigation

MCNS

Introduction

Preparation

Training

Training on the existing datasets

For GraphSAGE:

For DeepWalk:

For GCN:

Training on your own datasets

Dataset

Acknowledgement

Cite

About

Resources

Stars

Watchers

Forks

Languages