
Sequential Recommendation System via Self-supervised Learning

Overview

This repo accompanies our survey paper "基于自监督的预训练在推荐系统中的研究综述" (A Survey of Self-supervised Pre-training in Recommender Systems), accepted by CCIR 2023. It collects the code and datasets of several self-supervised sequential recommendation baselines.

| Model | Paper title and link | Code link | Topic | From |
| --- | --- | --- | --- | --- |
| ASReP | Augmenting Sequential Recommendation with Pseudo-Prior Items via Reversely Pre-training Transformer | https://github.com/DyGRec/ASReP | Sequential Recommendation | SIGIR 2021 |
| SASRec | Self-Attentive Sequential Recommendation | https://github.com/kang205/SASRec | Sequential Recommendation | ICDM 2018 |
| DHCN | Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation | https://github.com/xiaxin1998/DHCN | Session Recommendation | AAAI 2021 |
| S3Rec | S3Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization | https://github.com/RUCAIBox/CIKM2020-S3Rec | Sequential Recommendation | CIKM 2020 |
| MrTransformer | Improving Transformer-based Sequential Recommenders through Preference Editing | https://github.com/mamuyang/MrTransformer | Sequential Recommendation | TOIS 2022 |
| BERT4Rec | BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer | https://github.com/FeiSun/BERT4Rec | Sequential Recommendation | CIKM 2019 |
| CL4SRec | Contrastive Learning for Sequential Recommendation | our reproduction via RecBole and DuoRec | Sequential Recommendation | ICDE 2022 |
| SGL | Self-supervised Graph Learning for Recommendation | https://github.com/wujcan/SGL | Session Recommendation | SIGIR 2021 |

We reproduce the CL4SRec and SGL models and run the experiments with RecBole.

Compared with the original papers, the code has been modified; e.g., we added code to compute evaluation metrics separately for interaction sequences of different lengths, as illustrated in the sketch below.
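A minimal sketch of such per-length bookkeeping (the function names, the `user_seqs` input format, and the bucket labels are our own illustrative choices, chosen to match the length table in the Datasets section; this is not the repo's actual code):

```python
# Buckets matching the length table in the Datasets section: [0,20), [20,30), [30,40), [40,50].
# The upper bound 51 makes the last bucket include length 50.
BUCKETS = [("[0,20)", 0, 20), ("[20,30)", 20, 30), ("[30,40)", 30, 40), ("[40,50]", 40, 51)]

def count_by_length(user_seqs):
    """Count users per sequence-length bucket.

    user_seqs: dict mapping user_id -> list of item_ids (one interaction sequence per user).
    """
    counts = {label: 0 for label, _, _ in BUCKETS}
    for seq in user_seqs.values():
        n = len(seq)
        for label, lo, hi in BUCKETS:
            if lo <= n < hi:
                counts[label] += 1
                break
    total = len(user_seqs)
    for label, cnt in counts.items():
        print(f"{label}: {cnt} | {100 * cnt / total:.4f}%")
```

The same bucketing can be reused to aggregate ranking metrics (e.g., HR or NDCG) per bucket instead of plain user counts.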

Datasets

| Statistic | Beauty | ML-1M | Yelp |
| --- | --- | --- | --- |
| User | 22364 | 6040 | 22845 |
| Item | 12102 | 3352 | 16552 |
| Interaction | 194687 | 269721 | 237004 |
| Total File | 4.18M | 5.30M | 5.19M |
| Min_len | 5 | 17 | 5 |
| Max_len | 50 | 50 | 50 |
| Avg_len | 8.7057 | 44.6557 | 10.37443 |
| Density | 0.07194251% | 1.3322134% | 0.06267784% |
| Attributes | 2320 | 18 | 1158 |
| Min. Attribute / Item | 1 | 1 | 0 |
| Max. Attribute / Item | 9 | 6 | 33 |
| Avg. Attribute / Item | 3.9391 | 1.7072 | 4.9205 |
| Length | Beauty | ML-1M | Yelp |
| --- | --- | --- | --- |
| [0,20) | 21228 (94.9202%) | 177 (2.9305%) | 20744 (90.8032%) |
| [20,30) | 655 (2.9289%) | 684 (11.3245%) | 1094 (4.7888%) |
| [30,40) | 231 (1.0330%) | 543 (8.9901%) | 511 (2.2368%) |
| [40,50] | 250 (1.1179%) | 4636 (76.7550%) | 496 (2.1712%) |
| Overall | 22364 (100%) | 6040 (100%) | 22845 (100%) |
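For reference, Density above is Interaction / (User × Item); e.g., for Beauty, 194687 / (22364 × 12102) ≈ 0.0719%, which matches the table.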

Dataset Preprocessing

We follow the procedure in [1,2,3] to process the datasets. Every interaction with an explicit rating is converted into implicit positive feedback. We then group the interactions by user and sort each user's items by the timestamp of the interaction. Since this work does not aim to investigate the cold-start problem in recommender systems, we iteratively filter out users and items with fewer than 5 interactions until the condition holds for both. Some users have very long interaction histories, so we cap the length of each user's interaction sequence at 50. Because the Yelp dataset is very large, we adopt a procedure similar to [3] and keep only its 2019 data. A sketch of this pipeline is shown below.
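The pipeline can be summarized with the following pandas-based sketch (the column names `user_id`, `item_id`, `timestamp`, the constants, and the choice to keep the most recent items when truncating are our own assumptions for illustration, not names taken from this repo):

```python
import pandas as pd

MIN_INTER = 5  # 5-core filtering threshold for both users and items
MAX_LEN = 50   # cap on user sequence length

def preprocess(df: pd.DataFrame) -> dict:
    """df has columns user_id, item_id, timestamp; every row is already
    treated as implicit positive feedback regardless of its rating."""
    # Iteratively drop users/items with fewer than MIN_INTER interactions,
    # repeating until the 5-core condition holds for both sides.
    while True:
        user_ok = df.groupby("user_id")["item_id"].transform("size") >= MIN_INTER
        item_ok = df.groupby("item_id")["user_id"].transform("size") >= MIN_INTER
        if user_ok.all() and item_ok.all():
            break
        df = df[user_ok & item_ok]
    # Sort each user's interactions by timestamp, then keep at most MAX_LEN
    # items per user (here: the most recent ones).
    df = df.sort_values(["user_id", "timestamp"])
    seqs = df.groupby("user_id")["item_id"].apply(list).to_dict()
    return {u: s[-MAX_LEN:] for u, s in seqs.items()}
```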

References:

[1] Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In ICDM. IEEE, 197–206.

[2] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In CIKM. 1441–1450.

[3] Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization. In CIKM. 1893–1902.

Usage

Environments

For each model, a requirements.txt file is provided. Install the dependencies with:

```bash
pip install -r requirements.txt              # if you use pip
conda install --yes --file requirements.txt  # if you use conda
```

Slurm

In addition, we provide a Slurm submission script in each baseline folder.

More details about Slurm usage can be found in the official documentation: https://slurm.schedmd.com/documentation.html

Note: replace `conda activate xxx` in the script below with the name of your own conda environment.

```bash
#!/bin/bash
#SBATCH -e sas_ans_FT.err     # file for the job's standard error
#SBATCH -o sas_ans_FT.out     # file for the job's standard output
#SBATCH -J sas4recFT          # job name

#SBATCH --partition=debug     # partition (queue) to submit to
#SBATCH --nodelist=gpuxx      # request a specific node; replace gpuxx
#SBATCH --gres=gpu:1          # request one GPU
#SBATCH --cpus-per-task=4     # CPU cores allocated to the task
#SBATCH --time=999:00:00      # wall-clock time limit

# Make `conda activate` usable in a non-interactive batch shell.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate xxx            # replace xxx with your environment name
python main.py
```
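Assuming the script above is saved as run.sh (an illustrative name, not one from this repo), it can be submitted and monitored with standard Slurm commands:

```bash
sbatch run.sh            # submit the job; prints the assigned job ID
squeue -u $USER          # list your queued and running jobs
tail -f sas_ans_FT.out   # follow the job's standard output
```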