Data counts for ml-1m do not match when using RecBole 1.2.0 #2051

Open
yunshanlucky opened this issue May 17, 2024 · 0 comments

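I am using the KGAT model with the ml-1m dataset. Whether I use the raw ml-1m dataset with no preprocessing or the officially provided best data configuration, the officially stated numbers of users and items do not match the numbers I actually get.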

Below is my reproduction with the unprocessed dataset. The official figures are 6041 users and 3707 items, but my result does not match:
17 May 10:42 INFO ml-1m
The number of users: 6041
Average actions of users: 165.16225165562915
The number of items: 3656
Average actions of items: 272.93570451436386
The number of inters: 997580
The sparsity of the dataset: 95.48318075934071%
Remain Fields: ['entity_id', 'user_id', 'item_id', 'rating', 'head_id', 'relation_id', 'tail_id']
The number of entities: 79348
The number of relations: 51
The number of triples: 385923
The number of items that have been linked to KG: 3655
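
As a sanity check on the log above, the derived statistics can be recomputed from the three raw counts. This is a minimal sketch, not RecBole code. It assumes that one user id and one item id are reserved (for example as a padding token), so the averages divide by the count minus one while the sparsity uses the full counts. That assumption is inferred from the numbers themselves and is not confirmed in this issue. The function name `check_log_stats` is made up for illustration.

```python
def check_log_stats(n_users, n_items, n_inters):
    """Recompute the derived statistics from the three raw counts.

    Assumption (inferred, not confirmed): one user id and one item id are
    reserved, so averages divide by count - 1, sparsity by the full counts.
    """
    avg_user = n_inters / (n_users - 1)
    avg_item = n_inters / (n_items - 1)
    sparsity = (1 - n_inters / (n_users * n_items)) * 100
    return avg_user, avg_item, sparsity


# Raw counts copied from the log above.
avg_user, avg_item, sparsity = check_log_stats(6041, 3656, 997580)
print(avg_user, avg_item, sparsity)
```

If the recomputed values agree with the logged ones, the log is internally consistent, and the mismatch with the official figures would lie in which items survive loading rather than in how the statistics are computed.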

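Here is also my result when reproducing the official best parameters. The official figures are 6034 users and 3096 items, but my result does not match: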
17 May 11:19 INFO ml-1m
The number of users: 6034
Average actions of users: 137.95773247140727
The number of items: 3104
Average actions of items: 268.2239767966484
The number of inters: 832299
The sparsity of the dataset: 95.55622200144201%
Remain Fields: ['entity_id', 'user_id', 'item_id', 'rating', 'head_id', 'relation_id', 'tail_id']
The number of entities: 17012
The number of relations: 100
The number of triples: 483000
The number of items that have been linked to KG: 3103
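
To make the size of the mismatch explicit, the counts quoted in this issue can be put side by side. This is a small sketch using only the numbers given above. The dictionary layout and the name `count_gaps` are illustrative choices, not anything from RecBole.

```python
# Counts quoted in this issue, stored as (official, reproduced) per setting.
reported = {
    "unprocessed": {"users": (6041, 6041), "items": (3707, 3656)},
    "best config": {"users": (6034, 6034), "items": (3096, 3104)},
}


def count_gaps(table):
    """Return reproduced minus official for every setting and field."""
    return {
        setting: {
            field: got - official
            for field, (official, got) in fields.items()
        }
        for setting, fields in table.items()
    }


gaps = count_gaps(reported)
for setting, fields in gaps.items():
    print(setting, fields)
```

In both settings the user counts agree and only the item counts differ, which suggests the discrepancy is on the item side.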

Below is the best configuration I found in the official materials:

# dataset config
field_separator: "\t"
seq_separator: " "
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
HEAD_ENTITY_ID_FIELD: head_id
TAIL_ENTITY_ID_FIELD: tail_id
RELATION_ID_FIELD: relation_id
ENTITY_ID_FIELD: entity_id
NEG_PREFIX: neg_
LABEL_FIELD: label
load_col:
    inter: [user_id, item_id, rating]
    kg: [head_id, relation_id, tail_id]
    link: [item_id, entity_id]

# data filtering for interactions
val_interval:
    rating: "[3,inf)"
unused_col:
    inter: [rating]

user_inter_num_interval: "[10,inf)"
item_inter_num_interval: "[10,inf)"

# data preprocessing for knowledge graph triples
kg_reverse_r: True
entity_kg_num_interval: "[5,inf)"
relation_kg_num_interval: "[5,inf)"

# training and evaluation
epochs: 500
train_batch_size: 4096
eval_batch_size: 40960000
valid_metric: NDCG@10
train_neg_sample_args:
    distribution: uniform
    sample_num: 1
    dynamic: False

# model
embedding_size: 64
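
The filtering settings in the configuration above (`val_interval`, `user_inter_num_interval`, `item_inter_num_interval`) interact with one another: dropping an item can push a user below the threshold, and the reverse. The following is a rough, self-contained sketch of that kind of repeated filtering on toy data. It is not RecBole's implementation. The function name `filter_interactions` and the toy rows are made up, and the assumption that filtering repeats until the counts are stable is mine.

```python
from collections import Counter


def filter_interactions(inters, min_rating=3, min_user=10, min_item=10):
    """Keep (user, item, rating) rows with rating >= min_rating, then
    repeatedly drop users and items with too few remaining interactions
    until nothing changes (a k-core style filter)."""
    rows = [r for r in inters if r[2] >= min_rating]
    while True:
        user_cnt = Counter(u for u, _, _ in rows)
        item_cnt = Counter(i for _, i, _ in rows)
        kept = [
            r for r in rows
            if user_cnt[r[0]] >= min_user and item_cnt[r[1]] >= min_item
        ]
        if len(kept) == len(rows):
            return kept
        rows = kept


# Toy data with small thresholds so the cascade is visible.
toy = [
    ("u1", "a", 5), ("u1", "b", 4),
    ("u2", "a", 3), ("u2", "b", 5),
    ("u3", "a", 4), ("u3", "c", 5),
    ("u4", "b", 2),
]
result = filter_interactions(toy, min_rating=3, min_user=2, min_item=2)
users = {u for u, _, _ in result}
items = {i for _, i, _ in result}
print(sorted(users), sorted(items))
```

Because removals cascade this way, a small difference in the raw files or in the order of the filtering steps could plausibly shift the final item count by a handful, which may be relevant to the gap reported here.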

yunshanlucky added the bug label May 17, 2024