Question about leave_k_out function #571

Deemjan · 2022-05-14T19:48:27Z

I noticed something odd when using this function to split my data into train and test sets.
The distribution of users by the number of ratings they have given looks like this:

Number of ratings given	Number of users
1	6000
2	3000
3	200
4	30

The documentation states that users with more than K ratings have one of their ratings put into the test set and the others into the train set.
So when I called the function with K = 1, I expected 3230 records in the test set (3000 + 200 + 30), but got only 230 (200 + 30).

So my question is: shouldn't this line

implicit/implicit/evaluation.pyx

Line 189 in 6491663

candidate_mask = counts > K + 1

look like this

candidate_mask = counts >= K + 1

or this

candidate_mask = counts > K

instead ?

My guess is that it was done this way to prevent a user with 2 ratings from being left with only 1 rating in the train set because, if I understand correctly, users with 1 rating are useless for training. Can you confirm?
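To make the numbers concrete, here is a minimal sketch that applies the three candidate masks to the distribution in the table above. It does not call `leave_k_out` itself. The `summarize` helper and the `counts` array are hypothetical names introduced only for this illustration, and it assumes exactly K ratings are held out per selected user.

```python
import numpy as np


def summarize(counts, K):
    """For each candidate mask, return (users selected for the test split,
    fewest ratings any selected user keeps in train after holding out K)."""
    masks = {
        "counts > K + 1": counts > K + 1,    # current line 189
        "counts >= K + 1": counts >= K + 1,  # first proposed alternative
        "counts > K": counts > K,            # second proposed alternative
    }
    return {
        name: (int(mask.sum()), int((counts[mask] - K).min()))
        for name, mask in masks.items()
    }


# One entry per user: 6000 users with 1 rating, 3000 with 2, 200 with 3, 30 with 4.
counts = np.repeat([1, 2, 3, 4], [6000, 3000, 200, 30])
results = summarize(counts, K=1)

for name, (n_users, min_train) in results.items():
    print(f"{name}: {n_users} users selected, min ratings left in train = {min_train}")
```

With K = 1, the current mask selects 230 users and leaves each of them at least 2 ratings in train. Either alternative selects 3230 users, but the 3000 users who have exactly 2 ratings would keep only 1 rating in train.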


ita9naiwa · 2022-07-09T00:08:57Z

Yes, it looks like a bug and it must be fixed.

ita9naiwa · 2022-07-09T00:12:57Z

Sorry, I was wrong: it's intended.
