candidates argument for FactorizedTopK #688

Open
datasciyj opened this issue Aug 16, 2023 · 3 comments


datasciyj commented Aug 16, 2023

Hi,
metrics = tfrs.metrics.FactorizedTopK( candidates=movies.batch(128).map(movie_model) )
I'm trying to figure out how the 'candidates' argument works for the FactorizedTopK metric in the retrieval tutorial.
The tutorial uses the 'movies' dataset, and I found that the dataset includes some duplicates.
I tested passing an array of unique movies for that argument and got a different accuracy than with the 'movies' dataset.
Can anyone help me understand how the candidates are used to calculate accuracy, and how I should build them from the dataset I have (order of items and batch size)?

@rlcauvin

Top K categorical accuracy is the percentage of records for which the (non-zero) targets are in the top K predictions. So, if a user clicked or rated a movie positively, and that movie has the 11th highest score in the model's predictions for that user, then it wouldn't qualify for the top 10 categorical accuracy, but it would qualify for the top 25 categorical accuracy, for example.
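The definition above can be sketched in a few lines of plain Python. This is an illustration only, not the TFRS implementation: the function `top_k_accuracy`, the item ids, and the scores are all made up for the example, and each record is assumed to have exactly one target item.

```python
def top_k_accuracy(scores_per_user, target_per_user, k):
    """Fraction of records whose target item is among the k highest-scored items."""
    hits = 0
    for scores, target in zip(scores_per_user, target_per_user):
        # Rank item ids by descending score and keep the first k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += target in top_k
    return hits / len(target_per_user)


# Hypothetical scores: 30 items, "m1" scores highest and "m30" lowest,
# so "m11" holds the 11th highest score.
scores = {f"m{i}": 31 - i for i in range(1, 31)}
targets = ["m11", "m3"]  # one target per record

top10 = top_k_accuracy([scores, scores], targets, k=10)
top25 = top_k_accuracy([scores, scores], targets, k=25)
print(top10, top25)  # 0.5 1.0
```

The record whose target ranks 11th misses the top 10 but lands in the top 25, so only the second record counts toward top-10 accuracy while both count toward top-25.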


datasciyj commented Aug 27, 2023

Thanks for your answer, @rlcauvin.
Could you also help me understand why I can't use the unique movies for the 'candidates' argument?
I tried passing only unique movies as 'candidates', but the top-K accuracy changed. I don't understand why I can't just use unique items if the candidates are used as implicit negatives.

datasciyj reopened this Aug 27, 2023
@rlcauvin

I use unique candidates in my retrieval models. I suppose specifying candidates with duplicates could result in some of the duplicates appearing more than once in the top K recommendations for a user, or in implicit negatives skewing the model. I haven't examined the MovieLens dataset, but I don't see any good reason that it should contain duplicates in the movies file.
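One way duplicates could change the metric is by occupying several of the top K slots. The sketch below shows that effect in plain Python; it is a toy model rather than the TFRS code path, and the helper names (`rank_hit`, `dedupe`), item ids, and scores are hypothetical.

```python
def rank_hit(candidate_scores, target_id, k):
    """True if target_id appears among the k highest-scored candidates.

    candidate_scores is a list of (item_id, score) pairs and may repeat ids.
    """
    top_k = sorted(candidate_scores, key=lambda pair: pair[1], reverse=True)[:k]
    return any(item_id == target_id for item_id, _ in top_k)


def dedupe(candidates):
    """Keep the first occurrence of each item id, preserving order."""
    seen = set()
    unique = []
    for item_id, score in candidates:
        if item_id not in seen:
            seen.add(item_id)
            unique.append((item_id, score))
    return unique


# Hypothetical candidate list in which "a" and "b" each appear twice.
with_dupes = [("a", 0.9), ("a", 0.9), ("b", 0.8), ("b", 0.8), ("c", 0.7)]

hit_with_dupes = rank_hit(with_dupes, "c", k=3)      # slots taken by a, a, b
hit_unique = rank_hit(dedupe(with_dupes), "c", k=3)  # slots taken by a, b, c
print(hit_with_dupes, hit_unique)  # False True
```

Here the target "c" is the third-best distinct item, yet it only reaches the top 3 once the repeated entries are removed, which is consistent with the accuracy changing between the two candidate sets. In a tf.data pipeline, one option might be to call `tf.data.Dataset.unique()` on the candidate dataset before batching, assuming its elements are strings or integers.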
