candidates argument for FactorizedTopK #688

datasciyj · 2023-08-16T22:26:19Z

Hi,
metrics = tfrs.metrics.FactorizedTopK(candidates=movies.batch(128).map(movie_model))
I'm trying to figure out how the 'candidates' argument works for the FactorizedTopK metric in the retrieval tutorial.
The tutorial uses the 'movies' dataset, and I found that this dataset includes some duplicates.
I tested passing an array of unique movies for that argument and got a different accuracy than with the 'movies' dataset.
Can anyone help me understand how the candidates are used to calculate accuracy, and how I should build them from my own dataset (order of items and batch size)?


rlcauvin · 2023-08-21T15:35:18Z

Top K categorical accuracy is the percentage of records for which the (non-zero) targets are in the top K predictions. So, if a user clicked or rated a movie positively, and that movie has the 11th highest score in the model's predictions for that user, then it wouldn't qualify for the top 10 categorical accuracy, but it would qualify for the top 25 categorical accuracy, for example.
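To make the definition concrete, here is a minimal pure-Python sketch of top K accuracy. It is not the TFRS implementation; the function name `top_k_accuracy` and the toy scores are made up for illustration, and ties are resolved in the target's favor as a simplifying assumption.

```python
def top_k_accuracy(scores_per_query, target_indices, k):
    """Fraction of queries whose true candidate ranks in the top k by score."""
    hits = 0
    for scores, target in zip(scores_per_query, target_indices):
        # Count candidates that score strictly higher than the target.
        higher = sum(1 for s in scores if s > scores[target])
        if higher < k:
            hits += 1
    return hits / len(target_indices)


# Hypothetical data: one query, 30 candidates with scores 30, 29, ..., 1.
# The target is candidate 10, which has the 11th highest score.
scores = [[float(30 - i) for i in range(30)]]
targets = [10]

top_10 = top_k_accuracy(scores, targets, k=10)  # target misses the top 10
top_25 = top_k_accuracy(scores, targets, k=25)  # target is inside the top 25
```

With these toy numbers the target counts as a miss for top 10 and as a hit for top 25, which matches the 11th-place example above.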

datasciyj · 2023-08-27T21:43:20Z

Thanks for your answer, @rlcauvin.
Could you also help me understand why I can't use unique movies for the 'candidates' argument?
I tried passing the unique movies as 'candidates', but the top K accuracy changed. I don't understand why I can't just use unique items if 'candidates' is used as implicit negatives.

rlcauvin · 2023-08-29T00:06:58Z

I use unique candidates in my retrieval models. I suppose specifying candidates with duplicates could result in some of the duplicates appearing more than once in the top K recommendations for a user, or in implicit negatives skewing the model. I haven't examined the MovieLens dataset, but I don't see any good reason that it should contain duplicates in the movies file.
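As a rough illustration of why duplicates can change the metric, the sketch below ranks one target against a candidate list with and without repeats. It is a simplified stand-in, not the actual FactorizedTopK code; the titles, scores, and the helper `in_top_k` are all hypothetical.

```python
def in_top_k(candidate_scores, target_score, k):
    """True if fewer than k candidates score strictly above the target."""
    higher = sum(1 for s in candidate_scores if s > target_score)
    return higher < k


# Hypothetical model scores for a single user.
scores_by_title = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6}
target_title = "C"

duplicated_candidates = ["A", "A", "B", "B", "C", "D"]
# dict.fromkeys keeps the first occurrence of each title, in order.
unique_candidates = list(dict.fromkeys(duplicated_candidates))

target_score = scores_by_title[target_title]
hit_unique = in_top_k(
    [scores_by_title[t] for t in unique_candidates], target_score, k=3
)
hit_duplicated = in_top_k(
    [scores_by_title[t] for t in duplicated_candidates], target_score, k=3
)
```

Here the target lands in the top 3 with unique candidates but not with the duplicated list, because the repeated "A" and "B" entries occupy extra slots above it. For a `tf.data` pipeline, recent TensorFlow versions provide `Dataset.unique()`, which may be one way to deduplicate before batching; I have not checked it against the tutorial's dataset.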

datasciyj closed this as completed Aug 27, 2023

datasciyj reopened this Aug 27, 2023
