[BUG] User factors and Item factors do not change over iterations #499

Open
neg-loss opened this issue Jan 25, 2023 · 0 comments

Description

I have been working extensively with Bayesian Personalized Ranking (BPR) for quite some time and have found something very strange. Suppose I have a dataset containing userId, itemId and rating; I train a BPR model on it and save the factors. Next, I train another BPR model on the same data, but initialized with the previously trained factors. When I compare the user factors or item factors from the two runs, the new factors differ in only a few dimensions and everything else remains the same. Also, when I log the two runs, the logs from the first run look as expected, but the logs from the second run are inexplicable.

In my use case,
I have users, items and a timeframe (say, one month). As the timeframe slides, the interactions between users and items change: for example, a user may buy zero or multiple items within a timeframe, and an item may or may not be available in a given timeframe. New users/items may also be added in subsequent timeframes.
So my idea was to train BPR on the initial timeframe, call it T1, and get the factors (embeddings) for users and items. Then, on the second timeframe T2, train a second model, bpr2, but initialize its user and item factors as follows:

  • For those users/items in T2 for which we have factors available from T1, initialize with them.
  • For those users/items in T2 for which no factors are available from T1, initialize them with random vectors.
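
The two rules above can be sketched with NumPy as follows. This is only an illustration, not the code from the attached notebook: the helper name `warm_start_factors`, the toy ids, and the N(0, 0.01²) random initialization are assumptions. The resulting matrices would then be handed to the model (Cornac's BPR accepts an `init_params` dictionary; the exact keys should be checked against the installed version), with rows ordered to match the model's internal index for T2.

```python
import numpy as np


def warm_start_factors(prev_ids, prev_factors, new_ids, scale=0.01, seed=0):
    """Build an initial factor matrix for `new_ids` (hypothetical helper).

    Rows for ids already seen in the previous timeframe are copied from
    `prev_factors`; rows for unseen ids are drawn from N(0, scale^2).
    The scale is an assumption, not necessarily the library's default.
    """
    rng = np.random.default_rng(seed)
    k = prev_factors.shape[1]
    # Map each raw id from the previous timeframe to its row index.
    prev_row = {raw_id: row for row, raw_id in enumerate(prev_ids)}
    # Start from random vectors, keeping the dtype of the trained factors.
    init = rng.normal(0.0, scale, size=(len(new_ids), k)).astype(prev_factors.dtype)
    # Overwrite the rows for which trained factors are available.
    for row, raw_id in enumerate(new_ids):
        if raw_id in prev_row:
            init[row] = prev_factors[prev_row[raw_id]]
    return init


# Toy example: items "a" and "c" carry over from T1, item "d" is new in T2.
t1_items = ["a", "b", "c"]
t1_item_factors = np.arange(12, dtype=np.float32).reshape(3, 4)
t2_items = ["c", "d", "a"]
t2_init = warm_start_factors(t1_items, t1_item_factors, t2_items)
```

The same helper can be applied to user factors; item biases would need the analogous one-dimensional treatment.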

Later I found that the inconsistent behaviour appears even when I run BPR twice on the same dataset.

I modified /cornac/cornac/models/bpr/recom_bpr.pyx to add logging and have attached the modified file.
I have also attached a sample dataset and a notebook to run on.

Variables such as log_initial and log_final are defined in the notebook.

I compared the learned factors from these two runs and observed the following.

Reading log_initial for any item, say the one at index 0:

  • The last entry in the log (the item factor, whether the item was updated as i or as j) appears as that item's embedding in df_item, exactly as expected.
  • Throughout the log, each "after" item_{i/j} factor is exactly the same as the "previous" item_{i/j} factor of the following entry, as expected.

Reading log_final for the same index as in log_initial:

  • The last entry in the log (item_factor{i/j}) is not the same as what appears in df_item_final.
  • Throughout the log, the "after" item_{i/j} factor and the "previous" item_{i/j} factor of the following entry are not the same. I don't know why this happens.
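
The consistency property described in the two lists above can be checked mechanically. The sketch below assumes a hypothetical log layout, a list of `(previous, after)` vector pairs for a single item in update order; the actual structure of log_initial and log_final in the notebook may differ, so the indexing would need to be adapted.

```python
import numpy as np


def first_broken_link(log, atol=0.0):
    """Return the index of the first entry whose 'previous' vector differs
    from the preceding entry's 'after' vector, or None if the chain is intact.

    `log` is assumed to be a list of (previous, after) pairs for one item.
    """
    for step in range(1, len(log)):
        prev_after = np.asarray(log[step - 1][1])
        cur_before = np.asarray(log[step][0])
        if not np.allclose(prev_after, cur_before, rtol=0.0, atol=atol):
            return step
    return None


# Toy vectors standing in for logged item factors.
v0, v1, v2 = np.zeros(3), np.full(3, 0.1), np.full(3, 0.2)
consistent_log = [(v0, v1), (v1, v2)]   # each update starts where the last ended
broken_log = [(v0, v1), (v0, v2)]       # second update starts from stale values
consistent_result = first_broken_link(consistent_log)
broken_result = first_broken_link(broken_log)
```

A non-None result pinpoints the first update that started from values other than those written by the previous update, which is the symptom reported for log_final.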

So I have the following questions:

  • Why are the logs consistent in log_initial but inconsistent in log_final?
  • Why does the final entry in log_final not show up as the item factor in df_item_final?
  • Why, when I compare the two sets of embeddings, do only item_bias and the values in the first few dimensions change, while everything else remains the same in df_item and df_item_final?
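
To put a number on the last observation, one could count, per item, how many latent dimensions actually differ between the two runs. This is a minimal sketch assuming both factor sets are available as arrays of the same shape with rows in the same item order; the function name and the tolerance are illustrative choices.

```python
import numpy as np


def changed_dimensions(before, after, atol=1e-6):
    """Return, per row, how many latent dimensions differ by more than `atol`."""
    before = np.asarray(before, dtype=np.float64)
    after = np.asarray(after, dtype=np.float64)
    if before.shape != after.shape:
        raise ValueError("factor matrices must have the same shape")
    # True where the two runs disagree beyond the absolute tolerance.
    changed = ~np.isclose(before, after, rtol=0.0, atol=atol)
    return changed.sum(axis=1)


# Toy example: item 0 changes in its first two dimensions, item 1 not at all.
before = np.zeros((2, 5))
after = before.copy()
after[0, :2] += 0.1
counts = changed_dimensions(before, after)
```

For a retrained model one would expect counts close to the number of factors for most items; counts of only a few per item would match the behaviour reported here.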

In which platform does it happen?

Cornac 1.14.2
Python 3.8.16
Debian 11 Bullseye

How do we replicate the issue?

  • First download the cornac repo, extract it, and replace the file cornac/cornac/models/bpr/recom_bpr.pyx with the one available here
  • Next, install it by running the command python3 setup.py install
  • Run the notebook available here
  • The data to run on is debug_bpr_train.csv
  • You should then be able to reproduce the issue.

Expected behavior (i.e. solution)

  1. The logs in the second run should be consistent.
  2. If everything works correctly, the last logged factor for an item should appear as that item's factor in df_item_final.
  3. The factors in df_item and df_item_final should differ considerably in nearly all dimensions for any arbitrary item.