Item and User Normalization #697

Open
wahabaftab opened this issue Jul 10, 2023 · 0 comments


I trained a LightFM model on user and item features along with interactions. Some of the results don't make sense to me, so I'm hoping someone can help me understand them. I am using the following code for splitting, training, and evaluation:

import numpy as np
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score

# Split interactions and weights with the same seed for both calls
train, test = random_train_test_split(interactions, test_percentage=0.2, random_state=np.random.RandomState(5))
train_weights, test_weights = random_train_test_split(weights, test_percentage=0.2, random_state=np.random.RandomState(5))

# Model training
model.fit(train,
          user_features=user_features,
          item_features=item_features,
          sample_weight=train_weights,
          epochs=10)

# Evaluate the model on the test set using AUC
auc = auc_score(model,
                test,
                user_features=user_features,
                item_features=item_features).mean()
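
The two split calls above depend on the same seed producing the same shuffle for `interactions` and `weights`. A minimal NumPy-only sketch of that idea follows. It assumes the split is driven by a shuffled index over the stored entries and that both matrices hold the same entries in the same order; `shuffled_split` is a hypothetical helper, not a LightFM function.

```python
import numpy as np

def shuffled_split(num_entries, test_percentage, random_state):
    """Hypothetical helper: shuffle entry indices, then cut into train/test."""
    indices = np.arange(num_entries)
    random_state.shuffle(indices)  # the permutation depends only on the seed and the length
    cutoff = int((1.0 - test_percentage) * num_entries)
    return indices[:cutoff], indices[cutoff:]

# Two independent calls with the same seed, as in the snippet above
train_a, test_a = shuffled_split(10, 0.2, np.random.RandomState(5))
train_b, test_b = shuffled_split(10, 0.2, np.random.RandomState(5))

print(np.array_equal(train_a, train_b), np.array_equal(test_a, test_b))
```

If the two matrices stored a different number of entries (for example, because one of them dropped zero weights), the permutations would no longer match, so it may be worth checking that `train` and `train_weights` have identical nonzero positions.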

What I need to understand is the effect of user and item feature normalization on the evaluation. The features are built with the following code:

user_features = dataset.build_user_features(User_df['features'].tolist(), normalize=True)

item_features = dataset.build_item_features(Product_df['features'].tolist(), normalize=True)

The confusing results:

  • With normalize=False for both, the AUC is approximately 84%.
  • With normalize=True for both, the AUC is approximately 99%.
  • When I normalize only the user features, the AUC is still 99%.
  • When I normalize only the item features, the AUC is still 99%.
  • With normalize=False for user and item features excluded from training, the AUC is still 99%.
  • With normalize=False for user and user features excluded from training, the AUC is 84%.

I'd like to know whether normalization can have this much effect and whether the scenarios above make sense. In addition, an AUC of 99% seems too good to be true.
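
For context on what is being toggled: the LightFM documentation describes `normalize=True` as making the feature weights in every row sum to 1. The sketch below reproduces that row scaling with plain NumPy on a made-up matrix. The matrix values and the `l1_normalize_rows` helper are assumptions for illustration, not the data or code from this issue.

```python
import numpy as np

# Hypothetical feature matrix for 3 users: the first 3 columns are per-user
# identity indicators, the last 2 columns are metadata features.
raw = np.array([
    [1.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
])

def l1_normalize_rows(matrix):
    """Scale each row so its absolute values sum to 1 (all-zero rows are left unchanged)."""
    row_sums = np.abs(matrix).sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero
    return matrix / row_sums

normalized = l1_normalize_rows(raw)
print(normalized)
```

Without normalization, a user with many active features gets a representation that is a sum of many embeddings, while a user with few features gets a sum of few. With normalization, every representation becomes a weighted average, so the scale no longer depends on the feature count. Whether this accounts for the gap between 84% and 99% here cannot be concluded from the sketch alone.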
