A question about Negative sampling? #3
Comments
The first question I understand now. For the second question, I understand the c_{ui} != 0 case. I took your code and replaced the loss function with a logistic likelihood plus negative sampling, and ran experiments with different settings of K (the number of sampled negatives). I find that performance improves as K is enlarged, and that the logistic likelihood performs best when all of the zero entries are taken as negative samples. I also found a paper that uses a variational autoencoder with a logistic likelihood [1], similar to your work, and negative sampling improves its results a lot.
[1] Augmented Variational Autoencoders for Collaborative Filtering with Auxiliary Information, CIKM 2017.
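For concreteness, here is a minimal NumPy sketch of the kind of logistic loss with K sampled negatives being described; the function name, the sampling scheme, and the constants are illustrative assumptions, not code from this repo:

```python
import numpy as np

def logistic_loss_with_neg_sampling(x_u, f_u, K, rng=np.random.default_rng(0)):
    """Per-user logistic (sigmoid cross-entropy) loss using K sampled negatives.

    x_u: binary click vector for one user, shape (n_items,)
    f_u: predicted scores/logits for the same user, shape (n_items,)
    K:   number of zero entries sampled as negatives
    """
    pos = np.flatnonzero(x_u > 0)                     # observed items, label 1
    zeros = np.flatnonzero(x_u == 0)                  # unobserved items
    neg = rng.choice(zeros, size=min(K, zeros.size), replace=False)

    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    loss_pos = -np.log(sigmoid(f_u[pos]) + 1e-10).sum()         # push positives up
    loss_neg = -np.log(1.0 - sigmoid(f_u[neg]) + 1e-10).sum()   # push sampled zeros down
    return loss_pos + loss_neg
```

As K grows toward the number of zero entries, this approaches using every 0 as a negative, which is the setting discussed in the reply below.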
I am not sure I follow, but for the logistic likelihood, isn't what I did already using all the 0's as negatives?
I think I understand now, and maybe you misunderstood what I did -- for both Gaussian and logistic, I used all the 0's in training. With Gaussian, I applied the c_{ui} weight, which in effect down-weights all the negatives. With logistic, I simply used all the 0's, which I think corresponds to what you mean by setting K to the largest possible value.
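To illustrate the distinction being made (all 0's kept, but down-weighted), here is a rough sketch of a c_{ui}-weighted Gaussian (squared-error) objective; the particular weights `c_pos`/`c_neg` are placeholders, not the paper's values:

```python
import numpy as np

def weighted_gaussian_loss(x_u, f_u, c_pos=2.0, c_neg=1.0):
    """Per-user weighted squared error (Gaussian log-likelihood up to a constant).

    Every entry, including the 0's, contributes; the zeros are simply
    down-weighted through a smaller confidence weight c_{ui} (c_neg < c_pos).
    """
    c = np.where(x_u > 0, c_pos, c_neg)   # per-entry confidence c_{ui}
    return 0.5 * np.sum(c * (x_u - f_u) ** 2)
```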
Your diagram looks correct. (One minor detail: the split between red and green is random for each test user, so it is not the case that certain items appear only in red or only in green across all test users, just to make that clear.) I think there is only one sensible way to do it. Rather than me directly feeding you the answer, maybe you can think about it first and tell me how you would do it?
You are right, the split is random; to make it easy to see, I drew the diagram as above. This is how I think about it, but I believe training WMF this way has the problems described above. Is there anything wrong with my understanding, and how do you do it?
Yes, you are right that this would leak the validation data into WMF. A simple fix (this is how I did it) is to train WMF only with the blue box and keep only the item factors. Then, during evaluation, keep the item factors fixed, learn the validation users' factors (which corresponds to one ALS update) from the red box, and make predictions for the green box. This is known as strong generalization.
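A rough sketch of that strong-generalization fold-in step (one regularized least-squares / ALS update for a held-out user with the item factors fixed); the WMF-style confidence `1 + alpha * x` and the hyperparameter values are assumptions for illustration:

```python
import numpy as np

def fold_in_user(x_fold_in, V, reg=0.01, alpha=40.0):
    """Solve one held-out user's factor with item factors V held fixed.

    x_fold_in: binary fold-in vector (the 'red box') for this user, shape (n_items,)
    V:         item factors trained on the 'blue box' only, shape (n_items, d)
    """
    d = V.shape[1]
    c = 1.0 + alpha * x_fold_in                    # WMF-style confidence c_{ui}
    A = (V * c[:, None]).T @ V + reg * np.eye(d)   # V^T C V + reg * I
    b = (V * c[:, None]).T @ x_fold_in             # V^T C x
    theta_u = np.linalg.solve(A, b)                # closed-form ALS update
    return theta_u

# Predictions for the held-out 'green box' items are then V @ theta_u,
# ranked after masking out the fold-in items.
```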
I wonder why you didn't use binary cross-entropy instead of (categorical) cross-entropy, since it is a multi-label problem.
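For reference, a small sketch contrasting the two losses being asked about, over one user's binary row (illustrative NumPy, not code from the repo):

```python
import numpy as np

def multinomial_ce(x_u, f_u):
    """Softmax (multinomial) cross-entropy: one probability budget shared by all items."""
    m = f_u.max()
    log_softmax = f_u - m - np.log(np.exp(f_u - m).sum())
    return -np.sum(x_u * log_softmax)

def binary_ce(x_u, f_u):
    """Per-item sigmoid cross-entropy: each item treated as an independent binary label."""
    p = 1.0 / (1.0 + np.exp(-f_u))
    return -np.sum(x_u * np.log(p + 1e-10) + (1 - x_u) * np.log(1 - p + 1e-10))
```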
Also, in production, how do you represent new videos with this architecture? |
Hi, I have two questions about the paper that I can't understand, which I sincerely want to ask you, hoping you can give some details or explanation about them.
The first is the assumption that the multinomial distribution is better suited for ranking metrics. Viewed another way, a multinomial distribution means a limited budget of probability mass, so the purchases of different goods are treated as exclusive. But in some situations the purchase of different goods is not exclusive; for example, buying a mobile phone and buying a mobile phone case are not mutually exclusive.
The second is about the experiment in Table 4, which compares the performance of different likelihood functions. As far as I know, most collaborative filtering methods use a Gaussian or logistic likelihood together with negative sampling or weighting. In equation (3) you show the Gaussian likelihood with c_{ui}, which to me means you only care about the entries equal to 1, and the same for equation (4). But negative sampling, the most important trick in recommendation, is something I cannot find in the Gaussian, logistic, or multinomial likelihoods. As far as I know, NCF [1], CVAE [2], and many other methods use negative sampling (with Gaussian or logistic likelihoods), which boosts their performance. I am also concerned that the multinomial likelihood cannot use negative sampling because of its mathematical form (see the sketch after the references below). So I wonder whether the multinomial likelihood can beat a logistic likelihood with negative sampling, and did you use negative sampling for NCF?
[1] Neural Collaborative Filtering, Xiangnan He et al., WWW 2017.
[2] Collaborative Variational Autoencoder for Recommender Systems, KDD 2017.
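As a side note on the "mathematical form" concern above, here is an illustrative NumPy sketch of the multinomial log-likelihood for one user: only the nonzero x_{ui} appear explicitly, but the log-normalizer runs over all items, so zeros still influence the objective; negative sampling would amount to approximating that normalizer (e.g. a sampled softmax) rather than dropping in directly as it does for the logistic loss.

```python
import numpy as np

def multinomial_loglik(x_u, f_u):
    """Multinomial log-likelihood of one user's click vector (up to a constant)."""
    m = f_u.max()
    log_normalizer = m + np.log(np.exp(f_u - m).sum())  # sums over *all* items
    return np.sum(x_u * (f_u - log_normalizer))         # only x_ui > 0 contribute directly
```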