
[Idea] Learn from Positives and Negatives Using Two Retrieval Models #675

rlcauvin opened this issue Jun 16, 2023 · 3 comments

@rlcauvin

In a recommendation system with retrieval and ranking stages, the retrieval model typically learns only from positive examples in the training dataset. Consequently, it fails to learn which items users don't like. The initial set of recommendations from the retrieval model may include items that the user doesn't like and exclude items that the user does like.

Below is an idea for leveraging negative examples in retrieval models to address this problem. The approach is to train two retrieval models:

  1. Positive Retrieval Model. A retrieval model trained on positive examples. This model is the conventional one; we train it to predict the items the user will like.
  2. Negative Retrieval Model. A retrieval model trained on negative examples. We train this model to predict the items the user won't like.

Armed with positive and negative retrieval models, we can invoke both of them in the retrieval stage. We query the positive retrieval model, then the negative retrieval model, and combine the lists of recommended candidates. The wrinkle is that, instead of using the negative retrieval model to find the nearest neighbors of a query vector, we want to use it to find the furthest neighbors of a query vector. I'm not sure we can do so with tfrs.layers.factorized_top_k.ScaNN or tfrs.layers.factorized_top_k.BruteForce.

Thoughts on this idea, or any pointers to code, classes, or libraries for retrieving furthest neighbors?

@rlcauvin (Author) commented Jun 25, 2023

I looked at the code for tfrs.layers.factorized_top_k.BruteForce and realized I could create a nearly identical class (BruteForceFurthest) that returns the furthest neighbors with only a minor change. Here is the relevant excerpt from the call() function:

    scores = self._compute_score(queries, self._candidates)
    # Negate the scores so that top_k selects the k lowest-scoring
    # (i.e. furthest) candidates.
    values, indices = tf.math.top_k(-scores, k=k)
    # Negate back so the returned values are on the original score scale.
    return -values, tf.gather(self._identifiers, indices)

As you can see, I think it is as simple as negating the scores before the top-k selection (and negating them back afterwards), so that the "top K" scores are really the bottom K scores.
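
For completeness, here is a minimal self-contained sketch of such a subclass. It assumes tensorflow_recommenders is installed; the private attributes _compute_score, _candidates, and _identifiers are the parent class's own, as in the excerpt above:

    import tensorflow as tf
    import tensorflow_recommenders as tfrs

    class BruteForceFurthest(tfrs.layers.factorized_top_k.BruteForce):
        """BruteForce variant that retrieves the k *furthest* neighbors."""

        def call(self, queries, k=None):
            k = k if k is not None else self._k
            # Apply the query tower first, exactly as the parent class does.
            if self.query_model is not None:
                queries = self.query_model(queries)
            scores = self._compute_score(queries, self._candidates)
            # Flip the sign so top_k picks the lowest original scores.
            values, indices = tf.math.top_k(-scores, k=k)
            return -values, tf.gather(self._identifiers, indices)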

I then trained a "negative" retrieval model on the negative examples and instantiated BruteForceFurthest as follows:

    # Reuse the towers from the trained "negative" retrieval model.
    query_model = negative_retrieval_model.query_model
    candidate_model = negative_retrieval_model.candidate_model

    index = BruteForceFurthest(query_model=query_model, k=k)
    # index_from_dataset builds the index and returns the layer itself.
    top_k_retriever = index.index_from_dataset(
        candidate_ds.batch(100).map(lambda c: (c["item_id"], candidate_model(c))))

After creating that top_k_retriever, I can retrieve the candidates the negative model scores lowest, i.e. the candidates least likely to be non-clicks:

    scores, items = top_k_retriever(query, 100)

Ultimately, I want to combine these candidates with the candidates from the "positive" retrieval model. I'm wondering if I can do so in a special BruteForcePosNeg class that computes the nearest neighbors from the "positive" retrieval model and the furthest neighbors from the "negative" retrieval model, somehow averages the scores, and returns the top scoring candidates.
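
A rough, hypothetical sketch of what such a fused retriever might look like (the class name, the plain score difference, and the assumption that both indices were built from the same candidate dataset in the same order are all mine, not an existing TFRS API):

    import tensorflow as tf

    class BruteForcePosNeg(tf.keras.Model):
        """Scores every candidate with both models; ranks by the difference."""

        def __init__(self, positive_index, negative_index, k=10):
            super().__init__()
            # Both arguments are already-indexed
            # tfrs.layers.factorized_top_k.BruteForce layers built from the
            # same candidate dataset, in the same order.
            self.positive_index = positive_index
            self.negative_index = negative_index
            self._k = k

        def call(self, queries, k=None):
            k = k if k is not None else self._k
            pos_queries = self.positive_index.query_model(queries)
            neg_queries = self.negative_index.query_model(queries)
            pos_scores = self.positive_index._compute_score(
                pos_queries, self.positive_index._candidates)
            neg_scores = self.negative_index._compute_score(
                neg_queries, self.negative_index._candidates)
            # High positive affinity and low negative affinity both help.
            # Whether a plain difference is well calibrated is an open question.
            combined = pos_scores - neg_scores
            values, indices = tf.math.top_k(combined, k=k)
            return values, tf.gather(self.positive_index._identifiers, indices)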

@caesarjuly commented Jun 27, 2023

This is a really interesting topic. But I have some questions:

  1. Do the negative samples come from explicit user feedback?
  2. If yes, is the dataset sparse or too small?
  3. If not, the dataset could be very noisy; how would you clean it?
  4. Are the scores from the two separate models comparable? The scores can be highly affected by the sampling strategy. How would you calibrate them before fusing?
  5. The original idea of in-batch negative sampling is to take other users' positive samples as the current user's negative samples. In general, this provides fast training, but there can be an issue with the sampling distribution (popular items are oversampled as negatives).

Actually, there is a method called Mixed Negative Sampling that can mitigate this issue. You can refer to my blog for details.
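
In short, Mixed Negative Sampling augments the in-batch negatives with items sampled uniformly from the full corpus, which corrects the popularity skew of purely in-batch negatives. A minimal sketch of the logit computation (the shapes and the helper name are my own illustration):

    import tensorflow as tf

    def mns_logits(query_emb, pos_item_emb, uniform_item_emb):
        """Mixed Negative Sampling logits.

        query_emb:        [B, D]  query-tower embeddings for the batch.
        pos_item_emb:     [B, D]  item-tower embeddings of the clicked items;
                                  row j also serves as an in-batch negative
                                  for every other row i != j.
        uniform_item_emb: [B', D] item-tower embeddings of items sampled
                                  uniformly from the corpus, shared by the batch.

        Returns [B, B + B'] logits; the softmax label for row i is column i.
        """
        candidates = tf.concat([pos_item_emb, uniform_item_emb], axis=0)
        return tf.matmul(query_emb, candidates, transpose_b=True)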

@rlcauvin (Author)

Thanks for sharing the Mixed Negative Sampling method, @caesarjuly. I'm going to study it and try to figure out how it works.

As for your questions:

1. Do the negative samples come from explicit user feedback?
My use case is similar to a daily email in which the user clicks or doesn't click the primary link in the message. A click is positive feedback, and a non-click is negative feedback.

2. If yes, is the dataset sparse or too small?
The sample contains about a million rows, roughly balanced between clicks (positive) and non-clicks (negative). It has a rich set of query and candidate features.

3. If not, the dataset could be very noisy; how would you clean it?
Having a balanced sample seems to help a lot.

4. Are the scores from the two separate models comparable? The scores can be highly affected by the sampling strategy. How would you calibrate them before fusing?
I'm not sure if they're comparable. The negative retrieval model seems to produce higher scores and has a significantly higher ROC-AUC than the positive retrieval model.

5. The original idea of in-batch negative sampling is to take other users' positive samples as the current user's negative samples. In general, this provides fast training, but there can be an issue with the sampling distribution.
I haven't tried in-batch negative sampling, but if I understand it correctly, it's unlikely to perform well in my case: instead of using "real" negatives, it assumes that other users' positives are negatives. In my case, I know the negatives, because I know the user didn't click.
