
Improve computation of ranking related metrics #530

Open · darrylong opened this issue Oct 12, 2023 · 5 comments
Labels: feature New feature/enhancement request
@darrylong (Member)

Description

Certain ranking metrics currently take a considerably long time to produce a single evaluation output.

Other Comments

It could be due to:

  1. Some slow calculation in the ranking_eval function.

    test_set.user_indices, desc="Ranking", disable=not verbose, miniters=100

    More testing is required to determine which part is to be optimised.

  2. How the score function computes scores during evaluation, on a user-by-user basis (see the sketch after this list).

    def score(self, user_idx, item_idx=None):
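
For reference, the pattern in question looks roughly like the following. This is only a simplified sketch of the per-user loop, not the actual ranking_eval implementation; names such as positive_items and the metric call are stand-ins.

    import numpy as np
    from tqdm.auto import tqdm

    def ranking_eval_sketch(model, metrics, test_set, verbose=False):
        """Simplified per-user ranking evaluation loop (illustrative only)."""
        results = {m.name: [] for m in metrics}
        for user_idx in tqdm(
            test_set.user_indices, desc="Ranking", disable=not verbose, miniters=100
        ):
            # One call per user: the model scores every item in the catalog.
            item_scores = model.score(user_idx)          # shape: (n_items,)
            # Full sort over all items; a likely hotspot for large catalogs.
            item_rank = np.argsort(item_scores)[::-1]
            gt_pos = test_set.positive_items(user_idx)   # hypothetical accessor
            for m in metrics:
                # Metric call shown schematically.
                results[m.name].append(m.compute(gt_pos=gt_pos, pd_rank=item_rank))
        return {name: np.mean(vals) for name, vals in results.items()}

Both the full scoring pass and the full argsort scale with the size of the item catalog, which is why profiling is needed to see which dominates.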

Further discussion on how we could improve performance would be great. :)

Thank you all!

darrylong added the feature (New feature/enhancement request) label on Oct 12, 2023
@tqtg (Member) commented Oct 12, 2023

One way to speed up evaluation is to do sampling for negative items. This usually makes sense when the set of items is huge and we can't afford to rank all of them. The slowness is probably caused by the argsort function:

# rank items based on their scores

The sketch idea is to let the user define how many items they want to use as negatives (e.g., 100); we then do the sampling somewhere around here:

# filter items being considered for evaluation

After that, there are two ways to continue:

  1. let the model score all items as usual; we then only compare the true positives and the sampled negatives while computing metrics.
    • pros: utilizes matrix ops, which are quite efficient; fewer tedious changes
    • cons: still computes scores for some items that will not be used
  2. ask the model to score only the true positives and the sampled negatives, and put dummy scores for the items not being used while computing metrics.
    • pros: could be faster if the number of negative items is set to a small value
    • cons: diminishing returns when not utilizing matrix ops and the number of negative items is set to a large value

A smarter way is adaptive: analyze both and set a flag to pick one or the other for subsequent iterations (a rough sketch of option 1 follows below). Just something off the top of my head at the moment. Let's discuss if anyone has more clever ideas.
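
To make option 1 concrete, here is a rough sketch under assumed names (model, gt_pos, n_items, etc. are stand-ins, not existing Cornac code): sample a fixed number of negatives per user, let the model score all items as usual, then rank only the true positives plus the sampled negatives.

    import numpy as np

    def rank_with_sampled_negatives(model, user_idx, gt_pos, n_items,
                                    num_negatives=100, rng=None):
        # Reproducible sampling if the caller passes a seeded generator.
        rng = rng or np.random.default_rng(123)
        candidates = np.setdiff1d(np.arange(n_items), gt_pos)
        negatives = rng.choice(candidates,
                               size=min(num_negatives, len(candidates)),
                               replace=False)
        eval_items = np.concatenate([np.asarray(gt_pos), negatives])

        # Option 1: the model still scores every item (the listed "cons"),
        # but sorting is restricted to ~|gt_pos| + num_negatives items.
        all_scores = model.score(user_idx)
        eval_scores = all_scores[eval_items]
        return eval_items[np.argsort(-eval_scores)]   # ranked candidate list

Option 2 would instead call model.score(user_idx, item_idx) for only the items in eval_items, avoiding the full scoring pass at the cost of a per-item Python loop.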

@hieuddo (Member) commented Oct 13, 2023

My thought is that this is because of the score() function in the Recommender class:

def score(self, user_idx, item_idx=None):

where we only evaluate users one by one. Will doing batch user evaluation be faster for GPU-based models?

@tqtg (Member) commented Oct 13, 2023

Most of the GPU-based models only utilize the GPU during training, not evaluation. Their predictive functions tend to be dot products between user and item embeddings, which are quite efficient with NumPy. It wouldn't hurt to have, let's say, a score_batch() function to do scoring for a batch of users. However, the speedup might be negligible based on my experience. Also, to take full advantage of this, we might need to re-design the metrics to compute for a batch of users, which is not trivial in some cases.
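
For illustration, a hypothetical score_batch() for embedding-based models could be a single matrix product (this is not an existing Recommender method; user_embeddings and item_embeddings are assumed model attributes):

    import numpy as np

    def score_batch(user_embeddings, item_embeddings, user_indices):
        # Score a batch of users against all items with one dot product.
        U = user_embeddings[user_indices]      # (batch_size, k)
        return U @ item_embeddings.T           # (batch_size, n_items)

Since NumPy already vectorizes the per-user dot product, the gain over calling score() in a loop is mostly reduced Python overhead, which matches the point about the speedup being negligible.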

@lthoang (Member) commented Oct 15, 2023

Some models with a large number of parameters cannot generate scores for a large number of items (using a batch scoring function) due to insufficient memory. This is especially true when the scoring function needs to compute a score for every pair of user and item, which leads to a scalability issue when ranking the full set of items.

I prefer adding an optional parameter to specify the number of negative samples for ranking evaluation. This has been applied in much research. Of course, we need to ensure reproducibility by specifying a random seed for whatever pseudo-random generator is used (a small seeded-sampling sketch follows the quoted option below).

2. ask the model to score only the true positives and the sampled negatives, and put dummy scores for the items not being used while computing metrics.

   * pros: could be faster if the number of negative items is set to a small value
   * cons: diminishing returns when not utilizing matrix ops and the number of negative items is set to a large value
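
On the reproducibility point, a minimal sketch of seeded negative sampling (parameter names here are hypothetical, not a proposed Cornac signature):

    import numpy as np

    def sample_negatives(n_items, gt_pos, num_negatives=100, seed=42):
        # The same seed always yields the same negatives, so reported
        # metrics stay comparable across runs.
        rng = np.random.default_rng(seed)
        candidates = np.setdiff1d(np.arange(n_items), gt_pos)
        return rng.choice(candidates,
                          size=min(num_negatives, len(candidates)),
                          replace=False)

    # Same seed => identical sample on every run.
    assert np.array_equal(sample_negatives(1000, np.array([3, 7]), seed=7),
                          sample_negatives(1000, np.array([3, 7]), seed=7))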

@tqtg (Member) commented Oct 17, 2023

@lthoang @hieuddo @darrylong it seems that we can have an option for negative sampling during evaluation. Does anyone want to take the lead on this improvement? 😃
