
A question about calculating precision@k #2

Open
caolingyu opened this issue Apr 3, 2018 · 4 comments

@caolingyu

caolingyu commented Apr 3, 2018

Hello,

I have a question about the function precision_at_k in evaluation.py. I think the denominator should be the number of 1 predictions made among the top k predictions; however, in the code, the length of the top k is used. For example, if there is only 1 true prediction in the top 5, the denominator should be 1, but in this case it would still be 5.

Here is my modification:

import numpy as np

def precision_at_k(yhat, yhat_raw, y, k):
    #num true labels in top k predictions / num 1 predictions in top k 
    sortd = np.argsort(yhat_raw)[:,::-1]
    topk = sortd[:,:k]

    #get precision at k for each example
    vals = []
    for i, tk in enumerate(topk):
        if len(tk) > 0:
            num_true_in_top_k = y[i,tk].sum()
            denom = yhat[i,tk].sum()
            if denom == 0: # in case no 1 predictions were made in the top k
                vals.append(1)
            else:
                vals.append(num_true_in_top_k / float(denom))

    return np.mean(vals)
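
For reference, here is how I would call it on a small made-up batch (the scores and labels below are just toy values for illustration):

# toy batch: 2 examples, 6 labels (made-up values)
y = np.array([[1, 0, 1, 0, 1, 0],
              [0, 1, 0, 0, 1, 0]])
yhat_raw = np.array([[0.9, 0.1, 0.2, 0.8, 0.7, 0.3],
                     [0.6, 0.4, 0.1, 0.2, 0.9, 0.5]])
yhat = (yhat_raw >= 0.5).astype(int)  # binarize the scores with a 0.5 threshold

print(precision_at_k(yhat, yhat_raw, y, 5))  # per-example values 1.0 and 2/3, mean ~0.83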

Could you take a look at it? Correct me if I am wrong.

@jamesmullenbach
Owner

For example, if there is only 1 true prediction in the top 5, the denominator should be 1, but in this case it would still be 5.

Do you mean if there's only one true positive code? Precision @ k is generally defined as the fraction of the k highest-scored labels that are in the set of ground truth labels (it is defined this way in our paper, at least). So we always want the denominator to be k, even if there are fewer than k ground truth labels.
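
In code, that definition looks roughly like this (a simplified single-example sketch with an illustrative name, not the exact evaluation.py implementation):

import numpy as np

def precision_at_k_single(yhat_raw_row, y_row, k):
    # indices of the k highest-scored labels for one example
    topk = np.argsort(yhat_raw_row)[::-1][:k]
    # fraction of those k labels that are in the ground truth set
    return y_row[topk].sum() / float(k)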

@caolingyu
Author

Thanks for your reply.
I mean that, in general, the denominator should be the number of positive predictions the model makes among the top k predictions.
For example, suppose the binary predictions for the top 5 labels are [1,0,0,1,1] and the corresponding true labels are [1,0,1,0,1]. I think the precision @ 5 in this case should be 2/3, but in the code it will be 2/5.
I also notice that in the code comment you define precision @ k as 'num true labels in top k predictions / num 1 predictions in top k', where the denominator is not k.
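
To spell out the arithmetic on that toy example (reading the numerator as the correctly predicted positives in both cases):

import numpy as np

top5_preds = np.array([1, 0, 0, 1, 1])  # binary predictions for the 5 highest-scored labels
top5_true  = np.array([1, 0, 1, 0, 1])  # ground truth for those same 5 labels

correct = (top5_preds * top5_true).sum()  # 2 labels are predicted 1 and actually 1

print(correct / float(top5_preds.sum()))  # 2/3 -- denominator = num 1 predictions in top 5
print(correct / 5.0)                      # 2/5 -- denominator = k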

@jamesmullenbach
Owner

Ah, I see what you mean now. Yes, you can define it that way. I'm not actually sure what the standard practice is in information retrieval, which is where I think this metric is most prevalent. I suspect the difference will not be major for this ICD coding task, as a trained model will usually predict more than 8 codes. For internal consistency I think I will update that comment, but keep the implementation as it is for now.

@simon19891101

The original implementation doesn't seem to be wrong. This article may help: https://medium.com/@m_n_malaeb/recall-and-precision-at-k-for-recommender-systems-618483226c54
