The rank_answer function in BLIP is different from that in ALBEF #208

littleFlyDance opened this issue Apr 18, 2024 · 0 comments

Thank you for your wonderful paper. When I read the rank_answer function in blip_vqa.py, I found that it differs from the one in ALBEF.
In BLIP, the log-likelihood of generating the entire answer sequence is computed:

```python
log_probs_sum = -output.loss
log_probs_sum = log_probs_sum.view(num_ques, k)

max_topk_ids = log_probs_sum.argmax(dim=1)
max_ids = topk_ids[max_topk_ids >= 0, max_topk_ids]
```
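
For context, here is a minimal sketch of what ranking by `-output.loss` amounts to, assuming the loss is returned per candidate as the summed token cross-entropy (e.g. with `reduction='none'`); the names and shapes below are illustrative, not taken from the repo:

```python
import torch
import torch.nn.functional as F

# Toy example: one candidate answer of 4 tokens over a vocabulary of 10.
vocab_size, answer_len = 10, 4
logits = torch.randn(answer_len, vocab_size)            # decoder logits at each answer position
targets = torch.randint(0, vocab_size, (answer_len,))   # the answer tokens

# Cross-entropy summed over the answer tokens ...
loss = F.cross_entropy(logits, targets, reduction="sum")

# ... is the negative log-likelihood of the whole answer sequence,
# so -loss is the score used for ranking.
log_likelihood = F.log_softmax(logits, dim=-1).gather(1, targets[:, None]).sum()
assert torch.allclose(-loss, log_likelihood, atol=1e-5)
```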

But in ALBEF, the log-likelihood of predicting the second token from the [CLS] token is added as well:

```python
answer_loss = output.loss
answer_loss = answer_loss.view(input_ids.size(0), -1)

# topk_prob: first token probability
topk_probs = topk_probs.view(-1, 1)
log_probs = torch.cat([topk_probs.log(), -answer_loss], dim=1)

# re-calculate log probabilities for the answer sequences using chain rule
log_probs_sum = log_probs.sum(1)
log_probs_sum = log_probs_sum.view(num_ques, k)

topk_probs = F.softmax(log_probs_sum, dim=-1)
# get top-k after re-ranking
topk_probs, rerank_id = topk_probs.topk(k, dim=1)
topk_ids = torch.gather(topk_ids, 1, rerank_id)
```
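
If I understand the comment correctly, this reconstructs the full answer log-likelihood via the chain rule: the first token's log-probability plus the log-likelihood of the remaining tokens. A toy sketch of that identity, assuming `answer_loss` only covers the tokens after the first answer token (every name and value below is made up for illustration):

```python
import torch
import torch.nn.functional as F

# Toy example: an answer of 4 tokens; position 0 is predicted from the [CLS]/start token.
vocab_size, answer_len = 10, 4
logits = torch.randn(answer_len, vocab_size)
targets = torch.randint(0, vocab_size, (answer_len,))
log_probs_per_token = F.log_softmax(logits, dim=-1).gather(1, targets[:, None]).squeeze(1)

# Full-sequence log-likelihood (what -output.loss would give if the loss covers all tokens) ...
full_ll = log_probs_per_token.sum()

# ... equals first-token log-prob + log-likelihood of the remaining tokens
# (the chain-rule sum in the ALBEF snippet), assuming the decoder loss there
# only covers the tokens after the first answer token.
first_token_ll = log_probs_per_token[0]
rest_ll = log_probs_per_token[1:].sum()
assert torch.allclose(full_ll, first_token_ll + rest_ll)
```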

Could you explain why this difference exists?
