question about the loss and grad of "mbr" #322

Open
Cescfangs opened this issue Apr 25, 2021 · 3 comments

Comments

@Cescfangs

for b in range(bs):
    nbest_hyps_id_b = [np.fromiter(y, dtype=np.int64) for y in nbest_hyps_id[b]]
    nbest_hyps_id_batch += nbest_hyps_id_b
    scores_b = np2tensor(np.array(scores[b], dtype=np.float32), eouts.device)
    probs_b_norm = torch.softmax(scaling_factor * scores_b, dim=-1)  # `[nbest]`
    wers_b = np2tensor(np.array([
        compute_wer(ref=idx2token(ys_ref[b]).split(' '),
                    hyp=idx2token(nbest_hyps_id_b[n]).split(' '))[0] / 100
        for n in range(nbest)], dtype=np.float32), eouts.device)
    exp_wer_b = (probs_b_norm * wers_b).sum()
    grad_list += [(probs_b_norm * (wers_b - exp_wer_b)).sum()]
    exp_wer += exp_wer_b
exp_wer /= bs

I don't know much about MBR, but according to these lines it looks like an mWER (expected word error rate) loss and its gradient to me.
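
For reference, a sketch of how I read it: writing \hat{P}_n for `probs_b_norm[n]` (the softmax over the scaled N-best scores) and W_n for `wers_b[n]`, the per-utterance loss and its gradient with respect to the scaled score s_n would be

L_b = \sum_{n=1}^{N} \hat{P}_n W_n, \qquad \frac{\partial L_b}{\partial s_n} = \hat{P}_n \, (W_n - L_b),

which matches the elementwise term `probs_b_norm * (wers_b - exp_wer_b)` in the snippet.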

@hirofumi0810
Owner

@Cescfangs yes

@Cescfangs
Author

> @Cescfangs yes

Thanks for the reply. I'm also curious how much improvement this mWER tuning gives, say a 5% relative WER reduction?

@Cescfangs
Author

class MBR(torch.autograd.Function):
    """Minimum Bayes Risk (MBR) training.

    Args:
        vocab (int): number of nodes in softmax layer

    """
    @staticmethod
    def forward(ctx, log_probs, hyps, exp_risk, grad_input):
        """Forward pass.

        Args:
            log_probs (FloatTensor): `[B * nbest, L, vocab]`
            hyps (LongTensor): `[B * nbest, L]`
            exp_risk (FloatTensor): `[1]` (for forward)
            grad_input (FloatTensor): `[1]` (for backward)
        Returns:
            loss (FloatTensor): `[1]`

        """
        ctx.save_for_backward(grad_input)
        return exp_risk

    @staticmethod
    def backward(ctx, grad_output):
        grads, = ctx.saved_tensors
        # grads = torch.mul(grads, grad_output)
        return grads, None, None, None

Also, I am a little confused about the "mbr" loss: the inputs are not used in the backward function, so how does the gradient flow to the model parameters?
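
For context, here is a minimal, self-contained sketch of how I understand the routing in a custom autograd.Function (dummy shapes and values, not the repository's actual training loop): backward() has to return one gradient per forward() input, in order, so the tensor returned in the first position is treated by autograd as the gradient with respect to log_probs, and from log_probs it flows back into the model parameters even though forward() never reads log_probs numerically.

import torch

# `MBR` is the autograd.Function quoted above; the shapes here are hypothetical.
B_nbest, L, vocab = 4, 7, 10
logits = torch.randn(B_nbest, L, vocab, requires_grad=True)  # stands in for the model output
log_probs = torch.log_softmax(logits, dim=-1)                # `[B * nbest, L, vocab]`
hyps = torch.randint(vocab, (B_nbest, L))                    # `[B * nbest, L]`
exp_risk = torch.tensor([0.3])                               # expected risk, computed outside the graph
# Precomputed d(exp_risk)/d(log_probs); shaped like log_probs here so it can
# serve directly as the gradient returned for the first forward() input.
grad_input = torch.randn_like(log_probs)

loss = MBR.apply(log_probs, hyps, exp_risk, grad_input)
loss.backward()            # backward() returns (grads, None, None, None)
print(logits.grad.shape)   # torch.Size([4, 7, 10]) -> the gradient reached the model side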
