question about the loss and grad of "mbr" #322

Open
Cescfangs opened this issue Apr 25, 2021 · 3 comments

Comments

@Cescfangs

for b in range(bs):
    nbest_hyps_id_b = [np.fromiter(y, dtype=np.int64) for y in nbest_hyps_id[b]]
    nbest_hyps_id_batch += nbest_hyps_id_b
    scores_b = np2tensor(np.array(scores[b], dtype=np.float32), eouts.device)
    probs_b_norm = torch.softmax(scaling_factor * scores_b, dim=-1)  # `[nbest]`
    wers_b = np2tensor(np.array([
        compute_wer(ref=idx2token(ys_ref[b]).split(' '),
                    hyp=idx2token(nbest_hyps_id_b[n]).split(' '))[0] / 100
        for n in range(nbest)], dtype=np.float32), eouts.device)
    exp_wer_b = (probs_b_norm * wers_b).sum()
    grad_list += [(probs_b_norm * (wers_b - exp_wer_b)).sum()]
    exp_wer += exp_wer_b
exp_wer /= bs

I don't know much about MBR, but according to these lines it looks like an mWER (expected word error rate) loss and its gradient to me.
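
For reference, a sketch of how I read it: writing \hat{P}_n for `probs_b_norm[n]` (the softmax over the scaled N-best scores) and W_n for `wers_b[n]`, the per-utterance loss and its gradient with respect to the scaled score s_n would be

L_b = \sum_{n=1}^{N} \hat{P}_n W_n, \qquad \frac{\partial L_b}{\partial s_n} = \hat{P}_n \, (W_n - L_b),

which matches the elementwise term `probs_b_norm * (wers_b - exp_wer_b)` in the snippet.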

@hirofumi0810
Owner

@Cescfangs yes

@Cescfangs
Author

> @Cescfangs yes

Thanks for the reply. I'm also curious how much improvement this mWER tuning gives, say a 5% relative WER reduction?

@Cescfangs
Author

class MBR(torch.autograd.Function):
    """Minimum Bayes Risk (MBR) training.

    Args:
        vocab (int): number of nodes in softmax layer

    """
    @staticmethod
    def forward(ctx, log_probs, hyps, exp_risk, grad_input):
        """Forward pass.

        Args:
            log_probs (FloatTensor): `[B * nbest, L, vocab]`
            hyps (LongTensor): `[B * nbest, L]`
            exp_risk (FloatTensor): `[1]` (for forward)
            grad_input (FloatTensor): `[1]` (for backward)
        Returns:
            loss (FloatTensor): `[1]`

        """
        ctx.save_for_backward(grad_input)
        return exp_risk

    @staticmethod
    def backward(ctx, grad_output):
        grads, = ctx.saved_tensors
        # grads = torch.mul(grads, grad_output)
        return grads, None, None, None

Also, I am a little confused about the "mbr" loss: the inputs are not used in the backward function, so how does the gradient flow to the model parameters?
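
For context, here is a minimal, self-contained sketch of how I understand the routing in a custom autograd.Function (dummy shapes and values, not the repository's actual training loop): backward() has to return one gradient per forward() input, in order, so the tensor returned in the first position is treated by autograd as the gradient with respect to log_probs, and from log_probs it flows back into the model parameters even though forward() never reads log_probs numerically.

import torch

# `MBR` is the autograd.Function quoted above; the shapes here are hypothetical.
B_nbest, L, vocab = 4, 7, 10
logits = torch.randn(B_nbest, L, vocab, requires_grad=True)  # stands in for the model output
log_probs = torch.log_softmax(logits, dim=-1)                # `[B * nbest, L, vocab]`
hyps = torch.randint(vocab, (B_nbest, L))                    # `[B * nbest, L]`
exp_risk = torch.tensor([0.3])                               # expected risk, computed outside the graph
# Precomputed d(exp_risk)/d(log_probs); shaped like log_probs here so it can
# serve directly as the gradient returned for the first forward() input.
grad_input = torch.randn_like(log_probs)

loss = MBR.apply(log_probs, hyps, exp_risk, grad_input)
loss.backward()            # backward() returns (grads, None, None, None)
print(logits.grad.shape)   # torch.Size([4, 7, 10]) -> the gradient reached the model side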
