Discrepancy between implementation and the paper #60

Open
AetherPrior opened this issue May 8, 2021 · 0 comments

Hi, I hope you are doing well. While going through your implementation of the pointer-generator, I noticed that the p_gen calculation differs from the formula given in the paper.
I would appreciate some clarification on why it was implemented this way (and whether there is any advantage in doing so).

        y_t_1_embd = self.embedding(y_t_1)
        x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))
        lstm_out, s_t = self.lstm(x.unsqueeze(1), s_t_1)

        h_decoder, c_decoder = s_t
        s_t_hat = torch.cat((h_decoder.view(-1, config.hidden_dim),
                             c_decoder.view(-1, config.hidden_dim)), 1)  # B x 2*hidden_dim
        c_t, attn_dist, coverage_next = self.attention_network(s_t_hat, encoder_outputs, encoder_feature,
                                                          enc_padding_mask, coverage)

        if self.training or step > 0:
            coverage = coverage_next

        p_gen = None
        if config.pointer_gen:
            p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
            p_gen = self.p_gen_linear(p_gen_input)
            p_gen = F.sigmoid(p_gen)

From what I understand, p_gen should take the context vector c_t, the state s_t_hat, and the decoder input y_t_1 separately, but here you pass the concatenated input x instead.
I am attaching a screenshot from the original paper as a reference.
[screenshot from the paper: p_gen = sigmoid(w_h*^T h_t* + w_s^T s_t + w_x^T x_t + b_ptr)]
From what I can see, the paper passes the decoder input x_t directly into the sigmoid, rather than concatenating the context vector with it.
In this line, however,

x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))

the previous context vector c_t_1 is concatenated with the input embedding before the result is fed into the sigmoid:

            p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
            p_gen = self.p_gen_linear(p_gen_input)
            p_gen = F.sigmoid(p_gen)
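
For comparison, here is a minimal sketch of what a literal translation of the paper's formula might look like, reusing the names from the snippet above (with y_t_1_embd standing in for the paper's x_t). This is only an illustration of my reading of the equation, not a suggested patch:

        # Sketch only: feed the raw decoder-input embedding (the paper's x_t)
        # into p_gen, instead of x = x_context([c_t_1; y_t_1_embd]).
        if config.pointer_gen:
            # c_t: B x 2*hidden_dim, s_t_hat: B x 2*hidden_dim, y_t_1_embd: B x emb_dim
            p_gen_input = torch.cat((c_t, s_t_hat, y_t_1_embd), 1)
            p_gen = torch.sigmoid(self.p_gen_linear(p_gen_input))

If x_context maps its input down to emb_dim (as it appears to in the repo), the input size of p_gen_linear would stay B x (2*2*hidden_dim + emb_dim) either way, so the difference is only in what information p_gen sees, not in the tensor shapes.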

Thank you!
