Init hidden state for the 2nd sentence onward #15

Open
smutahoang opened this issue Jan 25, 2019 · 2 comments

Comments

@smutahoang

Hi,

Thanks for sharing your implementation. This helps me a lot.

I have a question about how you initialize the hidden state from the second sentence onward. Specifically, in the "def train_data(mini_batch, targets, word_attn_model, sent_attn_model, word_optimizer, sent_optimizer, criterion):" function (in the "attention_model_validation_experiments" notebook), you loop over the sentences with "_s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)". This means that both the forward and the backward states of the last word in sentence i are used to initialize the forward and backward states of sentence i+1. I can understand this for the forward state, since the two sentences are consecutive, but initializing the backward state this way does not seem reasonable to me.

Can you please explain this in more detail? Thanks.
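To make the question concrete, here is a minimal sketch of the carry-over pattern I mean. It assumes PyTorch and uses a plain bidirectional `nn.GRU` as a stand-in for `word_attn_model` (the real model also applies attention and returns three values); all sizes and the random `mini_batch` are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, for illustration only.
num_sentences, batch_size, num_words, embed_dim, hidden_dim = 3, 2, 5, 8, 4

# Stand-in for the word-level encoder (assumption: one-layer bidirectional GRU).
word_gru = nn.GRU(embed_dim, hidden_dim, bidirectional=True)

# Stand-in for an already embedded mini-batch: (sentence, word, batch, embedding).
mini_batch = torch.randn(num_sentences, num_words, batch_size, embed_dim)

# Initial state: index 0 is the forward direction, index 1 the backward direction.
state_word = torch.zeros(2, batch_size, hidden_dim)

for i in range(num_sentences):
    # Both halves of the returned state are fed into the next sentence.
    _, state_word = word_gru(mini_batch[i], state_word)

# After the loop, both directions hold state inherited across sentences.
carried_forward, carried_backward = state_word[0], state_word[1]
```

In this sketch the backward direction of sentence i+1 starts from whatever the backward direction of sentence i ended with, which is the part I am unsure about.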

@Sandeep42
Contributor

Hi,

I'm sorry, but I don't understand what you are asking. Both the forward and the backward states have to be initialised to start the training process.

Please get back to me with a bit more detail so that I can help you out.

Thanks.

@smutahoang
Author

Let's use last_h_S = (last_h_forward, last_h_backward) to denote the hidden states of the last word in sentence S, and init_h_[S+1] to denote the initial hidden states of sentence S + 1.

From the code, I understand that you assign init_h_[S+1] = last_h_S = (last_h_forward, last_h_backward) (am I right?). Wouldn't it be more reasonable to set init_h_[S+1] = (last_h_forward, 0)?
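A rough sketch of what I have in mind, assuming PyTorch and a one-layer bidirectional RNN whose state has shape (2, batch, hidden) with index 0 for the forward direction and index 1 for the backward direction. The helper name `keep_forward_only` is hypothetical and not part of the repository.

```python
import torch


def keep_forward_only(state: torch.Tensor) -> torch.Tensor:
    """Return a new (2, batch, hidden) state whose backward half is all zeros.

    Assumes a one-layer bidirectional RNN state: index 0 = forward, index 1 = backward.
    The input tensor is left unchanged.
    """
    forward_half = state[0]
    backward_half = torch.zeros_like(state[1])
    return torch.stack([forward_half, backward_half], dim=0)


# Hypothetical example state: batch of 2, hidden size 4.
last_h = torch.ones(2, 2, 4)
init_h_next = keep_forward_only(last_h)
```

In the training loop this would be applied to `state_word` before it is passed to the next sentence, so that only the forward direction carries information across the sentence boundary. This is only one possible choice; resetting both halves for every sentence would be another.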
