LSTM forget gate bias initialization #750
Some papers suggest setting the forget gate bias of LSTMs to a specific value. For example:
http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf
Is it possible to do this using the current implementation of LSTM/LSTMCell?
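A minimal sketch of one way to do this with the stock `nn.LSTM`, assuming the standard PyTorch gate layout in which each bias vector concatenates the four gates as (input | forget | cell | output), each of length `hidden_size`. Since `bias_ih` and `bias_hh` are summed inside the cell (see the answer below), filling each with 0.5 gives an effective forget bias of 1:

```python
import torch
import torch.nn as nn

hidden_size = 32
lstm = nn.LSTM(input_size=16, hidden_size=hidden_size)

# The forget-gate slice of each bias vector is [hidden_size : 2*hidden_size].
for name, param in lstm.named_parameters():
    if name.startswith("bias"):  # covers both bias_ih_l0 and bias_hh_l0
        with torch.no_grad():
            # bias_ih and bias_hh are added together, so 0.5 in each
            # yields an effective forget gate bias of 1.
            param[hidden_size:2 * hidden_size].fill_(0.5)
```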
Comments

Yes, the ordering of weights and biases is the same for all implementations and is: input gate, forget gate, cell gate, output gate.
What is the difference between "bias_ih" and "bias_hh" in the LSTM and GRU cells? Should both be initialized with values between 1/4 and 1/2?
One of them is added to the linear transform of the input, the other to the linear transform of the hidden state. It's redundant: there could be only one bias and the model would be equivalent. However, that's what cuDNN does, so we preferred to keep it like that for consistency.
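To make the redundancy concrete, here is a small check (not from the thread): shifting an arbitrary amount from `bias_hh_l0` into `bias_ih_l0` leaves the LSTM's output unchanged, because the cell only ever uses their sum.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 3, 16)  # (seq_len, batch, input_size)

a = nn.LSTM(16, 32)
b = nn.LSTM(16, 32)
b.load_state_dict(a.state_dict())  # identical copies

with torch.no_grad():
    # Move an arbitrary shift between the two bias terms;
    # their sum, and hence the gate pre-activations, is unchanged.
    shift = torch.randn_like(a.bias_ih_l0)
    b.bias_ih_l0 += shift
    b.bias_hh_l0 -= shift

out_a, _ = a(x)
out_b, _ = b(x)
print(torch.allclose(out_a, out_b, atol=1e-5))  # True, up to float error
```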
Referenced commit: "forget gate bias initialization. Now set the bias for both the hidden state input and the memory state input. It is still not clear what the best value should be: 0.5, 1, 2, something else? See: http://proceedings.mlr.press/v37/jozefowicz15.pdf, pytorch/pytorch#750. Modified: modules/multi_dimensional_lstm.py, modules/multi_dimensional_lstm_parameters.py, modules/train_multi_dimensional_rnn.py"