
question: seq2seq with attention tutorial, optimizing encoder and decoder separately? #113

pucktada opened this issue Sep 15, 2018 · 1 comment


@pucktada

regarding this tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

I just have a question (which probably sounds very stupid): is it necessary to optimize the parameters of the encoder and the decoder separately here?

encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
...
loss.backward()
encoder_optimizer.step()
decoder_optimizer.step()

So decoder.parameters() doesn't include encoder.parameters()? I can't just call decoder_optimizer.step() on its own? The loss is backpropagated all the way through, but the encoder's parameters wouldn't get updated?
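To make it concrete, this is roughly what I had in mind instead (just a sketch of my own, not something from the tutorial, assuming encoder, decoder, learning_rate and loss are the same objects as in the snippet above):

import itertools
import torch.optim as optim

# One optimizer that owns both modules' parameters, so a single step()
# would update everything (my own sketch, not from the tutorial).
optimizer = optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()),
    lr=learning_rate,
)
...
loss.backward()
optimizer.step()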

thanks

@kwasnydam

Though this is outdated already: to the best of my understanding, in PyTorch a parameter is registered to a Module when it is assigned to self in the module's constructor. The encoder and the decoder therefore hold references only to the parameters of the submodules registered to them. In other words, encoder.parameters() returns only the weights and biases contained within the encoder object, and likewise for the decoder. With such a design, the backward call computes all the gradients, but

decoder_optimizer.step() 

would only update the weights of the decoder object's layers.
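A quick way to convince yourself of that (a toy sketch; the Encoder/Decoder classes and their layer sizes here are stand-ins I made up, not the tutorial's models):

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        # registered as parameters because they are assigned to self
        self.embedding = nn.Embedding(10, 4)
        self.gru = nn.GRU(4, 4)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(10, 4)
        self.gru = nn.GRU(4, 4)
        self.out = nn.Linear(4, 10)

encoder, decoder = Encoder(), Decoder()

# Each .parameters() call yields only that module's own registered tensors,
# so the two sets are disjoint.
enc_ids = {id(p) for p in encoder.parameters()}
dec_ids = {id(p) for p in decoder.parameters()}
print(enc_ids.isdisjoint(dec_ids))  # True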

I have, however, another question on a similar topic. My approach to passing the parameters of a multi-module architecture to a single PyTorch optimizer object is as follows:

optimizer = torch.optim.Adam([
    {'params': text_encoder.parameters()},
    {'params': speech_encoder.parameters()},
    {'params': decoder.parameters()},
])

Is there any particular reason for assigning separate optimizer objects in such a scenario, apart from the fact that it enables us to configure optimizers differently for each module? Is there some mistake with my approach?
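In case it helps clarify what I mean, here is a sketch of how I would use that single optimizer with per-group settings (the learning-rate values are arbitrary placeholders, and text_encoder, speech_encoder, decoder and loss are assumed to exist as in the snippet above):

import torch

# Hypothetical per-group learning rates: a single optimizer can still treat
# each sub-module differently through its parameter groups.
optimizer = torch.optim.Adam(
    [
        {'params': text_encoder.parameters(), 'lr': 1e-4},
        {'params': speech_encoder.parameters(), 'lr': 1e-4},
        {'params': decoder.parameters(), 'lr': 1e-3},
    ],
    lr=1e-3,  # default for any group that does not set its own 'lr'
)

optimizer.zero_grad()
loss.backward()
optimizer.step()  # one call updates all three modules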
