
question: seq2seq with attention tutorial, optimizing encoder and decoder separately? #113

pucktada opened this issue Sep 15, 2018 · 1 comment


@pucktada

regarding this tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

I just have a question (which probably sounds very stupid): is it necessary to optimize the parameters of the encoder and the decoder separately here?

encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
...
loss.backward()
encoder_optimizer.step()
decoder_optimizer.step()

So decoder.parameters() doesn't include encoder.parameters()? I can't just call decoder_optimizer.step() on its own? The loss is backpropagated all the way through, but the encoder's parameters wouldn't get updated?
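To make it concrete, this is roughly what I had in mind instead (just a sketch of my own, not something from the tutorial, assuming encoder, decoder, learning_rate and loss are the same objects as in the snippet above):

import itertools
import torch.optim as optim

# One optimizer that owns both modules' parameters, so a single step()
# would update everything (my own sketch, not from the tutorial).
optimizer = optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()),
    lr=learning_rate,
)
...
loss.backward()
optimizer.step()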

thanks

@kwasnydam

Though this is outdated already: to the best of my understanding, in PyTorch a parameter is registered to a Module when it is assigned to self in the module's constructor. The encoder and the decoder therefore hold references only to the parameters of the submodules registered to them. In other words, encoder.parameters() returns only the weights and biases contained within the encoder object, and likewise for the decoder. With such a design, the backward call computes all the gradients, but

decoder_optimizer.step() 

would only update the weights of the decoder object's layers.
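A quick way to convince yourself of that (a toy sketch; the Encoder/Decoder classes and their layer sizes here are stand-ins I made up, not the tutorial's models):

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        # registered as parameters because they are assigned to self
        self.embedding = nn.Embedding(10, 4)
        self.gru = nn.GRU(4, 4)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(10, 4)
        self.gru = nn.GRU(4, 4)
        self.out = nn.Linear(4, 10)

encoder, decoder = Encoder(), Decoder()

# Each .parameters() call yields only that module's own registered tensors,
# so the two sets are disjoint.
enc_ids = {id(p) for p in encoder.parameters()}
dec_ids = {id(p) for p in decoder.parameters()}
print(enc_ids.isdisjoint(dec_ids))  # True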

I have, however, another question on a similar topic. My approach to passing the parameters of a multi-module architecture to a single PyTorch optimizer object is as follows:

optimizer = torch.optim.Adam([
    {'params': text_encoder.parameters()},
    {'params': speech_encoder.parameters()},
    {'params': decoder.parameters()},
])

Is there any particular reason for assigning separate optimizer objects in such a scenario, apart from the fact that it enables us to configure optimizers differently for each module? Is there some mistake with my approach?
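In case it helps clarify what I mean, here is a sketch of how I would use that single optimizer with per-group settings (the learning-rate values are arbitrary placeholders, and text_encoder, speech_encoder, decoder and loss are assumed to exist as in the snippet above):

import torch

# Hypothetical per-group learning rates: a single optimizer can still treat
# each sub-module differently through its parameter groups.
optimizer = torch.optim.Adam(
    [
        {'params': text_encoder.parameters(), 'lr': 1e-4},
        {'params': speech_encoder.parameters(), 'lr': 1e-4},
        {'params': decoder.parameters(), 'lr': 1e-3},
    ],
    lr=1e-3,  # default for any group that does not set its own 'lr'
)

optimizer.zero_grad()
loss.backward()
optimizer.step()  # one call updates all three modules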
