Are local gradients accumulated and never reset? #21
Comments
I just looked up `load_state_dict` in the PyTorch documentation. To me, it sounds like it loads a copy of the global parameters, meaning that the gradients will be added to the previous global gradients.
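A minimal sketch of that reading (the two `nn.Linear` nets here are only illustrative stand-ins for `lnet`/`gnet`): pulling the global weights with `load_state_dict` overwrites the local parameter values but leaves the local `.grad` buffers untouched.

```python
import torch
import torch.nn as nn

lnet = nn.Linear(4, 2)   # stand-in for the local net
gnet = nn.Linear(4, 2)   # stand-in for the global net

# Give the local net some gradients from a fake rollout.
lnet(torch.randn(8, 4)).sum().backward()
grad_before = lnet.weight.grad.clone()

# Pull the global parameters into the local net.
lnet.load_state_dict(gnet.state_dict())

print(torch.equal(lnet.weight, gnet.weight))        # True: values were copied over
print(torch.equal(lnet.weight.grad, grad_before))   # True: .grad was not reset
```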
I have a similar question about the gradient. Actually, consider the cases after the 1st iteration:

```python
# copy from continuous A3C, consider the cases after the 1st iteration
opt.zero_grad()     # zero gradient in both lnet and gnet
loss.backward()     # parameters in both lnet and gnet have the same gradients
for lp, gp in zip(lnet.parameters(), gnet.parameters()):  # the for loop is useless
    # if gp.grad is not None:
    #     return  # this "if-return" code is copied from the above link
    gp._grad = lp.grad
opt.step()          # update gnet parameters (parameters in lnet will not change!)
lnet.load_state_dict(gnet.state_dict())  # update lnet parameters
```

It is confusing to me and it might be a (serious) bug. What if worker A is updating gnet by `opt.step()` while worker B clears/modifies the gradients with `opt.zero_grad()`/`loss.backward()`? However, the code just works (look at the episode reward curve and the visualization)! BTW, the
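A minimal sketch of the shared-gradient behaviour described above, assuming in-place zeroing (`set_to_none=False`, which matches the older default behaviour): after the first `gp._grad = lp.grad`, the global and local `.grad` buffers are the same tensor objects, so zeroing the grads held by the global optimizer also resets the local ones. The nets and names are only stand-ins for the snippet's `lnet`/`gnet`/`opt`.

```python
import torch
import torch.nn as nn

lnet, gnet = nn.Linear(3, 1), nn.Linear(3, 1)
opt = torch.optim.Adam(gnet.parameters())   # optimizer holds the *global* parameters

# 1st iteration: backward on the local net, then hand its grad tensors to gnet.
lnet(torch.randn(5, 3)).sum().backward()
for lp, gp in zip(lnet.parameters(), gnet.parameters()):
    gp._grad = lp.grad                      # gnet's .grad is now the SAME tensor as lnet's
opt.step()

# 2nd iteration onwards: zeroing the global grads in place also zeroes the local
# ones, because they are the same tensor objects.
opt.zero_grad(set_to_none=False)
print(all((lp.grad == 0).all() for lp in lnet.parameters()))   # True
```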
The `load_state_dict` is defined as:

```python
def load_state_dict(self, state_dict):
    # deepcopy, to be consistent with module API
    state_dict = deepcopy(state_dict)
    # Validate the state_dict
    groups = self.param_groups
    saved_groups = state_dict['param_groups']
```

It uses `deepcopy` to isolate the loaded parameters from the source state dict. Once the local worker has moved to another worker, the
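Whatever the exact copy mechanism, the practical effect of the pull step can be sketched like this (stand-in nets again): after `lnet.load_state_dict(gnet.state_dict())` the local tensors do not share storage with the global ones, so later updates to `gnet` do not leak into `lnet` until the next pull.

```python
import torch
import torch.nn as nn

lnet, gnet = nn.Linear(3, 1), nn.Linear(3, 1)
lnet.load_state_dict(gnet.state_dict())       # pull the global weights

with torch.no_grad():
    gnet.weight.add_(1.0)                     # global net moves on (e.g. another worker pushed)

print(torch.equal(lnet.weight, gnet.weight))  # False: the local copy is isolated
```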
Thanks for your reply! @MorvanZhou
A lock could be applied in this case, but take a look at HOGWILD! for an analysis of backprop without locking.
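For illustration, one place such a lock could sit (the `push_and_pull_locked` helper and its signature are hypothetical, not the repo's actual `utils.py`): serialising the whole push-to-global section so that two workers cannot interleave `zero_grad`/`backward`/`step` on `gnet`. HOGWILD!-style training simply omits the lock and tolerates the races.

```python
import torch.multiprocessing as mp

# The lock would be created once in the parent process and handed to every worker.
lock = mp.Lock()

def push_and_pull_locked(opt, lnet, gnet, loss, lock):
    # Hypothetical locked variant of the update shown earlier in this thread.
    with lock:
        opt.zero_grad()
        loss.backward()
        for lp, gp in zip(lnet.parameters(), gnet.parameters()):
            gp._grad = lp.grad
        opt.step()
        lnet.load_state_dict(gnet.state_dict())
```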
I can't see that the local gradients are ever reset. The values are overwritten by the global weights, but the optimizer `opt` is assigned to the global parameters, so won't this accumulate gradients in the local network?

pytorch-A3C/utils.py
Line 41 in 5ab27ab
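A minimal sketch of the concern (stand-in nets): the shared optimizer is built on the *global* parameters, so `opt.zero_grad()` never touches the local net's `.grad` buffers, and a later `backward()` would accumulate on top of them. As the comments above note, it is the `gp._grad = lp.grad` sharing on the push step (with in-place zeroing) that ends up resetting the local gradients in practice.

```python
import torch
import torch.nn as nn

lnet, gnet = nn.Linear(3, 1), nn.Linear(3, 1)
opt = torch.optim.Adam(gnet.parameters())   # opt is assigned to the global parameters

lnet(torch.randn(2, 3)).sum().backward()    # local gradients from a rollout
opt.zero_grad()                             # only affects gnet's parameters

print(lnet.weight.grad is not None)         # True: the local gradient is still there
lnet(torch.randn(2, 3)).sum().backward()    # a second backward now accumulates on top of it
```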