
some question about the position of 'optimizer.zero_grad()' #238

Open
languandong opened this issue Dec 1, 2021 · 4 comments

@languandong

I think the correct way to code the training loop is this:

    optimizer.zero_grad()
    # Forward pass
    outputs = model(images)
    loss = criterion(outputs, labels)
    
    # Backward and optimize
    loss.backward()
    optimizer.step()

not this:

    # Forward pass
    outputs = model(images)
    loss = criterion(outputs, labels)
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
@Vandaci

Vandaci commented Jul 15, 2022

any difference?

@silky1708

@languandong
You can use either; it doesn't matter as long as optimizer.zero_grad() is called before loss.backward().
Note that optimizer.zero_grad() zeroes out the gradients stored in the tensors' .grad field, while loss.backward() computes the gradients and accumulates them into that same .grad field.
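
To make that interplay concrete, here is a minimal sketch (a hypothetical one-parameter example, not code from this repo) showing that loss.backward() accumulates into .grad and that optimizer.zero_grad() is what resets it:

    import torch

    # Hypothetical toy parameter: backward() *accumulates* into .grad,
    # and zero_grad() is what clears it between iterations.
    w = torch.tensor([1.0], requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=0.1)

    (2 * w).sum().backward()
    print(w.grad)          # tensor([2.])

    (2 * w).sum().backward()
    print(w.grad)          # tensor([4.]) -- accumulated, not overwritten

    optimizer.zero_grad()  # clears .grad (sets it to zero or None)
    (2 * w).sum().backward()
    print(w.grad)          # tensor([2.]) again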

@githraj

githraj commented Oct 31, 2023

As pointed out by @languandong, the critical factor is the correct sequence in which optimizer.zero_grad() and loss.backward() are called. Both code snippets are valid as long as optimizer.zero_grad() is invoked before loss.backward(). This ensures that the gradients are properly zeroed out and then computed and stored in the appropriate tensors' grad field.
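
For contrast, a minimal sketch (hypothetical tiny model, plain SGD) of the one ordering that actually breaks training: zeroing between loss.backward() and optimizer.step() discards the freshly computed gradients, so the step updates nothing.

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    images, labels = torch.randn(8, 4), torch.randint(0, 2, (8,))

    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.zero_grad()   # wrong place: erases the gradients just computed
    optimizer.step()        # steps with empty gradients, so the weights do not change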

@luyuwuli

@languandong I think the confusion originates from the misconception that gradients are computed and stored during the forward pass. In fact, the forward pass only constructs the autograd DAG. Gradients are computed lazily: nothing is written to .grad until loss.backward() is explicitly invoked.
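
A quick way to see this laziness (hypothetical tiny model, not from the repo): .grad is still None after the forward pass and is only populated once backward() runs.

    import torch
    import torch.nn as nn

    model = nn.Linear(3, 1)
    x = torch.randn(5, 3)

    loss = model(x).sum()           # forward pass: autograd graph is recorded, no gradients yet
    print(model.weight.grad)        # None

    loss.backward()                 # gradients are computed and stored here
    print(model.weight.grad.shape)  # torch.Size([1, 3])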
