
Modifies loss storage in TD3 and Soft Actor Critic #195

Open · wants to merge 8 commits into base: master

Conversation

@prabhatnagarajan (Contributor) commented Apr 13, 2024

Note that the changes here follow what we do in DQN and in DDPG.

For example:

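(The original diff is not shown on this page; the following is a hypothetical sketch of the recording pattern under discussion, with illustrative names, assuming the `float(loss.detach().cpu().numpy())` style used in the DQN/DDPG agents.)

```python
import torch

# Hypothetical loss record, mirroring the style described for DQN/DDPG
# (names are illustrative, not the PR's actual diff).
loss_record = []

loss = torch.nn.functional.mse_loss(torch.randn(4), torch.randn(4))

# Move the scalar loss to CPU as a 0-d ndarray, then cast to a Python float.
loss_record.append(float(loss.detach().cpu().numpy()))

assert isinstance(loss_record[-1], float)
```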
github-actions bot requested a review from muupan on April 13, 2024 00:00
@prabhatnagarajan (Contributor, Author)

Friendly reminder on this :)

@muupan (Member) commented Apr 29, 2024

The changes will probably not break anything, but can you elaborate on why you suggest them? If it is just for consistency, I think it is better to force recorded losses to be floats, not scalar ndarrays, for simplicity.

@prabhatnagarajan (Contributor, Author) commented Apr 29, 2024

To justify it fully I'd need more data; I did it mainly for consistency. That said, it may require less memory in some instances (some informal experiments showed slight improvements in memory usage, though I can't be sure without a more complete experiment).

> If it is just for consistency, I think it is better to force recorded losses to be floats, not scalar ndarrays, for simplicity.

To be clear, we do indeed force them to be floats, right? We simply do loss.detach().cpu().numpy() before casting them to floats.

Do you remember why we did detach().cpu().numpy() originally?

@muupan (Member) commented Apr 29, 2024

> To be clear, we do indeed force them to be floats, right? We simply do loss.detach().cpu().numpy() before casting them to floats.

Ah, sorry, I missed the float around it. Right, both eventually cast losses to floats, so they should function in the same way. Directly casting to float could be more efficient, as it could skip creating a numpy.ndarray, but I don't know the implementation details of torch.Tensor.__float__.
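The two casting routes being compared can be checked side by side (a minimal standalone sketch, independent of the PR's code):

```python
import torch

loss = torch.tensor(0.25)

# Route 1: the existing style, via an intermediate scalar numpy.ndarray.
via_numpy = float(loss.detach().cpu().numpy())

# Route 2: direct cast, which goes through torch.Tensor.__float__.
direct = float(loss)

# Both yield the same plain Python float; the numpy route just allocates
# an extra 0-d ndarray along the way.
assert via_numpy == direct == 0.25
assert type(direct) is float
```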

> Do you remember why we did detach().cpu().numpy() originally?

IIRC I was just unaware that it was possible to directly cast a torch.Tensor to float. After I learned it is safe to cast (pytorch/pytorch#1129), I switched.

I think it is better to choose the simpler way unless there is a memory leak issue with it. There seems to be yet another way of doing this, Tensor.item(), which is perhaps now recommended by pytorch:
https://pytorch.org/docs/stable/tensors.html#initializing-and-basic-operations

> Use torch.Tensor.item() to get a Python number from a tensor containing a single value:
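For instance (a short sketch of the documented behavior):

```python
import torch

# .item() works on any single-element tensor, regardless of shape,
# and returns a plain Python number (float for floating-point dtypes).
x = torch.tensor([[1.0]])
value = x.item()

assert value == 1.0
assert type(value) is float
```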

@prabhatnagarajan (Contributor, Author) commented Apr 29, 2024

Oh, thanks!

Yes, tensor.item() seems like exactly what we want. The documentation also states: "This operation is not differentiable."

This tells me that it implicitly detaches from the computation graph (which is apparently what we do when we call detach()).

Perhaps I can replace instances of detach().cpu().numpy() with tensor.item(), then check the tests and maybe some runs?
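A quick check of that reading (a standalone sketch; the agent code itself is not shown here):

```python
import torch

# A scalar loss that participates in the autograd graph.
loss = (torch.ones(3, requires_grad=True) ** 2).sum()

# .item() extracts the value without an explicit detach(); the result is a
# plain Python float and lives outside the graph, so no gradient flows
# through it -- matching the docs' "This operation is not differentiable."
value = loss.item()

assert value == 3.0
assert type(value) is float
```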

@muupan (Member) commented Apr 29, 2024

> Perhaps I can replace instances of detach().cpu().numpy() with tensor.item()? And check the tests and maybe some runs and so forth?

Sounds good. I think it is ok to write self.q_func1_loss_record.append(loss1.item()), as torch.zeros(1, dtype=torch.float32).item() is already a float.
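That claim is easy to verify directly (illustrative names, mirroring the comment above rather than the repository's actual code):

```python
import torch

# .item() on a float32 scalar tensor already yields a Python float,
# so no explicit float() cast is needed when appending to a loss record.
q_func1_loss_record = []  # illustrative stand-in for self.q_func1_loss_record
loss1 = torch.zeros(1, dtype=torch.float32)

q_func1_loss_record.append(loss1.item())

assert type(q_func1_loss_record[-1]) is float
assert q_func1_loss_record[-1] == 0.0
```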
