
Modifies loss storage in TD3 and Soft Actor Critic #195

Open · wants to merge 8 commits into base: master

Conversation

@prabhatnagarajan (Contributor) commented Apr 13, 2024

Note that the changes here follow what we do in DQN and in DDPG.

For example:

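(The original diff is not shown on this page; the following is a hypothetical sketch of the recording pattern under discussion, with illustrative names, assuming the `float(loss.detach().cpu().numpy())` style used in the DQN/DDPG agents.)

```python
import torch

# Hypothetical loss record, mirroring the style described for DQN/DDPG
# (names are illustrative, not the PR's actual diff).
loss_record = []

loss = torch.nn.functional.mse_loss(torch.randn(4), torch.randn(4))

# Move the scalar loss to CPU as a 0-d ndarray, then cast to a Python float.
loss_record.append(float(loss.detach().cpu().numpy()))

assert isinstance(loss_record[-1], float)
```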
github-actions bot requested a review from muupan on April 13, 2024 00:00
@prabhatnagarajan (Contributor, Author)

Friendly reminder on this :)

@muupan (Member) commented Apr 29, 2024

The changes will probably not break anything, but can you elaborate on why you suggest them? If it is just for consistency, I think it is better to force recorded losses to be floats, not scalar ndarrays, for simplicity.

@prabhatnagarajan (Contributor, Author) commented Apr 29, 2024

To justify it fully I'd need more data; I did it mainly for consistency. That said, it may require less memory in some instances (some informal experiments showed slight improvements in memory usage, though I can't be sure without a more complete experiment).

> If it is just for consistency, I think it is better to force recorded losses to be floats, not scalar ndarrays, for simplicity.

To be clear, we do indeed force them to be floats, right? We simply do loss.detach().cpu().numpy() before casting them to floats.

Do you remember why we did detach().cpu().numpy() originally?

@muupan (Member) commented Apr 29, 2024

> To be clear, we do indeed force them to be floats, right? We simply do loss.detach().cpu().numpy() before casting them to floats.

Ah, sorry, I missed the float around it. Right, both eventually cast losses to floats, so they should function in the same way. Directly casting to float could be more efficient, as it could skip creating a numpy.ndarray, but I don't know the implementation details of torch.Tensor.__float__.
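The two casting routes being compared can be checked side by side (a minimal standalone sketch, independent of the PR's code):

```python
import torch

loss = torch.tensor(0.25)

# Route 1: the existing style, via an intermediate scalar numpy.ndarray.
via_numpy = float(loss.detach().cpu().numpy())

# Route 2: direct cast, which goes through torch.Tensor.__float__.
direct = float(loss)

# Both yield the same plain Python float; the numpy route just allocates
# an extra 0-d ndarray along the way.
assert via_numpy == direct == 0.25
assert type(direct) is float
```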

> Do you remember why we did detach().cpu().numpy() originally?

IIRC I was just unaware that it was possible to directly cast a torch.Tensor to float. After I learned it is safe to cast (pytorch/pytorch#1129), I switched.

I think it is better to choose the simpler way unless there is a memory leak issue with it. There seems to be yet another way of doing this, Tensor.item(), which is perhaps now recommended by pytorch:
https://pytorch.org/docs/stable/tensors.html#initializing-and-basic-operations

> Use torch.Tensor.item() to get a Python number from a tensor containing a single value:
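For instance (a short sketch of the documented behavior):

```python
import torch

# .item() works on any single-element tensor, regardless of shape,
# and returns a plain Python number (float for floating-point dtypes).
x = torch.tensor([[1.0]])
value = x.item()

assert value == 1.0
assert type(value) is float
```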

@prabhatnagarajan (Contributor, Author) commented Apr 29, 2024

Oh, thanks!

Yes, tensor.item() seems like exactly what we want. The documentation also states: "This operation is not differentiable."

This tells me that it implicitly detaches from the computation graph (which is apparently what we do when we call detach()).

Perhaps I can replace instances of detach().cpu().numpy() with tensor.item(), then check the tests and maybe some runs?
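A quick check of that reading (a standalone sketch; the agent code itself is not shown here):

```python
import torch

# A scalar loss that participates in the autograd graph.
loss = (torch.ones(3, requires_grad=True) ** 2).sum()

# .item() extracts the value without an explicit detach(); the result is a
# plain Python float and lives outside the graph, so no gradient flows
# through it -- matching the docs' "This operation is not differentiable."
value = loss.item()

assert value == 3.0
assert type(value) is float
```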

@muupan (Member) commented Apr 29, 2024

> Perhaps I can replace instances of detach().cpu().numpy() with tensor.item()? And check the tests and maybe some runs and so forth?

Sounds good. I think it is ok to write self.q_func1_loss_record.append(loss1.item()), as torch.zeros(1, dtype=torch.float32).item() is already a float.
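That claim is easy to verify directly (illustrative names, mirroring the comment above rather than the repository's actual code):

```python
import torch

# .item() on a float32 scalar tensor already yields a Python float,
# so no explicit float() cast is needed when appending to a loss record.
q_func1_loss_record = []  # illustrative stand-in for self.q_func1_loss_record
loss1 = torch.zeros(1, dtype=torch.float32)

q_func1_loss_record.append(loss1.item())

assert type(q_func1_loss_record[-1]) is float
assert q_func1_loss_record[-1] == 0.0
```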
