Rework garage.torch.optimizers #2177

Open: wants to merge 6 commits into base: master

Conversation

krzentner (Contributor)

No description provided.

@krzentner krzentner requested a review from a team as a code owner November 16, 2020 18:34
@krzentner krzentner requested review from ahtsan, irisliucy and ryanjulian and removed request for a team and ahtsan November 16, 2020 18:34
@krzentner (Contributor, Author)

This change does not yet pass tests, but is 90% complete.

@mergify mergify bot requested review from a team and zequnyu and removed request for a team November 16, 2020 18:34
@ryanjulian (Member)

Can you add a little bit more explanation for the design here? I'm concerned about using an ADT as the blanket input to policies, which makes the interface pretty complicated even in the simplest use cases.

@krzentner (Contributor, Author) commented Nov 18, 2020

The core motivation here is to provide a way for recurrent and non-recurrent policies to share the same API at optimization time.
However, I agree: making this change has shown me that it significantly increases the complexity of the garage.torch APIs. It doesn't add much complexity to any individual algorithm (except TutorialVPG), but it is noticeable.
In the future, I also intend this datatype to play a role similar to state_info_spec in the TF branch (although with a very different design).

This PR only adds the bare minimum fields needed for recurrent policies to have reasonable .forward methods. However, we could replace the observation field on PolicyInput by having PolicyInput inherit from torch.Tensor instead.
Then, algorithms that only want to train stochastic non-recurrent policies (e.g. SAC) could just pass a torch.Tensor (as they do now). Alternatively, we could use a helper function at the start of every torch policy's .forward method to convert any torch.Tensor input into a PolicyInput (in SHUFFLED mode).
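
To make the second option concrete, here is a minimal sketch of such a helper. Only PolicyInput, its observations field, and the SHUFFLED mode are named in this PR; the lengths field, the InputMode enum, and the as_policy_input name are illustrative assumptions.

# Hypothetical sketch of the "convert at the top of .forward" option.
# Only `observations` and SHUFFLED come from the PR discussion; the rest is assumed.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

import torch


class InputMode(Enum):
    SHUFFLED = 'shuffled'  # flat batch of independent timesteps
    SEQUENCE = 'sequence'  # ordered trajectories, as recurrent policies need


@dataclass
class PolicyInput:
    observations: torch.Tensor
    lengths: Optional[torch.Tensor] = None  # per-trajectory lengths (SEQUENCE mode)
    mode: InputMode = InputMode.SHUFFLED


def as_policy_input(inputs):
    """Coerce a bare torch.Tensor into a PolicyInput in SHUFFLED mode."""
    if isinstance(inputs, PolicyInput):
        return inputs
    return PolicyInput(observations=inputs, mode=InputMode.SHUFFLED)

With a helper like this, algorithms such as SAC could keep passing plain tensors, while recurrent policies would still receive the extra sequencing information through the same .forward signature.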

cnn_output = self._cnn_module(observations)
mlp_output = self._mlp_module(cnn_output)[0]
logits = torch.softmax(mlp_output, axis=1)
dist = torch.probability.Categorical(logits=logits)
Review comment (Contributor):

Should this be torch.distributions.Categorical?
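
For reference, a minimal corrected sketch of that construction. The (batch, num_actions) shape of mlp_output is an assumption; note also that passing a softmax output as logits= would normalize twice, so probs= is used here.

import torch


def build_action_dist(mlp_output):
    """Build a categorical action distribution from a (batch, num_actions) head output."""
    probs = torch.softmax(mlp_output, dim=1)  # `dim` is the documented keyword
    return torch.distributions.Categorical(probs=probs)
    # Equivalent: let Categorical normalize the raw scores itself:
    #     return torch.distributions.Categorical(logits=mlp_output)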

@mergify mergify bot requested a review from a team November 19, 2020 18:11
@krzentner krzentner changed the title Rework StochasticPolicy to use PolicyInput Rework garage.torch.optimizers Jan 19, 2021
WIP torch optimizer refactor

WIP torch optimizer refactor

WIP