WIP: add Diversity is All You Need implementation #267

Draft: wants to merge 10 commits into master
Conversation

kinalmehta (Collaborator)
Description

Adds an implementation of the Diversity is All You Need (DIAYN) paper. DIAYN is an unsupervised skill (option) learning framework whose learned skills can later be used for transfer learning.

To-Do

  • Implement unsupervised skill learning
  • Implement model saving and loading logic
  • Use the pre-trained model to train on a task from the paper for benchmarking
  • Add documentation and benchmarks

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
    • I have updated the overview sections in the docs and the repo.
  • I have updated the tests accordingly (if applicable).

kinalmehta self-assigned this on Aug 27, 2022

refactor to use classes to enable easy options
vwxyzjn (Owner) left a comment:

Great work @kinalmehta! DIAYN looks like a pretty interesting paper. Some thoughts:

  1. Have you run some preliminary experiments to see if you can replicate the results reported in the paper?
  2. learn_skills and use_skills have a huge amount of duplicate code. If their purpose is to save and load models, consider the approach listed in https://docs.cleanrl.dev/advanced/resume-training/#resume-training_1.
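
For illustration, one way to remove the duplication is to run a single training loop and checkpoint the networks at the end of skill learning, then restore them before fine-tuning. A minimal sketch, not necessarily the approach from the linked doc; the module names (actor, qf1, qf2, discriminator) are assumptions about this PR's code:

    import torch

    def save_checkpoint(path, actor, qf1, qf2, discriminator):
        # persist all networks learned during the unsupervised skill-learning phase
        torch.save(
            {
                "actor": actor.state_dict(),
                "qf1": qf1.state_dict(),
                "qf2": qf2.state_dict(),
                "discriminator": discriminator.state_dict(),
            },
            path,
        )

    def load_checkpoint(path, actor, qf1, qf2, discriminator, device="cpu"):
        # restore the pretrained networks before fine-tuning on the task reward
        ckpt = torch.load(path, map_location=device)
        actor.load_state_dict(ckpt["actor"])
        qf1.load_state_dict(ckpt["qf1"])
        qf2.load_state_dict(ckpt["qf2"])
        discriminator.load_state_dict(ckpt["discriminator"])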

Comment on lines 78 to 80
group.add_argument("--learn-skills", action='store_true', default=False)
group.add_argument("--use-skills", action='store_true', default=False)
group.add_argument("--evaluate-skills", action='store_true', default=False)
vwxyzjn (Owner) commented:


Please use the following configuration for bool:

    parser.add_argument("--autotune", type=lambda x:bool(strtobool(x)), default=False, nargs="?", const=True,
        help="automatic tuning of the entropy coefficient")

Comment on lines 119 to 133
# INFO: don't need to use OptionsPolicy as it is not used in the paper.
# Instead skill is uniformly sampled from the skills space.
# This can be used later to use pretrained skills to optimize for a specific reward function.
# class OptionsPolicy(nn.Module):
#     def __init__(self, env, num_skills):
#         super().__init__()
#         self.fc1 = nn.Linear(np.array(env.single_observation_space.shape).prod(), 256)
#         self.fc2 = nn.Linear(256, 256)
#         self.fc3 = nn.Linear(256, num_skills)
#
#     def forward(self, x):
#         x = F.relu(self.fc1(x))
#         x = F.relu(self.fc2(x))
#         x = self.fc3(x)
#         return OneHotCategorical(logits=x)
vwxyzjn (Owner) commented:


This should go to docs, under the implementation details section.
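
For context, the uniform sampling described in the comment typically amounts to a couple of lines; a sketch with placeholder names and values (num_skills, num_envs):

    import torch
    import torch.nn.functional as F

    num_skills, num_envs = 10, 1                        # placeholder values
    z = torch.randint(num_skills, (num_envs,))          # sample one skill index per env
    one_hot_z = F.one_hot(z, num_classes=num_skills).float()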



def split_aug_obs(aug_obs, num_skills):
    assert type(aug_obs) in [torch.Tensor, np.ndarray] and type(num_skills) is int, "invalid input type"
vwxyzjn (Owner) commented:


For simplicity, the check may not be needed; otherwise we would also need a similar check for aug_obs_z.
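
For readers following along, aug_obs_z and split_aug_obs typically just concatenate and split the observation and the one-hot skill vector. A sketch of what they might look like (the PR's actual implementation may differ):

    import numpy as np
    import torch

    def aug_obs_z(obs, one_hot_z):
        # append the one-hot skill vector to the observation along the last axis
        if isinstance(obs, torch.Tensor):
            return torch.cat([obs, one_hot_z], dim=-1)
        return np.concatenate([obs, one_hot_z], axis=-1)

    def split_aug_obs(aug_obs, num_skills):
        # recover (observation, one-hot skill) from the augmented observation
        return aug_obs[..., :-num_skills], aug_obs[..., -num_skills:]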

Comment on lines 202 to 206
class DIAYN:
    def __init__(self, args, run_name=None, device=torch.device("cpu")):
        self.args = args
        self.device = device
vwxyzjn (Owner) commented:


Please use the standard single-file implementation format in place of classes.

# TRY NOT TO MODIFY: start the game
obs = self.envs.reset()
z_aug_obs = aug_obs_z(obs, one_hot_z)
for global_step in range(self.args.total_timesteps):
vwxyzjn (Owner) commented:


What is the difference between learn_skills and use_skills? Why are both of them going over 1000000 steps?

kinalmehta (Collaborator, author) replied:


learn_skills is the unsupervised skill-learning phase, whereas use_skills fine-tunes the trained model to optimize for the environment reward.
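
For context, the skill-learning phase replaces the environment reward with DIAYN's discriminator-based pseudo-reward, r_z = log q_phi(z | s') - log p(z) with a uniform prior p(z). A minimal sketch (function and tensor names are mine, not the PR's):

    import torch
    import torch.nn.functional as F

    def diayn_reward(discriminator_logits, z, num_skills):
        # log q_phi(z | s'): log-probability the discriminator assigns to the active skill
        log_q = F.log_softmax(discriminator_logits, dim=-1)
        log_q_z = log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)
        # log p(z) for a uniform prior over num_skills skills
        log_p_z = -torch.log(torch.tensor(float(num_skills)))
        return log_q_z - log_p_z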

self.writer.add_scalar("charts/SPS", int(global_step / (time.time() - start_time)), global_step)
if self.args.autotune:
    self.writer.add_scalar("losses/alpha_loss", alpha_loss.item(), global_step)
return
vwxyzjn (Owner) commented:


There is no need for return — it's implicit.

# self.actor_optimizer.load_state_dict(models_info["actor_optimizer"])
# self.q_optimizer.load_state_dict(models_info["q_optimizer"])
# self.discriminator_optimizer.load_state_dict(models_info["discriminator_optimizer"])
return
vwxyzjn (Owner) commented:


There is no need for return — it's implicit.
