
[Feature Request] Domain Randomization #180

Open
1 task done
KonstantinRamthun opened this issue May 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@KonstantinRamthun

🚀 Feature

Literature suggests several techniques for domain randomization, such as Uniform Domain Randomization and Automatic Domain Randomization.

These are mostly independent of the RL algorithm used to train the policy, so they could be implemented as callbacks in SB3.

Motivation

When using RL for continuous control tasks, the goal is often a more robust and general controller/agent. Domain randomization is one technique for achieving this, and having several techniques available would let users compare what works best for their environments.

Pitch

I suggest implementing a base domain randomization callback. This allows interacting with the environments at each reset by setting domain randomization/reset parameters in the environment. Environments have to be controlled through, e.g., an adapter to support this interaction. Users of the callback are responsible for applying the parameters in their reset implementations.
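A minimal sketch of this idea in plain Python (no SB3 dependency, so the class and method names here are hypothetical, not an existing API): a base callback samples a fresh parameter set before each episode and hands it to an environment adapter, whose reset implementation then reads it.

```python
import random


class DomainRandomizationCallback:
    """Base callback: draws reset parameters from per-parameter ranges.

    Subclasses (e.g., for Automatic Domain Randomization) would override
    sample_params() and/or adapt param_ranges over training.
    """

    def __init__(self, param_ranges):
        # param_ranges: {name: (low, high)}
        self.param_ranges = dict(param_ranges)

    def sample_params(self):
        # Uniform Domain Randomization: draw each parameter independently.
        return {name: random.uniform(low, high)
                for name, (low, high) in self.param_ranges.items()}

    def on_reset(self, env):
        # Called before each environment reset.
        env.apply_reset_params(self.sample_params())


class ToyEnvAdapter:
    """Stands in for the adapter through which the env is controlled."""

    def __init__(self):
        self.params = {}

    def apply_reset_params(self, params):
        # The user's reset() implementation would read self.params.
        self.params = params


cb = DomainRandomizationCallback({"mass": (0.5, 1.5), "friction": (0.0, 0.2)})
env = ToyEnvAdapter()
cb.on_reset(env)  # env.params now holds a freshly sampled parameter set
```

In SB3 this would plug into the callback system (e.g., via `BaseCallback`) rather than a free-standing class, but the sampling/apply split stays the same.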

Individual domain randomization techniques inherit from the base callback and provide their corresponding functionality. Some techniques like Automatic Domain Randomization may need an additional evaluation callback to adapt their parameter space.
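To illustrate the adaptation step that such an evaluation callback would drive, here is a simplified sketch of the Automatic Domain Randomization idea (names and thresholds are hypothetical): after evaluation episodes at a boundary value of one parameter, the range is widened if the policy performs well and narrowed if it performs poorly.

```python
class ADRParameter:
    """One randomized parameter whose upper bound adapts to eval results."""

    def __init__(self, low, high, step=0.1,
                 expand_threshold=0.8, shrink_threshold=0.2):
        self.low = low
        self.high = high
        self.step = step
        self.expand_threshold = expand_threshold
        self.shrink_threshold = shrink_threshold

    def update(self, eval_success_rate):
        """Adapt the range from evaluation-episode performance.

        eval_success_rate is the success rate measured with this parameter
        pinned at its current upper bound.
        """
        if eval_success_rate >= self.expand_threshold:
            # Policy copes at the boundary: widen the range.
            self.high += self.step
        elif eval_success_rate <= self.shrink_threshold:
            # Boundary is too hard: shrink back toward the lower bound.
            self.high = max(self.low, self.high - self.step)


mass_range = ADRParameter(low=1.0, high=1.5)
```

Full ADR also adapts the lower bounds and picks one boundary dimension per evaluation, but this captures why the technique needs evaluation episodes that a plain per-reset callback does not provide.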

Additionally, one might want to add an adapted version of the EvalCallback to allow evaluating environments with a predetermined, constant set of parameters. I don't know whether the current EvalCallback already supports this.

To start, I would implement simpler techniques like Uniform Domain Randomization and Automatic Domain Randomization.

Alternatives

No response

Additional context

What do you think of this suggestion? If you find this a suitable extension to this repo, I could implement it.

Checklist

  • I have checked that there is no similar issue in the repo
@KonstantinRamthun KonstantinRamthun added the enhancement New feature or request label May 7, 2023
@araffin
Member

araffin commented May 11, 2023

Hello,
thanks for the suggestion. It is true that domain randomization is independent of the RL algorithm, but in my mind it is highly dependent on the environment, so it would be hard to provide a common callback that works for many environments.

I would also rather implement that on the environment side (so it is more a Gym/Gymnasium concern).
Or are you proposing something different, a common interface that could be re-used/adapted?
If so, do you have a working proof of concept you can share?

@KonstantinRamthun
Author

I think you can't implement all DR approaches in the environment alone. For Automatic Domain Randomization, for example, you need additional evaluation episodes. Thus, I see DR more as an extension of the RL algorithm than as something tied to the environments.

I thought of extending the gym interface with reset parameters, which are set by the DR callback at each reset and used in the reset method. I'll implement a PoC and get back to you.
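One possible shape for such an extension: Gymnasium's `reset()` already accepts an `options` dict, which could carry the DR parameters. A minimal sketch (the env class and parameter names are made up for illustration):

```python
class RandomizableEnv:
    """Toy env whose reset() reads DR parameters from the options dict."""

    def reset(self, seed=None, options=None):
        options = options or {}
        # The user's reset implementation applies the DR parameters here;
        # defaults are used when no randomization is requested.
        self.gravity = options.get("gravity", 9.81)
        obs = [0.0]
        info = {"gravity": self.gravity}
        return obs, info


env = RandomizableEnv()
obs, info = env.reset(options={"gravity": 3.7})  # randomized episode
```

A DR callback would then only need to build the `options` dict before each reset, keeping the env-specific logic inside `reset()` as araffin suggested.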
