
Robust Meta Reinforcement Learning (RoML)

The paper Train Hard, Fight Easy: Robust Meta Reinforcement Learning introduces RoML, a meta-algorithm that takes any meta-learning baseline algorithm and generates a robust version of it. This repo implements RoML on top of the original implementation of VariBAD (implementations on top of PEARL and MAML are also provided). See below how to run RoML in your own favorite algorithmic framework in a few simple steps.

What is RoML

  • Reinforcement Learning (RL) aims to learn a policy that makes decisions and maximizes the cumulative rewards (AKA returns) within a given environment.
  • Meta-RL aims to learn a "meta-policy" that can adapt quickly to new environments (AKA tasks).
  • Robust Meta RL (RoML) is a meta-algorithm that takes a meta-RL baseline algorithm, and generates a robust version of this baseline.

Robustness in what sense?

RoML optimizes the returns of the high-risk tasks instead of the average task. Specifically, it focuses on the fraction $\alpha$ of lowest-return tasks in the task space, where the robustness level $\alpha\in(0,1]$ is controlled by the user. Formally, this objective is defined as the Conditional Value at Risk (CVaR) of the returns over the tasks.
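In notation of our own (the symbols below are illustrative and may differ from the paper's), if $\mathcal{R}(\pi;\tau)$ denotes the return of meta-policy $\pi$ on a task $\tau$ drawn from the task distribution, the objective is

$$
\max_{\pi}\ \mathrm{CVaR}_{\alpha}\big[\mathcal{R}(\pi;\tau)\big],
\qquad
\mathrm{CVaR}_{\alpha}[X] = \mathbb{E}\big[X \,\big|\, X \le q_{\alpha}(X)\big],
$$

where $q_{\alpha}(X)$ is the $\alpha$-quantile of $X$. Setting $\alpha=1$ recovers the standard average-return objective.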

How does RoML work?

During meta-training, RoML uses the Cross-Entropy Method (CEM) to modify the selection of tasks, aiming to sample tasks whose expected return falls within the worst $\alpha$-fraction.

A sample of test tasks in HalfCheetah (left) and Humanoid (right). In both examples, the task corresponds to high body mass, which is difficult to control and typically leads to lower scores. Within each figure, the right meta-agent was trained by RoML and the left one by the baseline VariBAD. In both environments, RoML learned to handle the high mass by leaning forward and letting gravity do the hard work, leading to higher velocities than VariBAD.
In the bridge environment of Khazad-Dum, VariBAD (left) attempts to take the short path over the bridge but sometimes falls into the abyss. RoML (right) goes around and avoids the risk of falling.

How to reproduce the experiments of the paper

To train the meta-policies, download this repo and run:

python main.py --env-type ENV --seed 0 1 2 3 4

  • Replace ENV with the desired environment: khazad_dum_varibad, cheetah_vel_varibad, cheetah_mass_varibad, cheetah_body_varibad or humanoid_mass_varibad.
  • The line above runs the baseline VariBAD algorithm. For RoML add --cem 1. For CVaR-ML (defined in the paper) add --tail 1 (without --cem).
  • To reproduce the full experiments of the paper, run with seeds 0 through 29. An example invocation is given below.
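For example, training RoML on Khazad-Dum with five seeds combines the command above with the --cem flag (this exact invocation is an illustration assembled from the flags listed above):

python main.py --env-type khazad_dum_varibad --seed 0 1 2 3 4 --cem 1

The corresponding CVaR-ML run replaces --cem 1 with --tail 1.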

To process the results after training, use the module analysis.py as demonstrated in the notebooks in this repo (.ipynb files).

How to use RoML in your own framework - on top of your own meta-RL baseline

RoML can be implemented on top of any meta-RL baseline algorithm (not only VariBAD). To run RoML in your own algorithmic framework, modify only the task-selection process during meta-training:

  1. Create a CEM sampler before training (e.g., using the Dynamic CEM package).
  2. When choosing the tasks, use the CEM to do the sampling.
  3. After running the tasks, update the CEM with the resulting returns.

For example, search "cem" in the module metalearner.py in this repo.
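Below is a minimal, self-contained sketch of the three steps above. It is an illustration only: the class GaussianCEMTaskSampler, the Gaussian proposal over a one-dimensional task parameter, and the toy helper run_meta_rl_episode are our placeholders, not the API of this repo or of the Dynamic CEM package, and the sketch refits the proposal directly to the worst-$\alpha$ tasks without the refinements a full implementation would use (see cross_entropy_sampler.py and metalearner.py for the real thing).

```python
import numpy as np


class GaussianCEMTaskSampler:
    """Minimal cross-entropy sampler over a one-dimensional task parameter.

    Keeps a Gaussian proposal over the task parameter and periodically refits
    it to the tasks whose returns fell in the worst alpha-fraction of the last
    batch, so that harder tasks are proposed more often as training proceeds.
    """

    def __init__(self, mean=0.0, std=1.0, alpha=0.05, batch_size=64, smooth=0.8):
        self.mean, self.std = mean, std
        self.alpha = alpha            # robustness level: fraction of worst tasks to target
        self.batch_size = batch_size  # how many (task, return) pairs to collect per refit
        self.smooth = smooth          # smoothing between consecutive refits
        self._tasks, self._returns = [], []

    def sample_task(self):
        # Step 2: draw the next training task from the current proposal.
        return float(np.random.normal(self.mean, self.std))

    def update(self, task, episode_return):
        # Step 3: report the return observed on the task; refit once a batch is full.
        self._tasks.append(task)
        self._returns.append(episode_return)
        if len(self._returns) >= self.batch_size:
            self._refit()

    def _refit(self):
        tasks = np.asarray(self._tasks)
        returns = np.asarray(self._returns)
        n_keep = max(1, int(np.ceil(self.alpha * len(returns))))
        worst = tasks[np.argsort(returns)[:n_keep]]  # tasks with the lowest returns
        self.mean = self.smooth * self.mean + (1 - self.smooth) * worst.mean()
        self.std = self.smooth * self.std + (1 - self.smooth) * (worst.std() + 1e-3)
        self._tasks, self._returns = [], []


def run_meta_rl_episode(task):
    # Placeholder for the baseline's rollout and meta-update on the given task.
    # Toy model: tasks with a larger parameter are harder and yield lower returns.
    return -abs(task) + 0.1 * np.random.randn()


# Step 1: create the CEM sampler before meta-training starts.
sampler = GaussianCEMTaskSampler(mean=1.0, std=0.5, alpha=0.05)

for _ in range(10_000):
    task = sampler.sample_task()                # Step 2: CEM selects the task
    episode_return = run_meta_rl_episode(task)  # baseline trains on that task
    sampler.update(task, episode_return)        # Step 3: return fed back to the CEM
```

At test time the sampler is not used: tasks are drawn from the original task distribution, per the first note below.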

Important implementation notes:

  • Modify task sampling only during training, not during testing.
  • The CEM modifies the distribution from which tasks are selected. For this, the user must define in advance a parametric family of distributions over which the CEM operates, as explained in the CEM package documentation. For example, if the tasks are defined within a bounded interval, a Beta distribution could be used; if the tasks are defined by positive numbers, an exponential distribution could be used. See examples in the module cross_entropy_sampler.py in this repo.
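As an illustration of the bounded-interval case, a Beta family could be refit to the selected (worst) tasks by moment matching, as in the hypothetical helper below; the function name and the fitting method are ours and need not match what cross_entropy_sampler.py actually does.

```python
import numpy as np


def fit_beta_to_tail(tasks, lo=0.0, hi=1.0):
    """Moment-matching fit of a Beta(a, b) distribution to the selected tasks.

    `tasks` are task parameters in the bounded interval [lo, hi]; they are
    rescaled to [0, 1] before fitting.
    """
    x = (np.asarray(tasks, dtype=float) - lo) / (hi - lo)
    m, v = x.mean(), x.var() + 1e-8
    common = m * (1.0 - m) / v - 1.0       # equals a + b under moment matching
    a, b = m * common, (1.0 - m) * common  # a = m*(a+b), b = (1-m)*(a+b)
    return max(a, 1e-3), max(b, 1e-3)      # clamp to keep the Beta parameters valid
```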
