
Multiplicative Compositional Policies

This repo provides implementations of Multiplicative Compositional Policies (MCP), a method for learning reusable motor skills that can be composed to produce a range of complex behaviors. All code is written in Python 3, using PyTorch, NumPy, and Stable-Baselines3. Experiments are simulated with the MuJoCo physics engine. The project is built on DRLoco, an implementation of the DeepMimic framework with Stable-Baselines3.


MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies. NeurIPS 2019.
[Paper] [Our Slides]
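At its core, MCP composes a set of Gaussian primitive policies multiplicatively: a gating network produces non-negative weights w_i(s, g), and the composite action distribution is the normalized weighted product of the primitives. The snippet below is a minimal PyTorch sketch of that composition, not the code used in this repo; the module names, network sizes, and the sigmoid gating activation are illustrative assumptions, and the exact parameterization in the paper may differ slightly.

```python
import torch
import torch.nn as nn

class MCPPolicy(nn.Module):
    """Sketch of multiplicative primitive composition for diagonal Gaussians."""

    def __init__(self, state_dim, goal_dim, action_dim, num_primitives=4, hidden=64):
        super().__init__()
        # One small MLP per primitive, conditioned on the state only.
        self.primitives = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * action_dim),  # mean and log-std per action dim
            )
            for _ in range(num_primitives)
        )
        # The gating network sees both the state and the goal.
        self.gate = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_primitives),
        )

    def forward(self, state, goal):
        # Per-primitive means and std devs: (batch, num_primitives, action_dim).
        outs = torch.stack([p(state) for p in self.primitives], dim=1)
        mu, log_std = outs.chunk(2, dim=-1)
        sigma = log_std.exp()
        # Non-negative gating weights w_i(s, g): (batch, num_primitives, 1).
        w = torch.sigmoid(self.gate(torch.cat([state, goal], dim=-1))).unsqueeze(-1)
        # A weighted product of Gaussians is again a Gaussian:
        #   1 / sigma_c^2 = sum_i w_i / sigma_i^2
        #   mu_c = sigma_c^2 * sum_i (w_i / sigma_i^2) * mu_i
        precision = (w / sigma.pow(2)).sum(dim=1)
        var_c = 1.0 / precision
        mu_c = var_c * (w * mu / sigma.pow(2)).sum(dim=1)
        return torch.distributions.Normal(mu_c, var_c.sqrt())
```

Sampling an action then amounts to drawing from the returned distribution; during transfer, only the gating network needs to be trained while the primitives stay fixed.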


Experiment Details

The character we chose to work with is a simple ant with a small number of degrees of freedom (DoFs). Although the paper uses imitation rewards for its other characters, it trains the ant with a standard RL objective (no imitation), so we trained the ant in the same manner. In addition, we devised several training variants:

Model Name: Description
MCPPO: The paper trains the primitives jointly end-to-end, which leads to their specialization. In MCPPO, we instead train each primitive separately on an individual task.
MCP_I: As with the paper's other characters, we incorporate expert demonstrations in the pre-training phase.

An illustration of the method

Pre-training Tasks

In our experiments, the pre-training phase consists of four different heading tasks: heading north, south, east, and west. For MCP Naive, we provided a corpus of reference motions, following the approach used to pre-train the humanoid in the paper. For the rest, we specified goals and reward functions that encourage the agent to navigate in the desired direction.
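The exact reward shaping used in this repo is not spelled out here, so purely as an illustration, a typical heading reward projects the ant's planar velocity onto the desired unit heading vector. The function name below and the omission of control or contact costs are assumptions.

```python
import numpy as np

# Hypothetical heading reward: reward progress along the goal direction
# (e.g. north = [0, 1], east = [1, 0]). The repo's actual reward terms and
# scaling factors may differ.
def heading_reward(xy_velocity, goal_direction):
    goal_direction = np.asarray(goal_direction, dtype=np.float64)
    goal_direction = goal_direction / np.linalg.norm(goal_direction)
    return float(np.dot(xy_velocity, goal_direction))
```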

Fine-tuning Tasks

To evaluate the agents, we considered four new heading tasks: north-west, north-east, south-west, and south-east. The goal and the reward function are defined in the same way as in pre-training.

Expert Data for MCP Naive

Since there is no mocap data for the ant, we needed to develop experts to generate reference data. Accordingly, we trained four different MLP policies with PPO, each of which learned to navigate north, south, east, or west. We treat each policy as an expert for a particular direction and use its actions as reference trajectories.
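As a rough sketch of that pipeline (not the repo's exact scripts), one might train a Stable-Baselines3 PPO expert per direction and then roll it out to record reference (observation, action) pairs; the environment id, rollout length, and output file name below are placeholders.

```python
import numpy as np
import gym
from stable_baselines3 import PPO

# Train one MLP expert per heading direction. "Ant-v3" is a stand-in here;
# the repo wraps its own ant heading tasks.
env = gym.make("Ant-v3")
expert = PPO("MlpPolicy", env, verbose=1)
expert.learn(total_timesteps=1_000_000)

# Roll the expert out through SB3's VecEnv wrapper and record
# (observation, action) pairs as a reference trajectory.
vec_env = expert.get_env()
obs = vec_env.reset()
trajectory = []
for _ in range(1000):
    action, _ = expert.predict(obs, deterministic=True)
    trajectory.append((obs.copy(), action.copy()))
    obs, rewards, dones, infos = vec_env.step(action)

np.save("reference_north.npy", np.array(trajectory, dtype=object))
```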

The reference trajectories produced by PPO for MCP Naive turned out to be very noisy, which led to poor performance. In addition, the ant's sensitivity to the reward scaling factors prevented it from learning via imitation of the reference trajectories.

Installation

To install the requirements, please refer to the DRLoco installation documentation.

How to run

Initial experiments

python mcppo.py

python mcp_naive.py

Experiments from the paper

cd mcp

python train_mcp.py
python train_mcppo.py
python scratch_ant.py

python transfer.py

To generate the trajectories

Initial Experiments

bash gen_plots.sh
bash make_traj.sh

Experiments from the paper

cd mcp

bash gen_plots.sh
bash make_traj.sh
