
Application of Neural Ordinary Differential Equations for Continuous Control Reinforcement Learning

This repository contains an implementation of the adjoint method for backpropagating through ODE solvers on top of Eager TensorFlow, as well as experiments with models containing ODE layers in MuJoCo and Roboschool environments, with policies trained using PPO.
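The point of the adjoint method is to avoid backpropagating through the internals of the ODE solver: gradients with respect to the initial state and the parameters are obtained by integrating a second, adjoint ODE backwards in time. Below is a minimal fixed-step sketch of the idea (illustration only: it uses plain Euler steps instead of an adaptive solver, and odeint_euler / adjoint_grads are hypothetical names, not this repository's API):

import tensorflow as tf

tf.enable_eager_execution()  # needed on TF 1.x only

def odeint_euler(f, z0, t0, t1, n_steps):
  """Integrate dz/dt = f(z, t) forward with fixed Euler steps."""
  h = (t1 - t0) / n_steps
  z, t = z0, t0
  for _ in range(n_steps):
    z = z + h * f(z, t)
    t = t + h
  return z

def adjoint_grads(f, params, z0, t0, t1, dLdz1, n_steps):
  """Integrate the state, the adjoint a(t) = dL/dz(t), and the
  parameter gradients together from t1 back to t0."""
  h = (t1 - t0) / n_steps
  z = odeint_euler(f, z0, t0, t1, n_steps)  # state at t1
  a = dLdz1
  dLdp = [tf.zeros_like(p) for p in params]
  t = t1
  for _ in range(n_steps):
    with tf.GradientTape() as tape:
      tape.watch(z)
      fz = f(z, t)
    # vector-Jacobian products a^T df/dz and a^T df/dparams
    vjps = tape.gradient(fz, [z] + list(params), output_gradients=a)
    z = z - h * fz                              # reconstruct state backwards
    a = a + h * vjps[0]                         # da/dt = -a^T df/dz
    dLdp = [g + h * v for g, v in zip(dLdp, vjps[1:])]
    t = t - h
  return a, dLdp                                # a is now dL/dz0

Gradients with respect to t0 and t1 are omitted for brevity; an adaptive solver would replace the fixed Euler steps in both loops.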

Install and Run

First, install TensorFlow version 1.13.1. Note that the GPU version may not be necessary, as the models are quite simple and run fast on a powerful CPU. Clone the repo and install the requirements:

git clone --recursive https://github.com/MichaelKonobeev/neuralode-rl.git
cd neuralode-rl
pip install -r requirements.txt

You will need to install environment dependencies for MuJoCo and/or Roboschool envs separately. For Roboschool, use version 1.0.48, which is the latest version compatible with gym:

pip install roboschool==1.0.48

To run the baseline MLP-model experiment on a single env:

python run-mujoco.py --env-id HalfCheetah-v3 --logdir logdir/mlp/half-cheetah.00
# OR
python run-roboschool.py --env-id RoboschoolHumanoidFlagrun-v1 \
     --logdir logdir/mlp/roboschool-humanoid-flagrun.00

To run experiments with models containing ODE layers for both the policy and the value function:

python run-mujoco.py --env-id HalfCheetah-v3 \
    --logdir logdir/ode/half-cheetah.00 --ode-policy --ode-value
# OR
python run-roboschool.py --env-id RoboschoolHumanoidFlagrun-v1 \
    --logdir logdir/ode/roboschool-humanoid-flagrun.00
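For intuition, an ODE layer replaces a stack of discrete layers with a continuous transformation of the hidden activation. A hypothetical minimal version (illustrative names and sizes, not the classes used in this repository) might look like:

import tensorflow as tf

class ODELayer(tf.keras.layers.Layer):
  """Treats the hidden activation as the state of an ODE whose
  dynamics are given by a small dense network."""

  def __init__(self, units, n_steps=10):
    super(ODELayer, self).__init__()
    # units must match the input dimension, since h is updated in place
    self.dynamics = tf.keras.layers.Dense(units, activation=tf.nn.tanh)
    self.n_steps = n_steps

  def call(self, h):
    # Fixed-step Euler integration of dh/dt = dynamics(h) on [0, 1];
    # gradients would flow through this via the adjoint method rather
    # than by unrolling the solver.
    dt = 1.0 / self.n_steps
    for _ in range(self.n_steps):
      h = h + dt * self.dynamics(h)
    return h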

You can also schedule all of the experiments using task-spooler, which can be installed on Ubuntu with sudo apt-get install task-spooler. After that, launching run.py should work:

python run.py --logdir-prefix logdir/mlp/
python run.py --logdir-prefix logdir/ode/ --ode-policy --ode-value

With the same script it is possible to run only a subset of environments, e.g. by specifying --env-ids roboschool or --env-ids mujoco, or (possibly in addition) one or several env ids.

This will schedule 5 runs with different seeds for each MuJoCo env and 3 runs with different seeds for each Roboschool env. You can set the number of tasks that can run concurrently to, e.g., 5 using the following command:

tsp -S 5

Additionally, to watch the task queue you may run:

watch -n 2 zsh -c "tsp | tr -s ' ' | cut -d ' ' -f 1,2,4,8-"

Results

MuJoCo

[figure: MuJoCo training curves]

Roboschool

[figure: Roboschool training curves]

The plots show the average reward over the last 100 episodes during training. For MuJoCo envs the error bars represent the standard deviation around the mean, which is shown by a bold line. For Roboschool experiments, each line of the same color corresponds to a run with a different seed.
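One possible way to produce such curves from logged per-episode rewards (an assumed post-processing step, not a script provided in this repository) is to smooth each run with a running mean over the last 100 episodes and then aggregate across seeds:

import numpy as np

def smooth_last_100(episode_rewards):
  """Running mean of the last 100 episode rewards."""
  out = np.empty(len(episode_rewards))
  for i in range(len(episode_rewards)):
    out[i] = np.mean(episode_rewards[max(0, i - 99):i + 1])
  return out

def mean_and_std(curves):
  """Aggregate equally-long smoothed reward curves from different
  seeds into a mean line and a standard-deviation band."""
  stacked = np.stack([smooth_last_100(c) for c in curves])
  return stacked.mean(axis=0), stacked.std(axis=0)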

References

  • torchdiffeq: the authors' implementation of the adjoint method and their experiments with neural ODEs.
  • neural-ode: an implementation of the adjoint method on top of Eager TensorFlow and experiments with neural ODEs for image and text sentiment classification.

Citation

Please cite this repository if it was useful for your research:

@misc{konobeev2018,
  author={Mikhail Konobeev},
  title={Neural Ordinary Differential Equations for Continuous Control},
  year={2019},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/MichaelKonobeev/neuralode-rl}},
}