
KAIR


KAIR is a research repository of state-of-the-art reinforcement learning algorithms for robot control tasks. It allows researchers to experiment with novel ideas with minimal code changes.

Algorithms

The scripts folder contains implementations of a curated list of RL algorithms verified in MuJoCo environments.

  • Twin Delayed Deep Deterministic Policy Gradient (TD3)

    • TD3 (Fujimoto et al., 2018) is an extension of DDPG (Lillicrap et al., 2015), a deterministic policy gradient algorithm that uses deep neural networks for function approximation. Inspired by Deep Q-Networks (Mnih et al., 2015), DDPG uses experience replay and target networks to improve stability. TD3 further improves DDPG by adding clipped double Q-learning (a variant of double Q-learning; Van Hasselt, 2010) to mitigate overestimation bias (Thrun & Schwartz, 1993) and by delaying policy updates to reduce variance.
    • Example Script on LunarLander
    • ArXiv Preprint
  • (Twin) Soft Actor Critic (SAC)

    • SAC (Haarnoja et al., 2018a) incorporates maximum entropy reinforcement learning, where the agent's goal is to maximize expected reward and entropy concurrently. Combined with the twin-critic technique from TD3, SAC achieves state-of-the-art performance on various continuous control tasks. SAC has been extended to allow automatic tuning of the temperature parameter (Haarnoja et al., 2018b), which determines the importance of entropy relative to the expected reward.
    • Example Script on LunarLander
    • ArXiv Preprint (Original SAC)
    • ArXiv Preprint (SAC with autotuned temperature)
  • TD3 from Demonstrations, SAC from Demonstrations (TD3fD, SACfD)

    • DDPGfD (Vecerik et al., 2017) is an imitation learning algorithm that infuses demonstration data into experience replay. DDPGfD also improved DDPG by (1) using prioritized experience replay (Schaul et al., 2015), (2) adding n-step returns, (3) learning multiple times per environment step, and (4) adding L2 regularizers to the actor and critic losses. We incorporated these improvements into TD3 and SAC and found that they dramatically improve performance.
    • Example Script of TD3fD on LunarLander
    • Example Script of SACfD on LunarLander
    • ArXiv Preprint
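The core idea behind TD3's clipped double Q-learning can be sketched in a few lines. This is not code from this repository; it is a minimal NumPy illustration of how the bootstrap target is formed from the minimum of two target-critic estimates, plus TD3's target policy smoothing. All names and the toy values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_target_action(next_action, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then clip the result back into the valid action range.
    noise = np.clip(noise_std * rng.standard_normal(np.shape(next_action)),
                    -noise_clip, noise_clip)
    return np.clip(next_action + noise, -act_limit, act_limit)

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    # Clipped double Q-learning: bootstrap from the minimum of the two
    # target critics to mitigate overestimation bias.
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# Toy batch of two transitions (values are illustrative, not from the repo).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])          # second transition is terminal
q1_next = np.array([10.0, 5.0])      # twin target-critic estimates
q2_next = np.array([8.0, 6.0])
y = td3_target(reward, done, q1_next, q2_next)  # -> [8.92, 0.0]
```

Both critics are regressed toward the same target `y`; using the minimum of the two estimates is what distinguishes TD3 from plain DDPG.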
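The prioritized experience replay used by the fD variants can likewise be sketched. This is a hedged NumPy illustration of proportional prioritization from Schaul et al. (2015), not this repository's implementation; the function names and hyperparameter values are assumptions.

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    # Proportional prioritization: p_i is proportional to (|delta_i| + eps)^alpha,
    # so transitions with larger TD error are replayed more often.
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, beta=0.4):
    # Importance-sampling weights correct the bias introduced by
    # non-uniform sampling; normalized by the maximum for stability.
    n = len(probs)
    weights = (n * probs) ** (-beta)
    return weights / weights.max()

td_errors = np.array([2.0, 0.1, 1.0])
probs = per_probabilities(td_errors)   # larger TD error -> higher sampling probability
weights = importance_weights(probs)    # rarely sampled transitions get larger weights
```

In the demonstration setting, demonstration transitions can additionally be given a small constant priority bonus so they remain likely to be replayed throughout training.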

Installation

To use the algorithms, first install the required Python packages from PyPI using the requirements.txt file.

cd scripts
pip install -r requirements.txt

To train LunarLanderContinuous-v2, install the OpenAI Gym environment.

To train Reacher-v1, install the MuJoCo environment.

Environment

The code is developed using Python 2.7 and ROS Kinetic on Ubuntu 16.04. An NVIDIA GPU is required. The code was developed and tested with a single NVIDIA GeForce GTX 1080 Ti GPU card; other platforms and GPU cards are not fully tested.

How to Train

Docker

To use Docker, see the installation guide.

Build Image

docker build -t kairproject/open_manipulator:0.1 .

or

docker pull kairproject/open_manipulator:0.1

OpenManipulator

docker run -v [path_to_kair_algorithms_draft]/save:/root/catkin_ws/src/kair_algorithms_draft/scripts/save --runtime=nvidia [image_id] openmanipulator [algo]

LunarLanderContinuous-v2

docker run -v [path_to_kair_algorithms_draft]/save:/root/catkin_ws/src/kair_algorithms_draft/scripts/save --runtime=nvidia [image_id] lunarlander [algo]

Reacher-v1

docker run -v [path_to_kair_algorithms_draft]/save:/root/catkin_ws/src/kair_algorithms_draft/scripts/save --runtime=nvidia [image_id] reacher [algo]

Local

Our training logs (Weights & Biases) can be found at https://app.wandb.ai/kairproject/kair_algorithms_draft-scripts.

cd scripts
wandb login

OpenManipulator

Follow the ROS installation commands in the Dockerfile before training.

roslaunch kair_algorithms open_manipulator_env.launch gui:=false &
rosrun kair_algorithms run_open_manipulator_reacher_v0.py --algo [algo] --off-render --log

LunarLanderContinuous-v2

python run_lunarlander_continuous.py --algo [algo] --off-render --log

Reacher-v1

python run_reacher_v1.py --algo [algo] --off-render --log

How to Test

OpenManipulator

roslaunch kair_algorithms open_manipulator_env.launch gui:=false &
rosrun kair_algorithms run_open_manipulator_reacher_v0.py --algo [algo] --off-render --test --load-from [trained_weight_path]

LunarLanderContinuous-v2

python run_lunarlander_continuous.py --algo [algo] --off-render --test --load-from [trained_weight_path]

Reacher-v1

python run_reacher_v1.py --algo [algo] --off-render --test --load-from [trained_weight_path]

How to Cite

We are currently writing a white paper to summarize the results. We will add a BibTeX entry below once the paper is finalized.
