mingyucai/Modular_Deep_RL_E-LDGBA

Modular_Deep_RL_E-LDGBA

This repository implements modular Deep Deterministic Policy Gradient (DDPG) reinforcement learning (RL) with linear temporal logic (LTL) formulas as high-level mission specifications.
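To make "modular" concrete, here is a minimal sketch that assumes one policy module per automaton state, with the current automaton state selecting which module acts. The repository is under construction, so this is not its code: `ModularAgent`, `make_stub_policy`, and the state names are hypothetical, and the stub policies stand in for trained DDPG actors.

```python
import random
from typing import Callable, Dict, List, Sequence

# A policy maps an observation to a continuous action vector.
Policy = Callable[[Sequence[float]], List[float]]


def make_stub_policy(seed: int, action_dim: int = 1) -> Policy:
    """Hypothetical stand-in for a trained DDPG actor (returns bounded random actions)."""
    rng = random.Random(seed)

    def policy(observation: Sequence[float]) -> List[float]:
        # A real actor would run a neural network on the observation here.
        return [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]

    return policy


class ModularAgent:
    """Keeps one policy module per automaton state (an assumption of this sketch)."""

    def __init__(self, automaton_states: Sequence[str], action_dim: int = 1) -> None:
        self.modules: Dict[str, Policy] = {
            q: make_stub_policy(seed=i, action_dim=action_dim)
            for i, q in enumerate(automaton_states)
        }

    def act(self, observation: Sequence[float], automaton_state: str) -> List[float]:
        # Dispatch to the module that belongs to the current automaton state.
        return self.modules[automaton_state](observation)


agent = ModularAgent(["q0", "q1"], action_dim=2)
action = agent.act([0.0, 0.1], "q0")
```

Splitting the policy this way lets each module specialize in one stage of the mission instead of one network covering every stage.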


Software

Under construction. If you have any questions, feel free to contact mingyucai0915@gmail.com.


Publications

@article{Cai2021modular,
  title={Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic},
  author={Cai, Mingyu and Hasanbeig, Mohammadhosein and Xiao, Shaoping and Abate, Alessandro and Kan, Zhen},
  journal={IEEE Robotics and Automation Letters},
  volume={6},
  number={4},
  pages={7973-7980},
  year={2021},
  publisher={IEEE}
}

Ball-Pass and Cart-Pole Environment

The tasks are performed in custom environments in DeepRL-LTL and CartPole, developed with OpenAI Gym.

Results for CartPole using EP-MDP

In addition to keeping the pole from falling over, Task1 is a surveillance mission that requires the cart to visit the yellow region and the green region periodically (infinite horizon). Task2 requires the cart to visit the yellow region first and then the green region (finite horizon). The demos for Task1 and Task2 are shown on the left and right, respectively.

[Demos: Task1 (left), Task2 (right)]
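The README does not give the LTL formulas, so the following is only one plausible reading: Task1 as "always eventually yellow and always eventually green", and Task2 as "eventually yellow, and after that eventually green". The sketch checks these patterns on a finite trace of label sets. The label names `yellow`, `green`, and `fall` are hypothetical, as is treating `fall` as a failure in both tasks.

```python
from typing import Iterable, Set


def task2_satisfied(trace: Iterable[Set[str]]) -> bool:
    """Finite-horizon check: 'yellow' is seen, then 'green' at the same or a later step.

    Assumption of this sketch: a 'fall' label before completion means failure.
    """
    seen_yellow = False
    for labels in trace:
        if "fall" in labels:
            return False
        if "yellow" in labels:
            seen_yellow = True
        if seen_yellow and "green" in labels:
            return True
    return False


def task1_rounds(trace: Iterable[Set[str]]) -> int:
    """Count completed rounds in which both regions were visited (in any order).

    An infinite-horizon task cannot be verified on a finite trace, so this
    only measures progress; it stops counting once 'fall' occurs.
    """
    pending = {"yellow", "green"}
    rounds = 0
    for labels in trace:
        if "fall" in labels:
            break
        pending -= labels
        if not pending:
            rounds += 1
            pending = {"yellow", "green"}
    return rounds


trace = [set(), {"yellow"}, set(), {"green"}, {"yellow"}, {"green"}]
task2_ok = task2_satisfied(trace)
rounds = task1_rounds(trace)
```

A trace that visits green before yellow and never returns to green does not satisfy the Task2 check, which is the ordering the task description asks for.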




Results for Ball-Pass using EP-MDP

Task 1

Task1 is a surveillance mission that requires the ball to visit region 1 and region 2 periodically (infinite horizon). The modular DDPG (left) solves the specified task with a 100% success rate. The standard DDPG (right, worst-case scenario) fails on this repetitive pattern.

[Demos: modular DDPG (left), standard DDPG (right)]




Task 2

Task2 requires the ball to visit region 1 and then region 2 (finite horizon). The modular DDPG (left) solves the specified task with a 100% success rate. The success rate of the standard DDPG (right) is around 86%.

[Demos: modular DDPG (left), standard DDPG (right)]




Comparison with Standard Product MDP

Here are the results (worst-case scenarios) using the standard product MDP, which cannot guarantee the completion of repetitive tasks over the infinite horizon, for the CartPole and Ball-Pass problems, respectively.
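One intuition for this gap: if a reward is given whenever any accepting region is visited, an agent can collect it by returning to a single region forever. The sketch below illustrates a simplified tracking-frontier idea, in which a region is rewarded only while it is still pending in the current round. It is an assumption-laden illustration, not the repository's EP-MDP construction: `ACCEPTING_SETS`, `update_frontier`, and `count_rewards` are hypothetical names, and the reset rule is simplified.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

# Hypothetical accepting regions for the Ball-Pass surveillance task.
ACCEPTING_SETS: FrozenSet[str] = frozenset({"region1", "region2"})


def update_frontier(
    frontier: FrozenSet[str], visited: Optional[str]
) -> Tuple[FrozenSet[str], bool]:
    """Return (new_frontier, rewarded) after one step.

    A visit is rewarded only if the region is still pending in this round.
    When every region has been visited, the frontier resets to the full set
    (a simplification chosen for this sketch).
    """
    if visited not in frontier:
        return frontier, False
    remaining = frontier - {visited}
    if not remaining:
        remaining = ACCEPTING_SETS
    return remaining, True


def count_rewards(visits: Iterable[Optional[str]]) -> int:
    """Total number of rewarded visits along a sequence of region visits."""
    frontier = ACCEPTING_SETS
    total = 0
    for visited in visits:
        frontier, rewarded = update_frontier(frontier, visited)
        total += int(rewarded)
    return total


stuck = count_rewards(["region1"] * 5)                   # revisits one region only
alternating = count_rewards(["region1", "region2"] * 3)  # visits both regions in turn
```

Under this rule, revisiting one region earns a single reward, while alternating between both regions is rewarded on every visit, so the reward signal favors the repetitive pattern the task requires.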

[Demos: CartPole (left), Ball-Pass (right)]



