
OpenAI CartPole-v0 DeepRL-based solutions (DQN, DuelingDQN, D3QN)


DanielPalaio/CartPole-v0_DeepRL


OpenAI CartPole-v0 DeepRL-based solutions

Investigation carried out as part of the master's thesis "DeepRL-based Motion Planning for Indoor Mobile Robot Navigation" @ Institute of Systems and Robotics - University of Coimbra (ISR-UC)

Requirements

| Module | Software/Hardware |
| --- | --- |
| Python IDE | Pycharm |
| Deep Learning library | Tensorflow + Keras |
| GPU | GeForce GTX 1060 |
| Interpreter | Python 3.8 |
| Packages | requirements.txt |

To set up Pycharm + Anaconda + GPU, consult the setup file here.
To install the required packages (requirements.txt), download the file into the project folder and run the following command in the project environment terminal:

pip install -r requirements.txt

⚠️ WARNING ⚠️

The training process generates a .txt file that tracks the network models (saved in 'tf' and .h5 formats) that met the solved requirement of the environment. Additionally, an overview image (graph) of the training procedure is created.
To run several training procedures, the .txt, .png, and directory names must be changed between runs. Otherwise, the information from previous training runs will be overwritten and lost.

When testing a saved network model in .h5 format, a 5-episode training run is required first to initialize/build the keras.model network. The warning above about overwritten files therefore also applies in this case.
Loading the saved model in 'tf' format is the recommended option. After testing finishes, an overview image (graph) is also generated.
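
One way to avoid the overwriting described in the warning is to derive every output name from a numbered run tag. The sketch below is only an illustration: `unique_run_paths`, the `_networks` suffix, and the two-digit numbering are hypothetical and are not taken from this repository's scripts.

```python
from pathlib import Path


def unique_run_paths(base_dir, run_name):
    """Return log, graph and model-directory paths that do not exist yet.

    Hypothetical helper: the naming scheme is an assumption, not the
    one used by the training scripts in this repository.
    """
    base = Path(base_dir)
    index = 1
    while True:
        tag = f"{run_name}_{index:02d}"
        paths = {
            "log": base / f"{tag}.txt",           # tracked models
            "graph": base / f"{tag}.png",         # overview image
            "models": base / f"{tag}_networks",   # saved networks
        }
        # Accept the tag only if none of the three targets is taken.
        if not any(p.exists() for p in paths.values()):
            return paths
        index += 1
```

Calling `unique_run_paths(".", "dqn")` before each training run would give a fresh set of names, so earlier results stay on disk.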

OpenAI CartPole-v0

Actions:
0 - Push cart to the left
1 - Push cart to the right

States:
0 - Cart position [-4.8, 4.8] (the episode terminates beyond ±2.4)
1 - Cart velocity [-inf, inf]
2 - Pole angle [-0.418 rad, 0.418 rad], i.e. about ±24° (the episode terminates beyond ±12°)
3 - Pole angular velocity [-inf, inf]

Rewards:
Scalar value (1) for every step taken

Episode termination:
Pole angle (State 2) < -12° or > 12°
Cart position (State 0) < -2.4 or > 2.4
Episode length reaches 200 steps
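
The termination rules can be sketched as a small predicate. This is a minimal illustration rather than the environment's own code: it assumes the pole angle is given in radians (as Gym reports it) and treats the CartPole-v0 step cap as reached at step 200. The name `episode_done` is hypothetical.

```python
import math

X_THRESHOLD = 2.4                         # cart position limit
THETA_THRESHOLD = 12 * math.pi / 180      # 12 degrees, in radians
MAX_STEPS = 200                           # step cap assumed for CartPole-v0


def episode_done(state, step_count):
    """Return True if any of the three termination conditions holds."""
    cart_position, _cart_velocity, pole_angle, _pole_velocity = state
    out_of_track = abs(cart_position) > X_THRESHOLD
    pole_fallen = abs(pole_angle) > THETA_THRESHOLD
    time_limit = step_count >= MAX_STEPS
    return out_of_track or pole_fallen or time_limit
```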

Solved Requirement:
Average reward of at least 195.0 over 100 consecutive trials
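
A rolling window is a simple way to check this requirement during training. The sketch below is an assumption about how such a check could look; `SolvedTracker` is a hypothetical name and not part of this repository.

```python
from collections import deque


class SolvedTracker:
    """Track episode rewards and test the 100-episode average."""

    def __init__(self, window=100, threshold=195.0):
        # deque with maxlen drops the oldest reward automatically.
        self.rewards = deque(maxlen=window)
        self.threshold = threshold

    def average(self):
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0

    def is_solved(self):
        # Require a full window before declaring the task solved.
        window_full = len(self.rewards) == self.rewards.maxlen
        return window_full and self.average() >= self.threshold

    def add(self, episode_reward):
        """Record one episode reward and return the solved status."""
        self.rewards.append(episode_reward)
        return self.is_solved()
```

Calling `tracker.add(total_reward)` at the end of each episode returns `True` once the last 100 episodes average 195.0 or more.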

Deep Q-Network (DQN)

Train

| Parameter | Value |
| --- | --- |
| Number of episodes | 400 |
| Learning rate | 0.001 |
| Discount Factor | 0.99 |
| Epsilon | 1.0 |
| Batch size | 64 |
| TargetNet update rate (steps) | 100 |
| Actions | 2 |
| States | 4 |

Test

| Parameter | Value |
| --- | --- |
| Number of episodes | 100 |
| Epsilon | 0.01 |
| Actions | 2 |
| States | 4 |

Network model used for testing: 'saved_networks/dqn_model10' ('tf' model, also available in .h5)
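
To make the role of the epsilon, discount factor, and target-network parameters concrete, here is a minimal pure-Python sketch of the standard DQN building blocks. It is not the repository's implementation: the function names are hypothetical, and Q-values are passed in as plain lists instead of coming from a Keras network.

```python
import random

GAMMA = 0.99  # discount factor from the training table


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_target(reward, next_q_target, done, gamma=GAMMA):
    """TD target: r for terminal steps, else r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def should_sync_target(step, period=100):
    """Copy online weights to the target network every `period` steps."""
    return step > 0 and step % period == 0
```

With epsilon at 1.0 every action is random, while the test setting of 0.01 makes the agent act greedily almost all the time.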

Dueling DQN

Train

| Parameter | Value |
| --- | --- |
| Number of episodes | 400 |
| Learning rate | 0.00075 |
| Discount Factor | 0.99 |
| Epsilon | 1.0 |
| Batch size | 64 |
| TargetNet update rate (steps) | 120 |
| Actions | 2 |
| States | 4 |

Test

| Parameter | Value |
| --- | --- |
| Number of episodes | 100 |
| Epsilon | 0.01 |
| Actions | 2 |
| States | 4 |

Network model used for testing: 'saved_networks/duelingdqn_model20' ('tf' model, also available in .h5)
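
For readers unfamiliar with the dueling architecture, the sketch below shows the commonly used aggregation step that combines a state-value stream with an advantage stream. It assumes the mean-subtraction variant; whether this repository's network uses exactly this form is not stated here, and `dueling_q_values` is a hypothetical name.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V + A - mean(A).

    Subtracting the mean advantage keeps V and A identifiable.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Example with the two CartPole actions (left, right).
q_left, q_right = dueling_q_values(10.0, [2.0, 4.0])
```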

Dueling Double DQN (D3QN)

Train

| Parameter | Value |
| --- | --- |
| Number of episodes | 400 |
| Learning rate | 0.00075 |
| Discount Factor | 0.99 |
| Epsilon | 1.0 |
| Batch size | 64 |
| TargetNet update rate (steps) | 120 |
| Actions | 2 |
| States | 4 |

Test

| Parameter | Value |
| --- | --- |
| Number of episodes | 100 |
| Epsilon | 0.01 |
| Actions | 2 |
| States | 4 |

Network model used for testing: 'saved_networks/d3qn_model20' ('tf' model, also available in .h5)
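
The "double" part of D3QN changes only how the TD target is built: the online network selects the next action and the target network evaluates it. The sketch below illustrates that standard rule with plain lists in place of network outputs; `double_dqn_target` is a hypothetical name, not a function from this repository.

```python
def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    # Action selection uses the online network's estimates...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...while the value of that action comes from the target network.
    return reward + gamma * next_q_target[best_action]
```

When the two networks disagree about the best action, this target is lower than the plain DQN target `r + gamma * max(next_q_target)`, which is how the method reduces overestimation of Q-values.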