Episodic-Backward-Update

Lasagne/Theano-based implementation of "Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update", NeurIPS 2019.

This repository provides Episodic Backward Update (EBU) with a constant diffusion factor for the Atari environment.

Dependencies

  • Numpy
  • Scipy
  • Pillow
  • Matplotlib
  • Lasagne
  • ALE
  • Theano (0.9.0)

Our implementation is based on Shibi He's implementation of Optimality Tightening, which is in turn based on Nathan Sprague's implementation of deep Q RL. Please refer to https://github.com/spragunr/deep_q_rl for instructions on installing the dependencies.

We ran the code with CUDA 8.0, cuDNN 5.1.5, and a TITAN Xp GPU.

Major changes from the deep Q RL implementation

  • ale_agents.py / _do_training : generates a temporary target Q-table for the sampled episode and performs the backward update (see the sketch after this list)
  • ale_data_set.py / random_episode : samples a whole episode instead of a minibatch of transitions
  • ale_experiment.py / run, run_epoch, run_episode : modified to apply the Nature DQN setting, so that each episode is played for at most 4,500 steps (18,000 frames, or 5 minutes)
  • launcher.py : adds the hyperparameter beta for the diffusion factor
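For concreteness, here is a minimal NumPy sketch of the backward target generation that _do_training performs, following the constant-diffusion-factor algorithm described in the paper. The function name, argument layout, and default values are illustrative assumptions, not the repository's actual code.

```python
import numpy as np

def ebu_targets(rewards, actions, q_tilde, gamma=0.99, beta=0.5):
    """Illustrative sketch of EBU target generation with a constant beta.

    rewards : (T,) rewards r_k of one sampled episode (steps k = 0..T-1)
    actions : (T,) action a_k taken at step k
    q_tilde : (num_actions, T) temporary target Q-table; column k holds the
              target network's Q-values at the next state s_{k+1}, and is
              diffused in place during the backward sweep
    Returns y : (T,) regression targets for Q(s_k, a_k).
    """
    T = len(rewards)
    y = np.empty(T)
    y[T - 1] = rewards[T - 1]        # terminal transition: no bootstrapping
    for k in range(T - 2, -1, -1):   # sweep backward through the episode
        # Diffuse the newer target y_{k+1} into the temporary table's entry
        # for the action actually taken at s_{k+1}.
        a_next = actions[k + 1]
        q_tilde[a_next, k] = beta * y[k + 1] + (1.0 - beta) * q_tilde[a_next, k]
        # One-step max-Q bootstrap, but against the updated temporary table.
        y[k] = rewards[k] + gamma * q_tilde[:, k].max()
    return y
```

With beta = 0 this reduces to ordinary one-step DQN targets computed along an episode, while beta = 1 propagates each new target fully backward through the temporary table.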

Running

You can train an EBU agent with a constant diffusion factor of 0.5 on breakout, using random seed 12 on gpu0, as follows:

    THEANO_FLAGS='device=gpu0, allow_gc=False' python code/run_EBU.py -r 'breakout' --Seed 12 --beta 0.5

By default, it reports test scores every 62,500 steps, 40 times in total (62,500 steps x 4 frames/step x 40 = 10M frames).

You may modify the STEPS_PER_EPOCH and EPOCHS parameters in run_EBU.py to change the total number of training steps and the frequency of evaluation, as in the sketch below.
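For instance, halving the run to 5M frames while keeping the same evaluation frequency might look like the following. The parameter names come from the sentence above; the exact layout of these assignments in run_EBU.py is an assumption.

```python
# Illustrative values for run_EBU.py; names from the README, layout assumed.
STEPS_PER_EPOCH = 62500  # agent steps between evaluations (x4 frames per step)
EPOCHS = 20              # evaluation rounds: 62,500 x 4 x 20 = 5M frames
```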

If everything runs fine, you will see the training process as in the screenshot below.

[Screenshot: training in progress]
