
DQN(λ) — Reconciling λ-Returns with Experience Replay

DQN(λ) is an instantiation of the ideas proposed in [1] that extends DQN [2] to efficiently utilize various types of λ-returns [3], which can significantly improve sample efficiency.

If you use this repository in published work, please cite the paper:

@inproceedings{daley2019reconciling,
  title={Reconciling $\lambda$-Returns with Experience Replay},
  author={Daley, Brett and Amato, Christopher},
  booktitle={Advances in Neural Information Processing Systems},
  pages={1133--1142},
  year={2019}
}

Contents

Setup

Quickstart: DQN(λ)

Quickstart: DQN

Atari Environment Naming Convention

Return Estimators

License, Acknowledgments, and References


Setup

This repository requires Python 3. To automatically install working package versions, just clone the repository and run pip:

git clone https://github.com/brett-daley/dqn-lambda.git
cd dqn-lambda
pip install -r requirements.txt

Note: Training will likely be impractical without GPU support. See the TensorFlow GPU guide for tensorflow-gpu and CUDA setup.
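
Before launching a long Atari run, it can be worth confirming that TensorFlow actually sees your GPU. The snippet below is a minimal, optional check; it assumes the TensorFlow 1.x API provided by tensorflow-gpu and is not part of this repository.

import tensorflow as tf

# Optional sanity check (not part of this repository): verify that the
# installed tensorflow-gpu build can see a CUDA device before training.
if tf.test.is_gpu_available():
    print('GPU detected: Atari training should run at a reasonable speed.')
else:
    print('No GPU detected: Atari training will likely be impractical.')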


Quickstart: DQN(λ)

Atari Games

You can train DQN(λ) on any of the Atari games included in the OpenAI Gym (see Atari Environment Naming Convention). For example, the following command runs DQN(λ) with λ=0.75 on Pong for 1.5 million timesteps:

python run_dqn_atari.py --env pong --return-est pengs-0.75 --timesteps 1.5e6

See Return Estimators for all of the n-step returns and λ-returns supported by --return-est. To get a description of the other possible command-line arguments, run this:

python run_dqn_atari.py --help
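
If you want to compare several values of λ, one convenient pattern is to launch run_dqn_atari.py repeatedly from a small driver script. The sketch below reuses the flags shown above; the particular λ grid and timestep budget are illustrative choices, not repository defaults.

import subprocess

# Hypothetical sweep: run DQN(lambda) on Pong for several lambda values by
# invoking run_dqn_atari.py with the command-line flags documented above.
for lam in [0.25, 0.5, 0.75, 1.0]:
    subprocess.run([
        'python', 'run_dqn_atari.py',
        '--env', 'pong',
        '--return-est', 'pengs-{}'.format(lam),
        '--timesteps', '1.5e6',
    ], check=True)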

Classic Control Environments

You can run DQN(λ) on CartPole-v0 by simply executing python run_dqn_control.py. This is useful to test code on laptops or low-end desktops — particularly those without GPUs.

run_dqn_control.py does not take command-line arguments; all values are hard-coded. You need to edit the file directly to change parameters. A one-line change to the environment name is all you need to run other environments (discrete action spaces only; e.g. Acrobot-v1 or MountainCar-v0).
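
For reference, the edit amounts to swapping the Gym environment ID passed to gym.make. The variable name below is hypothetical (check run_dqn_control.py for the actual one); only the ID string changes.

import gym

# Hypothetical illustration of the one-line change in run_dqn_control.py:
# replace the CartPole-v0 ID with any discrete-action classic control task.
env = gym.make('Acrobot-v1')  # was gym.make('CartPole-v0')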


Quickstart: DQN

This repository also includes a standard target-network implementation of DQN for reference. Add the --legacy flag to run it instead of DQN(λ):

python run_dqn_atari.py --legacy

Note that setting --legacy along with any DQN(λ)-specific arguments (--cache-size, --block-size, or --priority) will throw an error because they are undefined for DQN. For example:

python run_dqn_atari.py --cache-size 10000 --legacy

Traceback (most recent call last):
  File "run_dqn_atari.py", line 82, in <module>
    main()
  File "run_dqn_atari.py", line 56, in main
    assert args.cache_size == 80000  # Cache-related args are undefined for legacy DQN
AssertionError

Similarly, trying to use --legacy with a return estimator other than n-step returns will also throw an error:

python run_dqn_atari.py --return-est pengs-0.75 --legacy

Traceback (most recent call last):
  File "run_dqn_atari.py", line 82, in <module>
    main()
  File "run_dqn_atari.py", line 59, in main
    replay_memory = make_legacy_replay_memory(args.return_est, replay_mem_size, args.history_len, discount)
  File "/home/brett/dqn-lambda/replay_memory_legacy.py", line 10, in make_legacy_replay_memory
    raise ValueError('Legacy mode only supports n-step returns but requested {}'.format(return_est))
ValueError: Legacy mode only supports n-step returns but requested pengs-0.75

Atari Environment Naming Convention

The --env argument does not follow OpenAI Gym's naming format. Environment names should be lowercase and use underscores instead of CamelCase, and the trailing -v0 should be removed. For example:

OpenAI Name         Usage
BeamRider-v0        python run_dqn_atari.py --env beam_rider
Breakout-v0         python run_dqn_atari.py --env breakout
Pong-v0             python run_dqn_atari.py --env pong
Qbert-v0            python run_dqn_atari.py --env qbert
Seaquest-v0         python run_dqn_atari.py --env seaquest
SpaceInvaders-v0    python run_dqn_atari.py --env space_invaders

This pattern applies to all of the Atari games supported by OpenAI Gym.
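
If you script many experiments, a small helper can convert Gym IDs into the --env format automatically. The helper below is not part of the repository; it simply encodes the convention described above.

import re

def gym_to_env_arg(gym_id):
    # Hypothetical helper (not in this repository): convert a Gym ID such as
    # 'SpaceInvaders-v0' into the lowercase, underscored form --env expects.
    name = gym_id.rsplit('-v', 1)[0]              # drop the trailing '-v0'
    name = re.sub(r'(?<!^)(?=[A-Z])', '_', name)  # CamelCase -> Camel_Case
    return name.lower()

assert gym_to_env_arg('BeamRider-v0') == 'beam_rider'
assert gym_to_env_arg('SpaceInvaders-v0') == 'space_invaders'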


Return Estimators

The --return-est argument accepts a string that determines which return estimator is used. An estimator may be parameterized by an <int> (greater than 0) or a <float> (between 0.0 and 1.0 inclusive; the decimal point is mandatory). The table below summarizes all of the return estimators supported by DQN(λ).

Return Estimator            Format                 Example             Description
n-step                      nstep-<int>            nstep-3             Classic n-step return [3]. Standard DQN uses n=1. n=<int>.
Peng's Q(λ)                 pengs-<float>          pengs-0.75          λ-return that unconditionally uses max Q-values [4]. A good "default" λ-return. λ=<float>.
Peng's Q(λ) + median        pengs-median           pengs-median        Peng's Q(λ) with median λ selection [1].
Peng's Q(λ) + bounded δ     pengs-maxtd-<float>    pengs-maxtd-0.01    Peng's Q(λ) with bounded-error λ selection [1]. δ=<float>.
Watkins's Q(λ)              watkins-<float>        watkins-0.75        Peng's Q(λ), but sets λ=0 if the Q-value is non-max [4]. Ensures on-policy data. λ=<float>.
Watkins's Q(λ) + median     watkins-median         watkins-median      Watkins's Q(λ) with median λ selection [1].
Watkins's Q(λ) + bounded δ  watkins-maxtd-<float>  watkins-maxtd-0.01  Watkins's Q(λ) with bounded-error λ selection [1]. δ=<float>.

See Section 7.6 of [4] for a side-by-side comparison of Peng's Q(λ) and Watkins's Q(λ).
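
To make the λ-return concrete, the sketch below computes Peng's Q(λ) returns backwards over a stored trajectory using the recursion R_t = r_t + γ[(1 - λ) max_a Q(s_{t+1}, a) + λ R_{t+1}]. It is a minimal illustration under the assumption that max_qvalues[t] holds max_a Q(s_{t+1}, a); it is not the repository's cache-based implementation.

import numpy as np

def pengs_lambda_returns(rewards, max_qvalues, gamma=0.99, lam=0.75):
    # Minimal sketch (not the repository's implementation) of Peng's Q(lambda):
    #   R_t = r_t + gamma * ((1 - lam) * max_qvalues[t] + lam * R_{t+1}),
    # where max_qvalues[t] = max_a Q(s_{t+1}, a). The last max Q-value serves
    # as the bootstrap target. Setting lam=0.0 recovers the 1-step return.
    returns = np.zeros(len(rewards))
    next_return = max_qvalues[-1]  # bootstrap from the trailing state
    for t in reversed(range(len(rewards))):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * max_qvalues[t] + lam * next_return
        )
        returns[t] = next_return
    return returns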


License

This code is released under the MIT License.

Acknowledgments

This codebase evolved from the partial DQN implementation made available by the Berkeley Deep RL course, in turn based on Szymon Sidor's OpenAI implementation. Special thanks to them.

References

[1] Daley, B. and Amato, C. Reconciling λ-Returns with Experience Replay. NeurIPS, 2019.

[2] Mnih, V. et al. Human-Level Control Through Deep Reinforcement Learning. Nature, 2015.

[3] Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction (2nd edition). MIT Press, 2018.

[4] Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction (1st edition). MIT Press, 1998.