# Nature DQN paper


01/06/2019

tl;dr: the founding paper of DQN

## Key ideas

- Approximating action values (Q) with a neural network is known to be unstable. The paper uses two tricks to address this: an experience replay buffer and a periodically updated target network (see the sketch after this list).
- The authors tied the key ideas of end-to-end learning (adjusting representations based on reward) and the replay buffer (analogous to the hippocampus) to biological evidence.
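To make the two tricks concrete, here is a minimal PyTorch sketch of one DQN training step. This is not the paper's implementation: the network, hyperparameters, and transition format below are illustrative assumptions.

```python
import random
from collections import deque

import torch
import torch.nn as nn

GAMMA = 0.99                 # discount factor (illustrative value)
TARGET_UPDATE_EVERY = 1_000  # steps between target-network syncs (illustrative)

def make_q_net(n_obs: int, n_actions: int) -> nn.Module:
    # Small MLP stand-in for the paper's convolutional network.
    return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))

q_net = make_q_net(n_obs=4, n_actions=2)
target_net = make_q_net(n_obs=4, n_actions=2)
target_net.load_state_dict(q_net.state_dict())  # start the two nets in sync
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# Trick 1: experience replay buffer of (state, action, reward, next_state, done).
replay_buffer = deque(maxlen=10_000)

def train_step(step: int, batch_size: int = 32) -> None:
    if len(replay_buffer) < batch_size:
        return
    # Uniform random sampling breaks the temporal correlations
    # in the stream of experience.
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s, a) from the online network for the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Trick 2: bootstrap the TD target from the *frozen* target network,
    # so the regression target does not shift with every gradient step.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        td_target = rewards + GAMMA * (1.0 - dones) * max_next_q

    loss = nn.functional.smooth_l1_loss(q_sa, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Periodically copy the online weights into the target network.
    if step % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```

In the paper the function approximator is a convolutional network over stacked Atari frames; the sketch keeps only the two stabilizing mechanisms.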

## Notes/Questions

- Drawbacks: it makes little progress toward solving Montezuma's Revenge.

> Nevertheless, games demanding more temporally extended planning strategies still constitute a major challenge for all existing agents including DQN (for example, Montezuma’s Revenge).

## Overall impression

The founding paper of DQN. Its lasting contribution is showing that neural networks can stably approximate action values when paired with an experience replay buffer and a periodically updated target network, and it grounds both mechanisms in biological evidence. Games that demand temporally extended planning, such as Montezuma's Revenge, remain a major challenge.