muupan edited this page May 10, 2016 · 3 revisions

On the authors' implementation details

I received a confirmation by e-mail from Dr. Mnih:

  • On optimization
      • They use exactly the RMSprop given by equations (8) and (9)
      • The RMSprop parameters they used are: eta=7e-4, epsilon=0.1, alpha=0.99
      • They linearly decrease eta to zero over the course of training
      • They keep only a single RMSprop 'g', summing up the gradients of pi and V
      • They multiply the gradients of V by 0.5
      • They didn't clip losses
      • They ran it for 320 million frames (= 80 million non-skipped frames) for the one-day results, and for 1 billion frames for the four-day results
  • On networks
      • Pi and V share the network except for the last layers
      • They initialized parameters with the default Torch initialization: https://github.com/torch/nn/blob/master/Linear.lua
  • On Atari
      • They clipped rewards so that they are in [-1, 1]
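
The optimization and reward-clipping points above can be sketched in plain Python as follows. This is only a minimal illustration, not the authors' code. It assumes that equations (8) and (9) have the form g <- alpha * g + (1 - alpha) * dtheta^2 and theta <- theta - eta * dtheta / sqrt(g + epsilon), with epsilon inside the square root. All function names (`linear_eta`, `clip_reward`, `combined_gradient`, `rmsprop_step`) are hypothetical and chosen for this example.

```python
import math

ETA0 = 7e-4     # initial learning rate (eta), from the list above
EPSILON = 0.1   # RMSprop epsilon, from the list above
ALPHA = 0.99    # RMSprop decay (alpha), from the list above


def linear_eta(step, total_steps, eta0=ETA0):
    """Learning rate decreased linearly from eta0 to zero over training."""
    return eta0 * max(0.0, 1.0 - step / total_steps)


def clip_reward(r):
    """Clip a reward so that it lies in [-1, 1]."""
    return max(-1.0, min(1.0, r))


def combined_gradient(grad_pi, grad_v):
    """Sum the pi and V gradients, multiplying the V gradients by 0.5."""
    return [gp + 0.5 * gv for gp, gv in zip(grad_pi, grad_v)]


def rmsprop_step(theta, g, grad, eta, alpha=ALPHA, epsilon=EPSILON):
    """One RMSprop update with a single 'g' per parameter.

    Assumed form (not confirmed by this page):
        g     <- alpha * g + (1 - alpha) * grad^2
        theta <- theta - eta * grad / sqrt(g + epsilon)
    """
    new_g = [alpha * gi + (1.0 - alpha) * d * d for gi, d in zip(g, grad)]
    new_theta = [t - eta * d / math.sqrt(gi + epsilon)
                 for t, gi, d in zip(theta, new_g, grad)]
    return new_theta, new_g
```

A training loop would call `combined_gradient` on the two gradient lists for the shared parameters, then pass the result to `rmsprop_step` together with `linear_eta(step, total_steps)`, so that only one 'g' is maintained for the summed gradient.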