Home

I received a confirmation by e-mail from Dr. Mnih:

On optimization
They use exactly the RMSprop update given by equations (8) and (9)
The RMSprop parameters they used are: eta=7e-4, epsilon=0.1, alpha=0.99
They linearly decrease eta to zero in the course of training
They keep only a single RMSprop 'g' and sum the gradients of pi and V
They multiply the gradients of V by 0.5
They didn't clip losses
They ran it for 320 million frames (= 80 million non-skipped frames) for the one-day results and 1 billion frames for the four-day results
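
The optimization settings above can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes equations (8) and (9) have the form g = alpha * g + (1 - alpha) * d^2 and theta = theta - eta * d / sqrt(g + epsilon), with epsilon inside the square root. The function names `annealed_eta` and `rmsprop_step`, and the flat-list parameter layout, are hypothetical.

```python
import math

# Values reported in the e-mail.
ETA0 = 7e-4
EPSILON = 0.1
ALPHA = 0.99


def annealed_eta(step, total_steps, eta0=ETA0):
    """Linearly decrease eta from eta0 at step 0 to zero at total_steps."""
    frac = max(0.0, 1.0 - step / total_steps)
    return eta0 * frac


def rmsprop_step(theta, g, grad_pi, grad_v, eta, alpha=ALPHA, epsilon=EPSILON):
    """Update theta and the single statistic g in place.

    The gradients of pi and V are summed (V scaled by 0.5) before they
    enter the update, so only one g is kept per parameter.
    Assumed form of the update: epsilon is added inside the square root.
    """
    for i in range(len(theta)):
        d = grad_pi[i] + 0.5 * grad_v[i]
        g[i] = alpha * g[i] + (1.0 - alpha) * d * d
        theta[i] -= eta * d / math.sqrt(g[i] + epsilon)
    return theta, g
```

A training loop would call `annealed_eta(step, total_steps)` once per update and pass the result to `rmsprop_step` as `eta`.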
On networks
Pi and V share the network except for the last layers
They initialized parameters with default Torch initialization: https://github.com/torch/nn/blob/master/Linear.lua
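
As far as I can tell from the linked Linear.lua, the default initialization draws both weights and biases uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)]. Below is a minimal sketch of that rule applied to separate pi and V output layers on top of a shared hidden layer. The function name `torch_linear_init`, the hidden size, and the action count are hypothetical, and only fully connected layers are covered.

```python
import math
import random


def torch_linear_init(fan_in, fan_out, rng=random):
    """Sample a (fan_out x fan_in) weight matrix and a bias vector.

    Mirrors the default rule in Torch's nn.Linear as I read it:
    stdv = 1 / sqrt(fan_in), then uniform(-stdv, stdv) for both.
    """
    stdv = 1.0 / math.sqrt(fan_in)
    weight = [[rng.uniform(-stdv, stdv) for _ in range(fan_in)]
              for _ in range(fan_out)]
    bias = [rng.uniform(-stdv, stdv) for _ in range(fan_out)]
    return weight, bias


# Hypothetical sizes: the shared part of the network ends in a hidden
# vector, and only the last layers differ between pi and V.
HIDDEN = 256
N_ACTIONS = 4
pi_weight, pi_bias = torch_linear_init(HIDDEN, N_ACTIONS)  # policy head
v_weight, v_bias = torch_linear_init(HIDDEN, 1)            # value head
```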
On Atari
They clipped rewards so that they are in [-1, 1]
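
Reward clipping to [-1, 1] amounts to the following. The helper name `clip_reward` is hypothetical.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw game reward into the interval [low, high]."""
    return max(low, min(high, reward))
```

For example, a raw reward of 200 becomes 1.0, a raw reward of -5 becomes -1.0, and a raw reward of 0.5 is left unchanged.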
