Awesome Hyperparams

These hyperparams should provide generally good starting points. Please check the original paper, code, etc. (especially in deep RL), as the best hyperparams for a specific model can vary with your task or environment.

Contributors

Please provide citations (e.g., an arXiv link, blog post, or GitHub repo). Any information about the hyperparameter search process used in the original work (if available) is a bonus. Please use scientific "e" notation (e.g., 10e5 instead of 1000000).

Example contribution:

Original DQN

| hyperparam name | default value |
| --- | --- |
| lr | 25e-5 |
| RMSprop momentum | 0.95 |
| RMSprop epsilon | 0.01 |
| discount factor | 0.99 |
| epsilon(-greedy) | 1, annealed to 0.1 over 1 million frames |
| minibatch size | 32 |
| replay memory size | 10e5 |
| weight init | Xavier (Torch default) |
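A minimal PyTorch sketch of how these values are typically wired up (the `q_net` module and the exact mapping of the RMSprop momentum/epsilon onto PyTorch's arguments are assumptions; DeepMind's RMSprop variant differs slightly):

```python
import torch

# Placeholder network; the real DQN uses the conv architecture from the paper.
q_net = torch.nn.Linear(4, 2)

# RMSprop with lr 25e-5; the momentum/epsilon mapping is approximate (assumption).
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=25e-5, alpha=0.95,
                                momentum=0.95, eps=0.01)

def epsilon_at(frame, start=1.0, end=0.1, anneal_frames=1_000_000):
    """Linearly anneal epsilon-greedy exploration from 1.0 to 0.1 over 1M frames."""
    return max(end, start - (start - end) * frame / anneal_frames)

GAMMA = 0.99          # discount factor
BATCH_SIZE = 32       # minibatch size
REPLAY_SIZE = 10**6   # replay memory size (10e5)
```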

Computer Vision

DCGAN

| hyperparam name | default value |
| --- | --- |
| Adam lr | 2e-4 |
| Adam momentum beta1 | 0.5 |
| minibatch size | 64 or 128 |
| image scaling | [-1, 1] |
| LeakyReLU slope | 0.2 |
| real labels (label smoothing) | 1 -> [0.7, 1.2] |
| fake labels (label smoothing) | 0 -> [0.0, 0.3] |
| weight init | N(0, 0.02) |
| Z distribution | n-dim uniform or Gaussian (e.g., uniform(-0.2, 0.2) from this implementation) |
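A minimal PyTorch-style sketch of how these settings are commonly combined (the `netD`/`netG` modules below are placeholders, not the actual DCGAN architecture):

```python
import torch
import torch.nn as nn

def weights_init(m):
    """Initialize conv/linear weights from N(0, 0.02)."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)

# Placeholder discriminator/generator; the real DCGAN nets go here.
netD = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2))    # LeakyReLU slope 0.2
netG = nn.Sequential(nn.ConvTranspose2d(100, 3, 4, 2, 1), nn.Tanh())  # tanh keeps images in [-1, 1]
netD.apply(weights_init)
netG.apply(weights_init)

# Adam with lr 2e-4 and beta1 0.5 for both networks
optD = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))
optG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))

batch_size = 64
# Label smoothing: real labels drawn from [0.7, 1.2], fake labels from [0.0, 0.3]
real_labels = torch.empty(batch_size).uniform_(0.7, 1.2)
fake_labels = torch.empty(batch_size).uniform_(0.0, 0.3)

# n-dim Z sampled uniformly from (-0.2, 0.2) (a Gaussian also works)
z = torch.empty(batch_size, 100, 1, 1).uniform_(-0.2, 0.2)
```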

For Z, sampling from a uniform distribution is simpler, but see the discussion here about interpolation in the latent space; the current recommendation is to use a spherical Z and interpolate along a great circle.
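A small NumPy sketch of that great-circle (spherical) interpolation; the `slerp` helper name is just for illustration:

```python
import numpy as np

def slerp(t, z0, z1):
    """Spherical linear interpolation between latent vectors z0 and z1, with t in [0, 1]."""
    z0_n = z0 / np.linalg.norm(z0)
    z1_n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0_n, z1_n), -1.0, 1.0))  # angle between the two vectors
    if np.isclose(omega, 0.0):
        return (1.0 - t) * z0 + t * z1  # (nearly) parallel vectors: fall back to linear interpolation
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

# Walk along the great circle between two Gaussian latent samples
z0, z1 = np.random.randn(100), np.random.randn(100)
path = [slerp(t, z0, z1) for t in np.linspace(0.0, 1.0, 8)]
```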

Natural Language Processing

Deep Reinforcement Learning

See the OpenAI baselines repo for solid implementations of many of these algorithms.

Deep Deterministic Policy Gradient

The original paper uses a learning rate of 1e-4 for the actor and 1e-3 for the critic. To help stabilize the actor network during training, you generally want to encourage the critic network to converge faster; hence the larger learning rate for the critic.

| hyperparam name | default value |
| --- | --- |
| policy network | 400/64 -> relu -> 300/64 -> relu -> tanh |
| critic network | 400/64 -> relu -> 300/64 -> relu -> linear |
| actor lr | 1e-4 |
| critic lr | 1e-3 |
| critic L2 weight decay | 1e-2 |
| discount factor | 0.99 |
| target network update tau | 1e-3 |
| Ornstein-Uhlenbeck theta | 0.15 |
| Ornstein-Uhlenbeck sigma | 0.3 |
| minibatch size | 64 on low-dim input, 16 on pixel input |
| replay memory size | 10e5 |
| weight init | final layers of actor & critic: uniform(-3e-3, 3e-3) for low-dim input, uniform(-3e-4, 3e-4) for pixel input; other layers: Xavier |
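A minimal sketch of the two mechanisms the tau and Ornstein-Uhlenbeck rows refer to (the `actor`/`actor_target` modules are placeholders, not the architecture above):

```python
import numpy as np
import torch

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise with theta=0.15, sigma=0.3."""
    def __init__(self, action_dim, theta=0.15, sigma=0.3, mu=0.0):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.state = np.full(action_dim, mu)

    def sample(self):
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

def soft_update(target, source, tau=1e-3):
    """Polyak-average the target network: theta_target <- tau*theta + (1 - tau)*theta_target."""
    for t_param, param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * param.data + (1.0 - tau) * t_param.data)

# Placeholder networks; the real actor/critic follow the architecture rows above.
actor = torch.nn.Linear(3, 1)
actor_target = torch.nn.Linear(3, 1)
actor_target.load_state_dict(actor.state_dict())

noise = OUNoise(action_dim=1)
action = 0.5 + noise.sample()     # add exploration noise to the deterministic action
soft_update(actor_target, actor)  # called after each gradient step
```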

A3C

| hyperparam name | default value |
| --- | --- |
| discount factor | 0.99 |
| shared RMSprop eta | 7e-4 |
| shared RMSprop alpha | 0.99 |
| shared RMSprop epsilon | 0.1 |
| A3C entropy regularization beta | 0.01 |
| V-network gradients | multiplied by 0.5 |
| weight init | Xavier (Torch default) |
| reward clipping | [-1, 1] on Atari |
| # of threads w/ best performance | 16 |
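A hedged sketch of how the loss-related rows combine in one worker's update (the model and per-step quantities are placeholders; a real implementation computes them from an n-step rollout):

```python
import torch
import torch.nn as nn

# Stand-in for the shared actor-critic model (real A3C uses a conv/LSTM torso with policy and value heads).
shared_model = nn.Linear(4, 3)

# Shared RMSprop with eta=7e-4, alpha=0.99, epsilon=0.1; its running statistics are shared across worker threads.
optimizer = torch.optim.RMSprop(shared_model.parameters(), lr=7e-4, alpha=0.99, eps=0.1)

def a3c_loss(log_prob, entropy, advantage, value_error, beta=0.01):
    """Per-step A3C objective: policy gradient term, entropy bonus (beta=0.01),
    and the value loss scaled by 0.5 as in the table."""
    policy_loss = -log_prob * advantage.detach() - beta * entropy
    value_loss = 0.5 * value_error.pow(2)
    return policy_loss + value_loss

def clip_reward(r):
    """Clip rewards to [-1, 1] on Atari."""
    return max(-1.0, min(1.0, r))
```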

TRPO

| hyperparam name | default value |
| --- | --- |
| policy network | 400/64 -> tanh -> 300/64 -> tanh -> linear, + std dev |
| value network | 400/64 -> tanh -> 300/64 -> tanh -> linear |
| timesteps per batch | 5000 |
| max KL | 0.01 |
| conjugate gradient iters | 20 |
| conjugate gradient damping | 0.1 |
| value function (VF) optimizer | Adam |
| VF iters | 3-5 |
| VF batch size | 64 |
| VF step size | 1e-3 |
| discount | 0.995 |
| entropy coeff | 0.0 |
| GAE lambda | 0.97 |
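The discount and GAE lambda rows plug into the advantage estimator below (a plain-Python sketch; the value estimates are assumed to come from the value network above):

```python
def gae_advantages(rewards, values, gamma=0.995, lam=0.97):
    """Generalized Advantage Estimation over one trajectory.
    `values` holds V(s_0)..V(s_T), including a bootstrap value for the state after the last reward."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD residual
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy usage: three rewards, with values[3] bootstrapping the trailing state
advs = gae_advantages([1.0, 0.0, 1.0], [0.5, 0.4, 0.6, 0.2])
```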

PPO

- paper
  - More hyperparams are reported in the appendix
- hyperparam analysis
- The hyperparams reported below are for the Mujoco environments
| hyperparam name | default value |
| --- | --- |
| policy network | 64 -> tanh -> 64 -> tanh -> linear, + std dev |
| value network | 64 -> tanh -> 64 -> tanh -> linear |
| timesteps per batch | 2048 |
| clip param | 0.2 |
| optimizer | Adam |
| optimizer epochs per iter | 10 |
| optimizer step size | 3e-4 |
| optimizer batch size | 64 |
| lr schedule | linear |
| discount | 0.995 |
| entropy coeff | 0.0 |
| GAE lambda | 0.97 |
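The clip param row is the epsilon in PPO's clipped surrogate objective; a minimal PyTorch sketch of that term (the log-probability and advantage tensors are placeholders):

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_param=0.2):
    """Clipped surrogate objective: -E[min(r*A, clip(r, 1-eps, 1+eps)*A)], with eps = clip param."""
    ratio = torch.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantage
    return -torch.min(unclipped, clipped).mean()

# Toy usage on a 64-sample minibatch; optimized with Adam (step size 3e-4) for 10 epochs per iteration
log_new = torch.randn(64, requires_grad=True)
log_old = torch.randn(64)
adv = torch.randn(64)
loss = ppo_clip_loss(log_new, log_old, adv)
```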

General
