# RAdam-Tensorflow

**On the Variance of the Adaptive Learning Rate and Beyond**

Paper | Official PyTorch code

## Installation

> **NOTE:** This implementation is for Tensorflow 1.x only.

```bash
pip install tf-1.x-rectified-adam
```

## Usage

```python
from radam import RAdamOptimizer

train_op = RAdamOptimizer(learning_rate=0.001,
                          beta1=0.9,
                          beta2=0.999,
                          weight_decay=0.0).minimize(loss)
```

## Algorithm

## Result

## Author

Junho Kim